<!-- $PostgreSQL$ -->

<chapter id="maintenance">
<title>Routine Database Maintenance Tasks</title>

<indexterm zone="maintenance">
<primary>maintenance</primary>
</indexterm>

<indexterm zone="maintenance">
<primary>routine maintenance</primary>
</indexterm>

<para>
<productname>PostgreSQL</>, like any database software, requires that certain tasks
be performed regularly to achieve optimum performance. The tasks
discussed here are <emphasis>required</emphasis>, but they
are repetitive in nature and can easily be automated using standard
tools such as <application>cron</application> scripts or
Windows' <application>Task Scheduler</>. But it is the database
administrator's responsibility to set up appropriate scripts, and to
check that they execute successfully.
</para>

<para>
One obvious maintenance task is creation of backup copies of the data on a
regular schedule. Without a recent backup, you have no chance of recovery
after a catastrophe (disk failure, fire, mistakenly dropping a critical
table, etc.). The backup and recovery mechanisms available in
<productname>PostgreSQL</productname> are discussed at length in
<xref linkend="backup">.
</para>

<para>
The other main category of maintenance task is periodic <quote>vacuuming</>
of the database. This activity is discussed in
<xref linkend="routine-vacuuming">. Closely related to this is updating
the statistics that will be used by the query planner, as discussed in
<xref linkend="vacuum-for-statistics">.
</para>

<para>
Another task that might need periodic attention is log file management.
This is discussed in <xref linkend="logfile-maintenance">.
</para>

<para>
<productname>PostgreSQL</productname> is low-maintenance compared
to some other database management systems. Nonetheless,
appropriate attention to these tasks will go far towards ensuring a
pleasant and productive experience with the system.
</para>
<sect1 id="routine-vacuuming">
<title>Routine Vacuuming</title>

<indexterm zone="routine-vacuuming">
<primary>vacuum</primary>
</indexterm>

<para>
<productname>PostgreSQL</productname> databases require periodic
maintenance known as <firstterm>vacuuming</>. For many installations, it
is sufficient to let vacuuming be performed by the <firstterm>autovacuum
daemon</>, which is described in <xref linkend="autovacuum">. You might
need to adjust the autovacuuming parameters described there to obtain best
results for your situation. Some database administrators will want to
supplement or replace the daemon's activities with manually-managed
<command>VACUUM</> commands, which typically are executed according to a
schedule by <application>cron</application> or <application>Task
Scheduler</> scripts. To set up manually-managed vacuuming properly,
it is essential to understand the issues discussed in the next few
subsections. Administrators who rely on autovacuuming may still wish
to skim this material to help them understand and adjust autovacuuming.
</para>

<sect2 id="vacuum-basics">
<title>Vacuuming Basics</title>

<para>
<productname>PostgreSQL</productname>'s
<xref linkend="sql-vacuum" endterm="sql-vacuum-title"> command has to
process each table on a regular basis for several reasons:

<orderedlist>
<listitem>
<simpara>To recover or reuse disk space occupied by updated or deleted
rows.</simpara>
</listitem>

<listitem>
<simpara>To update data statistics used by the
<productname>PostgreSQL</productname> query planner.</simpara>
</listitem>

<listitem>
<simpara>To protect against loss of very old data due to
<firstterm>transaction ID wraparound</>.</simpara>
</listitem>
</orderedlist>

Each of these reasons dictates performing <command>VACUUM</> operations
of varying frequency and scope, as explained in the following subsections.
</para>

<para>
There are two variants of <command>VACUUM</>: standard <command>VACUUM</>
and <command>VACUUM FULL</>. <command>VACUUM FULL</> can reclaim more
disk space but runs much more slowly. Also,
the standard form of <command>VACUUM</> can run in parallel with production
database operations. (Commands such as <command>SELECT</command>,
<command>INSERT</command>, <command>UPDATE</command>, and
<command>DELETE</command> will continue to function as normal, though you
will not be able to modify the definition of a table with commands such as
<command>ALTER TABLE</command> while it is being vacuumed.)
<command>VACUUM FULL</> requires exclusive lock on the table it is
working on, and therefore cannot be done in parallel with other use
of the table. Another disadvantage of <command>VACUUM FULL</> is that
while it reduces table size, it does not reduce index size proportionally;
in fact it can make indexes <emphasis>larger</>. Generally, therefore,
administrators should strive to use standard <command>VACUUM</> and
avoid <command>VACUUM FULL</>.
</para>
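<para>
For example, a routine pass over a single table and an (exceptional)
full rewrite of the same table could be issued as follows; the table
name is purely illustrative:

<programlisting>
-- standard VACUUM: runs alongside normal reads and writes
VACUUM mytable;

-- VACUUM FULL: takes an exclusive lock and compacts the table
VACUUM FULL mytable;
</programlisting>
</para>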
<para>
<command>VACUUM</command> creates a substantial amount of I/O
traffic, which can cause poor performance for other active sessions.
There are configuration parameters that can be adjusted to reduce the
performance impact of background vacuuming &mdash; see
<xref linkend="runtime-config-resource-vacuum-cost">.
</para>
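<para>
As a sketch (the delay value shown is only an illustration), a manually
issued <command>VACUUM</> can be throttled for the current session like
this:

<programlisting>
-- enable cost-based vacuum delay for this session only
SET vacuum_cost_delay = 20;   -- sleep 20 milliseconds each time the cost limit is reached
VACUUM mytable;
</programlisting>
</para>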
</sect2>

<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>

<indexterm zone="vacuum-for-space-recovery">
<primary>disk space</primary>
</indexterm>

<para>
In <productname>PostgreSQL</productname>, an
<command>UPDATE</> or <command>DELETE</> of a row does not
immediately remove the old version of the row.
This approach is necessary to gain the benefits of multiversion
concurrency control (see <xref linkend="mvcc">): the row version
must not be deleted while it is still potentially visible to other
transactions. But eventually, an outdated or deleted row version is no
longer of interest to any transaction. The space it occupies must then be
reclaimed for reuse by new rows, to avoid infinite growth of disk
space requirements. This is done by running <command>VACUUM</>.
</para>

<para>
The standard form of <command>VACUUM</command> removes dead row
versions in tables and indexes and marks the space available for
future reuse. However, it will not return the space to the operating
system, except in the special case where one or more pages at the
end of a table become entirely free and an exclusive table lock can be
easily obtained. In contrast, <command>VACUUM FULL</> actively compacts
tables by moving row versions to earlier pages. It is thus able to
force pages at the end of the table to become entirely free, whereupon
it will return them to the operating system. However, if many rows
must be moved, this can take a long time. Also, moving a row requires
transiently making duplicate index entries for it (the entry pointing
to its new location must be made before the old entry can be removed);
so moving a lot of rows this way causes severe index bloat.
</para>

<para>
The usual goal of routine vacuuming is to do standard <command>VACUUM</>s
often enough to avoid needing <command>VACUUM FULL</>. The
autovacuum daemon attempts to work this way, and in fact will
never issue <command>VACUUM FULL</>. In this approach, the idea
is not to keep tables at their minimum size, but to maintain steady-state
usage of disk space: each table occupies space equivalent to its
minimum size plus however much space gets used up between vacuumings.
Although <command>VACUUM FULL</> can be used to shrink a table back
to its minimum size and return the disk space to the operating system,
there is not much point in this if the table will just grow again in the
future. Thus, moderately-frequent standard <command>VACUUM</> runs are a
better approach than infrequent <command>VACUUM FULL</> runs for
maintaining heavily-updated tables.
</para>

<para>
Some administrators prefer to schedule vacuuming themselves, for example
doing all the work at night when load is low.
The difficulty with doing vacuuming according to a fixed schedule
is that if a table has an unexpected spike in update activity, it may
get bloated to the point that <command>VACUUM FULL</> is really necessary
to reclaim space. Using the autovacuum daemon alleviates this problem,
since the daemon schedules vacuuming dynamically in response to update
activity. It is unwise to disable the daemon completely unless you
have an extremely predictable workload. One possible compromise is
to set the daemon's parameters so that it will only react to unusually
heavy update activity, thus keeping things from getting out of hand,
while scheduled <command>VACUUM</>s are expected to do the bulk of the
work when the load is typical.
</para>

<para>
For those not using autovacuum, a typical approach is to schedule a
database-wide <command>VACUUM</> once a day during a low-usage period,
supplemented by more frequent vacuuming of heavily-updated tables as
necessary. (Some installations with extremely high update rates vacuum
their busiest tables as often as once every few minutes.) If you have
multiple databases in a cluster, don't forget to
<command>VACUUM</command> each one; the program <xref
linkend="app-vacuumdb" endterm="app-vacuumdb-title"> might be helpful.
</para>
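<para>
The commands run by such a schedule are ordinary SQL; for example, a
nightly job might execute something like the following in each database
(the table names are hypothetical):

<programlisting>
-- once a day, during a low-usage period
VACUUM ANALYZE;

-- more often, for the busiest tables only
VACUUM ANALYZE hot_orders;
VACUUM ANALYZE hot_sessions;
</programlisting>
</para>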
<tip>
<para>
Neither form of <command>VACUUM</> is entirely satisfactory when
a table contains large numbers of dead row versions as a result of
massive update or delete activity. If you have such a table and
you need to reclaim the excess disk space it occupies, the best
way is to use <xref linkend="sql-cluster" endterm="sql-cluster-title">
or one of the table-rewriting variants of
<xref linkend="sql-altertable" endterm="sql-altertable-title">.
These commands rewrite an entire new copy of the table and build
new indexes for it. Like <command>VACUUM FULL</>, they require
exclusive lock. Note that they also temporarily use extra disk
space, since the old copies of the table and indexes can't be
released until the new ones are complete. In the worst case where
your disk is nearly full, <command>VACUUM FULL</> may be the only
workable alternative.
</para>
</tip>

<tip>
<para>
If you have a table whose entire contents are deleted on a periodic
basis, consider doing it with
<xref linkend="sql-truncate" endterm="sql-truncate-title"> rather
than using <command>DELETE</command> followed by
<command>VACUUM</command>. <command>TRUNCATE</command> removes the
entire content of the table immediately, without requiring a
subsequent <command>VACUUM</command> or <command>VACUUM
FULL</command> to reclaim the now-unused disk space.
The disadvantage is that strict MVCC semantics are violated.
</para>
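<para>
For instance, for a hypothetical staging table that is reloaded from
scratch every night:

<programlisting>
-- removes all rows and reclaims the disk space immediately
TRUNCATE staging_events;

-- rather than
DELETE FROM staging_events;
VACUUM staging_events;
</programlisting>
</para>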
</tip>
</sect2>

<sect2 id="vacuum-for-statistics">
<title>Updating Planner Statistics</title>

<indexterm zone="vacuum-for-statistics">
<primary>statistics</primary>
<secondary>of the planner</secondary>
</indexterm>

<indexterm zone="vacuum-for-statistics">
<primary>ANALYZE</primary>
</indexterm>

<para>
The <productname>PostgreSQL</productname> query planner relies on
statistical information about the contents of tables in order to
generate good plans for queries. These statistics are gathered by
the <xref linkend="sql-analyze" endterm="sql-analyze-title"> command,
which can be invoked by itself or
as an optional step in <command>VACUUM</>. It is important to have
reasonably accurate statistics, otherwise poor choices of plans might
degrade database performance.
</para>

<para>
The autovacuum daemon, if enabled, will automatically issue
<command>ANALYZE</> commands whenever the content of a table has
changed sufficiently. However, administrators might prefer to rely
on manually-scheduled <command>ANALYZE</> operations, particularly
if it is known that update activity on a table will not affect the
statistics of <quote>interesting</> columns. The daemon schedules
<command>ANALYZE</> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
</para>

<para>
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
ones. But even for a heavily-updated table, there might be no need for
statistics updates if the statistical distribution of the data is
not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
For example, a <type>timestamp</type> column that contains the time
of row update will have a constantly-increasing maximum value as
rows are added and updated; such a column will probably need more
frequent statistics updates than, say, a column containing URLs for
pages accessed on a website. The URL column might receive changes just
as often, but the statistical distribution of its values probably
changes relatively slowly.
</para>

<para>
It is possible to run <command>ANALYZE</> on specific tables and even
just specific columns of a table, so the flexibility exists to update some
statistics more frequently than others if your application requires it.
In practice, however, it is usually best to just analyze the entire
database, because it is a fast operation. <command>ANALYZE</> uses a
statistical random sampling of the rows of a table rather than reading
every single row.
</para>
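<para>
For example (the table and column names are illustrative):

<programlisting>
-- analyze the whole database
ANALYZE;

-- analyze one table, or only selected columns of it
ANALYZE mytable;
ANALYZE mytable (updated_at, status);
</programlisting>
</para>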
<tip>
<para>
Although per-column tweaking of <command>ANALYZE</> frequency might not be
very productive, you might well find it worthwhile to do per-column
adjustment of the level of detail of the statistics collected by
<command>ANALYZE</>. Columns that are heavily used in <literal>WHERE</>
clauses and have highly irregular data distributions might require a
finer-grain data histogram than other columns. See <command>ALTER TABLE
SET STATISTICS</>, or change the database-wide default using the <xref
linkend="guc-default-statistics-target"> configuration parameter.
</para>
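<para>
For example, to collect a finer-grained histogram for one heavily
filtered column (the names and the target value are illustrative):

<programlisting>
ALTER TABLE mytable ALTER COLUMN status SET STATISTICS 500;
ANALYZE mytable;
</programlisting>
</para>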
</tip>
</sect2>

<sect2 id="vacuum-for-wraparound">
<title>Preventing Transaction ID Wraparound Failures</title>

<indexterm zone="vacuum-for-wraparound">
<primary>transaction ID</primary>
<secondary>wraparound</secondary>
</indexterm>

<para>
<productname>PostgreSQL</productname>'s MVCC transaction semantics
depend on being able to compare transaction ID (<acronym>XID</>)
numbers: a row version with an insertion XID greater than the current
transaction's XID is <quote>in the future</> and should not be visible
to the current transaction. But since transaction IDs have limited size
(32 bits at this writing), a cluster that runs for a long time (more
than 4 billion transactions) would suffer <firstterm>transaction ID
wraparound</>: the XID counter wraps around to zero, and all of a sudden
transactions that were in the past appear to be in the future &mdash; which
means their outputs become invisible. In short, catastrophic data loss.
(Actually the data is still there, but that's cold comfort if you cannot
get at it.) To avoid this, it is necessary to vacuum every table
in every database at least once every two billion transactions.
</para>
<para>
The reason that periodic vacuuming solves the problem is that
<productname>PostgreSQL</productname> distinguishes a special XID
<literal>FrozenXID</>. This XID is always considered older
than every normal XID. Normal XIDs are
compared using modulo-2<superscript>31</> arithmetic. This means
that for every normal XID, there are two billion XIDs that are
<quote>older</> and two billion that are <quote>newer</>; another
way to say it is that the normal XID space is circular with no
endpoint. Therefore, once a row version has been created with a particular
normal XID, the row version will appear to be <quote>in the past</> for
the next two billion transactions, no matter which normal XID we are
talking about. If the row version still exists after more than two billion
transactions, it will suddenly appear to be in the future. To
prevent data loss, old row versions must be reassigned the XID
<literal>FrozenXID</> sometime before they reach the
two-billion-transactions-old mark. Once they are assigned this
special XID, they will appear to be <quote>in the past</> to all
normal transactions regardless of wraparound issues, and so such
row versions will be good until deleted, no matter how long that is.
This reassignment of old XIDs is handled by <command>VACUUM</>.
</para>

<para>
<command>VACUUM</>'s behavior is controlled by two configuration
parameters: <xref linkend="guc-vacuum-freeze-min-age"> and
<xref linkend="guc-vacuum-freeze-table-age">.
<varname>vacuum_freeze_table_age</> controls when <command>VACUUM</>
performs a full sweep of the table, in order to replace old XID values
with <literal>FrozenXID</>. <varname>vacuum_freeze_min_age</>
controls how old an XID value has to be before it's replaced with
<literal>FrozenXID</>. Larger values of these settings
preserve transactional information longer, while smaller values increase
the number of transactions that can elapse before the table must be
vacuumed again.
</para>
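<para>
Both parameters can also be adjusted for an individual, manually issued
<command>VACUUM</>; for instance (the values shown are only
illustrative, and setting them to zero forces freezing of the whole
table):

<programlisting>
SET vacuum_freeze_min_age = 0;
SET vacuum_freeze_table_age = 0;
VACUUM mytable;
</programlisting>
</para>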
<para>
The maximum time that a table can go unvacuumed is two billion
transactions minus the <varname>vacuum_freeze_min_age</> that was used
when <command>VACUUM</> last scanned the whole table. If it were to go
unvacuumed for longer than
that, data loss could result. To ensure that this does not happen,
autovacuum is invoked on any table that might contain XIDs older than the
age specified by the configuration parameter <xref
linkend="guc-autovacuum-freeze-max-age">. (This will happen even if
autovacuum is otherwise disabled.)
</para>

<para>
This implies that if a table is not otherwise vacuumed,
autovacuum will be invoked on it approximately once every
<varname>autovacuum_freeze_max_age</> minus
<varname>vacuum_freeze_min_age</> transactions.
For tables that are regularly vacuumed for space reclamation purposes,
this is of little importance. However, for static tables
(including tables that receive inserts, but no updates or deletes),
there is no need for vacuuming for space reclamation, and so it can
be useful to try to maximize the interval between forced autovacuums
on very large static tables. Obviously one can do this either by
increasing <varname>autovacuum_freeze_max_age</> or by decreasing
<varname>vacuum_freeze_min_age</>.
</para>

<para>
The sole disadvantage of increasing <varname>vacuum_freeze_table_age</>
and <varname>autovacuum_freeze_max_age</>
is that the <filename>pg_clog</> subdirectory of the database cluster
will take more space, because it must store the commit status for all
transactions back to the <varname>autovacuum_freeze_max_age</> horizon.
The commit status uses two bits per transaction, so if
<varname>autovacuum_freeze_max_age</> has its maximum allowed value of
a little less than two billion, <filename>pg_clog</> can be expected to
grow to about half a gigabyte. If this is trivial compared to your
total database size, setting <varname>autovacuum_freeze_max_age</> and
<varname>vacuum_freeze_table_age</varname> to their maximum allowed values
is recommended. Otherwise, set them depending
on what you are willing to allow for <filename>pg_clog</> storage.
(The default, 200 million transactions, translates to about 50MB of
<filename>pg_clog</> storage.)
</para>

<para>
One disadvantage of decreasing <varname>vacuum_freeze_min_age</> is that
it might cause <command>VACUUM</> to do useless work: changing a table row's
XID to <literal>FrozenXID</> is a waste of time if the row is modified
soon thereafter (causing it to acquire a new XID). So the setting should
be large enough that rows are not frozen until they are unlikely to change
any more. Another disadvantage of decreasing this setting is
that details about exactly which transaction inserted or modified a
row will be lost sooner. This information sometimes comes in handy,
particularly when trying to analyze what went wrong after a database
failure. For these two reasons, decreasing this setting is not
recommended except for completely static tables.
</para>
<para>
To track the age of the oldest XIDs in a database,
<command>VACUUM</> stores XID
statistics in the system tables <structname>pg_class</> and
<structname>pg_database</>. In particular,
the <structfield>relfrozenxid</> column of a table's
<structname>pg_class</> row contains the freeze cutoff XID that was used
by the last <command>VACUUM</> for that table. All normal
XIDs older than this cutoff XID are guaranteed to have been replaced by
<literal>FrozenXID</> within the table. Similarly,
the <structfield>datfrozenxid</> column of a database's
<structname>pg_database</> row is a lower bound on the normal XIDs
appearing in that database &mdash; it is just the minimum of the
per-table <structfield>relfrozenxid</> values within the database.
A convenient way to
examine this information is to execute queries such as:

<programlisting>
SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';
SELECT datname, age(datfrozenxid) FROM pg_database;
</programlisting>

The <literal>age</> column measures the number of transactions from the
cutoff XID to the current transaction's XID. When <command>VACUUM</>
scans the whole table, after it's finished <literal>age(relfrozenxid)</>
should be a little more than the <varname>vacuum_freeze_min_age</> setting
that was used (more by the number of transactions started since the
<command>VACUUM</> started).
</para>
<para>
<command>VACUUM</> normally only scans pages that have been modified
since the last vacuum, but <structfield>relfrozenxid</> can only be advanced
when the whole table is scanned. The whole table is scanned when
<structfield>relfrozenxid</> is more than
<varname>vacuum_freeze_table_age</> transactions old, when the
<command>VACUUM FREEZE</> command is used, or when all pages happen to
require vacuuming to remove dead row versions. If no whole-table-scanning
<command>VACUUM</> is issued on the table until
<varname>autovacuum_freeze_max_age</> is reached, an autovacuum will soon
be forced for the table.
</para>
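<para>
A query along the following lines (a sketch only; adjust as needed) shows
which tables are closest to the point where a whole-table vacuum will be
forced:

<programlisting>
SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::integer AS force_limit
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;
</programlisting>
</para>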
<para>
If for some reason autovacuum fails to clear old XIDs from a table,
the system will begin to emit warning messages like this when the
database's oldest XIDs reach ten million transactions from the wraparound
point:

<programlisting>
WARNING: database "mydb" must be vacuumed within 177009986 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in "mydb".
</programlisting>

(A manual <command>VACUUM</> should fix the problem, as suggested by the
hint; but note that the <command>VACUUM</> must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</>.)
If these warnings are
ignored, the system will shut down and refuse to execute any new
transactions once there are fewer than 1 million transactions left
until wraparound:

<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and use a standalone backend to VACUUM in "mydb".
</programlisting>

The 1-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
required <command>VACUUM</> commands. However, since the system will not
execute commands once it has gone into the safety shutdown mode,
the only way to do this is to stop the server and use a single-user
backend to execute <command>VACUUM</>. The shutdown mode is not enforced
by a single-user backend. See the <xref linkend="app-postgres"> reference
page for details about using a single-user backend.
</para>
</sect2>

<sect2 id="autovacuum">
<title id="autovacuum-title">The Autovacuum Daemon</title>

<indexterm>
<primary>autovacuum</primary>
<secondary>general information</secondary>
</indexterm>

<para>
<productname>PostgreSQL</productname> has an optional but highly
recommended feature called <firstterm>autovacuum</firstterm>,
whose purpose is to automate the execution of
<command>VACUUM</command> and <command>ANALYZE</command> commands.
When enabled, autovacuum checks for
tables that have had a large number of inserted, updated or deleted
tuples. These checks use the statistics collection facility;
therefore, autovacuum cannot be used unless <xref
linkend="guc-track-counts"> is set to <literal>true</literal>.
In the default configuration, autovacuuming is enabled and the related
configuration parameters are appropriately set.
</para>

<para>
The <quote>autovacuum daemon</> actually consists of multiple processes.
There is a persistent daemon process, called the
<firstterm>autovacuum launcher</firstterm>, which is in charge of starting
<firstterm>autovacuum worker</firstterm> processes for all databases. The
launcher will distribute the work across time, attempting to start one
worker on each database every <xref linkend="guc-autovacuum-naptime">
seconds. One worker will be launched for each database, with a maximum
of <xref linkend="guc-autovacuum-max-workers"> processes running at the
same time. If there are more than
<xref linkend="guc-autovacuum-max-workers"> databases to be processed,
the next database will be processed as soon as the first worker finishes.
Each worker process will check each table within its database and
execute <command>VACUUM</> and/or <command>ANALYZE</> as needed.
</para>
<para>
The <xref linkend="guc-autovacuum-max-workers"> setting limits how many
workers may be running at any time. If several large tables all become
eligible for vacuuming in a short amount of time, all autovacuum workers
may become occupied with vacuuming those tables for a long period.
This would result
in other tables and databases not being vacuumed until a worker became
available. There is no limit on how many workers might be in a
single database, but workers do try to avoid repeating work that has
already been done by other workers. Note that the number of running
workers does not count towards the <xref linkend="guc-max-connections"> nor
the <xref linkend="guc-superuser-reserved-connections"> limits.
</para>
<para>
Tables whose <structfield>relfrozenxid</> value is more than
<varname>autovacuum_freeze_max_age</> transactions old are always
vacuumed. Otherwise, if the number of tuples obsoleted since the last
<command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
table is vacuumed. The vacuum threshold is defined as:

<programlisting>
vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
</programlisting>

where the vacuum base threshold is
<xref linkend="guc-autovacuum-vacuum-threshold">,
the vacuum scale factor is
<xref linkend="guc-autovacuum-vacuum-scale-factor">,
and the number of tuples is
<structname>pg_class</structname>.<structfield>reltuples</structfield>.
The number of obsolete tuples is obtained from the statistics
collector; it is a semi-accurate count updated by each
<command>UPDATE</command> and <command>DELETE</command> operation. (It
is only semi-accurate because some information might be lost under heavy
load.)
</para>
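<para>
The same arithmetic can be reproduced with a query; the sketch below uses
the global settings together with the statistics views, and ignores any
per-table overrides:

<programlisting>
SELECT s.relname,
       s.n_dead_tup AS dead_tuples,
       current_setting('autovacuum_vacuum_threshold')::integer
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
         AS vacuum_threshold
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
</programlisting>
</para>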
<para>
For analyze, a similar condition is used: the threshold, defined as:

<programlisting>
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
</programlisting>

is compared to the total number of tuples inserted or updated
since the last <command>ANALYZE</command>.
</para>

<para>
The default thresholds and scale factors are taken from
<filename>postgresql.conf</filename>, but it is possible to override them
on a table-by-table basis by making entries in the system catalog
<link
linkend="catalog-pg-autovacuum"><structname>pg_autovacuum</></link>.
If a <structname>pg_autovacuum</structname> row exists for a particular
table, the settings it specifies are applied; otherwise the global
settings are used. See <xref linkend="runtime-config-autovacuum"> for
more details on the global settings.
</para>
<para>
Besides the base threshold values and scale factors, there are five
more parameters that can be set for each table in
<structname>pg_autovacuum</structname>.
The first, <structname>pg_autovacuum</>.<structfield>enabled</>,
can be set to <literal>false</literal> to instruct the autovacuum daemon
to skip that particular table entirely. In this case
autovacuum will only touch the table if it must do so
to prevent transaction ID wraparound.
The next two parameters, the vacuum cost delay
(<structname>pg_autovacuum</structname>.<structfield>vac_cost_delay</structfield>)
and the vacuum cost limit
(<structname>pg_autovacuum</structname>.<structfield>vac_cost_limit</structfield>),
are used to set table-specific values for the
<xref linkend="runtime-config-resource-vacuum-cost" endterm="runtime-config-resource-vacuum-cost-title">
feature.
The last two parameters,
<structname>pg_autovacuum</structname>.<structfield>freeze_min_age</structfield>
and
<structname>pg_autovacuum</structname>.<structfield>freeze_max_age</structfield>,
are used to set table-specific values for
<xref linkend="guc-vacuum-freeze-min-age"> and
<xref linkend="guc-autovacuum-freeze-max-age"> respectively.
</para>
<para>
If any of the values in <structname>pg_autovacuum</structname>
are set to a negative number, or if a row is not present at all in
<structname>pg_autovacuum</structname> for any particular table, the
corresponding values from <filename>postgresql.conf</filename> are used.
</para>

<para>
There is not currently any support for making
<structname>pg_autovacuum</structname> entries, except by doing
manual <command>INSERT</>s into the catalog. This feature will be
improved in future releases, and it is likely that the catalog
definition will change.
</para>
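<para>
As a sketch only (before relying on it, check the actual catalog
definition in your release for the exact column list and order), an entry
that disables autovacuum for one table while leaving every other setting
at its default might be created like this:

<programlisting>
INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit,
     freeze_min_age, freeze_max_age)
VALUES ('mytable'::regclass, false,
        -1, -1, -1, -1, -1, -1, -1, -1);
</programlisting>

As described above, the negative values mean <quote>use the global
setting</quote>.
</para>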
<caution>
<para>
The contents of the <structname>pg_autovacuum</structname> system
catalog are currently not saved in database dumps created by the
tools <application>pg_dump</> and <application>pg_dumpall</>. If
you want to preserve them across a dump/reload cycle, make sure
you dump the catalog manually.
</para>
</caution>

<para>
When multiple workers are running, the cost limit is
<quote>balanced</quote> among all the running workers, so that the
total impact on the system is the same, regardless of the number
of workers actually running.
</para>
</sect2>
</sect1>
<sect1 id="routine-reindex">
<title>Routine Reindexing</title>

<indexterm zone="routine-reindex">
<primary>reindex</primary>
</indexterm>

<para>
In some situations it is worthwhile to rebuild indexes periodically
with the <xref linkend="sql-reindex" endterm="sql-reindex-title">
command.
</para>

<para>
In <productname>PostgreSQL</> releases before 7.4, periodic reindexing
was frequently necessary to avoid <quote>index bloat</>, due to lack of
internal space reclamation in B-tree indexes. Any situation in which the
range of index keys changed over time &mdash; for example, an index on
timestamps in a table where old entries are eventually deleted &mdash;
would result in bloat, because index pages for no-longer-needed portions
of the key range were not reclaimed for re-use. Over time, the index size
could become indefinitely much larger than the amount of useful data in it.
</para>

<para>
In <productname>PostgreSQL</> 7.4 and later, index pages that have become
completely empty are reclaimed for re-use. There is still a possibility
for inefficient use of space: if all but a few index keys on a page have
been deleted, the page remains allocated. So a usage pattern in which all
but a few keys in each range are eventually deleted will see poor use of
space. For such usage patterns, periodic reindexing is recommended.
</para>
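<para>
In such cases an individual index, or all the indexes of a table, can be
rebuilt periodically; the names here are illustrative:

<programlisting>
REINDEX INDEX mytable_created_at_idx;
REINDEX TABLE mytable;
</programlisting>

Note that <command>REINDEX</> locks out writes to the table while it
runs, so it is usually scheduled for a low-usage period.
</para>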
<para>
The potential for bloat in non-B-tree indexes has not been well
characterized. It is a good idea to keep an eye on the index's physical
size when using any non-B-tree index type.
</para>

<para>
Also, for B-tree indexes a freshly-constructed index is somewhat faster to
access than one that has been updated many times, because logically
adjacent pages are usually also physically adjacent in a newly built index.
(This consideration does not currently apply to non-B-tree indexes.) It
might be worthwhile to reindex periodically just to improve access speed.
</para>
</sect1>
<sect1 id="logfile-maintenance">
<title>Log File Maintenance</title>

<indexterm zone="logfile-maintenance">
<primary>server log</primary>
<secondary>log file maintenance</secondary>
</indexterm>

<para>
It is a good idea to save the database server's log output
somewhere, rather than just routing it to <filename>/dev/null</>.
The log output is invaluable when it comes time to diagnose
problems. However, the log output tends to be voluminous
(especially at higher debug levels) and you won't want to save it
indefinitely. You need to <quote>rotate</> the log files so that
new log files are started and old ones removed after a reasonable
period of time.
</para>

<para>
If you simply direct the <systemitem>stderr</> of
<command>postgres</command> into a
file, you will have log output, but
the only way to truncate the log file is to stop and restart
the server. This might be OK if you are using
<productname>PostgreSQL</productname> in a development environment,
but few production servers would find this behavior acceptable.
</para>

<para>
A better approach is to send the server's
<systemitem>stderr</> output to some type of log rotation program.
There is a built-in log rotation program, which you can use by
setting the configuration parameter <literal>logging_collector</> to
<literal>true</> in <filename>postgresql.conf</>. The control
parameters for this program are described in <xref
linkend="runtime-config-logging-where">. You can also use this approach
to capture the log data in machine readable CSV format.
</para>

<para>
Alternatively, you might prefer to use an external log rotation
program, if you have one that you are already using with other
server software. For example, the <application>rotatelogs</application>
tool included in the <productname>Apache</productname> distribution
can be used with <productname>PostgreSQL</productname>. To do this,
just pipe the server's
<systemitem>stderr</> output to the desired program.
If you start the server with
<command>pg_ctl</>, then <systemitem>stderr</>
is already redirected to <systemitem>stdout</>, so you just need a
pipe command, for example:

<programlisting>
pg_ctl start | rotatelogs /var/log/pgsql_log 86400
</programlisting>
</para>
<para>
Another production-grade approach to managing log output is to
send it all to <application>syslog</> and let
<application>syslog</> deal with file rotation. To do this, set the
configuration parameter <literal>log_destination</> to <literal>syslog</>
(to log to <application>syslog</> only) in
<filename>postgresql.conf</>. Then you can send a <literal>SIGHUP</literal>
signal to the <application>syslog</> daemon whenever you want to force it
to start writing a new log file. If you want to automate log
rotation, the <application>logrotate</application> program can be
configured to work with log files from
<application>syslog</application>.
</para>

<para>
On many systems, however, <application>syslog</> is not very reliable,
particularly with large log messages; it might truncate or drop messages
just when you need them the most. Also, on <productname>Linux</>,
<application>syslog</> will sync each message to disk, yielding poor
performance. (You can use a <literal>-</> at the start of the file name
in the <application>syslog</> configuration file to disable this behavior.)
</para>

<para>
Note that all the solutions described above take care of starting new
log files at configurable intervals, but they do not handle deletion
of old, no-longer-interesting log files. You will probably want to set
up a batch job to periodically delete old log files. Another possibility
is to configure the rotation program so that old log files are overwritten
cyclically.
</para>
</sect1>
</chapter>