Update autovacuum to use reloptions instead of a system catalog, for
[PostgreSQL.git] / doc / src / sgml / maintenance.sgml
bloba53c7e5dd3e4482b3d97d1bc4fbe8ec2d394a766
1 <!-- $PostgreSQL$ -->
3 <chapter id="maintenance">
4 <title>Routine Database Maintenance Tasks</title>
6 <indexterm zone="maintenance">
7 <primary>maintenance</primary>
8 </indexterm>
10 <indexterm zone="maintenance">
11 <primary>routine maintenance</primary>
12 </indexterm>
14 <para>
15 <productname>PostgreSQL</>, like any database software, requires that certain tasks
16 be performed regularly to achieve optimum performance. The tasks
17 discussed here are <emphasis>required</emphasis>, but they
18 are repetitive in nature and can easily be automated using standard
19 tools such as <application>cron</application> scripts or
20 Windows' <application>Task Scheduler</>. But it is the database
21 administrator's responsibility to set up appropriate scripts, and to
22 check that they execute successfully.
23 </para>
25 <para>
26 One obvious maintenance task is creation of backup copies of the data on a
27 regular schedule. Without a recent backup, you have no chance of recovery
28 after a catastrophe (disk failure, fire, mistakenly dropping a critical
29 table, etc.). The backup and recovery mechanisms available in
30 <productname>PostgreSQL</productname> are discussed at length in
31 <xref linkend="backup">.
32 </para>
34 <para>
35 The other main category of maintenance task is periodic <quote>vacuuming</>
36 of the database. This activity is discussed in
37 <xref linkend="routine-vacuuming">. Closely related to this is updating
38 the statistics that will be used by the query planner, as discussed in
39 <xref linkend="vacuum-for-statistics">.
40 </para>
42 <para>
43 Another task that might need periodic attention is log file management.
44 This is discussed in <xref linkend="logfile-maintenance">.
45 </para>
47 <para>
48 <productname>PostgreSQL</productname> is low-maintenance compared
49 to some other database management systems. Nonetheless,
50 appropriate attention to these tasks will go far towards ensuring a
51 pleasant and productive experience with the system.
52 </para>
54 <sect1 id="routine-vacuuming">
55 <title>Routine Vacuuming</title>
57 <indexterm zone="routine-vacuuming">
58 <primary>vacuum</primary>
59 </indexterm>
61 <para>
62 <productname>PostgreSQL</productname> databases require periodic
63 maintenance known as <firstterm>vacuuming</>. For many installations, it
64 is sufficient to let vacuuming be performed by the <firstterm>autovacuum
65 daemon</>, which is described in <xref linkend="autovacuum">. You might
66 need to adjust the autovacuuming parameters described there to obtain best
67 results for your situation. Some database administrators will want to
68 supplement or replace the daemon's activities with manually-managed
69 <command>VACUUM</> commands, which typically are executed according to a
70 schedule by <application>cron</application> or <application>Task
71 Scheduler</> scripts. To set up manually-managed vacuuming properly,
72 it is essential to understand the issues discussed in the next few
73 subsections. Administrators who rely on autovacuuming may still wish
74 to skim this material to help them understand and adjust autovacuuming.
75 </para>
77 <sect2 id="vacuum-basics">
78 <title>Vacuuming Basics</title>
80 <para>
81 <productname>PostgreSQL</productname>'s
82 <xref linkend="sql-vacuum" endterm="sql-vacuum-title"> command has to
83 process each table on a regular basis for several reasons:
85 <orderedlist>
86 <listitem>
87 <simpara>To recover or reuse disk space occupied by updated or deleted
88 rows.</simpara>
89 </listitem>
91 <listitem>
92 <simpara>To update data statistics used by the
93 <productname>PostgreSQL</productname> query planner.</simpara>
94 </listitem>
96 <listitem>
97 <simpara>To protect against loss of very old data due to
98 <firstterm>transaction ID wraparound</>.</simpara>
99 </listitem>
100 </orderedlist>
102 Each of these reasons dictates performing <command>VACUUM</> operations
103 of varying frequency and scope, as explained in the following subsections.
104 </para>
106 <para>
107 There are two variants of <command>VACUUM</>: standard <command>VACUUM</>
108 and <command>VACUUM FULL</>. <command>VACUUM FULL</> can reclaim more
109 disk space but runs much more slowly. Also,
110 the standard form of <command>VACUUM</> can run in parallel with production
111 database operations. (Commands such as <command>SELECT</command>,
112 <command>INSERT</command>, <command>UPDATE</command>, and
113 <command>DELETE</command> will continue to function as normal, though you
114 will not be able to modify the definition of a table with commands such as
115 <command>ALTER TABLE</command> while it is being vacuumed.)
116 <command>VACUUM FULL</> requires exclusive lock on the table it is
117 working on, and therefore cannot be done in parallel with other use
118 of the table. Another disadvantage of <command>VACUUM FULL</> is that
119 while it reduces table size, it does not reduce index size proportionally;
120 in fact it can make indexes <emphasis>larger</>. Generally, therefore,
121 administrators should strive to use standard <command>VACUUM</> and
122 avoid <command>VACUUM FULL</>.
123 </para>
125 <para>
126 <command>VACUUM</command> creates a substantial amount of I/O
127 traffic, which can cause poor performance for other active sessions.
128 There are configuration parameters that can be adjusted to reduce the
129 performance impact of background vacuuming &mdash; see
130 <xref linkend="runtime-config-resource-vacuum-cost">.
131 </para>
132 </sect2>
134 <sect2 id="vacuum-for-space-recovery">
135 <title>Recovering Disk Space</title>
137 <indexterm zone="vacuum-for-space-recovery">
138 <primary>disk space</primary>
139 </indexterm>
141 <para>
142 In <productname>PostgreSQL</productname>, an
143 <command>UPDATE</> or <command>DELETE</> of a row does not
144 immediately remove the old version of the row.
145 This approach is necessary to gain the benefits of multiversion
146 concurrency control (see <xref linkend="mvcc">): the row version
147 must not be deleted while it is still potentially visible to other
148 transactions. But eventually, an outdated or deleted row version is no
149 longer of interest to any transaction. The space it occupies must then be
150 reclaimed for reuse by new rows, to avoid infinite growth of disk
151 space requirements. This is done by running <command>VACUUM</>.
152 </para>
154 <para>
155 The standard form of <command>VACUUM</command> removes dead row
156 versions in tables and indexes and marks the space available for
157 future reuse. However, it will not return the space to the operating
158 system, except in the special case where one or more pages at the
159 end of a table become entirely free and an exclusive table lock can be
160 easily obtained. In contrast, <command>VACUUM FULL</> actively compacts
161 tables by moving row versions to earlier pages. It is thus able to
162 force pages at the end of the table to become entirely free, whereupon
163 it will return them to the operating system. However, if many rows
164 must be moved, this can take a long time. Also, moving a row requires
165 transiently making duplicate index entries for it (the entry pointing
166 to its new location must be made before the old entry can be removed);
167 so moving a lot of rows this way causes severe index bloat.
168 </para>
170 <para>
171 The usual goal of routine vacuuming is to do standard <command>VACUUM</>s
172 often enough to avoid needing <command>VACUUM FULL</>. The
173 autovacuum daemon attempts to work this way, and in fact will
174 never issue <command>VACUUM FULL</>. In this approach, the idea
175 is not to keep tables at their minimum size, but to maintain steady-state
176 usage of disk space: each table occupies space equivalent to its
177 minimum size plus however much space gets used up between vacuumings.
178 Although <command>VACUUM FULL</> can be used to shrink a table back
179 to its minimum size and return the disk space to the operating system,
180 there is not much point in this if the table will just grow again in the
181 future. Thus, moderately-frequent standard <command>VACUUM</> runs are a
182 better approach than infrequent <command>VACUUM FULL</> runs for
183 maintaining heavily-updated tables.
184 </para>
186 <para>
187 Some administrators prefer to schedule vacuuming themselves, for example
188 doing all the work at night when load is low.
189 The difficulty with doing vacuuming according to a fixed schedule
190 is that if a table has an unexpected spike in update activity, it may
191 get bloated to the point that <command>VACUUM FULL</> is really necessary
192 to reclaim space. Using the autovacuum daemon alleviates this problem,
193 since the daemon schedules vacuuming dynamically in response to update
194 activity. It is unwise to disable the daemon completely unless you
195 have an extremely predictable workload. One possible compromise is
196 to set the daemon's parameters so that it will only react to unusually
197 heavy update activity, thus keeping things from getting out of hand,
198 while scheduled <command>VACUUM</>s are expected to do the bulk of the
199 work when the load is typical.
200 </para>
202 <para>
203 For those not using autovacuum, a typical approach is to schedule a
204 database-wide <command>VACUUM</> once a day during a low-usage period,
205 supplemented by more frequent vacuuming of heavily-updated tables as
206 necessary. (Some installations with extremely high update rates vacuum
207 their busiest tables as often as once every few minutes.) If you have
208 multiple databases in a cluster, don't forget to
209 <command>VACUUM</command> each one; the program <xref
210 linkend="app-vacuumdb" endterm="app-vacuumdb-title"> might be helpful.
211 </para>
213 <tip>
214 <para>
215 Neither form of <command>VACUUM</> is entirely satisfactory when
216 a table contains large numbers of dead row versions as a result of
217 massive update or delete activity. If you have such a table and
218 you need to reclaim the excess disk space it occupies, the best
219 way is to use <xref linkend="sql-cluster" endterm="sql-cluster-title">
220 or one of the table-rewriting variants of
221 <xref linkend="sql-altertable" endterm="sql-altertable-title">.
222 These commands rewrite an entire new copy of the table and build
223 new indexes for it. Like <command>VACUUM FULL</>, they require
224 exclusive lock. Note that they also temporarily use extra disk
225 space, since the old copies of the table and indexes can't be
226 released until the new ones are complete. In the worst case where
227 your disk is nearly full, <command>VACUUM FULL</> may be the only
228 workable alternative.
229 </para>
230 </tip>
232 <tip>
233 <para>
234 If you have a table whose entire contents are deleted on a periodic
235 basis, consider doing it with
236 <xref linkend="sql-truncate" endterm="sql-truncate-title"> rather
237 than using <command>DELETE</command> followed by
238 <command>VACUUM</command>. <command>TRUNCATE</command> removes the
239 entire content of the table immediately, without requiring a
240 subsequent <command>VACUUM</command> or <command>VACUUM
241 FULL</command> to reclaim the now-unused disk space.
242 The disadvantage is that strict MVCC semantics are violated.
243 </para>
244 </tip>
245 </sect2>
247 <sect2 id="vacuum-for-statistics">
248 <title>Updating Planner Statistics</title>
250 <indexterm zone="vacuum-for-statistics">
251 <primary>statistics</primary>
252 <secondary>of the planner</secondary>
253 </indexterm>
255 <indexterm zone="vacuum-for-statistics">
256 <primary>ANALYZE</primary>
257 </indexterm>
259 <para>
260 The <productname>PostgreSQL</productname> query planner relies on
261 statistical information about the contents of tables in order to
262 generate good plans for queries. These statistics are gathered by
263 the <xref linkend="sql-analyze" endterm="sql-analyze-title"> command,
264 which can be invoked by itself or
265 as an optional step in <command>VACUUM</>. It is important to have
266 reasonably accurate statistics, otherwise poor choices of plans might
267 degrade database performance.
268 </para>
270 <para>
271 The autovacuum daemon, if enabled, will automatically issue
272 <command>ANALYZE</> commands whenever the content of a table has
273 changed sufficiently. However, administrators might prefer to rely
274 on manually-scheduled <command>ANALYZE</> operations, particularly
275 if it is known that update activity on a table will not affect the
276 statistics of <quote>interesting</> columns. The daemon schedules
277 <command>ANALYZE</> strictly as a function of the number of rows
278 inserted or updated; it has no knowledge of whether that will lead
279 to meaningful statistical changes.
280 </para>
282 <para>
283 As with vacuuming for space recovery, frequent updates of statistics
284 are more useful for heavily-updated tables than for seldom-updated
285 ones. But even for a heavily-updated table, there might be no need for
286 statistics updates if the statistical distribution of the data is
287 not changing much. A simple rule of thumb is to think about how much
288 the minimum and maximum values of the columns in the table change.
289 For example, a <type>timestamp</type> column that contains the time
290 of row update will have a constantly-increasing maximum value as
291 rows are added and updated; such a column will probably need more
292 frequent statistics updates than, say, a column containing URLs for
293 pages accessed on a website. The URL column might receive changes just
294 as often, but the statistical distribution of its values probably
295 changes relatively slowly.
296 </para>
298 <para>
299 It is possible to run <command>ANALYZE</> on specific tables and even
300 just specific columns of a table, so the flexibility exists to update some
301 statistics more frequently than others if your application requires it.
302 In practice, however, it is usually best to just analyze the entire
303 database, because it is a fast operation. <command>ANALYZE</> uses a
304 statistical random sampling of the rows of a table rather than reading
305 every single row.
306 </para>
308 <tip>
309 <para>
310 Although per-column tweaking of <command>ANALYZE</> frequency might not be
311 very productive, you might well find it worthwhile to do per-column
312 adjustment of the level of detail of the statistics collected by
313 <command>ANALYZE</>. Columns that are heavily used in <literal>WHERE</>
314 clauses and have highly irregular data distributions might require a
315 finer-grain data histogram than other columns. See <command>ALTER TABLE
316 SET STATISTICS</>, or change the database-wide default using the <xref
317 linkend="guc-default-statistics-target"> configuration parameter.
318 </para>
319 </tip>
320 </sect2>
322 <sect2 id="vacuum-for-wraparound">
323 <title>Preventing Transaction ID Wraparound Failures</title>
325 <indexterm zone="vacuum-for-wraparound">
326 <primary>transaction ID</primary>
327 <secondary>wraparound</secondary>
328 </indexterm>
330 <para>
331 <productname>PostgreSQL</productname>'s MVCC transaction semantics
332 depend on being able to compare transaction ID (<acronym>XID</>)
333 numbers: a row version with an insertion XID greater than the current
334 transaction's XID is <quote>in the future</> and should not be visible
335 to the current transaction. But since transaction IDs have limited size
336 (32 bits at this writing) a cluster that runs for a long time (more
337 than 4 billion transactions) would suffer <firstterm>transaction ID
338 wraparound</>: the XID counter wraps around to zero, and all of a sudden
339 transactions that were in the past appear to be in the future &mdash; which
340 means their outputs become invisible. In short, catastrophic data loss.
341 (Actually the data is still there, but that's cold comfort if you cannot
342 get at it.) To avoid this, it is necessary to vacuum every table
343 in every database at least once every two billion transactions.
344 </para>
346 <para>
347 The reason that periodic vacuuming solves the problem is that
348 <productname>PostgreSQL</productname> distinguishes a special XID
349 <literal>FrozenXID</>. This XID is always considered older
350 than every normal XID. Normal XIDs are
351 compared using modulo-2<superscript>31</> arithmetic. This means
352 that for every normal XID, there are two billion XIDs that are
353 <quote>older</> and two billion that are <quote>newer</>; another
354 way to say it is that the normal XID space is circular with no
355 endpoint. Therefore, once a row version has been created with a particular
356 normal XID, the row version will appear to be <quote>in the past</> for
357 the next two billion transactions, no matter which normal XID we are
358 talking about. If the row version still exists after more than two billion
359 transactions, it will suddenly appear to be in the future. To
360 prevent data loss, old row versions must be reassigned the XID
361 <literal>FrozenXID</> sometime before they reach the
362 two-billion-transactions-old mark. Once they are assigned this
363 special XID, they will appear to be <quote>in the past</> to all
364 normal transactions regardless of wraparound issues, and so such
365 row versions will be good until deleted, no matter how long that is.
366 This reassignment of old XIDs is handled by <command>VACUUM</>.
367 </para>
369 <para>
370 <command>VACUUM</>'s behavior is controlled by the two configuration
371 parameters: <xref linkend="guc-vacuum-freeze-min-age"> and
372 <xref linkend="guc-vacuum-freeze-table-age">.
373 <varname>vacuum_freeze_table_age</> controls when <command>VACUUM</>
374 performs a full sweep of the table, in order to replace old XID values
375 with <literal>FrozenXID</>. <varname>vacuum_freeze_min_age</>
376 controls how old an XID value has to be before it's replaced with
377 <literal>FrozenXID</>. Larger values of these settings
378 preserve transactional information longer, while smaller values increase
379 the number of transactions that can elapse before the table must be
380 vacuumed again.
381 </para>
383 <para>
384 The maximum time that a table can go unvacuumed is two billion
385 transactions minus the <varname>vacuum_freeze_min_age</> that was used
386 when <command>VACUUM</> last scanned the whole table. If it were to go
387 unvacuumed for longer than
388 that, data loss could result. To ensure that this does not happen,
389 autovacuum is invoked on any table that might contain XIDs older than the
390 age specified by the configuration parameter <xref
391 linkend="guc-autovacuum-freeze-max-age">. (This will happen even if
392 autovacuum is otherwise disabled.)
393 </para>
395 <para>
396 This implies that if a table is not otherwise vacuumed,
397 autovacuum will be invoked on it approximately once every
398 <varname>autovacuum_freeze_max_age</> minus
399 <varname>vacuum_freeze_min_age</> transactions.
400 For tables that are regularly vacuumed for space reclamation purposes,
401 this is of little importance. However, for static tables
402 (including tables that receive inserts, but no updates or deletes),
403 there is no need for vacuuming for space reclamation, and so it can
404 be useful to try to maximize the interval between forced autovacuums
405 on very large static tables. Obviously one can do this either by
406 increasing <varname>autovacuum_freeze_max_age</> or by decreasing
407 <varname>vacuum_freeze_min_age</>.
408 </para>
410 <para>
411 The sole disadvantage of increasing <varname>vacuum_freeze_table_age</>
412 and <varname>autovacuum_freeze_max_age</>
413 is that the <filename>pg_clog</> subdirectory of the database cluster
414 will take more space, because it must store the commit status for all
415 transactions back to the <varname>autovacuum_freeze_max_age</> horizon.
416 The commit status uses two bits per transaction, so if
417 <varname>autovacuum_freeze_max_age</> has its maximum allowed value of
418 a little less than two billion, <filename>pg_clog</> can be expected to
419 grow to about half a gigabyte. If this is trivial compared to your
420 total database size, setting <varname>autovacuum_freeze_max_age</> and
421 <varname>vacuum_freeze_table_age</varname> to their maximum allowed values
422 is recommended. Otherwise, set them depending
423 on what you are willing to allow for <filename>pg_clog</> storage.
424 (The default, 200 million transactions, translates to about 50MB of
425 <filename>pg_clog</> storage.)
426 </para>
428 <para>
429 One disadvantage of decreasing <varname>vacuum_freeze_min_age</> is that
430 it might cause <command>VACUUM</> to do useless work: changing a table row's
431 XID to <literal>FrozenXID</> is a waste of time if the row is modified
432 soon thereafter (causing it to acquire a new XID). So the setting should
433 be large enough that rows are not frozen until they are unlikely to change
434 any more. Another disadvantage of decreasing this setting is
435 that details about exactly which transaction inserted or modified a
436 row will be lost sooner. This information sometimes comes in handy,
437 particularly when trying to analyze what went wrong after a database
438 failure. For these two reasons, decreasing this setting is not
439 recommended except for completely static tables.
440 </para>
442 <para>
443 To track the age of the oldest XIDs in a database,
444 <command>VACUUM</> stores XID
445 statistics in the system tables <structname>pg_class</> and
446 <structname>pg_database</>. In particular,
447 the <structfield>relfrozenxid</> column of a table's
448 <structname>pg_class</> row contains the freeze cutoff XID that was used
449 by the last <command>VACUUM</> for that table. All normal
450 XIDs older than this cutoff XID are guaranteed to have been replaced by
451 <literal>FrozenXID</> within the table. Similarly,
452 the <structfield>datfrozenxid</> column of a database's
453 <structname>pg_database</> row is a lower bound on the normal XIDs
454 appearing in that database &mdash; it is just the minimum of the
455 per-table <structfield>relfrozenxid</> values within the database.
456 A convenient way to
457 examine this information is to execute queries such as:
459 <programlisting>
460 SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';
461 SELECT datname, age(datfrozenxid) FROM pg_database;
462 </programlisting>
464 The <literal>age</> column measures the number of transactions from the
465 cutoff XID to the current transaction's XID. When <command>VACUUM</>
466 scans the whole table, after it's finished <literal>age(relfrozenxid)</>
467 should be a little more than the <varname>vacuum_freeze_min_age</> setting
468 that was used (more by the number of transactions started since the
469 <command>VACUUM</> started).
470 </para>
472 <para>
473 <command>VACUUM</> normally only scans pages that have been modified
474 since last vacuum, but <structfield>relfrozenxid</> can only be advanced
475 when the whole table is scanned. The whole table is scanned when
476 <structfield>relfrozenxid</> is more than
477 <varname>vacuum_freeze_table_age</> transactions old, if
478 <command>VACUUM FREEZE</> command is used, or if all pages happen to
479 require vacuuming to remove dead row versions. If no whole-table-scanning
480 <command>VACUUM</> is issued on the table until
481 <varname>autovacuum_freeze_max_age</> is reached, an autovacuum will soon
482 be forced for the table.
483 </para>
485 <para>
486 If for some reason autovacuum fails to clear old XIDs from a table,
487 the system will begin to emit warning messages like this when the
488 database's oldest XIDs reach ten million transactions from the wraparound
489 point:
491 <programlisting>
492 WARNING: database "mydb" must be vacuumed within 177009986 transactions
493 HINT: To avoid a database shutdown, execute a database-wide VACUUM in "mydb".
494 </programlisting>
496 (A manual <command>VACUUM</> should fix the problem, as suggested by the
497 hint; but note that the <command>VACUUM</> must be performed by a
498 superuser, else it will fail to process system catalogs and thus not
499 be able to advance the database's <structfield>datfrozenxid</>.)
500 If these warnings are
501 ignored, the system will shut down and refuse to execute any new
502 transactions once there are fewer than 1 million transactions left
503 until wraparound:
505 <programlisting>
506 ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
507 HINT: Stop the postmaster and use a standalone backend to VACUUM in "mydb".
508 </programlisting>
510 The 1-million-transaction safety margin exists to let the
511 administrator recover without data loss, by manually executing the
512 required <command>VACUUM</> commands. However, since the system will not
513 execute commands once it has gone into the safety shutdown mode,
514 the only way to do this is to stop the server and use a single-user
515 backend to execute <command>VACUUM</>. The shutdown mode is not enforced
516 by a single-user backend. See the <xref linkend="app-postgres"> reference
517 page for details about using a single-user backend.
518 </para>
520 </sect2>
522 <sect2 id="autovacuum">
523 <title id="autovacuum-title">The Autovacuum Daemon</title>
525 <indexterm>
526 <primary>autovacuum</primary>
527 <secondary>general information</secondary>
528 </indexterm>
529 <para>
530 <productname>PostgreSQL</productname> has an optional but highly
531 recommended feature called <firstterm>autovacuum</firstterm>,
532 whose purpose is to automate the execution of
533 <command>VACUUM</command> and <command>ANALYZE </command> commands.
534 When enabled, autovacuum checks for
535 tables that have had a large number of inserted, updated or deleted
536 tuples. These checks use the statistics collection facility;
537 therefore, autovacuum cannot be used unless <xref
538 linkend="guc-track-counts"> is set to <literal>true</literal>.
539 In the default configuration, autovacuuming is enabled and the related
540 configuration parameters are appropriately set.
541 </para>
543 <para>
544 The <quote>autovacuum daemon</> actually consists of multiple processes.
545 There is a persistent daemon process, called the
546 <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
547 <firstterm>autovacuum worker</firstterm> processes for all databases. The
548 launcher will distribute the work across time, but attempt to start one
549 worker on each database every <xref linkend="guc-autovacuum-naptime">
550 seconds. One worker will be launched for each database, with a maximum
551 of <xref linkend="guc-autovacuum-max-workers"> processes running at the
552 same time. If there are more than
553 <xref linkend="guc-autovacuum-max-workers"> databases to be processed,
554 the next database will be processed as soon as the first worker finishes.
555 The worker processes will check each table within its database and
556 execute <command>VACUUM</> and/or <command>ANALYZE</> as needed.
557 </para>
559 <para>
560 The <xref linkend="guc-autovacuum-max-workers"> setting limits how many
561 workers may be running at any time. If several large tables all become
562 eligible for vacuuming in a short amount of time, all autovacuum workers
563 may become occupied with vacuuming those tables for a long period.
564 This would result
565 in other tables and databases not being vacuumed until a worker became
566 available. There is not a limit on how many workers might be in a
567 single database, but workers do try to avoid repeating work that has
568 already been done by other workers. Note that the number of running
569 workers does not count towards the <xref linkend="guc-max-connections"> nor
570 the <xref linkend="guc-superuser-reserved-connections"> limits.
571 </para>
573 <para>
574 Tables whose <structfield>relfrozenxid</> value is more than
575 <varname>autovacuum_freeze_max_age</> transactions old are always
576 vacuumed (this also applies to those tables whose freeze max age has
577 been modified via storage parameters; see below). Otherwise, if the
578 number of tuples obsoleted since the last
579 <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
580 table is vacuumed. The vacuum threshold is defined as:
581 <programlisting>
582 vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
583 </programlisting>
584 where the vacuum base threshold is
585 <xref linkend="guc-autovacuum-vacuum-threshold">,
586 the vacuum scale factor is
587 <xref linkend="guc-autovacuum-vacuum-scale-factor">,
588 and the number of tuples is
589 <structname>pg_class</structname>.<structfield>reltuples</structfield>.
590 The number of obsolete tuples is obtained from the statistics
591 collector; it is a semi-accurate count updated by each
592 <command>UPDATE</command> and <command>DELETE</command> operation. (It
593 is only semi-accurate because some information might be lost under heavy
594 load.)
595 </para>
597 <para>
598 For analyze, a similar condition is used: the threshold, defined as:
599 <programlisting>
600 analyze threshold = analyze base threshold + analyze scale factor * number of tuples
601 </programlisting>
602 is compared to the total number of tuples inserted or updated
603 since the last <command>ANALYZE</command>.
604 </para>
606 <para>
607 The default thresholds and scale factors are taken from
608 <filename>postgresql.conf</filename>, but it is possible to override them
609 on a table-by-table basis; see
610 <xref linkend="sql-createtable-storage-parameters"
611 endterm="sql-createtable-storage-parameters-title"> for more information.
612 If a setting
613 has been changed via storage parameters, that value is used; otherwise the
614 global settings are used. See <xref linkend="runtime-config-autovacuum"> for
615 more details on the global settings.
616 </para>
618 <para>
619 Besides the base threshold values and scale factors, there are six
620 more autovacuum parameters that can be set for each table via
621 storage parameters.
622 The first parameter, <literal>autovacuum_enabled</>,
623 can be set to <literal>false</literal> to instruct the autovacuum daemon
624 to skip that particular table entirely. In this case
625 autovacuum will only touch the table if it must do so
626 to prevent transaction ID wraparound.
627 Another two parameters,
628 <literal>autovacuum_vacuum_cost_delay</literal> and
629 <literal>autovacuum_vacuum_cost_limit</literal>, are used to set
630 table-specific values for the
631 <xref linkend="runtime-config-resource-vacuum-cost"
632 endterm="runtime-config-resource-vacuum-cost-title">
633 feature.
634 <literal>autovacuum_freeze_min_age</literal>,
635 <literal>autovacuum_freeze_max_age</literal> and
636 <literal>autovacuum_freeze_table_age</literal> are used to set
637 values for <xref linkend="guc-vacuum-freeze-min-age">,
638 <xref linkend="guc-autovacuum-freeze-max-age"> and
639 <xref linkend="guc-vacuum-freeze-table-age"> respectively.
640 </para>
642 <para>
643 When multiple workers are running, the cost limit is
644 <quote>balanced</quote> among all the running workers, so that the
645 total impact on the system is the same, regardless of the number
646 of workers actually running.
647 </para>
648 </sect2>
649 </sect1>
652 <sect1 id="routine-reindex">
653 <title>Routine Reindexing</title>
655 <indexterm zone="routine-reindex">
656 <primary>reindex</primary>
657 </indexterm>
659 <para>
660 In some situations it is worthwhile to rebuild indexes periodically
661 with the <xref linkend="sql-reindex" endterm="sql-reindex-title">
662 command.
663 </para>
665 <para>
666 In <productname>PostgreSQL</> releases before 7.4, periodic reindexing
667 was frequently necessary to avoid <quote>index bloat</>, due to lack of
668 internal space reclamation in B-tree indexes. Any situation in which the
669 range of index keys changed over time &mdash; for example, an index on
670 timestamps in a table where old entries are eventually deleted &mdash;
671 would result in bloat, because index pages for no-longer-needed portions
672 of the key range were not reclaimed for re-use. Over time, the index size
673 could become indefinitely much larger than the amount of useful data in it.
674 </para>
676 <para>
677 In <productname>PostgreSQL</> 7.4 and later, index pages that have become
678 completely empty are reclaimed for re-use. There is still a possibility
679 for inefficient use of space: if all but a few index keys on a page have
680 been deleted, the page remains allocated. So a usage pattern in which all
681 but a few keys in each range are eventually deleted will see poor use of
682 space. For such usage patterns, periodic reindexing is recommended.
683 </para>
685 <para>
686 The potential for bloat in non-B-tree indexes has not been well
687 characterized. It is a good idea to keep an eye on the index's physical
688 size when using any non-B-tree index type.
689 </para>
691 <para>
692 Also, for B-tree indexes a freshly-constructed index is somewhat faster to
693 access than one that has been updated many times, because logically
694 adjacent pages are usually also physically adjacent in a newly built index.
695 (This consideration does not currently apply to non-B-tree indexes.) It
696 might be worthwhile to reindex periodically just to improve access speed.
697 </para>
698 </sect1>
701 <sect1 id="logfile-maintenance">
702 <title>Log File Maintenance</title>
704 <indexterm zone="logfile-maintenance">
705 <primary>server log</primary>
706 <secondary>log file maintenance</secondary>
707 </indexterm>
709 <para>
710 It is a good idea to save the database server's log output
711 somewhere, rather than just routing it to <filename>/dev/null</>.
712 The log output is invaluable when it comes time to diagnose
713 problems. However, the log output tends to be voluminous
714 (especially at higher debug levels) and you won't want to save it
715 indefinitely. You need to <quote>rotate</> the log files so that
716 new log files are started and old ones removed after a reasonable
717 period of time.
718 </para>
720 <para>
721 If you simply direct the <systemitem>stderr</> of
722 <command>postgres</command> into a
723 file, you will have log output, but
724 the only way to truncate the log file is to stop and restart
725 the server. This might be OK if you are using
726 <productname>PostgreSQL</productname> in a development environment,
727 but few production servers would find this behavior acceptable.
728 </para>
730 <para>
731 A better approach is to send the server's
732 <systemitem>stderr</> output to some type of log rotation program.
733 There is a built-in log rotation program, which you can use by
734 setting the configuration parameter <literal>logging_collector</> to
735 <literal>true</> in <filename>postgresql.conf</>. The control
736 parameters for this program are described in <xref
737 linkend="runtime-config-logging-where">. You can also use this approach
738 to capture the log data in machine readable CSV format.
739 </para>
741 <para>
742 Alternatively, you might prefer to use an external log rotation
743 program, if you have one that you are already using with other
744 server software. For example, the <application>rotatelogs</application>
745 tool included in the <productname>Apache</productname> distribution
746 can be used with <productname>PostgreSQL</productname>. To do this,
747 just pipe the server's
748 <systemitem>stderr</> output to the desired program.
749 If you start the server with
750 <command>pg_ctl</>, then <systemitem>stderr</>
751 is already redirected to <systemitem>stdout</>, so you just need a
752 pipe command, for example:
754 <programlisting>
755 pg_ctl start | rotatelogs /var/log/pgsql_log 86400
756 </programlisting>
757 </para>
759 <para>
760 Another production-grade approach to managing log output is to
761 send it all to <application>syslog</> and let
762 <application>syslog</> deal with file rotation. To do this, set the
763 configuration parameter <literal>log_destination</> to <literal>syslog</>
764 (to log to <application>syslog</> only) in
765 <filename>postgresql.conf</>. Then you can send a <literal>SIGHUP</literal>
766 signal to the <application>syslog</> daemon whenever you want to force it
767 to start writing a new log file. If you want to automate log
768 rotation, the <application>logrotate</application> program can be
769 configured to work with log files from
770 <application>syslog</application>.
771 </para>
773 <para>
774 On many systems, however, <application>syslog</> is not very reliable,
775 particularly with large log messages; it might truncate or drop messages
776 just when you need them the most. Also, on <productname>Linux</>,
777 <application>syslog</> will sync each message to disk, yielding poor
778 performance. (You can use a <literal>-</> at the start of the file name
779 in the <application>syslog</> configuration file to disable this behavior.)
780 </para>
782 <para>
783 Note that all the solutions described above take care of starting new
784 log files at configurable intervals, but they do not handle deletion
785 of old, no-longer-interesting log files. You will probably want to set
786 up a batch job to periodically delete old log files. Another possibility
787 is to configure the rotation program so that old log files are overwritten
788 cyclically.
789 </para>
790 </sect1>
791 </chapter>