Make LC_COLLATE and LC_CTYPE database-level settings. Collation and
[PostgreSQL.git] / doc / src / sgml / ref / analyze.sgml
blobb1e198180f3e5875588758f16f83e975e95fa23d
1 <!--
2 $PostgreSQL$
3 PostgreSQL documentation
4 -->
6 <refentry id="SQL-ANALYZE">
7 <refmeta>
8 <refentrytitle id="sql-analyze-title">ANALYZE</refentrytitle>
9 <refmiscinfo>SQL - Language Statements</refmiscinfo>
10 </refmeta>
12 <refnamediv>
13 <refname>ANALYZE</refname>
14 <refpurpose>collect statistics about a database</refpurpose>
15 </refnamediv>
17 <indexterm zone="sql-analyze">
18 <primary>ANALYZE</primary>
19 </indexterm>
21 <refsynopsisdiv>
22 <synopsis>
23 ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <replaceable class="PARAMETER">column</replaceable> [, ...] ) ] ]
24 </synopsis>
25 </refsynopsisdiv>
27 <refsect1>
28 <title>Description</title>
30 <para>
31 <command>ANALYZE</command> collects statistics about the contents
32 of tables in the database, and stores the results in the <link
33 linkend="catalog-pg-statistic"><structname>pg_statistic</></>
34 system catalog. Subsequently, the query planner uses these
35 statistics to help determine the most efficient execution plans for
36 queries.
37 </para>
39 <para>
40 With no parameter, <command>ANALYZE</command> examines every table in the
41 current database. With a parameter, <command>ANALYZE</command> examines
42 only that table. It is further possible to give a list of column names,
43 in which case only the statistics for those columns are collected.
44 </para>
45 </refsect1>
47 <refsect1>
48 <title>Parameters</title>
50 <variablelist>
51 <varlistentry>
52 <term><literal>VERBOSE</literal></term>
53 <listitem>
54 <para>
55 Enables display of progress messages.
56 </para>
57 </listitem>
58 </varlistentry>
60 <varlistentry>
61 <term><replaceable class="PARAMETER">table</replaceable></term>
62 <listitem>
63 <para>
64 The name (possibly schema-qualified) of a specific table to
65 analyze. Defaults to all tables in the current database.
66 </para>
67 </listitem>
68 </varlistentry>
70 <varlistentry>
71 <term><replaceable class="PARAMETER">column</replaceable></term>
72 <listitem>
73 <para>
74 The name of a specific column to analyze. Defaults to all columns.
75 </para>
76 </listitem>
77 </varlistentry>
78 </variablelist>
79 </refsect1>
81 <refsect1>
82 <title>Outputs</title>
84 <para>
85 When <literal>VERBOSE</> is specified, <command>ANALYZE</> emits
86 progress messages to indicate which table is currently being
87 processed. Various statistics about the tables are printed as well.
88 </para>
89 </refsect1>
91 <refsect1>
92 <title>Notes</title>
94 <para>
95 In the default <productname>PostgreSQL</productname> configuration,
96 <xref linkend="autovacuum" endterm="autovacuum-title">
97 takes care of automatic analyzing of tables when they are first loaded
98 with data, and as they change throughout regular operation.
99 When autovacuum is disabled,
100 it is a good idea to run <command>ANALYZE</command> periodically, or
101 just after making major changes in the contents of a table. Accurate
102 statistics will help the planner to choose the most appropriate query
103 plan, and thereby improve the speed of query processing. A common
104 strategy is to run <xref linkend="sql-vacuum" endterm="sql-vacuum-title">
105 and <command>ANALYZE</command> once a day during a low-usage time of day.
106 </para>
108 <para>
109 <command>ANALYZE</command>
110 requires only a read lock on the target table, so it can run in
111 parallel with other activity on the table.
112 </para>
114 <para>
115 The statistics collected by <command>ANALYZE</command> usually
116 include a list of some of the most common values in each column and
117 a histogram showing the approximate data distribution in each
118 column. One or both of these can be omitted if
119 <command>ANALYZE</command> deems them uninteresting (for example,
120 in a unique-key column, there are no common values) or if the
121 column data type does not support the appropriate operators. There
122 is more information about the statistics in <xref
123 linkend="maintenance">.
124 </para>
126 <para>
127 For large tables, <command>ANALYZE</command> takes a random sample
128 of the table contents, rather than examining every row. This
129 allows even very large tables to be analyzed in a small amount of
130 time. Note, however, that the statistics are only approximate, and
131 will change slightly each time <command>ANALYZE</command> is run,
132 even if the actual table contents did not change. This might result
133 in small changes in the planner's estimated costs shown by
134 <xref linkend="sql-explain" endterm="sql-explain-title">. In rare situations, this
135 non-determinism will cause the query optimizer to choose a
136 different query plan between runs of <command>ANALYZE</command>. To
137 avoid this, raise the amount of statistics collected by
138 <command>ANALYZE</command>, as described below.
139 </para>
141 <para>
142 The extent of analysis can be controlled by adjusting the
143 <xref linkend="guc-default-statistics-target"> configuration variable, or
144 on a column-by-column basis by setting the per-column statistics
145 target with <command>ALTER TABLE ... ALTER COLUMN ... SET
146 STATISTICS</command> (see <xref linkend="sql-altertable"
147 endterm="sql-altertable-title">). The target value sets the
148 maximum number of entries in the most-common-value list and the
149 maximum number of bins in the histogram. The default target value
150 is 10, but this can be adjusted up or down to trade off accuracy of
151 planner estimates against the time taken for
152 <command>ANALYZE</command> and the amount of space occupied in
153 <literal>pg_statistic</literal>. In particular, setting the
154 statistics target to zero disables collection of statistics for
155 that column. It might be useful to do that for columns that are
156 never used as part of the <literal>WHERE</>, <literal>GROUP BY</>,
157 or <literal>ORDER BY</> clauses of queries, since the planner will
158 have no use for statistics on such columns.
159 </para>
161 <para>
162 The largest statistics target among the columns being analyzed determines
163 the number of table rows sampled to prepare the statistics. Increasing
164 the target causes a proportional increase in the time and space needed
165 to do <command>ANALYZE</command>.
166 </para>
167 </refsect1>
169 <refsect1>
170 <title>Compatibility</title>
172 <para>
173 There is no <command>ANALYZE</command> statement in the SQL standard.
174 </para>
175 </refsect1>
177 <refsect1>
178 <title>See Also</title>
180 <simplelist type="inline">
181 <member><xref linkend="sql-vacuum" endterm="sql-vacuum-title"></member>
182 <member><xref linkend="app-vacuumdb" endterm="app-vacuumdb-title"></member>
183 <member><xref linkend="runtime-config-resource-vacuum-cost" endterm="runtime-config-resource-vacuum-cost-title"></member>
184 <member><xref linkend="autovacuum" endterm="autovacuum-title"></member>
185 </simplelist>
186 </refsect1>
187 </refentry>