Make LC_COLLATE and LC_CTYPE database-level settings. Collation and
[PostgreSQL.git] / doc / src / sgml / ref / cluster.sgml
blob6e9e5bf502f56aecdc1b43e3457f110469158726
1 <!--
2 $PostgreSQL$
3 PostgreSQL documentation
4 -->
6 <refentry id="SQL-CLUSTER">
7 <refmeta>
8 <refentrytitle id="sql-cluster-title">CLUSTER</refentrytitle>
9 <refmiscinfo>SQL - Language Statements</refmiscinfo>
10 </refmeta>
12 <refnamediv>
13 <refname>CLUSTER</refname>
14 <refpurpose>cluster a table according to an index</refpurpose>
15 </refnamediv>
17 <indexterm zone="sql-cluster">
18 <primary>CLUSTER</primary>
19 </indexterm>
21 <refsynopsisdiv>
22 <synopsis>
23 CLUSTER <replaceable class="PARAMETER">tablename</replaceable> [ USING <replaceable class="PARAMETER">indexname</replaceable> ]
24 CLUSTER
25 </synopsis>
26 </refsynopsisdiv>
28 <refsect1>
29 <title>Description</title>
31 <para>
32 <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
33 to cluster the table specified
34 by <replaceable class="parameter">tablename</replaceable>
35 based on the index specified by
36 <replaceable class="parameter">indexname</replaceable>. The index must
37 already have been defined on
38 <replaceable class="parameter">tablename</replaceable>.
39 </para>
41 <para>
42 When a table is clustered, it is physically reordered
43 based on the index information. Clustering is a one-time operation:
44 when the table is subsequently updated, the changes are
45 not clustered. That is, no attempt is made to store new or
46 updated rows according to their index order. (If one wishes, one can
47 periodically recluster by issuing the command again. Also, setting
48 the table's <literal>FILLFACTOR</literal> storage parameter to less than 100% can aid
49 in preserving cluster ordering during updates, since updated rows
50 are preferentially kept on the same page.)
51 </para>
53 <para>
54 When a table is clustered, <productname>PostgreSQL</productname>
55 remembers which index it was clustered by. The form
56 <command>CLUSTER <replaceable class="parameter">tablename</replaceable></command>
57 reclusters the table using the same index as before.
58 </para>
60 <para>
61 <command>CLUSTER</command> without any parameter reclusters all the
62 previously-clustered tables in the current database that the calling user
63 owns, or all such tables if called by a superuser. This
64 form of <command>CLUSTER</command> cannot be executed inside a transaction
65 block.
66 </para>
68 <para>
69 When a table is being clustered, an <literal>ACCESS
70 EXCLUSIVE</literal> lock is acquired on it. This prevents any other
71 database operations (both reads and writes) from operating on the
72 table until the <command>CLUSTER</command> is finished.
73 </para>
74 </refsect1>
76 <refsect1>
77 <title>Parameters</title>
79 <variablelist>
80 <varlistentry>
81 <term><replaceable class="PARAMETER">tablename</replaceable></term>
82 <listitem>
83 <para>
84 The name (possibly schema-qualified) of a table.
85 </para>
86 </listitem>
87 </varlistentry>
89 <varlistentry>
90 <term><replaceable class="PARAMETER">indexname</replaceable></term>
91 <listitem>
92 <para>
93 The name of an index.
94 </para>
95 </listitem>
96 </varlistentry>
97 </variablelist>
98 </refsect1>
100 <refsect1>
101 <title>Notes</title>
103 <para>
104 In cases where you are accessing single rows randomly
105 within a table, the actual order of the data in the
106 table is unimportant. However, if you tend to access some
107 data more than others, and there is an index that groups
108 them together, you will benefit from using <command>CLUSTER</command>.
109 If you are requesting a range of indexed values from a table, or a
110 single indexed value that has multiple rows that match,
111 <command>CLUSTER</command> will help because once the index identifies the
112 table page for the first row that matches, all other rows
113 that match are probably already on the same table page,
114 and so you save disk accesses and speed up the query.
115 </para>
117 <para>
118 During the cluster operation, a temporary copy of the table is created
119 that contains the table data in the index order. Temporary copies of
120 each index on the table are created as well. Therefore, you need free
121 space on disk at least equal to the sum of the table size and the index
122 sizes.
123 </para>
125 <para>
126 Because <command>CLUSTER</command> remembers the clustering information,
127 one can cluster the tables one wants clustered manually the first time, and
128 setup a timed event similar to <command>VACUUM</command> so that the tables
129 are periodically reclustered.
130 </para>
132 <para>
133 Because the planner records statistics about the ordering of
134 tables, it is advisable to run <xref linkend="sql-analyze"
135 endterm="sql-analyze-title"> on the newly clustered table.
136 Otherwise, the planner might make poor choices of query plans.
137 </para>
139 <para>
140 There is another way to cluster data. The
141 <command>CLUSTER</command> command reorders the original table by
142 scanning it using the index you specify. This can be slow
143 on large tables because the rows are fetched from the table
144 in index order, and if the table is disordered, the
145 entries are on random pages, so there is one disk page
146 retrieved for every row moved. (<productname>PostgreSQL</productname> has
147 a cache, but the majority of a big table will not fit in the cache.)
148 The other way to cluster a table is to use:
150 <programlisting>
151 CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS
152 SELECT * FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
153 </programlisting>
155 which uses the <productname>PostgreSQL</productname> sorting code
156 to produce the desired order;
157 this is usually much faster than an index scan for disordered data.
158 Then you drop the old table, use
159 <command>ALTER TABLE ... RENAME</command>
160 to rename <replaceable class="parameter">newtable</replaceable> to the
161 old name, and recreate the table's indexes.
162 The big disadvantage of this approach is that it does not preserve
163 OIDs, constraints, foreign key relationships, granted privileges, and
164 other ancillary properties of the table &mdash; all such items must be
165 manually recreated. Another disadvantage is that this way requires a sort
166 temporary file about the same size as the table itself, so peak disk usage
167 is about three times the table size instead of twice the table size.
168 </para>
169 </refsect1>
171 <refsect1>
172 <title>Examples</title>
174 <para>
175 Cluster the table <literal>employees</literal> on the basis of
176 its index <literal>employees_ind</literal>:
177 <programlisting>
178 CLUSTER employees USING employees_ind;
179 </programlisting>
180 </para>
182 <para>
183 Cluster the <literal>employees</literal> table using the same
184 index that was used before:
185 <programlisting>
186 CLUSTER employees;
187 </programlisting>
188 </para>
190 <para>
191 Cluster all tables in the database that have previously been clustered:
192 <programlisting>
193 CLUSTER;
194 </programlisting>
195 </para>
196 </refsect1>
198 <refsect1>
199 <title>Compatibility</title>
201 <para>
202 There is no <command>CLUSTER</command> statement in the SQL standard.
203 </para>
205 <para>
206 The syntax
207 <synopsis>
208 CLUSTER <replaceable class="PARAMETER">indexname</replaceable> ON <replaceable class="PARAMETER">tablename</replaceable>
209 </synopsis>
210 is also supported for compatibility with pre-8.3 <productname>PostgreSQL</>
211 versions.
212 </para>
213 </refsect1>
215 <refsect1>
216 <title>See Also</title>
218 <simplelist type="inline">
219 <member><xref linkend="app-clusterdb" endterm="app-clusterdb-title"></member>
220 </simplelist>
221 </refsect1>
222 </refentry>