doc/src/sgml/ref/cluster.sgml

   1 <!--
   2 $PostgreSQL$
   3 PostgreSQL documentation
   4 -->
   5
   6 <refentry id="SQL-CLUSTER">
   7  <refmeta>
   8   <refentrytitle id="sql-cluster-title">CLUSTER</refentrytitle>
   9   <refmiscinfo>SQL - Language Statements</refmiscinfo>
  10  </refmeta>
  11
  12  <refnamediv>
  13   <refname>CLUSTER</refname>
  14   <refpurpose>cluster a table according to an index</refpurpose>
  15  </refnamediv>
  16
  17  <indexterm zone="sql-cluster">
  18   <primary>CLUSTER</primary>
  19  </indexterm>
  20
  21  <refsynopsisdiv>
  22 <synopsis>
  23 CLUSTER <replaceable class="PARAMETER">tablename</replaceable> [ USING <replaceable class="PARAMETER">indexname</replaceable> ]
  24 CLUSTER
  25 </synopsis>
  26  </refsynopsisdiv>
  27
  28  <refsect1>
  29   <title>Description</title>
  30
  31   <para>
  32    <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
  33    to cluster the table specified
  34    by <replaceable class="parameter">tablename</replaceable>
  35    based on the index specified by
  36    <replaceable class="parameter">indexname</replaceable>. The index must
  37    already have been defined on
  38    <replaceable class="parameter">tablename</replaceable>.
  39   </para>
  40
  41   <para>
  42    When a table is clustered, it is physically reordered
  43    based on the index information. Clustering is a one-time operation:
  44    when the table is subsequently updated, the changes are
  45    not clustered.  That is, no attempt is made to store new or
  46    updated rows according to their index order.  (If one wishes, one can
  47    periodically recluster by issuing the command again.  Also, setting
  48    the table's <literal>FILLFACTOR</literal> storage parameter to less than 100% can aid
  49    in preserving cluster ordering during updates, since updated rows
  50    are preferentially kept on the same page.)
  51   </para>
  52
  53   <para>
  54    When a table is clustered, <productname>PostgreSQL</productname>
  55    remembers which index it was clustered by.  The form
  56    <command>CLUSTER <replaceable class="parameter">tablename</replaceable></command>
  57    reclusters the table using the same index as before.
  58   </para>
  59
  60   <para>
  61    <command>CLUSTER</command> without any parameter reclusters all the
  62    previously-clustered tables in the current database that the calling user
  63    owns, or all such tables if called by a superuser.  This
  64    form of <command>CLUSTER</command> cannot be executed inside a transaction
  65    block.
  66   </para>
  67
  68   <para>
  69    When a table is being clustered, an <literal>ACCESS
  70    EXCLUSIVE</literal> lock is acquired on it. This prevents any other
  71    database operations (both reads and writes) from operating on the
  72    table until the <command>CLUSTER</command> is finished.
  73   </para>
  74  </refsect1>
  75
  76  <refsect1>
  77   <title>Parameters</title>
  78
  79   <variablelist>
  80    <varlistentry>
  81     <term><replaceable class="PARAMETER">tablename</replaceable></term>
  82     <listitem>
  83      <para>
  84       The name (possibly schema-qualified) of a table.
  85      </para>
  86     </listitem>
  87    </varlistentry>
  88
  89    <varlistentry>
  90     <term><replaceable class="PARAMETER">indexname</replaceable></term>
  91     <listitem>
  92      <para>
  93       The name of an index.
  94      </para>
  95     </listitem>
  96    </varlistentry>
  97   </variablelist>
  98  </refsect1>
  99
 100  <refsect1>
 101   <title>Notes</title>
 102
 103    <para>
 104     In cases where you are accessing single rows randomly
 105     within a table, the actual order of the data in the
 106     table is unimportant. However, if you tend to access some
 107     data more than others, and there is an index that groups
 108     them together, you will benefit from using <command>CLUSTER</command>.
 109     If you are requesting a range of indexed values from a table, or a
 110     single indexed value that has multiple rows that match,
 111     <command>CLUSTER</command> will help because once the index identifies the
 112     table page for the first row that matches, all other rows
 113     that match are probably already on the same table page,
 114     and so you save disk accesses and speed up the query.
 115    </para>
 116
 117    <para>
 118     During the cluster operation, a temporary copy of the table is created
 119     that contains the table data in the index order.  Temporary copies of
 120     each index on the table are created as well.  Therefore, you need free
 121     space on disk at least equal to the sum of the table size and the index
 122     sizes.
 123    </para>
 124
 125    <para>
 126     Because <command>CLUSTER</command> remembers the clustering information,
 127     one can cluster the tables one wants clustered manually the first time, and
 128     setup a timed event similar to <command>VACUUM</command> so that the tables
 129     are periodically reclustered.
 130    </para>
 131
 132    <para>
 133     Because the planner records statistics about the ordering of
 134     tables, it is advisable to run <xref linkend="sql-analyze"
 135     endterm="sql-analyze-title"> on the newly clustered table.
 136     Otherwise, the planner might make poor choices of query plans.
 137    </para>
 138
 139    <para>
 140     There is another way to cluster data. The
 141     <command>CLUSTER</command> command reorders the original table by
 142     scanning it using the index you specify. This can be slow
 143     on large tables because the rows are fetched from the table
 144     in index order, and if the table is disordered, the
 145     entries are on random pages, so there is one disk page
 146     retrieved for every row moved. (<productname>PostgreSQL</productname> has
 147     a cache, but the majority of a big table will not fit in the cache.)
 148     The other way to cluster a table is to use:
 149
 150 <programlisting>
 151 CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS
 152     SELECT * FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
 153 </programlisting>
 154
 155     which uses the <productname>PostgreSQL</productname> sorting code
 156     to produce the desired order;
 157     this is usually much faster than an index scan for disordered data.
 158     Then you drop the old table, use
 159     <command>ALTER TABLE ... RENAME</command>
 160     to rename <replaceable class="parameter">newtable</replaceable> to the
 161     old name, and recreate the table's indexes.
 162     The big disadvantage of this approach is that it does not preserve
 163     OIDs, constraints, foreign key relationships, granted privileges, and
 164     other ancillary properties of the table &mdash; all such items must be
 165     manually recreated.  Another disadvantage is that this way requires a sort
 166     temporary file about the same size as the table itself, so peak disk usage
 167     is about three times the table size instead of twice the table size.
 168    </para>
 169  </refsect1>
 170
 171  <refsect1>
 172   <title>Examples</title>
 173
 174   <para>
 175    Cluster the table <literal>employees</literal> on the basis of
 176    its index <literal>employees_ind</literal>:
 177 <programlisting>
 178 CLUSTER employees USING employees_ind;
 179 </programlisting>
 180   </para>
 181
 182   <para>
 183    Cluster the <literal>employees</literal> table using the same
 184    index that was used before:
 185 <programlisting>
 186 CLUSTER employees;
 187 </programlisting>
 188   </para>
 189
 190   <para>
 191    Cluster all tables in the database that have previously been clustered:
 192 <programlisting>
 193 CLUSTER;
 194 </programlisting>
 195   </para>
 196  </refsect1>
 197
 198  <refsect1>
 199   <title>Compatibility</title>
 200
 201   <para>
 202    There is no <command>CLUSTER</command> statement in the SQL standard.
 203   </para>
 204
 205   <para>
 206    The syntax
 207 <synopsis>
 208 CLUSTER <replaceable class="PARAMETER">indexname</replaceable> ON <replaceable class="PARAMETER">tablename</replaceable>
 209 </synopsis>
 210   is also supported for compatibility with pre-8.3 <productname>PostgreSQL</>
 211   versions.
 212   </para>
 213  </refsect1>
 214
 215  <refsect1>
 216   <title>See Also</title>
 217
 218   <simplelist type="inline">
 219    <member><xref linkend="app-clusterdb" endterm="app-clusterdb-title"></member>
 220   </simplelist>
 221  </refsect1>
 222 </refentry>