doc/src/sgml/queries.sgml

   1 <!-- $PostgreSQL$ -->
   2
   3 <chapter id="queries">
   4  <title>Queries</title>
   5
   6  <indexterm zone="queries">
   7   <primary>query</primary>
   8  </indexterm>
   9
  10  <indexterm zone="queries">
  11   <primary>SELECT</primary>
  12  </indexterm>
  13
  14  <para>
  15   The previous chapters explained how to create tables, how to fill
  16   them with data, and how to manipulate that data.  Now we finally
  17   discuss how to retrieve the data from the database.
  18  </para>
  19
  20
  21  <sect1 id="queries-overview">
  22   <title>Overview</title>
  23
  24   <para>
  25    The process of retrieving data from a database, or the command
  26    used to do so, is called a <firstterm>query</firstterm>.  In SQL the
  27    <xref linkend="sql-select" endterm="sql-select-title"> command is
  28    used to specify queries.  The general syntax of the
  29    <command>SELECT</command> command is
  30 <synopsis>
  31 <optional>WITH <replaceable>with_queries</replaceable></optional> SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
  32 </synopsis>
  33    The following sections describe the details of the select list, the
  34    table expression, and the sort specification.  <literal>WITH</>
  35    queries are treated last since they are an advanced feature.
  36   </para>
  37
  38   <para>
  39    A simple kind of query has the form:
  40 <programlisting>
  41 SELECT * FROM table1;
  42 </programlisting>
  43   Assuming that there is a table called <literal>table1</literal>,
  44   this command would retrieve all rows and all columns from
  45   <literal>table1</literal>.  (The method of retrieval depends on the
  46   client application.  For example, the
  47   <application>psql</application> program will display an ASCII-art
  48   table on the screen, while client libraries will offer functions to
  49   extract individual values from the query result.)  The select list
  50   specification <literal>*</literal> means all columns that the table
  51   expression happens to provide.  A select list can also select a
  52   subset of the available columns or make calculations using the
  53   columns.  For example, if
  54   <literal>table1</literal> has columns named <literal>a</>,
  55   <literal>b</>, and <literal>c</> (and perhaps others) you can make
  56   the following query:
  57 <programlisting>
  58 SELECT a, b + c FROM table1;
  59 </programlisting>
  60   (assuming that <literal>b</> and <literal>c</> are of a numerical
  61   data type).
  62   See <xref linkend="queries-select-lists"> for more details.
  63  </para>
  64
  65  <para>
  66   <literal>FROM table1</literal> is a simple kind of
  67   table expression: it reads just one table.  In general, table
  68   expressions can be complex constructs of base tables, joins, and
  69   subqueries.  But you can also omit the table expression entirely and
  70   use the <command>SELECT</command> command as a calculator:
  71 <programlisting>
  72 SELECT 3 * 4;
  73 </programlisting>
  74   This is more useful if the expressions in the select list return
  75   varying results.  For example, you could call a function this way:
  76 <programlisting>
  77 SELECT random();
  78 </programlisting>
  79   </para>
  80  </sect1>
  81
  82
  83  <sect1 id="queries-table-expressions">
  84   <title>Table Expressions</title>
  85
  86   <indexterm zone="queries-table-expressions">
  87    <primary>table expression</primary>
  88   </indexterm>
  89
  90   <para>
  91    A <firstterm>table expression</firstterm> computes a table.  The
  92    table expression contains a <literal>FROM</> clause that is
  93    optionally followed by <literal>WHERE</>, <literal>GROUP BY</>, and
  94    <literal>HAVING</> clauses.  Trivial table expressions simply refer
  95    to a table on disk, a so-called base table, but more complex
  96    expressions can be used to modify or combine base tables in various
  97    ways.
  98   </para>
  99
 100   <para>
 101    The optional <literal>WHERE</>, <literal>GROUP BY</>, and
 102    <literal>HAVING</> clauses in the table expression specify a
 103    pipeline of successive transformations performed on the table
 104    derived in the <literal>FROM</> clause.  All these transformations
 105    produce a virtual table that provides the rows that are passed to
 106    the select list to compute the output rows of the query.
 107   </para>
 108
 109   <sect2 id="queries-from">
 110    <title>The <literal>FROM</literal> Clause</title>
 111
 112    <para>
 113     The <xref linkend="sql-from" endterm="sql-from-title"> derives a
 114     table from one or more other tables given in a comma-separated
 115     table reference list.
 116 <synopsis>
 117 FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
 118 </synopsis>
 119
 120     A table reference can be a table name (possibly schema-qualified),
 121     or a derived table such as a subquery, a table join, or a complex
 122     combination of these.  If more than one table reference is listed
 123     in the <literal>FROM</> clause, they are cross-joined (see below)
 124     to form an intermediate virtual table.  That table can then be
 125     transformed by the <literal>WHERE</>, <literal>GROUP BY</>, and
 126     <literal>HAVING</> clauses, and what remains is the result of the
 127     overall table expression.
 128    </para>
 129
 130    <indexterm>
 131     <primary>ONLY</primary>
 132    </indexterm>
 133
 134    <para>
 135     When a table reference names a table that is the parent of a
 136     table inheritance hierarchy, the table reference produces rows of
 137     not only that table but all of its descendant tables, unless the
 138     key word <literal>ONLY</> precedes the table name.  However, the
 139     reference produces only the columns that appear in the named table
 140     &mdash; any columns added in subtables are ignored.
 141    </para>
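
        <para>
         As a brief sketch, assume a hypothetical table
         <literal>cities</literal> with a <literal>name</literal> column
         and a child table <literal>capitals</literal> that inherits from
         it (neither table is defined elsewhere in this chapter):
     <programlisting>
     -- rows from cities and from all of its descendants, such as capitals
     SELECT name FROM cities;

     -- rows stored in cities itself only
     SELECT name FROM ONLY cities;
     </programlisting>
         Either query can return only the columns that
         <literal>cities</literal> itself defines.
        </para>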
 142
 143    <sect3 id="queries-join">
 144     <title>Joined Tables</title>
 145
 146     <indexterm zone="queries-join">
 147      <primary>join</primary>
 148     </indexterm>
 149
 150     <para>
 151      A joined table is a table derived from two other (real or
 152      derived) tables according to the rules of the particular join
 153      type.  Inner, outer, and cross-joins are available.
 154     </para>
 155
 156     <variablelist>
 157      <title>Join Types</title>
 158
 159      <varlistentry>
 160       <term>Cross join</term>
 161
 162       <indexterm>
 163        <primary>join</primary>
 164        <secondary>cross</secondary>
 165       </indexterm>
 166
 167       <indexterm>
 168        <primary>cross join</primary>
 169       </indexterm>
 170
 171       <listitem>
 172 <synopsis>
 173 <replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
 174 </synopsis>
 175
 176        <para>
 177         For every possible combination of rows from
 178         <replaceable>T1</replaceable> and
 179         <replaceable>T2</replaceable> (i.e., a Cartesian product),
 180         the joined table will contain a
 181         row consisting of all columns in <replaceable>T1</replaceable>
 182         followed by all columns in <replaceable>T2</replaceable>.  If
 183         the tables have N and M rows respectively, the joined
 184         table will have N * M rows.
 185        </para>
 186
 187        <para>
 188         <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
 189         <replaceable>T2</replaceable></literal> is equivalent to
 190         <literal>FROM <replaceable>T1</replaceable>,
 191         <replaceable>T2</replaceable></literal>.  It is also equivalent to
 192         <literal>FROM <replaceable>T1</replaceable> INNER JOIN
 193         <replaceable>T2</replaceable> ON TRUE</literal> (see below).
 194        </para>
 195       </listitem>
 196      </varlistentry>
 197
 198      <varlistentry>
 199       <term>Qualified joins</term>
 200
 201       <indexterm>
 202        <primary>join</primary>
 203        <secondary>outer</secondary>
 204       </indexterm>
 205
 206       <indexterm>
 207        <primary>outer join</primary>
 208       </indexterm>
 209
 210       <listitem>
 211 <synopsis>
 212 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean_expression</replaceable>
 213 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
 214 <replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
 215 </synopsis>
 216
 217        <para>
 218         The words <literal>INNER</literal> and
 219         <literal>OUTER</literal> are optional in all forms.
 220         <literal>INNER</literal> is the default;
 221         <literal>LEFT</literal>, <literal>RIGHT</literal>, and
 222         <literal>FULL</literal> imply an outer join.
 223        </para>
 224
 225        <para>
 226         The <firstterm>join condition</firstterm> is specified in the
 227         <literal>ON</> or <literal>USING</> clause, or implicitly by
 228         the word <literal>NATURAL</>.  The join condition determines
 229         which rows from the two source tables are considered to
 230         <quote>match</quote>, as explained in detail below.
 231        </para>
 232
 233        <para>
 234         The <literal>ON</> clause is the most general kind of join
 235         condition: it takes a Boolean value expression of the same
 236         kind as is used in a <literal>WHERE</> clause.  A pair of rows
 237         from <replaceable>T1</> and <replaceable>T2</> match if the
 238         <literal>ON</> expression evaluates to true for them.
 239        </para>
 240
 241        <para>
 242         <literal>USING</> is a shorthand notation: it takes a
 243         comma-separated list of column names, which the joined tables
 244         must have in common, and forms a join condition specifying
 245         equality of each of these pairs of columns.  Furthermore, the
 246         output of <literal>JOIN USING</> has one column for each of
 247         the equated pairs of input columns, followed by the
 248         remaining columns from each table.  Thus, <literal>USING (a, b,
 249         c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
 250         t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
 251         if <literal>ON</> is used there will be two columns each named
 252         <literal>a</>, <literal>b</>, and <literal>c</> in the result,
 253         whereas with <literal>USING</> there will be only one of each
 254         (and they will appear first if <command>SELECT *</> is used).
 255        </para>
 256
 257        <para>
 258         <indexterm>
 259          <primary>join</primary>
 260          <secondary>natural</secondary>
 261         </indexterm>
 262         <indexterm>
 263          <primary>natural join</primary>
 264         </indexterm>
 265         Finally, <literal>NATURAL</> is a shorthand form of
 266         <literal>USING</>: it forms a <literal>USING</> list
 267         consisting of all column names that appear in both
 268         input tables.  As with <literal>USING</>, these columns appear
 269         only once in the output table.
 270        </para>
 271
 272        <para>
 273         The possible types of qualified join are:
 274
 275        <variablelist>
 276         <varlistentry>
 277          <term><literal>INNER JOIN</></term>
 278
 279          <listitem>
 280           <para>
 281            For each row R1 of T1, the joined table has a row for each
 282            row in T2 that satisfies the join condition with R1.
 283           </para>
 284          </listitem>
 285         </varlistentry>
 286
 287         <varlistentry>
 288          <term><literal>LEFT OUTER JOIN</></term>
 289
 290          <indexterm>
 291           <primary>join</primary>
 292           <secondary>left</secondary>
 293          </indexterm>
 294
 295          <indexterm>
 296           <primary>left join</primary>
 297          </indexterm>
 298
 299          <listitem>
 300           <para>
 301            First, an inner join is performed.  Then, for each row in
 302            T1 that does not satisfy the join condition with any row in
 303            T2, a joined row is added with null values in columns of
 304            T2.  Thus, the joined table always has at least
 305            one row for each row in T1.
 306           </para>
 307          </listitem>
 308         </varlistentry>
 309
 310         <varlistentry>
 311          <term><literal>RIGHT OUTER JOIN</></term>
 312
 313          <indexterm>
 314           <primary>join</primary>
 315           <secondary>right</secondary>
 316          </indexterm>
 317
 318          <indexterm>
 319           <primary>right join</primary>
 320          </indexterm>
 321
 322          <listitem>
 323           <para>
 324            First, an inner join is performed.  Then, for each row in
 325            T2 that does not satisfy the join condition with any row in
 326            T1, a joined row is added with null values in columns of
 327            T1.  This is the converse of a left join: the result table
 328            will always have a row for each row in T2.
 329           </para>
 330          </listitem>
 331         </varlistentry>
 332
 333         <varlistentry>
 334          <term><literal>FULL OUTER JOIN</></term>
 335
 336          <listitem>
 337           <para>
 338            First, an inner join is performed.  Then, for each row in
 339            T1 that does not satisfy the join condition with any row in
 340            T2, a joined row is added with null values in columns of
 341            T2.  Also, for each row of T2 that does not satisfy the
 342            join condition with any row in T1, a joined row with null
 343            values in the columns of T1 is added.
 344           </para>
 345          </listitem>
 346         </varlistentry>
 347        </variablelist>
 348        </para>
 349       </listitem>
 350      </varlistentry>
 351     </variablelist>
 352
 353     <para>
 354      Joins of all types can be chained together or nested: either or
 355      both <replaceable>T1</replaceable> and
 356      <replaceable>T2</replaceable> can be joined tables.  Parentheses
 357      can be used around <literal>JOIN</> clauses to control the join
 358      order.  In the absence of parentheses, <literal>JOIN</> clauses
 359      nest left-to-right.
 360     </para>
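
         <para>
          For illustration, assume three hypothetical tables
          <literal>a</literal>, <literal>b</literal>, and
          <literal>c</literal> that each have an <literal>id</literal>
          column.  The first two table expressions below are equivalent,
          because unparenthesized joins nest left-to-right.  The third
          joins <literal>b</literal> to <literal>c</literal> first and can
          produce a different result, for instance when a row of
          <literal>b</literal> matches <literal>a</literal> but has no
          match in <literal>c</literal>:
     <programlisting>
     FROM a LEFT JOIN b ON a.id = b.id LEFT JOIN c ON b.id = c.id
     FROM (a LEFT JOIN b ON a.id = b.id) LEFT JOIN c ON b.id = c.id
     FROM a LEFT JOIN (b INNER JOIN c ON b.id = c.id) ON a.id = b.id
     </programlisting>
         </para>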
 361
 362     <para>
 363      To put this together, assume we have tables <literal>t1</literal>:
 364 <programlisting>
 365  num | name
 366 -----+------
 367    1 | a
 368    2 | b
 369    3 | c
 370 </programlisting>
 371      and <literal>t2</literal>:
 372 <programlisting>
 373  num | value
 374 -----+-------
 375    1 | xxx
 376    3 | yyy
 377    5 | zzz
 378 </programlisting>
 379      then we get the following results for the various joins:
 380 <screen>
 381 <prompt>=&gt;</> <userinput>SELECT * FROM t1 CROSS JOIN t2;</>
 382  num | name | num | value
 383 -----+------+-----+-------
 384    1 | a    |   1 | xxx
 385    1 | a    |   3 | yyy
 386    1 | a    |   5 | zzz
 387    2 | b    |   1 | xxx
 388    2 | b    |   3 | yyy
 389    2 | b    |   5 | zzz
 390    3 | c    |   1 | xxx
 391    3 | c    |   3 | yyy
 392    3 | c    |   5 | zzz
 393 (9 rows)
 394
 395 <prompt>=&gt;</> <userinput>SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;</>
 396  num | name | num | value
 397 -----+------+-----+-------
 398    1 | a    |   1 | xxx
 399    3 | c    |   3 | yyy
 400 (2 rows)
 401
 402 <prompt>=&gt;</> <userinput>SELECT * FROM t1 INNER JOIN t2 USING (num);</>
 403  num | name | value
 404 -----+------+-------
 405    1 | a    | xxx
 406    3 | c    | yyy
 407 (2 rows)
 408
 409 <prompt>=&gt;</> <userinput>SELECT * FROM t1 NATURAL INNER JOIN t2;</>
 410  num | name | value
 411 -----+------+-------
 412    1 | a    | xxx
 413    3 | c    | yyy
 414 (2 rows)
 415
 416 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;</>
 417  num | name | num | value
 418 -----+------+-----+-------
 419    1 | a    |   1 | xxx
 420    2 | b    |     |
 421    3 | c    |   3 | yyy
 422 (3 rows)
 423
 424 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 USING (num);</>
 425  num | name | value
 426 -----+------+-------
 427    1 | a    | xxx
 428    2 | b    |
 429    3 | c    | yyy
 430 (3 rows)
 431
 432 <prompt>=&gt;</> <userinput>SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;</>
 433  num | name | num | value
 434 -----+------+-----+-------
 435    1 | a    |   1 | xxx
 436    3 | c    |   3 | yyy
 437      |      |   5 | zzz
 438 (3 rows)
 439
 440 <prompt>=&gt;</> <userinput>SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;</>
 441  num | name | num | value
 442 -----+------+-----+-------
 443    1 | a    |   1 | xxx
 444    2 | b    |     |
 445    3 | c    |   3 | yyy
 446      |      |   5 | zzz
 447 (4 rows)
 448 </screen>
 449     </para>
 450
 451     <para>
 452      The join condition specified with <literal>ON</> can also contain
 453      conditions that do not relate directly to the join.  This can
 454      prove useful for some queries but needs to be thought out
 455      carefully.  For example:
 456 <screen>
 457 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';</>
 458  num | name | num | value
 459 -----+------+-----+-------
 460    1 | a    |   1 | xxx
 461    2 | b    |     |
 462    3 | c    |     |
 463 (3 rows)
 464 </screen>
 465      Notice that placing the restriction in the <literal>WHERE</> clause
 466      produces a different result:
 467 <screen>
 468 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';</>
 469  num | name | num | value
 470 -----+------+-----+-------
 471    1 | a    |   1 | xxx
 472 (1 row)
 473 </screen>
 474      This is because a restriction placed in the <literal>ON</>
 475      clause is processed <emphasis>before</> the join, while
 476      a restriction placed in the <literal>WHERE</> clause is processed
 477      <emphasis>after</> the join.
 478     </para>
 479    </sect3>
 480
 481    <sect3 id="queries-table-aliases">
 482     <title>Table and Column Aliases</title>
 483
 484     <indexterm zone="queries-table-aliases">
 485      <primary>alias</primary>
 486      <secondary>in the FROM clause</secondary>
 487     </indexterm>
 488
 489     <indexterm>
 490      <primary>label</primary>
 491      <see>alias</see>
 492     </indexterm>
 493
 494     <para>
 495      A temporary name can be given to tables and complex table
 496      references to be used for references to the derived table in
 497      the rest of the query.  This is called a <firstterm>table
 498      alias</firstterm>.
 499     </para>
 500
 501     <para>
 502      To create a table alias, write
 503 <synopsis>
 504 FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
 505 </synopsis>
 506      or
 507 <synopsis>
 508 FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
 509 </synopsis>
 510      The <literal>AS</literal> key word is optional; the two forms are equivalent.
 511      <replaceable>alias</replaceable> can be any identifier.
 512     </para>
 513
 514     <para>
 515      A typical application of table aliases is to assign short
 516      identifiers to long table names to keep the join clauses
 517      readable.  For example:
 518 <programlisting>
 519 SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;
 520 </programlisting>
 521     </para>
 522
 523     <para>
 524      The alias becomes the new name of the table reference for the
 525      current query &mdash; it is no longer possible to refer to the table
 526      by the original name.  Thus:
 527 <programlisting>
 528 SELECT * FROM my_table AS m WHERE my_table.a &gt; 5;
 529 </programlisting>
 530      is not valid according to the SQL standard.  In
 531      <productname>PostgreSQL</productname> this will draw an error, assuming the
 532      <xref linkend="guc-add-missing-from"> configuration variable is
 533      <literal>off</> (as it is by default).  If it is <literal>on</>,
 534      an implicit table reference will be added to the
 535      <literal>FROM</literal> clause, so the query is processed as if
 536      it were written as:
 537 <programlisting>
 538 SELECT * FROM my_table AS m, my_table AS my_table WHERE my_table.a &gt; 5;
 539 </programlisting>
 540      That will result in a cross join, which is usually not what you want.
 541     </para>
 542
 543     <para>
 544      Table aliases are mainly for notational convenience, but it is
 545      necessary to use them when joining a table to itself, e.g.:
 546 <programlisting>
 547 SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
 548 </programlisting>
 549      Additionally, an alias is required if the table reference is a
 550      subquery (see <xref linkend="queries-subqueries">).
 551     </para>
 552
 553     <para>
 554      Parentheses are used to resolve ambiguities.  In the following example,
 555      the first statement assigns the alias <literal>b</literal> to the second
 556      instance of <literal>my_table</>, but the second statement assigns the
 557      alias to the result of the join:
 558 <programlisting>
 559 SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
 560 SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
 561 </programlisting>
 562     </para>
 563
 564     <para>
 565      Another form of table aliasing gives temporary names to the columns of
 566      the table, as well as the table itself:
 567 <synopsis>
 568 FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
 569 </synopsis>
 570      If fewer column aliases are specified than the actual table has
 571      columns, the remaining columns are not renamed.  This syntax is
 572      especially useful for self-joins or subqueries.
 573     </para>
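
         <para>
          As a small sketch, assume a hypothetical table
          <literal>my_table</literal> with columns <literal>a</literal>,
          <literal>b</literal>, and <literal>c</literal>, in that order.
          The following query renames the first two columns and leaves
          <literal>c</literal> under its original name:
     <programlisting>
     SELECT m.x, m.y, m.c FROM my_table AS m (x, y) WHERE m.x &gt; 5;
     </programlisting>
         </para>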
 574
 575     <para>
 576      When an alias is applied to the output of a <literal>JOIN</>
 577      clause, the alias hides the original
 578      name(s) within the <literal>JOIN</>.  For example:
 579 <programlisting>
 580 SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
 581 </programlisting>
 582      is valid SQL, but:
 583 <programlisting>
 584 SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
 585 </programlisting>
 586      is not valid; the table alias <literal>a</> is not visible
 587      outside the alias <literal>c</>.
 588     </para>
 589    </sect3>
 590
 591    <sect3 id="queries-subqueries">
 592     <title>Subqueries</title>
 593
 594     <indexterm zone="queries-subqueries">
 595      <primary>subquery</primary>
 596     </indexterm>
 597
 598     <para>
 599      Subqueries specifying a derived table must be enclosed in
 600      parentheses and <emphasis>must</emphasis> be assigned a table
 601      alias name.  (See <xref linkend="queries-table-aliases">.)  For
 602      example:
 603 <programlisting>
 604 FROM (SELECT * FROM table1) AS alias_name
 605 </programlisting>
 606     </para>
 607
 608     <para>
 609      This example is equivalent to <literal>FROM table1 AS
 610      alias_name</literal>.  More interesting cases, which cannot be
 611      reduced to a plain join, arise when the subquery involves
 612      grouping or aggregation.
 613     </para>
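
         <para>
          A sketch of such a case, assuming hypothetical tables
          <literal>customers</literal> (with columns <literal>id</literal>
          and <literal>name</literal>) and <literal>orders</literal> (with
          columns <literal>customer_id</literal> and
          <literal>amount</literal>), might look like this:
     <programlisting>
     SELECT c.name, o.total
         FROM customers AS c
         JOIN (SELECT customer_id, sum(amount) AS total
                   FROM orders
                   GROUP BY customer_id) AS o
           ON c.id = o.customer_id;
     </programlisting>
          Here the derived table <literal>o</literal> has one row per
          customer, so the outer join condition pairs each customer with
          a single precomputed total.
         </para>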
 614
 615     <para>
 616      A subquery can also be a <command>VALUES</> list:
 617 <programlisting>
 618 FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
 619      AS names(first, last)
 620 </programlisting>
 621      Again, a table alias is required.  Assigning alias names to the columns
 622      of the <command>VALUES</> list is optional, but is good practice.
 623      For more information see <xref linkend="queries-values">.
 624     </para>
 625    </sect3>
 626
 627    <sect3 id="queries-tablefunctions">
 628     <title>Table Functions</title>
 629
 630     <indexterm zone="queries-tablefunctions"><primary>table function</></>
 631
 632     <indexterm zone="queries-tablefunctions">
 633      <primary>function</>
 634      <secondary>in the FROM clause</>
 635     </indexterm>
 636
 637     <para>
 638      Table functions are functions that produce a set of rows, made up
 639      of either base data types (scalar types) or composite data types
 640      (table rows).  They are used like a table, view, or subquery in
 641      the <literal>FROM</> clause of a query. Columns returned by table
 642      functions can be included in <literal>SELECT</>,
 643      <literal>JOIN</>, or <literal>WHERE</> clauses in the same manner
 644      as a table, view, or subquery column.
 645     </para>
 646
 647     <para>
 648      If a table function returns a base data type, the single result
 649      column name matches the function name. If the function returns a
 650      composite type, the result columns get the same names as the
 651      individual attributes of the type.
 652     </para>
 653
 654     <para>
 655      A table function can be aliased in the <literal>FROM</> clause,
 656      but it can also be left unaliased.  If a function is used in the
 657      <literal>FROM</> clause with no alias, the function name is used
 658      as the resulting table name.
 659     </para>
 660
 661     <para>
 662      Some examples:
 663 <programlisting>
 664 CREATE TABLE foo (fooid int, foosubid int, fooname text);
 665
 666 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
 667     SELECT * FROM foo WHERE fooid = $1;
 668 $$ LANGUAGE SQL;
 669
 670 SELECT * FROM getfoo(1) AS t1;
 671
 672 SELECT * FROM foo
 673     WHERE foosubid IN (
 674                         SELECT foosubid
 675                         FROM getfoo(foo.fooid) z
 676                         WHERE z.fooid = foo.fooid
 677                       );
 678
 679 CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
 680
 681 SELECT * FROM vw_getfoo;
 682 </programlisting>
 683     </para>
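
         <para>
          The naming rules described above can also be sketched with the
          built-in set-returning function
          <function>generate_series</function>, which returns a base data
          type.  The alias <literal>g</literal> and the column alias
          <literal>n</literal> are arbitrary names chosen for this
          illustration:
     <programlisting>
     -- without an alias, the result column is named after the function
     SELECT generate_series FROM generate_series(1, 3);

     -- with a table alias and a column alias
     SELECT g.n FROM generate_series(1, 3) AS g (n);
     </programlisting>
         </para>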
 684
 685     <para>
 686      In some cases it is useful to define table functions that can
 687      return different column sets depending on how they are invoked.
 688      To support this, the table function can be declared as returning
 689      the pseudotype <type>record</>.  When such a function is used in
 690      a query, the expected row structure must be specified in the
 691      query itself, so that the system can know how to parse and plan
 692      the query.  Consider this example:
 693 <programlisting>
 694 SELECT *
 695     FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
 696       AS t1(proname name, prosrc text)
 697     WHERE proname LIKE 'bytea%';
 698 </programlisting>
 699      The <literal>dblink</> function executes a remote query (see
 700      <filename>contrib/dblink</>).  It is declared to return
 701      <type>record</> since it might be used for any kind of query.
 702      The actual column set must be specified in the calling query so
 703      that the parser knows, for example, what <literal>*</> should
 704      expand to.
 705     </para>
 706    </sect3>
 707   </sect2>
 708
 709   <sect2 id="queries-where">
 710    <title>The <literal>WHERE</literal> Clause</title>
 711
 712    <indexterm zone="queries-where">
 713     <primary>WHERE</primary>
 714    </indexterm>
 715
 716    <para>
 717     The syntax of the <xref linkend="sql-where"
 718     endterm="sql-where-title"> is
 719 <synopsis>
 720 WHERE <replaceable>search_condition</replaceable>
 721 </synopsis>
 722     where <replaceable>search_condition</replaceable> is any value
 723     expression (see <xref linkend="sql-expressions">) that
 724     returns a value of type <type>boolean</type>.
 725    </para>
 726
 727    <para>
 728     After the processing of the <literal>FROM</> clause is done, each
 729     row of the derived virtual table is checked against the search
 730     condition.  If the result of the condition is true, the row is
 731     kept in the output table; otherwise (i.e., if the result is
 732     false or null) it is discarded.  The search condition typically
 733     references at least one column of the table generated in the
 734     <literal>FROM</> clause; this is not required, but otherwise the
 735     <literal>WHERE</> clause will be fairly useless.
 736    </para>
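
        <para>
         To illustrate the treatment of null results, consider the
         tables <literal>t1</literal> and <literal>t2</literal> from the
         join examples in <xref linkend="queries-join">.  In the query
         below, the row with <literal>num</literal> 2 has a null
         <literal>value</literal> after the left join, so the comparison
         yields null rather than true and the row should be discarded
         along with the row whose value is <literal>'xxx'</literal>:
     <screen>
     <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value &lt;&gt; 'xxx';</>
      num | name | num | value
     -----+------+-----+-------
        3 | c    |   3 | yyy
     (1 row)
     </screen>
        </para>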
 737
 738    <note>
 739     <para>
 740      The join condition of an inner join can be written either in
 741      the <literal>WHERE</> clause or in the <literal>JOIN</> clause.
 742      For example, these table expressions are equivalent:
 743 <programlisting>
 744 FROM a, b WHERE a.id = b.id AND b.val &gt; 5
 745 </programlisting>
 746      and:
 747 <programlisting>
 748 FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val &gt; 5
 749 </programlisting>
 750      or perhaps even:
 751 <programlisting>
 752 FROM a NATURAL JOIN b WHERE b.val &gt; 5
 753 </programlisting>
 754      Which one of these you use is mainly a matter of style.  The
 755      <literal>JOIN</> syntax in the <literal>FROM</> clause is
 756      probably not as portable to other SQL database management systems,
 757      even though it is in the SQL standard.  For
 758      outer joins there is no choice:  they must be done in
 759      the <literal>FROM</> clause.  The <literal>ON</> or <literal>USING</>
 760      clause of an outer join is <emphasis>not</> equivalent to a
 761      <literal>WHERE</> condition, because it results in the addition
 762      of rows (for unmatched input rows) as well as the removal of rows
 763      in the final result.
 764     </para>
 765    </note>
 766
 767    <para>
 768     Here are some examples of <literal>WHERE</literal> clauses:
 769 <programlisting>
 770 SELECT ... FROM fdt WHERE c1 &gt; 5
 771
 772 SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)
 773
 774 SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)
 775
 776 SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)
 777
 778 SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100
 779
 780 SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 &gt; fdt.c1)
 781 </programlisting>
 782     <literal>fdt</literal> is the table derived in the
 783     <literal>FROM</> clause. Rows that do not meet the search
 784     condition of the <literal>WHERE</> clause are eliminated from
 785     <literal>fdt</literal>. Notice the use of scalar subqueries as
 786     value expressions.  Just like any other query, the subqueries can
 787     employ complex table expressions.  Notice also how
 788     <literal>fdt</literal> is referenced in the subqueries.
 789     Qualifying <literal>c1</> as <literal>fdt.c1</> is only necessary
 790     if <literal>c1</> is also the name of a column in the derived
 791     input table of the subquery.  But qualifying the column name adds
 792     clarity even when it is not needed.  This example shows how the column
 793     naming scope of an outer query extends into its inner queries.
 794    </para>
 795   </sect2>
 796
 797
 798   <sect2 id="queries-group">
 799    <title>The <literal>GROUP BY</literal> and <literal>HAVING</literal> Clauses</title>
 800
 801    <indexterm zone="queries-group">
 802     <primary>GROUP BY</primary>
 803    </indexterm>
 804
 805    <indexterm zone="queries-group">
 806     <primary>grouping</primary>
 807    </indexterm>
 808
 809    <para>
 810     After passing the <literal>WHERE</> filter, the derived input
 811     table might be subject to grouping, using the <literal>GROUP BY</>
 812     clause, and elimination of group rows using the <literal>HAVING</>
 813     clause.
 814    </para>
 815
 816 <synopsis>
 817 SELECT <replaceable>select_list</replaceable>
 818     FROM ...
 819     <optional>WHERE ...</optional>
 820     GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
 821 </synopsis>
 822
 823    <para>
 824     The <xref linkend="sql-groupby" endterm="sql-groupby-title"> is
 825     used to group together those rows in a table that have the same
 826     values in all the columns listed. The order in which the columns
 827     are listed does not matter.  The effect is to combine each set
 828     of rows having common values into one group row that
 829     represents all rows in the group.  This is done to
 830     eliminate redundancy in the output and/or compute aggregates that
 831     apply to these groups.  For instance:
 832 <screen>
 833 <prompt>=&gt;</> <userinput>SELECT * FROM test1;</>
 834  x | y
 835 ---+---
 836  a | 3
 837  c | 2
 838  b | 5
 839  a | 1
 840 (4 rows)
 841
 842 <prompt>=&gt;</> <userinput>SELECT x FROM test1 GROUP BY x;</>
 843  x
 844 ---
 845  a
 846  b
 847  c
 848 (3 rows)
 849 </screen>
 850    </para>
 851
 852    <para>
 853     In the second query, we could not have written <literal>SELECT *
 854     FROM test1 GROUP BY x</literal>, because there is no single value
 855     for the column <literal>y</> that could be associated with each
 856     group.  The grouped-by columns can be referenced in the select list since
 857     they have a single value in each group.
 858    </para>
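   <para>
    As a rough illustration, attempting the ungrouped form is expected
    to be rejected with an error along these lines (the exact wording
    of the message may differ between versions):
<screen>
<prompt>=&gt;</> <userinput>SELECT * FROM test1 GROUP BY x;</>
ERROR:  column "test1.y" must appear in the GROUP BY clause or be used in an aggregate function
</screen>
   </para>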
 859
 860    <para>
 861     In general, if a table is grouped, columns that are not
 862     listed in <literal>GROUP BY</> cannot be referenced except in aggregate
 863     expressions.  An example with aggregate expressions is:
 864 <screen>
 865 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x;</>
 866  x | sum
 867 ---+-----
 868  a |   4
 869  b |   5
 870  c |   2
 871 (3 rows)
 872 </screen>
 873     Here <literal>sum</literal> is an aggregate function that
 874     computes a single value over the entire group.  More information
 875     about the available aggregate functions can be found in <xref
 876     linkend="functions-aggregate">.
 877    </para>
 878
 879    <tip>
 880     <para>
 881      Grouping without aggregate expressions effectively calculates the
 882      set of distinct values in a column.  This can also be achieved
 883      using the <literal>DISTINCT</> clause (see <xref
 884      linkend="queries-distinct">).
 885     </para>
 886    </tip>
 887
 888    <para>
 889     Here is another example:  it calculates the total sales for each
 890     product (rather than the total sales of all products):
 891 <programlisting>
 892 SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
 893     FROM products p LEFT JOIN sales s USING (product_id)
 894     GROUP BY product_id, p.name, p.price;
 895 </programlisting>
 896     In this example, the columns <literal>product_id</literal>,
 897     <literal>p.name</literal>, and <literal>p.price</literal> must be
 898     in the <literal>GROUP BY</> clause since they are referenced in
 899     the query select list.  (Depending on how the products
 900     table is set up, name and price might be fully dependent on the
 901     product ID, so the additional groupings could theoretically be
 902     unnecessary, though this is not implemented.)  The column
 903     <literal>s.units</> does not have to be in the <literal>GROUP
 904     BY</> list since it is only used in an aggregate expression
 905     (<literal>sum(...)</literal>), which represents the sales
 906     of a product.  For each product, the query returns a summary row about
 907     all sales of the product.
 908    </para>
 909
 910    <para>
 911     In strict SQL, <literal>GROUP BY</> can only group by columns of
 912     the source table but <productname>PostgreSQL</productname> extends
 913     this to also allow <literal>GROUP BY</> to group by columns in the
 914     select list.  Grouping by value expressions instead of simple
 915     column names is also allowed.
 916    </para>
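   <para>
    The following sketch illustrates both extensions using the
    <literal>test1</> table from above.  It assumes that
    <literal>x</> is a character-string column and <literal>y</> an
    integer column; the output column name <literal>parity</> is
    chosen purely for illustration:
<programlisting>
-- group by an output column name defined in the select list
SELECT y % 2 AS parity, count(*) FROM test1 GROUP BY parity;

-- group by a value expression rather than a plain column name
SELECT upper(x), sum(y) FROM test1 GROUP BY upper(x);
</programlisting>
    In the first query, <literal>parity</> would be interpreted as an
    input column instead if <literal>test1</> happened to have a column
    of that name.
   </para>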
 917
 918    <indexterm>
 919     <primary>HAVING</primary>
 920    </indexterm>
 921
 922    <para>
 923     If a table has been grouped using <literal>GROUP BY</literal>,
 924     but only certain groups are of interest, the
 925     <literal>HAVING</literal> clause can be used, much like a
 926     <literal>WHERE</> clause, to eliminate groups from the result.
 927     The syntax is:
 928 <synopsis>
 929 SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
 930 </synopsis>
 931     Expressions in the <literal>HAVING</> clause can refer both to
 932     grouped expressions and to ungrouped expressions (which necessarily
 933     involve an aggregate function).
 934    </para>
 935
 936    <para>
 937     Example:
 938 <screen>
 939 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) &gt; 3;</>
 940  x | sum
 941 ---+-----
 942  a |   4
 943  b |   5
 944 (2 rows)
 945
 946 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING x &lt; 'c';</>
 947  x | sum
 948 ---+-----
 949  a |   4
 950  b |   5
 951 (2 rows)
 952 </screen>
 953    </para>
 954
 955    <para>
 956     Again, a more realistic example:
 957 <programlisting>
 958 SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
 959     FROM products p LEFT JOIN sales s USING (product_id)
 960     WHERE s.date &gt; CURRENT_DATE - INTERVAL '4 weeks'
 961     GROUP BY product_id, p.name, p.price, p.cost
 962     HAVING sum(p.price * s.units) &gt; 5000;
 963 </programlisting>
 964     In the example above, the <literal>WHERE</> clause is selecting
 965     rows by a column that is not grouped (the expression is only true for
 966     sales during the last four weeks), while the <literal>HAVING</>
 967     clause restricts the output to groups with total gross sales over
 968     5000.  Note that the aggregate expressions do not necessarily need
 969     to be the same in all parts of the query.
 970    </para>
 971
 972    <para>
 973     If a query contains aggregate function calls, but no <literal>GROUP BY</>
 974     clause, grouping still occurs: the result is a single group row (or
 975     perhaps no rows at all, if the single row is then eliminated by
 976     <literal>HAVING</>).
 977     The same is true if it contains a <literal>HAVING</> clause, even
 978     without any aggregate function calls or <literal>GROUP BY</> clause.
 979    </para>
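   <para>
    A small sketch of this behavior, using the <literal>test1</> table
    shown earlier in this section:
<screen>
<prompt>=&gt;</> <userinput>SELECT count(*), sum(y) FROM test1;</>
 count | sum
-------+-----
     4 |  11
(1 row)

<prompt>=&gt;</> <userinput>SELECT sum(y) FROM test1 HAVING sum(y) &gt; 100;</>
 sum
-----
(0 rows)
</screen>
    The first query treats the whole table as one group; in the second,
    that single group row is removed by <literal>HAVING</>.
   </para>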
 980   </sect2>
 981
 982   <sect2 id="queries-window">
 983    <title>Window Function Processing</>
 984
 985    <indexterm zone="queries-window">
 986     <primary>window function</primary>
 987     <secondary>order of execution</>
 988    </indexterm>
 989
 990    <para>
 991     If the query contains any window functions (see
 992     <xref linkend="tutorial-window"> and
 993     <xref linkend="syntax-window-functions">), these functions are evaluated
 994     after any grouping, aggregation, and <literal>HAVING</> filtering is
 995     performed.  That is, if the query uses any aggregates, <literal>GROUP
 996     BY</>, or <literal>HAVING</>, then the rows seen by the window functions
 997     are the group rows instead of the original table rows from
 998     <literal>FROM</>/<literal>WHERE</>.
 999    </para>
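   <para>
    For instance, in the following sketch (which assumes the
    <literal>test1</> table from <xref linkend="queries-group">; the
    column labels are chosen for illustration), <literal>rank()</> is
    computed over the three group rows produced by <literal>GROUP BY</>,
    not over the four rows of the table:
<programlisting>
SELECT x, sum(y) AS total, rank() OVER (ORDER BY sum(y) DESC) AS rnk
    FROM test1
    GROUP BY x;
</programlisting>
   </para>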
1000
1001    <para>
1002     When multiple window functions are used, all the window functions having
1003     syntactically equivalent <literal>PARTITION BY</> and <literal>ORDER BY</>
1004     clauses in their window definitions are guaranteed to be evaluated in a
1005     single pass over the data. Therefore they will see the same sort ordering,
1006     even if the <literal>ORDER BY</> does not uniquely determine an ordering.
1007     However, no guarantees are made about the evaluation of functions having
1008     different <literal>PARTITION BY</> or <literal>ORDER BY</> specifications.
1009     (In such cases a sort step is typically required between the passes of
1010     window function evaluations, and the sort is not guaranteed to preserve
1011     ordering of rows that its <literal>ORDER BY</> sees as equivalent.)
1012    </para>
1013
1014    <para>
1015     Currently, window functions always require presorted data, and so the
1016     query output will be ordered according to one or another of the window
1017     functions' <literal>PARTITION BY</>/<literal>ORDER BY</> clauses.
1018     It is not recommended to rely on this, however.  Use an explicit
1019     top-level <literal>ORDER BY</> clause if you want to be sure the
1020     results are sorted in a particular way.
1021    </para>
1022   </sect2>
1023  </sect1>
1024
1025
1026  <sect1 id="queries-select-lists">
1027   <title>Select Lists</title>
1028
1029   <indexterm>
1030    <primary>SELECT</primary>
1031    <secondary>select list</secondary>
1032   </indexterm>
1033
1034   <para>
1035    As shown in the previous section,
1036    the table expression in the <command>SELECT</command> command
1037    constructs an intermediate virtual table by possibly combining
1038    tables, views, eliminating rows, grouping, etc.  This table is
1039    finally passed on to processing by the <firstterm>select list</firstterm>.  The select
1040    list determines which <emphasis>columns</emphasis> of the
1041    intermediate table are actually output.
1042   </para>
1043
1044   <sect2 id="queries-select-list-items">
1045    <title>Select-List Items</title>
1046
1047    <indexterm>
1048     <primary>*</primary>
1049    </indexterm>
1050
1051    <para>
1052     The simplest kind of select list is <literal>*</literal> which
1053     emits all columns that the table expression produces.  Otherwise,
1054     a select list is a comma-separated list of value expressions (as
1055     defined in <xref linkend="sql-expressions">).  For instance, it
1056     could be a list of column names:
1057 <programlisting>
1058 SELECT a, b, c FROM ...
1059 </programlisting>
1060      The column names <literal>a</>, <literal>b</>, and <literal>c</>
1061      are either the actual names of the columns of tables referenced
1062      in the <literal>FROM</> clause, or the aliases given to them as
1063      explained in <xref linkend="queries-table-aliases">.  The name
1064      space available in the select list is the same as in the
1065      <literal>WHERE</> clause, unless grouping is used, in which case
1066      it is the same as in the <literal>HAVING</> clause.
1067    </para>
1068
1069    <para>
1070     If more than one table has a column of the same name, the table
1071     name must also be given, as in:
1072 <programlisting>
1073 SELECT tbl1.a, tbl2.a, tbl1.b FROM ...
1074 </programlisting>
1075     When working with multiple tables, it can also be useful to ask for
1076     all the columns of a particular table:
1077 <programlisting>
1078 SELECT tbl1.*, tbl2.a FROM ...
1079 </programlisting>
1080     (See also <xref linkend="queries-where">.)
1081    </para>
1082
1083    <para>
1084     If an arbitrary value expression is used in the select list, it
1085     conceptually adds a new virtual column to the returned table.  The
1086     value expression is evaluated once for each result row, with
1087     the row's values substituted for any column references.  But the
1088     expressions in the select list do not have to reference any
1089     columns in the table expression of the <literal>FROM</> clause;
1090     they can be constant arithmetic expressions, for instance.
1091    </para>
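   <para>
    As a small sketch, assuming the <literal>test1</> table from
    <xref linkend="queries-group"> (the column labels are illustrative
    only), the following select list mixes a plain column, an
    expression over a column, and a constant expression:
<programlisting>
SELECT x, y * 2 AS doubled, 3 * 4 AS twelve FROM test1;
</programlisting>
    The result has one row per row of <literal>test1</>; the last
    column holds the same value in every row because its expression
    references no columns.
   </para>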
1092   </sect2>
1093
1094   <sect2 id="queries-column-labels">
1095    <title>Column Labels</title>
1096
1097    <indexterm zone="queries-column-labels">
1098     <primary>alias</primary>
1099     <secondary>in the select list</secondary>
1100    </indexterm>
1101
1102    <para>
1103     The entries in the select list can be assigned names for subsequent
1104     processing, such as for use in an <literal>ORDER BY</> clause
1105     or for display by the client application.  For example:
1106 <programlisting>
1107 SELECT a AS value, b + c AS sum FROM ...
1108 </programlisting>
1109    </para>
1110
1111    <para>
1112     If no output column name is specified using <literal>AS</>,
1113     the system assigns a default column name.  For simple column references,
1114     this is the name of the referenced column.  For function
1115     calls, this is the name of the function.  For complex expressions,
1116     the system will generate a generic name.
1117    </para>
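   <para>
    To illustrate these defaults, consider this sketch, again assuming
    the <literal>test1</> table from <xref linkend="queries-group">
    with a character-string column <literal>x</>:
<programlisting>
SELECT x, upper(x), y + 1 FROM test1;
</programlisting>
    The first output column is named <literal>x</>, the second
    <literal>upper</>, and the third receives a generic name, which
    <productname>PostgreSQL</productname> displays as
    <literal>?column?</>.
   </para>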
1118
1119    <para>
1120     The <literal>AS</> keyword is optional, but only if the new column
1121     name does not match any
1122     <productname>PostgreSQL</productname> keyword (see <xref
1123     linkend="sql-keywords-appendix">).  To avoid an accidental match to
1124     a keyword, you can double-quote the column name.  For example,
1125     <literal>VALUE</> is a keyword, so this does not work:
1126 <programlisting>
1127 SELECT a value, b + c AS sum FROM ...
1128 </programlisting>
1129     but this does:
1130 <programlisting>
1131 SELECT a "value", b + c AS sum FROM ...
1132 </programlisting>
1133     For protection against possible
1134     future keyword additions, it is recommended that you always either
1135     write <literal>AS</literal> or double-quote the output column name.
1136    </para>
1137
1138    <note>
1139     <para>
1140      The naming of output columns here is different from that done in
1141      the <literal>FROM</> clause (see <xref
1142      linkend="queries-table-aliases">).  It is possible
1143      to rename the same column twice, but the name assigned in
1144      the select list is the one that will be passed on.
1145     </para>
1146    </note>
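
   <para>
    A brief sketch of such double renaming, assuming the two-column
    <literal>test1</> table from <xref linkend="queries-group"> (the
    names <literal>t</>, <literal>k</>, <literal>v</>, and
    <literal>total</> are illustrative only):
<programlisting>
SELECT t.v AS total FROM test1 AS t (k, v);
</programlisting>
    Here the <literal>FROM</> clause renames the table's second column
    to <literal>v</>, and the select list renames it again, so the
    result has a single column named <literal>total</>.
   </para>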
1147   </sect2>
1148
1149   <sect2 id="queries-distinct">
1150    <title><literal>DISTINCT</literal></title>
1151
1152    <indexterm zone="queries-distinct">
1153     <primary>DISTINCT</primary>
1154    </indexterm>
1155
1156    <indexterm zone="queries-distinct">
1157     <primary>duplicates</primary>
1158    </indexterm>
1159
1160    <para>
1161     After the select list has been processed, the result table can
1162     optionally be subject to the elimination of duplicate rows.  The
1163     <literal>DISTINCT</literal> key word is written directly after
1164     <literal>SELECT</literal> to specify this:
1165 <synopsis>
1166 SELECT DISTINCT <replaceable>select_list</replaceable> ...
1167 </synopsis>
1168     (Instead of <literal>DISTINCT</> the key word <literal>ALL</literal>
1169     can be used to specify the default behavior of retaining all rows.)
1170    </para>
1171
1172    <para>
1173     <indexterm><primary>null value</><secondary sortas="DISTINCT">in
1174     DISTINCT</></indexterm>
1175     Two rows are considered distinct if they differ in at
1176     least one column value.  Null values are considered equal in this
1177     comparison.
1178    </para>
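   <para>
    For example, the following sketch builds a small inline table
    (the names <literal>t</> and <literal>v</> are illustrative only):
<programlisting>
SELECT DISTINCT v FROM (VALUES (1), (NULL), (NULL), (2)) AS t (v);
</programlisting>
    It returns three rows, namely 1, 2, and a single null, because the
    two null entries are treated as duplicates of each other.
   </para>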
1179
1180    <para>
1181     Alternatively, an arbitrary expression can determine what rows are
1182     to be considered distinct:
1183 <synopsis>
1184 SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
1185 </synopsis>
1186     Here <replaceable>expression</replaceable> is an arbitrary value
1187     expression that is evaluated for all rows.  A set of rows for
1188     which all the expressions are equal are considered duplicates, and
1189     only the first row of the set is kept in the output.  Note that
1190     the <quote>first row</quote> of a set is unpredictable unless the
1191     query is sorted on enough columns to guarantee a unique ordering
1192     of the rows arriving at the <literal>DISTINCT</> filter.
1193     (<literal>DISTINCT ON</> processing occurs after <literal>ORDER
1194     BY</> sorting.)
1195    </para>
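   <para>
    As a sketch using the <literal>test1</> table from
    <xref linkend="queries-group">, the following query keeps, for each
    value of <literal>x</>, the row with the largest <literal>y</>:
<screen>
<prompt>=&gt;</> <userinput>SELECT DISTINCT ON (x) x, y FROM test1 ORDER BY x, y DESC;</>
 x | y
---+---
 a | 3
 b | 5
 c | 2
(3 rows)
</screen>
    The <literal>ORDER BY</> clause here both groups equal values of
    <literal>x</> together and, through <literal>y DESC</>, decides
    which row of each set counts as the first.
   </para>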
1196
1197    <para>
1198     The <literal>DISTINCT ON</> clause is not part of the SQL standard
1199     and is sometimes considered bad style because of the potentially
1200     indeterminate nature of its results.  With judicious use of
1201     <literal>GROUP BY</> and subqueries in <literal>FROM</>, this
1202     construct can be avoided, but it is often the most convenient
1203     alternative.
1204    </para>
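   <para>
    As a sketch of such a rewrite, assuming the <literal>test1</> table
    from <xref linkend="queries-group">, the query
    <literal>SELECT DISTINCT ON (x) x, y FROM test1 ORDER BY x, y DESC</>
    could be expressed in either of these ways (the aliases
    <literal>t</>, <literal>m</>, and <literal>max_y</> are
    illustrative only):
<programlisting>
-- sufficient when only the grouping column and the extreme value are needed
SELECT x, max(y) AS y FROM test1 GROUP BY x ORDER BY x;

-- a subquery in FROM allows other columns of the chosen rows to be returned
SELECT t.*
    FROM test1 AS t
    JOIN (SELECT x, max(y) AS max_y FROM test1 GROUP BY x) AS m
      ON t.x = m.x AND t.y = m.max_y
    ORDER BY t.x;
</programlisting>
    Unlike <literal>DISTINCT ON</>, the second form returns every row
    that ties for the largest <literal>y</> within a group.
   </para>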
1205   </sect2>
1206  </sect1>
1207
1208
1209  <sect1 id="queries-union">
1210   <title>Combining Queries</title>
1211
1212   <indexterm zone="queries-union">
1213    <primary>UNION</primary>
1214   </indexterm>
1215   <indexterm zone="queries-union">
1216    <primary>INTERSECT</primary>
1217   </indexterm>
1218   <indexterm zone="queries-union">
1219    <primary>EXCEPT</primary>
1220   </indexterm>
1221   <indexterm zone="queries-union">
1222    <primary>set union</primary>
1223   </indexterm>
1224   <indexterm zone="queries-union">
1225    <primary>set intersection</primary>
1226   </indexterm>
1227   <indexterm zone="queries-union">
1228    <primary>set difference</primary>
1229   </indexterm>
1230   <indexterm zone="queries-union">
1231    <primary>set operation</primary>
1232   </indexterm>
1233
1234   <para>
1235    The results of two queries can be combined using the set operations
1236    union, intersection, and difference.  The syntax is
1237 <synopsis>
1238 <replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
1239 <replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
1240 <replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
1241 </synopsis>
1242    <replaceable>query1</replaceable> and
1243    <replaceable>query2</replaceable> are queries that can use any of
1244    the features discussed up to this point.  Set operations can also
1245    be nested and chained, for example
1246 <synopsis>
1247 <replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
1248 </synopsis>
1249    which is executed as:
1250 <synopsis>
1251 (<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
1252 </synopsis>
1253   </para>
1254
1255   <para>
1256    <literal>UNION</> effectively appends the result of
1257    <replaceable>query2</replaceable> to the result of
1258    <replaceable>query1</replaceable> (although there is no guarantee
1259    that this is the order in which the rows are actually returned).
1260    Furthermore, it eliminates duplicate rows from its result, in the same
1261    way as <literal>DISTINCT</>, unless <literal>UNION ALL</> is used.
1262   </para>
1263
1264   <para>
1265    <literal>INTERSECT</> returns all rows that are both in the result
1266    of <replaceable>query1</replaceable> and in the result of
1267    <replaceable>query2</replaceable>.  Duplicate rows are eliminated
1268    unless <literal>INTERSECT ALL</> is used.
1269   </para>
1270
1271   <para>
1272    <literal>EXCEPT</> returns all rows that are in the result of
1273    <replaceable>query1</replaceable> but not in the result of
1274    <replaceable>query2</replaceable>.  (This is sometimes called the
1275    <firstterm>difference</> between two queries.)  Again, duplicates
1276    are eliminated unless <literal>EXCEPT ALL</> is used.
1277   </para>
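  <para>
   The effect of <literal>ALL</> can be sketched with the
   <literal>test1</> table from <xref linkend="queries-group">, whose
   <literal>x</> column holds the values a, c, b, a.  The second query
   in each pair below yields the values a and b; the rows of each
   result can come back in any order:
<programlisting>
SELECT x FROM test1 UNION         SELECT x FROM test1 WHERE y &gt; 2;  -- a, b, c
SELECT x FROM test1 UNION ALL     SELECT x FROM test1 WHERE y &gt; 2;  -- a, c, b, a, a, b

SELECT x FROM test1 INTERSECT     SELECT x FROM test1 WHERE y &gt; 2;  -- a, b
SELECT x FROM test1 INTERSECT ALL SELECT x FROM test1 WHERE y &gt; 2;  -- a, b

SELECT x FROM test1 EXCEPT        SELECT x FROM test1 WHERE y &gt; 2;  -- c
SELECT x FROM test1 EXCEPT ALL    SELECT x FROM test1 WHERE y &gt; 2;  -- a, c
</programlisting>
   With <literal>EXCEPT ALL</>, one of the two a rows survives
   because only one a row is subtracted.
  </para>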
1278
1279   <para>
1280    In order to calculate the union, intersection, or difference of two
1281    queries, the two queries must be <quote>union compatible</quote>,
1282    which means that they return the same number of columns and
1283    the corresponding columns have compatible data types, as
1284    described in <xref linkend="typeconv-union-case">.
1285   </para>
1286  </sect1>
1287
1288
1289  <sect1 id="queries-order">
1290   <title>Sorting Rows</title>
1291
1292   <indexterm zone="queries-order">
1293    <primary>sorting</primary>
1294   </indexterm>
1295
1296   <indexterm zone="queries-order">
1297    <primary>ORDER BY</primary>
1298   </indexterm>
1299
1300   <para>
1301    After a query has produced an output table (after the select list
1302    has been processed) it can optionally be sorted.  If sorting is not
1303    chosen, the rows will be returned in an unspecified order.  The actual
1304    order in that case will depend on the scan and join plan types and
1305    the order on disk, but it must not be relied on.  A particular
1306    output ordering can only be guaranteed if the sort step is explicitly
1307    chosen.
1308   </para>
1309
1310   <para>
1311    The <literal>ORDER BY</> clause specifies the sort order:
1312 <synopsis>
1313 SELECT <replaceable>select_list</replaceable>
1314     FROM <replaceable>table_expression</replaceable>
1315     ORDER BY <replaceable>sort_expression1</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional>
1316              <optional>, <replaceable>sort_expression2</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional> ...</optional>
1317 </synopsis>
1318    The sort expression(s) can be any expression that would be valid in the
1319    query's select list.  An example is:
1320 <programlisting>
1321 SELECT a, b FROM table1 ORDER BY a + b, c;
1322 </programlisting>
1323    When more than one expression is specified,
1324    the later values are used to sort rows that are equal according to the
1325    earlier values.  Each expression can be followed by an optional
1326    <literal>ASC</> or <literal>DESC</> keyword to set the sort direction to
1327    ascending or descending.  <literal>ASC</> order is the default.
1328    Ascending order puts smaller values first, where
1329    <quote>smaller</quote> is defined in terms of the
1330    <literal>&lt;</literal> operator.  Similarly, descending order is
1331    determined with the <literal>&gt;</literal> operator.
1332     <footnote>
1333      <para>
1334       Actually, <productname>PostgreSQL</> uses the <firstterm>default B-tree
1335       operator class</> for the expression's data type to determine the sort
1336       ordering for <literal>ASC</> and <literal>DESC</>.  Conventionally,
1337       data types will be set up so that the <literal>&lt;</literal> and
1338       <literal>&gt;</literal> operators correspond to this sort ordering,
1339       but a user-defined data type's designer could choose to do something
1340       different.
1341      </para>
1342     </footnote>
1343   </para>
1344
1345   <para>
1346    The <literal>NULLS FIRST</> and <literal>NULLS LAST</> options can be
1347    used to determine whether nulls appear before or after non-null values
1348    in the sort ordering.  By default, null values sort as if larger than any
1349    non-null value; that is, <literal>NULLS FIRST</> is the default for
1350    <literal>DESC</> order, and <literal>NULLS LAST</> otherwise.
1351   </para>
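  <para>
   The defaults and an explicit override can be sketched with a small
   inline table (the names <literal>t</> and <literal>v</> are
   illustrative only):
<programlisting>
SELECT v FROM (VALUES (2), (NULL), (1)) AS t (v) ORDER BY v;                  -- 1, 2, null
SELECT v FROM (VALUES (2), (NULL), (1)) AS t (v) ORDER BY v DESC;             -- null, 2, 1
SELECT v FROM (VALUES (2), (NULL), (1)) AS t (v) ORDER BY v DESC NULLS LAST;  -- 2, 1, null
</programlisting>
  </para>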
1352
1353   <para>
1354    Note that the ordering options are considered independently for each
1355    sort column.  For example, <literal>ORDER BY x, y DESC</> means
1356    <literal>ORDER BY x ASC, y DESC</>, which is not the same as
1357    <literal>ORDER BY x DESC, y DESC</>.
1358   </para>
1359
1360   <para>
1361    A <replaceable>sort_expression</> can also be the column label or number
1362    of an output column, as in:
1363 <programlisting>
1364 SELECT a + b AS sum, c FROM table1 ORDER BY sum;
1365 SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
1366 </programlisting>
1367    both of which sort by the first output column.  Note that an output
1368    column name has to stand alone, that is, it cannot be used in an expression
1369    &mdash; for example, this is <emphasis>not</> correct:
1370 <programlisting>
1371 SELECT a + b AS sum, c FROM table1 ORDER BY sum + c;          -- wrong
1372 </programlisting>
1373    This restriction is made to reduce ambiguity.  There is still
1374    ambiguity if an <literal>ORDER BY</> item is a simple name that
1375    could match either an output column name or a column from the table
1376    expression.  The output column is used in such cases.  This would
1377    only cause confusion if you use <literal>AS</> to rename an output
1378    column to match some other table column's name.
1379   </para>
1380
1381   <para>
1382    <literal>ORDER BY</> can be applied to the result of a
1383    <literal>UNION</>, <literal>INTERSECT</>, or <literal>EXCEPT</>
1384    combination, but in this case it is only permitted to sort by
1385    output column names or numbers, not by expressions.
1386   </para>
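  <para>
   A sketch of this restriction, assuming the <literal>test1</> table
   from <xref linkend="queries-group">:
<programlisting>
-- permitted: sort the combined result by output column number and name
SELECT x, y FROM test1 WHERE y &gt; 2
UNION
SELECT x, y FROM test1 WHERE x = 'a'
ORDER BY 2, x;

-- not permitted: sort the combined result by an expression
SELECT x, y FROM test1 WHERE y &gt; 2
UNION
SELECT x, y FROM test1 WHERE x = 'a'
ORDER BY y + 1;          -- wrong
</programlisting>
  </para>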
1387  </sect1>
1388
1389
1390  <sect1 id="queries-limit">
1391   <title><literal>LIMIT</literal> and <literal>OFFSET</literal></title>
1392
1393   <indexterm zone="queries-limit">
1394    <primary>LIMIT</primary>
1395   </indexterm>
1396
1397   <indexterm zone="queries-limit">
1398    <primary>OFFSET</primary>
1399   </indexterm>
1400
1401   <para>
1402    <literal>LIMIT</> and <literal>OFFSET</> allow you to retrieve just
1403    a portion of the rows that are generated by the rest of the query:
1404 <synopsis>
1405 SELECT <replaceable>select_list</replaceable>
1406     FROM <replaceable>table_expression</replaceable>
1407     <optional> ORDER BY ... </optional>
1408     <optional> LIMIT { <replaceable>number</replaceable> | ALL } </optional> <optional> OFFSET <replaceable>number</replaceable> </optional>
1409 </synopsis>
1410   </para>
1411
1412   <para>
1413    If a limit count is given, no more than that many rows will be
1414    returned (but possibly fewer, if the query itself yields fewer rows).
1415    <literal>LIMIT ALL</> is the same as omitting the <literal>LIMIT</>
1416    clause.
1417   </para>
1418
1419   <para>
1420    <literal>OFFSET</> says to skip that many rows before beginning to
1421    return rows.  <literal>OFFSET 0</> is the same as omitting the
1422    <literal>OFFSET</> clause, and <literal>LIMIT NULL</> is the same
1423    as omitting the <literal>LIMIT</> clause.  If both <literal>OFFSET</>
1424    and <literal>LIMIT</> appear, then <literal>OFFSET</> rows are
1425    skipped before starting to count the <literal>LIMIT</> rows that
1426    are returned.
1427   </para>
1428
1429   <para>
1430    When using <literal>LIMIT</>, it is important to use an
1431    <literal>ORDER BY</> clause that constrains the result rows into a
1432    unique order.  Otherwise you will get an unpredictable subset of
1433    the query's rows. You might be asking for the tenth through
1434    twentieth rows, but tenth through twentieth in what ordering? The
1435    ordering is unknown, unless you specified <literal>ORDER BY</>.
1436   </para>
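  <para>
   As a sketch, assuming the <literal>test1</> table from
   <xref linkend="queries-group">, the following query sorts on both
   columns so that the ordering is unique, skips the first two rows,
   and returns the next two:
<screen>
<prompt>=&gt;</> <userinput>SELECT x, y FROM test1 ORDER BY y, x LIMIT 2 OFFSET 2;</>
 x | y
---+---
 a | 3
 b | 5
(2 rows)
</screen>
  </para>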
1437
1438   <para>
1439    The query optimizer takes <literal>LIMIT</> into account when
1440    generating query plans, so you are very likely to get different
1441    plans (yielding different row orders) depending on what you give
1442    for <literal>LIMIT</> and <literal>OFFSET</>.  Thus, using
1443    different <literal>LIMIT</>/<literal>OFFSET</> values to select
1444    different subsets of a query result <emphasis>will give
1445    inconsistent results</emphasis> unless you enforce a predictable
1446    result ordering with <literal>ORDER BY</>.  This is not a bug; it
1447    is an inherent consequence of the fact that SQL does not promise to
1448    deliver the results of a query in any particular order unless
1449    <literal>ORDER BY</> is used to constrain the order.
1450   </para>
1451
1452   <para>
1453    The rows skipped by an <literal>OFFSET</> clause still have to be
1454    computed inside the server; therefore a large <literal>OFFSET</>
1455    might be inefficient.
1456   </para>
1457  </sect1>
1458
1459
1460  <sect1 id="queries-values">
1461   <title><literal>VALUES</literal> Lists</title>
1462
1463   <indexterm zone="queries-values">
1464    <primary>VALUES</primary>
1465   </indexterm>
1466
1467   <para>
1468    <literal>VALUES</> provides a way to generate a <quote>constant table</>
1469    that can be used in a query without having to actually create and populate
1470    a table on-disk.  The syntax is
1471 <synopsis>
1472 VALUES ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) [, ...]
1473 </synopsis>
1474    Each parenthesized list of expressions generates a row in the table.
1475    The lists must all have the same number of elements (i.e., the number
1476    of columns in the table), and corresponding entries in each list must
1477    have compatible data types.  The actual data type assigned to each column
1478    of the result is determined using the same rules as for <literal>UNION</>
1479    (see <xref linkend="typeconv-union-case">).
1480   </para>
1481
1482   <para>
1483    As an example:
1484 <programlisting>
1485 VALUES (1, 'one'), (2, 'two'), (3, 'three');
1486 </programlisting>
1487
1488    will return a table of two columns and three rows.  It's effectively
1489    equivalent to:
1490 <programlisting>
1491 SELECT 1 AS column1, 'one' AS column2
1492 UNION ALL
1493 SELECT 2, 'two'
1494 UNION ALL
1495 SELECT 3, 'three';
1496 </programlisting>
1497
1498    By default, <productname>PostgreSQL</productname> assigns the names
1499    <literal>column1</>, <literal>column2</>, etc. to the columns of a
1500    <literal>VALUES</> table.  The column names are not specified by the
1501    SQL standard and different database systems do it differently, so
1502    it's usually better to override the default names with a table alias
1503    list.
1504   </para>
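  <para>
   A sketch of such an override, in which the alias <literal>t</> and
   the column names <literal>num</> and <literal>letter</> are chosen
   for illustration:
<programlisting>
SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num, letter);
</programlisting>
   The result columns are then named <literal>num</> and
   <literal>letter</> rather than <literal>column1</> and
   <literal>column2</>.
  </para>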
1505
1506   <para>
1507    Syntactically, <literal>VALUES</> followed by expression lists is
1508    treated as equivalent to:
1509 <synopsis>
1510 SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable>
1511 </synopsis>
1512    and can appear anywhere a <literal>SELECT</> can.  For example, you can
1513    use it as part of a <literal>UNION</>, or attach a
1514    <replaceable>sort_specification</replaceable> (<literal>ORDER BY</>,
1515    <literal>LIMIT</>, and/or <literal>OFFSET</>) to it.  <literal>VALUES</>
1516    is most commonly used as the data source in an <command>INSERT</> command,
1517    and next most commonly as a subquery.
1518   </para>
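  <para>
   The following sketches show these uses.  The table
   <literal>numbers</> and its columns are hypothetical, and
   <literal>test1</> is the table from
   <xref linkend="queries-group">, assumed to have a character-string
   column <literal>x</>:
<programlisting>
-- with a sort specification attached
VALUES (3, 'three'), (1, 'one'), (2, 'two') ORDER BY 1 LIMIT 2;

-- as the data source of an INSERT
INSERT INTO numbers (num, name) VALUES (1, 'one'), (2, 'two');

-- as a subquery
SELECT x, y FROM test1 WHERE x IN (VALUES ('a'), ('b'));
</programlisting>
  </para>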
1519
1520   <para>
1521    For more information see <xref linkend="sql-values"
1522    endterm="sql-values-title">.
1523   </para>
1524
1525  </sect1>
1526
1527
1528  <sect1 id="queries-with">
1529   <title><literal>WITH</literal> Queries</title>
1530
1531   <indexterm zone="queries-with">
1532    <primary>WITH</primary>
1533    <secondary>in SELECT</secondary>
1534   </indexterm>
1535
1536   <indexterm>
1537    <primary>common table expression</primary>
1538    <see>WITH</see>
1539   </indexterm>
1540
1541   <para>
1542    <literal>WITH</> provides a way to write subqueries for use in a larger
1543    <literal>SELECT</> query.  The subqueries can be thought of as defining
1544    temporary tables that exist just for this query.  One use of this feature
1545    is to break down complicated queries into simpler parts.  An example is:
1546
1547 <programlisting>
1548 WITH regional_sales AS (
1549         SELECT region, SUM(amount) AS total_sales
1550         FROM orders
1551         GROUP BY region
1552      ), top_regions AS (
1553         SELECT region
1554         FROM regional_sales
1555         WHERE total_sales &gt; (SELECT SUM(total_sales)/10 FROM regional_sales)
1556      )
1557 SELECT region,
1558        product,
1559        SUM(quantity) AS product_units,
1560        SUM(amount) AS product_sales
1561 FROM orders
1562 WHERE region IN (SELECT region FROM top_regions)
1563 GROUP BY region, product;
1564 </programlisting>
1565
1566    which displays per-product sales totals in only the top sales regions.
1567    This example could have been written without <literal>WITH</>,
1568    but it would have required two levels of nested
1569    sub-<command>SELECT</>s.  It's a bit easier to follow this way.
1570   </para>
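  <para>
   For comparison, the following is a sketch of how the same result
   might be obtained without <literal>WITH</>.  The aliases
   <literal>rs1</> and <literal>rs2</> are introduced here only for
   illustration.  Note that the aggregation over
   <literal>orders</> has to be written out twice:
<programlisting>
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (
        -- corresponds to top_regions
        SELECT region
        FROM (SELECT region, SUM(amount) AS total_sales
              FROM orders
              GROUP BY region) AS rs1      -- corresponds to regional_sales
        WHERE total_sales &gt; (
                SELECT SUM(total_sales)/10
                FROM (SELECT region, SUM(amount) AS total_sales
                      FROM orders
                      GROUP BY region) AS rs2   -- regional_sales again
              )
      )
GROUP BY region, product;
</programlisting>
  </para>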
1571
1572   <para>
1573    The optional <literal>RECURSIVE</> modifier changes <literal>WITH</>
1574    from a mere syntactic convenience into a feature that accomplishes
1575    things not otherwise possible in standard SQL.  Using
1576    <literal>RECURSIVE</>, a <literal>WITH</> query can refer to its own
1577    output.  A very simple example is this query to sum the integers from 1
1578    through 100:
1579
1580 <programlisting>
1581 WITH RECURSIVE t(n) AS (
1582     VALUES (1)
1583   UNION ALL
1584     SELECT n+1 FROM t WHERE n &lt; 100
1585 )
1586 SELECT sum(n) FROM t;
1587 </programlisting>
1588
1589    The general form of a recursive <literal>WITH</> query is always a
1590    <firstterm>non-recursive term</>, then <literal>UNION</> (or
1591    <literal>UNION ALL</>), then a
1592    <firstterm>recursive term</>, where only the recursive term can contain
1593    a reference to the query's own output.  Such a query is executed as
1594    follows:
1595   </para>
1596
1597   <procedure>
1598    <title>Recursive Query Evaluation</title>
1599
1600    <step performance="required">
1601     <para>
1602      Evaluate the non-recursive term.  For <literal>UNION</> (but not
1603      <literal>UNION ALL</>), discard duplicate rows.  Include all remaining
1604      rows in the result of the recursive query, and also place them in a
1605      temporary <firstterm>working table</>.
1606     </para>
1607    </step>
1608
1609    <step performance="required">
1610     <para>
1611      So long as the working table is not empty, repeat these steps:
1612     </para>
1613     <substeps>
1614      <step performance="required">
1615       <para>
1616        Evaluate the recursive term, substituting the current contents of
1617        the working table for the recursive self-reference.
1618        For <literal>UNION</> (but not <literal>UNION ALL</>), discard
1619        duplicate rows and rows that duplicate any previous result row.
1620        Include all remaining rows in the result of the recursive query, and
1621        also place them in a temporary <firstterm>intermediate table</>.
1622       </para>
1623      </step>
1624
1625      <step performance="required">
1626       <para>
1627        Replace the contents of the working table with the contents of the
1628        intermediate table, then empty the intermediate table.
1629       </para>
1630      </step>
1631     </substeps>
1632    </step>
1633   </procedure>
1634
1635   <note>
1636    <para>
1637     Strictly speaking, this process is iteration, not recursion, but
1638     <literal>RECURSIVE</> is the terminology chosen by the SQL standards
1639     committee.
1640    </para>
1641   </note>
1642
1643   <para>
1644    In the example above, the working table has just a single row in each step,
1645    and it takes on the values from 1 through 100 in successive steps.  In
1646    the 100th step, there is no output because of the <literal>WHERE</>
1647    clause, and so the query terminates.
1648   </para>
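  <para>
   To make the evaluation steps easier to follow, here is a shortened
   variant of the same query that stops at 3 instead of 100.  The
   comments trace the contents of the working table as described in the
   procedure above:
<programlisting>
WITH RECURSIVE t(n) AS (
    VALUES (1)                         -- non-recursive term: working table is {1}
  UNION ALL
    SELECT n+1 FROM t WHERE n &lt; 3   -- iteration 1: {1} produces {2}
                                       -- iteration 2: {2} produces {3}
                                       -- iteration 3: {3} produces nothing, so stop
)
SELECT n FROM t;
</programlisting>
   The result consists of the three rows 1, 2, and 3.
  </para>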
1649
1650   <para>
1651    Recursive queries are typically used to deal with hierarchical or
1652    tree-structured data.  A useful example is this query to find all the
1653    direct and indirect sub-parts of a product, given only a table that
1654    shows immediate inclusions:
1655
1656 <programlisting>
1657 WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
1658     SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
1659   UNION ALL
1660     SELECT p.sub_part, p.part, p.quantity * pr.quantity
1661     FROM included_parts pr, parts p
1662     WHERE p.part = pr.sub_part
1663   )
1664 SELECT sub_part, SUM(quantity) AS total_quantity
1665 FROM included_parts
1666 GROUP BY sub_part;
1667 </programlisting>
1668   </para>
1669
1670   <para>
1671    When working with recursive queries it is important to be sure that
1672    the recursive part of the query will eventually return no tuples,
1673    or else the query will loop indefinitely.  Sometimes, using
1674    <literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
1675    by discarding rows that duplicate previous output rows.  However, often a
1676    cycle does not involve output rows that are completely duplicate: it may be
1677    necessary to check just one or a few fields to see if the same point has
1678    been reached before.  The standard method for handling such situations is
1679    to compute an array of the already-visited values.  For example, consider
1680    the following query that searches a table <structname>graph</> using a
1681    <structfield>link</> field:
1682
1683 <programlisting>
1684 WITH RECURSIVE search_graph(id, link, data, depth) AS (
1685         SELECT g.id, g.link, g.data, 1
1686         FROM graph g
1687       UNION ALL
1688         SELECT g.id, g.link, g.data, sg.depth + 1
1689         FROM graph g, search_graph sg
1690         WHERE g.id = sg.link
1691 )
1692 SELECT * FROM search_graph;
1693 </programlisting>
1694
1695    This query will loop if the <structfield>link</> relationships contain
1696    cycles.  Because we require a <quote>depth</> output, just changing
1697    <literal>UNION ALL</> to <literal>UNION</> would not eliminate the looping.
1698    Instead we need to recognize whether we have reached the same row again
1699    while following a particular path of links.  We add two columns,
1700    <structfield>path</> and <structfield>cycle</>, to the loop-prone query:
1701
1702 <programlisting>
1703 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1704         SELECT g.id, g.link, g.data, 1,
1705           ARRAY[g.id],
1706           false
1707         FROM graph g
1708       UNION ALL
1709         SELECT g.id, g.link, g.data, sg.depth + 1,
1710           path || g.id,
1711           g.id = ANY(path)
1712         FROM graph g, search_graph sg
1713         WHERE g.id = sg.link AND NOT cycle
1714 )
1715 SELECT * FROM search_graph;
1716 </programlisting>
1717
1718    Aside from preventing cycles, the array value is often useful in its own
1719    right as representing the <quote>path</> taken to reach any particular row.
1720   </para>
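  <para>
   By contrast, when a cycle does reproduce entire output rows, the
   simpler remedy mentioned earlier is enough.  The following
   self-contained sketch (an artificial example that is not tied to any
   table) steps through the integers modulo 5:
<programlisting>
WITH RECURSIVE t(n) AS (
    VALUES (0)
  UNION                        -- with UNION ALL this query would never terminate
    SELECT (n + 1) % 5 FROM t
)
SELECT n FROM t;
</programlisting>
   After the rows 0 through 4 have been produced, the recursive term
   yields 0 again.  Because that row duplicates a previous result row,
   <literal>UNION</> discards it, the working table becomes empty, and
   the query ends with five rows.
  </para>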
1721
1722   <para>
1723    In the general case where more than one field needs to be checked to
1724    recognize a cycle, use an array of rows.  For example, if we needed to
1725    compare fields <structfield>f1</> and <structfield>f2</>:
1726
1727 <programlisting>
1728 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1729         SELECT g.id, g.link, g.data, 1,
1730           ARRAY[ROW(g.f1, g.f2)],
1731           false
1732         FROM graph g
1733       UNION ALL
1734         SELECT g.id, g.link, g.data, sg.depth + 1,
1735           path || ROW(g.f1, g.f2),
1736           ROW(g.f1, g.f2) = ANY(path)
1737         FROM graph g, search_graph sg
1738         WHERE g.id = sg.link AND NOT cycle
1739 )
1740 SELECT * FROM search_graph;
1741 </programlisting>
1742   </para>
1743
1744   <tip>
1745    <para>
1746     Omit the <literal>ROW()</> syntax in the common case where only one field
1747     needs to be checked to recognize a cycle.  This allows a simple array
1748     rather than a composite-type array to be used, gaining efficiency.
1749    </para>
1750   </tip>
1751
1752   <tip>
1753    <para>
1754     The recursive query evaluation algorithm produces its output in
1755     breadth-first search order.  You can display the results in depth-first
1756     search order by making the outer query <literal>ORDER BY</> a
1757     <quote>path</> column constructed in this way.
1758    </para>
1759   </tip>
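  <para>
   As a sketch of the depth-first ordering just described, the
   single-field cycle-detection query shown earlier can be given an
   <literal>ORDER BY</> in its outer query.  This relies on arrays being
   compared element by element, so that a path sorts ahead of its own
   extensions:
<programlisting>
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
        SELECT g.id, g.link, g.data, 1,
          ARRAY[g.id],
          false
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1,
          path || g.id,
          g.id = ANY(path)
        FROM graph g, search_graph sg
        WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph ORDER BY path;   -- depth-first display order
</programlisting>
  </para>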
1760
1761   <para>
1762    A helpful trick for testing queries
1763    when you are not certain whether they might loop is to place a <literal>LIMIT</>
1764    in the parent query.  For example, this query would loop forever without
1765    the <literal>LIMIT</>:
1766
1767 <programlisting>
1768 WITH RECURSIVE t(n) AS (
1769     SELECT 1
1770   UNION ALL
1771     SELECT n+1 FROM t
1772 )
1773 SELECT n FROM t LIMIT 100;
1774 </programlisting>
1775
1776    This works because <productname>PostgreSQL</productname>'s implementation
1777    evaluates only as many rows of a <literal>WITH</> query as are actually
1778    fetched by the parent query.  Using this trick in production is not
1779    recommended, because other systems might work differently.  Also, it
1780    usually won't work if you make the outer query sort the recursive query's
1781    results or join them to some other table.
1782   </para>
1783
1784   <para>
1785    A useful property of <literal>WITH</> queries is that they are evaluated
1786    only once per execution of the parent query, even if they are referred to
1787    more than once by the parent query or sibling <literal>WITH</> queries.
1788    Thus, expensive calculations that are needed in multiple places can be
1789    placed within a <literal>WITH</> query to avoid redundant work.  Another
1790    possible application is to prevent unwanted multiple evaluations of
1791    functions with side-effects.
1792    However, the other side of this coin is that the optimizer is less able to
1793    push restrictions from the parent query down into a <literal>WITH</> query
1794    than into an ordinary sub-query.  The <literal>WITH</> query will generally be
1795    evaluated as stated, without suppression of rows that the parent query
1796    might discard afterwards.  (But, as mentioned above, evaluation might stop
1797    early if the reference(s) to the query demand only a limited number of
1798    rows.)
1799   </para>
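  <para>
   The single-evaluation property can be observed with a volatile
   function.  In the following sketch (the names <literal>r</>,
   <literal>a</>, <literal>b</>, and <literal>same_value</> are
   arbitrary), both references to the <literal>WITH</> query see the
   same random number, whereas two ordinary sub-queries are evaluated
   separately:
<programlisting>
-- The WITH query is evaluated once, so both references see one value.
WITH r AS (
    SELECT random() AS x
)
SELECT a.x = b.x AS same_value
FROM r AS a, r AS b;

-- Each sub-query calls random() on its own.
SELECT a.x = b.x AS same_value
FROM (SELECT random() AS x) AS a,
     (SELECT random() AS x) AS b;
</programlisting>
   The first query should report true; the second will almost always
   report false.
  </para>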
1800
1801  </sect1>
1802
1803 </chapter>