Fix xslt_process() to ensure that it inserts a NULL terminator after the
[PostgreSQL.git] / doc / src / sgml / queries.sgml
blob1be9a8b3e3ce6ef73cdcd6dd8f14a99606c76b31
1 <!-- $PostgreSQL$ -->
3 <chapter id="queries">
4 <title>Queries</title>
6 <indexterm zone="queries">
7 <primary>query</primary>
8 </indexterm>
10 <indexterm zone="queries">
11 <primary>SELECT</primary>
12 </indexterm>
14 <para>
15 The previous chapters explained how to create tables, how to fill
16 them with data, and how to manipulate that data. Now we finally
17 discuss how to retrieve the data from the database.
18 </para>
21 <sect1 id="queries-overview">
22 <title>Overview</title>
24 <para>
25 The process of retrieving or the command to retrieve data from a
26 database is called a <firstterm>query</firstterm>. In SQL the
27 <xref linkend="sql-select" endterm="sql-select-title"> command is
28 used to specify queries. The general syntax of the
29 <command>SELECT</command> command is
30 <synopsis>
31 <optional>WITH <replaceable>with_queries</replaceable></optional> SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
32 </synopsis>
33 The following sections describe the details of the select list, the
34 table expression, and the sort specification. <literal>WITH</>
35 queries are treated last since they are an advanced feature.
36 </para>
38 <para>
39 A simple kind of query has the form:
40 <programlisting>
41 SELECT * FROM table1;
42 </programlisting>
43 Assuming that there is a table called <literal>table1</literal>,
44 this command would retrieve all rows and all columns from
45 <literal>table1</literal>. (The method of retrieval depends on the
46 client application. For example, the
47 <application>psql</application> program will display an ASCII-art
48 table on the screen, while client libraries will offer functions to
49 extract individual values from the query result.) The select list
50 specification <literal>*</literal> means all columns that the table
51 expression happens to provide. A select list can also select a
52 subset of the available columns or make calculations using the
53 columns. For example, if
54 <literal>table1</literal> has columns named <literal>a</>,
55 <literal>b</>, and <literal>c</> (and perhaps others) you can make
56 the following query:
57 <programlisting>
58 SELECT a, b + c FROM table1;
59 </programlisting>
60 (assuming that <literal>b</> and <literal>c</> are of a numerical
61 data type).
62 See <xref linkend="queries-select-lists"> for more details.
63 </para>
65 <para>
66 <literal>FROM table1</literal> is a simple kind of
67 table expression: it reads just one table. In general, table
68 expressions can be complex constructs of base tables, joins, and
69 subqueries. But you can also omit the table expression entirely and
70 use the <command>SELECT</command> command as a calculator:
71 <programlisting>
72 SELECT 3 * 4;
73 </programlisting>
74 This is more useful if the expressions in the select list return
75 varying results. For example, you could call a function this way:
76 <programlisting>
77 SELECT random();
78 </programlisting>
79 </para>
80 </sect1>
83 <sect1 id="queries-table-expressions">
84 <title>Table Expressions</title>
86 <indexterm zone="queries-table-expressions">
87 <primary>table expression</primary>
88 </indexterm>
90 <para>
91 A <firstterm>table expression</firstterm> computes a table. The
92 table expression contains a <literal>FROM</> clause that is
93 optionally followed by <literal>WHERE</>, <literal>GROUP BY</>, and
94 <literal>HAVING</> clauses. Trivial table expressions simply refer
95 to a table on disk, a so-called base table, but more complex
96 expressions can be used to modify or combine base tables in various
97 ways.
98 </para>
100 <para>
101 The optional <literal>WHERE</>, <literal>GROUP BY</>, and
102 <literal>HAVING</> clauses in the table expression specify a
103 pipeline of successive transformations performed on the table
104 derived in the <literal>FROM</> clause. All these transformations
105 produce a virtual table that provides the rows that are passed to
106 the select list to compute the output rows of the query.
107 </para>
109 <sect2 id="queries-from">
110 <title>The <literal>FROM</literal> Clause</title>
112 <para>
113 The <xref linkend="sql-from" endterm="sql-from-title"> derives a
114 table from one or more other tables given in a comma-separated
115 table reference list.
116 <synopsis>
117 FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
118 </synopsis>
120 A table reference can be a table name (possibly schema-qualified),
121 or a derived table such as a subquery, a table join, or complex
122 combinations of these. If more than one table reference is listed
123 in the <literal>FROM</> clause they are cross-joined (see below)
124 to form the intermediate virtual table that can then be subject to
125 transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
126 and <literal>HAVING</> clauses and is finally the result of the
127 overall table expression.
128 </para>
130 <indexterm>
131 <primary>ONLY</primary>
132 </indexterm>
134 <para>
135 When a table reference names a table that is the parent of a
136 table inheritance hierarchy, the table reference produces rows of
137 not only that table but all of its descendant tables, unless the
138 key word <literal>ONLY</> precedes the table name. However, the
139 reference produces only the columns that appear in the named table
140 &mdash; any columns added in subtables are ignored.
141 </para>
143 <sect3 id="queries-join">
144 <title>Joined Tables</title>
146 <indexterm zone="queries-join">
147 <primary>join</primary>
148 </indexterm>
150 <para>
151 A joined table is a table derived from two other (real or
152 derived) tables according to the rules of the particular join
153 type. Inner, outer, and cross-joins are available.
154 </para>
156 <variablelist>
157 <title>Join Types</title>
159 <varlistentry>
160 <term>Cross join</term>
162 <indexterm>
163 <primary>join</primary>
164 <secondary>cross</secondary>
165 </indexterm>
167 <indexterm>
168 <primary>cross join</primary>
169 </indexterm>
171 <listitem>
172 <synopsis>
173 <replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
174 </synopsis>
176 <para>
177 For every possible combination of rows from
178 <replaceable>T1</replaceable> and
179 <replaceable>T2</replaceable> (i.e., a Cartesian product),
180 the joined table will contain a
181 row consisting of all columns in <replaceable>T1</replaceable>
182 followed by all columns in <replaceable>T2</replaceable>. If
183 the tables have N and M rows respectively, the joined
184 table will have N * M rows.
185 </para>
187 <para>
188 <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
189 <replaceable>T2</replaceable></literal> is equivalent to
190 <literal>FROM <replaceable>T1</replaceable>,
191 <replaceable>T2</replaceable></literal>. It is also equivalent to
192 <literal>FROM <replaceable>T1</replaceable> INNER JOIN
193 <replaceable>T2</replaceable> ON TRUE</literal> (see below).
194 </para>
195 </listitem>
196 </varlistentry>
198 <varlistentry>
199 <term>Qualified joins</term>
201 <indexterm>
202 <primary>join</primary>
203 <secondary>outer</secondary>
204 </indexterm>
206 <indexterm>
207 <primary>outer join</primary>
208 </indexterm>
210 <listitem>
211 <synopsis>
212 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean_expression</replaceable>
213 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
214 <replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
215 </synopsis>
217 <para>
218 The words <literal>INNER</literal> and
219 <literal>OUTER</literal> are optional in all forms.
220 <literal>INNER</literal> is the default;
221 <literal>LEFT</literal>, <literal>RIGHT</literal>, and
222 <literal>FULL</literal> imply an outer join.
223 </para>
225 <para>
226 The <firstterm>join condition</firstterm> is specified in the
227 <literal>ON</> or <literal>USING</> clause, or implicitly by
228 the word <literal>NATURAL</>. The join condition determines
229 which rows from the two source tables are considered to
230 <quote>match</quote>, as explained in detail below.
231 </para>
233 <para>
234 The <literal>ON</> clause is the most general kind of join
235 condition: it takes a Boolean value expression of the same
236 kind as is used in a <literal>WHERE</> clause. A pair of rows
237 from <replaceable>T1</> and <replaceable>T2</> match if the
238 <literal>ON</> expression evaluates to true for them.
239 </para>
241 <para>
242 <literal>USING</> is a shorthand notation: it takes a
243 comma-separated list of column names, which the joined tables
244 must have in common, and forms a join condition specifying
245 equality of each of these pairs of columns. Furthermore, the
246 output of <literal>JOIN USING</> has one column for each of
247 the equated pairs of input columns, followed by the
248 remaining columns from each table. Thus, <literal>USING (a, b,
249 c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
250 t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
251 if <literal>ON</> is used there will be two columns
252 <literal>a</>, <literal>b</>, and <literal>c</> in the result,
253 whereas with <literal>USING</> there will be only one of each
254 (and they will appear first if <command>SELECT *</> is used).
255 </para>
257 <para>
258 <indexterm>
259 <primary>join</primary>
260 <secondary>natural</secondary>
261 </indexterm>
262 <indexterm>
263 <primary>natural join</primary>
264 </indexterm>
265 Finally, <literal>NATURAL</> is a shorthand form of
266 <literal>USING</>: it forms a <literal>USING</> list
267 consisting of all column names that appear in both
268 input tables. As with <literal>USING</>, these columns appear
269 only once in the output table.
270 </para>
272 <para>
273 The possible types of qualified join are:
275 <variablelist>
276 <varlistentry>
277 <term><literal>INNER JOIN</></term>
279 <listitem>
280 <para>
281 For each row R1 of T1, the joined table has a row for each
282 row in T2 that satisfies the join condition with R1.
283 </para>
284 </listitem>
285 </varlistentry>
287 <varlistentry>
288 <term><literal>LEFT OUTER JOIN</></term>
290 <indexterm>
291 <primary>join</primary>
292 <secondary>left</secondary>
293 </indexterm>
295 <indexterm>
296 <primary>left join</primary>
297 </indexterm>
299 <listitem>
300 <para>
301 First, an inner join is performed. Then, for each row in
302 T1 that does not satisfy the join condition with any row in
303 T2, a joined row is added with null values in columns of
304 T2. Thus, the joined table always has at least
305 one row for each row in T1.
306 </para>
307 </listitem>
308 </varlistentry>
310 <varlistentry>
311 <term><literal>RIGHT OUTER JOIN</></term>
313 <indexterm>
314 <primary>join</primary>
315 <secondary>right</secondary>
316 </indexterm>
318 <indexterm>
319 <primary>right join</primary>
320 </indexterm>
322 <listitem>
323 <para>
324 First, an inner join is performed. Then, for each row in
325 T2 that does not satisfy the join condition with any row in
326 T1, a joined row is added with null values in columns of
327 T1. This is the converse of a left join: the result table
328 will always have a row for each row in T2.
329 </para>
330 </listitem>
331 </varlistentry>
333 <varlistentry>
334 <term><literal>FULL OUTER JOIN</></term>
336 <listitem>
337 <para>
338 First, an inner join is performed. Then, for each row in
339 T1 that does not satisfy the join condition with any row in
340 T2, a joined row is added with null values in columns of
341 T2. Also, for each row of T2 that does not satisfy the
342 join condition with any row in T1, a joined row with null
343 values in the columns of T1 is added.
344 </para>
345 </listitem>
346 </varlistentry>
347 </variablelist>
348 </para>
349 </listitem>
350 </varlistentry>
351 </variablelist>
353 <para>
354 Joins of all types can be chained together or nested: either or
355 both <replaceable>T1</replaceable> and
356 <replaceable>T2</replaceable> can be joined tables. Parentheses
357 can be used around <literal>JOIN</> clauses to control the join
358 order. In the absence of parentheses, <literal>JOIN</> clauses
359 nest left-to-right.
360 </para>
362 <para>
363 To put this together, assume we have tables <literal>t1</literal>:
364 <programlisting>
365 num | name
366 -----+------
367 1 | a
368 2 | b
369 3 | c
370 </programlisting>
371 and <literal>t2</literal>:
372 <programlisting>
373 num | value
374 -----+-------
375 1 | xxx
376 3 | yyy
377 5 | zzz
378 </programlisting>
379 then we get the following results for the various joins:
380 <screen>
381 <prompt>=&gt;</> <userinput>SELECT * FROM t1 CROSS JOIN t2;</>
382 num | name | num | value
383 -----+------+-----+-------
384 1 | a | 1 | xxx
385 1 | a | 3 | yyy
386 1 | a | 5 | zzz
387 2 | b | 1 | xxx
388 2 | b | 3 | yyy
389 2 | b | 5 | zzz
390 3 | c | 1 | xxx
391 3 | c | 3 | yyy
392 3 | c | 5 | zzz
393 (9 rows)
395 <prompt>=&gt;</> <userinput>SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;</>
396 num | name | num | value
397 -----+------+-----+-------
398 1 | a | 1 | xxx
399 3 | c | 3 | yyy
400 (2 rows)
402 <prompt>=&gt;</> <userinput>SELECT * FROM t1 INNER JOIN t2 USING (num);</>
403 num | name | value
404 -----+------+-------
405 1 | a | xxx
406 3 | c | yyy
407 (2 rows)
409 <prompt>=&gt;</> <userinput>SELECT * FROM t1 NATURAL INNER JOIN t2;</>
410 num | name | value
411 -----+------+-------
412 1 | a | xxx
413 3 | c | yyy
414 (2 rows)
416 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;</>
417 num | name | num | value
418 -----+------+-----+-------
419 1 | a | 1 | xxx
420 2 | b | |
421 3 | c | 3 | yyy
422 (3 rows)
424 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 USING (num);</>
425 num | name | value
426 -----+------+-------
427 1 | a | xxx
428 2 | b |
429 3 | c | yyy
430 (3 rows)
432 <prompt>=&gt;</> <userinput>SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;</>
433 num | name | num | value
434 -----+------+-----+-------
435 1 | a | 1 | xxx
436 3 | c | 3 | yyy
437 | | 5 | zzz
438 (3 rows)
440 <prompt>=&gt;</> <userinput>SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;</>
441 num | name | num | value
442 -----+------+-----+-------
443 1 | a | 1 | xxx
444 2 | b | |
445 3 | c | 3 | yyy
446 | | 5 | zzz
447 (4 rows)
448 </screen>
449 </para>
451 <para>
452 The join condition specified with <literal>ON</> can also contain
453 conditions that do not relate directly to the join. This can
454 prove useful for some queries but needs to be thought out
455 carefully. For example:
456 <screen>
457 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';</>
458 num | name | num | value
459 -----+------+-----+-------
460 1 | a | 1 | xxx
461 2 | b | |
462 3 | c | |
463 (3 rows)
464 </screen>
465 Notice that placing the restriction in the <literal>WHERE</> clause
466 produces a different result:
467 <screen>
468 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';</>
469 num | name | num | value
470 -----+------+-----+-------
471 1 | a | 1 | xxx
472 (1 row)
473 </screen>
474 This is because a restriction placed in the <literal>ON</>
475 clause is processed <emphasis>before</> the join, while
476 a restriction placed in the <literal>WHERE</> clause is processed
477 <emphasis>after</> the join.
478 </para>
479 </sect3>
481 <sect3 id="queries-table-aliases">
482 <title>Table and Column Aliases</title>
484 <indexterm zone="queries-table-aliases">
485 <primary>alias</primary>
486 <secondary>in the FROM clause</secondary>
487 </indexterm>
489 <indexterm>
490 <primary>label</primary>
491 <see>alias</see>
492 </indexterm>
494 <para>
495 A temporary name can be given to tables and complex table
496 references to be used for references to the derived table in
497 the rest of the query. This is called a <firstterm>table
498 alias</firstterm>.
499 </para>
501 <para>
502 To create a table alias, write
503 <synopsis>
504 FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
505 </synopsis>
507 <synopsis>
508 FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
509 </synopsis>
510 The <literal>AS</literal> key word is optional noise.
511 <replaceable>alias</replaceable> can be any identifier.
512 </para>
514 <para>
515 A typical application of table aliases is to assign short
516 identifiers to long table names to keep the join clauses
517 readable. For example:
518 <programlisting>
519 SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;
520 </programlisting>
521 </para>
523 <para>
524 The alias becomes the new name of the table reference for the
525 current query &mdash; it is no longer possible to refer to the table
526 by the original name. Thus:
527 <programlisting>
528 SELECT * FROM my_table AS m WHERE my_table.a &gt; 5;
529 </programlisting>
530 is not valid according to the SQL standard. In
531 <productname>PostgreSQL</productname> this will draw an error, assuming the
532 <xref linkend="guc-add-missing-from"> configuration variable is
533 <literal>off</> (as it is by default). If it is <literal>on</>,
534 an implicit table reference will be added to the
535 <literal>FROM</literal> clause, so the query is processed as if
536 it were written as:
537 <programlisting>
538 SELECT * FROM my_table AS m, my_table AS my_table WHERE my_table.a &gt; 5;
539 </programlisting>
540 That will result in a cross join, which is usually not what you want.
541 </para>
543 <para>
544 Table aliases are mainly for notational convenience, but it is
545 necessary to use them when joining a table to itself, e.g.:
546 <programlisting>
547 SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
548 </programlisting>
549 Additionally, an alias is required if the table reference is a
550 subquery (see <xref linkend="queries-subqueries">).
551 </para>
553 <para>
554 Parentheses are used to resolve ambiguities. In the following example,
555 the first statement assigns the alias <literal>b</literal> to the second
556 instance of <literal>my_table</>, but the second statement assigns the
557 alias to the result of the join:
558 <programlisting>
559 SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
560 SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
561 </programlisting>
562 </para>
564 <para>
565 Another form of table aliasing gives temporary names to the columns of
566 the table, as well as the table itself:
567 <synopsis>
568 FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
569 </synopsis>
570 If fewer column aliases are specified than the actual table has
571 columns, the remaining columns are not renamed. This syntax is
572 especially useful for self-joins or subqueries.
573 </para>
575 <para>
576 When an alias is applied to the output of a <literal>JOIN</>
577 clause, the alias hides the original
578 name(s) within the <literal>JOIN</>. For example:
579 <programlisting>
580 SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
581 </programlisting>
582 is valid SQL, but:
583 <programlisting>
584 SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
585 </programlisting>
586 is not valid; the table alias <literal>a</> is not visible
587 outside the alias <literal>c</>.
588 </para>
589 </sect3>
591 <sect3 id="queries-subqueries">
592 <title>Subqueries</title>
594 <indexterm zone="queries-subqueries">
595 <primary>subquery</primary>
596 </indexterm>
598 <para>
599 Subqueries specifying a derived table must be enclosed in
600 parentheses and <emphasis>must</emphasis> be assigned a table
601 alias name. (See <xref linkend="queries-table-aliases">.) For
602 example:
603 <programlisting>
604 FROM (SELECT * FROM table1) AS alias_name
605 </programlisting>
606 </para>
608 <para>
609 This example is equivalent to <literal>FROM table1 AS
610 alias_name</literal>. More interesting cases, which cannot be
611 reduced to a plain join, arise when the subquery involves
612 grouping or aggregation.
613 </para>
615 <para>
616 A subquery can also be a <command>VALUES</> list:
617 <programlisting>
618 FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
619 AS names(first, last)
620 </programlisting>
621 Again, a table alias is required. Assigning alias names to the columns
622 of the <command>VALUES</> list is optional, but is good practice.
623 For more information see <xref linkend="queries-values">.
624 </para>
625 </sect3>
627 <sect3 id="queries-tablefunctions">
628 <title>Table Functions</title>
630 <indexterm zone="queries-tablefunctions"><primary>table function</></>
632 <indexterm zone="queries-tablefunctions">
633 <primary>function</>
634 <secondary>in the FROM clause</>
635 </indexterm>
637 <para>
638 Table functions are functions that produce a set of rows, made up
639 of either base data types (scalar types) or composite data types
640 (table rows). They are used like a table, view, or subquery in
641 the <literal>FROM</> clause of a query. Columns returned by table
642 functions can be included in <literal>SELECT</>,
643 <literal>JOIN</>, or <literal>WHERE</> clauses in the same manner
644 as a table, view, or subquery column.
645 </para>
647 <para>
648 If a table function returns a base data type, the single result
649 column name matches the function name. If the function returns a
650 composite type, the result columns get the same names as the
651 individual attributes of the type.
652 </para>
654 <para>
655 A table function can be aliased in the <literal>FROM</> clause,
656 but it also can be left unaliased. If a function is used in the
657 <literal>FROM</> clause with no alias, the function name is used
658 as the resulting table name.
659 </para>
661 <para>
662 Some examples:
663 <programlisting>
664 CREATE TABLE foo (fooid int, foosubid int, fooname text);
666 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
667 SELECT * FROM foo WHERE fooid = $1;
668 $$ LANGUAGE SQL;
670 SELECT * FROM getfoo(1) AS t1;
672 SELECT * FROM foo
673 WHERE foosubid IN (
674 SELECT foosubid
675 FROM getfoo(foo.fooid) z
676 WHERE z.fooid = foo.fooid
679 CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
681 SELECT * FROM vw_getfoo;
682 </programlisting>
683 </para>
685 <para>
686 In some cases it is useful to define table functions that can
687 return different column sets depending on how they are invoked.
688 To support this, the table function can be declared as returning
689 the pseudotype <type>record</>. When such a function is used in
690 a query, the expected row structure must be specified in the
691 query itself, so that the system can know how to parse and plan
692 the query. Consider this example:
693 <programlisting>
694 SELECT *
695 FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
696 AS t1(proname name, prosrc text)
697 WHERE proname LIKE 'bytea%';
698 </programlisting>
699 The <literal>dblink</> function executes a remote query (see
700 <filename>contrib/dblink</>). It is declared to return
701 <type>record</> since it might be used for any kind of query.
702 The actual column set must be specified in the calling query so
703 that the parser knows, for example, what <literal>*</> should
704 expand to.
705 </para>
706 </sect3>
707 </sect2>
709 <sect2 id="queries-where">
710 <title>The <literal>WHERE</literal> Clause</title>
712 <indexterm zone="queries-where">
713 <primary>WHERE</primary>
714 </indexterm>
716 <para>
717 The syntax of the <xref linkend="sql-where"
718 endterm="sql-where-title"> is
719 <synopsis>
720 WHERE <replaceable>search_condition</replaceable>
721 </synopsis>
722 where <replaceable>search_condition</replaceable> is any value
723 expression (see <xref linkend="sql-expressions">) that
724 returns a value of type <type>boolean</type>.
725 </para>
727 <para>
728 After the processing of the <literal>FROM</> clause is done, each
729 row of the derived virtual table is checked against the search
730 condition. If the result of the condition is true, the row is
731 kept in the output table, otherwise (i.e., if the result is
732 false or null) it is discarded. The search condition typically
733 references at least one column of the table generated in the
734 <literal>FROM</> clause; this is not required, but otherwise the
735 <literal>WHERE</> clause will be fairly useless.
736 </para>
738 <note>
739 <para>
740 The join condition of an inner join can be written either in
741 the <literal>WHERE</> clause or in the <literal>JOIN</> clause.
742 For example, these table expressions are equivalent:
743 <programlisting>
744 FROM a, b WHERE a.id = b.id AND b.val &gt; 5
745 </programlisting>
746 and:
747 <programlisting>
748 FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val &gt; 5
749 </programlisting>
750 or perhaps even:
751 <programlisting>
752 FROM a NATURAL JOIN b WHERE b.val &gt; 5
753 </programlisting>
754 Which one of these you use is mainly a matter of style. The
755 <literal>JOIN</> syntax in the <literal>FROM</> clause is
756 probably not as portable to other SQL database management systems,
757 even though it is in the SQL standard. For
758 outer joins there is no choice: they must be done in
759 the <literal>FROM</> clause. The <literal>ON</> or <literal>USING</>
760 clause of an outer join is <emphasis>not</> equivalent to a
761 <literal>WHERE</> condition, because it results in the addition
762 of rows (for unmatched input rows) as well as the removal of rows
763 in the final result.
764 </para>
765 </note>
767 <para>
768 Here are some examples of <literal>WHERE</literal> clauses:
769 <programlisting>
770 SELECT ... FROM fdt WHERE c1 &gt; 5
772 SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)
774 SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)
776 SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)
778 SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100
780 SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 &gt; fdt.c1)
781 </programlisting>
782 <literal>fdt</literal> is the table derived in the
783 <literal>FROM</> clause. Rows that do not meet the search
784 condition of the <literal>WHERE</> clause are eliminated from
785 <literal>fdt</literal>. Notice the use of scalar subqueries as
786 value expressions. Just like any other query, the subqueries can
787 employ complex table expressions. Notice also how
788 <literal>fdt</literal> is referenced in the subqueries.
789 Qualifying <literal>c1</> as <literal>fdt.c1</> is only necessary
790 if <literal>c1</> is also the name of a column in the derived
791 input table of the subquery. But qualifying the column name adds
792 clarity even when it is not needed. This example shows how the column
793 naming scope of an outer query extends into its inner queries.
794 </para>
795 </sect2>
798 <sect2 id="queries-group">
799 <title>The <literal>GROUP BY</literal> and <literal>HAVING</literal> Clauses</title>
801 <indexterm zone="queries-group">
802 <primary>GROUP BY</primary>
803 </indexterm>
805 <indexterm zone="queries-group">
806 <primary>grouping</primary>
807 </indexterm>
809 <para>
810 After passing the <literal>WHERE</> filter, the derived input
811 table might be subject to grouping, using the <literal>GROUP BY</>
812 clause, and elimination of group rows using the <literal>HAVING</>
813 clause.
814 </para>
816 <synopsis>
817 SELECT <replaceable>select_list</replaceable>
818 FROM ...
819 <optional>WHERE ...</optional>
820 GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
821 </synopsis>
823 <para>
824 The <xref linkend="sql-groupby" endterm="sql-groupby-title"> is
825 used to group together those rows in a table that have the same
826 values in all the columns listed. The order in which the columns
827 are listed does not matter. The effect is to combine each set
828 of rows having common values into one group row that
829 represents all rows in the group. This is done to
830 eliminate redundancy in the output and/or compute aggregates that
831 apply to these groups. For instance:
832 <screen>
833 <prompt>=&gt;</> <userinput>SELECT * FROM test1;</>
834 x | y
835 ---+---
836 a | 3
837 c | 2
838 b | 5
839 a | 1
840 (4 rows)
842 <prompt>=&gt;</> <userinput>SELECT x FROM test1 GROUP BY x;</>
848 (3 rows)
849 </screen>
850 </para>
852 <para>
853 In the second query, we could not have written <literal>SELECT *
854 FROM test1 GROUP BY x</literal>, because there is no single value
855 for the column <literal>y</> that could be associated with each
856 group. The grouped-by columns can be referenced in the select list since
857 they have a single value in each group.
858 </para>
860 <para>
861 In general, if a table is grouped, columns that are not
862 listed in <literal>GROUP BY</> cannot be referenced except in aggregate
863 expressions. An example with aggregate expressions is:
864 <screen>
865 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x;</>
866 x | sum
867 ---+-----
868 a | 4
869 b | 5
870 c | 2
871 (3 rows)
872 </screen>
873 Here <literal>sum</literal> is an aggregate function that
874 computes a single value over the entire group. More information
875 about the available aggregate functions can be found in <xref
876 linkend="functions-aggregate">.
877 </para>
879 <tip>
880 <para>
881 Grouping without aggregate expressions effectively calculates the
882 set of distinct values in a column. This can also be achieved
883 using the <literal>DISTINCT</> clause (see <xref
884 linkend="queries-distinct">).
885 </para>
886 </tip>
888 <para>
889 Here is another example: it calculates the total sales for each
890 product (rather than the total sales of all products):
891 <programlisting>
892 SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
893 FROM products p LEFT JOIN sales s USING (product_id)
894 GROUP BY product_id, p.name, p.price;
895 </programlisting>
896 In this example, the columns <literal>product_id</literal>,
897 <literal>p.name</literal>, and <literal>p.price</literal> must be
898 in the <literal>GROUP BY</> clause since they are referenced in
899 the query select list. (Depending on how the products
900 table is set up, name and price might be fully dependent on the
901 product ID, so the additional groupings could theoretically be
902 unnecessary, though this is not implemented.) The column
903 <literal>s.units</> does not have to be in the <literal>GROUP
904 BY</> list since it is only used in an aggregate expression
905 (<literal>sum(...)</literal>), which represents the sales
906 of a product. For each product, the query returns a summary row about
907 all sales of the product.
908 </para>
910 <para>
911 In strict SQL, <literal>GROUP BY</> can only group by columns of
912 the source table but <productname>PostgreSQL</productname> extends
913 this to also allow <literal>GROUP BY</> to group by columns in the
914 select list. Grouping by value expressions instead of simple
915 column names is also allowed.
916 </para>
918 <indexterm>
919 <primary>HAVING</primary>
920 </indexterm>
922 <para>
923 If a table has been grouped using <literal>GROUP BY</literal>,
924 but only certain groups are of interest, the
925 <literal>HAVING</literal> clause can be used, much like a
926 <literal>WHERE</> clause, to eliminate groups from the result.
927 The syntax is:
928 <synopsis>
929 SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
930 </synopsis>
931 Expressions in the <literal>HAVING</> clause can refer both to
932 grouped expressions and to ungrouped expressions (which necessarily
933 involve an aggregate function).
934 </para>
936 <para>
937 Example:
938 <screen>
939 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) &gt; 3;</>
940 x | sum
941 ---+-----
942 a | 4
943 b | 5
944 (2 rows)
946 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING x &lt; 'c';</>
947 x | sum
948 ---+-----
949 a | 4
950 b | 5
951 (2 rows)
952 </screen>
953 </para>
955 <para>
956 Again, a more realistic example:
957 <programlisting>
958 SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
959 FROM products p LEFT JOIN sales s USING (product_id)
960 WHERE s.date &gt; CURRENT_DATE - INTERVAL '4 weeks'
961 GROUP BY product_id, p.name, p.price, p.cost
962 HAVING sum(p.price * s.units) &gt; 5000;
963 </programlisting>
964 In the example above, the <literal>WHERE</> clause is selecting
965 rows by a column that is not grouped (the expression is only true for
966 sales during the last four weeks), while the <literal>HAVING</>
967 clause restricts the output to groups with total gross sales over
968 5000. Note that the aggregate expressions do not necessarily need
969 to be the same in all parts of the query.
970 </para>
972 <para>
973 If a query contains aggregate function calls, but no <literal>GROUP BY</>
974 clause, grouping still occurs: the result is a single group row (or
975 perhaps no rows at all, if the single row is then eliminated by
976 <literal>HAVING</>).
977 The same is true if it contains a <literal>HAVING</> clause, even
978 without any aggregate function calls or <literal>GROUP BY</> clause.
979 </para>
980 </sect2>
982 <sect2 id="queries-window">
983 <title>Window Function Processing</>
985 <indexterm zone="queries-window">
986 <primary>window function</primary>
987 <secondary>order of execution</>
988 </indexterm>
990 <para>
991 If the query contains any window functions (see
992 <xref linkend="tutorial-window"> and
993 <xref linkend="syntax-window-functions">), these functions are evaluated
994 after any grouping, aggregation, and <literal>HAVING</> filtering is
995 performed. That is, if the query uses any aggregates, <literal>GROUP
996 BY</>, or <literal>HAVING</>, then the rows seen by the window functions
997 are the group rows instead of the original table rows from
998 <literal>FROM</>/<literal>WHERE</>.
999 </para>
1001 <para>
1002 When multiple window functions are used, all the window functions having
1003 syntactically equivalent <literal>PARTITION BY</> and <literal>ORDER BY</>
1004 clauses in their window definitions are guaranteed to be evaluated in a
1005 single pass over the data. Therefore they will see the same sort ordering,
1006 even if the <literal>ORDER BY</> does not uniquely determine an ordering.
1007 However, no guarantees are made about the evaluation of functions having
1008 different <literal>PARTITION BY</> or <literal>ORDER BY</> specifications.
1009 (In such cases a sort step is typically required between the passes of
1010 window function evaluations, and the sort is not guaranteed to preserve
1011 ordering of rows that its <literal>ORDER BY</> sees as equivalent.)
1012 </para>
1014 <para>
1015 Currently, window functions always require presorted data, and so the
1016 query output will be ordered according to one or another of the window
1017 functions' <literal>PARTITION BY</>/<literal>ORDER BY</> clauses.
1018 It is not recommendable to rely on this, however. Use an explicit
1019 top-level <literal>ORDER BY</> clause if you want to be sure the
1020 results are sorted in a particular way.
1021 </para>
1022 </sect2>
1023 </sect1>
1026 <sect1 id="queries-select-lists">
1027 <title>Select Lists</title>
1029 <indexterm>
1030 <primary>SELECT</primary>
1031 <secondary>select list</secondary>
1032 </indexterm>
1034 <para>
1035 As shown in the previous section,
1036 the table expression in the <command>SELECT</command> command
1037 constructs an intermediate virtual table by possibly combining
1038 tables, views, eliminating rows, grouping, etc. This table is
1039 finally passed on to processing by the <firstterm>select list</firstterm>. The select
1040 list determines which <emphasis>columns</emphasis> of the
1041 intermediate table are actually output.
1042 </para>
1044 <sect2 id="queries-select-list-items">
1045 <title>Select-List Items</title>
1047 <indexterm>
1048 <primary>*</primary>
1049 </indexterm>
1051 <para>
1052 The simplest kind of select list is <literal>*</literal> which
1053 emits all columns that the table expression produces. Otherwise,
1054 a select list is a comma-separated list of value expressions (as
1055 defined in <xref linkend="sql-expressions">). For instance, it
1056 could be a list of column names:
1057 <programlisting>
1058 SELECT a, b, c FROM ...
1059 </programlisting>
1060 The columns names <literal>a</>, <literal>b</>, and <literal>c</>
1061 are either the actual names of the columns of tables referenced
1062 in the <literal>FROM</> clause, or the aliases given to them as
1063 explained in <xref linkend="queries-table-aliases">. The name
1064 space available in the select list is the same as in the
1065 <literal>WHERE</> clause, unless grouping is used, in which case
1066 it is the same as in the <literal>HAVING</> clause.
1067 </para>
1069 <para>
1070 If more than one table has a column of the same name, the table
1071 name must also be given, as in:
1072 <programlisting>
1073 SELECT tbl1.a, tbl2.a, tbl1.b FROM ...
1074 </programlisting>
1075 When working with multiple tables, it can also be useful to ask for
1076 all the columns of a particular table:
1077 <programlisting>
1078 SELECT tbl1.*, tbl2.a FROM ...
1079 </programlisting>
1080 (See also <xref linkend="queries-where">.)
1081 </para>
1083 <para>
1084 If an arbitrary value expression is used in the select list, it
1085 conceptually adds a new virtual column to the returned table. The
1086 value expression is evaluated once for each result row, with
1087 the row's values substituted for any column references. But the
1088 expressions in the select list do not have to reference any
1089 columns in the table expression of the <literal>FROM</> clause;
1090 they can be constant arithmetic expressions, for instance.
1091 </para>
1092 </sect2>
1094 <sect2 id="queries-column-labels">
1095 <title>Column Labels</title>
1097 <indexterm zone="queries-column-labels">
1098 <primary>alias</primary>
1099 <secondary>in the select list</secondary>
1100 </indexterm>
1102 <para>
1103 The entries in the select list can be assigned names for subsequent
1104 processing, such as for use in an <literal>ORDER BY</> clause
1105 or for display by the client application. For example:
1106 <programlisting>
1107 SELECT a AS value, b + c AS sum FROM ...
1108 </programlisting>
1109 </para>
1111 <para>
1112 If no output column name is specified using <literal>AS</>,
1113 the system assigns a default column name. For simple column references,
1114 this is the name of the referenced column. For function
1115 calls, this is the name of the function. For complex expressions,
1116 the system will generate a generic name.
1117 </para>
1119 <para>
1120 The <literal>AS</> keyword is optional, but only if the new column
1121 name does not match any
1122 <productname>PostgreSQL</productname> keyword (see <xref
1123 linkend="sql-keywords-appendix">). To avoid an accidental match to
1124 a keyword, you can double-quote the column name. For example,
1125 <literal>VALUE</> is a keyword, so this does not work:
1126 <programlisting>
1127 SELECT a value, b + c AS sum FROM ...
1128 </programlisting>
1129 but this does:
1130 <programlisting>
1131 SELECT a "value", b + c AS sum FROM ...
1132 </programlisting>
1133 For protection against possible
1134 future keyword additions, it is recommended that you always either
1135 write <literal>AS</literal> or double-quote the output column name.
1136 </para>
1138 <note>
1139 <para>
1140 The naming of output columns here is different from that done in
1141 the <literal>FROM</> clause (see <xref
1142 linkend="queries-table-aliases">). It is possible
1143 to rename the same column twice, but the name assigned in
1144 the select list is the one that will be passed on.
1145 </para>
1146 </note>
1147 </sect2>
1149 <sect2 id="queries-distinct">
1150 <title><literal>DISTINCT</literal></title>
1152 <indexterm zone="queries-distinct">
1153 <primary>DISTINCT</primary>
1154 </indexterm>
1156 <indexterm zone="queries-distinct">
1157 <primary>duplicates</primary>
1158 </indexterm>
1160 <para>
1161 After the select list has been processed, the result table can
1162 optionally be subject to the elimination of duplicate rows. The
1163 <literal>DISTINCT</literal> key word is written directly after
1164 <literal>SELECT</literal> to specify this:
1165 <synopsis>
1166 SELECT DISTINCT <replaceable>select_list</replaceable> ...
1167 </synopsis>
1168 (Instead of <literal>DISTINCT</> the key word <literal>ALL</literal>
1169 can be used to specify the default behavior of retaining all rows.)
1170 </para>
1172 <para>
1173 <indexterm><primary>null value</><secondary sortas="DISTINCT">in
1174 DISTINCT</></indexterm>
1175 Obviously, two rows are considered distinct if they differ in at
1176 least one column value. Null values are considered equal in this
1177 comparison.
1178 </para>
1180 <para>
1181 Alternatively, an arbitrary expression can determine what rows are
1182 to be considered distinct:
1183 <synopsis>
1184 SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
1185 </synopsis>
1186 Here <replaceable>expression</replaceable> is an arbitrary value
1187 expression that is evaluated for all rows. A set of rows for
1188 which all the expressions are equal are considered duplicates, and
1189 only the first row of the set is kept in the output. Note that
1190 the <quote>first row</quote> of a set is unpredictable unless the
1191 query is sorted on enough columns to guarantee a unique ordering
1192 of the rows arriving at the <literal>DISTINCT</> filter.
1193 (<literal>DISTINCT ON</> processing occurs after <literal>ORDER
1194 BY</> sorting.)
1195 </para>
1197 <para>
1198 The <literal>DISTINCT ON</> clause is not part of the SQL standard
1199 and is sometimes considered bad style because of the potentially
1200 indeterminate nature of its results. With judicious use of
1201 <literal>GROUP BY</> and subqueries in <literal>FROM</>, this
1202 construct can be avoided, but it is often the most convenient
1203 alternative.
1204 </para>
1205 </sect2>
1206 </sect1>
1209 <sect1 id="queries-union">
1210 <title>Combining Queries</title>
1212 <indexterm zone="queries-union">
1213 <primary>UNION</primary>
1214 </indexterm>
1215 <indexterm zone="queries-union">
1216 <primary>INTERSECT</primary>
1217 </indexterm>
1218 <indexterm zone="queries-union">
1219 <primary>EXCEPT</primary>
1220 </indexterm>
1221 <indexterm zone="queries-union">
1222 <primary>set union</primary>
1223 </indexterm>
1224 <indexterm zone="queries-union">
1225 <primary>set intersection</primary>
1226 </indexterm>
1227 <indexterm zone="queries-union">
1228 <primary>set difference</primary>
1229 </indexterm>
1230 <indexterm zone="queries-union">
1231 <primary>set operation</primary>
1232 </indexterm>
1234 <para>
1235 The results of two queries can be combined using the set operations
1236 union, intersection, and difference. The syntax is
1237 <synopsis>
1238 <replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
1239 <replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
1240 <replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
1241 </synopsis>
1242 <replaceable>query1</replaceable> and
1243 <replaceable>query2</replaceable> are queries that can use any of
1244 the features discussed up to this point. Set operations can also
1245 be nested and chained, for example
1246 <synopsis>
1247 <replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
1248 </synopsis>
1249 which is executed as:
1250 <synopsis>
1251 (<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
1252 </synopsis>
1253 </para>
1255 <para>
1256 <literal>UNION</> effectively appends the result of
1257 <replaceable>query2</replaceable> to the result of
1258 <replaceable>query1</replaceable> (although there is no guarantee
1259 that this is the order in which the rows are actually returned).
1260 Furthermore, it eliminates duplicate rows from its result, in the same
1261 way as <literal>DISTINCT</>, unless <literal>UNION ALL</> is used.
1262 </para>
1264 <para>
1265 <literal>INTERSECT</> returns all rows that are both in the result
1266 of <replaceable>query1</replaceable> and in the result of
1267 <replaceable>query2</replaceable>. Duplicate rows are eliminated
1268 unless <literal>INTERSECT ALL</> is used.
1269 </para>
1271 <para>
1272 <literal>EXCEPT</> returns all rows that are in the result of
1273 <replaceable>query1</replaceable> but not in the result of
1274 <replaceable>query2</replaceable>. (This is sometimes called the
1275 <firstterm>difference</> between two queries.) Again, duplicates
1276 are eliminated unless <literal>EXCEPT ALL</> is used.
1277 </para>
1279 <para>
1280 In order to calculate the union, intersection, or difference of two
1281 queries, the two queries must be <quote>union compatible</quote>,
1282 which means that they return the same number of columns and
1283 the corresponding columns have compatible data types, as
1284 described in <xref linkend="typeconv-union-case">.
1285 </para>
1286 </sect1>
1289 <sect1 id="queries-order">
1290 <title>Sorting Rows</title>
1292 <indexterm zone="queries-order">
1293 <primary>sorting</primary>
1294 </indexterm>
1296 <indexterm zone="queries-order">
1297 <primary>ORDER BY</primary>
1298 </indexterm>
1300 <para>
1301 After a query has produced an output table (after the select list
1302 has been processed) it can optionally be sorted. If sorting is not
1303 chosen, the rows will be returned in an unspecified order. The actual
1304 order in that case will depend on the scan and join plan types and
1305 the order on disk, but it must not be relied on. A particular
1306 output ordering can only be guaranteed if the sort step is explicitly
1307 chosen.
1308 </para>
1310 <para>
1311 The <literal>ORDER BY</> clause specifies the sort order:
1312 <synopsis>
1313 SELECT <replaceable>select_list</replaceable>
1314 FROM <replaceable>table_expression</replaceable>
1315 ORDER BY <replaceable>sort_expression1</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional>
1316 <optional>, <replaceable>sort_expression2</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional> ...</optional>
1317 </synopsis>
1318 The sort expression(s) can be any expression that would be valid in the
1319 query's select list. An example is:
1320 <programlisting>
1321 SELECT a, b FROM table1 ORDER BY a + b, c;
1322 </programlisting>
1323 When more than one expression is specified,
1324 the later values are used to sort rows that are equal according to the
1325 earlier values. Each expression can be followed by an optional
1326 <literal>ASC</> or <literal>DESC</> keyword to set the sort direction to
1327 ascending or descending. <literal>ASC</> order is the default.
1328 Ascending order puts smaller values first, where
1329 <quote>smaller</quote> is defined in terms of the
1330 <literal>&lt;</literal> operator. Similarly, descending order is
1331 determined with the <literal>&gt;</literal> operator.
1332 <footnote>
1333 <para>
1334 Actually, <productname>PostgreSQL</> uses the <firstterm>default B-tree
1335 operator class</> for the expression's data type to determine the sort
1336 ordering for <literal>ASC</> and <literal>DESC</>. Conventionally,
1337 data types will be set up so that the <literal>&lt;</literal> and
1338 <literal>&gt;</literal> operators correspond to this sort ordering,
1339 but a user-defined data type's designer could choose to do something
1340 different.
1341 </para>
1342 </footnote>
1343 </para>
1345 <para>
1346 The <literal>NULLS FIRST</> and <literal>NULLS LAST</> options can be
1347 used to determine whether nulls appear before or after non-null values
1348 in the sort ordering. By default, null values sort as if larger than any
1349 non-null value; that is, <literal>NULLS FIRST</> is the default for
1350 <literal>DESC</> order, and <literal>NULLS LAST</> otherwise.
1351 </para>
1353 <para>
1354 Note that the ordering options are considered independently for each
1355 sort column. For example <literal>ORDER BY x, y DESC</> means
1356 <literal>ORDER BY x ASC, y DESC</>, which is not the same as
1357 <literal>ORDER BY x DESC, y DESC</>.
1358 </para>
1360 <para>
1361 A <replaceable>sort_expression</> can also be the column label or number
1362 of an output column, as in:
1363 <programlisting>
1364 SELECT a + b AS sum, c FROM table1 ORDER BY sum;
1365 SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
1366 </programlisting>
1367 both of which sort by the first output column. Note that an output
1368 column name has to stand alone, that is, it cannot be used in an expression
1369 &mdash; for example, this is <emphasis>not</> correct:
1370 <programlisting>
1371 SELECT a + b AS sum, c FROM table1 ORDER BY sum + c; -- wrong
1372 </programlisting>
1373 This restriction is made to reduce ambiguity. There is still
1374 ambiguity if an <literal>ORDER BY</> item is a simple name that
1375 could match either an output column name or a column from the table
1376 expression. The output column is used in such cases. This would
1377 only cause confusion if you use <literal>AS</> to rename an output
1378 column to match some other table column's name.
1379 </para>
1381 <para>
1382 <literal>ORDER BY</> can be applied to the result of a
1383 <literal>UNION</>, <literal>INTERSECT</>, or <literal>EXCEPT</>
1384 combination, but in this case it is only permitted to sort by
1385 output column names or numbers, not by expressions.
1386 </para>
1387 </sect1>
1390 <sect1 id="queries-limit">
1391 <title><literal>LIMIT</literal> and <literal>OFFSET</literal></title>
1393 <indexterm zone="queries-limit">
1394 <primary>LIMIT</primary>
1395 </indexterm>
1397 <indexterm zone="queries-limit">
1398 <primary>OFFSET</primary>
1399 </indexterm>
1401 <para>
1402 <literal>LIMIT</> and <literal>OFFSET</> allow you to retrieve just
1403 a portion of the rows that are generated by the rest of the query:
1404 <synopsis>
1405 SELECT <replaceable>select_list</replaceable>
1406 FROM <replaceable>table_expression</replaceable>
1407 <optional> ORDER BY ... </optional>
1408 <optional> LIMIT { <replaceable>number</replaceable> | ALL } </optional> <optional> OFFSET <replaceable>number</replaceable> </optional>
1409 </synopsis>
1410 </para>
1412 <para>
1413 If a limit count is given, no more than that many rows will be
1414 returned (but possibly less, if the query itself yields less rows).
1415 <literal>LIMIT ALL</> is the same as omitting the <literal>LIMIT</>
1416 clause.
1417 </para>
1419 <para>
1420 <literal>OFFSET</> says to skip that many rows before beginning to
1421 return rows. <literal>OFFSET 0</> is the same as omitting the
1422 <literal>OFFSET</> clause, and <literal>LIMIT NULL</> is the same
1423 as omitting the <literal>LIMIT</> clause. If both <literal>OFFSET</>
1424 and <literal>LIMIT</> appear, then <literal>OFFSET</> rows are
1425 skipped before starting to count the <literal>LIMIT</> rows that
1426 are returned.
1427 </para>
1429 <para>
1430 When using <literal>LIMIT</>, it is important to use an
1431 <literal>ORDER BY</> clause that constrains the result rows into a
1432 unique order. Otherwise you will get an unpredictable subset of
1433 the query's rows. You might be asking for the tenth through
1434 twentieth rows, but tenth through twentieth in what ordering? The
1435 ordering is unknown, unless you specified <literal>ORDER BY</>.
1436 </para>
1438 <para>
1439 The query optimizer takes <literal>LIMIT</> into account when
1440 generating query plans, so you are very likely to get different
1441 plans (yielding different row orders) depending on what you give
1442 for <literal>LIMIT</> and <literal>OFFSET</>. Thus, using
1443 different <literal>LIMIT</>/<literal>OFFSET</> values to select
1444 different subsets of a query result <emphasis>will give
1445 inconsistent results</emphasis> unless you enforce a predictable
1446 result ordering with <literal>ORDER BY</>. This is not a bug; it
1447 is an inherent consequence of the fact that SQL does not promise to
1448 deliver the results of a query in any particular order unless
1449 <literal>ORDER BY</> is used to constrain the order.
1450 </para>
1452 <para>
1453 The rows skipped by an <literal>OFFSET</> clause still have to be
1454 computed inside the server; therefore a large <literal>OFFSET</>
1455 might be inefficient.
1456 </para>
1457 </sect1>
1460 <sect1 id="queries-values">
1461 <title><literal>VALUES</literal> Lists</title>
1463 <indexterm zone="queries-values">
1464 <primary>VALUES</primary>
1465 </indexterm>
1467 <para>
1468 <literal>VALUES</> provides a way to generate a <quote>constant table</>
1469 that can be used in a query without having to actually create and populate
1470 a table on-disk. The syntax is
1471 <synopsis>
1472 VALUES ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) [, ...]
1473 </synopsis>
1474 Each parenthesized list of expressions generates a row in the table.
1475 The lists must all have the same number of elements (i.e., the number
1476 of columns in the table), and corresponding entries in each list must
1477 have compatible data types. The actual data type assigned to each column
1478 of the result is determined using the same rules as for <literal>UNION</>
1479 (see <xref linkend="typeconv-union-case">).
1480 </para>
1482 <para>
1483 As an example:
1484 <programlisting>
1485 VALUES (1, 'one'), (2, 'two'), (3, 'three');
1486 </programlisting>
1488 will return a table of two columns and three rows. It's effectively
1489 equivalent to:
1490 <programlisting>
1491 SELECT 1 AS column1, 'one' AS column2
1492 UNION ALL
1493 SELECT 2, 'two'
1494 UNION ALL
1495 SELECT 3, 'three';
1496 </programlisting>
1498 By default, <productname>PostgreSQL</productname> assigns the names
1499 <literal>column1</>, <literal>column2</>, etc. to the columns of a
1500 <literal>VALUES</> table. The column names are not specified by the
1501 SQL standard and different database systems do it differently, so
1502 it's usually better to override the default names with a table alias
1503 list.
1504 </para>
1506 <para>
1507 Syntactically, <literal>VALUES</> followed by expression lists is
1508 treated as equivalent to:
1509 <synopsis>
1510 SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable>
1511 </synopsis>
1512 and can appear anywhere a <literal>SELECT</> can. For example, you can
1513 use it as part of a <literal>UNION</>, or attach a
1514 <replaceable>sort_specification</replaceable> (<literal>ORDER BY</>,
1515 <literal>LIMIT</>, and/or <literal>OFFSET</>) to it. <literal>VALUES</>
1516 is most commonly used as the data source in an <command>INSERT</> command,
1517 and next most commonly as a subquery.
1518 </para>
1520 <para>
1521 For more information see <xref linkend="sql-values"
1522 endterm="sql-values-title">.
1523 </para>
1525 </sect1>
1528 <sect1 id="queries-with">
1529 <title><literal>WITH</literal> Queries</title>
1531 <indexterm zone="queries-with">
1532 <primary>WITH</primary>
1533 <secondary>in SELECT</secondary>
1534 </indexterm>
1536 <indexterm>
1537 <primary>common table expression</primary>
1538 <see>WITH</see>
1539 </indexterm>
1541 <para>
1542 <literal>WITH</> provides a way to write subqueries for use in a larger
1543 <literal>SELECT</> query. The subqueries can be thought of as defining
1544 temporary tables that exist just for this query. One use of this feature
1545 is to break down complicated queries into simpler parts. An example is:
1547 <programlisting>
1548 WITH regional_sales AS (
1549 SELECT region, SUM(amount) AS total_sales
1550 FROM orders
1551 GROUP BY region
1552 ), top_regions AS (
1553 SELECT region
1554 FROM regional_sales
1555 WHERE total_sales &gt; (SELECT SUM(total_sales)/10 FROM regional_sales)
1557 SELECT region,
1558 product,
1559 SUM(quantity) AS product_units,
1560 SUM(amount) AS product_sales
1561 FROM orders
1562 WHERE region IN (SELECT region FROM top_regions)
1563 GROUP BY region, product;
1564 </programlisting>
1566 which displays per-product sales totals in only the top sales regions.
1567 This example could have been written without <literal>WITH</>,
1568 but we'd have needed two levels of nested sub-SELECTs. It's a bit
1569 easier to follow this way.
1570 </para>
1572 <para>
1573 The optional <literal>RECURSIVE</> modifier changes <literal>WITH</>
1574 from a mere syntactic convenience into a feature that accomplishes
1575 things not otherwise possible in standard SQL. Using
1576 <literal>RECURSIVE</>, a <literal>WITH</> query can refer to its own
1577 output. A very simple example is this query to sum the integers from 1
1578 through 100:
1580 <programlisting>
1581 WITH RECURSIVE t(n) AS (
1582 VALUES (1)
1583 UNION ALL
1584 SELECT n+1 FROM t WHERE n &lt; 100
1586 SELECT sum(n) FROM t;
1587 </programlisting>
1589 The general form of a recursive <literal>WITH</> query is always a
1590 <firstterm>non-recursive term</>, then <literal>UNION</> (or
1591 <literal>UNION ALL</>), then a
1592 <firstterm>recursive term</>, where only the recursive term can contain
1593 a reference to the query's own output. Such a query is executed as
1594 follows:
1595 </para>
1597 <procedure>
1598 <title>Recursive Query Evaluation</title>
1600 <step performance="required">
1601 <para>
1602 Evaluate the non-recursive term. For <literal>UNION</> (but not
1603 <literal>UNION ALL</>), discard duplicate rows. Include all remaining
1604 rows in the result of the recursive query, and also place them in a
1605 temporary <firstterm>working table</>.
1606 </para>
1607 </step>
1609 <step performance="required">
1610 <para>
1611 So long as the working table is not empty, repeat these steps:
1612 </para>
1613 <substeps>
1614 <step performance="required">
1615 <para>
1616 Evaluate the recursive term, substituting the current contents of
1617 the working table for the recursive self-reference.
1618 For <literal>UNION</> (but not <literal>UNION ALL</>), discard
1619 duplicate rows and rows that duplicate any previous result row.
1620 Include all remaining rows in the result of the recursive query, and
1621 also place them in a temporary <firstterm>intermediate table</>.
1622 </para>
1623 </step>
1625 <step performance="required">
1626 <para>
1627 Replace the contents of the working table with the contents of the
1628 intermediate table, then empty the intermediate table.
1629 </para>
1630 </step>
1631 </substeps>
1632 </step>
1633 </procedure>
1635 <note>
1636 <para>
1637 Strictly speaking, this process is iteration not recursion, but
1638 <literal>RECURSIVE</> is the terminology chosen by the SQL standards
1639 committee.
1640 </para>
1641 </note>
1643 <para>
1644 In the example above, the working table has just a single row in each step,
1645 and it takes on the values from 1 through 100 in successive steps. In
1646 the 100th step, there is no output because of the <literal>WHERE</>
1647 clause, and so the query terminates.
1648 </para>
1650 <para>
1651 Recursive queries are typically used to deal with hierarchical or
1652 tree-structured data. A useful example is this query to find all the
1653 direct and indirect sub-parts of a product, given only a table that
1654 shows immediate inclusions:
1656 <programlisting>
1657 WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
1658 SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
1659 UNION ALL
1660 SELECT p.sub_part, p.part, p.quantity
1661 FROM included_parts pr, parts p
1662 WHERE p.part = pr.sub_part
1664 SELECT sub_part, SUM(quantity) as total_quantity
1665 FROM included_parts
1666 GROUP BY sub_part
1667 </programlisting>
1668 </para>
1670 <para>
1671 When working with recursive queries it is important to be sure that
1672 the recursive part of the query will eventually return no tuples,
1673 or else the query will loop indefinitely. Sometimes, using
1674 <literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
1675 by discarding rows that duplicate previous output rows. However, often a
1676 cycle does not involve output rows that are completely duplicate: it may be
1677 necessary to check just one or a few fields to see if the same point has
1678 been reached before. The standard method for handling such situations is
1679 to compute an array of the already-visited values. For example, consider
1680 the following query that searches a table <structname>graph</> using a
1681 <structfield>link</> field:
1683 <programlisting>
1684 WITH RECURSIVE search_graph(id, link, data, depth) AS (
1685 SELECT g.id, g.link, g.data, 1
1686 FROM graph g
1687 UNION ALL
1688 SELECT g.id, g.link, g.data, sg.depth + 1
1689 FROM graph g, search_graph sg
1690 WHERE g.id = sg.link
1692 SELECT * FROM search_graph;
1693 </programlisting>
1695 This query will loop if the <structfield>link</> relationships contain
1696 cycles. Because we require a <quote>depth</> output, just changing
1697 <literal>UNION ALL</> to <literal>UNION</> would not eliminate the looping.
1698 Instead we need to recognize whether we have reached the same row again
1699 while following a particular path of links. We add two columns
1700 <structfield>path</> and <structfield>cycle</> to the loop-prone query:
1702 <programlisting>
1703 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1704 SELECT g.id, g.link, g.data, 1,
1705 ARRAY[g.id],
1706 false
1707 FROM graph g
1708 UNION ALL
1709 SELECT g.id, g.link, g.data, sg.depth + 1,
1710 path || g.id,
1711 g.id = ANY(path)
1712 FROM graph g, search_graph sg
1713 WHERE g.id = sg.link AND NOT cycle
1715 SELECT * FROM search_graph;
1716 </programlisting>
1718 Aside from preventing cycles, the array value is often useful in its own
1719 right as representing the <quote>path</> taken to reach any particular row.
1720 </para>
1722 <para>
1723 In the general case where more than one field needs to be checked to
1724 recognize a cycle, use an array of rows. For example, if we needed to
1725 compare fields <structfield>f1</> and <structfield>f2</>:
1727 <programlisting>
1728 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1729 SELECT g.id, g.link, g.data, 1,
1730 ARRAY[ROW(g.f1, g.f2)],
1731 false
1732 FROM graph g
1733 UNION ALL
1734 SELECT g.id, g.link, g.data, sg.depth + 1,
1735 path || ROW(g.f1, g.f2),
1736 ROW(g.f1, g.f2) = ANY(path)
1737 FROM graph g, search_graph sg
1738 WHERE g.id = sg.link AND NOT cycle
1740 SELECT * FROM search_graph;
1741 </programlisting>
1742 </para>
1744 <tip>
1745 <para>
1746 Omit the <literal>ROW()</> syntax in the common case where only one field
1747 needs to be checked to recognize a cycle. This allows a simple array
1748 rather than a composite-type array to be used, gaining efficiency.
1749 </para>
1750 </tip>
1752 <tip>
1753 <para>
1754 The recursive query evaluation algorithm produces its output in
1755 breadth-first search order. You can display the results in depth-first
1756 search order by making the outer query <literal>ORDER BY</> a
1757 <quote>path</> column constructed in this way.
1758 </para>
1759 </tip>
1761 <para>
1762 A helpful trick for testing queries
1763 when you are not certain if they might loop is to place a <literal>LIMIT</>
1764 in the parent query. For example, this query would loop forever without
1765 the <literal>LIMIT</>:
1767 <programlisting>
1768 WITH RECURSIVE t(n) AS (
1769 SELECT 1
1770 UNION ALL
1771 SELECT n+1 FROM t
1773 SELECT n FROM t LIMIT 100;
1774 </programlisting>
1776 This works because <productname>PostgreSQL</productname>'s implementation
1777 evaluates only as many rows of a <literal>WITH</> query as are actually
1778 fetched by the parent query. Using this trick in production is not
1779 recommended, because other systems might work differently. Also, it
1780 usually won't work if you make the outer query sort the recursive query's
1781 results or join them to some other table.
1782 </para>
1784 <para>
1785 A useful property of <literal>WITH</> queries is that they are evaluated
1786 only once per execution of the parent query, even if they are referred to
1787 more than once by the parent query or sibling <literal>WITH</> queries.
1788 Thus, expensive calculations that are needed in multiple places can be
1789 placed within a <literal>WITH</> query to avoid redundant work. Another
1790 possible application is to prevent unwanted multiple evaluations of
1791 functions with side-effects.
1792 However, the other side of this coin is that the optimizer is less able to
1793 push restrictions from the parent query down into a <literal>WITH</> query
1794 than an ordinary sub-query. The <literal>WITH</> query will generally be
1795 evaluated as stated, without suppression of rows that the parent query
1796 might discard afterwards. (But, as mentioned above, evaluation might stop
1797 early if the reference(s) to the query demand only a limited number of
1798 rows.)
1799 </para>
1801 </sect1>
1803 </chapter>