Fix spelling error in docs.
[PostgreSQL.git] / doc / src / sgml / sql.sgml
blobc6b56d05f9cf2f4705f109cddc4fce6e81232900
1 <!-- $PostgreSQL$ -->
3 <chapter id="sql-intro">
4 <title>SQL</title>
6 <abstract>
7 <para>
8 This chapter introduces the mathematical concepts behind
9 relational databases. It is not required reading, so if you bog
10 down or want to get straight to some simple examples feel free to
11 jump ahead to the next chapter and come back when you have more
12 time and patience. This stuff is supposed to be fun!
13 </para>
15 <para>
16 This material originally appeared as a part of
17 Stefan Simkovics' Master's Thesis
18 (<xref linkend="SIM98" endterm="SIM98">).
19 </para>
20 </abstract>
22 <para>
23 <acronym>SQL</acronym> has become the most popular relational query
24 language.
25 The name <quote><acronym>SQL</acronym></quote> is an abbreviation for
26 <firstterm>Structured Query Language</firstterm>.
27 In 1974 Donald Chamberlin and others defined the
28 language SEQUEL (<firstterm>Structured English Query
29 Language</firstterm>) at IBM
30 Research. This language was first implemented in an IBM
31 prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version
32 of SEQUEL called SEQUEL/2 was defined and the name was changed to
33 <acronym>SQL</acronym>
34 subsequently.
35 </para>
37 <para>
38 A new prototype called System R was developed by IBM in 1977. System R
39 implemented a large subset of SEQUEL/2 (now <acronym>SQL</acronym>)
40 and a number of
41 changes were made to <acronym>SQL</acronym> during the project.
42 System R was installed in
43 a number of user sites, both internal IBM sites and also some selected
44 customer sites. Thanks to the success and acceptance of System R at
45 those user sites IBM started to develop commercial products that
46 implemented the <acronym>SQL</acronym> language based on the System
47 R technology.
48 </para>
50 <para>
51 Over the next years IBM and also a number of other vendors announced
52 <acronym>SQL</acronym> products such as
53 <productname>SQL/DS</productname> (IBM),
54 <productname>DB2</productname> (IBM),
55 <productname>ORACLE</productname> (Oracle Corp.),
56 <productname>DG/SQL</productname> (Data General Corp.),
57 and <productname>SYBASE</productname> (Sybase Inc.).
58 </para>
60 <para>
61 <acronym>SQL</acronym> is also an official standard now. In 1982
62 the American National
63 Standards Institute (<acronym>ANSI</acronym>) chartered its
64 Database Committee X3H2 to
65 develop a proposal for a standard relational language. This proposal
66 was ratified in 1986 and consisted essentially of the IBM dialect of
67 <acronym>SQL</acronym>. In 1987 this <acronym>ANSI</acronym>
68 standard was also accepted as an international
69 standard by the International Organization for Standardization
70 (<acronym>ISO</acronym>).
71 This original standard version of <acronym>SQL</acronym> is often
72 referred to,
73 informally, as <quote><abbrev>SQL/86</abbrev></quote>. In 1989 the original
74 standard was extended
75 and this new standard is often, again informally, referred to as
76 <quote><abbrev>SQL/89</abbrev></quote>. Also in 1989, a related standard called
77 <firstterm>Database Language Embedded <acronym>SQL</acronym></firstterm>
78 (<acronym>ESQL</acronym>) was developed.
79 </para>
81 <para>
82 The <acronym>ISO</acronym> and <acronym>ANSI</acronym> committees
83 have been working for many years on the
84 definition of a greatly expanded version of the original standard,
85 referred to informally as <firstterm><acronym>SQL2</acronym></firstterm>
86 or <firstterm><acronym>SQL/92</acronym></firstterm>. This version became a
87 ratified standard - <quote>International Standard ISO/IEC 9075:1992,
88 Database Language <acronym>SQL</acronym></quote> - in late 1992.
89 <acronym>SQL/92</acronym> is the version
90 normally meant when people refer to <quote>the <acronym>SQL</acronym>
91 standard</quote>. A detailed
92 description of <acronym>SQL/92</acronym> is given in
93 <xref linkend="DATE97" endterm="DATE97">. At the time of
94 writing this document a new standard informally referred to
95 as <firstterm><acronym>SQL3</acronym></firstterm>
96 is under development. It is planned to make <acronym>SQL</acronym>
97 a Turing-complete
98 language, i.e. all computable queries (e.g. recursive queries) will be
99 possible. This has now been completed as SQL:2003.
100 </para>
102 <sect1 id="rel-model">
103 <title>The Relational Data Model</title>
105 <para>
106 As mentioned before, <acronym>SQL</acronym> is a relational
107 language. That means it is
108 based on the <firstterm>relational data model</firstterm>
109 first published by E.F. Codd in
110 1970. We will give a formal description of the relational model
111 later (in
112 <xref linkend="formal-notion" endterm="formal-notion">)
113 but first we want to have a look at it from a more intuitive
114 point of view.
115 </para>
117 <para>
118 A <firstterm>relational database</firstterm> is a database that is
119 perceived by its
120 users as a <firstterm>collection of tables</firstterm> (and
121 nothing else but tables).
122 A table consists of rows and columns where each row represents a
123 record and each column represents an attribute of the records
124 contained in the table.
125 <xref linkend="supplier-fig" endterm="supplier-fig">
126 shows an example of a database consisting of three tables:
128 <itemizedlist>
129 <listitem>
130 <para>
131 SUPPLIER is a table storing the number
132 (SNO), the name (SNAME) and the city (CITY) of a supplier.
133 </para>
134 </listitem>
136 <listitem>
137 <para>
138 PART is a table storing the number (PNO) the name (PNAME) and
139 the price (PRICE) of a part.
140 </para>
141 </listitem>
143 <listitem>
144 <para>
145 SELLS stores information about which part (PNO) is sold by which
146 supplier (SNO).
147 It serves in a sense to connect the other two tables together.
148 </para>
149 </listitem>
150 </itemizedlist>
152 <example>
153 <title id="supplier-fig">The Suppliers and Parts Database</title>
154 <programlisting>
155 SUPPLIER: SELLS:
156 SNO | SNAME | CITY SNO | PNO
157 ----+---------+-------- -----+-----
158 1 | Smith | London 1 | 1
159 2 | Jones | Paris 1 | 2
160 3 | Adams | Vienna 2 | 4
161 4 | Blake | Rome 3 | 1
162 3 | 3
163 4 | 2
164 PART: 4 | 3
165 PNO | PNAME | PRICE 4 | 4
166 ----+---------+---------
167 1 | Screw | 10
168 2 | Nut | 8
169 3 | Bolt | 15
170 4 | Cam | 25
171 </programlisting>
172 </example>
173 </para>
175 <para>
176 The tables PART and SUPPLIER can be regarded as
177 <firstterm>entities</firstterm> and
178 SELLS can be regarded as a <firstterm>relationship</firstterm>
179 between a particular
180 part and a particular supplier.
181 </para>
183 <para>
184 As we will see later, <acronym>SQL</acronym> operates on tables
185 like the ones just
186 defined but before that we will study the theory of the relational
187 model.
188 </para>
189 </sect1>
191 <sect1 id="relmodel-formal">
192 <title id="formal-notion">Relational Data Model Formalities</title>
194 <para>
195 The mathematical concept underlying the relational model is the
196 set-theoretic <firstterm>relation</firstterm> which is a subset of
197 the Cartesian
198 product of a list of domains. This set-theoretic relation gives
199 the model its name (do not confuse it with the relationship from the
200 <firstterm>Entity-Relationship model</firstterm>).
201 Formally a domain is simply a set of
202 values. For example the set of integers is a domain. Also the set of
203 character strings of length 20 and the real numbers are examples of
204 domains.
205 </para>
207 <para>
208 <!--
209 \begin{definition}
210 The <firstterm>Cartesian product</firstterm> of domains $D_{1},
211 D_{2},\ldots, D_{k}$ written
212 \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$} is the set of
213 all $k$-tuples $(v_{1},v_{2},\ldots,v_{k})$ such that \mbox{$v_{1} \in
214 D_{1}, v_{2} \in D_{2}, \ldots, v_{k} \in D_{k}$}.
215 \end{definition}
217 The <firstterm>Cartesian product</firstterm> of domains
218 <parameter>D<subscript>1</subscript></parameter>,
219 <parameter>D<subscript>2</subscript></parameter>,
221 <parameter>D<subscript>k</subscript></parameter>,
222 written
223 <parameter>D<subscript>1</subscript></parameter> &times;
224 <parameter>D<subscript>2</subscript></parameter> &times;
225 ... &times;
226 <parameter>D<subscript>k</subscript></parameter>
227 is the set of all k-tuples
228 <parameter>v<subscript>1</subscript></parameter>,
229 <parameter>v<subscript>2</subscript></parameter>,
231 <parameter>v<subscript>k</subscript></parameter>,
232 such that
233 <parameter>v<subscript>1</subscript></parameter> &isin;
234 <parameter>D<subscript>1</subscript></parameter>,
235 <parameter>v<subscript>2</subscript></parameter> &isin;
236 <parameter>D<subscript>2</subscript></parameter>,
238 <parameter>v<subscript>k</subscript></parameter> &isin;
239 <parameter>D<subscript>k</subscript></parameter>.
240 </para>
242 <para>
243 For example, when we have
244 <!--
245 $k=2$, $D_{1}=\{0,1\}$ and
246 $D_{2}=\{a,b,c\}$, then $D_{1} \times D_{2}$ is
247 $\{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)\}$.
249 <parameter>k</parameter>=2,
250 <parameter>D<subscript>1</subscript></parameter>=<literal>{0,1}</literal> and
251 <parameter>D<subscript>2</subscript></parameter>=<literal>{a,b,c}</literal> then
252 <parameter>D<subscript>1</subscript></parameter> &times;
253 <parameter>D<subscript>2</subscript></parameter> is
254 <literal>{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)}</literal>.
255 </para>
257 <para>
258 <!--
259 \begin{definition}
260 A Relation is any subset of the Cartesian product of one or more
261 domains: $R \subseteq$ \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$}
262 \end{definition}
264 A Relation is any subset of the Cartesian product of one or more
265 domains: <parameter>R</parameter> &sube;
266 <parameter>D<subscript>1</subscript></parameter> &times;
267 <parameter>D<subscript>2</subscript></parameter> &times;
268 ... &times;
269 <parameter>D<subscript>k</subscript></parameter>.
270 </para>
272 <para>
273 For example <literal>{(0,a),(0,b),(1,a)}</literal> is a relation;
274 it is in fact a subset of
275 <parameter>D<subscript>1</subscript></parameter> &times;
276 <parameter>D<subscript>2</subscript></parameter>
277 mentioned above.
278 </para>
280 <para>
281 The members of a relation are called tuples. Each relation of some
282 Cartesian product
283 <parameter>D<subscript>1</subscript></parameter> &times;
284 <parameter>D<subscript>2</subscript></parameter> &times;
285 ... &times;
286 <parameter>D<subscript>k</subscript></parameter>
287 is said to have arity <literal>k</literal> and is therefore a set
288 of <literal>k</literal>-tuples.
289 </para>
291 <para>
292 A relation can be viewed as a table (as we already did, remember
293 <xref linkend="supplier-fig" endterm="supplier-fig"> where
294 every tuple is represented by a row and every column corresponds to
295 one component of a tuple. Giving names (called attributes) to the
296 columns leads to the definition of a
297 <firstterm>relation scheme</firstterm>.
298 </para>
300 <para>
301 <!--
302 \begin{definition}
303 A {\it relation scheme} $R$ is a finite set of attributes
304 \mbox{$\{A_{1},A_{2},\ldots,A_{k}\}$}. There is a domain $D_{i}$ for
305 each attribute $A_{i}, 1 \le i \le k$ where the values of the
306 attributes are taken from. We often write a relation scheme as
307 \mbox{$R(A_{1},A_{2},\ldots,A_{k})$}.
308 \end{definition}
310 A <firstterm>relation scheme</firstterm> <literal>R</literal> is a
311 finite set of attributes
312 <parameter>A<subscript>1</subscript></parameter>,
313 <parameter>A<subscript>2</subscript></parameter>,
315 <parameter>A<subscript>k</subscript></parameter>.
316 There is a domain
317 <parameter>D<subscript>i</subscript></parameter>,
318 for each attribute
319 <parameter>A<subscript>i</subscript></parameter>,
320 1 &lt;= <literal>i</literal> &lt;= <literal>k</literal>,
321 where the values of the attributes are taken from. We often write
322 a relation scheme as
323 <literal>R(<parameter>A<subscript>1</subscript></parameter>,
324 <parameter>A<subscript>2</subscript></parameter>,
326 <parameter>A<subscript>k</subscript></parameter>)</literal>.
328 <note>
329 <para>
330 A <firstterm>relation scheme</firstterm> is just a kind of template
331 whereas a <firstterm>relation</firstterm> is an instance of a
332 <firstterm>relation
333 scheme</firstterm>. The relation consists of tuples (and can
334 therefore be
335 viewed as a table); not so the relation scheme.
336 </para>
337 </note>
338 </para>
340 <sect2>
341 <title id="domains">Domains vs. Data Types</title>
343 <para>
344 We often talked about <firstterm>domains</firstterm>
345 in the last section. Recall that a
346 domain is, formally, just a set of values (e.g., the set of integers or
347 the real numbers). In terms of database systems we often talk of
348 <firstterm>data types</firstterm> instead of domains.
349 When we define a table we have to make
350 a decision about which attributes to include. Additionally we
351 have to decide which kind of data is going to be stored as
352 attribute values. For example the values of
353 <classname>SNAME</classname> from the table
354 <classname>SUPPLIER</classname> will be character strings,
355 whereas <classname>SNO</classname> will store
356 integers. We define this by assigning a data type to each
357 attribute. The type of <classname>SNAME</classname> will be
358 <type>VARCHAR(20)</type> (this is the <acronym>SQL</acronym> type
359 for character strings of length &lt;= 20),
360 the type of <classname>SNO</classname> will be
361 <type>INTEGER</type>. With the assignment of a data type we also
362 have selected
363 a domain for an attribute. The domain of
364 <classname>SNAME</classname> is the set of all
365 character strings of length &lt;= 20,
366 the domain of <classname>SNO</classname> is the set of
367 all integer numbers.
368 </para>
369 </sect2>
370 </sect1>
372 <sect1 id="relmodel-oper">
373 <title id="operations">Operations in the Relational Data Model</title>
375 <para>
376 In the previous section
377 (<xref linkend="formal-notion" endterm="formal-notion">)
378 we defined the mathematical notion of
379 the relational model. Now we know how the data can be stored using a
380 relational data model but we do not know what to do with all these
381 tables to retrieve something from the database yet. For example somebody
382 could ask for the names of all suppliers that sell the part
383 'Screw'. Therefore two rather different kinds of notations for
384 expressing operations on relations have been defined:
386 <itemizedlist>
387 <listitem>
388 <para>
389 The <firstterm>Relational Algebra</firstterm> which is an
390 algebraic notation,
391 where queries are expressed by applying specialized operators to the
392 relations.
393 </para>
394 </listitem>
396 <listitem>
397 <para>
398 The <firstterm>Relational Calculus</firstterm> which is a
399 logical notation,
400 where queries are expressed by formulating some logical restrictions
401 that the tuples in the answer must satisfy.
402 </para>
403 </listitem>
404 </itemizedlist>
405 </para>
407 <sect2>
408 <title id="rel-alg">Relational Algebra</title>
410 <para>
411 The <firstterm>Relational Algebra</firstterm> was introduced by
412 E. F. Codd in 1972. It consists of a set of operations on relations:
414 <itemizedlist>
415 <listitem>
416 <para>
417 SELECT (&sigma;): extracts <firstterm>tuples</firstterm> from
418 a relation that
419 satisfy a given restriction. Let <parameter>R</parameter> be a
420 table that contains an attribute
421 <parameter>A</parameter>.
422 &sigma;<subscript>A=a</subscript>(R) = {t &isin; R &mid; t(A) = a}
423 where <literal>t</literal> denotes a
424 tuple of <parameter>R</parameter> and <literal>t(A)</literal>
425 denotes the value of attribute <parameter>A</parameter> of
426 tuple <literal>t</literal>.
427 </para>
428 </listitem>
430 <listitem>
431 <para>
432 PROJECT (&pi;): extracts specified
433 <firstterm>attributes</firstterm> (columns) from a
434 relation. Let <classname>R</classname> be a relation
435 that contains an attribute <classname>X</classname>.
436 &pi;<subscript>X</subscript>(<classname>R</classname>) = {t(X) &mid; t &isin; <classname>R</classname>},
437 where <literal>t</literal>(<classname>X</classname>) denotes the value of
438 attribute <classname>X</classname> of tuple <literal>t</literal>.
439 </para>
440 </listitem>
442 <listitem>
443 <para>
444 PRODUCT (&times;): builds the Cartesian product of two
445 relations. Let <classname>R</classname> be a table with arity
446 <literal>k</literal><subscript>1</subscript> and let
447 <classname>S</classname> be a table with
448 arity <literal>k</literal><subscript>2</subscript>.
449 <classname>R</classname> &times; <classname>S</classname>
450 is the set of all
451 <literal>k</literal><subscript>1</subscript>
452 + <literal>k</literal><subscript>2</subscript>-tuples
453 whose first <literal>k</literal><subscript>1</subscript>
454 components form a tuple in <classname>R</classname> and whose last
455 <literal>k</literal><subscript>2</subscript> components form a
456 tuple in <classname>S</classname>.
457 </para>
458 </listitem>
460 <listitem>
461 <para>
462 UNION (&cup;): builds the set-theoretic union of two
463 tables. Given the tables <classname>R</classname> and
464 <classname>S</classname> (both must have the same arity),
465 the union <classname>R</classname> &cup; <classname>S</classname>
466 is the set of tuples that are in <classname>R</classname>
467 or <classname>S</classname> or both.
468 </para>
469 </listitem>
471 <listitem>
472 <para>
473 INTERSECT (&cap;): builds the set-theoretic intersection of two
474 tables. Given the tables <classname>R</classname> and
475 <classname>S</classname>,
476 <classname>R</classname> &cap; <classname>S</classname> is the
477 set of tuples
478 that are in <classname>R</classname> and in
479 <classname>S</classname>.
480 We again require that <classname>R</classname> and
481 <classname>S</classname> have the
482 same arity.
483 </para>
484 </listitem>
486 <listitem>
487 <para>
488 DIFFERENCE (&minus; or &setmn;): builds the set difference of
489 two tables. Let <classname>R</classname> and <classname>S</classname>
490 again be two tables with the same
491 arity. <classname>R</classname> - <classname>S</classname>
492 is the set of tuples in <classname>R</classname> but not in
493 <classname>S</classname>.
494 </para>
495 </listitem>
497 <listitem>
498 <para>
499 JOIN (&prod;): connects two tables by their common
500 attributes. Let <classname>R</classname> be a table with the
501 attributes <classname>A</classname>,<classname>B</classname>
502 and <classname>C</classname> and
503 let <classname>S</classname> be a table with the attributes
504 <classname>C</classname>,<classname>D</classname>
505 and <classname>E</classname>. There is one
506 attribute common to both relations,
507 the attribute <classname>C</classname>.
508 <!--
509 <classname>R</classname> &prod; <classname>S</classname> =
510 &pi;<subscript><classname>R</classname>.<classname>A</classname>,<classname>R</classname>.<classname>B</classname>,<classname>R</classname>.<classname>C</classname>,<classname>S</classname>.<classname>D</classname>,<classname>S</classname>.<classname>E</classname></subscript>(&sigma;<subscript><classname>R</classname>.<classname>C</classname>=<classname>S</classname>.<classname>C</classname></subscript>(<classname>R</classname> &times; <classname>S</classname>)).
512 R &prod; S = &pi;<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(&sigma;<subscript>R.C=S.C</subscript>(R &times; S)).
513 What are we doing here? We first calculate the Cartesian
514 product
515 <classname>R</classname> &times; <classname>S</classname>.
516 Then we select those tuples whose values for the common
517 attribute <classname>C</classname> are equal
518 (&sigma;<subscript>R.C = S.C</subscript>).
519 Now we have a table
520 that contains the attribute <classname>C</classname>
521 two times and we correct this by
522 projecting out the duplicate column.
523 </para>
525 <example>
526 <title id="join-example">An Inner Join</title>
528 <para>
529 Let's have a look at the tables that are produced by evaluating the steps
530 necessary for a join.
531 Let the following two tables be given:
533 <programlisting>
534 R: S:
535 A | B | C C | D | E
536 ---+---+--- ---+---+---
537 1 | 2 | 3 3 | a | b
538 4 | 5 | 6 6 | c | d
539 7 | 8 | 9
540 </programlisting>
541 </para>
542 </example>
544 <para>
545 First we calculate the Cartesian product
546 <classname>R</classname> &times; <classname>S</classname> and
547 get:
549 <programlisting>
550 R x S:
551 A | B | R.C | S.C | D | E
552 ---+---+-----+-----+---+---
553 1 | 2 | 3 | 3 | a | b
554 1 | 2 | 3 | 6 | c | d
555 4 | 5 | 6 | 3 | a | b
556 4 | 5 | 6 | 6 | c | d
557 7 | 8 | 9 | 3 | a | b
558 7 | 8 | 9 | 6 | c | d
559 </programlisting>
560 </para>
562 <para>
563 After the selection
564 &sigma;<subscript>R.C=S.C</subscript>(R &times; S)
565 we get:
567 <programlisting>
568 A | B | R.C | S.C | D | E
569 ---+---+-----+-----+---+---
570 1 | 2 | 3 | 3 | a | b
571 4 | 5 | 6 | 6 | c | d
572 </programlisting>
573 </para>
575 <para>
576 To remove the duplicate column
577 <classname>S</classname>.<classname>C</classname>
578 we project it out by the following operation:
579 &pi;<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(&sigma;<subscript>R.C=S.C</subscript>(R &times; S))
580 and get:
582 <programlisting>
583 A | B | C | D | E
584 ---+---+---+---+---
585 1 | 2 | 3 | a | b
586 4 | 5 | 6 | c | d
587 </programlisting>
588 </para>
589 </listitem>
591 <listitem>
592 <para>
593 DIVIDE (&divide;): Let <classname>R</classname> be a table
594 with the attributes A, B, C, and D and let
595 <classname>S</classname> be a table with the attributes
596 C and D.
597 Then we define the division as:
599 <programlisting>
600 R &divide; S = {t &mid; &forall; t<subscript>s</subscript> &isin; S &exist; t<subscript>r</subscript> &isin; R
601 </programlisting>
603 such that
604 t<subscript>r</subscript>(A,B)=t&and;t<subscript>r</subscript>(C,D)=t<subscript>s</subscript>}
605 where
606 t<subscript>r</subscript>(x,y)
607 denotes a
608 tuple of table <classname>R</classname> that consists only of
609 the components <literal>x</literal> and <literal>y</literal>.
610 Note that the tuple <literal>t</literal> only consists of the
611 components <classname>A</classname> and
612 <classname>B</classname> of relation <classname>R</classname>.
613 </para>
615 <para id="divide-example">
616 Given the following tables
618 <programlisting>
619 R: S:
620 A | B | C | D C | D
621 ---+---+---+--- ---+---
622 a | b | c | d c | d
623 a | b | e | f e | f
624 b | c | e | f
625 e | d | c | d
626 e | d | e | f
627 a | b | d | e
628 </programlisting>
630 R &divide; S
631 is derived as
633 <programlisting>
634 A | B
635 ---+---
636 a | b
637 e | d
638 </programlisting>
639 </para>
640 </listitem>
641 </itemizedlist>
642 </para>
644 <para>
645 For a more detailed description and definition of the relational
646 algebra refer to [<xref linkend="ULL88" endterm="ULL88">] or
647 [<xref linkend="DATE04" endterm="DATE04">].
648 </para>
650 <example>
651 <title id="suppl-rel-alg">A Query Using Relational Algebra</title>
652 <para>
653 Recall that we formulated all those relational operators to be able to
654 retrieve data from the database. Let's return to our example from
655 the previous
656 section (<xref linkend="operations" endterm="operations">)
657 where someone wanted to know the names of all
658 suppliers that sell the part <literal>Screw</literal>.
659 This question can be answered
660 using relational algebra by the following operation:
662 <programlisting>
663 &pi;<subscript>SUPPLIER.SNAME</subscript>(&sigma;<subscript>PART.PNAME='Screw'</subscript>(SUPPLIER &prod; SELLS &prod; PART))
664 </programlisting>
665 </para>
667 <para>
668 We call such an operation a query. If we evaluate the above query
669 against the our example tables
670 (<xref linkend="supplier-fig" endterm="supplier-fig">)
671 we will obtain the following result:
673 <programlisting>
674 SNAME
675 -------
676 Smith
677 Adams
678 </programlisting>
679 </para>
680 </example>
681 </sect2>
683 <sect2 id="rel-calc">
684 <title>Relational Calculus</title>
686 <para>
687 The relational calculus is based on the
688 <firstterm>first order logic</firstterm>. There are
689 two variants of the relational calculus:
691 <itemizedlist>
692 <listitem>
693 <para>
694 The <firstterm>Domain Relational Calculus</firstterm>
695 (<acronym>DRC</acronym>), where variables
696 stand for components (attributes) of the tuples.
697 </para>
698 </listitem>
700 <listitem>
701 <para>
702 The <firstterm>Tuple Relational Calculus</firstterm>
703 (<acronym>TRC</acronym>), where variables stand for tuples.
704 </para>
705 </listitem>
706 </itemizedlist>
707 </para>
709 <para>
710 We want to discuss the tuple relational calculus only because it is
711 the one underlying the most relational languages. For a detailed
712 discussion on <acronym>DRC</acronym> (and also
713 <acronym>TRC</acronym>) see
714 <xref linkend="DATE04" endterm="DATE04">
716 <xref linkend="ULL88" endterm="ULL88">.
717 </para>
718 </sect2>
720 <sect2>
721 <title>Tuple Relational Calculus</title>
723 <para>
724 The queries used in <acronym>TRC</acronym> are of the following
725 form:
727 <programlisting>
728 x(A) &mid; F(x)
729 </programlisting>
731 where <literal>x</literal> is a tuple variable
732 <classname>A</classname> is a set of attributes and <literal>F</literal> is a
733 formula. The resulting relation consists of all tuples
734 <literal>t(A)</literal> that satisfy <literal>F(t)</literal>.
735 </para>
737 <para>
738 If we want to answer the question from example
739 <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">
740 using <acronym>TRC</acronym> we formulate the following query:
742 <programlisting>
743 {x(SNAME) &mid; x &isin; SUPPLIER &and;
744 &exist; y &isin; SELLS &exist; z &isin; PART (y(SNO)=x(SNO) &and;
745 z(PNO)=y(PNO) &and;
746 z(PNAME)='Screw')}
747 </programlisting>
748 </para>
750 <para>
751 Evaluating the query against the tables from
752 <xref linkend="supplier-fig" endterm="supplier-fig">
753 again leads to the same result
754 as in
755 <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">.
756 </para>
757 </sect2>
759 <sect2 id="alg-vs-calc">
760 <title>Relational Algebra vs. Relational Calculus</title>
762 <para>
763 The relational algebra and the relational calculus have the same
764 <firstterm>expressive power</firstterm>; i.e. all queries that
765 can be formulated using relational algebra can also be formulated
766 using the relational calculus and vice versa.
767 This was first proved by E. F. Codd in
768 1972. This proof is based on an algorithm (<quote>Codd's reduction
769 algorithm</quote>) by which an arbitrary expression of the relational
770 calculus can be reduced to a semantically equivalent expression of
771 relational algebra. For a more detailed discussion on that refer to
772 <xref linkend="DATE04" endterm="DATE04">
774 <xref linkend="ULL88" endterm="ULL88">.
775 </para>
777 <para>
778 It is sometimes said that languages based on the relational
779 calculus are <quote>higher level</quote> or <quote>more
780 declarative</quote> than languages based on relational algebra
781 because the algebra (partially) specifies the order of operations
782 while the calculus leaves it to a compiler or interpreter to
783 determine the most efficient order of evaluation.
784 </para>
785 </sect2>
786 </sect1>
788 <sect1 id="sql-language">
789 <title>The <acronym>SQL</acronym> Language</title>
791 <para>
792 As is the case with most modern relational languages,
793 <acronym>SQL</acronym> is based on the tuple
794 relational calculus. As a result every query that can be formulated
795 using the tuple relational calculus (or equivalently, relational
796 algebra) can also be formulated using
797 <acronym>SQL</acronym>. There are, however,
798 capabilities beyond the scope of relational algebra or calculus. Here
799 is a list of some additional features provided by
800 <acronym>SQL</acronym> that are not
801 part of relational algebra or calculus:
803 <itemizedlist>
804 <listitem>
805 <para>
806 Commands for insertion, deletion or modification of data.
807 </para>
808 </listitem>
810 <listitem>
811 <para>
812 Arithmetic capability: In <acronym>SQL</acronym> it is possible
813 to involve
814 arithmetic operations as well as comparisons, e.g.
816 <programlisting>
817 A &lt; B + 3.
818 </programlisting>
820 Note
821 that + or other arithmetic operators appear neither in relational
822 algebra nor in relational calculus.
823 </para>
824 </listitem>
826 <listitem>
827 <para>
828 Assignment and Print Commands: It is possible to print a
829 relation constructed by a query and to assign a computed relation to a
830 relation name.
831 </para>
832 </listitem>
834 <listitem>
835 <para>
836 Aggregate Functions: Operations such as
837 <firstterm>average</firstterm>, <firstterm>sum</firstterm>,
838 <firstterm>max</firstterm>, etc. can be applied to columns of a
839 relation to
840 obtain a single quantity.
841 </para>
842 </listitem>
843 </itemizedlist>
844 </para>
846 <sect2 id="select">
847 <title id="select-title">Select</title>
849 <para>
850 The most often used command in <acronym>SQL</acronym> is the
851 <command>SELECT</command> statement,
852 used to retrieve data. The syntax is:
854 <synopsis>
855 SELECT [ ALL | DISTINCT [ ON ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) ] ]
856 * | <replaceable class="PARAMETER">expression</replaceable> [ [ AS ] <replaceable class="PARAMETER">output_name</replaceable> ] [, ...]
857 [ INTO [ TEMPORARY | TEMP ] [ TABLE ] <replaceable class="PARAMETER">new_table</replaceable> ]
858 [ FROM <replaceable class="PARAMETER">from_item</replaceable> [, ...] ]
859 [ WHERE <replaceable class="PARAMETER">condition</replaceable> ]
860 [ GROUP BY <replaceable class="PARAMETER">expression</replaceable> [, ...] ]
861 [ HAVING <replaceable class="PARAMETER">condition</replaceable> [, ...] ]
862 [ { UNION | INTERSECT | EXCEPT } [ ALL ] <replaceable class="PARAMETER">select</replaceable> ]
863 [ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable class="parameter">operator</replaceable> ] [ NULLS { FIRST | LAST } ] [, ...] ]
864 [ LIMIT { <replaceable class="PARAMETER">count</replaceable> | ALL } ]
865 [ OFFSET <replaceable class="PARAMETER">start</replaceable> ]
866 [ FOR { UPDATE | SHARE } [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT ] [...] ]
867 </synopsis>
868 </para>
870 <para>
871 Now we will illustrate the complex syntax of the
872 <command>SELECT</command> statement with various examples. The
873 tables used for the examples are defined in <xref
874 linkend="supplier-fig" endterm="supplier-fig">.
875 </para>
877 <sect3>
878 <title>Simple Selects</title>
880 <para>
881 Here are some simple examples using a <command>SELECT</command> statement:
883 <example>
884 <title id="simple-query">Simple Query with Qualification</title>
885 <para>
886 To retrieve all tuples from table PART where the attribute PRICE is
887 greater than 10 we formulate the following query:
889 <programlisting>
890 SELECT * FROM PART
891 WHERE PRICE &gt; 10;
892 </programlisting>
894 and get the table:
896 <programlisting>
897 PNO | PNAME | PRICE
898 -----+---------+--------
899 3 | Bolt | 15
900 4 | Cam | 25
901 </programlisting>
902 </para>
904 <para>
905 Using <quote>*</quote> in the <command>SELECT</command> statement
906 will deliver all attributes from the table. If we want to retrieve
907 only the attributes PNAME and PRICE from table PART we use the
908 statement:
910 <programlisting>
911 SELECT PNAME, PRICE
912 FROM PART
913 WHERE PRICE &gt; 10;
914 </programlisting>
916 In this case the result is:
918 <programlisting>
919 PNAME | PRICE
920 --------+--------
921 Bolt | 15
922 Cam | 25
923 </programlisting>
925 Note that the <acronym>SQL</acronym> <command>SELECT</command>
926 corresponds to the <quote>projection</quote> in relational algebra
927 not to the <quote>selection</quote> (see <xref linkend="rel-alg"
928 endterm="rel-alg"> for more details).
929 </para>
931 <para>
932 The qualifications in the WHERE clause can also be logically connected
933 using the keywords OR, AND, and NOT:
935 <programlisting>
936 SELECT PNAME, PRICE
937 FROM PART
938 WHERE PNAME = 'Bolt' AND
939 (PRICE = 0 OR PRICE &lt;= 15);
940 </programlisting>
942 will lead to the result:
944 <programlisting>
945 PNAME | PRICE
946 --------+--------
947 Bolt | 15
948 </programlisting>
949 </para>
951 <para>
952 Arithmetic operations can be used in the target list and in the WHERE
953 clause. For example if we want to know how much it would cost if we
954 take two pieces of a part we could use the following query:
956 <programlisting>
957 SELECT PNAME, PRICE * 2 AS DOUBLE
958 FROM PART
959 WHERE PRICE * 2 &lt; 50;
960 </programlisting>
962 and we get:
964 <programlisting>
965 PNAME | DOUBLE
966 --------+---------
967 Screw | 20
968 Nut | 16
969 Bolt | 30
970 </programlisting>
972 Note that the word DOUBLE after the keyword AS is the new title of the
973 second column. This technique can be used for every element of the
974 target list to assign a new title to the resulting
975 column. This new title
976 is often referred to as alias. The alias cannot be used throughout the
977 rest of the query.
978 </para>
979 </example>
980 </para>
981 </sect3>
983 <sect3>
984 <title>Joins</title>
986 <para id="simple-join">
987 The following example shows how <firstterm>joins</firstterm> are
988 realized in <acronym>SQL</acronym>.
989 </para>
991 <para>
992 To join the three tables SUPPLIER, PART and SELLS over their common
993 attributes we formulate the following statement:
995 <programlisting>
996 SELECT S.SNAME, P.PNAME
997 FROM SUPPLIER S, PART P, SELLS SE
998 WHERE S.SNO = SE.SNO AND
999 P.PNO = SE.PNO;
1000 </programlisting>
1002 and get the following table as a result:
1004 <programlisting>
1005 SNAME | PNAME
1006 -------+-------
1007 Smith | Screw
1008 Smith | Nut
1009 Jones | Cam
1010 Adams | Screw
1011 Adams | Bolt
1012 Blake | Nut
1013 Blake | Bolt
1014 Blake | Cam
1015 </programlisting>
1016 </para>
1018 <para>
1019 In the FROM clause we introduced an alias name for every relation
1020 because there are common named attributes (SNO and PNO) among the
1021 relations. Now we can distinguish between the common named attributes
1022 by simply prefixing the attribute name with the alias name followed by
1023 a dot. The join is calculated in the same way as shown in
1024 <xref linkend="join-example" endterm="join-example">.
1025 First the Cartesian product
1027 SUPPLIER &times; PART &times; SELLS
1029 is derived. Now only those tuples satisfying the
1030 conditions given in the WHERE clause are selected (i.e. the common
1031 named attributes have to be equal). Finally we project out all
1032 columns but S.SNAME and P.PNAME.
1033 </para>
1035 <para>
1036 Another way to perform joins is to use the SQL JOIN syntax as follows:
1037 <programlisting>
1038 select sname, pname from supplier
1039 JOIN sells USING (sno)
1040 JOIN part USING (pno);
1041 </programlisting>
1042 giving again:
1043 <programlisting>
1044 sname | pname
1045 -------+-------
1046 Smith | Screw
1047 Adams | Screw
1048 Smith | Nut
1049 Blake | Nut
1050 Adams | Bolt
1051 Blake | Bolt
1052 Jones | Cam
1053 Blake | Cam
1054 (8 rows)
1055 </programlisting>
1056 </para>
1058 <para>
1059 A joined table, created using JOIN syntax, is a table reference list
1060 item that occurs in a FROM clause and before any WHERE, GROUP BY,
1061 or HAVING clause. Other table references, including table names or
1062 other JOIN clauses, can be included in the FROM clause if separated
1063 by commas. JOINed tables are logically like any other
1064 table listed in the FROM clause.
1065 </para>
1067 <para>
1068 SQL JOINs come in two main types, CROSS JOINs (unqualified joins)
1069 and <firstterm>qualified JOINs</>. Qualified joins can be further
1070 subdivided based on the way in which the <firstterm>join condition</>
1071 is specified (ON, USING, or NATURAL) and the way in which it is
1072 applied (INNER or OUTER join).
1073 </para>
1075 <variablelist>
1076 <title>Join Types</title>
1077 <varlistentry>
1078 <term>CROSS JOIN</term>
1079 <listitem>
1080 <cmdsynopsis>
1081 <arg choice="req"> <replaceable class="parameter">T1</replaceable> </arg>
1082 <command> CROSS JOIN </command>
1083 <arg choice="req"> <replaceable class="parameter">T2</replaceable> </arg>
1084 </cmdsynopsis>
1086 <para>
1087 A cross join takes two tables T1 and T2 having N and M rows
1088 respectively, and returns a joined table containing all
1089 N*M possible joined rows. For each row R1 of T1, each row
1090 R2 of T2 is joined with R1 to yield a joined table row JR
1091 consisting of all fields in R1 and R2. A CROSS JOIN is
1092 equivalent to an INNER JOIN ON TRUE.
1093 </para>
1094 </listitem>
1095 </varlistentry>
1097 <varlistentry>
1098 <term>Qualified JOINs</term>
1099 <listitem>
1101 <cmdsynopsis>
1102 <arg choice="req"> <replaceable class="parameter">T1</replaceable> </arg>
1103 <arg choice="opt"> NATURAL </arg>
1104 <group choice="opt">
1105 <arg choice="opt"> INNER </arg>
1106 <arg>
1107 <group choice="req">
1108 <arg choice="plain"> LEFT </arg>
1109 <arg choice="plain"> RIGHT </arg>
1110 <arg choice="plain"> FULL </arg>
1111 </group>
1112 <arg choice="opt"> OUTER </arg>
1113 </arg>
1114 </group>
1115 <command> JOIN </command>
1116 <arg choice="req"> <replaceable class="parameter">T2</replaceable> </arg>
1117 <group choice="req">
1118 <arg> ON <replaceable>search condition</replaceable></arg>
1119 <arg> USING ( <replaceable>join column list</replaceable> ) </arg>
1120 </group>
1121 </cmdsynopsis>
1123 <para>
1124 A qualified JOIN must specify its join condition
1125 by providing one (and only one) of NATURAL, ON, or
1126 USING. The ON clause
1127 takes a <replaceable>search condition</replaceable>,
1128 which is the same as in a WHERE clause. The USING
1129 clause takes a comma-separated list of column names,
1130 which the joined tables must have in common, and joins
1131 the tables on equality of those columns. NATURAL is
1132 shorthand for a USING clause that lists all the common
1133 column names of the two tables. A side-effect of both
1134 USING and NATURAL is that only one copy of each joined
1135 column is emitted into the result table (compare the
1136 relational-algebra definition of JOIN, shown earlier).
1137 </para>
1139 <!-- begin join semantics -->
1140 <variablelist>
1141 <varlistentry>
1142 <term>
1143 <cmdsynopsis>
1144 <arg> INNER </arg>
1145 <command> JOIN </command>
1146 </cmdsynopsis>
1147 </term>
1148 <listitem>
1149 <para>
1150 For each row R1 of T1, the joined table has a row for each row
1151 in T2 that satisfies the join condition with R1.
1152 </para>
1153 <tip>
1154 <para>
1155 The words INNER and OUTER are optional for all JOINs.
1156 INNER is the default. LEFT, RIGHT, and FULL imply an
1157 OUTER JOIN.
1158 </para>
1159 </tip>
1160 </listitem>
1161 </varlistentry>
1162 <varlistentry>
1163 <term>
1164 <cmdsynopsis>
1165 <arg choice="plain"> LEFT </arg>
1166 <arg> OUTER </arg>
1167 <command> JOIN </command>
1168 </cmdsynopsis>
1169 </term>
1170 <listitem>
1171 <para>
1172 First, an INNER JOIN is performed.
1173 Then, for each row in T1 that does not satisfy the join
1174 condition with any row in T2, an additional joined row is
1175 returned with null fields in the columns from T2.
1176 </para>
1177 <tip>
1178 <para>
1179 The joined table unconditionally has a row for each row in T1.
1180 </para>
1181 </tip>
1182 </listitem>
1183 </varlistentry>
1184 <varlistentry>
1185 <term>
1186 <cmdsynopsis>
1187 <arg choice="plain"> RIGHT </arg>
1188 <arg> OUTER </arg>
1189 <command> JOIN </command>
1190 </cmdsynopsis>
1191 </term>
1192 <listitem>
1193 <para>
1194 First, an INNER JOIN is performed.
1195 Then, for each row in T2 that does not satisfy the join
1196 condition with any row in T1, an additional joined row is
1197 returned with null fields in the columns from T1.
1198 </para>
1199 <tip>
1200 <para>
1201 The joined table unconditionally has a row for each row in T2.
1202 </para>
1203 </tip>
1204 </listitem>
1205 </varlistentry>
1206 <varlistentry>
1207 <term>
1208 <cmdsynopsis>
1209 <arg choice="plain"> FULL </arg>
1210 <arg> OUTER </arg>
1211 <command> JOIN </command>
1212 </cmdsynopsis>
1213 </term>
1214 <listitem>
1215 <para>
1216 First, an INNER JOIN is performed.
1217 Then, for each row in T1 that does not satisfy the join
1218 condition with any row in T2, an additional joined row is
1219 returned with null fields in the columns from T2.
1220 Also, for each row in T2 that does not satisfy the join
1221 condition with any row in T1, an additional joined row is
1222 returned with null fields in the columns from T1.
1223 </para>
1224 <tip>
1225 <para>
1226 The joined table unconditionally has a row for every row of T1
1227 and a row for every row of T2.
1228 </para>
1229 </tip>
1230 </listitem>
1231 </varlistentry>
1232 </variablelist>
1233 <!-- end join semantics -->
1235 </listitem>
1236 </varlistentry>
1237 </variablelist>
1239 <para>
1240 JOINs of all types can be chained together or nested where either or both of
1241 <replaceable class="parameter">T1</replaceable> and
1242 <replaceable class="parameter">T2</replaceable> can be JOINed tables.
1243 Parenthesis can be used around JOIN clauses to control the order
1244 of JOINs which are otherwise processed left to right.
1245 </para>
1247 </sect3>
1249 <sect3>
1250 <title id="aggregates-tutorial">Aggregate Functions</title>
1252 <para>
1253 <acronym>SQL</acronym> provides aggregate functions such as AVG,
1254 COUNT, SUM, MIN, and MAX. The argument(s) of an aggregate function
1255 are evaluated at each row that satisfies the WHERE
1256 clause, and the aggregate function is calculated over this set
1257 of input values. Normally, an aggregate delivers a single
1258 result for a whole <command>SELECT</command> statement. But if
1259 grouping is specified in the query, then a separate calculation
1260 is done over the rows of each group, and an aggregate result is
1261 delivered per group (see next section).
1263 <example>
1264 <title id="aggregates-example">Aggregates</title>
1266 <para>
1267 If we want to know the average cost of all parts in table PART we use
1268 the following query:
1270 <programlisting>
1271 SELECT AVG(PRICE) AS AVG_PRICE
1272 FROM PART;
1273 </programlisting>
1274 </para>
1276 <para>
1277 The result is:
1279 <programlisting>
1280 AVG_PRICE
1281 -----------
1282 14.5
1283 </programlisting>
1284 </para>
1286 <para>
1287 If we want to know how many parts are defined in table PART we use
1288 the statement:
1290 <programlisting>
1291 SELECT COUNT(PNO)
1292 FROM PART;
1293 </programlisting>
1295 and get:
1297 <programlisting>
1298 COUNT
1299 -------
1301 </programlisting>
1303 </para>
1304 </example>
1305 </para>
1306 </sect3>
1308 <sect3>
1309 <title>Aggregation by Groups</title>
1311 <para>
1312 <acronym>SQL</acronym> allows one to partition the tuples of a table
1313 into groups. Then the
1314 aggregate functions described above can be applied to the groups &mdash;
1315 i.e. the value of the aggregate function is no longer calculated over
1316 all the values of the specified column but over all values of a
1317 group. Thus the aggregate function is evaluated separately for every
1318 group.
1319 </para>
1321 <para>
1322 The partitioning of the tuples into groups is done by using the
1323 keywords <command>GROUP BY</command> followed by a list of
1324 attributes that define the
1325 groups. If we have
1326 <command>GROUP BY A<subscript>1</subscript>, &tdot;, A<subscript>k</subscript></command>
1327 we partition
1328 the relation into groups, such that two tuples are in the same group
1329 if and only if they agree on all the attributes
1330 A<subscript>1</subscript>, &tdot;, A<subscript>k</subscript>.
1332 <example>
1333 <title id="aggregates-groupby">Aggregates</title>
1334 <para>
1335 If we want to know how many parts are sold by every supplier we
1336 formulate the query:
1338 <programlisting>
1339 SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
1340 FROM SUPPLIER S, SELLS SE
1341 WHERE S.SNO = SE.SNO
1342 GROUP BY S.SNO, S.SNAME;
1343 </programlisting>
1345 and get:
1347 <programlisting>
1348 SNO | SNAME | COUNT
1349 -----+-------+-------
1350 1 | Smith | 2
1351 2 | Jones | 1
1352 3 | Adams | 2
1353 4 | Blake | 3
1354 </programlisting>
1355 </para>
1357 <para>
1358 Now let's have a look of what is happening here.
1359 First the join of the
1360 tables SUPPLIER and SELLS is derived:
1362 <programlisting>
1363 S.SNO | S.SNAME | SE.PNO
1364 -------+---------+--------
1365 1 | Smith | 1
1366 1 | Smith | 2
1367 2 | Jones | 4
1368 3 | Adams | 1
1369 3 | Adams | 3
1370 4 | Blake | 2
1371 4 | Blake | 3
1372 4 | Blake | 4
1373 </programlisting>
1374 </para>
1376 <para>
1377 Next we partition the tuples into groups by putting all tuples
1378 together that agree on both attributes S.SNO and S.SNAME:
1380 <programlisting>
1381 S.SNO | S.SNAME | SE.PNO
1382 -------+---------+--------
1383 1 | Smith | 1
1385 --------------------------
1386 2 | Jones | 4
1387 --------------------------
1388 3 | Adams | 1
1390 --------------------------
1391 4 | Blake | 2
1394 </programlisting>
1395 </para>
1397 <para>
1398 In our example we got four groups and now we can apply the aggregate
1399 function COUNT to every group leading to the final result of the query
1400 given above.
1401 </para>
1402 </example>
1403 </para>
1405 <para>
1406 Note that for a query using GROUP BY and aggregate
1407 functions to make sense, the target list can only refer directly to
1408 the attributes being grouped by. Other attributes can only be used
1409 inside the arguments of aggregate functions. Otherwise there would
1410 not be a unique value to associate with the other attributes.
1411 </para>
1413 <para>
1414 Also observe that it makes no sense to ask for an aggregate of
1415 an aggregate, e.g., AVG(MAX(sno)), because a
1416 <command>SELECT</command> only does one pass of grouping and
1417 aggregation. You can get a result of this kind by using a
1418 temporary table or a sub-SELECT in the FROM clause to do the
1419 first level of aggregation.
1420 </para>
1421 </sect3>
1423 <sect3>
1424 <title>Having</title>
1426 <para>
1427 The HAVING clause works much like the WHERE clause and is used to
1428 consider only those groups satisfying the qualification given in the
1429 HAVING clause. Essentially, WHERE filters out unwanted input rows
1430 before grouping and aggregation are done, whereas HAVING filters out
1431 unwanted group rows post-GROUP. Therefore, WHERE cannot refer to the
1432 results of aggregate functions. On the other hand, there's no point
1433 in writing a HAVING condition that doesn't involve an aggregate
1434 function! If your condition doesn't involve aggregates, you might
1435 as well write it in WHERE, and thereby avoid the computation of
1436 aggregates for groups that you're just going to throw away anyway.
1438 <example>
1439 <title id="having-example">Having</title>
1441 <para>
1442 If we want only those suppliers selling more than one part we use the
1443 query:
1445 <programlisting>
1446 SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
1447 FROM SUPPLIER S, SELLS SE
1448 WHERE S.SNO = SE.SNO
1449 GROUP BY S.SNO, S.SNAME
1450 HAVING COUNT(SE.PNO) &gt; 1;
1451 </programlisting>
1453 and get:
1455 <programlisting>
1456 SNO | SNAME | COUNT
1457 -----+-------+-------
1458 1 | Smith | 2
1459 3 | Adams | 2
1460 4 | Blake | 3
1461 </programlisting>
1462 </para>
1463 </example>
1464 </para>
1465 </sect3>
1467 <sect3>
1468 <title>Subqueries</title>
1470 <para>
1471 In the WHERE and HAVING clauses the use of subqueries (subselects) is
1472 allowed in every place where a value is expected. In this case the
1473 value must be derived by evaluating the subquery first. The usage of
1474 subqueries extends the expressive power of
1475 <acronym>SQL</acronym>.
1477 <example>
1478 <title id="subselect-example">Subselect</title>
1480 <para>
1481 If we want to know all parts having a greater price than the part
1482 named 'Screw' we use the query:
1484 <programlisting>
1485 SELECT *
1486 FROM PART
1487 WHERE PRICE &gt; (SELECT PRICE FROM PART
1488 WHERE PNAME='Screw');
1489 </programlisting>
1490 </para>
1492 <para>
1493 The result is:
1495 <programlisting>
1496 PNO | PNAME | PRICE
1497 -----+---------+--------
1498 3 | Bolt | 15
1499 4 | Cam | 25
1500 </programlisting>
1501 </para>
1503 <para>
1504 When we look at the above query we can see the keyword
1505 <command>SELECT</command> two times. The first one at the
1506 beginning of the query - we will refer to it as outer
1507 <command>SELECT</command> - and the one in the WHERE clause which
1508 begins a nested query - we will refer to it as inner
1509 <command>SELECT</command>. For every tuple of the outer
1510 <command>SELECT</command> the inner <command>SELECT</command> has
1511 to be evaluated. After every evaluation we know the price of the
1512 tuple named 'Screw' and we can check if the price of the actual
1513 tuple is greater. (Actually, in this example the inner query need
1514 only be evaluated once, since it does not depend on the state of
1515 the outer query.)
1516 </para>
1518 <para>
1519 If we want to know all suppliers that do not sell any part
1520 (e.g. to be able to remove these suppliers from the database) we use:
1522 <programlisting>
1523 SELECT *
1524 FROM SUPPLIER S
1525 WHERE NOT EXISTS
1526 (SELECT * FROM SELLS SE
1527 WHERE SE.SNO = S.SNO);
1528 </programlisting>
1529 </para>
1531 <para>
1532 In our example the result will be empty because every supplier
1533 sells at least one part. Note that we use S.SNO from the outer
1534 <command>SELECT</command> within the WHERE clause of the inner
1535 <command>SELECT</command>. Here the subquery must be evaluated
1536 afresh for each tuple from the outer query, i.e. the value for
1537 S.SNO is always taken from the current tuple of the outer
1538 <command>SELECT</command>.
1539 </para>
1540 </example>
1541 </para>
1542 </sect3>
1544 <sect3>
1545 <title>Subqueries in FROM</title>
1547 <para>
1548 A somewhat different way of using subqueries is to put them in the
1549 FROM clause. This is a useful feature because a subquery of this
1550 kind can output multiple columns and rows, whereas a subquery used
1551 in an expression must deliver just a single result. It also lets
1552 us get more than one round of grouping/aggregation without resorting
1553 to a temporary table.
1555 <example>
1556 <title id="subselect-in-from-example">Subselect in FROM</title>
1558 <para>
1559 If we want to know the highest average part price among all our
1560 suppliers, we cannot write MAX(AVG(PRICE)), but we can write:
1562 <programlisting>
1563 SELECT MAX(subtable.avgprice)
1564 FROM (SELECT AVG(P.PRICE) AS avgprice
1565 FROM SUPPLIER S, PART P, SELLS SE
1566 WHERE S.SNO = SE.SNO AND
1567 P.PNO = SE.PNO
1568 GROUP BY S.SNO) subtable;
1569 </programlisting>
1571 The subquery returns one row per supplier (because of its GROUP BY)
1572 and then we aggregate over those rows in the outer query.
1573 </para>
1574 </example>
1575 </para>
1576 </sect3>
1578 <sect3>
1579 <title>Union, Intersect, Except</title>
1581 <para>
1582 These operations calculate the union, intersection and set theoretic
1583 difference of the tuples derived by two subqueries.
1585 <example>
1586 <title id="union-example">Union, Intersect, Except</title>
1588 <para>
1589 The following query is an example for UNION:
1591 <programlisting>
1592 SELECT S.SNO, S.SNAME, S.CITY
1593 FROM SUPPLIER S
1594 WHERE S.SNAME = 'Jones'
1595 UNION
1596 SELECT S.SNO, S.SNAME, S.CITY
1597 FROM SUPPLIER S
1598 WHERE S.SNAME = 'Adams';
1599 </programlisting>
1601 gives the result:
1603 <programlisting>
1604 SNO | SNAME | CITY
1605 -----+-------+--------
1606 2 | Jones | Paris
1607 3 | Adams | Vienna
1608 </programlisting>
1609 </para>
1611 <para>
1612 Here is an example for INTERSECT:
1614 <programlisting>
1615 SELECT S.SNO, S.SNAME, S.CITY
1616 FROM SUPPLIER S
1617 WHERE S.SNO &gt; 1
1618 INTERSECT
1619 SELECT S.SNO, S.SNAME, S.CITY
1620 FROM SUPPLIER S
1621 WHERE S.SNO &lt; 3;
1622 </programlisting>
1624 gives the result:
1626 <programlisting>
1627 SNO | SNAME | CITY
1628 -----+-------+--------
1629 2 | Jones | Paris
1630 </programlisting>
1632 The only tuple returned by both parts of the query is the one having SNO=2.
1633 </para>
1635 <para>
1636 Finally an example for EXCEPT:
1638 <programlisting>
1639 SELECT S.SNO, S.SNAME, S.CITY
1640 FROM SUPPLIER S
1641 WHERE S.SNO &gt; 1
1642 EXCEPT
1643 SELECT S.SNO, S.SNAME, S.CITY
1644 FROM SUPPLIER S
1645 WHERE S.SNO &gt; 3;
1646 </programlisting>
1648 gives the result:
1650 <programlisting>
1651 SNO | SNAME | CITY
1652 -----+-------+--------
1653 2 | Jones | Paris
1654 3 | Adams | Vienna
1655 </programlisting>
1656 </para>
1657 </example>
1658 </para>
1659 </sect3>
1660 </sect2>
1662 <sect2 id="datadef">
1663 <title>Data Definition</title>
1665 <para>
1666 There is a set of commands used for data definition included in the
1667 <acronym>SQL</acronym> language.
1668 </para>
1670 <sect3 id="create">
1671 <title id="create-title">Create Table</title>
1673 <para>
1674 The most fundamental command for data definition is the
1675 one that creates a new relation (a new table). The syntax of the
1676 <command>CREATE TABLE</command> command is:
1678 <synopsis>
1679 CREATE TABLE <replaceable class="parameter">table_name</replaceable>
1680 (<replaceable class="parameter">name_of_attr_1</replaceable> <replaceable class="parameter">type_of_attr_1</replaceable>
1681 [, <replaceable class="parameter">name_of_attr_2</replaceable> <replaceable class="parameter">type_of_attr_2</replaceable>
1682 [, ...]]);
1683 </synopsis>
1685 <example>
1686 <title id="table-create">Table Creation</title>
1688 <para>
1689 To create the tables defined in
1690 <xref linkend="supplier-fig" endterm="supplier-fig"> the
1691 following <acronym>SQL</acronym> statements are used:
1693 <programlisting>
1694 CREATE TABLE SUPPLIER
1695 (SNO INTEGER,
1696 SNAME VARCHAR(20),
1697 CITY VARCHAR(20));
1698 </programlisting>
1700 <programlisting>
1701 CREATE TABLE PART
1702 (PNO INTEGER,
1703 PNAME VARCHAR(20),
1704 PRICE DECIMAL(4 , 2));
1705 </programlisting>
1707 <programlisting>
1708 CREATE TABLE SELLS
1709 (SNO INTEGER,
1710 PNO INTEGER);
1711 </programlisting>
1712 </para>
1713 </example>
1714 </para>
1715 </sect3>
1717 <sect3>
1718 <title>Data Types in <acronym>SQL</acronym></title>
1720 <para>
1721 The following is a list of some data types that are supported by
1722 <acronym>SQL</acronym>:
1724 <itemizedlist>
1725 <listitem>
1726 <para>
1727 INTEGER: signed fullword binary integer (31 bits precision).
1728 </para>
1729 </listitem>
1731 <listitem>
1732 <para>
1733 SMALLINT: signed halfword binary integer (15 bits precision).
1734 </para>
1735 </listitem>
1737 <listitem>
1738 <para>
1739 DECIMAL (<replaceable class="parameter">p</replaceable>[,<replaceable class="parameter">q</replaceable>]):
1740 signed packed decimal number of up to
1741 <replaceable class="parameter">p</replaceable>
1742 digits, with
1743 <replaceable class="parameter">q</replaceable>
1744 digits to the right of the decimal point.
1745 If <replaceable class="parameter">q</replaceable>
1746 is omitted it is assumed to be 0.
1747 </para>
1748 </listitem>
1750 <listitem>
1751 <para>
1752 FLOAT: signed doubleword floating point number.
1753 </para>
1754 </listitem>
1756 <listitem>
1757 <para>
1758 VARCHAR(<replaceable class="parameter">n</replaceable>):
1759 varying length character string of maximum length
1760 <replaceable class="parameter">n</replaceable>.
1761 </para>
1762 </listitem>
1764 <listitem>
1765 <para>
1766 CHAR(<replaceable class="parameter">n</replaceable>):
1767 fixed length character string of length
1768 <replaceable class="parameter">n</replaceable>.
1769 </para>
1770 </listitem>
1772 </itemizedlist>
1773 </para>
1774 </sect3>
1776 <sect3>
1777 <title>Create Index</title>
1779 <para>
1780 Indexes are used to speed up access to a relation. If a relation <classname>R</classname>
1781 has an index on attribute <classname>A</classname> then we can
1782 retrieve all tuples <replaceable>t</replaceable>
1783 having
1784 <replaceable>t</replaceable>(<classname>A</classname>) = <replaceable>a</replaceable>
1785 in time roughly proportional to the number of such
1786 tuples <replaceable>t</replaceable>
1787 rather than in time proportional to the size of <classname>R</classname>.
1788 </para>
1790 <para>
1791 To create an index in <acronym>SQL</acronym>
1792 the <command>CREATE INDEX</command> command is used. The syntax is:
1794 <programlisting>
1795 CREATE INDEX <replaceable class="parameter">index_name</replaceable>
1796 ON <replaceable class="parameter">table_name</replaceable> ( <replaceable class="parameter">name_of_attribute</replaceable> );
1797 </programlisting>
1798 </para>
1800 <para>
1801 <example>
1802 <title id="index-create">Create Index</title>
1804 <para>
1805 To create an index named I on attribute SNAME of relation SUPPLIER
1806 we use the following statement:
1808 <programlisting>
1809 CREATE INDEX I ON SUPPLIER (SNAME);
1810 </programlisting>
1811 </para>
1813 <para>
1814 The created index is maintained automatically, i.e. whenever a new
1815 tuple is inserted into the relation SUPPLIER the index I is
1816 adapted. Note that the only changes a user can perceive when an
1817 index is present are increased speed for <command>SELECT</command>
1818 and decreases in speed of updates.
1819 </para>
1820 </example>
1821 </para>
1822 </sect3>
1824 <sect3>
1825 <title>Create View</title>
1827 <para>
1828 A view can be regarded as a <firstterm>virtual table</firstterm>,
1829 i.e. a table that
1830 does not <emphasis>physically</emphasis> exist in the database
1831 but looks to the user
1832 as if it does. By contrast, when we talk of a
1833 <firstterm>base table</firstterm> there is
1834 really a physically stored counterpart of each row of the table
1835 somewhere in the physical storage.
1836 </para>
1838 <para>
1839 Views do not have their own, physically separate, distinguishable
1840 stored data. Instead, the system stores the definition of the
1841 view (i.e. the rules about how to access physically stored base
1842 tables in order to materialize the view) somewhere in the system
1843 catalogs (see
1844 <xref linkend="tutorial-catalogs-title" endterm="tutorial-catalogs-title">). For a
1845 discussion on different techniques to implement views refer to
1846 <!--
1847 section
1848 <xref linkend="view-impl" endterm="view-impl">.
1850 <citetitle>SIM98</citetitle>.
1851 </para>
1853 <para>
1854 In <acronym>SQL</acronym> the <command>CREATE VIEW</command>
1855 command is used to define a view. The syntax
1858 <programlisting>
1859 CREATE VIEW <replaceable class="parameter">view_name</replaceable>
1860 AS <replaceable class="parameter">select_stmt</replaceable>
1861 </programlisting>
1863 where <replaceable class="parameter">select_stmt</replaceable>
1864 is a valid select statement as defined
1865 in <xref linkend="select-title" endterm="select-title">.
1866 Note that <replaceable class="parameter">select_stmt</replaceable> is
1867 not executed when the view is created. It is just stored in the
1868 <firstterm>system catalogs</firstterm>
1869 and is executed whenever a query against the view is made.
1870 </para>
1872 <para>
1873 Let the following view definition be given (we use
1874 the tables from
1875 <xref linkend="supplier-fig" endterm="supplier-fig"> again):
1877 <programlisting>
1878 CREATE VIEW London_Suppliers
1879 AS SELECT S.SNAME, P.PNAME
1880 FROM SUPPLIER S, PART P, SELLS SE
1881 WHERE S.SNO = SE.SNO AND
1882 P.PNO = SE.PNO AND
1883 S.CITY = 'London';
1884 </programlisting>
1885 </para>
1887 <para>
1888 Now we can use this <firstterm>virtual relation</firstterm>
1889 <classname>London_Suppliers</classname> as
1890 if it were another base table:
1892 <programlisting>
1893 SELECT * FROM London_Suppliers
1894 WHERE PNAME = 'Screw';
1895 </programlisting>
1897 which will return the following table:
1899 <programlisting>
1900 SNAME | PNAME
1901 -------+-------
1902 Smith | Screw
1903 </programlisting>
1904 </para>
1906 <para>
1907 To calculate this result the database system has to do a
1908 <emphasis>hidden</emphasis>
1909 access to the base tables SUPPLIER, SELLS and PART first. It
1910 does so by executing the query given in the view definition against
1911 those base tables. After that the additional qualifications
1912 (given in the
1913 query against the view) can be applied to obtain the resulting
1914 table.
1915 </para>
1916 </sect3>
1918 <sect3>
1919 <title>Drop Table, Drop Index, Drop View</title>
1921 <para>
1922 To destroy a table (including all tuples stored in that table) the
1923 <command>DROP TABLE</command> command is used:
1925 <programlisting>
1926 DROP TABLE <replaceable class="parameter">table_name</replaceable>;
1927 </programlisting>
1928 </para>
1930 <para>
1931 To destroy the SUPPLIER table use the following statement:
1933 <programlisting>
1934 DROP TABLE SUPPLIER;
1935 </programlisting>
1936 </para>
1938 <para>
1939 The <command>DROP INDEX</command> command is used to destroy an index:
1941 <programlisting>
1942 DROP INDEX <replaceable class="parameter">index_name</replaceable>;
1943 </programlisting>
1944 </para>
1946 <para>
1947 Finally to destroy a given view use the command <command>DROP
1948 VIEW</command>:
1950 <programlisting>
1951 DROP VIEW <replaceable class="parameter">view_name</replaceable>;
1952 </programlisting>
1953 </para>
1954 </sect3>
1955 </sect2>
1957 <sect2>
1958 <title>Data Manipulation</title>
1960 <sect3>
1961 <title>Insert Into</title>
1963 <para>
1964 Once a table is created (see
1965 <xref linkend="create-title" endterm="create-title">), it can be filled
1966 with tuples using the command <command>INSERT INTO</command>.
1967 The syntax is:
1969 <programlisting>
1970 INSERT INTO <replaceable class="parameter">table_name</replaceable> (<replaceable class="parameter">name_of_attr_1</replaceable>
1971 [, <replaceable class="parameter">name_of_attr_2</replaceable> [,...]])
1972 VALUES (<replaceable class="parameter">val_attr_1</replaceable> [, <replaceable class="parameter">val_attr_2</replaceable> [, ...]]);
1973 </programlisting>
1974 </para>
1976 <para>
1977 To insert the first tuple into the relation SUPPLIER (from
1978 <xref linkend="supplier-fig" endterm="supplier-fig">) we use the
1979 following statement:
1981 <programlisting>
1982 INSERT INTO SUPPLIER (SNO, SNAME, CITY)
1983 VALUES (1, 'Smith', 'London');
1984 </programlisting>
1985 </para>
1987 <para>
1988 To insert the first tuple into the relation SELLS we use:
1990 <programlisting>
1991 INSERT INTO SELLS (SNO, PNO)
1992 VALUES (1, 1);
1993 </programlisting>
1994 </para>
1995 </sect3>
1997 <sect3>
1998 <title>Update</title>
2000 <para>
2001 To change one or more attribute values of tuples in a relation the
2002 <command>UPDATE</command> command is used. The syntax is:
2004 <programlisting>
2005 UPDATE <replaceable class="parameter">table_name</replaceable>
2006 SET <replaceable class="parameter">name_of_attr_1</replaceable> = <replaceable class="parameter">value_1</replaceable>
2007 [, ... [, <replaceable class="parameter">name_of_attr_k</replaceable> = <replaceable class="parameter">value_k</replaceable>]]
2008 WHERE <replaceable class="parameter">condition</replaceable>;
2009 </programlisting>
2010 </para>
2012 <para>
2013 To change the value of attribute PRICE of the part 'Screw' in the
2014 relation PART we use:
2016 <programlisting>
2017 UPDATE PART
2018 SET PRICE = 15
2019 WHERE PNAME = 'Screw';
2020 </programlisting>
2021 </para>
2023 <para>
2024 The new value of attribute PRICE of the tuple whose name is 'Screw' is
2025 now 15.
2026 </para>
2027 </sect3>
2029 <sect3>
2030 <title>Delete</title>
2032 <para>
2033 To delete a tuple from a particular table use the command DELETE
2034 FROM. The syntax is:
2036 <programlisting>
2037 DELETE FROM <replaceable class="parameter">table_name</replaceable>
2038 WHERE <replaceable class="parameter">condition</replaceable>;
2039 </programlisting>
2040 </para>
2042 <para>
2043 To delete the supplier called 'Smith' of the table SUPPLIER the
2044 following statement is used:
2046 <programlisting>
2047 DELETE FROM SUPPLIER
2048 WHERE SNAME = 'Smith';
2049 </programlisting>
2050 </para>
2051 </sect3>
2052 </sect2>
2054 <sect2 id="tutorial-catalogs">
2055 <title id="tutorial-catalogs-title">System Catalogs</title>
2057 <para>
2058 In every <acronym>SQL</acronym> database system
2059 <firstterm>system catalogs</firstterm> are used to keep
2060 track of which tables, views indexes etc. are defined in the
2061 database. These system catalogs can be queried as if they were normal
2062 relations. For example there is one catalog used for the definition of
2063 views. This catalog stores the query from the view definition. Whenever
2064 a query against a view is made, the system first gets the
2065 <firstterm>view definition query</firstterm> out of the catalog
2066 and materializes the view
2067 before proceeding with the user query (see
2068 <!--
2069 section
2070 <xref linkend="view-impl" endterm="view-impl">.
2071 <citetitle>SIM98</citetitle>
2073 <xref linkend="SIM98" endterm="SIM98">
2074 for a more detailed
2075 description). For more information about system catalogs refer to
2076 <xref linkend="DATE04" endterm="DATE04">.
2077 </para>
2078 </sect2>
2080 <sect2>
2081 <title>Embedded <acronym>SQL</acronym></title>
2083 <para>
2084 In this section we will sketch how <acronym>SQL</acronym> can be
2085 embedded into a host language (e.g. <literal>C</literal>).
2086 There are two main reasons why we want to use <acronym>SQL</acronym>
2087 from a host language:
2089 <itemizedlist>
2090 <listitem>
2091 <para>
2092 There are queries that cannot be formulated using pure <acronym>SQL</acronym>
2093 (i.e. recursive queries). To be able to perform such queries we need a
2094 host language with a greater expressive power than
2095 <acronym>SQL</acronym>.
2096 </para>
2097 </listitem>
2099 <listitem>
2100 <para>
2101 We simply want to access a database from some application that
2102 is written in the host language (e.g. a ticket reservation system
2103 with a graphical user interface is written in C and the information
2104 about which tickets are still left is stored in a database that can be
2105 accessed using embedded <acronym>SQL</acronym>).
2106 </para>
2107 </listitem>
2108 </itemizedlist>
2109 </para>
2111 <para>
2112 A program using embedded <acronym>SQL</acronym>
2113 in a host language consists of statements
2114 of the host language and of
2115 <firstterm>embedded <acronym>SQL</acronym></firstterm>
2116 (<acronym>ESQL</acronym>) statements. Every <acronym>ESQL</acronym>
2117 statement begins with the keywords <command>EXEC SQL</command>.
2118 The <acronym>ESQL</acronym> statements are
2119 transformed to statements of the host language
2120 by a <firstterm>precompiler</firstterm>
2121 (which usually inserts
2122 calls to library routines that perform the various <acronym>SQL</acronym>
2123 commands).
2124 </para>
2126 <para>
2127 When we look at the examples throughout
2128 <xref linkend="select-title" endterm="select-title"> we
2129 realize that the result of the queries is very often a set of
2130 tuples. Most host languages are not designed to operate on sets so we
2131 need a mechanism to access every single tuple of the set of tuples
2132 returned by a SELECT statement. This mechanism can be provided by
2133 declaring a <firstterm>cursor</firstterm>.
2134 After that we can use the <command>FETCH</command> command to
2135 retrieve a tuple and set the cursor to the next tuple.
2136 </para>
2138 <para>
2139 For a detailed discussion on embedded <acronym>SQL</acronym>
2140 refer to
2141 <xref linkend="DATE97" endterm="DATE97">,
2142 <xref linkend="DATE04" endterm="DATE04">,
2144 <xref linkend="ULL88" endterm="ULL88">.
2145 </para>
2146 </sect2>
2147 </sect1>
2148 </chapter>