doc/src/sgml/sql.sgml

   1 <!-- $PostgreSQL$ -->
   2
   3  <chapter id="sql-intro">
   4   <title>SQL</title>
   5
   6   <abstract>
   7    <para>
   8     This chapter introduces the mathematical concepts behind
   9     relational databases. It is not required reading, so if you bog
  10     down or want to get straight to some simple examples feel free to
  11     jump ahead to the next chapter and come back when you have more
  12     time and patience. This stuff is supposed to be fun!
  13    </para>
  14
  15    <para>
  16     This material originally appeared as a part of
  17     Stefan Simkovics' Master's Thesis
  18     (<xref linkend="SIM98" endterm="SIM98">).
  19    </para>
  20   </abstract>
  21
  22   <para>
  23    <acronym>SQL</acronym> has become the most popular relational query
  24    language.
  25    The name <quote><acronym>SQL</acronym></quote> is an abbreviation for
  26    <firstterm>Structured Query Language</firstterm>.
  27    In 1974 Donald Chamberlin and others defined the
  28    language SEQUEL (<firstterm>Structured English Query
  29     Language</firstterm>) at IBM
  30    Research. This language was first implemented in an IBM
  31    prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version
  32    of SEQUEL called SEQUEL/2 was defined and the name was changed to
  33    <acronym>SQL</acronym>
  34    subsequently.
  35   </para>
  36
  37   <para>
  38    A new prototype called System R was developed by IBM in 1977. System R
  39    implemented a large subset of SEQUEL/2 (now <acronym>SQL</acronym>)
  40    and a number of
  41    changes were made to <acronym>SQL</acronym> during the project.
  42    System R was installed in
  43    a number of user sites, both internal IBM sites and also some selected
  44    customer sites. Thanks to the success and acceptance of System R at
  45    those user sites IBM started to develop commercial products that
  46    implemented the <acronym>SQL</acronym> language based on the System
  47    R technology.
  48   </para>
  49
  50   <para>
  51    Over the next years IBM and also a number of other vendors announced
  52    <acronym>SQL</acronym> products such as
  53    <productname>SQL/DS</productname> (IBM),
  54    <productname>DB2</productname> (IBM),
  55    <productname>ORACLE</productname> (Oracle Corp.),
  56    <productname>DG/SQL</productname> (Data General Corp.),
  57    and <productname>SYBASE</productname> (Sybase Inc.).
  58   </para>
  59
  60   <para>
  61    <acronym>SQL</acronym> is also an official standard now. In 1982
  62    the American National
  63    Standards Institute (<acronym>ANSI</acronym>) chartered its
  64    Database Committee X3H2 to
  65    develop a proposal for a standard relational language. This proposal
  66    was ratified in 1986 and consisted essentially of the IBM dialect of
  67    <acronym>SQL</acronym>. In 1987 this <acronym>ANSI</acronym>
  68    standard was also accepted as an international
  69    standard by the International Organization for Standardization
  70    (<acronym>ISO</acronym>).
  71    This original standard version of <acronym>SQL</acronym> is often
  72    referred to,
  73    informally, as <quote><abbrev>SQL/86</abbrev></quote>. In 1989 the original
  74    standard was extended
  75    and this new standard is often, again informally, referred to as
  76    <quote><abbrev>SQL/89</abbrev></quote>. Also in 1989, a related standard called
  77    <firstterm>Database Language Embedded <acronym>SQL</acronym></firstterm>
  78    (<acronym>ESQL</acronym>) was developed.
  79   </para>
  80
  81   <para>
  82    The <acronym>ISO</acronym> and <acronym>ANSI</acronym> committees
  83    have been working for many years on the
  84    definition of a greatly expanded version of the original standard,
  85    referred to informally as <firstterm><acronym>SQL2</acronym></firstterm>
  86    or <firstterm><acronym>SQL/92</acronym></firstterm>. This version became a
  87    ratified standard - <quote>International Standard ISO/IEC 9075:1992,
  88    Database Language <acronym>SQL</acronym></quote> - in late 1992.
  89    <acronym>SQL/92</acronym> is the version
  90    normally meant when people refer to <quote>the <acronym>SQL</acronym>
  91    standard</quote>. A detailed
  92    description of <acronym>SQL/92</acronym> is given in
  93    <xref linkend="DATE97" endterm="DATE97">. At the time of
  94    writing this document a new standard informally referred to
  95    as <firstterm><acronym>SQL3</acronym></firstterm>
  96    is under development. It is planned to make <acronym>SQL</acronym>
  97    a Turing-complete
  98    language, i.e. all computable queries (e.g. recursive queries) will be
  99    possible. This has now been completed as SQL:2003.
 100   </para>
 101
 102   <sect1 id="rel-model">
 103    <title>The Relational Data Model</title>
 104
 105   <para>
 106     As mentioned before, <acronym>SQL</acronym> is a relational
 107     language. That means it is
 108     based on the <firstterm>relational data model</firstterm>
 109     first published by E.F. Codd in
 110     1970. We will give a formal description of the relational model
 111     later (in
 112     <xref linkend="formal-notion" endterm="formal-notion">)
 113     but first we want to have a look at it from a more intuitive
 114     point of view.
 115   </para>
 116
 117   <para>
 118     A <firstterm>relational database</firstterm> is a database that is
 119     perceived by its
 120     users as a <firstterm>collection of tables</firstterm> (and
 121     nothing else but tables).
 122     A table consists of rows and columns where each row represents a
 123     record and each column represents an attribute of the records
 124     contained in the table.
 125     <xref linkend="supplier-fig" endterm="supplier-fig">
 126     shows an example of a database consisting of three tables:
 127
 128     <itemizedlist>
 129      <listitem>
 130       <para>
 131        SUPPLIER is a table storing the number
 132        (SNO), the name (SNAME) and the city (CITY) of a supplier.
 133       </para>
 134      </listitem>
 135
 136      <listitem>
 137       <para>
 138        PART is a table storing the number (PNO) the name (PNAME) and
 139        the price (PRICE) of a part.
 140       </para>
 141      </listitem>
 142
 143      <listitem>
 144       <para>
 145        SELLS stores information about which part (PNO) is sold by which
 146        supplier (SNO).
 147        It serves in a sense to connect the other two tables together.
 148       </para>
 149      </listitem>
 150     </itemizedlist>
 151
 152     <example>
 153      <title id="supplier-fig">The Suppliers and Parts Database</title>
 154      <programlisting>
 155 SUPPLIER:                   SELLS:
 156  SNO |  SNAME  |  CITY       SNO | PNO
 157 ----+---------+--------     -----+-----
 158  1  |  Smith  | London        1  |  1
 159  2  |  Jones  | Paris         1  |  2
 160  3  |  Adams  | Vienna        2  |  4
 161  4  |  Blake  | Rome          3  |  1
 162                               3  |  3
 163                               4  |  2
 164 PART:                         4  |  3
 165  PNO |  PNAME  |  PRICE       4  |  4
 166 ----+---------+---------
 167  1  |  Screw  |   10
 168  2  |  Nut    |    8
 169  3  |  Bolt   |   15
 170  4  |  Cam    |   25
 171      </programlisting>
 172     </example>
 173    </para>
 174
 175    <para>
 176     The tables PART and SUPPLIER can be regarded as
 177     <firstterm>entities</firstterm> and
 178     SELLS can be regarded as a <firstterm>relationship</firstterm>
 179     between a particular
 180     part and a particular supplier.
 181    </para>
 182
 183    <para>
 184     As we will see later, <acronym>SQL</acronym> operates on tables
 185     like the ones just
 186     defined but before that we will study the theory of the relational
 187     model.
 188    </para>
 189   </sect1>
 190
 191   <sect1 id="relmodel-formal">
 192    <title id="formal-notion">Relational Data Model Formalities</title>
 193
 194    <para>
 195     The mathematical concept underlying the relational model is the
 196     set-theoretic <firstterm>relation</firstterm> which is a subset of
 197     the Cartesian
 198     product of a list of domains. This set-theoretic relation gives
 199     the model its name (do not confuse it with the relationship from the
 200     <firstterm>Entity-Relationship model</firstterm>).
 201     Formally a domain is simply a set of
 202     values. For example the set of integers is a domain. Also the set of
 203     character strings of length 20 and the real numbers are examples of
 204     domains.
 205    </para>
 206
 207    <para>
 208 <!--
 209 \begin{definition}
 210 The <firstterm>Cartesian product</firstterm> of domains $D_{1},
 211     D_{2},\ldots, D_{k}$ written
 212 \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$} is the set of
 213 all $k$-tuples $(v_{1},v_{2},\ldots,v_{k})$ such that \mbox{$v_{1} \in
 214 D_{1}, v_{2} \in D_{2}, \ldots, v_{k} \in D_{k}$}.
 215 \end{definition}
 216 -->
 217     The <firstterm>Cartesian product</firstterm> of domains
 218     <parameter>D<subscript>1</subscript></parameter>,
 219     <parameter>D<subscript>2</subscript></parameter>,
 220     ...
 221     <parameter>D<subscript>k</subscript></parameter>,
 222     written
 223     <parameter>D<subscript>1</subscript></parameter> &times;
 224     <parameter>D<subscript>2</subscript></parameter> &times;
 225     ... &times;
 226     <parameter>D<subscript>k</subscript></parameter>
 227     is the set of all k-tuples
 228     <parameter>v<subscript>1</subscript></parameter>,
 229     <parameter>v<subscript>2</subscript></parameter>,
 230     ...
 231     <parameter>v<subscript>k</subscript></parameter>,
 232     such that
 233     <parameter>v<subscript>1</subscript></parameter> &isin;
 234     <parameter>D<subscript>1</subscript></parameter>,
 235     <parameter>v<subscript>2</subscript></parameter> &isin;
 236     <parameter>D<subscript>2</subscript></parameter>,
 237     ...
 238     <parameter>v<subscript>k</subscript></parameter> &isin;
 239     <parameter>D<subscript>k</subscript></parameter>.
 240    </para>
 241
 242    <para>
 243     For example, when we have
 244 <!--
 245  $k=2$, $D_{1}=\{0,1\}$ and
 246 $D_{2}=\{a,b,c\}$, then $D_{1} \times D_{2}$ is
 247 $\{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)\}$.
 248 -->
 249     <parameter>k</parameter>=2,
 250     <parameter>D<subscript>1</subscript></parameter>=<literal>{0,1}</literal> and
 251     <parameter>D<subscript>2</subscript></parameter>=<literal>{a,b,c}</literal> then
 252     <parameter>D<subscript>1</subscript></parameter> &times;
 253     <parameter>D<subscript>2</subscript></parameter> is
 254     <literal>{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)}</literal>.
 255    </para>
 256
 257    <para>
 258 <!--
 259 \begin{definition}
 260 A Relation is any subset of the Cartesian product of one or more
 261 domains: $R \subseteq$ \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$}
 262 \end{definition}
 263 -->
 264     A Relation is any subset of the Cartesian product of one or more
 265     domains: <parameter>R</parameter> &sube;
 266     <parameter>D<subscript>1</subscript></parameter> &times;
 267     <parameter>D<subscript>2</subscript></parameter> &times;
 268     ... &times;
 269     <parameter>D<subscript>k</subscript></parameter>.
 270    </para>
 271
 272    <para>
 273     For example <literal>{(0,a),(0,b),(1,a)}</literal> is a relation;
 274     it is in fact a subset of
 275     <parameter>D<subscript>1</subscript></parameter> &times;
 276     <parameter>D<subscript>2</subscript></parameter>
 277     mentioned above.
 278    </para>
 279
 280    <para>
 281     The members of a relation are called tuples. Each relation of some
 282     Cartesian product
 283     <parameter>D<subscript>1</subscript></parameter> &times;
 284     <parameter>D<subscript>2</subscript></parameter> &times;
 285     ... &times;
 286     <parameter>D<subscript>k</subscript></parameter>
 287     is said to have arity <literal>k</literal> and is therefore a set
 288     of <literal>k</literal>-tuples.
 289    </para>
 290
 291    <para>
 292     A relation can be viewed as a table (as we already did, remember
 293     <xref linkend="supplier-fig" endterm="supplier-fig"> where
 294     every tuple is represented by a row and every column corresponds to
 295     one component of a tuple. Giving names (called attributes) to the
 296     columns leads to the definition of a
 297     <firstterm>relation scheme</firstterm>.
 298    </para>
 299
 300    <para>
 301 <!--
 302 \begin{definition}
 303 A {\it relation scheme} $R$ is a finite set of attributes
 304 \mbox{$\{A_{1},A_{2},\ldots,A_{k}\}$}. There is a domain $D_{i}$ for
 305 each attribute $A_{i}, 1 \le i \le k$ where the values of the
 306 attributes are taken from. We often write a relation scheme as
 307 \mbox{$R(A_{1},A_{2},\ldots,A_{k})$}.
 308 \end{definition}
 309 -->
 310     A <firstterm>relation scheme</firstterm> <literal>R</literal> is a
 311     finite set of attributes
 312     <parameter>A<subscript>1</subscript></parameter>,
 313     <parameter>A<subscript>2</subscript></parameter>,
 314     ...
 315     <parameter>A<subscript>k</subscript></parameter>.
 316     There is a domain
 317     <parameter>D<subscript>i</subscript></parameter>,
 318     for each attribute
 319     <parameter>A<subscript>i</subscript></parameter>,
 320     1 &lt;= <literal>i</literal> &lt;= <literal>k</literal>,
 321     where the values of the attributes are taken from. We often write
 322     a relation scheme as
 323     <literal>R(<parameter>A<subscript>1</subscript></parameter>,
 324     <parameter>A<subscript>2</subscript></parameter>,
 325     ...
 326     <parameter>A<subscript>k</subscript></parameter>)</literal>.
 327
 328     <note>
 329      <para>
 330       A <firstterm>relation scheme</firstterm> is just a kind of template
 331       whereas a <firstterm>relation</firstterm> is an instance of a
 332       <firstterm>relation
 333        scheme</firstterm>. The relation consists of tuples (and can
 334       therefore be
 335       viewed as a table); not so the relation scheme.
 336      </para>
 337     </note>
 338    </para>
 339
 340    <sect2>
 341     <title id="domains">Domains vs. Data Types</title>
 342
 343     <para>
 344      We often talked about <firstterm>domains</firstterm>
 345      in the last section. Recall that a
 346      domain is, formally, just a set of values (e.g., the set of integers or
 347      the real numbers). In terms of database systems we often talk of
 348      <firstterm>data types</firstterm> instead of domains.
 349      When we define a table we have to make
 350      a decision about which attributes to include. Additionally we
 351      have to decide which kind of data is going to be stored as
 352      attribute values. For example the values of
 353      <classname>SNAME</classname> from the table
 354      <classname>SUPPLIER</classname> will be character strings,
 355      whereas <classname>SNO</classname> will store
 356      integers. We define this by assigning a data type to each
 357      attribute. The type of <classname>SNAME</classname> will be
 358      <type>VARCHAR(20)</type> (this is the <acronym>SQL</acronym> type
 359      for character strings of length &lt;= 20),
 360      the type of <classname>SNO</classname> will be
 361      <type>INTEGER</type>. With the assignment of a data type we also
 362      have selected
 363      a domain for an attribute. The domain of
 364      <classname>SNAME</classname> is the set of all
 365      character strings of length &lt;= 20,
 366      the domain of <classname>SNO</classname> is the set of
 367      all integer numbers.
 368     </para>
 369    </sect2>
 370   </sect1>
 371
 372   <sect1 id="relmodel-oper">
 373    <title id="operations">Operations in the Relational Data Model</title>
 374
 375    <para>
 376     In the previous section
 377     (<xref linkend="formal-notion" endterm="formal-notion">)
 378     we defined the mathematical notion of
 379     the relational model. Now we know how the data can be stored using a
 380     relational data model but we do not know what to do with all these
 381     tables to retrieve something from the database yet. For example somebody
 382     could ask for the names of all suppliers that sell the part
 383     'Screw'. Therefore two rather different kinds of notations for
 384     expressing operations on relations have been defined:
 385
 386     <itemizedlist>
 387      <listitem>
 388       <para>
 389        The <firstterm>Relational Algebra</firstterm> which is an
 390        algebraic notation,
 391        where queries are expressed by applying specialized operators to the
 392        relations.
 393       </para>
 394      </listitem>
 395
 396      <listitem>
 397       <para>
 398        The <firstterm>Relational Calculus</firstterm> which is a
 399        logical notation,
 400        where queries are expressed by formulating some logical restrictions
 401        that the tuples in the answer must satisfy.
 402       </para>
 403     </listitem>
 404     </itemizedlist>
 405    </para>
 406
 407    <sect2>
 408     <title id="rel-alg">Relational Algebra</title>
 409
 410     <para>
 411      The <firstterm>Relational Algebra</firstterm> was introduced by
 412      E. F. Codd in 1972. It consists of a set of operations on relations:
 413
 414      <itemizedlist>
 415       <listitem>
 416        <para>
 417         SELECT (&sigma;): extracts <firstterm>tuples</firstterm> from
 418         a relation that
 419         satisfy a given restriction. Let <parameter>R</parameter> be a
 420         table that contains an attribute
 421         <parameter>A</parameter>.
 422 &sigma;<subscript>A=a</subscript>(R) = {t &isin; R &mid; t(A) = a}
 423         where <literal>t</literal> denotes a
 424         tuple of <parameter>R</parameter> and <literal>t(A)</literal>
 425         denotes the value of attribute <parameter>A</parameter> of
 426         tuple <literal>t</literal>.
 427        </para>
 428       </listitem>
 429
 430       <listitem>
 431        <para>
 432         PROJECT (&pi;): extracts specified
 433         <firstterm>attributes</firstterm> (columns) from a
 434         relation. Let <classname>R</classname> be a relation
 435         that contains an attribute <classname>X</classname>.
 436         &pi;<subscript>X</subscript>(<classname>R</classname>) = {t(X) &mid; t &isin; <classname>R</classname>},
 437         where <literal>t</literal>(<classname>X</classname>) denotes the value of
 438         attribute <classname>X</classname> of tuple <literal>t</literal>.
 439        </para>
 440       </listitem>
 441
 442       <listitem>
 443        <para>
 444         PRODUCT (&times;): builds the Cartesian product of two
 445         relations. Let <classname>R</classname> be a table with arity
 446         <literal>k</literal><subscript>1</subscript> and let
 447         <classname>S</classname> be a table with
 448         arity <literal>k</literal><subscript>2</subscript>.
 449         <classname>R</classname> &times; <classname>S</classname>
 450         is the set of all
 451         <literal>k</literal><subscript>1</subscript>
 452         + <literal>k</literal><subscript>2</subscript>-tuples
 453         whose first <literal>k</literal><subscript>1</subscript>
 454         components form a tuple in <classname>R</classname> and whose last
 455         <literal>k</literal><subscript>2</subscript> components form a
 456         tuple in <classname>S</classname>.
 457        </para>
 458       </listitem>
 459
 460       <listitem>
 461        <para>
 462         UNION (&cup;): builds the set-theoretic union of two
 463         tables. Given the tables <classname>R</classname> and
 464         <classname>S</classname> (both must have the same arity),
 465         the union <classname>R</classname> &cup; <classname>S</classname>
 466         is the set of tuples that are in <classname>R</classname>
 467         or <classname>S</classname> or both.
 468        </para>
 469       </listitem>
 470
 471       <listitem>
 472        <para>
 473         INTERSECT (&cap;): builds the set-theoretic intersection of two
 474         tables. Given the tables <classname>R</classname> and
 475         <classname>S</classname>,
 476         <classname>R</classname> &cap; <classname>S</classname> is the
 477         set of tuples
 478         that are in <classname>R</classname> and in
 479         <classname>S</classname>.
 480         We again require that <classname>R</classname> and
 481         <classname>S</classname> have the
 482         same arity.
 483        </para>
 484       </listitem>
 485
 486       <listitem>
 487        <para>
 488         DIFFERENCE (&minus; or &setmn;): builds the set difference of
 489         two tables. Let <classname>R</classname> and <classname>S</classname>
 490         again be two tables with the same
 491         arity. <classname>R</classname> - <classname>S</classname>
 492         is the set of tuples in <classname>R</classname> but not in
 493         <classname>S</classname>.
 494        </para>
 495       </listitem>
 496
 497       <listitem>
 498        <para>
 499         JOIN (&prod;): connects two tables by their common
 500         attributes. Let <classname>R</classname> be a table with the
 501         attributes <classname>A</classname>,<classname>B</classname>
 502         and <classname>C</classname> and
 503         let <classname>S</classname> be a table with the attributes
 504         <classname>C</classname>,<classname>D</classname>
 505         and <classname>E</classname>. There is one
 506         attribute common to both relations,
 507         the attribute <classname>C</classname>.
 508 <!--
 509         <classname>R</classname> &prod; <classname>S</classname> =
 510         &pi;<subscript><classname>R</classname>.<classname>A</classname>,<classname>R</classname>.<classname>B</classname>,<classname>R</classname>.<classname>C</classname>,<classname>S</classname>.<classname>D</classname>,<classname>S</classname>.<classname>E</classname></subscript>(&sigma;<subscript><classname>R</classname>.<classname>C</classname>=<classname>S</classname>.<classname>C</classname></subscript>(<classname>R</classname> &times; <classname>S</classname>)).
 511 -->
 512         R &prod; S = &pi;<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(&sigma;<subscript>R.C=S.C</subscript>(R &times; S)).
 513         What are we doing here? We first calculate the Cartesian
 514         product
 515         <classname>R</classname> &times; <classname>S</classname>.
 516         Then we select those tuples whose values for the common
 517         attribute <classname>C</classname> are equal
 518         (&sigma;<subscript>R.C = S.C</subscript>).
 519         Now we have a table
 520         that contains the attribute <classname>C</classname>
 521         two times and we correct this by
 522         projecting out the duplicate column.
 523        </para>
 524
 525        <example>
 526         <title id="join-example">An Inner Join</title>
 527
 528         <para>
 529          Let's have a look at the tables that are produced by evaluating the steps
 530          necessary for a join.
 531          Let the following two tables be given:
 532
 533          <programlisting>
 534 R:                 S:
 535  A | B | C          C | D | E
 536 ---+---+---        ---+---+---
 537  1 | 2 | 3          3 | a | b
 538  4 | 5 | 6          6 | c | d
 539  7 | 8 | 9
 540          </programlisting>
 541         </para>
 542        </example>
 543
 544        <para>
 545         First we calculate the Cartesian product
 546         <classname>R</classname> &times; <classname>S</classname> and
 547         get:
 548
 549         <programlisting>
 550 R x S:
 551  A | B | R.C | S.C | D | E
 552 ---+---+-----+-----+---+---
 553  1 | 2 |  3  |  3  | a | b
 554  1 | 2 |  3  |  6  | c | d
 555  4 | 5 |  6  |  3  | a | b
 556  4 | 5 |  6  |  6  | c | d
 557  7 | 8 |  9  |  3  | a | b
 558  7 | 8 |  9  |  6  | c | d
 559         </programlisting>
 560        </para>
 561
 562        <para>
 563         After the selection
 564         &sigma;<subscript>R.C=S.C</subscript>(R &times; S)
 565         we get:
 566
 567         <programlisting>
 568  A | B | R.C | S.C | D | E
 569 ---+---+-----+-----+---+---
 570  1 | 2 |  3  |  3  | a | b
 571  4 | 5 |  6  |  6  | c | d
 572         </programlisting>
 573        </para>
 574
 575        <para>
 576         To remove the duplicate column
 577         <classname>S</classname>.<classname>C</classname>
 578         we project it out by the following operation:
 579         &pi;<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(&sigma;<subscript>R.C=S.C</subscript>(R &times; S))
 580         and get:
 581
 582         <programlisting>
 583  A | B | C | D | E
 584 ---+---+---+---+---
 585  1 | 2 | 3 | a | b
 586  4 | 5 | 6 | c | d
 587         </programlisting>
 588        </para>
 589       </listitem>
 590
 591       <listitem>
 592        <para>
 593         DIVIDE (&divide;): Let <classname>R</classname> be a table
 594         with the attributes A, B, C, and D and let
 595         <classname>S</classname> be a table with the attributes
 596         C and D.
 597         Then we define the division as:
 598
 599         <programlisting>
 600 R &divide; S = {t &mid; &forall; t<subscript>s</subscript> &isin; S &exist; t<subscript>r</subscript> &isin; R
 601         </programlisting>
 602
 603         such that
 604 t<subscript>r</subscript>(A,B)=t&and;t<subscript>r</subscript>(C,D)=t<subscript>s</subscript>}
 605         where
 606         t<subscript>r</subscript>(x,y)
 607         denotes a
 608         tuple of table <classname>R</classname> that consists only of
 609         the components <literal>x</literal> and <literal>y</literal>.
 610         Note that the tuple <literal>t</literal> only consists of the
 611         components <classname>A</classname> and
 612         <classname>B</classname> of relation <classname>R</classname>.
 613        </para>
 614
 615        <para id="divide-example">
 616         Given the following tables
 617
 618         <programlisting>
 619 R:                    S:
 620  A | B | C | D         C | D
 621 ---+---+---+---       ---+---
 622  a | b | c | d         c | d
 623  a | b | e | f         e | f
 624  b | c | e | f
 625  e | d | c | d
 626  e | d | e | f
 627  a | b | d | e
 628         </programlisting>
 629
 630         R &divide; S
 631         is derived as
 632
 633         <programlisting>
 634  A | B
 635 ---+---
 636  a | b
 637  e | d
 638         </programlisting>
 639        </para>
 640       </listitem>
 641      </itemizedlist>
 642     </para>
 643
 644     <para>
 645      For a more detailed description and definition of the relational
 646      algebra refer to [<xref linkend="ULL88" endterm="ULL88">] or
 647      [<xref linkend="DATE04" endterm="DATE04">].
 648     </para>
 649
 650     <example>
 651      <title id="suppl-rel-alg">A Query Using Relational Algebra</title>
 652      <para>
 653       Recall that we formulated all those relational operators to be able to
 654       retrieve data from the database. Let's return to our example from
 655       the previous
 656       section (<xref linkend="operations" endterm="operations">)
 657       where someone wanted to know the names of all
 658       suppliers that sell the part <literal>Screw</literal>.
 659       This question can be answered
 660       using relational algebra by the following operation:
 661
 662       <programlisting>
 663 &pi;<subscript>SUPPLIER.SNAME</subscript>(&sigma;<subscript>PART.PNAME='Screw'</subscript>(SUPPLIER &prod; SELLS &prod; PART))
 664       </programlisting>
 665      </para>
 666
 667      <para>
 668       We call such an operation a query. If we evaluate the above query
 669       against the our example tables
 670       (<xref linkend="supplier-fig" endterm="supplier-fig">)
 671       we will obtain the following result:
 672
 673       <programlisting>
 674  SNAME
 675 -------
 676  Smith
 677  Adams
 678       </programlisting>
 679      </para>
 680     </example>
 681    </sect2>
 682
 683    <sect2 id="rel-calc">
 684     <title>Relational Calculus</title>
 685
 686     <para>
 687      The relational calculus is based on the
 688      <firstterm>first order logic</firstterm>. There are
 689      two variants of the relational calculus:
 690
 691      <itemizedlist>
 692       <listitem>
 693        <para>
 694         The <firstterm>Domain Relational Calculus</firstterm>
 695         (<acronym>DRC</acronym>), where variables
 696         stand for components (attributes) of the tuples.
 697        </para>
 698       </listitem>
 699
 700       <listitem>
 701        <para>
 702         The <firstterm>Tuple Relational Calculus</firstterm>
 703         (<acronym>TRC</acronym>), where variables stand for tuples.
 704        </para>
 705       </listitem>
 706      </itemizedlist>
 707     </para>
 708
 709     <para>
 710      We want to discuss the tuple relational calculus only because it is
 711      the one underlying the most relational languages. For a detailed
 712      discussion on <acronym>DRC</acronym> (and also
 713      <acronym>TRC</acronym>) see
 714      <xref linkend="DATE04" endterm="DATE04">
 715      or
 716      <xref linkend="ULL88" endterm="ULL88">.
 717     </para>
 718    </sect2>
 719
 720    <sect2>
 721     <title>Tuple Relational Calculus</title>
 722
 723     <para>
 724      The queries used in <acronym>TRC</acronym> are of the following
 725      form:
 726
 727      <programlisting>
 728 x(A) &mid; F(x)
 729      </programlisting>
 730
 731      where <literal>x</literal> is a tuple variable
 732      <classname>A</classname> is a set of attributes and <literal>F</literal> is a
 733      formula. The resulting relation consists of all tuples
 734      <literal>t(A)</literal> that satisfy <literal>F(t)</literal>.
 735     </para>
 736
 737     <para>
 738      If we want to answer the question from example
 739      <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">
 740      using <acronym>TRC</acronym> we formulate the following query:
 741
 742      <programlisting>
 743 {x(SNAME) &mid; x &isin; SUPPLIER &and;
 744     &exist; y &isin; SELLS &exist; z &isin; PART (y(SNO)=x(SNO) &and;
 745     z(PNO)=y(PNO) &and;
 746     z(PNAME)='Screw')}
 747      </programlisting>
 748     </para>
 749
 750     <para>
 751      Evaluating the query against the tables from
 752      <xref linkend="supplier-fig" endterm="supplier-fig">
 753      again leads to the same result
 754      as in
 755      <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">.
 756     </para>
 757    </sect2>
 758
 759    <sect2 id="alg-vs-calc">
 760     <title>Relational Algebra vs. Relational Calculus</title>
 761
 762     <para>
 763      The relational algebra and the relational calculus have the same
 764      <firstterm>expressive power</firstterm>; i.e. all queries that
 765      can be formulated using relational algebra can also be formulated
 766      using the relational calculus and vice versa.
 767      This was first proved by E. F. Codd in
 768      1972. This proof is based on an algorithm (<quote>Codd's reduction
 769      algorithm</quote>) by which an arbitrary expression of the relational
 770      calculus can be reduced to a semantically equivalent expression of
 771      relational algebra. For a more detailed discussion on that refer to
 772      <xref linkend="DATE04" endterm="DATE04">
 773      and
 774      <xref linkend="ULL88" endterm="ULL88">.
 775     </para>
 776
 777     <para>
 778      It is sometimes said that languages based on the relational
 779      calculus are <quote>higher level</quote> or <quote>more
 780      declarative</quote> than languages based on relational algebra
 781      because the algebra (partially) specifies the order of operations
 782      while the calculus leaves it to a compiler or interpreter to
 783      determine the most efficient order of evaluation.
 784     </para>
 785    </sect2>
 786   </sect1>
 787
 788   <sect1 id="sql-language">
 789    <title>The <acronym>SQL</acronym> Language</title>
 790
 791    <para>
 792     As is the case with most modern relational languages,
 793     <acronym>SQL</acronym> is based on the tuple
 794     relational calculus. As a result every query that can be formulated
 795     using the tuple relational calculus (or equivalently, relational
 796     algebra) can also be formulated using
 797     <acronym>SQL</acronym>. There are, however,
 798     capabilities beyond the scope of relational algebra or calculus. Here
 799     is a list of some additional features provided by
 800     <acronym>SQL</acronym> that are not
 801     part of relational algebra or calculus:
 802
 803     <itemizedlist>
 804      <listitem>
 805       <para>
 806        Commands for insertion, deletion or modification of data.
 807       </para>
 808      </listitem>
 809
 810      <listitem>
 811       <para>
 812        Arithmetic capability: In <acronym>SQL</acronym> it is possible
 813        to involve
 814        arithmetic operations as well as comparisons, e.g.
 815
 816        <programlisting>
 817 A &lt; B + 3.
 818        </programlisting>
 819
 820        Note
 821        that + or other arithmetic operators appear neither in relational
 822        algebra nor in relational calculus.
 823       </para>
 824      </listitem>
 825
 826      <listitem>
 827       <para>
 828        Assignment and Print Commands: It is possible to print a
 829        relation constructed by a query and to assign a computed relation to a
 830        relation name.
 831       </para>
 832      </listitem>
 833
 834      <listitem>
 835       <para>
 836        Aggregate Functions: Operations such as
 837        <firstterm>average</firstterm>, <firstterm>sum</firstterm>,
 838        <firstterm>max</firstterm>, etc. can be applied to columns of a
 839        relation to
 840        obtain a single quantity.
 841       </para>
 842      </listitem>
 843     </itemizedlist>
 844    </para>
 845
 846    <sect2 id="select">
 847     <title id="select-title">Select</title>
 848
 849     <para>
 850      The most often used command in <acronym>SQL</acronym> is the
 851      <command>SELECT</command> statement,
 852      used to retrieve data. The syntax is:
 853
 854      <synopsis>
 855 SELECT [ ALL | DISTINCT [ ON ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) ] ]
 856     * | <replaceable class="PARAMETER">expression</replaceable> [ [ AS ] <replaceable class="PARAMETER">output_name</replaceable> ] [, ...]
 857     [ INTO [ TEMPORARY | TEMP ] [ TABLE ] <replaceable class="PARAMETER">new_table</replaceable> ]
 858     [ FROM <replaceable class="PARAMETER">from_item</replaceable> [, ...] ]
 859     [ WHERE <replaceable class="PARAMETER">condition</replaceable> ]
 860     [ GROUP BY <replaceable class="PARAMETER">expression</replaceable> [, ...] ]
 861     [ HAVING <replaceable class="PARAMETER">condition</replaceable> [, ...] ]
 862     [ { UNION | INTERSECT | EXCEPT } [ ALL ] <replaceable class="PARAMETER">select</replaceable> ]
 863     [ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable class="parameter">operator</replaceable> ] [ NULLS { FIRST | LAST } ] [, ...] ]
 864     [ LIMIT { <replaceable class="PARAMETER">count</replaceable> | ALL } ]
 865     [ OFFSET <replaceable class="PARAMETER">start</replaceable> ]
 866     [ FOR { UPDATE | SHARE } [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT ] [...] ]
 867      </synopsis>
 868     </para>
 869
 870     <para>
 871      Now we will illustrate the complex syntax of the
 872      <command>SELECT</command> statement with various examples. The
 873      tables used for the examples are defined in <xref
 874      linkend="supplier-fig" endterm="supplier-fig">.
 875     </para>
 876
 877     <sect3>
 878      <title>Simple Selects</title>
 879
 880      <para>
 881       Here are some simple examples using a <command>SELECT</command> statement:
 882
 883       <example>
 884        <title id="simple-query">Simple Query with Qualification</title>
 885        <para>
 886         To retrieve all tuples from table PART where the attribute PRICE is
 887         greater than 10 we formulate the following query:
 888
 889         <programlisting>
 890 SELECT * FROM PART
 891     WHERE PRICE &gt; 10;
 892         </programlisting>
 893
 894         and get the table:
 895
 896         <programlisting>
 897  PNO |  PNAME  |  PRICE
 898 -----+---------+--------
 899   3  |  Bolt   |   15
 900   4  |  Cam    |   25
 901         </programlisting>
 902        </para>
 903
 904        <para>
 905         Using <quote>*</quote> in the <command>SELECT</command> statement
 906         will deliver all attributes from the table. If we want to retrieve
 907         only the attributes PNAME and PRICE from table PART we use the
 908         statement:
 909
 910         <programlisting>
 911 SELECT PNAME, PRICE
 912     FROM PART
 913     WHERE PRICE &gt; 10;
 914         </programlisting>
 915
 916         In this case the result is:
 917
 918         <programlisting>
 919                       PNAME  |  PRICE
 920                      --------+--------
 921                       Bolt   |   15
 922                       Cam    |   25
 923         </programlisting>
 924
 925         Note that the <acronym>SQL</acronym> <command>SELECT</command>
 926         corresponds to the <quote>projection</quote> in relational algebra
 927         not to the <quote>selection</quote> (see <xref linkend="rel-alg"
 928         endterm="rel-alg"> for more details).
 929        </para>
 930
 931        <para>
 932         The qualifications in the WHERE clause can also be logically connected
 933         using the keywords OR, AND, and NOT:
 934
 935         <programlisting>
 936 SELECT PNAME, PRICE
 937     FROM PART
 938     WHERE PNAME = 'Bolt' AND
 939          (PRICE = 0 OR PRICE &lt;= 15);
 940         </programlisting>
 941
 942         will lead to the result:
 943
 944         <programlisting>
 945  PNAME  |  PRICE
 946 --------+--------
 947  Bolt   |   15
 948         </programlisting>
 949        </para>
 950
 951        <para>
 952         Arithmetic operations can be used in the target list and in the WHERE
 953         clause. For example if we want to know how much it would cost if we
 954         take two pieces of a part we could use the following query:
 955
 956         <programlisting>
 957 SELECT PNAME, PRICE * 2 AS DOUBLE
 958     FROM PART
 959     WHERE PRICE * 2 &lt; 50;
 960         </programlisting>
 961
 962         and we get:
 963
 964         <programlisting>
 965  PNAME  |  DOUBLE
 966 --------+---------
 967  Screw  |    20
 968  Nut    |    16
 969  Bolt   |    30
 970         </programlisting>
 971
 972         Note that the word DOUBLE after the keyword AS is the new title of the
 973         second column. This technique can be used for every element of the
 974         target list to assign a new title to the resulting
 975         column. This new title
 976         is often referred to as alias. The alias cannot be used throughout the
 977         rest of the query.
 978        </para>
 979       </example>
 980      </para>
 981     </sect3>
 982
 983     <sect3>
 984      <title>Joins</title>
 985
 986      <para id="simple-join">
 987       The following example shows how <firstterm>joins</firstterm> are
 988       realized in <acronym>SQL</acronym>.
 989      </para>
 990
 991      <para>
 992       To join the three tables SUPPLIER, PART and SELLS over their common
 993       attributes we formulate the following statement:
 994
 995       <programlisting>
 996 SELECT S.SNAME, P.PNAME
 997     FROM SUPPLIER S, PART P, SELLS SE
 998     WHERE S.SNO = SE.SNO AND
 999           P.PNO = SE.PNO;
1000       </programlisting>
1001
1002       and get the following table as a result:
1003
1004       <programlisting>
1005  SNAME | PNAME
1006 -------+-------
1007  Smith | Screw
1008  Smith | Nut
1009  Jones | Cam
1010  Adams | Screw
1011  Adams | Bolt
1012  Blake | Nut
1013  Blake | Bolt
1014  Blake | Cam
1015       </programlisting>
1016      </para>
1017
1018      <para>
1019       In the FROM clause we introduced an alias name for every relation
1020       because there are common named attributes (SNO and PNO) among the
1021       relations. Now we can distinguish between the common named attributes
1022       by simply prefixing the attribute name with the alias name followed by
1023       a dot. The join is calculated in the same way as shown in
1024       <xref linkend="join-example" endterm="join-example">.
1025       First the Cartesian product
1026
1027       SUPPLIER &times; PART &times; SELLS
1028
1029       is derived. Now only those tuples satisfying the
1030       conditions given in the WHERE clause are selected (i.e. the common
1031       named attributes have to be equal). Finally we project out all
1032       columns but S.SNAME and P.PNAME.
1033      </para>
1034
1035      <para>
1036      Another way to perform joins is to use the SQL JOIN syntax as follows:
1037      <programlisting>
1038 select sname, pname from supplier
1039     JOIN sells USING (sno)
1040     JOIN part USING (pno);
1041     </programlisting>
1042     giving again:
1043     <programlisting>
1044  sname | pname
1045 -------+-------
1046  Smith | Screw
1047  Adams | Screw
1048  Smith | Nut
1049  Blake | Nut
1050  Adams | Bolt
1051  Blake | Bolt
1052  Jones | Cam
1053  Blake | Cam
1054 (8 rows)
1055      </programlisting>
1056      </para>
1057
1058      <para>
1059      A joined table, created using JOIN syntax, is a table reference list
1060      item that occurs in a FROM clause and before any WHERE, GROUP BY,
1061      or HAVING clause.  Other table references, including table names or
1062      other JOIN clauses, can be included in the FROM clause if separated
1063      by commas.  JOINed tables are logically like any other
1064      table listed in the FROM clause.
1065      </para>
1066
1067      <para>
1068       SQL JOINs come in two main types, CROSS JOINs (unqualified joins)
1069       and <firstterm>qualified JOINs</>.  Qualified joins can be further
1070       subdivided based on the way in which the <firstterm>join condition</>
1071       is specified (ON, USING, or NATURAL) and the way in which it is
1072       applied (INNER or OUTER join).
1073      </para>
1074
1075     <variablelist>
1076         <title>Join Types</title>
1077         <varlistentry>
1078             <term>CROSS JOIN</term>
1079             <listitem>
1080             <cmdsynopsis>
1081                 <arg choice="req"> <replaceable class="parameter">T1</replaceable> </arg>
1082                 <command> CROSS JOIN </command>
1083                 <arg choice="req"> <replaceable class="parameter">T2</replaceable> </arg>
1084             </cmdsynopsis>
1085
1086             <para>
1087             A cross join takes two tables T1 and T2 having N and M rows
1088             respectively, and returns a joined table containing all
1089             N*M possible joined rows. For each row R1 of T1, each row
1090             R2 of T2 is joined with R1 to yield a joined table row JR
1091             consisting of all fields in R1 and R2. A CROSS JOIN is
1092             equivalent to an INNER JOIN ON TRUE.
1093             </para>
1094             </listitem>
1095         </varlistentry>
1096
1097         <varlistentry>
1098             <term>Qualified JOINs</term>
1099             <listitem>
1100
1101             <cmdsynopsis>
1102             <arg choice="req"> <replaceable class="parameter">T1</replaceable> </arg>
1103             <arg choice="opt"> NATURAL </arg>
1104             <group choice="opt">
1105                 <arg choice="opt"> INNER </arg>
1106                 <arg>
1107                 <group choice="req">
1108                     <arg choice="plain"> LEFT </arg>
1109                     <arg choice="plain"> RIGHT </arg>
1110                     <arg choice="plain"> FULL </arg>
1111                 </group>
1112                 <arg choice="opt"> OUTER </arg>
1113                     </arg>
1114                 </group>
1115             <command> JOIN </command>
1116             <arg choice="req"> <replaceable class="parameter">T2</replaceable> </arg>
1117             <group choice="req">
1118                 <arg> ON <replaceable>search condition</replaceable></arg>
1119                 <arg> USING ( <replaceable>join column list</replaceable> ) </arg>
1120             </group>
1121             </cmdsynopsis>
1122
1123             <para>
1124             A qualified JOIN must specify its join condition
1125             by providing one (and only one) of NATURAL, ON, or
1126             USING.  The ON clause
1127             takes a <replaceable>search condition</replaceable>,
1128             which is the same as in a WHERE clause.  The USING
1129             clause takes a comma-separated list of column names,
1130             which the joined tables must have in common, and joins
1131             the tables on equality of those columns.  NATURAL is
1132             shorthand for a USING clause that lists all the common
1133             column names of the two tables.  A side-effect of both
1134             USING and NATURAL is that only one copy of each joined
1135             column is emitted into the result table (compare the
1136             relational-algebra definition of JOIN, shown earlier).
1137             </para>
1138
1139             <!-- begin join semantics -->
1140             <variablelist>
1141             <varlistentry>
1142                 <term>
1143                     <cmdsynopsis>
1144                         <arg> INNER </arg>
1145                         <command> JOIN </command>
1146                     </cmdsynopsis>
1147                 </term>
1148                 <listitem>
1149                 <para>
1150                 For each row R1 of T1, the joined table has a row for each row
1151                 in T2 that satisfies the join condition with R1.
1152                 </para>
1153                 <tip>
1154                 <para>
1155                     The words INNER and OUTER are optional for all JOINs.
1156                     INNER is the default.  LEFT, RIGHT, and FULL imply an
1157                     OUTER JOIN.
1158                     </para>
1159                 </tip>
1160                 </listitem>
1161             </varlistentry>
1162             <varlistentry>
1163                 <term>
1164                     <cmdsynopsis>
1165                         <arg choice="plain"> LEFT </arg>
1166                         <arg> OUTER </arg>
1167                         <command> JOIN </command>
1168                     </cmdsynopsis>
1169                 </term>
1170                 <listitem>
1171                 <para>
1172                 First, an INNER JOIN is performed.
1173                 Then, for each row in T1 that does not satisfy the join
1174                 condition with any row in T2, an additional joined row is
1175                 returned with null fields in the columns from T2.
1176                 </para>
1177                 <tip>
1178                     <para>
1179                     The joined table unconditionally has a row for each row in T1.
1180                     </para>
1181                 </tip>
1182                 </listitem>
1183             </varlistentry>
1184             <varlistentry>
1185                 <term>
1186                     <cmdsynopsis>
1187                         <arg choice="plain"> RIGHT </arg>
1188                         <arg> OUTER </arg>
1189                         <command> JOIN </command>
1190                     </cmdsynopsis>
1191                 </term>
1192                 <listitem>
1193                 <para>
1194                 First, an INNER JOIN is performed.
1195                 Then, for each row in T2 that does not satisfy the join
1196                 condition with any row in T1, an additional joined row is
1197                 returned with null fields in the columns from T1.
1198                 </para>
1199                 <tip>
1200                     <para>
1201                     The joined table unconditionally has a row for each row in T2.
1202                     </para>
1203                 </tip>
1204                 </listitem>
1205             </varlistentry>
1206             <varlistentry>
1207                 <term>
1208                     <cmdsynopsis>
1209                         <arg choice="plain"> FULL </arg>
1210                         <arg> OUTER </arg>
1211                         <command> JOIN </command>
1212                     </cmdsynopsis>
1213                 </term>
1214                 <listitem>
1215                 <para>
1216                 First, an INNER JOIN is performed.
1217                 Then, for each row in T1 that does not satisfy the join
1218                 condition with any row in T2, an additional joined row is
1219                 returned with null fields in the columns from T2.
1220                 Also, for each row in T2 that does not satisfy the join
1221                 condition with any row in T1, an additional joined row is
1222                 returned with null fields in the columns from T1.
1223                 </para>
1224                 <tip>
1225                     <para>
1226                     The joined table unconditionally has a row for every row of T1
1227                     and a row for every row of T2.
1228                     </para>
1229                 </tip>
1230                 </listitem>
1231             </varlistentry>
1232             </variablelist>
1233             <!-- end join semantics -->
1234
1235             </listitem>
1236         </varlistentry>
1237      </variablelist>
1238
1239      <para>
1240      JOINs of all types can be chained together or nested where either or both of
1241      <replaceable class="parameter">T1</replaceable> and
1242      <replaceable class="parameter">T2</replaceable> can be JOINed tables.
1243      Parenthesis can be used around JOIN clauses to control the order
1244      of JOINs which are otherwise processed left to right.
1245      </para>
1246
1247     </sect3>
1248
1249     <sect3>
1250      <title id="aggregates-tutorial">Aggregate Functions</title>
1251
1252      <para>
1253       <acronym>SQL</acronym> provides aggregate functions such as AVG,
1254       COUNT, SUM, MIN, and MAX.  The argument(s) of an aggregate function
1255       are evaluated at each row that satisfies the WHERE
1256       clause, and the aggregate function is calculated over this set
1257       of input values.  Normally, an aggregate delivers a single
1258       result for a whole <command>SELECT</command> statement.  But if
1259       grouping is specified in the query, then a separate calculation
1260       is done over the rows of each group, and an aggregate result is
1261       delivered per group (see next section).
1262
1263       <example>
1264        <title id="aggregates-example">Aggregates</title>
1265
1266        <para>
1267         If we want to know the average cost of all parts in table PART we use
1268         the following query:
1269
1270         <programlisting>
1271 SELECT AVG(PRICE) AS AVG_PRICE
1272     FROM PART;
1273         </programlisting>
1274        </para>
1275
1276        <para>
1277         The result is:
1278
1279         <programlisting>
1280  AVG_PRICE
1281 -----------
1282    14.5
1283         </programlisting>
1284        </para>
1285
1286        <para>
1287         If we want to know how many parts are defined in table PART we use
1288         the statement:
1289
1290         <programlisting>
1291 SELECT COUNT(PNO)
1292     FROM PART;
1293         </programlisting>
1294
1295         and get:
1296
1297         <programlisting>
1298  COUNT
1299 -------
1300    4
1301         </programlisting>
1302
1303        </para>
1304       </example>
1305      </para>
1306     </sect3>
1307
1308     <sect3>
1309      <title>Aggregation by Groups</title>
1310
1311      <para>
1312       <acronym>SQL</acronym> allows one to partition the tuples of a table
1313       into groups. Then the
1314       aggregate functions described above can be applied to the groups &mdash;
1315       i.e. the value of the aggregate function is no longer calculated over
1316       all the values of the specified column but over all values of a
1317       group. Thus the aggregate function is evaluated separately for every
1318       group.
1319      </para>
1320
1321      <para>
1322       The partitioning of the tuples into groups is done by using the
1323       keywords <command>GROUP BY</command> followed by a list of
1324       attributes that define the
1325       groups. If we have
1326       <command>GROUP BY A<subscript>1</subscript>, &tdot;, A<subscript>k</subscript></command>
1327       we partition
1328       the relation into groups, such that two tuples are in the same group
1329       if and only if they agree on all the attributes
1330       A<subscript>1</subscript>, &tdot;, A<subscript>k</subscript>.
1331
1332       <example>
1333        <title id="aggregates-groupby">Aggregates</title>
1334        <para>
1335         If we want to know how many parts are sold by every supplier we
1336         formulate the query:
1337
1338         <programlisting>
1339 SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
1340     FROM SUPPLIER S, SELLS SE
1341     WHERE S.SNO = SE.SNO
1342     GROUP BY S.SNO, S.SNAME;
1343         </programlisting>
1344
1345         and get:
1346
1347         <programlisting>
1348  SNO | SNAME | COUNT
1349 -----+-------+-------
1350   1  | Smith |   2
1351   2  | Jones |   1
1352   3  | Adams |   2
1353   4  | Blake |   3
1354         </programlisting>
1355        </para>
1356
1357        <para>
1358         Now let's have a look of what is happening here.
1359         First the join of the
1360         tables SUPPLIER and SELLS is derived:
1361
1362         <programlisting>
1363  S.SNO | S.SNAME | SE.PNO
1364 -------+---------+--------
1365    1   |  Smith  |   1
1366    1   |  Smith  |   2
1367    2   |  Jones  |   4
1368    3   |  Adams  |   1
1369    3   |  Adams  |   3
1370    4   |  Blake  |   2
1371    4   |  Blake  |   3
1372    4   |  Blake  |   4
1373         </programlisting>
1374        </para>
1375
1376        <para>
1377         Next we partition the tuples into groups by putting all tuples
1378         together that agree on both attributes S.SNO and S.SNAME:
1379
1380         <programlisting>
1381  S.SNO | S.SNAME | SE.PNO
1382 -------+---------+--------
1383    1   |  Smith  |   1
1384                  |   2
1385 --------------------------
1386    2   |  Jones  |   4
1387 --------------------------
1388    3   |  Adams  |   1
1389                  |   3
1390 --------------------------
1391    4   |  Blake  |   2
1392                  |   3
1393                  |   4
1394         </programlisting>
1395        </para>
1396
1397        <para>
1398         In our example we got four groups and now we can apply the aggregate
1399         function COUNT to every group leading to the final result of the query
1400         given above.
1401        </para>
1402       </example>
1403      </para>
1404
1405      <para>
1406       Note that for a query using GROUP BY and aggregate
1407       functions to make sense, the target list can only refer directly to
1408       the attributes being grouped by.  Other attributes can only be used
1409       inside the arguments of aggregate functions.  Otherwise there would
1410       not be a unique value to associate with the other attributes.
1411      </para>
1412
1413      <para>
1414       Also observe that it makes no sense to ask for an aggregate of
1415       an aggregate, e.g., AVG(MAX(sno)), because a
1416       <command>SELECT</command> only does one pass of grouping and
1417       aggregation.  You can get a result of this kind by using a
1418       temporary table or a sub-SELECT in the FROM clause to do the
1419       first level of aggregation.
1420      </para>
1421     </sect3>
1422
1423     <sect3>
1424      <title>Having</title>
1425
1426      <para>
1427       The HAVING clause works much like the WHERE clause and is used to
1428       consider only those groups satisfying the qualification given in the
1429       HAVING clause.  Essentially, WHERE filters out unwanted input rows
1430       before grouping and aggregation are done, whereas HAVING filters out
1431       unwanted group rows post-GROUP.  Therefore, WHERE cannot refer to the
1432       results of aggregate functions.  On the other hand, there's no point
1433       in writing a HAVING condition that doesn't involve an aggregate
1434       function!  If your condition doesn't involve aggregates, you might
1435       as well write it in WHERE, and thereby avoid the computation of
1436       aggregates for groups that you're just going to throw away anyway.
1437
1438       <example>
1439        <title id="having-example">Having</title>
1440
1441        <para>
1442         If we want only those suppliers selling more than one part we use the
1443         query:
1444
1445         <programlisting>
1446 SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
1447     FROM SUPPLIER S, SELLS SE
1448     WHERE S.SNO = SE.SNO
1449     GROUP BY S.SNO, S.SNAME
1450     HAVING COUNT(SE.PNO) &gt; 1;
1451         </programlisting>
1452
1453         and get:
1454
1455         <programlisting>
1456  SNO | SNAME | COUNT
1457 -----+-------+-------
1458   1  | Smith |   2
1459   3  | Adams |   2
1460   4  | Blake |   3
1461         </programlisting>
1462        </para>
1463       </example>
1464      </para>
1465     </sect3>
1466
1467     <sect3>
1468      <title>Subqueries</title>
1469
1470      <para>
1471       In the WHERE and HAVING clauses the use of subqueries (subselects) is
1472       allowed in every place where a value is expected. In this case the
1473       value must be derived by evaluating the subquery first. The usage of
1474       subqueries extends the expressive power of
1475       <acronym>SQL</acronym>.
1476
1477       <example>
1478        <title id="subselect-example">Subselect</title>
1479
1480        <para>
1481         If we want to know all parts having a greater price than the part
1482         named 'Screw' we use the query:
1483
1484         <programlisting>
1485 SELECT *
1486     FROM PART
1487     WHERE PRICE &gt; (SELECT PRICE FROM PART
1488                    WHERE PNAME='Screw');
1489         </programlisting>
1490        </para>
1491
1492        <para>
1493         The result is:
1494
1495         <programlisting>
1496  PNO |  PNAME  |  PRICE
1497 -----+---------+--------
1498   3  |  Bolt   |   15
1499   4  |  Cam    |   25
1500         </programlisting>
1501        </para>
1502
1503        <para>
1504         When we look at the above query we can see the keyword
1505         <command>SELECT</command> two times. The first one at the
1506         beginning of the query - we will refer to it as outer
1507         <command>SELECT</command> - and the one in the WHERE clause which
1508         begins a nested query - we will refer to it as inner
1509         <command>SELECT</command>. For every tuple of the outer
1510         <command>SELECT</command> the inner <command>SELECT</command> has
1511         to be evaluated. After every evaluation we know the price of the
1512         tuple named 'Screw' and we can check if the price of the actual
1513         tuple is greater.  (Actually, in this example the inner query need
1514         only be evaluated once, since it does not depend on the state of
1515         the outer query.)
1516        </para>
1517
1518        <para>
1519         If we want to know all suppliers that do not sell any part
1520         (e.g. to be able to remove these suppliers from the database) we use:
1521
1522         <programlisting>
1523 SELECT *
1524     FROM SUPPLIER S
1525     WHERE NOT EXISTS
1526         (SELECT * FROM SELLS SE
1527          WHERE SE.SNO = S.SNO);
1528         </programlisting>
1529        </para>
1530
1531        <para>
1532         In our example the result will be empty because every supplier
1533         sells at least one part. Note that we use S.SNO from the outer
1534         <command>SELECT</command> within the WHERE clause of the inner
1535         <command>SELECT</command>. Here the subquery must be evaluated
1536         afresh for each tuple from the outer query, i.e. the value for
1537         S.SNO is always taken from the current tuple of the outer
1538         <command>SELECT</command>.
1539        </para>
1540       </example>
1541      </para>
1542     </sect3>
1543
1544     <sect3>
1545      <title>Subqueries in FROM</title>
1546
1547      <para>
1548       A somewhat different way of using subqueries is to put them in the
1549       FROM clause.  This is a useful feature because a subquery of this
1550       kind can output multiple columns and rows, whereas a subquery used
1551       in an expression must deliver just a single result.  It also lets
1552       us get more than one round of grouping/aggregation without resorting
1553       to a temporary table.
1554
1555       <example>
1556        <title id="subselect-in-from-example">Subselect in FROM</title>
1557
1558        <para>
1559         If we want to know the highest average part price among all our
1560         suppliers, we cannot write MAX(AVG(PRICE)), but we can write:
1561
1562         <programlisting>
1563 SELECT MAX(subtable.avgprice)
1564     FROM (SELECT AVG(P.PRICE) AS avgprice
1565           FROM SUPPLIER S, PART P, SELLS SE
1566           WHERE S.SNO = SE.SNO AND
1567                 P.PNO = SE.PNO
1568           GROUP BY S.SNO) subtable;
1569         </programlisting>
1570
1571         The subquery returns one row per supplier (because of its GROUP BY)
1572         and then we aggregate over those rows in the outer query.
1573        </para>
1574       </example>
1575      </para>
1576     </sect3>
1577
1578     <sect3>
1579      <title>Union, Intersect, Except</title>
1580
1581      <para>
1582       These operations calculate the union, intersection and set theoretic
1583       difference of the tuples derived by two subqueries.
1584
1585       <example>
1586        <title id="union-example">Union, Intersect, Except</title>
1587
1588        <para>
1589         The following query is an example for UNION:
1590
1591         <programlisting>
1592 SELECT S.SNO, S.SNAME, S.CITY
1593     FROM SUPPLIER S
1594     WHERE S.SNAME = 'Jones'
1595 UNION
1596     SELECT S.SNO, S.SNAME, S.CITY
1597     FROM SUPPLIER S
1598     WHERE S.SNAME = 'Adams';
1599         </programlisting>
1600
1601 gives the result:
1602
1603         <programlisting>
1604  SNO | SNAME |  CITY
1605 -----+-------+--------
1606   2  | Jones | Paris
1607   3  | Adams | Vienna
1608         </programlisting>
1609        </para>
1610
1611        <para>
1612         Here is an example for INTERSECT:
1613
1614         <programlisting>
1615 SELECT S.SNO, S.SNAME, S.CITY
1616     FROM SUPPLIER S
1617     WHERE S.SNO &gt; 1
1618 INTERSECT
1619     SELECT S.SNO, S.SNAME, S.CITY
1620     FROM SUPPLIER S
1621     WHERE S.SNO &lt; 3;
1622         </programlisting>
1623
1624         gives the result:
1625
1626         <programlisting>
1627  SNO | SNAME |  CITY
1628 -----+-------+--------
1629   2  | Jones | Paris
1630         </programlisting>
1631
1632         The only tuple returned by both parts of the query is the one having SNO=2.
1633        </para>
1634
1635        <para>
1636         Finally an example for EXCEPT:
1637
1638         <programlisting>
1639 SELECT S.SNO, S.SNAME, S.CITY
1640     FROM SUPPLIER S
1641     WHERE S.SNO &gt; 1
1642 EXCEPT
1643     SELECT S.SNO, S.SNAME, S.CITY
1644     FROM SUPPLIER S
1645     WHERE S.SNO &gt; 3;
1646         </programlisting>
1647
1648         gives the result:
1649
1650         <programlisting>
1651  SNO | SNAME |  CITY
1652 -----+-------+--------
1653   2  | Jones | Paris
1654   3  | Adams | Vienna
1655         </programlisting>
1656        </para>
1657       </example>
1658      </para>
1659     </sect3>
1660    </sect2>
1661
1662    <sect2 id="datadef">
1663     <title>Data Definition</title>
1664
1665     <para>
1666      There is a set of commands used for data definition included in the
1667      <acronym>SQL</acronym> language.
1668     </para>
1669
1670     <sect3 id="create">
1671      <title id="create-title">Create Table</title>
1672
1673      <para>
1674       The most fundamental command for data definition is the
1675       one that creates a new relation (a new table). The syntax of the
1676       <command>CREATE TABLE</command> command is:
1677
1678       <synopsis>
1679 CREATE TABLE <replaceable class="parameter">table_name</replaceable>
1680     (<replaceable class="parameter">name_of_attr_1</replaceable> <replaceable class="parameter">type_of_attr_1</replaceable>
1681      [, <replaceable class="parameter">name_of_attr_2</replaceable> <replaceable class="parameter">type_of_attr_2</replaceable>
1682      [, ...]]);
1683       </synopsis>
1684
1685       <example>
1686        <title id="table-create">Table Creation</title>
1687
1688        <para>
1689         To create the tables defined in
1690         <xref linkend="supplier-fig" endterm="supplier-fig"> the
1691         following <acronym>SQL</acronym> statements are used:
1692
1693         <programlisting>
1694 CREATE TABLE SUPPLIER
1695     (SNO   INTEGER,
1696      SNAME VARCHAR(20),
1697      CITY  VARCHAR(20));
1698      </programlisting>
1699
1700      <programlisting>
1701 CREATE TABLE PART
1702     (PNO   INTEGER,
1703      PNAME VARCHAR(20),
1704      PRICE DECIMAL(4 , 2));
1705      </programlisting>
1706
1707      <programlisting>
1708 CREATE TABLE SELLS
1709     (SNO INTEGER,
1710      PNO INTEGER);
1711         </programlisting>
1712        </para>
1713       </example>
1714      </para>
1715     </sect3>
1716
1717     <sect3>
1718      <title>Data Types in <acronym>SQL</acronym></title>
1719
1720      <para>
1721       The following is a list of some data types that are supported by
1722       <acronym>SQL</acronym>:
1723
1724       <itemizedlist>
1725        <listitem>
1726         <para>
1727          INTEGER: signed fullword binary integer (31 bits precision).
1728         </para>
1729        </listitem>
1730
1731        <listitem>
1732         <para>
1733          SMALLINT: signed halfword binary integer (15 bits precision).
1734         </para>
1735        </listitem>
1736
1737        <listitem>
1738         <para>
1739          DECIMAL (<replaceable class="parameter">p</replaceable>[,<replaceable class="parameter">q</replaceable>]):
1740          signed packed decimal number of up to
1741          <replaceable class="parameter">p</replaceable>
1742          digits, with
1743          <replaceable class="parameter">q</replaceable>
1744          digits to the right of the decimal point.
1745          If <replaceable class="parameter">q</replaceable>
1746          is omitted it is assumed to be 0.
1747         </para>
1748        </listitem>
1749
1750        <listitem>
1751         <para>
1752          FLOAT: signed doubleword floating point number.
1753         </para>
1754        </listitem>
1755
1756        <listitem>
1757         <para>
1758          VARCHAR(<replaceable class="parameter">n</replaceable>):
1759          varying length character string of maximum length
1760          <replaceable class="parameter">n</replaceable>.
1761         </para>
1762        </listitem>
1763
1764        <listitem>
1765         <para>
1766          CHAR(<replaceable class="parameter">n</replaceable>):
1767          fixed length character string of length
1768          <replaceable class="parameter">n</replaceable>.
1769         </para>
1770        </listitem>
1771
1772       </itemizedlist>
1773      </para>
1774     </sect3>
1775
1776     <sect3>
1777      <title>Create Index</title>
1778
1779      <para>
1780       Indexes are used to speed up access to a relation. If a relation <classname>R</classname>
1781       has an index on attribute <classname>A</classname> then we can
1782       retrieve all tuples <replaceable>t</replaceable>
1783       having
1784       <replaceable>t</replaceable>(<classname>A</classname>) = <replaceable>a</replaceable>
1785       in time roughly proportional to the number of such
1786       tuples <replaceable>t</replaceable>
1787       rather than in time proportional to the size of <classname>R</classname>.
1788      </para>
1789
1790      <para>
1791       To create an index in <acronym>SQL</acronym>
1792       the <command>CREATE INDEX</command> command is used. The syntax is:
1793
1794       <programlisting>
1795 CREATE INDEX <replaceable class="parameter">index_name</replaceable>
1796     ON <replaceable class="parameter">table_name</replaceable> ( <replaceable class="parameter">name_of_attribute</replaceable> );
1797       </programlisting>
1798      </para>
1799
1800      <para>
1801       <example>
1802        <title id="index-create">Create Index</title>
1803
1804        <para>
1805         To create an index named I on attribute SNAME of relation SUPPLIER
1806         we use the following statement:
1807
1808       <programlisting>
1809 CREATE INDEX I ON SUPPLIER (SNAME);
1810       </programlisting>
1811      </para>
1812
1813        <para>
1814         The created index is maintained automatically, i.e. whenever a new
1815         tuple is inserted into the relation SUPPLIER the index I is
1816         adapted. Note that the only changes a user can perceive when an
1817         index is present are increased speed for <command>SELECT</command>
1818         and decreases in speed of updates.
1819        </para>
1820       </example>
1821      </para>
1822     </sect3>
1823
1824     <sect3>
1825      <title>Create View</title>
1826
1827      <para>
1828       A view can be regarded as a <firstterm>virtual table</firstterm>,
1829       i.e. a table that
1830       does not <emphasis>physically</emphasis> exist in the database
1831       but looks to the user
1832       as if it does. By contrast, when we talk of a
1833       <firstterm>base table</firstterm> there is
1834       really a physically stored counterpart of each row of the table
1835       somewhere in the physical storage.
1836      </para>
1837
1838      <para>
1839       Views do not have their own, physically separate, distinguishable
1840       stored data. Instead, the system stores the definition of the
1841       view (i.e. the rules about how to access physically stored base
1842       tables in order to materialize the view) somewhere in the system
1843       catalogs (see
1844       <xref linkend="tutorial-catalogs-title" endterm="tutorial-catalogs-title">). For a
1845       discussion on different techniques to implement views refer to
1846 <!--
1847       section
1848       <xref linkend="view-impl" endterm="view-impl">.
1849 -->
1850       <citetitle>SIM98</citetitle>.
1851      </para>
1852
1853      <para>
1854       In <acronym>SQL</acronym> the <command>CREATE VIEW</command>
1855       command is used to define a view. The syntax
1856       is:
1857
1858       <programlisting>
1859 CREATE VIEW <replaceable class="parameter">view_name</replaceable>
1860     AS <replaceable class="parameter">select_stmt</replaceable>
1861       </programlisting>
1862
1863       where <replaceable class="parameter">select_stmt</replaceable>
1864       is a valid select statement as defined
1865       in <xref linkend="select-title" endterm="select-title">.
1866       Note that <replaceable class="parameter">select_stmt</replaceable> is
1867       not executed when the view is created. It is just stored in the
1868       <firstterm>system catalogs</firstterm>
1869       and is executed whenever a query against the view is made.
1870      </para>
1871
1872      <para>
1873       Let the following view definition be given (we use
1874       the tables from
1875       <xref linkend="supplier-fig" endterm="supplier-fig"> again):
1876
1877       <programlisting>
1878 CREATE VIEW London_Suppliers
1879     AS SELECT S.SNAME, P.PNAME
1880         FROM SUPPLIER S, PART P, SELLS SE
1881         WHERE S.SNO = SE.SNO AND
1882               P.PNO = SE.PNO AND
1883               S.CITY = 'London';
1884       </programlisting>
1885      </para>
1886
1887      <para>
1888       Now we can use this <firstterm>virtual relation</firstterm>
1889       <classname>London_Suppliers</classname> as
1890       if it were another base table:
1891
1892       <programlisting>
1893 SELECT * FROM London_Suppliers
1894     WHERE PNAME = 'Screw';
1895       </programlisting>
1896
1897       which will return the following table:
1898
1899       <programlisting>
1900  SNAME | PNAME
1901 -------+-------
1902  Smith | Screw
1903       </programlisting>
1904      </para>
1905
1906      <para>
1907       To calculate this result the database system has to do a
1908       <emphasis>hidden</emphasis>
1909       access to the base tables SUPPLIER, SELLS and PART first. It
1910       does so by executing the query given in the view definition against
1911       those base tables. After that the additional qualifications
1912       (given in the
1913       query against the view) can be applied to obtain the resulting
1914       table.
1915      </para>
1916     </sect3>
1917
1918     <sect3>
1919      <title>Drop Table, Drop Index, Drop View</title>
1920
1921      <para>
1922       To destroy a table (including all tuples stored in that table) the
1923       <command>DROP TABLE</command> command is used:
1924
1925       <programlisting>
1926 DROP TABLE <replaceable class="parameter">table_name</replaceable>;
1927        </programlisting>
1928       </para>
1929
1930      <para>
1931       To destroy the SUPPLIER table use the following statement:
1932
1933       <programlisting>
1934 DROP TABLE SUPPLIER;
1935       </programlisting>
1936      </para>
1937
1938      <para>
1939       The <command>DROP INDEX</command> command is used to destroy an index:
1940
1941       <programlisting>
1942 DROP INDEX <replaceable class="parameter">index_name</replaceable>;
1943       </programlisting>
1944      </para>
1945
1946      <para>
1947       Finally to destroy a given view use the command <command>DROP
1948       VIEW</command>:
1949
1950       <programlisting>
1951 DROP VIEW <replaceable class="parameter">view_name</replaceable>;
1952       </programlisting>
1953      </para>
1954     </sect3>
1955    </sect2>
1956
1957    <sect2>
1958     <title>Data Manipulation</title>
1959
1960     <sect3>
1961      <title>Insert Into</title>
1962
1963      <para>
1964       Once a table is created (see
1965       <xref linkend="create-title" endterm="create-title">), it can be filled
1966       with tuples using the command <command>INSERT INTO</command>.
1967       The syntax is:
1968
1969       <programlisting>
1970 INSERT INTO <replaceable class="parameter">table_name</replaceable> (<replaceable class="parameter">name_of_attr_1</replaceable>
1971     [, <replaceable class="parameter">name_of_attr_2</replaceable> [,...]])
1972     VALUES (<replaceable class="parameter">val_attr_1</replaceable> [, <replaceable class="parameter">val_attr_2</replaceable> [, ...]]);
1973       </programlisting>
1974      </para>
1975
1976      <para>
1977       To insert the first tuple into the relation SUPPLIER (from
1978       <xref linkend="supplier-fig" endterm="supplier-fig">) we use the
1979       following statement:
1980
1981       <programlisting>
1982 INSERT INTO SUPPLIER (SNO, SNAME, CITY)
1983     VALUES (1, 'Smith', 'London');
1984       </programlisting>
1985      </para>
1986
1987      <para>
1988       To insert the first tuple into the relation SELLS we use:
1989
1990       <programlisting>
1991 INSERT INTO SELLS (SNO, PNO)
1992     VALUES (1, 1);
1993       </programlisting>
1994      </para>
1995     </sect3>
1996
1997     <sect3>
1998      <title>Update</title>
1999
2000      <para>
2001       To change one or more attribute values of tuples in a relation the
2002       <command>UPDATE</command> command is used. The syntax is:
2003
2004       <programlisting>
2005 UPDATE <replaceable class="parameter">table_name</replaceable>
2006     SET <replaceable class="parameter">name_of_attr_1</replaceable> = <replaceable class="parameter">value_1</replaceable>
2007         [, ... [, <replaceable class="parameter">name_of_attr_k</replaceable> = <replaceable class="parameter">value_k</replaceable>]]
2008     WHERE <replaceable class="parameter">condition</replaceable>;
2009       </programlisting>
2010      </para>
2011
2012      <para>
2013       To change the value of attribute PRICE of the part 'Screw' in the
2014       relation PART we use:
2015
2016       <programlisting>
2017 UPDATE PART
2018     SET PRICE = 15
2019     WHERE PNAME = 'Screw';
2020       </programlisting>
2021      </para>
2022
2023      <para>
2024       The new value of attribute PRICE of the tuple whose name is 'Screw' is
2025       now 15.
2026      </para>
2027     </sect3>
2028
2029     <sect3>
2030      <title>Delete</title>
2031
2032      <para>
2033       To delete a tuple from a particular table use the command DELETE
2034       FROM. The syntax is:
2035
2036       <programlisting>
2037 DELETE FROM <replaceable class="parameter">table_name</replaceable>
2038     WHERE <replaceable class="parameter">condition</replaceable>;
2039       </programlisting>
2040      </para>
2041
2042      <para>
2043       To delete the supplier called 'Smith' of the table SUPPLIER the
2044       following statement is used:
2045
2046       <programlisting>
2047 DELETE FROM SUPPLIER
2048     WHERE SNAME = 'Smith';
2049       </programlisting>
2050      </para>
2051     </sect3>
2052    </sect2>
2053
2054    <sect2 id="tutorial-catalogs">
2055     <title id="tutorial-catalogs-title">System Catalogs</title>
2056
2057     <para>
2058      In every <acronym>SQL</acronym> database system
2059      <firstterm>system catalogs</firstterm> are used to keep
2060      track of which tables, views indexes etc. are defined in the
2061      database. These system catalogs can be queried as if they were normal
2062      relations. For example there is one catalog used for the definition of
2063      views. This catalog stores the query from the view definition. Whenever
2064      a query against a view is made, the system first gets the
2065      <firstterm>view definition query</firstterm> out of the catalog
2066      and materializes the view
2067      before proceeding with the user query (see
2068 <!--
2069       section
2070       <xref linkend="view-impl" endterm="view-impl">.
2071     <citetitle>SIM98</citetitle>
2072 -->
2073      <xref linkend="SIM98" endterm="SIM98">
2074      for a more detailed
2075      description). For more information about system catalogs refer to
2076      <xref linkend="DATE04" endterm="DATE04">.
2077     </para>
2078    </sect2>
2079
2080    <sect2>
2081     <title>Embedded <acronym>SQL</acronym></title>
2082
2083     <para>
2084      In this section we will sketch how <acronym>SQL</acronym> can be
2085      embedded into a host language (e.g. <literal>C</literal>).
2086      There are two main reasons why we want to use <acronym>SQL</acronym>
2087      from a host language:
2088
2089      <itemizedlist>
2090       <listitem>
2091        <para>
2092         There are queries that cannot be formulated using pure <acronym>SQL</acronym>
2093         (i.e. recursive queries). To be able to perform such queries we need a
2094         host language with a greater expressive power than
2095         <acronym>SQL</acronym>.
2096        </para>
2097       </listitem>
2098
2099       <listitem>
2100        <para>
2101         We simply want to access a database from some application that
2102         is written in the host language (e.g. a ticket reservation system
2103         with a graphical user interface is written in C and the information
2104         about which tickets are still left is stored in a database that can be
2105         accessed using embedded <acronym>SQL</acronym>).
2106        </para>
2107       </listitem>
2108      </itemizedlist>
2109     </para>
2110
2111     <para>
2112      A program using embedded <acronym>SQL</acronym>
2113      in a host language consists of statements
2114      of the host language and of
2115      <firstterm>embedded <acronym>SQL</acronym></firstterm>
2116      (<acronym>ESQL</acronym>) statements. Every <acronym>ESQL</acronym>
2117      statement begins with the keywords <command>EXEC SQL</command>.
2118      The <acronym>ESQL</acronym> statements are
2119      transformed to statements of the host language
2120      by a <firstterm>precompiler</firstterm>
2121      (which usually inserts
2122      calls to library routines that perform the various <acronym>SQL</acronym>
2123      commands).
2124     </para>
2125
2126     <para>
2127      When we look at the examples throughout
2128      <xref linkend="select-title" endterm="select-title"> we
2129      realize that the result of the queries is very often a set of
2130      tuples. Most host languages are not designed to operate on sets so we
2131      need a mechanism to access every single tuple of the set of tuples
2132      returned by a SELECT statement. This mechanism can be provided by
2133      declaring a <firstterm>cursor</firstterm>.
2134      After that we can use the <command>FETCH</command> command to
2135      retrieve a tuple and set the cursor to the next tuple.
2136     </para>
2137
2138     <para>
2139      For a detailed discussion on embedded <acronym>SQL</acronym>
2140      refer to
2141      <xref linkend="DATE97" endterm="DATE97">,
2142      <xref linkend="DATE04" endterm="DATE04">,
2143      or
2144      <xref linkend="ULL88" endterm="ULL88">.
2145     </para>
2146    </sect2>
2147   </sect1>
2148  </chapter>