defguide/en/ch01.xml

   1 <chapter id="ch-gssgml">
   2 <?dbhtml filename="ch01.html"?>
   3 <chapterinfo>
   4 <pubdate>$Date: 2002-12-30 04:20:58 +0800 (Mon, 30 Dec 2002) $</pubdate>
   5 <releaseinfo>$Revision: 2357 $</releaseinfo>
   6 </chapterinfo>
   7 <title>Getting Started<?lb?>with &SGML;/&XML;</title>
   8 <para>
   9 <indexterm id="getstartSGML" class="startofrange"><primary>SGML</primary>
  10   <secondary>getting started</secondary></indexterm>
  11 <indexterm id="XMLgetstart" class="startofrange"><primary>XML</primary>
  12   <secondary>getting started</secondary></indexterm>
  13
  14 This chapter is intended to provide a quick introduction to structured
  15 markup (&SGML; and &XML;). If you're already familiar with &SGML; or
  16 &XML;, you only need to skim this chapter.
  17 </para>
  18 <para>
  19 To work with DocBook, you need to understand a few basic concepts of
  20 structured editing in general and of DocBook in particular. That's
  21 covered here. You also need some concrete experience with the way a
  22 DocBook document is structured. That's covered in the next chapter.
  23 </para>
  24 <sect1 id="ch01-compare">
  25 <title>&HTML; and &SGML; vs. &XML;</title>
  26 <para>
  27 <indexterm><primary>HTML</primary>
  28   <secondary>XML vs.</secondary></indexterm>
  29 <indexterm><primary>Hypertext Markup Language</primary><see>HTML</see></indexterm>
  30 <indexterm><primary>SGML</primary>
  31   <secondary>HTML vs.</secondary></indexterm>
  32
  33 This chapter doesn't assume that you know what &HTML; is, but if you
  34 do, you have a starting point for understanding structured
  35 markup. &HTML; (Hypertext Markup Language) is a way of marking up text
  36 and graphics so that the most popular web browsers can interpret
  37 them. &HTML; consists of a set of markup tags with specific
  38 meanings. Moreover, &HTML; is a very basic type of &SGML; markup that
  39 is easy to learn and easy for computer applications to generate. But
  40 the simplicity of &HTML; is both its virtue and its weakness. Because
  41 of &HTML;'s limitations, web users and programmers have had to extend
  42 and enhance it by a series of customizations and revisions that still
  43 fall short of accommodating current, to say nothing of future, needs.
  44 </para>
  45 <para>
  46
  47
  48 &SGML;, on the other hand, is an international standard that describes
  49 how markup languages are defined. &SGML; does not consist of
  50 particular tags or the rules for their usage. &HTML; is an example of
  51 a markup language defined in &SGML;.
  52 </para>
  53 <para>
  54 <indexterm><primary>XML</primary>
  55   <secondary>HTML and SGML vs.</secondary></indexterm>
  56
  57 &XML; promises an intelligent improvement over &HTML;, and
  58 compatibility with it is already being built into the most popular web
  59 browsers. &XML; is not a new markup language designed to compete with
  60 &HTML;, and it's not designed to create conversion headaches for people
  61 with tons of &HTML; documents. &XML; is intended to alleviate
  62 compatibility problems with browser software; it's a new, easier
  63 version of the standard rules that govern the markup itself, or, in
  64 other words, a new version of &SGML;. The rules of &XML; are designed
  65 to make it easier to write both applications that interpret its type
  66 of markup and applications that generate its markup. &XML; was
  67 developed by a team of &SGML; experts who understood and sought to
  68 correct the problems of learning and implementing &SGML;. &XML; is
  69 also <emphasis>extensible</emphasis> markup, which means that it is
  70 customizable. A browser or word processor that is &XML;-capable will
  71 be able to read any &XML;-based markup language that an individual
  72 user defines.
  73 </para>
  74 <para>
  75 In this book, we tend to describe things in terms of &SGML;, but where
  76 there are differences between &SGML; and &XML; (and there are only a
  77 few), we point them out. For our purposes, it doesn't really matter
  78 whether you use &SGML; or &XML;.
  79 </para>
  80 <para>
  81 During the coming months, we anticipate that &XML;-aware web browsers
  82 and other tools will become available. Nevertheless, it's not
  83 unreasonable to do your authoring in &SGML; and your online publishing
  84 in &XML; or &HTML;. By the same token, it's not unreasonable to do
  85 your authoring in &XML;.
  86 </para>
  87 </sect1>
  88 <sect1 id="s1-basic-concepts">
  89 <title>Basic &SGML;/&XML; Concepts</title>
  90 <para>
  91 <indexterm id="SGMLbasicconceptch01" class="startofrange"><primary>SGML</primary>
  92   <secondary>basic concepts</secondary></indexterm>
  93 <indexterm id="XMLbasicconceptch01" class="startofrange"><primary>XML</primary>
  94   <secondary>basic concepts</secondary></indexterm>
  95
  96 <indexterm><primary>XML</primary>
  97   <secondary>basic concepts</secondary></indexterm>
  98 <indexterm><primary>structured semantic markup language</primary><see>SGML</see></indexterm>
  99
 100 Here are the basic &SGML;/&XML; concepts you need to grasp:</para>
 101 <itemizedlist>
 102 <listitem><para>structured, semantic markup</para>
 103 </listitem>
 104 <listitem><para>elements</para>
 105 </listitem>
 106 <listitem><para>attributes</para>
 107 </listitem>
 108 <listitem><para>entities</para>
 109 </listitem>
 110 </itemizedlist>
 111 <sect2>
 112 <title>Structured and Semantic Markup</title>
 113 <para>
 114 <indexterm><primary>appearance</primary>
 115   <secondary>SGML and</secondary></indexterm>
 116 <indexterm><primary>structured markup</primary></indexterm>
 117 <indexterm><primary>semantic markup</primary></indexterm>
 118
 119 An essential characteristic of structured markup is that it explicitly
 120 distinguishes (and accordingly &ldquo;marks up&rdquo; within a
 121 document) the structure and semantic content of a document. It does
 122 not mark up the way in which the document will appear to the reader,
 123 in print or otherwise.
 124 </para>
 125 <para>
 126 In the days before word processors it was common for a typed
 127 manuscript to be submitted to a publisher. The manuscript identified
 128 the logical structures of the document (chapters, section titles, and
 129 so on), but said nothing about its appearance. Working independently
 130 of the author, a designer then developed a specification for the
 131 appearance of the document, and a typesetter marked up and applied the
 132 designer's format to the document.
 133 </para>
 134 <para>
 135 <indexterm><primary>presentation</primary><see>appearance</see></indexterm>
 136 <indexterm><primary>HTML</primary>
 137   <secondary>appearance, limitations of specification</secondary></indexterm>
 138
 139 Because presentation or appearance is usually based on structure and
 140 content, &SGML; markup logically precedes and generally determines the
 141 way a document will look to a reader. If you are familiar with strict,
 142 simple &HTML; markup, you know that a given document that is
 143 structurally the same can also look different on different
 144 computers. That's because the markup does not specify many aspects of
 145 a document's appearance, although it does specify many aspects of a
 146 document's structure.
 147 </para>
 148 <para>
 149 <indexterm><primary>text</primary>
 150   <secondary>formatting</secondary></indexterm>
 151 <indexterm><primary>word processors, SGML/XML vs.</primary></indexterm>
 152 Many writers type their text into a word processor, line-by-line and
 153 word-for-word, italicizing technical terms, underlining words for
 154 emphasis, or setting section headers in a font complementary to the
 155 body text, and finally, setting the headers off with a few carriage
 156 returns fore and aft. The format such a writer imposes on the words on
 157 the screen imparts structure to the document by changing its
 158 appearance in ways that a reader can more or less reliably decode.
 159 The reliability depends on how consistently and unambiguously the
 160 changes in type and layout are made. By contrast, an &SGML;/&XML;
 161 markup of a section header explicitly specifies that a specific piece
 162 of text is a section header. This assertion does not specify the
 163 presentation or appearance of the section header, but it makes the
 164 fact that the text is a section header completely unambiguous.
 165 </para>
 166 <para>
 167 <indexterm><primary>elements</primary>
 168   <secondary>SGML/XML, using</secondary></indexterm>
 169 <indexterm><primary>titles</primary>
 170   <secondary>top-level sections</secondary></indexterm>
 171 <indexterm><primary>top-level sections</primary></indexterm>
 172 <indexterm><primary>characters</primary>
 173   <secondary>character sets</secondary>
 174     <tertiary>SGML documents</tertiary></indexterm>
 175 <indexterm><primary>ASCII character set</primary></indexterm>
 176 <indexterm><primary>XML</primary>
 177   <secondary>Unicode character set</secondary></indexterm>
 178 <indexterm><primary>Unicode character set</primary>
 179   <secondary>XML documents, using</secondary></indexterm>
 180
 181 &SGML; and &XML; use named elements, delimited by angle brackets
 182 (&ldquo;&lt;&rdquo; and &ldquo;>&rdquo;) to identify the markup in a
 183 document. In DocBook, a top-level section is <sgmltag class="starttag">sect1</sgmltag>, so the title of a top-level section
 184 named <emphasis>My First-Level Header</emphasis> would be identified
 185 like this:
 186 </para>
 187
 188 <screen>&lt;sect1>&lt;title>My First-Level Header&lt;/title> </screen>
 189
 190 <para>Note the following features of this markup:</para>
 191 <variablelist>
 192 <varlistentry>
 193 <term>Clarity</term>
 194 <listitem><para>A title begins with
 195 <sgmltag class="starttag">title</sgmltag> and ends with <sgmltag class="endtag">title</sgmltag>. The <sgmltag>sect1</sgmltag> also has
 196 an ending <sgmltag class="endtag">sect1</sgmltag>, but we haven't
 197 shown the whole section so it's not visible.</para>
 198 </listitem>
 199 </varlistentry>
 200 <varlistentry>
 201 <term>Hierarchy</term>
 202 <listitem><para>&ldquo;My First-Level
 203 Header&rdquo; is the title of a top-level section because it occurs
 204 inside a title in a <sgmltag>sect1</sgmltag>. A
 205 <sgmltag>title</sgmltag> element occurring somewhere else, say in a
 206 <sgmltag>Chapter</sgmltag> element, would be the title of the
 207 chapter.</para>
 208 </listitem>
 209 </varlistentry>
 210 <varlistentry>
 211 <term>Plain text</term>
 212 <listitem><para>&SGML; documents can have varying character sets, but
 213 most are <acronym>ASCII</acronym>. &XML; documents use the Unicode
 214 character set. This makes &SGML; and &XML; documents highly portable
 215 across systems and tools.</para>
 216 </listitem>
 217 </varlistentry>
 218 </variablelist>
 219 <para>
 220 <indexterm><primary>appearance</primary>
 221   <secondary>SGML and</secondary></indexterm>
 222 <indexterm><primary>formatting</primary>
 223   <secondary>SGML documents</secondary></indexterm>
 224 <indexterm><primary>filenames</primary>
 225   <secondary>tags, specifying</secondary></indexterm>
 226 <indexterm><primary>semantic content, SGML marking for</primary></indexterm>
 227
 228 In an &SGML; document, there is no obligatory difference between the
 229 size or face of the type in a first-level section header and the title
 230 of a book in a footnote or the first sentence of a body paragraph. All
 231 &SGML; files are simple text files without font changes or special
 232 characters.<footnote><para>Some structured editors apply style to the
 233 document while it's being edited, using fonts and color to make the
 234 editing task easier, but this stylistic information is not stored in
 235 the actual &SGML;/&XML; document. Instead, it is provided by the
 236 editing application.</para></footnote> Similarly, an &SGML; document
 237 does not specify the words in a text that are to be set in italic,
 238 bold, or roman type. Instead, &SGML; marks certain kinds of texts for
 239 their semantic content. For example, if a particular word is the name
 240 of a file, then the tags around it should specify that it is a
 241 filename:
 242 </para>
 243
 244 <screen>Many mail programs read configuration information from the
 245 user's <sgmltag class="starttag">filename</sgmltag>.mailrc<sgmltag class="endtag">filename</sgmltag> file.</screen>
 246
 247 <para>
 248 <indexterm><primary>stylesheets</primary>
 249   <secondary>SGML documents, specifying appearance</secondary></indexterm>
 250 <indexterm><primary>appearance</primary>
 251   <secondary>structure or content vs.</secondary></indexterm>
 252 <indexterm><primary>CSS stylesheets</primary></indexterm>
 253 <indexterm><primary>FOSI stylesheets</primary></indexterm>
 254 <indexterm><primary>DSSSL</primary>
 255   <secondary>stylesheets</secondary></indexterm>
 256 <indexterm><primary>XSL stylesheets</primary></indexterm>
 257 <indexterm><primary>XML</primary>
 258   <secondary>XSL stylesheets</secondary></indexterm>
 259
 260 If the meaning of a phrase is particularly audacious, it might get
 261 tagged for boldness of thought instead of appearance. An &SGML;
 262 document contains all the information that a typesetter needs to lay
 263 out and typeset a printed page in the most effective and consistent
 264 way, but it does not specify the layout or the
 265 type.<footnote><para>The distinction between appearance or
 266 presentation and structure or content is essential to &SGML;, but
 267 there is a way to specify the appearance of an &SGML; document: attach
 268 a stylesheet to it. There are several standards for such stylesheets:
 269 <acronym>CSS</acronym>, <acronym>XSL</acronym>, <acronym>FOSI</acronym>s,
 270 and <acronym>DSSSL</acronym>.
 271 See <xref linkend="ch-publish"/>.</para></footnote>
 272 </para>
 273 <para>
 274 <indexterm><primary>DocBook DTD</primary>
 275   <secondary>document type definition</secondary></indexterm>
 276 <indexterm><primary>declarations</primary>
 277   <secondary>SGML documents</secondary></indexterm>
 278 <indexterm><primary>document type definitions</primary><see>DTDs</see></indexterm>
 279 <indexterm><primary>tags</primary>
 280   <secondary>names</secondary>
 281     <tertiary>document type definition</tertiary></indexterm>
 282 <indexterm><primary>combination rules (DTD)</primary></indexterm>
 283 <indexterm><primary>DTDs</primary></indexterm>
 284 <indexterm><primary>DTDs</primary>
 285   <secondary>DocBook</secondary><see>DocBook DTD</see></indexterm>
 286
 287
 288 Not only is the structure of an &SGML;/&XML; document explicit, but it
 289 is also carefully controlled. An &SGML; document makes reference to a
 290 set of declarations&mdash;a document type definition
 291 (&DTD;)&mdash;that contains an inventory of tag names and specifies
 292 the combination rules for the various structural and semantic features
 293 that make up a document. What the distinctive features are and how
 294 they should be combined is &ldquo;arbitrary&rdquo; in the sense that
 295 almost any selection of features and rules of composition is
 296 theoretically possible. The DocBook &DTD; chooses a particular set of
 297 features and rules for its users.
 298 </para>
 299 <para>
 300 <indexterm><primary>sections</primary>
 301   <secondary>ordering, DocBook DTD rules (example)</secondary></indexterm>
 302 Here is a specific example of how the DocBook &DTD; works. DocBook
 303 specifies that a third-level section can follow a second-level section
 304 but cannot follow a first-level section without an intervening
 305 second-level section.
 306 </para>
 307 <informaltable>
 308 <tgroup cols="2">
 309 <colspec colname="COLSPEC0" colwidth="2.50in"/>
 310 <colspec colname="COLSPEC1" colwidth="2.50in"/>
 311 <tbody>
 312 <row>
 313 <entry colname="COLSPEC0" valign="top"><para>This is valid:</para><screen>&lt;sect1>&lt;title>...&lt;/title>
 314   &lt;sect2>&lt;title>...&lt;/title>
 315     &lt;sect3>&lt;title>...&lt;/title>
 316       ...
 317     &lt;/sect3>
 318   &lt;/sect2>
 319 &lt;/sect1>
 320 </screen></entry>
 321 <entry colname="COLSPEC1" valign="top"><para>This is not:</para><screen>&lt;sect1>&lt;title>...&lt;/title>
 322   &lt;sect3>&lt;title>...&lt;/title>
 323     ...
 324   &lt;/sect3>
 325 &lt;/sect1>
 326 </screen></entry>
 327 </row>
 328 </tbody>
 329 </tgroup>
 330 </informaltable>
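<para>
The rule illustrated above could be expressed with element declarations
along the following lines. This is a deliberately simplified sketch, not
the actual DocBook declarations (which allow many more elements and are
built from parameter entities); it assumes that sections contain only
titles, paragraphs, and subsections:
</para>

<screen><![CDATA[<!-- Simplified, hypothetical content models; not the real DocBook DTD -->
<!ELEMENT sect1 (title, (para | sect2)*)>
<!ELEMENT sect2 (title, (para | sect3)*)>
<!ELEMENT sect3 (title, para*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para  (#PCDATA)>]]></screen>

<para>
In this sketch, <sgmltag>sect3</sgmltag> appears only in the content model
of <sgmltag>sect2</sgmltag>, so a <sgmltag>sect3</sgmltag> placed directly
inside a <sgmltag>sect1</sgmltag> does not match the declaration for
<sgmltag>sect1</sgmltag> and would be reported as invalid.
</para>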
 331 <para>
 332 <indexterm><primary>parsers</primary>
 333   <secondary>validating</secondary></indexterm>
 334 <indexterm><primary>validation</primary>
 335   <secondary>SGML documents</secondary></indexterm>
 336 <indexterm><primary>DTDs</primary>
 337   <secondary>validating SGML documents against</secondary></indexterm>
 338 <indexterm><primary>instance (DocBook document)</primary></indexterm>
 339
 340 Because an &SGML;/&XML; document has an associated &DTD; that
 341 describes the valid, logical structures of the document, you can test
 342 the logical structure of any particular document against the
 343 &DTD;. This process is performed by a <firstterm>parser</firstterm>. An
 344 &SGML; processor must begin by parsing the document and determining if
 345 it is valid, that is, if it conforms to the rules specified in the
 346 &DTD;. <!--<phrase role="xml">-->&XML; processors are not required to
 347 check for validity, but it's always a good idea to check for validity
 348 when authoring.<!--</phrase>--> Because you can test and validate the
 349 structure of an &SGML;/&XML; document with software, a DocBook
 350 document containing a first-level section followed immediately by a
 351 third-level section will be identified as invalid, meaning that it's
 352 not a valid <firstterm>instance</firstterm> or example of a document
 353 defined by the DocBook &DTD;. Presumably, a document with a logical
 354 structure won't normally jump from a first- to a third-level section,
 355 so the rule is a safeguard&mdash;but not a guarantee&mdash;of good
 356 writing, or at the very least, reasonable structure. A parser also
 357 verifies that the names of the tags are correct and that tags
 358 requiring an ending tag have them. This means that a valid document is
 359 also one that should format correctly, without runs of paragraphs
 360 incorrectly appearing in bold type or similar monstrosities that
 361 everyone has seen in print at one time or another. For more
 362 information about &SGML;/&XML; parsers, see <xref linkend="ch-parse"/>.
 363 </para>
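<para>
As a concrete illustration, here is a minimal sketch of a complete &XML;
document that a validating parser could check. It assumes the DocBook
&XML; V4.1.2 &DTD; and its public and system identifiers; substitute the
identifiers for whichever version you actually use. The titles and text
are invented:
</para>

<screen><![CDATA[<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<article>
<title>A Small Example</title>
<sect1><title>First Section</title>
  <sect2><title>A Subsection</title>
    <para>Some text.</para>
  </sect2>
</sect1>
</article>]]></screen>

<para>
One way to validate such a file, assuming the <command>xmllint</command>
tool from libxml2 is installed and the file is saved under the
hypothetical name <filename>example.xml</filename>, is to run
<command>xmllint --valid --noout example.xml</command>. Changing the
<sgmltag>sect2</sgmltag> tags to <sgmltag>sect3</sgmltag> should cause the
parser to report a validity error.
</para>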
 364 <para>
 365 In general, adherence to the explicit rules of structure and markup in
 366 a &DTD; is a useful and reassuring guarantee of consistency and
 367 reliability within documents, across document sets, and over
 368 time. This makes &SGML;/&XML; markup particularly desirable to
 369 corporations or governments that have large sets of documents to
 370 manage, but it is a boon to the individual writer as well.
 371 </para>
 372 <sect3>
 373 <title>How can this markup help you?</title>
 374 <para>
 375 <indexterm><primary>semantic markup</primary>
 376   <secondary>presentation media, different</secondary></indexterm>
 377 Semantic markup makes your documents more amenable to interpretation
 378 by software, especially publishing software. You can publish a white
 379 paper, authored as a DocBook <sgmltag>Article</sgmltag>, in the
 380 following formats:
 381 <indexterm><primary>articles</primary>
 382   <secondary>formats, listed</secondary></indexterm>
 383 <indexterm><primary>journal articles</primary></indexterm>
 384
 385 </para>
 386 <itemizedlist>
 387 <listitem><para>On the Web in &HTML;</para>
 388 </listitem>
 389 <listitem><para>As a standalone document on 8&frac12;&times;11 paper</para>
 390 </listitem>
 391 <listitem><para>As part of a quarterly journal, in a 6&times;9 format
 392 </para>
 393 </listitem>
 394 <listitem><para>In Braille</para>
 395 </listitem>
 396 <listitem><para>In audio</para>
 397 </listitem>
 398 </itemizedlist>
 399 <para>
 400 You can produce each of these publications from exactly the same
 401 source document using the presentational techniques best suited to
 402 both the content of the document and the presentation medium. This
 403 versatility also frees the author to concentrate on the document
 404 content. For example, as we write this book, we don't know exactly how
 405 O'Reilly will choose to present chapter headings, bulleted lists,
 406 &SGML; terms, or any of the other semantic features. And we don't
 407 care. It's irrelevant; whatever presentation is chosen, the &SGML;
 408 sources will be transformed automatically into that style.
 409 </para>
 410 <para>
 411 Semantic markup can relieve the author of other, more significant
 412 burdens as well (after all, careful use of paragraph and character
 413 styles in a word processor document theoretically allows us to change
 414 the presentation independently from the document). Using semantic
 415 markup opens up your documents to a world of possibilities. Documents
 416 become, in a loose sense, databases of information. Programs can
 417 compile, retrieve, and otherwise manipulate the documents in
 418 predictable, useful ways.
 419 </para>
 420 <para>
 421 <indexterm><primary>links</primary>
 422   <secondary>SGML documents, maintaining</secondary></indexterm>
 423 <indexterm><primary>elements</primary>
 424   <secondary>linking to references</secondary></indexterm>
 425
 426 Consider the online version of this book: almost every element name
 427 (<sgmltag>Article</sgmltag>, <sgmltag>Book</sgmltag>, and so on) is a
 428 hyperlink to the reference page that describes that
 429 element. Maintaining these links by hand would be tedious and might be
 430 unreliable, as well. Instead, every element name is marked as an
 431 element using <sgmltag>SGMLTag</sgmltag>: a <sgmltag>Book</sgmltag> is
 432 a <literal><sgmltag class="starttag">sgmltag</sgmltag>Book<sgmltag class="endtag">sgmltag</sgmltag></literal>.
 433 </para>
 434 <para>
 435 Because each element name in this book is tagged semantically, the
 436 program that produces the online version can determine which
 437 occurrences of the word &ldquo;book&rdquo; in the text are actually
 438 references to the <sgmltag>Book</sgmltag> element. The program can
 439 then automatically generate the appropriate hyperlink when it should.
 440 </para>
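<para>
The kind of processing described above might look roughly like the
following XSLT template. This is only a sketch: it is not the stylesheet
actually used to produce this book, and the
<filename>ref/NAME.html</filename> naming scheme for reference pages is an
assumption made for illustration.
</para>

<screen><![CDATA[<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical rule: an sgmltag with no class attribute names an
       element, so wrap it in a link to that element's reference page.
       The "ref/NAME.html" file layout is assumed, not real. -->
  <xsl:template match="sgmltag[not(@class)]">
    <a href="ref/{.}.html">
      <code><xsl:value-of select="."/></code>
    </a>
  </xsl:template>

</xsl:stylesheet>]]></screen>

<para>
The template can find element names reliably only because the source
marks them with <sgmltag>sgmltag</sgmltag>; ordinary occurrences of a word
such as &ldquo;book&rdquo; are never matched.
</para>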
 441 <para>
 442 There's one last point to make about the versatility of &SGML;
 443 documents: how much you have depends on the &DTD;. If you take a good
 444 photo with a high-resolution lens, you can print it and copy it and
 445 scan it and put it on the Web, and it will look good. If you start
 446 with a low-resolution picture, it will not survive those
 447 transformations so well. DocBook &SGML;/&XML; has this advantage over,
 448 say, &HTML;: DocBook has specific and unambiguous semantic and
 449 structural markup, so you can convert its documents with ease
 450 into other presentational forms and search them more precisely. If
 451 you start with &HTML;, whose markup is at a lower resolution than
 452 DocBook's, your versatility and searchability are substantially
 453 restricted and cannot be improved.
 454 </para>
 455 </sect3>
 456 <sect3>
 457 <title>What are the shortcomings of structured authoring?</title>
 458 <para>
 459 There are a few significant shortcomings to structured authoring:
 460 </para>
 461 <itemizedlist>
 462 <listitem><para>It requires a significant change in the authoring
 463 process. Writing structured documents is very different from writing
 464 with a typical word processor, and change is difficult. In particular,
 465 authors don't like giving up control over the appearance of their
 466 words, especially now that they have acquired it with the advent of
 467 word processors. But many publishing companies need authors to
 468 relinquish that control, because book design and production remains
 469 their job, not their authors'.</para>
 470 </listitem>
 471 <listitem><para>Because semantics are separate from appearance, in
 472 order to publish an &SGML;/&XML; document, a stylesheet or other tool
 473 must create the presentational form from the structural form. Writing
 474 stylesheets is a skill in its own right, and though not every author
 475 among a group of authors has to learn how to write them, someone has
 476 to.</para>
 477 </listitem>
 478 <listitem><para>Authoring tools for &SGML; documents are often
 479 expensive. While it's not entirely unreasonable to edit
 480 &SGML;/&XML; documents with a simple text editor, it's a bit tedious
 481 to do so. However, there are a few free tools that are
 482 &SGML;-aware. The widespread interest in &XML; may well produce new,
 483 clever, and less expensive &XML; editing tools.</para>
 484 </listitem>
 485 </itemizedlist>
 486 </sect3>
 487 </sect2>
 488 </sect1>
 489 <sect1 id="ch01-elemattr">
 490 <title>Elements and Attributes</title>
 491 <para>
 492 <indexterm><primary>elements</primary>
 493   <secondary>attributes</secondary></indexterm>
 494 <indexterm><primary>attributes</primary>
 495   <secondary>elements and</secondary></indexterm>
 496 <indexterm><primary>elements</primary>
 497   <secondary>attributes</secondary><seealso>attributes</seealso></indexterm>
 498 <indexterm><primary>empty elements</primary></indexterm>
 499 <indexterm><primary>end tags</primary>
 500   <secondary>empty elements, not requiring</secondary></indexterm>
 501 <indexterm><primary>cross references</primary></indexterm>
 502 <indexterm><primary>entities</primary>
 503   <secondary>SGML/XML markup</secondary></indexterm>
 504
 505 &SGML;/&XML; markup consists primarily of
 506 <firstterm>elements</firstterm>, <firstterm>attributes</firstterm>,
 507 and <firstterm>entities</firstterm>. Elements are the terms we have
 508 been speaking about most, like <sgmltag>sect1</sgmltag>, that describe
 509 a document's content and structure. Most elements are represented by pairs
 510 of tags and
 511 mark the start and end of the construct they surround&mdash;for
 512 example, the &SGML; source for this particular paragraph begins with a
 513 <sgmltag class="starttag">para</sgmltag> tag and ends with a <sgmltag class="endtag">para</sgmltag> tag. Some elements are
 514 &ldquo;empty&rdquo; (such as DocBook's cross-reference element,
 515 <sgmltag class="starttag">xref</sgmltag>) and require no end
 516 tag.<footnote><para>In &XML;, this is written as
 517 <literal>&lt;xref/></literal>, as we'll see in the section <xref linkend="ch02-typexml"/>.</para></footnote>
 518 </para>
 519 <para>
 520 <indexterm><primary>ID attribute</primary>
 521   <secondary>SGML start tags</secondary></indexterm>
 522 <indexterm><primary>tags</primary>
 523   <secondary>identifiers (SGML)</secondary></indexterm>
 524 <indexterm><primary>end tags</primary>
 525   <secondary>attributes and</secondary></indexterm>
 526 <indexterm><primary>start tags</primary>
 527   <secondary>attribute ID, containing</secondary></indexterm>
 528
 529 Elements can, but don't necessarily, include one or more attributes,
 530 which are additional terms that extend the function or refine the
 531 content of a given element. For instance, in DocBook a <sgmltag class="starttag">sect1</sgmltag> start tag can contain an
 532 identifier&mdash;an <sgmltag class="attribute">id</sgmltag>
 533 attribute&mdash;that will ultimately allow the writer to
 534 cross-reference it or enable a reader to retrieve it. End tags cannot
 535 contain attributes. A <sgmltag class="starttag">sect1</sgmltag>
 536 element with an <sgmltag class="attribute">id</sgmltag> attribute
 537 looks like this:
 538 </para>
 539
 540 <screen>&lt;sect1 id="<replaceable>idvalue</replaceable>"&gt;</screen>
 541
 542 <para>
 543 <indexterm><primary>namespaces</primary>
 544   <secondary>XML tags</secondary></indexterm>
 545 <indexterm><primary>tags</primary>
 546   <secondary>namespaces (XML)</secondary></indexterm>
 547 <indexterm><primary>validation</primary>
 548   <secondary>namespace tags (XML), problems</secondary></indexterm>
 549 <indexterm><primary>XML</primary>
 550   <secondary>namespaces, using</secondary></indexterm>
 551
 552 In &SGML;, the catalog of attributes that can occur on an element is
 553 predefined. You cannot add arbitrary attribute names to an
 554 element. Similarly, the values allowed for each attribute are
 555 predefined. In &XML;, the use of <ulink url="http://www.w3.org/TR/REC-xml-names/">namespaces</ulink> may allow you
 556 to add additional attributes to an element, but as of this writing,
 557 there's no way to perform validation on those attributes.
 558 </para>
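<para>
In the &DTD;, this catalog takes the form of attribute list declarations.
The following is a simplified sketch rather than the literal DocBook
declarations, which define more attributes and do so through parameter
entities:
</para>

<screen><![CDATA[<!-- Simplified sketch; the real DocBook declarations are more extensive -->
<!ATTLIST sect1
    id       ID     #IMPLIED
    role     CDATA  #IMPLIED>
<!ATTLIST xref
    linkend  IDREF  #REQUIRED>]]></screen>

<para>
Each declaration names an attribute, gives its type (here
<literal>ID</literal>, <literal>CDATA</literal>, or
<literal>IDREF</literal>), and states whether the attribute is optional
(<literal>#IMPLIED</literal>) or mandatory
(<literal>#REQUIRED</literal>).
</para>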
 559 <para>
 560 <indexterm><primary>SystemItem element</primary>
 561   <secondary>subdividing into URL and email addresses</secondary></indexterm>
 562 <indexterm><primary>Role attribute</primary>
 563   <secondary>systemitem tags, subdividing</secondary></indexterm>
 564
 565 The <sgmltag class="attribute">id</sgmltag> attribute is one half of a
 566 cross reference. An attribute of type <literal>IDREF</literal>
 567 on another element, for example <sgmltag class="starttag">xref linkend="idvalue"</sgmltag>,
 568 provides the other half. These attributes provide whatever
 569 application might process the &SGML; source with the data needed
 570 either to make a hypertext link or to substitute a named and/or numbered cross
 571 reference in place of the
 572 <sgmltag class="starttag">xref</sgmltag>. Another use for attributes is to specify subclasses of
 573 certain elements. For instance, you can subdivide DocBook's <sgmltag class="starttag">systemitem</sgmltag> into <acronym>URL</acronym>s and
 574 email addresses by making the content of the <sgmltag class="attribute">role</sgmltag> attribute the distinction between
 575 them, as in <sgmltag class="starttag">systemitem role="URL"</sgmltag>
 576 versus <sgmltag class="starttag">systemitem
 577 role="emailaddr"</sgmltag>.
 578 </para>
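<para>
Put together, the two uses of attributes described above might appear in
a document as follows. The section titles, identifier values, and
addresses are invented for illustration, and the empty
<sgmltag>xref</sgmltag> is written in its &XML; form:
</para>

<screen><![CDATA[<sect1 id="install"><title>Installation</title>
<para>Send questions to
<systemitem role="emailaddr">help@example.com</systemitem>.</para>
</sect1>

<sect1 id="config"><title>Configuration</title>
<para>Complete the steps in <xref linkend="install"/> first, then see
<systemitem role="URL">http://www.example.com/</systemitem>.</para>
</sect1>]]></screen>

<para>
The value of <sgmltag class="attribute">linkend</sgmltag> must match the
<sgmltag class="attribute">id</sgmltag> of some element in the document;
a validating parser reports a reference to an identifier that does not
exist.
</para>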
 579 </sect1>
 580 <sect1 id="s-entities"><title>Entities</title>
 581 <para>
 582 <indexterm><primary>entities</primary>
 583   <secondary>functions</secondary></indexterm>
 584 <indexterm><primary>parsed entities</primary></indexterm>
 585 <indexterm><primary>unparsed entities</primary></indexterm>
 586 <indexterm><primary>names</primary>
 587   <secondary>assigning to data (entities)</secondary></indexterm>
 588
 589 Entities are a fundamental concept in &SGML; and &XML;, and can be
 590 somewhat daunting at first. They serve a number of related but
 591 slightly different functions, and this makes them a little bit
 592 complicated.
 593 </para>
 594 <para>
 595 In the most general terms, entities allow you to assign a name to some
 596 chunk of data, and use that name to refer to that data. The complexity
 597 arises because there are two different contexts in which you can use
 598 entities (in the &DTD; and in your documents), two types of entities
 599 (parsed and unparsed), and two or three different ways in which the
 600 entities can point to the chunk of data that they name.
 601 </para>
 602 <para>
 603 In the rest of this section, we'll describe each of the commonly
 604 encountered entity types. If you find the material in this section
 605 confusing, feel free to skip over it now and come back to it later.
 606 We'll refer to the different types of entities as the need arises in
 607 our discussion of DocBook. Come back to this section when you're
 608 looking for more detail.
 609 </para>
 610 <para>
 611 Entities can be divided into two broad categories, <firstterm>general
 612 entities</firstterm> and <firstterm>parameter entities</firstterm>.
 613 Parameter entities are most often used in the &DTD;, not in documents,
 614 so we'll describe them last. Before you can use any type of entity, it
 615 must be formally declared. This is typically done in the document
 616 prologue, as we'll explain in <xref linkend="ch-create"/>, but we will
 617 show you how to declare each of the entities discussed here.
 618 </para>
 619 <sect2><title>General Entities</title>
 620 <para>
 621 <indexterm><primary>general entities</primary>
 622   <secondary>external and internal</secondary></indexterm>
 623 <indexterm><primary>entities</primary>
 624   <secondary>general</secondary></indexterm>
 625 In use, general entities are introduced with an ampersand (&amp;) and end with
 626 a semicolon (;). Within the category of general entities, there are
 627 two types: <firstterm>internal general entities</firstterm> and
 628 <firstterm>external general entities</firstterm>.
 629 </para>
 630 <sect3><title>Internal general entities</title>
 631 <para>
 632 <indexterm><primary>internal general entities</primary></indexterm>
 633 <indexterm><primary>names</primary>
 634   <secondary>text, associating with (internal general entities)</secondary></indexterm>
 635 <indexterm><primary>text</primary>
 636   <secondary>entity, declaring as</secondary></indexterm>
 637
 638 With internal entities, you can associate an essentially arbitrary
 639 piece of text (which may have other markup, including references to
 640 other entities) with a name. You can then include that text by
 641 referring to its name. For example, if your document frequently refers
 642 to, say, &ldquo;O'Reilly &amp; Associates,&rdquo; you might declare it
 643 as an entity:
 644 </para>
 645
 646 <screen><![CDATA[<!ENTITY ora "O'Reilly &amp; Associates">]]></screen>
 647
 648 <para>
 649 Then, instead of typing it out each time, you can insert it wherever
 650 you need it with the entity reference <sgmltag class="genentity">ora</sgmltag>. Note that this
 651 entity declaration includes another entity reference
 652 (<sgmltag class="genentity">amp</sgmltag>) within it. That's perfectly valid as long as the
 653 reference isn't directly or indirectly recursive.
 654 </para>
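
<para>
As a minimal sketch of how the declaration and the reference fit
together, the following complete document declares
<literal>ora</literal> in its internal subset and then uses it. The
choice of <sgmltag>para</sgmltag> as the document element and of the
DocBook &XML; V4.2 identifiers is an assumption made for illustration;
substitute whatever your own documents use:
</para>

<screen><![CDATA[<?xml version="1.0"?>
<!DOCTYPE para PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
<!-- Declared once, in the internal subset -->
<!ENTITY ora "O'Reilly &amp; Associates">
]>
<para>This book is published by &ora;.</para>]]></screen>

<para>
After parsing, the paragraph reads <quote>This book is published by
O'Reilly &amp; Associates.</quote>
</para>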
 655 <para>
 656 <indexterm><primary>entities</primary>
 657   <secondary>adding directly to DTD</secondary></indexterm>
 658
 659 If you find that you use a number of entities across many documents,
 660 you can add them directly to the &DTD; and avoid having to include the
 661 declarations in each document. See the discussion of
 662 <filename>dbgenent.mod</filename> in <xref linkend="app-customizing"/>.
 663 </para>
 664 </sect3>
 665 <sect3 id="s-egenent"><title>External general entities</title>
 666 <para>
 667 <indexterm><primary>external general entities</primary></indexterm>
 668 <indexterm><primary>SGML</primary>
 669   <secondary>external documents, referencing (external general entities)</secondary></indexterm>
 670 <indexterm><primary>parsers</primary>
 671   <secondary>external file text, inserting</secondary></indexterm>
 672 <indexterm><primary>files</primary>
 673   <secondary>external, referencing</secondary></indexterm>
 674
 675 With external entities, you can reference other documents from within
 676 your document. If these entities contain document text (&SGML; or
 677 &XML;), then references to them cause the parser to insert the text of
 678 the external file directly into your document (these are called parsed
 679 entities). In this way, you can use entities to divide your single,
 680 logical document into physically distinct chunks. For example, you
 681 might break your document into four chapters and store them in
 682 separate files. At the top of your document, you would include entity
 683 declarations to reference the four files:
 684 </para>
 685
 686 <screen><![CDATA[<!ENTITY ch01 SYSTEM "ch01.sgm">
 687 <!ENTITY ch02 SYSTEM "ch02.sgm">
 688 <!ENTITY ch03 SYSTEM "ch03.sgm">
 689 <!ENTITY ch04 SYSTEM "ch04.sgm">]]></screen>
 690
 691 <para>
 692 Your <sgmltag>Book</sgmltag> now consists simply of references to the
 693 entities:
 694 </para>
 695
 696 <screen>&lt;book&gt;
 697 &amp;ch01;
 698 &amp;ch02;
 699 &amp;ch03;
 700 &amp;ch04;
 701 &lt;/book&gt;</screen>
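
<para>
Putting the pieces together, a driver file for this arrangement might
look like the following sketch. It assumes the &SGML; version of
DocBook V4.2; the book title and chapter content are invented for
illustration:
</para>

<screen><![CDATA[<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
<!ENTITY ch01 SYSTEM "ch01.sgm">
<!ENTITY ch02 SYSTEM "ch02.sgm">
<!ENTITY ch03 SYSTEM "ch03.sgm">
<!ENTITY ch04 SYSTEM "ch04.sgm">
]>
<book>
<title>My Book</title>
&ch01;
&ch02;
&ch03;
&ch04;
</book>]]></screen>

<para>
Each chapter file then holds only the markup to be inserted. Because
its text is pulled into the middle of another document, it must not
carry a document type declaration of its own:
</para>

<screen><![CDATA[<!-- ch01.sgm: no document type declaration here -->
<chapter id="ch01">
<title>First Chapter</title>
<para>Text of the first chapter.</para>
</chapter>]]></screen>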
 702
 703 <para>
 704 <indexterm><primary>unparsed entities</primary></indexterm>
 705 <indexterm><primary>notations (unparsed entities)</primary></indexterm>
 706
 707 Sometimes it's useful to reference external files that don't contain
 708 document text. For example, you might want to reference an external
 709 graphic. You can do this with entities by declaring the type of data
 710 that's in the entity using a notation (these are called unparsed
 711 entities). For example, the following declaration declares the entity
 712 <literal>tree</literal> as an encapsulated PostScript image:
 713 </para>
 714
 715 <screen><![CDATA[<!ENTITY tree SYSTEM "tree.eps" NDATA EPS>]]></screen>
 716
 717 <para>
 718 <indexterm><primary>elements</primary>
 719   <secondary>entity attributes</secondary></indexterm>
 720
 721 Entities declared this way cannot be inserted directly into your
 722 document. Instead, they must be used as entity attributes to elements:
 723 </para>
 724
 725 <screen><![CDATA[<graphic entityref="tree"></graphic>]]></screen>
 726
 727 <para>
 728 Conversely, you cannot use entities declared without a notation as the
 729 value of an entity attribute.
 730 </para>
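
<para>
The following sketch shows the declaration and the attribute reference
in one small document. It assumes that the &DTD; already declares a
notation named <literal>EPS</literal>, as we believe the DocBook &DTD;
does; the figure title is invented for illustration:
</para>

<screen><![CDATA[<!DOCTYPE figure PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
<!-- NDATA EPS marks the entity as unparsed, in the EPS notation -->
<!ENTITY tree SYSTEM "tree.eps" NDATA EPS>
]>
<figure><title>A tree</title>
<!-- Named in an attribute value: no ampersand, no semicolon -->
<graphic entityref="tree"></graphic>
<!-- Writing &tree; in running text here would be an error -->
</figure>]]></screen>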
 731 </sect3>
 732 <sect3 id="s-specchar"><title>Special characters</title>
 733 <para>
 734 <indexterm><primary>markup</primary>
 735   <secondary>distinguishing from content</secondary></indexterm>
 736 <indexterm><primary>start tags</primary>
 737   <secondary>beginning</secondary></indexterm>
 738 <indexterm><primary>end tags</primary>
 739   <secondary>beginning</secondary></indexterm>
 740 In order for the parser to recognize markup in your document, it must
 741 be able to distinguish markup from content. It does this with two
 742 special characters: &ldquo;&lt;,&rdquo; which identifies the beginning
 743 of a start or end tag, and &ldquo;&amp;,&rdquo; which identifies the
 744 beginning of an entity reference.<footnote>
 745 <para>
 746 <indexterm><primary>start characters, changing</primary></indexterm>
 747 In &XML;, these characters are fixed. In &SGML;, it is possible to
 748 change the markup start characters, but we won't consider that case
 749 here. If you change the markup start characters, you know what you're
 750 doing. While we're on the subject, in &SGML;, these characters only
 751 have their special meaning if they are followed by a name character.
 752 It is, in fact, valid in an <emphasis>&SGML;</emphasis> (but not an &XML;)
 753 document to write &ldquo;O'Reilly &amp; Associates&rdquo; because the
 754 ampersand is not followed by a name character. Don't do this, however.
 755 <indexterm><primary>characters</primary>
 756   <secondary>entities</secondary>
 757     <tertiary>encoding as</tertiary></indexterm>
 758 <indexterm><primary>entities</primary>
 759   <secondary>characters</secondary></indexterm>
 760 <indexterm><primary>angle brackets</primary>
 761   <secondary>coding as entities</secondary></indexterm>
 762 </para>
 763 </footnote>
 764 If you want these characters to have their literal value, they must be
 765 encoded as entity references in your document. The entity reference
 766 <sgmltag class="genentity">lt</sgmltag> produces a left angle bracket;
 767 <sgmltag class="genentity">amp</sgmltag> produces the
 768 ampersand.<footnote>
 769 <para>
 770 <indexterm><primary>marked sections</primary>
 771   <secondary>character sequence, ending</secondary></indexterm>
 772
 773 The sequence of characters that ends a marked section (see <xref linkend="s-ms"/>), <literal>]]&gt;</literal>, must also be encoded with at least
 774 one entity reference if it is not being used to end a marked section.
 775 For this purpose, you can use the entity reference <sgmltag class="genentity">gt</sgmltag> for the final right angle bracket.
 776 </para>
 777 </footnote>
 778 </para>
 779 <para>
 780 <indexterm><primary>parsers</primary>
 781   <secondary>entity references, interpreting</secondary></indexterm>
 782
 783 If you do not encode each of these characters as its respective entity
 784 reference, then an &SGML; parser or application is likely to
 785 interpret them as characters introducing elements or entities (an
 786 &XML; parser will always interpret them this way); consequently, they
 787 won't appear as you intended. If you wish to cite text that contains
 788 literal ampersands and less-than signs, you need to transform these
 789 two characters into entity references before they are included in a
 790 DocBook document. The only alternative is to incorporate text
 791 that includes them in your document through some process that avoids
 792 the parser.
 793 </para>
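
<para>
As a small illustration, suppose you want to quote the (invented)
program fragment <literal>if (a &lt; b &amp;&amp; c)</literal> in a
paragraph. In the source of your document, each less-than sign and
each ampersand is written as an entity reference:
</para>

<screen><![CDATA[<para>
The test <literal>if (a &lt; b &amp;&amp; c)</literal> guards the loop.
</para>]]></screen>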
 794 <para>
 795 <indexterm><primary>data entities</primary></indexterm>
 796 <indexterm><primary>numeric character references</primary></indexterm>
 797
 798 In &SGML;, character entities are frequently declared using a third
 799 entity category (one that we deliberately chose to overlook), called
 800 <firstterm>data entities</firstterm>. In &XML;, these are declared using
 801 numeric character references. Numeric character references resemble
 802 entity references, but technically aren't the same. They have the form
 803 <literal>&amp;#<replaceable>999</replaceable>;</literal>, in which
 804 &ldquo;999&rdquo; is the numeric character number.
 805 </para>
 806 <para>
 807 <indexterm><primary>Unicode character set</primary>
 808   <secondary>character numbers (XML)</secondary></indexterm>
 809 <indexterm><primary>hexadecimal numeric character references (XML)</primary></indexterm>
 810
 811 In &XML;, the numeric character number is always the Unicode character
 812 number. In addition, &XML; allows hexadecimal numeric character
 813 references of the form
 814 <literal>&amp;#x<replaceable>hhhh</replaceable>;</literal>. In &SGML;, the
 815 numeric character number is a number from the document character set
 816 that's declared in the &SGML; declaration.
 817 </para>
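
<para>
For example, in an &XML; document the copyright sign is Unicode
character 169, which is A9 in hexadecimal, so the two references in
this sketch produce the same character. The surrounding text is
invented for illustration:
</para>

<screen><![CDATA[<para>Decimal form: &#169; 2002. Hexadecimal form: &#xA9; 2002.</para>]]></screen>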
 818 <para>
 819 <indexterm><primary>special characters, encoding as entities</primary></indexterm>
 820
 821 Character entities are also used to give a name to special characters
 822 that can't otherwise be typed or are not portable across applications
 823 and operating systems. You can then include these characters in your
 824 document by referring to their entity name. Rather than using the often
 825 obscure and inconsistent key combinations of your particular word
 826 processor to type, say, an uppercase letter U with an umlaut (&Uuml;),
 827 you type in an entity reference for it. For instance, the entity for an
 828 uppercase letter U with an umlaut has been defined as the entity
 829 <literal>Uuml</literal>, so you would type in <sgmltag class="genentity">Uuml</sgmltag> to reference it instead of the actual
 830 character. The &SGML; application that eventually processes your
 831 document for presentation will match the entity to your platform's
 832 handling of special characters in order to render it
 833 appropriately.
 834 </para>
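
<para>
In &XML;, a declaration for such a character entity can be as simple
as the following sketch, which maps the name to a numeric character
reference (220 is the Unicode number of the uppercase U with an
umlaut). This is an illustration, not the exact text of the standard
entity sets, which normally supply the declaration so that you don't
have to write it yourself:
</para>

<screen><![CDATA[<!-- A sketch of a character entity declaration in XML -->
<!ENTITY Uuml "&#220;">

<!-- In the document -->
<para>&Uuml;bung macht den Meister.</para>]]></screen>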
 835 </sect3>
 836 </sect2><!--general entities-->
 837 <sect2><title>Parameter Entities</title>
 838 <para>
 839 <indexterm><primary>entities</primary>
 840   <secondary>parameter entities</secondary><see>parameter entities</see></indexterm>
 841 <indexterm><primary>parameter entities</primary></indexterm>
 842
 843 Parameter entities are only recognized in markup declarations (in the
 844 &DTD;, for example). Instead of beginning with an ampersand, they
 845 begin with a percent sign.  Parameter entities are most frequently
 846 used to customize the &DTD;. For a detailed discussion of this topic,
 847 see <xref linkend="app-customizing"/>. Following are some other uses for
 848 them.
 849 </para>
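
<para>
As a minimal sketch, here is a parameter entity declared and then
referenced inside a &DTD;. The element name <literal>memo</literal>
and the entity name <literal>shared.atts</literal> are hypothetical,
and the fragment assumes it lives in a &DTD; file rather than in a
document's internal subset, where &XML; does not allow parameter
entity references inside declarations:
</para>

<screen><![CDATA[<!-- Declaration: note the percent sign and space before the name -->
<!ENTITY % shared.atts "id   ID    #IMPLIED
                        role CDATA #IMPLIED">

<!ELEMENT memo (#PCDATA)>
<!-- Reference: percent sign, name, semicolon -->
<!ATTLIST memo %shared.atts;>]]></screen>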
 850 <sect3 id="s-ms"><title>Marked sections</title>
 851 <para>
 852 <indexterm><primary>marked sections</primary></indexterm>
 853 <indexterm><primary>SGML</primary>
 854   <secondary>marked sections</secondary></indexterm>
 855 <indexterm><primary>XML</primary>
 856   <secondary>marked sections</secondary></indexterm>
 857
 858 You might use a parameter entity reference in an &SGML; document in a
 859 marked section. A marked section is a mechanism for indicating that
 860 special processing should apply to a particular block of text.  Marked
 861 sections are introduced by the special sequence
 862 <literal>&lt;![<replaceable>keyword</replaceable>[</literal> and end
 863 with <literal>]]&gt;</literal>.  In &SGML;, marked sections can appear
 864 in both &DTD;s and document instances.  In &XML;, they're only allowed
 865 in the &DTD;.<footnote>
 866 <para>
 867 <indexterm><primary>CDATA</primary>
 868   <secondary>marked sections</secondary></indexterm>
 869 Actually, CDATA marked sections are allowed in an &XML; document, but
 870 the keyword cannot be a parameter entity, and it must be typed
 871 literally. See the examples on this page.
 872 </para>
 873 </footnote>
 874 </para>
 875 <para>
 876 <indexterm><primary>keywords</primary>
 877   <secondary>marked sections</secondary></indexterm>
 878 <indexterm><primary>INCLUDE keyword (marked section)</primary></indexterm>
 879 <indexterm><primary>IGNORE keyword (marked section)</primary></indexterm>
 880
 881 The most common keywords are <literal>INCLUDE</literal>, which
 882 indicates that the text in the marked section should be included in
 883 the document; <literal>IGNORE</literal>, which indicates that the text
 884 in the marked section should be ignored (it completely disappears from
 885 the parsed document); and <literal>CDATA</literal>, which indicates
 886 that all markup characters within that section should be ignored
 887 except for the closing characters <literal>]]&gt;</literal>.
 888 </para>
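
<para>
For instance, a <literal>CDATA</literal> marked section lets you
include a block of text without encoding its markup characters one by
one. In this sketch the program fragment is invented for illustration;
its less-than sign and ampersands are typed literally:
</para>

<screen>&lt;programlisting&gt;&lt;![CDATA[
if (a &lt; b &amp;&amp; c) {
    total = total + 1;
}
]]&gt;&lt;/programlisting&gt;</screen>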
 889 <para>
 890 <indexterm><primary>SGML</primary>
 891   <secondary>keywords as parameter entities</secondary></indexterm>
 892 In &SGML;, these keywords can be parameter entities. For example, you
 893 might declare the following parameter entity in your document:
 894 </para>
 895
 896 <screen><![CDATA[<!ENTITY % draft "INCLUDE">]]></screen>
 897
 898 <para>
 899 Then you could put the sections of the document that are only applicable
 900 in a draft within marked sections:
 901 </para>
 902
 903 <screen>&lt;![%draft;[
 904 &lt;para>
 905 This paragraph only appears in the draft version.
 906 &lt;/para>
 907 ]]&gt;</screen>
 908
 909 <para>
 910 When you're ready to print the final version, simply change the
 911 <literal>draft</literal> parameter entity declaration:
 912 </para>
 913
 914 <screen><![CDATA[<!ENTITY % draft "IGNORE">]]></screen>
 915
 916 <para>
 917 and publish the document. None of the draft sections will appear.
 918 <indexterm startref="SGMLbasicconceptch01" class="endofrange"/>
 919 <indexterm startref="XMLbasicconceptch01" class="endofrange"/>
 920 </para>
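
<para>
In &XML;, the same switch can still be used inside a &DTD; file. The
following sketch relies on the rule that when an entity is declared
more than once, the first declaration is the one that counts. The
entity name <literal>status</literal> is hypothetical, and the
fragment assumes it is placed in the external subset:
</para>

<screen>&lt;!ENTITY % draft "IGNORE"&gt;

&lt;![%draft;[
&lt;!-- Seen only when draft is INCLUDE --&gt;
&lt;!ENTITY status "Draft"&gt;
]]&gt;
&lt;!-- Takes effect only if the declaration above was ignored --&gt;
&lt;!ENTITY status "Final"&gt;</screen>

<para>
A document that refers to <sgmltag class="genentity">status</sgmltag>
then gets <quote>Draft</quote> or <quote>Final</quote> depending on
the value of <literal>draft</literal>.
</para>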
 921 </sect3>
 922 </sect2>
 923 </sect1>
 924 <sect1 id="ch01-wheredocbook"><title>How Does DocBook Fit In?</title>
 925 <para>
 926 <indexterm><primary>DocBook DTD</primary>
 927   <secondary>history and overview</secondary></indexterm>
 928
 929 DocBook is a very popular set of tags for describing books, articles,
 930 and other prose documents, particularly technical
 931 documentation. DocBook is defined using the native &DTD; syntax of
 932 &SGML; and &XML;. Like &HTML;, DocBook is an example of a markup
 933 language defined in &SGML;/&XML;.
 934 </para>
 935 <sect2><title>A Short DocBook History</title>
 936 <para>
 937 DocBook is almost 10 years old. It began in 1991 as a joint project of
 938 HaL Computer Systems and O'Reilly. Its popularity grew, and eventually
 939 it spawned its own maintenance organization, the Davenport Group. In
 940 mid-1998, it became a Technical Committee (<acronym>TC</acronym>) of
 941 the Organization for the Advancement of Structured Information
 942 Standards (<acronym>OASIS</acronym>).
 943 </para>
 944 <sect3><title>The HaL and O'Reilly era</title>
 945 <para>
 946 <indexterm><primary>Open Software Foundation</primary></indexterm>
 947 <indexterm><primary>troff markup (UNIX documentation)</primary></indexterm>
 948 <indexterm><primary>UNIX</primary>
 949   <secondary>DocBook DTD, development</secondary></indexterm>
 950
 951 The DocBook &DTD; was originally designed and implemented by HaL
 952 Computer Systems and O'Reilly &amp; Associates around 1991. It was
 953 developed primarily to facilitate the exchange of &UNIX; documentation
 954 originally marked up in <command>troff</command>. Its design appears
 955 to have been based partly on input from &SGML; interchange projects
 956 conducted by the Unix International and Open Software Foundation
 957 consortia.
 958 </para>
 959 <para>
 960 <indexterm><primary>Davenport Group (DocBook maintenance)</primary></indexterm>
 961 When DocBook <acronym>V1.1</acronym> was published, discussion about
 962 its revision and maintenance began in earnest in the Davenport Group,
 963 a forum created by O'Reilly for computer documentation
 964 producers. Version 1.2 was influenced strongly by
 965 Novell and Digital.
 966 </para>
 967 <para>
 968 In 1994, the Davenport Group became an officially chartered entity
 969 responsible for DocBook's maintenance. DocBook
 970 <acronym>V1.2.2</acronym> was published simultaneously. The founding
 971 sponsors of this incarnation of Davenport include the following
 972 people:
 973 <itemizedlist spacing="compact">
 974 <listitem><para>Jon Bosak, Novell</para></listitem>
 975 <listitem><para>Dale Dougherty, O'Reilly &amp; Associates</para></listitem>
 976 <listitem><para>Ralph Ferris, Fujitsu <acronym>OSSI</acronym></para></listitem>
 977 <listitem><para>Dave Hollander, Hewlett-Packard</para></listitem>
 978 <listitem><para>Eve Maler, Digital Equipment Corporation</para></listitem>
 979 <listitem><para>Murray Maloney, <acronym>SCO</acronym></para></listitem>
 980 <listitem><para>Conleth O'Connell, HaL Computer Systems</para></listitem>
 981 <listitem><para>Nancy Paisner, Hitachi Computer Products</para></listitem>
 982 <listitem><para>Mike Rogers, SunSoft</para></listitem>
 983 <listitem><para>Jean Tappan, Unisys</para></listitem>
 984 </itemizedlist>
 985 </para>
 986 </sect3>
 987 <sect3><title>The Davenport era</title>
 988 <para>
 989 Under the auspices of the Davenport Group, the DocBook &DTD; began to
 990 widen its scope. It was now being used by a much wider audience, and
 991 for new purposes, such as direct authoring with &SGML;-aware tools,
 992 and publishing directly to paper. As the largest users of DocBook,
 993 Novell and Sun had a heavy influence on its design.
 994 </para>
 995 <para>
 996 <indexterm><primary>DocBook DTD</primary>
 997   <secondary>releases, rules for new versions</secondary></indexterm>
 998
 999 In order to help users manage change, the new Davenport charter established
1000 the following rules for DocBook releases:
1001 <itemizedlist>
1002 <listitem><para>Minor versions (<quote>point releases</quote> such as
1003 <acronym>V2.2</acronym>) could add to the markup model, but could not
1004 change it in a backward-incompatible way. For example, a new kind of
1005 list element could be added, but it would not be acceptable for the
1006 existing itemized-list model to start requiring two list items inside
1007 it instead of only one. Thus, any document conforming to version
1008 <replaceable>n</replaceable>.0 would also conform to
1009 <replaceable>n</replaceable>.<replaceable>m</replaceable>.</para>
1010 </listitem>
1011 <listitem><para>Major versions (such as <acronym>V3.0</acronym>) could
1012 both add to the markup model and make backward-incompatible
1013 changes. However, the changes would have to be announced in the last
1014 major release.</para>
1015 </listitem>
1016 <listitem><para>Major-version introductions must be separated by at
1017 least a year.</para>
1018 </listitem>
1019 </itemizedlist>
1020 </para>
1021 <para>
1022 <indexterm><primary>DocBook DTD</primary>
1023   <secondary>XML</secondary>
1024     <tertiary>XML-compliant version</tertiary></indexterm>
1025 <indexterm><primary>XML</primary>
1026   <secondary>DocBook version compliant with</secondary></indexterm>
1027
1028 <acronym>V3.0</acronym> was released in January 1997. After that time,
1029 although DocBook's audience continued to grow, many of the Davenport
1030 Group stalwarts became involved in the &XML; effort, and development
1031 slowed dramatically. The idea of creating an official &XML;-compliant
1032 version of DocBook was discussed, but not implemented. (For more
1033 detailed information about DocBook <acronym>V3.0</acronym> and plans
1034 for subsequent versions, see <xref linkend="app-versions"/>.)
1035 </para>
1036 <para>
1037 <indexterm><primary>OASIS</primary>
1038   <secondary>DocBook Technical Committee</secondary></indexterm>
1039
1040 The sponsors wanted to close out Davenport in an orderly way to ensure
1041 that DocBook users would be supported. It was suggested that
1042 <acronym>OASIS</acronym> become DocBook's new home. An
1043 <acronym>OASIS</acronym> DocBook Technical Committee was formed in
1044 July 1998, with Eduardo Gutentag of Sun Microsystems as chair.
1045 </para>
1046 </sect3>
1047 <sect3>
1048 <title>The <acronym>OASIS</acronym> era</title>
1049 <para>
1050 The <ulink url="http://www.oasis-open.org/docbook/">DocBook Technical
1051 Committee</ulink> is continuing the work started by the
1052 Davenport Group. The transition from Davenport to
1053 <acronym>OASIS</acronym> has been very smooth, in part because the
1054 core design team consists of essentially the same individuals (we all
1055 just changed hats).
1056 </para>
1057 <para>
1058 DocBook <acronym>V3.1</acronym>, published in February 1999, was the
1059 first <acronym>OASIS</acronym> release.  It integrated a number of
1060 changes that had been <quote>in the wings</quote> for some time.
1061 </para>
1062
1063 <para>In February 2001, <acronym>OASIS</acronym> made DocBook &SGML; V4.1 and DocBook &XML; V4.1.2
1064 <ulink url="http://lists.oasis-open.org/archives/members/200102/msg00000.html">official
1065 OASIS Specifications</ulink>.
1066 </para>
1067
1068 <para><ulink url="http://www.oasis-open.org/docbook/specs/cs-docbook-docbook-4.2.html">Version 4.2</ulink> of the DocBook &DTD;, for both &SGML; and &XML;, was
1069 released in July 2002.</para>
1070
1071 <para>
1072 The committee continues new DocBook development to ensure
1073 that the &DTD; continues to meet the needs of its users.  Forthcoming
1074 and experimental work includes:
1075 </para>
1076
1077 <itemizedlist>
1078 <listitem><para>A V5.0 DTD projected for release no earlier than the end of
1079 2002.
1080 </para></listitem>
1081 <listitem><para>Experimental
1082 <ulink url="http://www.oasis-open.org/committees/relax-ng/">RELAX NG</ulink> schemas
1083 <ulink url="http://www.oasis-open.org/docbook/relaxng">available</ulink>.</para></listitem>
1084 <listitem><para>Experimental
1085 <ulink url="http://www.w3.org/XML/Schema">W3C XML Schema</ulink> versions
1086 <ulink url="http://www.oasis-open.org/docbook/xmlschema/">available</ulink>.</para></listitem>
1087 <listitem><para>Experimental
1088 <ulink url="http://www.xml.gr.jp/relax/">RELAX</ulink> schemas
1089 <ulink url="http://www.oasis-open.org/docbook/relax/">available</ulink>.</para></listitem>
1090 <listitem><para>Experimental
1091 <ulink url="http://www.thaiopensource.com/trex/">TREX</ulink> schemas
1092 <ulink url="http://www.oasis-open.org/docbook/trex/">available</ulink>.</para></listitem>
1093 </itemizedlist>
1094
1095 <indexterm startref="XMLgetstart" class="endofrange"/>
1096 <indexterm startref="getstartSGML" class="endofrange"/>
1097
1098 </sect3>
1099 </sect2>
1100 </sect1>
1101 </chapter>
1102
1103 <!--
1104 Local Variables:
1105 mode: xml
1106 sgml-parent-document: ("book.xml" "chapter")
1107 End:
1108 -->