doc/misc/nxml-mode.texi

   1 \input texinfo @c -*- texinfo -*-
   2 @c %**start of header
   3 @setfilename ../../info/nxml-mode
   4 @settitle nXML Mode
   5 @c %**end of header
   6
   7 @copying
   8 This manual documents nXML mode, an Emacs major mode for editing
   9 XML with RELAX NG support.
  10
  11 Copyright @copyright{} 2007--2013 Free Software Foundation, Inc.
  12
  13 @quotation
  14 Permission is granted to copy, distribute and/or modify this document
  15 under the terms of the GNU Free Documentation License, Version 1.3 or
  16 any later version published by the Free Software Foundation; with no
  17 Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
  18 and with the Back-Cover Texts as in (a) below.  A copy of the license
  19 is included in the section entitled ``GNU Free Documentation License''.
  20
  21 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
  22 modify this GNU manual.''
  23 @end quotation
  24 @end copying
  25
  26 @dircategory Emacs editing modes
  27 @direntry
  28 * nXML Mode: (nxml-mode).       XML editing mode with RELAX NG support.
  29 @end direntry
  30
  31 @node Top
  32 @top nXML Mode
  33
  34 @insertcopying
  35
  36 This manual is not yet complete.
  37
  38 @menu
  39 * Introduction::
  40 * Completion::
  41 * Inserting end-tags::
  42 * Paragraphs::
  43 * Outlining::
  44 * Locating a schema::
  45 * DTDs::
  46 * Limitations::
  47 * GNU Free Documentation License::  The license for this documentation.
  48 @end menu
  49
  50 @node Introduction
  51 @chapter Introduction
  52
   53 nXML mode is an Emacs major mode for editing XML documents.  It supports
  54 editing well-formed XML documents, and provides schema-sensitive editing
  55 using RELAX NG Compact Syntax.  To get started, visit a file containing an
  56 XML document, and, if necessary, use @kbd{M-x nxml-mode} to switch to nXML
  57 mode.  By default, @code{auto-mode-alist} and @code{magic-fallback-alist}
  58 put buffers in nXML mode if they have recognizable XML content or file
  59 extensions.  You may wish to customize the settings, for example to
  60 recognize different file extensions.
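
As a minimal sketch of such a customization, the following init-file
line associates an extra file extension with nXML mode.  The
@samp{.myxml} extension is hypothetical and stands in for whatever
extension your own documents use.

@example
;; Hypothetical extension: visit "*.myxml" files in nXML mode.
(add-to-list 'auto-mode-alist '("\\.myxml\\'" . nxml-mode))
@end example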
  61
  62 Once in nXML mode, you can type @kbd{C-h m} for basic information on the
  63 mode.
  64
  65 The @file{etc/nxml} directory in the Emacs distribution contains some data
  66 files used by nXML mode, and includes two files (@file{test-valid.xml} and
  67 @file{test-invalid.xml}) that provide examples of valid and invalid XML
  68 documents.
  69
  70 To get validation and schema-sensitive editing, you need a RELAX NG Compact
  71 Syntax (RNC) schema for your document (@pxref{Locating a schema}).  The
  72 @file{etc/schema} directory includes some schemas for popular document
  73 types.  See @url{http://relaxng.org/} for more information on RELAX NG@.
  74 You can use the @samp{Trang} program from
  75 @url{http://www.thaiopensource.com/relaxng/trang.html} to
  76 automatically create RNC schemas.  This program can:
  77
  78 @itemize @bullet
  79 @item
  80 infer an RNC schema from an instance document;
  81 @item
  82 convert a DTD to an RNC schema;
  83 @item
  84 convert a RELAX NG XML syntax schema to an RNC schema.
  85 @end itemize
  86
   87 @noindent To convert a RELAX NG XML syntax (@samp{.rng}) schema to an RNC
  88 one, you can also use the XSLT stylesheet from
  89 @url{https://github.com/oleg-pavliv/emacs/tree/master/xsl}.
  90 @ignore
  91 @c Original location, now defunct.
  92 @url{http://www.pantor.com/download.html}.
  93 @end ignore
  94
  95 To convert a W3C XML Schema to an RNC schema, you need first to convert it
  96 to RELAX NG XML syntax using the RELAX NG converter tool @code{rngconv}
  97 (built on top of MSV).  See @url{https://github.com/kohsuke/msv}
  98 and @url{https://msv.dev.java.net/}.
  99
 100 For historical discussions only, see the mailing list archives at
 101 @url{http://groups.yahoo.com/group/emacs-nxml-mode/}.  Please make all new
 102 discussions on the @samp{help-gnu-emacs} and @samp{emacs-devel} mailing
 103 lists.  Report any bugs with @kbd{M-x report-emacs-bug}.
 104
 105
 106 @node Completion
 107 @chapter Completion
 108
 109 Apart from real-time validation, the most important feature that nXML
  110 mode provides for assisting in document creation is @dfn{completion}.
 111 Completion assists the user in inserting characters at point, based on
 112 knowledge of the schema and on the contents of the buffer before
 113 point.
 114
 115 nXML mode adapts the standard GNU Emacs command for completion in a
 116 buffer: @code{completion-at-point}, which is bound to @kbd{C-M-i} and
 117 @kbd{M-@key{TAB}}.  Note that many window systems and window managers
 118 use @kbd{M-@key{TAB}} themselves (typically for switching between
 119 windows) and do not pass it to applications.  In that case, you should
 120 type @kbd{C-M-i} or @kbd{@key{ESC} @key{TAB}} for completion, or bind
 121 @code{completion-at-point} to a key that is convenient for you.  In
 122 the following, I will assume that you type @kbd{C-M-i}.
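
If you prefer a different key, one possible binding is sketched below.
It assumes that @kbd{C-c c} is unused in your configuration; the key
choice is only an illustration.

@example
;; Assumption: "C-c c" is free; pick any key you find convenient.
(with-eval-after-load 'nxml-mode
  (define-key nxml-mode-map (kbd "C-c c") #'completion-at-point))
@end example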
 123
 124 nXML mode completion works by examining the symbol preceding point.
  125 This is the symbol to be completed.  The symbol to be completed may be
  126 empty.  Completion considers what symbols starting with the symbol
 127 to be completed would be valid replacements for the symbol to be
 128 completed, given the schema and the contents of the buffer before
 129 point.  These symbols are the possible completions.  An example may
 130 make this clearer.  Suppose the buffer looks like this (where @point{}
 131 indicates point):
 132
 133 @example
 134 <html xmlns="http://www.w3.org/1999/xhtml">
 135 <h@point{}
 136 @end example
 137
 138 @noindent
 139 and the schema is XHTML@.  In this context, the symbol to be completed
 140 is @samp{h}.  The possible completions consist of just
  141 @samp{head}.  Another example is:
 142
 143 @example
 144 <html xmlns="http://www.w3.org/1999/xhtml">
 145 <head>
 146 <@point{}
 147 @end example
 148
 149 @noindent
 150 In this case, the symbol to be completed is empty, and the possible
 151 completions are @samp{base}, @samp{isindex},
 152 @samp{link}, @samp{meta}, @samp{script},
 153 @samp{style}, @samp{title}.  Another example is:
 154
 155 @example
 156 <html xmlns="@point{}
 157 @end example
 158
 159 @noindent
 160 In this case, the symbol to be completed is empty, and the possible
 161 completions are just @samp{http://www.w3.org/1999/xhtml}.
 162
 163 When you type @kbd{C-M-i}, what happens depends
  164 on what the set of possible completions is.
 165
 166 @itemize @bullet
 167 @item
 168 If the set of completions is empty, nothing
 169 happens.
 170 @item
 171 If there is one possible completion, then that completion is
 172 inserted, together with any following characters that are
 173 required. For example, in this case:
 174
 175 @example
 176 <html xmlns="http://www.w3.org/1999/xhtml">
 177 <@point{}
 178 @end example
 179
 180 @noindent
 181 @kbd{C-M-i} will yield
 182
 183 @example
 184 <html xmlns="http://www.w3.org/1999/xhtml">
 185 <head@point{}
 186 @end example
 187 @item
 188 If there is more than one possible completion, but all
 189 possible completions share a common non-empty prefix, then that prefix
 190 is inserted. For example, suppose the buffer is:
 191
 192 @example
 193 <html x@point{}
 194 @end example
 195
 196 @noindent
 197 The symbol to be completed is @samp{x}. The possible completions are
 198 @samp{xmlns} and @samp{xml:lang}.  These share a common prefix of
 199 @samp{xml}.  Thus, @kbd{C-M-i} will yield:
 200
 201 @example
 202 <html xml@point{}
 203 @end example
 204
 205 @noindent
 206 Typically, you would do @kbd{C-M-i} again, which would have the result
 207 described in the next item.
 208 @item
 209 If there is more than one possible completion, but the
 210 possible completions do not share a non-empty prefix, then Emacs will
 211 prompt you to input the symbol in the minibuffer, initializing the
 212 minibuffer with the symbol to be completed, and popping up a buffer
 213 showing the possible completions.  You can now input the symbol to be
 214 inserted.  The symbol you input will be inserted in the buffer instead
 215 of the symbol to be completed.  Emacs will then insert any required
  216 characters after the symbol.  For example, if the buffer contains:
 217
 218 @example
 219 <html xml@point{}
 220 @end example
 221
 222 @noindent
 223 Emacs will prompt you in the minibuffer with
 224
 225 @example
 226 Attribute: xml@point{}
 227 @end example
 228
 229 @noindent
 230 and the buffer showing possible completions will contain
 231
 232 @example
 233 Possible completions are:
 234 xml:lang                           xmlns
 235 @end example
 236
 237 @noindent
 238 If you input @kbd{xmlns}, the result will be:
 239
 240 @example
 241 <html xmlns="@point{}
 242 @end example
 243
 244 @noindent
 245 (If you do @kbd{C-M-i} again, the namespace URI will be
 246 inserted. Should that happen automatically?)
 247 @end itemize
 248
 249 @node Inserting end-tags
 250 @chapter Inserting end-tags
 251
 252 The main redundancy in XML syntax is end-tags.  nXML mode provides
 253 several ways to make it easier to enter end-tags.  You can use all of
 254 these without a schema.
 255
 256 You can use @kbd{C-M-i} after @samp{</} to complete the rest of the
 257 end-tag.
 258
 259 @kbd{C-c C-f} inserts an end-tag for the element containing
 260 point. This command is useful when you want to input the start-tag,
 261 then input the content and finally input the end-tag. The @samp{f}
 262 is mnemonic for finish.
 263
 264 If you want to keep tags balanced and input the end-tag at the
 265 same time as the start-tag, before inputting the content, then you can
 266 use @kbd{C-c C-i}. This inserts a @samp{>}, then inserts
 267 the end-tag and leaves point before the end-tag.  @kbd{C-c C-b}
 268 is similar but more convenient for block-level elements: it puts the
 269 start-tag, point and the end-tag on successive lines, appropriately
 270 indented. The @samp{i} is mnemonic for inline and the
 271 @samp{b} is mnemonic for block.
 272
 273 Finally, you can customize nXML mode so that @kbd{/} automatically
 274 inserts the rest of the end-tag when it occurs after @samp{<}, by
 275 doing
 276
 277 @display
 278 @kbd{M-x customize-variable @key{RET} nxml-slash-auto-complete-flag @key{RET}}
 279 @end display
 280
 281 @noindent
 282 and then following the instructions in the displayed buffer.
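
Alternatively, as a sketch for users who keep settings in their init
file, the same option can be set directly:

@example
;; Make "/" typed after "<" insert the rest of the end-tag.
(setq nxml-slash-auto-complete-flag t)
@end example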
 283
 284 @node Paragraphs
 285 @chapter Paragraphs
 286
 287 Emacs has several commands that operate on paragraphs, most
 288 notably @kbd{M-q}. nXML mode redefines these to work in a way
 289 that is useful for XML@.  The exact rules that are used to find the
 290 beginning and end of a paragraph are complicated; they are designed
 291 mainly to ensure that @kbd{M-q} does the right thing.
 292
 293 A paragraph consists of one or more complete, consecutive lines.
 294 A group of lines is not considered a paragraph unless it contains some
 295 non-whitespace characters between tags or inside comments.  A blank
 296 line separates paragraphs.  A single tag on a line by itself also
 297 separates paragraphs.  More precisely, if one tag together with any
 298 leading and trailing whitespace completely occupy one or more lines,
 299 then those lines will not be included in any paragraph.
 300
 301 A start-tag at the beginning of the line (possibly indented) may
 302 be treated as starting a paragraph.  Similarly, an end-tag at the end
 303 of the line may be treated as ending a paragraph. The following rules
 304 are used to determine whether such a tag is in fact treated as a
 305 paragraph boundary:
 306
 307 @itemize @bullet
 308 @item
 309 If the schema does not allow text at that point, then it
 310 is a paragraph boundary.
 311 @item
 312 If the end-tag corresponding to the start-tag is not at
 313 the end of its line, or the start-tag corresponding to the end-tag is
 314 not at the beginning of its line, then it is not a paragraph
 315 boundary. For example, in
 316
 317 @example
 318 <p>This is a paragraph with an
 319 <emph>emphasized</emph> phrase.
 320 @end example
 321
 322 @noindent
 323 the @samp{<emph>} start-tag would not be considered as
 324 starting a paragraph, because its corresponding end-tag is not at the
 325 end of the line.
 326 @item
  327 If there is text that is a sibling in the element tree, then
 328 it is not a paragraph boundary.  For example, in
 329
 330 @example
 331 <p>This is a paragraph with an
 332 <emph>emphasized phrase that takes one source line</emph>
 333 @end example
 334
 335 @noindent
 336 the @samp{<emph>} start-tag would not be considered as
 337 starting a paragraph, even though its end-tag is at the end of its
  338 line, because the text @samp{This is a paragraph with an}
 339 is a sibling of the @samp{emph} element.
 340 @item
 341 Otherwise, it is a paragraph boundary.
 342 @end itemize
 343
 344 @node Outlining
 345 @chapter Outlining
 346
 347 nXML mode allows you to display all or part of a buffer as an
 348 outline, in a similar way to Emacs's outline mode.  An outline in nXML
 349 mode is based on recognizing two kinds of element: sections and
 350 headings.  There is one heading for every section and one section for
 351 every heading.  A section contains its heading as or within its first
 352 child element.  A section also contains its subordinate sections (its
 353 subsections).  The text content of a section consists of anything in a
 354 section that is neither a subsection nor a heading.
 355
 356 Note that this is a different model from that used by XHTML@.
 357 nXML mode's outline support will not be useful for XHTML unless you
 358 adopt a convention of adding a @code{div} to enclose each
 359 section, rather than having sections implicitly delimited by different
 360 @code{h@var{n}} elements.  This limitation may be removed
 361 in a future version.
 362
 363 The variable @code{nxml-section-element-name-regexp} gives
 364 a regexp for the local names (i.e., the part of the name following any
 365 prefix) of section elements. The variable
 366 @code{nxml-heading-element-name-regexp} gives a regexp for the
 367 local names of heading elements. For an element to be recognized
 368 as a section
 369
 370 @itemize @bullet
 371 @item
 372 its start-tag must occur at the beginning of a line
 373 (possibly indented);
 374 @item
 375 its local name must match
 376 @code{nxml-section-element-name-regexp};
 377 @item
 378 either its first child element or a descendant of that
 379 first child element must have a local name that matches
 380 @code{nxml-heading-element-name-regexp}; the first such element
 381 is treated as the section's heading.
 382 @end itemize
 383
 384 @noindent
 385 You can customize these variables using @kbd{M-x
 386 customize-variable}.
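
As an illustration only, the following sketch sets both variables for
documents that follow the XHTML convention mentioned above, in which
each section is enclosed in a @code{div} and headed by an
@code{h@var{n}} element.  The regexps are assumptions chosen for this
example.  Setting the variables this way replaces their previous
values for all buffers, so it is not suitable if you also edit other
document types.

@example
;; Assumed convention: sections are <div> elements whose heading
;; is an h1 ... h6 element.  This replaces the previous values.
(setq nxml-section-element-name-regexp "div")
(setq nxml-heading-element-name-regexp "h[1-6]")
@end example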
 387
 388 There are three possible outline states for a section:
 389
 390 @itemize @bullet
 391 @item
 392 normal, showing everything, including its heading, text
 393 content and subsections; each subsection is displayed according to the
 394 state of that subsection;
 395 @item
 396 showing just its heading, with both its text content and
 397 its subsections hidden; all subsections are hidden regardless of their
 398 state;
 399 @item
 400 showing its heading and its subsections, with its text
 401 content hidden; each subsection is displayed according to the state of
 402 that subsection.
 403 @end itemize
 404
 405 In the last two states, where the text content is hidden, the
 406 heading is displayed specially, in an abbreviated form. An element
 407 like this:
 408
 409 @example
 410 <section>
 411 <title>Food</title>
 412 <para>There are many kinds of food.</para>
 413 </section>
 414 @end example
 415
 416 @noindent
 417 would be displayed on a single line like this:
 418
 419 @example
 420 <-section>Food...</>
 421 @end example
 422
 423 @noindent
 424 If there are hidden subsections, then a @code{+} will be used
 425 instead of a @code{-} like this:
 426
 427 @example
 428 <+section>Food...</>
 429 @end example
 430
 431 @noindent
 432 If there are non-hidden subsections, then the section will instead be
 433 displayed like this:
 434
 435 @example
 436 <-section>Food...
 437   <-section>Delicious Food...</>
 438   <-section>Distasteful Food...</>
 439 </-section>
 440 @end example
 441
 442 @noindent
 443 The heading is always displayed with an indent that corresponds to its
  444 depth in the outline, even if it is not actually indented in the buffer.
 445 The variable @code{nxml-outline-child-indent} controls how much
 446 a subheading is indented with respect to its parent heading when the
 447 heading is being displayed specially.
 448
 449 Commands to change the outline state of sections are bound to
 450 key sequences that start with @kbd{C-c C-o} (@kbd{o} is
 451 mnemonic for outline).  The third and final key has been chosen to be
  452 consistent with outline mode.  In the following descriptions,
  453 @dfn{current section} means the section containing point, or, more precisely,
 454 the innermost section containing the character immediately following
 455 point.
 456
 457 @itemize @bullet
 458 @item
 459 @kbd{C-c C-o C-a} shows all sections in the buffer
 460 normally.
 461 @item
 462 @kbd{C-c C-o C-t} hides the text content
 463 of all sections in the buffer.
 464 @item
 465 @kbd{C-c C-o C-c} hides the text content
 466 of the current section.
 467 @item
 468 @kbd{C-c C-o C-e} shows the text content
 469 of the current section.
 470 @item
 471 @kbd{C-c C-o C-d} hides the text content
 472 and subsections of the current section.
 473 @item
 474 @kbd{C-c C-o C-s} shows the current section
 475 and all its direct and indirect subsections normally.
 476 @item
 477 @kbd{C-c C-o C-k} shows the headings of the
 478 direct and indirect subsections of the current section.
 479 @item
 480 @kbd{C-c C-o C-l} hides the text content of the
 481 current section and of its direct and indirect
 482 subsections.
 483 @item
 484 @kbd{C-c C-o C-i} shows the headings of the
 485 direct subsections of the current section.
 486 @item
 487 @kbd{C-c C-o C-o} hides as much as possible without
 488 hiding the current section's text content; the headings of ancestor
  489 sections of the current section and their child sections will
 490 not be hidden.
 491 @end itemize
 492
 493 When a heading is displayed specially, you can use
 494 @key{RET} in that heading to show the text content of the section
 495 in the same way as @kbd{C-c C-o C-e}.
 496
 497 You can also use the mouse to change the outline state:
 498 @kbd{S-mouse-2} hides the text content of a section in the same
  499 way as @kbd{C-c C-o C-c}; @kbd{mouse-2} on a specially
 500 displayed heading shows the text content of the section in the same
 501 way as @kbd{C-c C-o C-e}; @kbd{mouse-1} on a specially
 502 displayed start-tag toggles the display of subheadings on and
 503 off.
 504
 505 The outline state for each section is stored with the first
 506 character of the section (as a text property). Every command that
 507 changes the outline state of any section updates the display of the
 508 buffer so that each section is displayed correctly according to its
 509 outline state.  If the section structure is subsequently changed, then
 510 it is possible for the display to no longer correctly reflect the
 511 stored outline state. @kbd{C-c C-o C-r} can be used to refresh
 512 the display so it is correct again.
 513
 514 @node Locating a schema
 515 @chapter Locating a schema
 516
 517 nXML mode has a configurable set of rules to locate a schema for
 518 the file being edited.  The rules are contained in one or more schema
 519 locating files, which are XML documents.
 520
 521 The variable @samp{rng-schema-locating-files} specifies
 522 the list of the file-names of schema locating files that nXML mode
 523 should use.  The order of the list is significant: when file
 524 @var{x} occurs in the list before file @var{y} then rules
 525 from file @var{x} have precedence over rules from file
 526 @var{y}.  A filename specified in
 527 @samp{rng-schema-locating-files} may be relative. If so, it will
 528 be resolved relative to the document for which a schema is being
 529 located. It is not an error if relative file-names in
 530 @samp{rng-schema-locating-files} do not exist. You can use
 531 @kbd{M-x customize-variable @key{RET} rng-schema-locating-files
 532 @key{RET}} to customize the list of schema locating
 533 files.
 534
  535 By default, the @samp{rng-schema-locating-files} list has two
 536 members: @samp{schemas.xml}, and
 537 @samp{@var{dist-dir}/schema/schemas.xml} where
 538 @samp{@var{dist-dir}} is the directory containing the nXML
 539 distribution. The first member will cause nXML mode to use a file
 540 @samp{schemas.xml} in the same directory as the document being
  541 edited if such a file exists.  The second member contains rules for the
 542 schemas that are included with the nXML distribution.
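
As a hedged sketch, the list can also be extended from an init file.
The path below is hypothetical.  The example appends the new file, so
the entries already in the list keep their precedence and remain first.

@example
;; Hypothetical path; adjust it to where you keep your own rules.
;; The final t appends, so existing entries keep precedence.
(with-eval-after-load 'rng-loc
  (add-to-list 'rng-schema-locating-files
               (expand-file-name "~/schemas/schemas.xml")
               t))
@end example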
 543
 544 @menu
 545 * Commands for locating a schema::
 546 * Schema locating files::
 547 @end menu
 548
 549 @node Commands for locating a schema
 550 @section Commands for locating a schema
 551
 552 The command @kbd{C-c C-s C-w} will tell you what schema
 553 is currently being used.
 554
 555 The rules for locating a schema are applied automatically when
 556 you visit a file in nXML mode. However, if you have just created a new
 557 file and the schema cannot be inferred from the file-name, then this
 558 will not locate the right schema.  In this case, you should insert the
 559 start-tag of the root element and then use the command @kbd{C-c C-s
 560 C-a}, which reapplies the rules based on the current content of
 561 the document.  It is usually not necessary to insert the complete
 562 start-tag; often just @samp{<@var{name}} is
 563 enough.
 564
 565 If you want to use a schema that has not yet been added to the
 566 schema locating files, you can use the command @kbd{C-c C-s C-f}
 567 to manually select the file containing the schema for the document in
  568 the current buffer.  Emacs will read the file-name of the schema from the
 569 minibuffer. After reading the file-name, Emacs will ask whether you
 570 wish to add a rule to a schema locating file that persistently
 571 associates the document with the selected schema.  The rule will be
  572 added to the first file in the list specified by
 573 @samp{rng-schema-locating-files}; it will create the file if
 574 necessary, but will not create a directory. If the variable
 575 @samp{rng-schema-locating-files} has not been customized, this
 576 means that the rule will be added to the file @samp{schemas.xml}
 577 in the same directory as the document being edited.
 578
 579 The command @kbd{C-c C-s C-t} allows you to select a schema by
 580 specifying an identifier for the type of the document.  The schema
 581 locating files determine the available type identifiers and what
 582 schema is used for each type identifier. This is useful when it is
 583 impossible to infer the right schema from either the file-name or the
 584 content of the document, even though the schema is already in the
 585 schema locating file.  A situation in which this can occur is when
 586 there are multiple variants of a schema where all valid documents have
 587 the same document element.  For example, XHTML has Strict and
 588 Transitional variants.  In a situation like this, a schema locating file
 589 can define a type identifier for each variant. As with @kbd{C-c
 590 C-s C-f}, Emacs will ask whether you wish to add a rule to a schema
 591 locating file that persistently associates the document with the
 592 specified type identifier.
 593
 594 The command @kbd{C-c C-s C-l} adds a rule to a schema
 595 locating file that persistently associates the document with
 596 the schema that is currently being used.
 597
 598 @node Schema locating files
 599 @section Schema locating files
 600
 601 Each schema locating file specifies a list of rules.  The rules
  602 from each file are appended in order.  To locate a schema, each rule is
 603 applied in turn until a rule matches.  The first matching rule is then
 604 used to determine the schema.
 605
 606 Schema locating files are designed to be useful for other
 607 applications that need to locate a schema for a document. In fact,
 608 there is nothing specific to locating schemas in the design; it could
 609 equally well be used for locating a stylesheet.
 610
 611 @menu
 612 * Schema locating file syntax basics::
 613 * Using the document's URI to locate a schema::
 614 * Using the document element to locate a schema::
 615 * Using type identifiers in schema locating files::
 616 * Using multiple schema locating files::
 617 @end menu
 618
 619 @node Schema locating file syntax basics
 620 @subsection Schema locating file syntax basics
 621
 622 There is a schema for schema locating files in the file
 623 @samp{locate.rnc} in the schema directory.  Schema locating
 624 files must be valid with respect to this schema.
 625
 626 The document element of a schema locating file must be
 627 @samp{locatingRules} and the namespace URI must be
 628 @samp{http://thaiopensource.com/ns/locating-rules/1.0}.  The
 629 children of the document element specify rules. The order of the
 630 children is the same as the order of the rules.  Here's a complete
 631 example of a schema locating file:
 632
 633 @example
 634 <?xml version="1.0"?>
 635 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 636   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 637   <documentElement localName="book" uri="docbook.rnc"/>
 638 </locatingRules>
 639 @end example
 640
 641 @noindent
  642 This says to use the schema @samp{xhtml.rnc} for a document whose
  643 document element has the namespace URI
  644 @samp{http://www.w3.org/1999/xhtml}, and to use the schema
  645 @samp{docbook.rnc} for a document whose document element has the
  646 local name @samp{book}.  If the document element had both that
  647 namespace URI and that local name, then the matching rule that comes
  648 first would be used, and so the schema @samp{xhtml.rnc} would be used.  There is
 649 no precedence between different types of rule; the first matching rule
 650 of any type is used.
 651
 652 As usual with XML-related technologies, resources are identified
 653 by URIs.  The @samp{uri} attribute identifies the schema by
 654 specifying the URI@.  The URI may be relative.  If so, it is resolved
  655 relative to the URI of the schema locating file that contains the
  656 attribute.  This means that if the value of the @samp{uri} attribute
 657 does not contain a @samp{/}, then it will refer to a filename in
 658 the same directory as the schema locating file.
 659
 660 @node Using the document's URI to locate a schema
 661 @subsection Using the document's URI to locate a schema
 662
 663 A @samp{uri} rule locates a schema based on the URI of the
 664 document.  The @samp{uri} attribute specifies the URI of the
 665 schema.  The @samp{resource} attribute can be used to specify
 666 the schema for a particular document.  For example,
 667
 668 @example
 669 <uri resource="spec.xml" uri="docbook.rnc"/>
 670 @end example
 671
 672 @noindent
 673 specifies that the schema for @samp{spec.xml} is
 674 @samp{docbook.rnc}.
 675
 676 The @samp{pattern} attribute can be used instead of the
 677 @samp{resource} attribute to specify the schema for any document
 678 whose URI matches a pattern.  The pattern has the same syntax as an
 679 absolute or relative URI except that the path component of the URI can
 680 use a @samp{*} character to stand for zero or more characters
  681 within a path segment (i.e., any character other than @samp{/}).
 682 Typically, the URI pattern looks like a relative URI, but, whereas a
 683 relative URI in the @samp{resource} attribute is resolved into a
 684 particular absolute URI using the base URI of the schema locating
 685 file, a relative URI pattern matches if it matches some number of
 686 complete path segments of the document's URI ending with the last path
 687 segment of the document's URI@. For example,
 688
 689 @example
 690 <uri pattern="*.xsl" uri="xslt.rnc"/>
 691 @end example
 692
 693 @noindent
 694 specifies that the schema for documents with a URI whose path ends
 695 with @samp{.xsl} is @samp{xslt.rnc}.
 696
 697 A @samp{transformURI} rule locates a schema by
 698 transforming the URI of the document. The @samp{fromPattern}
 699 attribute specifies a URI pattern with the same meaning as the
 700 @samp{pattern} attribute of the @samp{uri} element.  The
 701 @samp{toPattern} attribute is a URI pattern that is used to
 702 generate the URI of the schema.  Each @samp{*} in the
 703 @samp{toPattern} is replaced by the string that matched the
 704 corresponding @samp{*} in the @samp{fromPattern}.  The
 705 resulting string is appended to the initial part of the document's URI
 706 that was not explicitly matched by the @samp{fromPattern}.  The
 707 rule matches only if the transformed URI identifies an existing
 708 resource.  For example, the rule
 709
 710 @example
 711 <transformURI fromPattern="*.xml" toPattern="*.rnc"/>
 712 @end example
 713
 714 @noindent
 715 would transform the URI @samp{file:///home/jjc/docs/spec.xml}
 716 into the URI @samp{file:///home/jjc/docs/spec.rnc}.  Thus, this
 717 rule specifies that to locate a schema for a document
 718 @samp{@var{foo}.xml}, Emacs should test whether a file
 719 @samp{@var{foo}.rnc} exists in the same directory as
 720 @samp{@var{foo}.xml}, and, if so, should use it as the
 721 schema.
 722
 723 @node Using the document element to locate a schema
 724 @subsection Using the document element to locate a schema
 725
 726 A @samp{documentElement} rule locates a schema based on
 727 the local name and prefix of the document element. For example, a rule
 728
 729 @example
 730 <documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
 731 @end example
 732
 733 @noindent
 734 specifies that when the name of the document element is
 735 @samp{xsl:stylesheet}, then @samp{xslt.rnc} should be used
 736 as the schema. Either the @samp{prefix} or
 737 @samp{localName} attribute may be omitted to allow any prefix or
 738 local name.
 739
 740 A @samp{namespace} rule locates a schema based on the
 741 namespace URI of the document element. For example, a rule
 742
 743 @example
 744 <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/>
 745 @end example
 746
 747 @noindent
  748 specifies that when the namespace URI of the document element is
 749 @samp{http://www.w3.org/1999/XSL/Transform}, then
 750 @samp{xslt.rnc} should be used as the schema.
 751
 752 @node Using type identifiers in schema locating files
 753 @subsection Using type identifiers in schema locating files
 754
 755 Type identifiers allow a level of indirection in locating the
 756 schema for a document.  Instead of associating the document directly
 757 with a schema URI, the document is associated with a type identifier,
 758 which is in turn associated with a schema URI@. nXML mode does not
 759 constrain the format of type identifiers.  They can be simply strings
 760 without any formal structure or they can be public identifiers or
 761 URIs.  Note that these type identifiers have nothing to do with the
 762 DOCTYPE declaration.  When comparing type identifiers, whitespace is
 763 normalized in the same way as with the @samp{xsd:token}
 764 datatype: leading and trailing whitespace is stripped; other sequences
 765 of whitespace are normalized to a single space character.
 766
 767 Each of the rules described in previous sections that uses a
 768 @samp{uri} attribute to specify a schema, can instead use a
 769 @samp{typeId} attribute to specify a type identifier.  The type
 770 identifier can be associated with a URI using a @samp{typeId}
element.  For example,
 772
 773 @example
 774 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 775   <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
 776   <typeId id="XHTML" typeId="XHTML Strict"/>
 777   <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
 778   <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
 779 </locatingRules>
 780 @end example
 781
 782 @noindent
declares three type identifiers: @samp{XHTML} (representing the
 784 default variant of XHTML to be used), @samp{XHTML Strict} and
 785 @samp{XHTML Transitional}.  Such a schema locating file would
 786 use @samp{xhtml-strict.rnc} for a document whose namespace is
 787 @samp{http://www.w3.org/1999/xhtml}.  But it is considerably
 788 more flexible than a schema locating file that simply specified
 789
 790 @example
 791 <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/>
 792 @end example
 793
 794 @noindent
A user can easily use @kbd{C-c C-s C-t} to select between XHTML
Strict and XHTML Transitional.  Also, a user can easily add a schema
locating file
 797
 798 @example
 799 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 800   <typeId id="XHTML" typeId="XHTML Transitional"/>
 801 </locatingRules>
 802 @end example
 803
 804 @noindent
 805 that makes the default variant of XHTML be XHTML Transitional.
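
Because type identifiers are compared after whitespace normalization,
the exact spacing used when writing them should not matter.  As a
sketch, assuming the same @samp{XHTML Strict} declaration as in the
first example of this section, the following reference would be
expected to resolve to that identifier despite the extra whitespace:

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <typeId id="XHTML" typeId="  XHTML   Strict "/>
  <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
</locatingRules>
@end example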
 806
 807 @node Using multiple schema locating files
 808 @subsection Using multiple schema locating files
 809
 810 The @samp{include} element includes rules from another
 811 schema locating file.  The behavior is exactly as if the rules from
 812 that file were included in place of the @samp{include} element.
 813 Relative URIs are resolved into absolute URIs before the inclusion is
performed.  For example,
 815
 816 @example
 817 <include rules="../rules.xml"/>
 818 @end example
 819
 820 @noindent
 821 includes the rules from @samp{rules.xml}.
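
Since the included rules take the place of the @samp{include} element,
its position among the other rules matters.  In the following sketch
(the two surrounding rules are only illustrative), the rules from
@samp{../rules.xml} would be tried after the XHTML @samp{namespace}
rule and before the @samp{documentElement} rule:

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <include rules="../rules.xml"/>
  <documentElement localName="stylesheet" uri="xslt.rnc"/>
</locatingRules>
@end example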
 822
 823 The process of locating a schema takes as input a list of schema
 824 locating files.  The rules in all these files and in the files they
 825 include are resolved into a single list of rules, which are applied
 826 strictly in order.  Sometimes this order is not what is needed.
 827 For example, suppose you have two schema locating files, a private
 828 file
 829
 830 @example
 831 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 832   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 833 </locatingRules>
 834 @end example
 835
 836 @noindent
 837 followed by a public file
 838
 839 @example
 840 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 841   <transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
 842   <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
 843 </locatingRules>
 844 @end example
 845
 846 @noindent
The effect of these two files is that the XHTML @samp{namespace}
rule takes precedence over the @samp{transformURI} rule, which
is almost certainly not what is needed.  This can be solved by adding
an @samp{applyFollowingRules} rule to the private file, so that the
@samp{transformURI} rules that follow it are applied at that point,
ahead of the XHTML @samp{namespace} rule:
 851
 852 @example
 853 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 854   <applyFollowingRules ruleType="transformURI"/>
 855   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 856 </locatingRules>
 857 @end example
 858
 859 @node DTDs
 860 @chapter DTDs
 861
 862 nXML mode is designed to support the creation of standalone XML
 863 documents that do not depend on a DTD@.  Although it is common practice
 864 to insert a DOCTYPE declaration referencing an external DTD, this has
 865 undesirable side-effects.  It means that the document is no longer
self-contained.  It also means that different XML parsers may interpret
the document in different ways, since the XML Recommendation does not
require XML parsers to read the DTD@.  With DTDs, it was impractical to
get validation without using an external DTD or reference to a
parameter entity.  With RELAX NG and other schema languages, you can
 871 simultaneously get the benefits of validation and standalone XML
 872 documents.  Therefore, I recommend that you do not reference an
 873 external DOCTYPE in your XML documents.
 874
One problem is entities for characters.  Typically, as well as
providing validation, DTDs also provide a set of character entities
for documents to use.  Schemas cannot provide this functionality,
 878 because schema validation happens after XML parsing.  The recommended
 879 solution is to either use the Unicode characters directly, or, if this
 880 is impractical, use character references.  nXML mode supports this by
 881 providing commands for entering characters and character references
 882 using the Unicode names, and can display the glyph corresponding to a
 883 character reference.
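
For example, a document that would otherwise rely on a DTD-defined
entity such as @samp{&eacute;} can instead use a numeric character
reference.  This is only a sketch; the element name @samp{p} is
arbitrary:

@example
<p>caf&#xE9; or caf&#233;</p>
@end example

@noindent
Both references denote the character U+00E9 (LATIN SMALL LETTER E WITH
ACUTE), and neither requires any declaration in a DTD.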
 884
 885 @node Limitations
 886 @chapter Limitations
 887
 888 nXML mode has some limitations:
 889
 890 @itemize @bullet
 891 @item
 892 DTD support is limited.  Internal parsed general entities declared
 893 in the internal subset are supported provided they do not contain
elements.  Other usage of DTDs is ignored.
 895 @item
 896 The restrictions on RELAX NG schemas in section 7 of the RELAX NG
 897 specification are not enforced.
 898 @end itemize
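
As a sketch of the first item, an entity like the following, declared
in the internal subset and containing only text, falls within the
supported case (the names @samp{doc} and @samp{product} are
hypothetical):

@example
<!DOCTYPE doc [
<!ENTITY product "nXML mode">
]>
<doc>Edited with &product;.</doc>
@end example

@noindent
An entity whose replacement text contains elements, for instance
@samp{<!ENTITY note "<em>draft</em>">}, would fall outside it.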
 899
 900 @node GNU Free Documentation License
 901 @appendix GNU Free Documentation License
 902 @include doclicense.texi
 903
 904 @bye