doc/misc/nxml-mode.texi

   1 \input texinfo @c -*- texinfo -*-
   2 @c %**start of header
   3 @setfilename ../../info/nxml-mode
   4 @settitle nXML Mode
   5 @c %**end of header
   6
   7 @copying
   8 This manual documents nXML mode, an Emacs major mode for editing
   9 XML with RELAX NG support.
  10
  11 Copyright @copyright{} 2007-2012 Free Software Foundation, Inc.
  12
  13 @quotation
  14 Permission is granted to copy, distribute and/or modify this document
  15 under the terms of the GNU Free Documentation License, Version 1.3 or
  16 any later version published by the Free Software Foundation; with no
  17 Invariant Sections, with the Front-Cover texts being ``A GNU
  18 Manual,'' and with the Back-Cover Texts as in (a) below.  A copy of the
  19 license is included in the section entitled ``GNU Free Documentation
  20 License'' in the Emacs manual.
  21
  22 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
  23 modify this GNU manual.  Buying copies from the FSF supports it in
  24 developing GNU and promoting software freedom.''
  25
  26 This document is part of a collection distributed under the GNU Free
  27 Documentation License.  If you want to distribute this document
  28 separately from the collection, you can do so by adding a copy of the
  29 license to the document, as described in section 6 of the license.
  30 @end quotation
  31 @end copying
  32
  33 @dircategory Emacs editing modes
  34 @direntry
  35 * nXML Mode: (nxml-mode).       XML editing mode with RELAX NG support.
  36 @end direntry
  37
  38 @node Top
  39 @top nXML Mode
  40
  41 @insertcopying
  42
  43 This manual is not yet complete.
  44
  45 @menu
  46 * Introduction::
  47 * Completion::
  48 * Inserting end-tags::
  49 * Paragraphs::
  50 * Outlining::
  51 * Locating a schema::
  52 * DTDs::
  53 * Limitations::
  54 @end menu
  55
  56 @node Introduction
  57 @chapter Introduction
  58
  59 nXML mode is an Emacs major mode for editing XML documents.  It supports
  60 editing well-formed XML documents, and provides schema-sensitive editing
  61 using RELAX NG Compact Syntax.  To get started, visit a file containing an
  62 XML document, and, if necessary, use @kbd{M-x nxml-mode} to switch to nXML
  63 mode.  By default, @code{auto-mode-alist} and @code{magic-fallback-alist}
  64 put buffers in nXML mode if they have recognizable XML content or file
  65 extensions.  You may wish to customize the settings, for example to
  66 recognize different file extensions.
  67
  68 Once in nXML mode, you can type @kbd{C-h m} for basic information on the
  69 mode.
  70
  71 The @file{etc/nxml} directory in the Emacs distribution contains some data
  72 files used by nXML mode, and includes two files (@file{test-valid.xml} and
  73 @file{test-invalid.xml}) that provide examples of valid and invalid XML
  74 documents.
  75
  76 To get validation and schema-sensitive editing, you need a RELAX NG Compact
  77 Syntax (RNC) schema for your document (@pxref{Locating a schema}).  The
  78 @file{etc/schema} directory includes some schemas for popular document
  79 types.  See @url{http://relaxng.org/} for more information on RELAX NG.
  80 You can use the @samp{Trang} program from
  81 @url{http://www.thaiopensource.com/relaxng/trang.html} to
  82 automatically create RNC schemas.  This program can:
  83
  84 @itemize @bullet
  85 @item
  86 infer an RNC schema from an instance document;
  87 @item
  88 convert a DTD to an RNC schema;
  89 @item
  90 convert a RELAX NG XML syntax schema to an RNC schema.
  91 @end itemize
  92
  93 @noindent To convert a RELAX NG XML syntax (@samp{.rng}) schema to an RNC
  94 one, you can also use the XSLT stylesheet from
  95 @url{http://www.pantor.com/download.html}.
  96
  97 To convert a W3C XML Schema to an RNC schema, you need first to convert it
  98 to RELAX NG XML syntax using the RELAX NG converter tool @code{rngconv}
  99 (built on top of MSV).  See @url{https://github.com/kohsuke/msv}
 100 and @url{https://msv.dev.java.net/}.
 101
 102 For historical discussions only, see the mailing list archives at
 103 @url{http://groups.yahoo.com/group/emacs-nxml-mode/}.  Please make all new
 104 discussions on the @samp{help-gnu-emacs} and @samp{emacs-devel} mailing
 105 lists.  Report any bugs with @kbd{M-x report-emacs-bug}.
 106
 107
 108 @node Completion
 109 @chapter Completion
 110
 111 Apart from real-time validation, the most important feature that nXML
 112 mode provides for assisting in document creation is ``completion''.
 113 Completion assists the user in inserting characters at point, based on
 114 knowledge of the schema and on the contents of the buffer before
 115 point.
 116
 117 nXML mode adapts the standard GNU Emacs command for completion in a
 118 buffer: @code{completion-at-point}, which is bound to @kbd{C-M-i} and
 119 @kbd{M-@key{TAB}}.  Note that many window systems and window managers
 120 use @kbd{M-@key{TAB}} themselves (typically for switching between
 121 windows) and do not pass it to applications.  In that case, you should
 122 type @kbd{C-M-i} or @kbd{@key{ESC} @key{TAB}} for completion, or bind
 123 @code{completion-at-point} to a key that is convenient for you.  In
 124 the following, I will assume that you type @kbd{C-M-i}.
 125
 126 nXML mode completion works by examining the symbol preceding point.
 127 This is the symbol to be completed.  The symbol to be completed may be
 128 empty.  Completion considers what symbols starting with the symbol
 129 to be completed would be valid replacements for the symbol to be
 130 completed, given the schema and the contents of the buffer before
 131 point.  These symbols are the possible completions.  An example may
 132 make this clearer.  Suppose the buffer looks like this (where @point{}
 133 indicates point):
 134
 135 @example
 136 <html xmlns="http://www.w3.org/1999/xhtml">
 137 <h@point{}
 138 @end example
 139
 140 @noindent
 141 and the schema is XHTML.  In this context, the symbol to be completed
 142 is @samp{h}.  The possible completions consist of just
 143 @samp{head}.  Another example is:
 144
 145 @example
 146 <html xmlns="http://www.w3.org/1999/xhtml">
 147 <head>
 148 <@point{}
 149 @end example
 150
 151 @noindent
 152 In this case, the symbol to be completed is empty, and the possible
 153 completions are @samp{base}, @samp{isindex},
 154 @samp{link}, @samp{meta}, @samp{script},
 155 @samp{style}, @samp{title}.  Another example is:
 156
 157 @example
 158 <html xmlns="@point{}
 159 @end example
 160
 161 @noindent
 162 In this case, the symbol to be completed is empty, and the possible
 163 completions are just @samp{http://www.w3.org/1999/xhtml}.
 164
 165 When you type @kbd{C-M-i}, what happens depends
 166 on what the set of possible completions is.
 167
 168 @itemize @bullet
 169 @item
 170 If the set of completions is empty, nothing
 171 happens.
 172 @item
 173 If there is one possible completion, then that completion is
 174 inserted, together with any following characters that are
 175 required. For example, in this case:
 176
 177 @example
 178 <html xmlns="http://www.w3.org/1999/xhtml">
 179 <@point{}
 180 @end example
 181
 182 @noindent
 183 @kbd{C-M-i} will yield
 184
 185 @example
 186 <html xmlns="http://www.w3.org/1999/xhtml">
 187 <head@point{}
 188 @end example
 189 @item
 190 If there is more than one possible completion, but all
 191 possible completions share a common non-empty prefix, then that prefix
 192 is inserted. For example, suppose the buffer is:
 193
 194 @example
 195 <html x@point{}
 196 @end example
 197
 198 @noindent
 199 The symbol to be completed is @samp{x}. The possible completions are
 200 @samp{xmlns} and @samp{xml:lang}.  These share a common prefix of
 201 @samp{xml}.  Thus, @kbd{C-M-i} will yield:
 202
 203 @example
 204 <html xml@point{}
 205 @end example
 206
 207 @noindent
 208 Typically, you would do @kbd{C-M-i} again, which would have the result
 209 described in the next item.
 210 @item
 211 If there is more than one possible completion, but the
 212 possible completions do not share a non-empty prefix, then Emacs will
 213 prompt you to input the symbol in the minibuffer, initializing the
 214 minibuffer with the symbol to be completed, and popping up a buffer
 215 showing the possible completions.  You can now input the symbol to be
 216 inserted.  The symbol you input will be inserted in the buffer instead
 217 of the symbol to be completed.  Emacs will then insert any required
 218 characters after the symbol.  For example, if the buffer contains:
 219
 220 @example
 221 <html xml@point{}
 222 @end example
 223
 224 @noindent
 225 Emacs will prompt you in the minibuffer with
 226
 227 @example
 228 Attribute: xml@point{}
 229 @end example
 230
 231 @noindent
 232 and the buffer showing possible completions will contain
 233
 234 @example
 235 Possible completions are:
 236 xml:lang                           xmlns
 237 @end example
 238
 239 @noindent
 240 If you input @kbd{xmlns}, the result will be:
 241
 242 @example
 243 <html xmlns="@point{}
 244 @end example
 245
 246 @noindent
 247 (If you do @kbd{C-M-i} again, the namespace URI will be
 248 inserted. Should that happen automatically?)
 249 @end itemize
 250
 251 @node Inserting end-tags
 252 @chapter Inserting end-tags
 253
 254 The main redundancy in XML syntax is end-tags.  nXML mode provides
 255 several ways to make it easier to enter end-tags.  You can use all of
 256 these without a schema.
 257
 258 You can use @kbd{C-M-i} after @samp{</} to complete the rest of the
 259 end-tag.
 260
 261 @kbd{C-c C-f} inserts an end-tag for the element containing
 262 point. This command is useful when you want to input the start-tag,
 263 then input the content and finally input the end-tag. The @samp{f}
 264 is mnemonic for finish.
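
     As an illustrative sketch (the element names are arbitrary), suppose
     the buffer contains:

     @example
     <p>This is <em>important@point{}
     @end example

     @noindent
     Typing @kbd{C-c C-f} should insert the end-tag for the innermost open
     element, giving:

     @example
     <p>This is <em>important</em>
     @end example

     @noindent
     Typing @kbd{C-c C-f} a second time would then insert @samp{</p>}.
     Typing @samp{</} followed by @kbd{C-M-i} at the same starting position
     is expected to produce the same @samp{</em>} end-tag.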
 265
 266 If you want to keep tags balanced and input the end-tag at the
 267 same time as the start-tag, before inputting the content, then you can
 268 use @kbd{C-c C-i}. This inserts a @samp{>}, then inserts
 269 the end-tag and leaves point before the end-tag.  @kbd{C-c C-b}
 270 is similar but more convenient for block-level elements: it puts the
 271 start-tag, point and the end-tag on successive lines, appropriately
 272 indented. The @samp{i} is mnemonic for inline and the
 273 @samp{b} is mnemonic for block.
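
     For instance, as a sketch with arbitrary element names, suppose the
     buffer contains:

     @example
     <em@point{}
     @end example

     @noindent
     Then @kbd{C-c C-i} would produce:

     @example
     <em>@point{}</em>
     @end example

     @noindent
     Starting instead from @samp{<div@point{}}, @kbd{C-c C-b} would produce
     something like:

     @example
     <div>
       @point{}
     </div>
     @end example

     @noindent
     The exact indentation of the middle line depends on your indentation
     settings; two spaces are assumed here.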
 274
 275 Finally, you can customize nXML mode so that @kbd{/} automatically
 276 inserts the rest of the end-tag when it occurs after @samp{<}, by
 277 doing
 278
 279 @display
 280 @kbd{M-x customize-variable @key{RET} nxml-slash-auto-complete-flag @key{RET}}
 281 @end display
 282
 283 @noindent
 284 and then following the instructions in the displayed buffer.
 285
 286 @node Paragraphs
 287 @chapter Paragraphs
 288
 289 Emacs has several commands that operate on paragraphs, most
 290 notably @kbd{M-q}. nXML mode redefines these to work in a way
 291 that is useful for XML.  The exact rules that are used to find the
 292 beginning and end of a paragraph are complicated; they are designed
 293 mainly to ensure that @kbd{M-q} does the right thing.
 294
 295 A paragraph consists of one or more complete, consecutive lines.
 296 A group of lines is not considered a paragraph unless it contains some
 297 non-whitespace characters between tags or inside comments.  A blank
 298 line separates paragraphs.  A single tag on a line by itself also
 299 separates paragraphs.  More precisely, if one tag together with any
 300 leading and trailing whitespace completely occupy one or more lines,
 301 then those lines will not be included in any paragraph.
 302
 303 A start-tag at the beginning of the line (possibly indented) may
 304 be treated as starting a paragraph.  Similarly, an end-tag at the end
 305 of the line may be treated as ending a paragraph. The following rules
 306 are used to determine whether such a tag is in fact treated as a
 307 paragraph boundary:
 308
 309 @itemize @bullet
 310 @item
 311 If the schema does not allow text at that point, then it
 312 is a paragraph boundary.
 313 @item
 314 If the end-tag corresponding to the start-tag is not at
 315 the end of its line, or the start-tag corresponding to the end-tag is
 316 not at the beginning of its line, then it is not a paragraph
 317 boundary. For example, in
 318
 319 @example
 320 <p>This is a paragraph with an
 321 <emph>emphasized</emph> phrase.
 322 @end example
 323
 324 @noindent
 325 the @samp{<emph>} start-tag would not be considered as
 326 starting a paragraph, because its corresponding end-tag is not at the
 327 end of the line.
 328 @item
 329 If there is text that is a sibling in the element tree, then
 330 it is not a paragraph boundary.  For example, in
 331
 332 @example
 333 <p>This is a paragraph with an
 334 <emph>emphasized phrase that takes one source line</emph>
 335 @end example
 336
 337 @noindent
 338 the @samp{<emph>} start-tag would not be considered as
 339 starting a paragraph, even though its end-tag is at the end of its
 340 line, because the text @samp{This is a paragraph with an}
 341 is a sibling of the @samp{emph} element.
 342 @item
 343 Otherwise, it is a paragraph boundary.
 344 @end itemize
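
     As a sketch of how these rules combine, consider the following
     fragment, assuming a schema that allows text inside @samp{para} but
     not directly inside @samp{section} (both element names are
     illustrative only):

     @example
     <section>
     <para>The first paragraph starts here
     and continues on a second line.</para>
     <para>The second paragraph has an
     <emph>emphasized</emph> phrase.</para>
     </section>
     @end example

     @noindent
     The @samp{<section>} and @samp{</section>} tags each occupy a line by
     themselves, so those lines belong to no paragraph.  Each @samp{<para>}
     start-tag begins its line, the matching end-tag ends its line, and the
     element has no sibling text, so each @samp{para} element should be
     treated as a separate paragraph.  The @samp{<emph>} start-tag does not
     start a new paragraph, because its end-tag is not at the end of its
     line.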
 345
 346 @node Outlining
 347 @chapter Outlining
 348
 349 nXML mode allows you to display all or part of a buffer as an
 350 outline, in a similar way to Emacs's outline mode.  An outline in nXML
 351 mode is based on recognizing two kinds of element: sections and
 352 headings.  There is one heading for every section and one section for
 353 every heading.  A section contains its heading as or within its first
 354 child element.  A section also contains its subordinate sections (its
 355 subsections).  The text content of a section consists of anything in a
 356 section that is neither a subsection nor a heading.
 357
 358 Note that this is a different model from that used by XHTML.
 359 nXML mode's outline support will not be useful for XHTML unless you
 360 adopt a convention of adding a @code{div} to enclose each
 361 section, rather than having sections implicitly delimited by different
 362 @code{h@var{n}} elements.  This limitation may be removed
 363 in a future version.
 364
 365 The variable @code{nxml-section-element-name-regexp} gives
 366 a regexp for the local names (i.e. the part of the name following any
 367 prefix) of section elements. The variable
 368 @code{nxml-heading-element-name-regexp} gives a regexp for the
 369 local names of heading elements. For an element to be recognized
 370 as a section:
 371
 372 @itemize @bullet
 373 @item
 374 its start-tag must occur at the beginning of a line
 375 (possibly indented);
 376 @item
 377 its local name must match
 378 @code{nxml-section-element-name-regexp};
 379 @item
 380 either its first child element or a descendant of that
 381 first child element must have a local name that matches
 382 @code{nxml-heading-element-name-regexp}; the first such element
 383 is treated as the section's heading.
 384 @end itemize
 385
 386 @noindent
 387 You can customize these variables using @kbd{M-x
 388 customize-variable}.
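
     As an illustration, assuming the two regexps match the local names
     @samp{section} and @samp{title} respectively (an assumption; check the
     values in your own configuration), both @samp{section} elements below
     would be recognized as sections:

     @example
     <section>
       <title>Food</title>
       <para>There are many kinds of food.</para>
       <section>
         <info><title>Fruit</title></info>
         <para>Fruit is sweet.</para>
       </section>
     </section>
     @end example

     @noindent
     In the outer section the heading is the first child element; in the
     inner one it is a descendant of the first child element (the
     hypothetical @samp{info} wrapper).  A @samp{section} whose start-tag
     followed other content on the same line would not be recognized.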
 389
 390 There are three possible outline states for a section:
 391
 392 @itemize @bullet
 393 @item
 394 normal, showing everything, including its heading, text
 395 content and subsections; each subsection is displayed according to the
 396 state of that subsection;
 397 @item
 398 showing just its heading, with both its text content and
 399 its subsections hidden; all subsections are hidden regardless of their
 400 state;
 401 @item
 402 showing its heading and its subsections, with its text
 403 content hidden; each subsection is displayed according to the state of
 404 that subsection.
 405 @end itemize
 406
 407 In the last two states, where the text content is hidden, the
 408 heading is displayed specially, in an abbreviated form. An element
 409 like this:
 410
 411 @example
 412 <section>
 413 <title>Food</title>
 414 <para>There are many kinds of food.</para>
 415 </section>
 416 @end example
 417
 418 @noindent
 419 would be displayed on a single line like this:
 420
 421 @example
 422 <-section>Food...</>
 423 @end example
 424
 425 @noindent
 426 If there are hidden subsections, then a @code{+} will be used
 427 instead of a @code{-} like this:
 428
 429 @example
 430 <+section>Food...</>
 431 @end example
 432
 433 @noindent
 434 If there are non-hidden subsections, then the section will instead be
 435 displayed like this:
 436
 437 @example
 438 <-section>Food...
 439   <-section>Delicious Food...</>
 440   <-section>Distasteful Food...</>
 441 </-section>
 442 @end example
 443
 444 @noindent
 445 The heading is always displayed with an indent that corresponds to its
 446 depth in the outline, even if it is not actually indented in the buffer.
 447 The variable @code{nxml-outline-child-indent} controls how much
 448 a subheading is indented with respect to its parent heading when the
 449 heading is being displayed specially.
 450
 451 Commands to change the outline state of sections are bound to
 452 key sequences that start with @kbd{C-c C-o} (@kbd{o} is
 453 mnemonic for outline).  The third and final key has been chosen to be
 454 consistent with outline mode.  In the following descriptions
 455 current section means the section containing point, or, more precisely,
 456 the innermost section containing the character immediately following
 457 point.
 458
 459 @itemize @bullet
 460 @item
 461 @kbd{C-c C-o C-a} shows all sections in the buffer
 462 normally.
 463 @item
 464 @kbd{C-c C-o C-t} hides the text content
 465 of all sections in the buffer.
 466 @item
 467 @kbd{C-c C-o C-c} hides the text content
 468 of the current section.
 469 @item
 470 @kbd{C-c C-o C-e} shows the text content
 471 of the current section.
 472 @item
 473 @kbd{C-c C-o C-d} hides the text content
 474 and subsections of the current section.
 475 @item
 476 @kbd{C-c C-o C-s} shows the current section
 477 and all its direct and indirect subsections normally.
 478 @item
 479 @kbd{C-c C-o C-k} shows the headings of the
 480 direct and indirect subsections of the current section.
 481 @item
 482 @kbd{C-c C-o C-l} hides the text content of the
 483 current section and of its direct and indirect
 484 subsections.
 485 @item
 486 @kbd{C-c C-o C-i} shows the headings of the
 487 direct subsections of the current section.
 488 @item
 489 @kbd{C-c C-o C-o} hides as much as possible without
 490 hiding the current section's text content; the headings of ancestor
 491 sections of the current section and their child sections will
 492 not be hidden.
 493 @end itemize
 494
 495 When a heading is displayed specially, you can use
 496 @key{RET} in that heading to show the text content of the section
 497 in the same way as @kbd{C-c C-o C-e}.
 498
 499 You can also use the mouse to change the outline state:
 500 @kbd{S-mouse-2} hides the text content of a section in the same
 501 way as @kbd{C-c C-o C-c}; @kbd{mouse-2} on a specially
 502 displayed heading shows the text content of the section in the same
 503 way as @kbd{C-c C-o C-e}; @kbd{mouse-1} on a specially
 504 displayed start-tag toggles the display of subheadings on and
 505 off.
 506
 507 The outline state for each section is stored with the first
 508 character of the section (as a text property). Every command that
 509 changes the outline state of any section updates the display of the
 510 buffer so that each section is displayed correctly according to its
 511 outline state.  If the section structure is subsequently changed, then
 512 it is possible for the display to no longer correctly reflect the
 513 stored outline state. @kbd{C-c C-o C-r} can be used to refresh
 514 the display so it is correct again.
 515
 516 @node Locating a schema
 517 @chapter Locating a schema
 518
 519 nXML mode has a configurable set of rules to locate a schema for
 520 the file being edited.  The rules are contained in one or more schema
 521 locating files, which are XML documents.
 522
 523 The variable @samp{rng-schema-locating-files} specifies
 524 the list of the file-names of schema locating files that nXML mode
 525 should use.  The order of the list is significant: when file
 526 @var{x} occurs in the list before file @var{y} then rules
 527 from file @var{x} have precedence over rules from file
 528 @var{y}.  A filename specified in
 529 @samp{rng-schema-locating-files} may be relative. If so, it will
 530 be resolved relative to the document for which a schema is being
 531 located. It is not an error if relative file-names in
 532 @samp{rng-schema-locating-files} do not exist. You can use
 533 @kbd{M-x customize-variable @key{RET} rng-schema-locating-files
 534 @key{RET}} to customize the list of schema locating
 535 files.
 536
 537 By default, the @samp{rng-schema-locating-files} list has two
 538 members: @samp{schemas.xml}, and
 539 @samp{@var{dist-dir}/schema/schemas.xml} where
 540 @samp{@var{dist-dir}} is the directory containing the nXML
 541 distribution. The first member will cause nXML mode to use a file
 542 @samp{schemas.xml} in the same directory as the document being
 543 edited if such a file exists.  The second member contains rules for the
 544 schemas that are included with the nXML distribution.
 545
 546 @menu
 547 * Commands for locating a schema::
 548 * Schema locating files::
 549 @end menu
 550
 551 @node Commands for locating a schema
 552 @section Commands for locating a schema
 553
 554 The command @kbd{C-c C-s C-w} will tell you what schema
 555 is currently being used.
 556
 557 The rules for locating a schema are applied automatically when
 558 you visit a file in nXML mode. However, if you have just created a new
 559 file and the schema cannot be inferred from the file-name, then this
 560 will not locate the right schema.  In this case, you should insert the
 561 start-tag of the root element and then use the command @kbd{C-c C-s
 562 C-a}, which reapplies the rules based on the current content of
 563 the document.  It is usually not necessary to insert the complete
 564 start-tag; often just @samp{<@var{name}} is
 565 enough.
 566
 567 If you want to use a schema that has not yet been added to the
 568 schema locating files, you can use the command @kbd{C-c C-s C-f}
 569 to manually select the file containing the schema for the document in
 570 the current buffer.  Emacs will read the file-name of the schema from the
 571 minibuffer. After reading the file-name, Emacs will ask whether you
 572 wish to add a rule to a schema locating file that persistently
 573 associates the document with the selected schema.  The rule will be
 574 added to the first file in the list specified by
 575 @samp{rng-schema-locating-files}; it will create the file if
 576 necessary, but will not create a directory. If the variable
 577 @samp{rng-schema-locating-files} has not been customized, this
 578 means that the rule will be added to the file @samp{schemas.xml}
 579 in the same directory as the document being edited.
 580
 581 The command @kbd{C-c C-s C-t} allows you to select a schema by
 582 specifying an identifier for the type of the document.  The schema
 583 locating files determine the available type identifiers and what
 584 schema is used for each type identifier. This is useful when it is
 585 impossible to infer the right schema from either the file-name or the
 586 content of the document, even though the schema is already in the
 587 schema locating file.  A situation in which this can occur is when
 588 there are multiple variants of a schema where all valid documents have
 589 the same document element.  For example, XHTML has Strict and
 590 Transitional variants.  In a situation like this, a schema locating file
 591 can define a type identifier for each variant. As with @kbd{C-c
 592 C-s C-f}, Emacs will ask whether you wish to add a rule to a schema
 593 locating file that persistently associates the document with the
 594 specified type identifier.
 595
 596 The command @kbd{C-c C-s C-l} adds a rule to a schema
 597 locating file that persistently associates the document with
 598 the schema that is currently being used.
 599
 600 @node Schema locating files
 601 @section Schema locating files
 602
 603 Each schema locating file specifies a list of rules.  The rules
 604 from each file are appended in order.  To locate a schema, each rule is
 605 applied in turn until a rule matches.  The first matching rule is then
 606 used to determine the schema.
 607
 608 Schema locating files are designed to be useful for other
 609 applications that need to locate a schema for a document. In fact,
 610 there is nothing specific to locating schemas in the design; it could
 611 equally well be used for locating a stylesheet.
 612
 613 @menu
 614 * Schema locating file syntax basics::
 615 * Using the document's URI to locate a schema::
 616 * Using the document element to locate a schema::
 617 * Using type identifiers in schema locating files::
 618 * Using multiple schema locating files::
 619 @end menu
 620
 621 @node Schema locating file syntax basics
 622 @subsection Schema locating file syntax basics
 623
 624 There is a schema for schema locating files in the file
 625 @samp{locate.rnc} in the schema directory.  Schema locating
 626 files must be valid with respect to this schema.
 627
 628 The document element of a schema locating file must be
 629 @samp{locatingRules} and the namespace URI must be
 630 @samp{http://thaiopensource.com/ns/locating-rules/1.0}.  The
 631 children of the document element specify rules. The order of the
 632 children is the same as the order of the rules.  Here's a complete
 633 example of a schema locating file:
 634
 635 @example
 636 <?xml version="1.0"?>
 637 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 638   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 639   <documentElement localName="book" uri="docbook.rnc"/>
 640 </locatingRules>
 641 @end example
 642
 643 @noindent
 644 This says to use the schema @samp{xhtml.rnc} for a document whose document
 645 element has the namespace URI @samp{http://www.w3.org/1999/xhtml}, and to
 646 use the schema @samp{docbook.rnc} for a document whose document element has
 647 the local name @samp{book}.  If the document element had both a namespace URI
 648 of @samp{http://www.w3.org/1999/xhtml} and a local name of
 649 @samp{book}, then the matching rule that comes first will be
 650 used and so the schema @samp{xhtml.rnc} would be used.  There is
 651 no precedence between different types of rule; the first matching rule
 652 of any type is used.
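
     To illustrate the ordering, here is a sketch of the same file with the
     two rules swapped:

     @example
     <?xml version="1.0"?>
     <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
       <documentElement localName="book" uri="docbook.rnc"/>
       <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
     </locatingRules>
     @end example

     @noindent
     With this ordering, a document element with the local name @samp{book}
     in the XHTML namespace would match the @samp{documentElement} rule
     first, so @samp{docbook.rnc} would be used instead.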
 653
 654 As usual with XML-related technologies, resources are identified
 655 by URIs.  The @samp{uri} attribute identifies the schema by
 656 specifying the URI.  The URI may be relative.  If so, it is resolved
 657 relative to the URI of the schema locating file that contains the
 658 attribute.  This means that if the value of the @samp{uri} attribute
 659 does not contain a @samp{/}, then it will refer to a filename in
 660 the same directory as the schema locating file.
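
     For example, assuming a hypothetical schema locating file at
     @samp{file:///home/jjc/docs/schemas.xml}, the rules

     @example
     <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
     <documentElement localName="book" uri="../schema/docbook.rnc"/>
     @end example

     @noindent
     would refer to @samp{file:///home/jjc/docs/xhtml.rnc} and
     @samp{file:///home/jjc/schema/docbook.rnc} respectively, following the
     usual rules for resolving relative URIs.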
 661
 662 @node Using the document's URI to locate a schema
 663 @subsection Using the document's URI to locate a schema
 664
 665 A @samp{uri} rule locates a schema based on the URI of the
 666 document.  The @samp{uri} attribute specifies the URI of the
 667 schema.  The @samp{resource} attribute can be used to specify
 668 the schema for a particular document.  For example,
 669
 670 @example
 671 <uri resource="spec.xml" uri="docbook.rnc"/>
 672 @end example
 673
 674 @noindent
 675 specifies that the schema for @samp{spec.xml} is
 676 @samp{docbook.rnc}.
 677
 678 The @samp{pattern} attribute can be used instead of the
 679 @samp{resource} attribute to specify the schema for any document
 680 whose URI matches a pattern.  The pattern has the same syntax as an
 681 absolute or relative URI except that the path component of the URI can
 682 use a @samp{*} character to stand for zero or more characters
 683 within a path segment (i.e., any character other than @samp{/}).
 684 Typically, the URI pattern looks like a relative URI, but, whereas a
 685 relative URI in the @samp{resource} attribute is resolved into a
 686 particular absolute URI using the base URI of the schema locating
 687 file, a relative URI pattern matches if it matches some number of
 688 complete path segments of the document's URI ending with the last path
 689 segment of the document's URI. For example,
 690
 691 @example
 692 <uri pattern="*.xsl" uri="xslt.rnc"/>
 693 @end example
 694
 695 @noindent
 696 specifies that the schema for documents with a URI whose path ends
 697 with @samp{.xsl} is @samp{xslt.rnc}.
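
     As a further sketch, a pattern may span more than one path segment.
     Under the matching rule described above, the hypothetical rule

     @example
     <uri pattern="docs/*.xml" uri="docbook.rnc"/>
     @end example

     @noindent
     would match a document with the URI
     @samp{file:///home/jjc/docs/spec.xml}, because the pattern matches the
     final two path segments.  It would not match
     @samp{file:///home/jjc/mydocs/spec.xml}, because @samp{docs} has to
     match a complete path segment.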
 698
 699 A @samp{transformURI} rule locates a schema by
 700 transforming the URI of the document. The @samp{fromPattern}
 701 attribute specifies a URI pattern with the same meaning as the
 702 @samp{pattern} attribute of the @samp{uri} element.  The
 703 @samp{toPattern} attribute is a URI pattern that is used to
 704 generate the URI of the schema.  Each @samp{*} in the
 705 @samp{toPattern} is replaced by the string that matched the
 706 corresponding @samp{*} in the @samp{fromPattern}.  The
 707 resulting string is appended to the initial part of the document's URI
 708 that was not explicitly matched by the @samp{fromPattern}.  The
 709 rule matches only if the transformed URI identifies an existing
 710 resource.  For example, the rule
 711
 712 @example
 713 <transformURI fromPattern="*.xml" toPattern="*.rnc"/>
 714 @end example
 715
 716 @noindent
 717 would transform the URI @samp{file:///home/jjc/docs/spec.xml}
 718 into the URI @samp{file:///home/jjc/docs/spec.rnc}.  Thus, this
 719 rule specifies that to locate a schema for a document
 720 @samp{@var{foo}.xml}, Emacs should test whether a file
 721 @samp{@var{foo}.rnc} exists in the same directory as
 722 @samp{@var{foo}.xml}, and, if so, should use it as the
 723 schema.
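
     As another sketch, following the substitution rule described above,
     the hypothetical rule

     @example
     <transformURI fromPattern="*.xml" toPattern="schema/*.rnc"/>
     @end example

     @noindent
     would transform @samp{file:///home/jjc/docs/spec.xml} into
     @samp{file:///home/jjc/docs/schema/spec.rnc}, so that schemas could be
     kept in a @samp{schema} subdirectory next to the documents.  As
     before, the rule would match only if that file exists.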
 724
 725 @node Using the document element to locate a schema
 726 @subsection Using the document element to locate a schema
 727
 728 A @samp{documentElement} rule locates a schema based on
 729 the local name and prefix of the document element. For example, a rule
 730
 731 @example
 732 <documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
 733 @end example
 734
 735 @noindent
 736 specifies that when the name of the document element is
 737 @samp{xsl:stylesheet}, then @samp{xslt.rnc} should be used
 738 as the schema. Either the @samp{prefix} or
 739 @samp{localName} attribute may be omitted to allow any prefix or
 740 local name.
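
     For example, each of the following two rules, shown only as a sketch,
     omits one of the attributes:

     @example
     <documentElement localName="stylesheet" uri="xslt.rnc"/>
     <documentElement prefix="xsl" uri="xslt.rnc"/>
     @end example

     @noindent
     The first matches a document element with the local name
     @samp{stylesheet} and any prefix; the second matches any document
     element whose prefix is @samp{xsl}, such as @samp{xsl:transform}.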
 741
 742 A @samp{namespace} rule locates a schema based on the
 743 namespace URI of the document element. For example, a rule
 744
 745 @example
 746 <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/>
 747 @end example
 748
 749 @noindent
 750 specifies that when the namespace URI of the document element is
 751 @samp{http://www.w3.org/1999/XSL/Transform}, then
 752 @samp{xslt.rnc} should be used as the schema.
 753
 754 @node Using type identifiers in schema locating files
 755 @subsection Using type identifiers in schema locating files
 756
 757 Type identifiers allow a level of indirection in locating the
 758 schema for a document.  Instead of associating the document directly
 759 with a schema URI, the document is associated with a type identifier,
 760 which is in turn associated with a schema URI. nXML mode does not
 761 constrain the format of type identifiers.  They can be simply strings
 762 without any formal structure or they can be public identifiers or
 763 URIs.  Note that these type identifiers have nothing to do with the
 764 DOCTYPE declaration.  When comparing type identifiers, whitespace is
 765 normalized in the same way as with the @samp{xsd:token}
 766 datatype: leading and trailing whitespace is stripped; other sequences
 767 of whitespace are normalized to a single space character.
 768
Each of the rules described in previous sections that uses a
@samp{uri} attribute to specify a schema can instead use a
@samp{typeId} attribute to specify a type identifier.  The type
identifier can be associated with a URI using a @samp{typeId}
element.  For example,
 774
 775 @example
 776 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 777   <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
 778   <typeId id="XHTML" typeId="XHTML Strict"/>
 779   <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
 780   <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
 781 </locatingRules>
 782 @end example
 783
 784 @noindent
declares three type identifiers: @samp{XHTML} (representing the
default variant of XHTML to be used), @samp{XHTML Strict}, and
@samp{XHTML Transitional}.  Such a schema locating file would
use @samp{xhtml-strict.rnc} for a document whose namespace is
@samp{http://www.w3.org/1999/xhtml}.  However, it is considerably
more flexible than a schema locating file that simply specified
 791
 792 @example
 793 <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/>
 794 @end example
 795
 796 @noindent
A user can easily use @kbd{C-c C-s C-t} to select between XHTML
Strict and XHTML Transitional.  Also, a user can easily add a schema
locating file
 799
 800 @example
 801 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 802   <typeId id="XHTML" typeId="XHTML Transitional"/>
 803 </locatingRules>
 804 @end example
 805
 806 @noindent
 807 that makes the default variant of XHTML be XHTML Transitional.
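
The same indirection is available to the other rule types.  The
following sketch pairs a @samp{documentElement} rule with a type
identifier; the identifier @samp{XSLT} is an arbitrary name chosen
for this illustration, and @samp{xslt.rnc} is the schema file used in
the earlier examples:

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <documentElement prefix="xsl" localName="stylesheet" typeId="XSLT"/>
  <typeId id="XSLT" uri="xslt.rnc"/>
</locatingRules>
@end example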
 808
 809 @node Using multiple schema locating files
 810 @subsection Using multiple schema locating files
 811
 812 The @samp{include} element includes rules from another
 813 schema locating file.  The behavior is exactly as if the rules from
 814 that file were included in place of the @samp{include} element.
 815 Relative URIs are resolved into absolute URIs before the inclusion is
 816 performed. For example,
 817
 818 @example
 819 <include rules="../rules.xml"/>
 820 @end example
 821
 822 @noindent
 823 includes the rules from @samp{rules.xml}.
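
To show the element in context, here is a sketch of a complete schema
locating file that includes another file and then adds a rule of its
own.  The @samp{namespace} rule and the file name @samp{xhtml.rnc}
are only illustrative:

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <include rules="../rules.xml"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
@end example

@noindent
Because the included rules take the place of the @samp{include}
element, they come before the @samp{namespace} rule in the resulting
list.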
 824
 825 The process of locating a schema takes as input a list of schema
 826 locating files.  The rules in all these files and in the files they
 827 include are resolved into a single list of rules, which are applied
 828 strictly in order.  Sometimes this order is not what is needed.
 829 For example, suppose you have two schema locating files, a private
 830 file
 831
 832 @example
 833 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 834   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 835 </locatingRules>
 836 @end example
 837
 838 @noindent
 839 followed by a public file
 840
 841 @example
 842 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 843   <transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
 844   <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
 845 </locatingRules>
 846 @end example
 847
 848 @noindent
The effect of these two files is that the XHTML @samp{namespace}
rule takes precedence over the @samp{transformURI} rule, which
is almost certainly not what is needed.  This can be solved by adding
an @samp{applyFollowingRules} element to the private file, so that
@samp{transformURI} rules that follow take priority over the
@samp{namespace} rule.
 853
 854 @example
 855 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 856   <applyFollowingRules ruleType="transformURI"/>
 857   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 858 </locatingRules>
 859 @end example
 860
 861 @node DTDs
 862 @chapter DTDs
 863
 864 nXML mode is designed to support the creation of standalone XML
 865 documents that do not depend on a DTD.  Although it is common practice
 866 to insert a DOCTYPE declaration referencing an external DTD, this has
 867 undesirable side-effects.  It means that the document is no longer
 868 self-contained. It also means that different XML parsers may interpret
 869 the document in different ways, since the XML Recommendation does not
require XML parsers to read the DTD.  With DTDs, it was impractical to
get validation without using an external DTD or a reference to a
parameter entity.  With RELAX NG and other schema languages, you can
simultaneously get the benefits of validation and standalone XML
documents.  Therefore, I recommend that you do not reference an
external DTD in your XML documents.
 876
One problem with this approach is entities for characters.  Typically, as well as
 878 providing validation, DTDs also provide a set of character entities
 879 for documents to use. Schemas cannot provide this functionality,
 880 because schema validation happens after XML parsing.  The recommended
 881 solution is to either use the Unicode characters directly, or, if this
 882 is impractical, use character references.  nXML mode supports this by
 883 providing commands for entering characters and character references
 884 using the Unicode names, and can display the glyph corresponding to a
 885 character reference.
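
For example, where a document relying on a DTD might write
@samp{&eacute;}, a standalone document can use a character reference
for the same character, U+00E9.  The element name @samp{p} in this
sketch is only illustrative:

@example
<p>caf&#xE9;</p>
@end example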
 886
 887 @node Limitations
 888 @chapter Limitations
 889
 890 nXML mode has some limitations:
 891
 892 @itemize @bullet
 893 @item
 894 DTD support is limited.  Internal parsed general entities declared
 895 in the internal subset are supported provided they do not contain
 896 elements. Other usage of DTDs is ignored.
 897 @item
 898 The restrictions on RELAX NG schemas in section 7 of the RELAX NG
 899 specification are not enforced.
 900 @end itemize
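
As a sketch of the first limitation, an entity like the following,
declared in the internal subset with replacement text that contains
no elements, falls within what is supported.  The names @samp{doc}
and @samp{product} are hypothetical:

@example
<!DOCTYPE doc [
  <!ENTITY product "nXML mode">
]>
<doc>Welcome to &product;.</doc>
@end example

@noindent
By contrast, an entity whose replacement text contains an element,
such as @samp{<!ENTITY note "<em>Note</em>">}, is outside the
supported subset.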
 901
 902 @bye