doc/misc/nxml-mode.texi

   1 \input texinfo @c -*- texinfo -*-
   2 @c %**start of header
   3 @setfilename ../../info/nxml-mode.info
   4 @settitle nXML Mode
   5 @include docstyle.texi
   6 @c %**end of header
   7
   8 @copying
   9 This manual documents nXML mode, an Emacs major mode for editing
  10 XML with RELAX NG support.
  11
  12 Copyright @copyright{} 2007--2016 Free Software Foundation, Inc.
  13
  14 @quotation
  15 Permission is granted to copy, distribute and/or modify this document
  16 under the terms of the GNU Free Documentation License, Version 1.3 or
  17 any later version published by the Free Software Foundation; with no
  18 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
  19 and with the Back-Cover Texts as in (a) below.  A copy of the license
  20 is included in the section entitled ``GNU Free Documentation License''.
  21
  22 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
  23 modify this GNU manual.''
  24 @end quotation
  25 @end copying
  26
  27 @dircategory Emacs editing modes
  28 @direntry
  29 * nXML Mode: (nxml-mode).       XML editing mode with RELAX NG support.
  30 @end direntry
  31
  32
  33 @titlepage
  34 @title nXML mode
  35 @page
  36 @vskip 0pt plus 1filll
  37 @insertcopying
  38 @end titlepage
  39
  40 @contents
  41
  42
  43 @node Top
  44 @top nXML Mode
  45
  46 @insertcopying
  47
  48 This manual is not yet complete.
  49
  50 @menu
  51 * Introduction::
  52 * Completion::
  53 * Inserting end-tags::
  54 * Paragraphs::
  55 * Outlining::
  56 * Locating a schema::
  57 * DTDs::
  58 * Limitations::
  59 * GNU Free Documentation License::  The license for this documentation.
  60 @end menu
  61
  62 @node Introduction
  63 @chapter Introduction
  64
  65 nXML mode is an Emacs major mode for editing XML documents.  It supports
  66 editing well-formed XML documents, and provides schema-sensitive editing
  67 using RELAX NG Compact Syntax.  To get started, visit a file containing an
  68 XML document, and, if necessary, use @kbd{M-x nxml-mode} to switch to nXML
  69 mode.  By default, @code{auto-mode-alist} and @code{magic-fallback-alist}
  70 put buffers in nXML mode if they have recognizable XML content or file
  71 extensions.  You may wish to customize the settings, for example to
  72 recognize different file extensions.
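
As a sketch of such a customization, the following init-file form
associates an extra file extension with nXML mode.  The @file{.xconf}
extension is a hypothetical one, chosen only for illustration:

@example
;; Open files ending in ".xconf" (a made-up extension) in nXML mode.
(add-to-list 'auto-mode-alist '("\\.xconf\\'" . nxml-mode))
@end example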
  73
  74 Once in nXML mode, you can type @kbd{C-h m} for basic information on the
  75 mode.
  76
  77 The @file{etc/nxml} directory in the Emacs distribution contains some data
  78 files used by nXML mode, and includes two files (@file{test-valid.xml} and
  79 @file{test-invalid.xml}) that provide examples of valid and invalid XML
  80 documents.
  81
  82 To get validation and schema-sensitive editing, you need a RELAX NG Compact
  83 Syntax (RNC) schema for your document (@pxref{Locating a schema}).  The
  84 @file{etc/schema} directory includes some schemas for popular document
  85 types.  See @url{http://relaxng.org/} for more information on RELAX NG@.
  86 You can use the @samp{Trang} program from
  87 @url{http://www.thaiopensource.com/relaxng/trang.html} to
  88 automatically create RNC schemas.  This program can:
  89
  90 @itemize @bullet
  91 @item
  92 infer an RNC schema from an instance document;
  93 @item
  94 convert a DTD to an RNC schema;
  95 @item
  96 convert a RELAX NG XML syntax schema to an RNC schema.
  97 @end itemize
  98
  99 @noindent To convert a RELAX NG XML syntax (@samp{.rng}) schema to an RNC
 100 one, you can also use the XSLT stylesheet from
 101 @url{https://github.com/oleg-pavliv/emacs/tree/master/xsl}.
 102 @ignore
 103 @c Original location, now defunct.
 104 @url{http://www.pantor.com/download.html}.
 105 @end ignore
 106
 107 To convert a W3C XML Schema to an RNC schema, you need first to convert it
 108 to RELAX NG XML syntax using the RELAX NG converter tool @code{rngconv}
 109 (built on top of MSV).  See @url{https://github.com/kohsuke/msv}
 110 and @url{https://msv.dev.java.net/}.
 111
 112 For historical discussions only, see the mailing list archives at
 113 @url{http://groups.yahoo.com/group/emacs-nxml-mode/}.  Please make all new
 114 discussions on the @samp{help-gnu-emacs} and @samp{emacs-devel} mailing
 115 lists.  Report any bugs with @kbd{M-x report-emacs-bug}.
 116
 117
 118 @node Completion
 119 @chapter Completion
 120
 121 Apart from real-time validation, the most important feature that nXML
 122 mode provides for assisting in document creation is @dfn{completion}.
 123 Completion assists the user in inserting characters at point, based on
 124 knowledge of the schema and on the contents of the buffer before
 125 point.
 126
 127 nXML mode adapts the standard GNU Emacs command for completion in a
 128 buffer: @code{completion-at-point}, which is bound to @kbd{C-M-i} and
 129 @kbd{M-@key{TAB}}.  Note that many window systems and window managers
 130 use @kbd{M-@key{TAB}} themselves (typically for switching between
 131 windows) and do not pass it to applications.  In that case, you should
 132 type @kbd{C-M-i} or @kbd{@key{ESC} @key{TAB}} for completion, or bind
 133 @code{completion-at-point} to a key that is convenient for you.  In
 134 the following, I will assume that you type @kbd{C-M-i}.
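
If you prefer a dedicated key, a binding along the following lines
may work.  This is a minimal sketch; the choice of @key{F5} is an
arbitrary assumption, and any key that is free in your setup will do:

@example
;; Bind completion to F5 in nXML buffers only.
(with-eval-after-load 'nxml-mode
  (define-key nxml-mode-map (kbd "<f5>") #'completion-at-point))
@end example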
 135
 136 nXML mode completion works by examining the symbol preceding point.
 137 This is the symbol to be completed.  The symbol to be completed may be
 138 empty.  Completion considers what symbols starting with the symbol
 139 to be completed would be valid replacements for the symbol to be
 140 completed, given the schema and the contents of the buffer before
 141 point.  These symbols are the possible completions.  An example may
 142 make this clearer.  Suppose the buffer looks like this (where @point{}
 143 indicates point):
 144
 145 @example
 146 <html xmlns="http://www.w3.org/1999/xhtml">
 147 <h@point{}
 148 @end example
 149
 150 @noindent
 151 and the schema is XHTML@.  In this context, the symbol to be completed
 152 is @samp{h}.  The possible completions consist of just
 153 @samp{head}.  Another example is:
 154
 155 @example
 156 <html xmlns="http://www.w3.org/1999/xhtml">
 157 <head>
 158 <@point{}
 159 @end example
 160
 161 @noindent
 162 In this case, the symbol to be completed is empty, and the possible
 163 completions are @samp{base}, @samp{isindex},
 164 @samp{link}, @samp{meta}, @samp{script},
 165 @samp{style}, @samp{title}.  Another example is:
 166
 167 @example
 168 <html xmlns="@point{}
 169 @end example
 170
 171 @noindent
 172 In this case, the symbol to be completed is empty, and the possible
 173 completions are just @samp{http://www.w3.org/1999/xhtml}.
 174
 175 When you type @kbd{C-M-i}, what happens depends
 176 on the set of possible completions.
 177
 178 @itemize @bullet
 179 @item
 180 If the set of completions is empty, nothing
 181 happens.
 182 @item
 183 If there is one possible completion, then that completion is
 184 inserted, together with any following characters that are
 185 required. For example, in this case:
 186
 187 @example
 188 <html xmlns="http://www.w3.org/1999/xhtml">
 189 <@point{}
 190 @end example
 191
 192 @noindent
 193 @kbd{C-M-i} will yield
 194
 195 @example
 196 <html xmlns="http://www.w3.org/1999/xhtml">
 197 <head@point{}
 198 @end example
 199 @item
 200 If there is more than one possible completion, but all
 201 possible completions share a common non-empty prefix, then that prefix
 202 is inserted. For example, suppose the buffer is:
 203
 204 @example
 205 <html x@point{}
 206 @end example
 207
 208 @noindent
 209 The symbol to be completed is @samp{x}. The possible completions are
 210 @samp{xmlns} and @samp{xml:lang}.  These share a common prefix of
 211 @samp{xml}.  Thus, @kbd{C-M-i} will yield:
 212
 213 @example
 214 <html xml@point{}
 215 @end example
 216
 217 @noindent
 218 Typically, you would do @kbd{C-M-i} again, which would have the result
 219 described in the next item.
 220 @item
 221 If there is more than one possible completion, but the
 222 possible completions do not share a non-empty prefix, then Emacs will
 223 prompt you to input the symbol in the minibuffer, initializing the
 224 minibuffer with the symbol to be completed, and popping up a buffer
 225 showing the possible completions.  You can now input the symbol to be
 226 inserted.  The symbol you input will be inserted in the buffer instead
 227 of the symbol to be completed.  Emacs will then insert any required
 228 characters after the symbol.  For example, if the buffer contains:
 229
 230 @example
 231 <html xml@point{}
 232 @end example
 233
 234 @noindent
 235 Emacs will prompt you in the minibuffer with
 236
 237 @example
 238 Attribute: xml@point{}
 239 @end example
 240
 241 @noindent
 242 and the buffer showing possible completions will contain
 243
 244 @example
 245 Possible completions are:
 246 xml:lang                           xmlns
 247 @end example
 248
 249 @noindent
 250 If you input @kbd{xmlns}, the result will be:
 251
 252 @example
 253 <html xmlns="@point{}
 254 @end example
 255
 256 @noindent
 257 (If you do @kbd{C-M-i} again, the namespace URI will be
 258 inserted. Should that happen automatically?)
 259 @end itemize
 260
 261 @node Inserting end-tags
 262 @chapter Inserting end-tags
 263
 264 The main redundancy in XML syntax is end-tags.  nXML mode provides
 265 several ways to make it easier to enter end-tags.  You can use all of
 266 these without a schema.
 267
 268 You can use @kbd{C-M-i} after @samp{</} to complete the rest of the
 269 end-tag.
 270
 271 @kbd{C-c C-f} inserts an end-tag for the element containing
 272 point. This command is useful when you want to input the start-tag,
 273 then input the content and finally input the end-tag. The @samp{f}
 274 is mnemonic for finish.
 275
 276 If you want to keep tags balanced and input the end-tag at the
 277 same time as the start-tag, before inputting the content, then you can
 278 use @kbd{C-c C-i}. This inserts a @samp{>}, then inserts
 279 the end-tag and leaves point before the end-tag.  @kbd{C-c C-b}
 280 is similar but more convenient for block-level elements: it puts the
 281 start-tag, point and the end-tag on successive lines, appropriately
 282 indented. The @samp{i} is mnemonic for inline and the
 283 @samp{b} is mnemonic for block.
 284
 285 Finally, you can customize nXML mode so that @kbd{/} automatically
 286 inserts the rest of the end-tag when it occurs after @samp{<}, by
 287 doing
 288
 289 @display
 290 @kbd{M-x customize-variable @key{RET} nxml-slash-auto-complete-flag @key{RET}}
 291 @end display
 292
 293 @noindent
 294 and then following the instructions in the displayed buffer.
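
If you would rather set the option from your init file, a form like
the following should have the same effect (a sketch, assuming you do
not also set the variable through Customize):

@example
;; Typing "/" after "<" inserts the rest of the end-tag.
(setq nxml-slash-auto-complete-flag t)
@end example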
 295
 296 @node Paragraphs
 297 @chapter Paragraphs
 298
 299 Emacs has several commands that operate on paragraphs, most
 300 notably @kbd{M-q}. nXML mode redefines these to work in a way
 301 that is useful for XML@.  The exact rules that are used to find the
 302 beginning and end of a paragraph are complicated; they are designed
 303 mainly to ensure that @kbd{M-q} does the right thing.
 304
 305 A paragraph consists of one or more complete, consecutive lines.
 306 A group of lines is not considered a paragraph unless it contains some
 307 non-whitespace characters between tags or inside comments.  A blank
 308 line separates paragraphs.  A single tag on a line by itself also
 309 separates paragraphs.  More precisely, if one tag together with any
 310 leading and trailing whitespace completely occupy one or more lines,
 311 then those lines will not be included in any paragraph.
 312
 313 A start-tag at the beginning of the line (possibly indented) may
 314 be treated as starting a paragraph.  Similarly, an end-tag at the end
 315 of the line may be treated as ending a paragraph. The following rules
 316 are used to determine whether such a tag is in fact treated as a
 317 paragraph boundary:
 318
 319 @itemize @bullet
 320 @item
 321 If the schema does not allow text at that point, then it
 322 is a paragraph boundary.
 323 @item
 324 If the end-tag corresponding to the start-tag is not at
 325 the end of its line, or the start-tag corresponding to the end-tag is
 326 not at the beginning of its line, then it is not a paragraph
 327 boundary. For example, in
 328
 329 @example
 330 <p>This is a paragraph with an
 331 <emph>emphasized</emph> phrase.
 332 @end example
 333
 334 @noindent
 335 the @samp{<emph>} start-tag would not be considered as
 336 starting a paragraph, because its corresponding end-tag is not at the
 337 end of the line.
 338 @item
 339 If there is text that is a sibling in the element tree, then
 340 it is not a paragraph boundary.  For example, in
 341
 342 @example
 343 <p>This is a paragraph with an
 344 <emph>emphasized phrase that takes one source line</emph>
 345 @end example
 346
 347 @noindent
 348 the @samp{<emph>} start-tag would not be considered as
 349 starting a paragraph, even though its end-tag is at the end of its
 350 line, because the text @samp{This is a paragraph with an}
 351 is a sibling of the @samp{emph} element.
 352 @item
 353 Otherwise, it is a paragraph boundary.
 354 @end itemize
 355
 356 @node Outlining
 357 @chapter Outlining
 358
 359 nXML mode allows you to display all or part of a buffer as an
 360 outline, in a similar way to Emacs's outline mode.  An outline in nXML
 361 mode is based on recognizing two kinds of element: sections and
 362 headings.  There is one heading for every section and one section for
 363 every heading.  A section contains its heading as or within its first
 364 child element.  A section also contains its subordinate sections (its
 365 subsections).  The text content of a section consists of anything in a
 366 section that is neither a subsection nor a heading.
 367
 368 Note that this is a different model from that used by XHTML@.
 369 nXML mode's outline support will not be useful for XHTML unless you
 370 adopt a convention of adding a @code{div} to enclose each
 371 section, rather than having sections implicitly delimited by different
 372 @code{h@var{n}} elements.  This limitation may be removed
 373 in a future version.
 374
 375 The variable @code{nxml-section-element-name-regexp} gives
 376 a regexp for the local names (i.e., the part of the name following any
 377 prefix) of section elements. The variable
 378 @code{nxml-heading-element-name-regexp} gives a regexp for the
 379 local names of heading elements. For an element to be recognized
 380 as a section
 381
 382 @itemize @bullet
 383 @item
 384 its start-tag must occur at the beginning of a line
 385 (possibly indented);
 386 @item
 387 its local name must match
 388 @code{nxml-section-element-name-regexp};
 389 @item
 390 either its first child element or a descendant of that
 391 first child element must have a local name that matches
 392 @code{nxml-heading-element-name-regexp}; the first such element
 393 is treated as the section's heading.
 394 @end itemize
 395
 396 @noindent
 397 You can customize these variables using @kbd{M-x
 398 customize-variable}.
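
The variables can also be set from your init file.  The following is
only a sketch: the element name @samp{topic} is a hypothetical
example, and note that @code{setq} replaces the default regexps rather
than extending them:

@example
;; Treat <topic>, <section> and <chapter> elements as sections,
;; and <title> elements as their headings.
(setq nxml-section-element-name-regexp "topic\\|section\\|chapter")
(setq nxml-heading-element-name-regexp "title")
@end example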
 399
 400 There are three possible outline states for a section:
 401
 402 @itemize @bullet
 403 @item
 404 normal, showing everything, including its heading, text
 405 content and subsections; each subsection is displayed according to the
 406 state of that subsection;
 407 @item
 408 showing just its heading, with both its text content and
 409 its subsections hidden; all subsections are hidden regardless of their
 410 state;
 411 @item
 412 showing its heading and its subsections, with its text
 413 content hidden; each subsection is displayed according to the state of
 414 that subsection.
 415 @end itemize
 416
 417 In the last two states, where the text content is hidden, the
 418 heading is displayed specially, in an abbreviated form. An element
 419 like this:
 420
 421 @example
 422 <section>
 423 <title>Food</title>
 424 <para>There are many kinds of food.</para>
 425 </section>
 426 @end example
 427
 428 @noindent
 429 would be displayed on a single line like this:
 430
 431 @example
 432 <-section>Food...</>
 433 @end example
 434
 435 @noindent
 436 If there are hidden subsections, then a @code{+} will be used
 437 instead of a @code{-} like this:
 438
 439 @example
 440 <+section>Food...</>
 441 @end example
 442
 443 @noindent
 444 If there are non-hidden subsections, then the section will instead be
 445 displayed like this:
 446
 447 @example
 448 <-section>Food...
 449   <-section>Delicious Food...</>
 450   <-section>Distasteful Food...</>
 451 </-section>
 452 @end example
 453
 454 @noindent
 455 The heading is always displayed with an indent that corresponds to its
 456 depth in the outline, even if it is not actually indented in the buffer.
 457 The variable @code{nxml-outline-child-indent} controls how much
 458 a subheading is indented with respect to its parent heading when the
 459 heading is being displayed specially.
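
For instance, to make nested headings stand out more, you might
set the variable in your init file.  The value 4 here is an arbitrary
illustration:

@example
;; Indent each specially displayed subheading by four columns
;; relative to its parent heading.
(setq nxml-outline-child-indent 4)
@end example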
 460
 461 Commands to change the outline state of sections are bound to
 462 key sequences that start with @kbd{C-c C-o} (@kbd{o} is
 463 mnemonic for outline).  The third and final key has been chosen to be
 464 consistent with outline mode.  In the following descriptions
 465 current section means the section containing point, or, more precisely,
 466 the innermost section containing the character immediately following
 467 point.
 468
 469 @itemize @bullet
 470 @item
 471 @kbd{C-c C-o C-a} shows all sections in the buffer
 472 normally.
 473 @item
 474 @kbd{C-c C-o C-t} hides the text content
 475 of all sections in the buffer.
 476 @item
 477 @kbd{C-c C-o C-c} hides the text content
 478 of the current section.
 479 @item
 480 @kbd{C-c C-o C-e} shows the text content
 481 of the current section.
 482 @item
 483 @kbd{C-c C-o C-d} hides the text content
 484 and subsections of the current section.
 485 @item
 486 @kbd{C-c C-o C-s} shows the current section
 487 and all its direct and indirect subsections normally.
 488 @item
 489 @kbd{C-c C-o C-k} shows the headings of the
 490 direct and indirect subsections of the current section.
 491 @item
 492 @kbd{C-c C-o C-l} hides the text content of the
 493 current section and of its direct and indirect
 494 subsections.
 495 @item
 496 @kbd{C-c C-o C-i} shows the headings of the
 497 direct subsections of the current section.
 498 @item
 499 @kbd{C-c C-o C-o} hides as much as possible without
 500 hiding the current section's text content; the headings of ancestor
 501 sections of the current section and their child sections will
 502 not be hidden.
 503 @end itemize
 504
 505 When a heading is displayed specially, you can use
 506 @key{RET} in that heading to show the text content of the section
 507 in the same way as @kbd{C-c C-o C-e}.
 508
 509 You can also use the mouse to change the outline state:
 510 @kbd{S-mouse-2} hides the text content of a section in the same
 511 way as @kbd{C-c C-o C-c}; @kbd{mouse-2} on a specially
 512 displayed heading shows the text content of the section in the same
 513 way as @kbd{C-c C-o C-e}; @kbd{mouse-1} on a specially
 514 displayed start-tag toggles the display of subheadings on and
 515 off.
 516
 517 The outline state for each section is stored with the first
 518 character of the section (as a text property). Every command that
 519 changes the outline state of any section updates the display of the
 520 buffer so that each section is displayed correctly according to its
 521 outline state.  If the section structure is subsequently changed, then
 522 it is possible for the display to no longer correctly reflect the
 523 stored outline state. @kbd{C-c C-o C-r} can be used to refresh
 524 the display so it is correct again.
 525
 526 @node Locating a schema
 527 @chapter Locating a schema
 528
 529 nXML mode has a configurable set of rules to locate a schema for
 530 the file being edited.  The rules are contained in one or more schema
 531 locating files, which are XML documents.
 532
 533 The variable @samp{rng-schema-locating-files} specifies
 534 the list of the file-names of schema locating files that nXML mode
 535 should use.  The order of the list is significant: when file
 536 @var{x} occurs in the list before file @var{y} then rules
 537 from file @var{x} have precedence over rules from file
 538 @var{y}.  A filename specified in
 539 @samp{rng-schema-locating-files} may be relative. If so, it will
 540 be resolved relative to the document for which a schema is being
 541 located. It is not an error if relative file-names in
 542 @samp{rng-schema-locating-files} do not exist. You can use
 543 @kbd{M-x customize-variable @key{RET} rng-schema-locating-files
 544 @key{RET}} to customize the list of schema locating
 545 files.
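
As an alternative to Customize, the list can be extended from your
init file.  The sketch below assumes a personal schema locating file
at the hypothetical location @file{~/schemas/schemas.xml}; it is
appended to the end of the list so that files already in the list
keep precedence:

@example
;; Add a personal schema locating file with the lowest precedence.
(with-eval-after-load 'rng-loc
  (add-to-list 'rng-schema-locating-files
               (expand-file-name "~/schemas/schemas.xml")
               t))
@end example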
 546
 547 By default, the @samp{rng-schema-locating-files} list has two
 548 members: @samp{schemas.xml}, and
 549 @samp{@var{dist-dir}/schema/schemas.xml} where
 550 @samp{@var{dist-dir}} is the directory containing the nXML
 551 distribution. The first member will cause nXML mode to use a file
 552 @samp{schemas.xml} in the same directory as the document being
 553 edited if such a file exists.  The second member contains rules for the
 554 schemas that are included with the nXML distribution.
 555
 556 @menu
 557 * Commands for locating a schema::
 558 * Schema locating files::
 559 @end menu
 560
 561 @node Commands for locating a schema
 562 @section Commands for locating a schema
 563
 564 The command @kbd{C-c C-s C-w} will tell you what schema
 565 is currently being used.
 566
 567 The rules for locating a schema are applied automatically when
 568 you visit a file in nXML mode. However, if you have just created a new
 569 file and the schema cannot be inferred from the file-name, then this
 570 will not locate the right schema.  In this case, you should insert the
 571 start-tag of the root element and then use the command @kbd{C-c C-s
 572 C-a}, which reapplies the rules based on the current content of
 573 the document.  It is usually not necessary to insert the complete
 574 start-tag; often just @samp{<@var{name}} is
 575 enough.
 576
 577 If you want to use a schema that has not yet been added to the
 578 schema locating files, you can use the command @kbd{C-c C-s C-f}
 579 to manually select the file containing the schema for the document in
 580 the current buffer.  Emacs will read the file-name of the schema from the
 581 minibuffer. After reading the file-name, Emacs will ask whether you
 582 wish to add a rule to a schema locating file that persistently
 583 associates the document with the selected schema.  The rule will be
 584 added to the first file in the list specified by
 585 @samp{rng-schema-locating-files}; it will create the file if
 586 necessary, but will not create a directory. If the variable
 587 @samp{rng-schema-locating-files} has not been customized, this
 588 means that the rule will be added to the file @samp{schemas.xml}
 589 in the same directory as the document being edited.
 590
 591 The command @kbd{C-c C-s C-t} allows you to select a schema by
 592 specifying an identifier for the type of the document.  The schema
 593 locating files determine the available type identifiers and what
 594 schema is used for each type identifier. This is useful when it is
 595 impossible to infer the right schema from either the file-name or the
 596 content of the document, even though the schema is already in the
 597 schema locating file.  A situation in which this can occur is when
 598 there are multiple variants of a schema where all valid documents have
 599 the same document element.  For example, XHTML has Strict and
 600 Transitional variants.  In a situation like this, a schema locating file
 601 can define a type identifier for each variant. As with @kbd{C-c
 602 C-s C-f}, Emacs will ask whether you wish to add a rule to a schema
 603 locating file that persistently associates the document with the
 604 specified type identifier.
 605
 606 The command @kbd{C-c C-s C-l} adds a rule to a schema
 607 locating file that persistently associates the document with
 608 the schema that is currently being used.
 609
 610 @node Schema locating files
 611 @section Schema locating files
 612
 613 Each schema locating file specifies a list of rules.  The rules
 614 from each file are appended in order. To locate a schema each rule is
 615 applied in turn until a rule matches.  The first matching rule is then
 616 used to determine the schema.
 617
 618 Schema locating files are designed to be useful for other
 619 applications that need to locate a schema for a document. In fact,
 620 there is nothing specific to locating schemas in the design; it could
 621 equally well be used for locating a stylesheet.
 622
 623 @menu
 624 * Schema locating file syntax basics::
 625 * Using the document's URI to locate a schema::
 626 * Using the document element to locate a schema::
 627 * Using type identifiers in schema locating files::
 628 * Using multiple schema locating files::
 629 @end menu
 630
 631 @node Schema locating file syntax basics
 632 @subsection Schema locating file syntax basics
 633
 634 There is a schema for schema locating files in the file
 635 @samp{locate.rnc} in the schema directory.  Schema locating
 636 files must be valid with respect to this schema.
 637
 638 The document element of a schema locating file must be
 639 @samp{locatingRules} and the namespace URI must be
 640 @samp{http://thaiopensource.com/ns/locating-rules/1.0}.  The
 641 children of the document element specify rules. The order of the
 642 children is the same as the order of the rules.  Here's a complete
 643 example of a schema locating file:
 644
 645 @example
 646 <?xml version="1.0"?>
 647 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 648   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 649   <documentElement localName="book" uri="docbook.rnc"/>
 650 </locatingRules>
 651 @end example
 652
 653 @noindent
 654 This says to use the schema @samp{xhtml.rnc} for a document whose
 655 namespace is @samp{http://www.w3.org/1999/xhtml}, and to use the
 656 schema @samp{docbook.rnc} for a document whose document element has the
 657 local name @samp{book}.  If the document element had both a namespace URI
 658 of @samp{http://www.w3.org/1999/xhtml} and a local name of
 659 @samp{book}, then the matching rule that comes first would be
 660 used, and so the schema @samp{xhtml.rnc} would be used.  There is
 661 no precedence between different types of rule; the first matching rule
 662 of any type is used.
 663
 664 As usual with XML-related technologies, resources are identified
 665 by URIs.  The @samp{uri} attribute identifies the schema by
 666 specifying the URI@.  The URI may be relative.  If so, it is resolved
 667 relative to the URI of the schema locating file that contains the
 668 attribute.  This means that if the value of the @samp{uri} attribute
 669 does not contain a @samp{/}, then it will refer to a filename in
 670 the same directory as the schema locating file.
 671
 672 @node Using the document's URI to locate a schema
 673 @subsection Using the document's URI to locate a schema
 674
 675 A @samp{uri} rule locates a schema based on the URI of the
 676 document.  The @samp{uri} attribute specifies the URI of the
 677 schema.  The @samp{resource} attribute can be used to specify
 678 the schema for a particular document.  For example,
 679
 680 @example
 681 <uri resource="spec.xml" uri="docbook.rnc"/>
 682 @end example
 683
 684 @noindent
 685 specifies that the schema for @samp{spec.xml} is
 686 @samp{docbook.rnc}.
 687
 688 The @samp{pattern} attribute can be used instead of the
 689 @samp{resource} attribute to specify the schema for any document
 690 whose URI matches a pattern.  The pattern has the same syntax as an
 691 absolute or relative URI except that the path component of the URI can
 692 use a @samp{*} character to stand for zero or more characters
 693 within a path segment (i.e., any character other than @samp{/}).
 694 Typically, the URI pattern looks like a relative URI, but, whereas a
 695 relative URI in the @samp{resource} attribute is resolved into a
 696 particular absolute URI using the base URI of the schema locating
 697 file, a relative URI pattern matches if it matches some number of
 698 complete path segments of the document's URI ending with the last path
 699 segment of the document's URI@. For example,
 700
 701 @example
 702 <uri pattern="*.xsl" uri="xslt.rnc"/>
 703 @end example
 704
 705 @noindent
 706 specifies that the schema for documents with a URI whose path ends
 707 with @samp{.xsl} is @samp{xslt.rnc}.
 708
 709 A @samp{transformURI} rule locates a schema by
 710 transforming the URI of the document. The @samp{fromPattern}
 711 attribute specifies a URI pattern with the same meaning as the
 712 @samp{pattern} attribute of the @samp{uri} element.  The
 713 @samp{toPattern} attribute is a URI pattern that is used to
 714 generate the URI of the schema.  Each @samp{*} in the
 715 @samp{toPattern} is replaced by the string that matched the
 716 corresponding @samp{*} in the @samp{fromPattern}.  The
 717 resulting string is appended to the initial part of the document's URI
 718 that was not explicitly matched by the @samp{fromPattern}.  The
 719 rule matches only if the transformed URI identifies an existing
 720 resource.  For example, the rule
 721
 722 @example
 723 <transformURI fromPattern="*.xml" toPattern="*.rnc"/>
 724 @end example
 725
 726 @noindent
 727 would transform the URI @samp{file:///home/jjc/docs/spec.xml}
 728 into the URI @samp{file:///home/jjc/docs/spec.rnc}.  Thus, this
 729 rule specifies that to locate a schema for a document
 730 @samp{@var{foo}.xml}, Emacs should test whether a file
 731 @samp{@var{foo}.rnc} exists in the same directory as
 732 @samp{@var{foo}.xml}, and, if so, should use it as the
 733 schema.
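
The effect of this example rule can be pictured with a few lines of
Emacs Lisp.  This is only an illustration of the lookup described
above for local files, not the code nXML mode uses, and the function
name @code{my-sibling-rnc} is hypothetical:

@example
;; Return the .rnc file next to DOCUMENT if DOCUMENT ends in
;; ".xml" and that .rnc file exists; otherwise return nil.
(defun my-sibling-rnc (document)
  (when (string-suffix-p ".xml" document)
    (let ((candidate (concat (file-name-sans-extension document)
                             ".rnc")))
      (when (file-exists-p candidate)
        candidate))))
@end example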
 734
 735 @node Using the document element to locate a schema
 736 @subsection Using the document element to locate a schema
 737
 738 A @samp{documentElement} rule locates a schema based on
 739 the local name and prefix of the document element. For example, a rule
 740
 741 @example
 742 <documentElement prefix="xsl" localName="stylesheet" uri="xslt.rnc"/>
 743 @end example
 744
 745 @noindent
 746 specifies that when the name of the document element is
 747 @samp{xsl:stylesheet}, then @samp{xslt.rnc} should be used
 748 as the schema. Either the @samp{prefix} or
 749 @samp{localName} attribute may be omitted to allow any prefix or
 750 local name.
 751
 752 A @samp{namespace} rule locates a schema based on the
 753 namespace URI of the document element. For example, a rule
 754
 755 @example
 756 <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt.rnc"/>
 757 @end example
 758
 759 @noindent
 760 specifies that when the namespace URI of the document element is
 761 @samp{http://www.w3.org/1999/XSL/Transform}, then
 762 @samp{xslt.rnc} should be used as the schema.
 763
 764 @node Using type identifiers in schema locating files
 765 @subsection Using type identifiers in schema locating files
 766
 767 Type identifiers allow a level of indirection in locating the
 768 schema for a document.  Instead of associating the document directly
 769 with a schema URI, the document is associated with a type identifier,
 770 which is in turn associated with a schema URI@. nXML mode does not
 771 constrain the format of type identifiers.  They can be simply strings
 772 without any formal structure or they can be public identifiers or
 773 URIs.  Note that these type identifiers have nothing to do with the
 774 DOCTYPE declaration.  When comparing type identifiers, whitespace is
 775 normalized in the same way as with the @samp{xsd:token}
 776 datatype: leading and trailing whitespace is stripped; other sequences
 777 of whitespace are normalized to a single space character.
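
The normalization can be approximated in Emacs Lisp as follows.  This
is a sketch for illustration only: the function name
@code{my-normalize-type-id} is hypothetical, and
@code{split-string} treats a slightly larger set of characters as
whitespace than XML does:

@example
;; Strip leading and trailing whitespace and collapse each inner
;; run of whitespace to a single space.
(defun my-normalize-type-id (id)
  (mapconcat #'identity (split-string id) " "))

;; (my-normalize-type-id "  XHTML   Strict ") returns "XHTML Strict"
@end example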
 778
Each of the rules described in previous sections that uses a
@samp{uri} attribute to specify a schema can instead use a
@samp{typeId} attribute to specify a type identifier.  The type
identifier can be associated with a URI using a @samp{typeId}
element.  For example,
 784
 785 @example
 786 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 787   <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
 788   <typeId id="XHTML" typeId="XHTML Strict"/>
 789   <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
 790   <typeId id="XHTML Transitional" uri="xhtml-transitional.rnc"/>
 791 </locatingRules>
 792 @end example
 793
 794 @noindent
declares three type identifiers: @samp{XHTML} (representing the
default variant of XHTML to be used), @samp{XHTML Strict}, and
@samp{XHTML Transitional}.  Such a schema locating file would
use @samp{xhtml-strict.rnc} for a document whose namespace is
@samp{http://www.w3.org/1999/xhtml}.  But it is considerably
more flexible than a schema locating file that simply specifies
 801
 802 @example
 803 <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml-strict.rnc"/>
 804 @end example
 805
 806 @noindent
A user can easily use @kbd{C-c C-s C-t} to select between XHTML
Strict and XHTML Transitional.  Also, a user can easily add a
schema locating file
 809
 810 @example
 811 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 812   <typeId id="XHTML" typeId="XHTML Transitional"/>
 813 </locatingRules>
 814 @end example
 815
 816 @noindent
 817 that makes the default variant of XHTML be XHTML Transitional.
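
The whitespace normalization described at the start of this section
means that type identifiers need not be spelled identically to match.
In the following sketch, the @samp{typeId} attribute on the
@samp{namespace} rule contains extra spaces, yet it should still
compare equal to the identifier @samp{XHTML Strict}:

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml"
             typeId="  XHTML   Strict  "/>
  <typeId id="XHTML Strict" uri="xhtml-strict.rnc"/>
</locatingRules>
@end example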
 818
 819 @node Using multiple schema locating files
 820 @subsection Using multiple schema locating files
 821
 822 The @samp{include} element includes rules from another
 823 schema locating file.  The behavior is exactly as if the rules from
 824 that file were included in place of the @samp{include} element.
 825 Relative URIs are resolved into absolute URIs before the inclusion is
 826 performed. For example,
 827
 828 @example
 829 <include rules="../rules.xml"/>
 830 @end example
 831
 832 @noindent
includes the rules from @samp{../rules.xml}.
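
Since included rules behave as if they were written in place of the
@samp{include} element, the position of that element determines where
they fall among the other rules.  In this sketch, which uses
hypothetical file and schema names, the rules from
@samp{shared-rules.xml} come after the @samp{documentElement} rule
but before the @samp{namespace} rule:

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <documentElement localName="book" uri="my-book.rnc"/>
  <include rules="shared-rules.xml"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
</locatingRules>
@end example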
 834
 835 The process of locating a schema takes as input a list of schema
 836 locating files.  The rules in all these files and in the files they
 837 include are resolved into a single list of rules, which are applied
 838 strictly in order.  Sometimes this order is not what is needed.
 839 For example, suppose you have two schema locating files, a private
 840 file
 841
 842 @example
 843 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 844   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 845 </locatingRules>
 846 @end example
 847
 848 @noindent
 849 followed by a public file
 850
 851 @example
 852 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 853   <transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
 854   <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
 855 </locatingRules>
 856 @end example
 857
 858 @noindent
The effect of these two files is that the XHTML @samp{namespace}
rule takes precedence over the @samp{transformURI} rule, which
is almost certainly not what is needed.  This can be solved by adding
an @samp{applyFollowingRules} element to the private file:
 863
 864 @example
 865 <locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
 866   <applyFollowingRules ruleType="transformURI"/>
 867   <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
 868 </locatingRules>
 869 @end example
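
Assuming that @samp{applyFollowingRules} causes the later
@samp{transformURI} rules to be tried at the point where it appears,
the two files together would then behave roughly like the single rule
list below.  This is only a sketch of the intended effect under that
assumption, not a file you would need to write:

@example
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <transformURI pathSuffix=".xml" replacePathSuffix=".rnc"/>
  <namespace ns="http://www.w3.org/1999/xhtml" uri="xhtml.rnc"/>
  <namespace ns="http://www.w3.org/1999/XSL/Transform" typeId="XSLT"/>
</locatingRules>
@end example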
 870
 871 @node DTDs
 872 @chapter DTDs
 873
 874 nXML mode is designed to support the creation of standalone XML
 875 documents that do not depend on a DTD@.  Although it is common practice
 876 to insert a DOCTYPE declaration referencing an external DTD, this has
 877 undesirable side-effects.  It means that the document is no longer
 878 self-contained. It also means that different XML parsers may interpret
 879 the document in different ways, since the XML Recommendation does not
 880 require XML parsers to read the DTD@.  With DTDs, it was impractical to
get validation without using an external DTD or reference to a
parameter entity.  With RELAX NG and other schema languages, you can
 883 simultaneously get the benefits of validation and standalone XML
documents.  Therefore, I recommend that you do not reference an
external DTD in your XML documents.
 886
 887 One problem is entities for characters. Typically, as well as
 888 providing validation, DTDs also provide a set of character entities
 889 for documents to use. Schemas cannot provide this functionality,
 890 because schema validation happens after XML parsing.  The recommended
 891 solution is to either use the Unicode characters directly, or, if this
 892 is impractical, use character references.  nXML mode supports this by
 893 providing commands for entering characters and character references
 894 using the Unicode names, and can display the glyph corresponding to a
 895 character reference.
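
As a small illustration, a document that would otherwise rely on a
DTD-declared entity such as @samp{&eacute;} can use a numeric
character reference instead.  The element name here is hypothetical;
@samp{&#xE9;} refers to U+00E9, LATIN SMALL LETTER E WITH ACUTE:

@example
<!-- Depends on a DTD that declares the eacute entity: -->
<title>Caf&eacute;</title>

<!-- Standalone: works without any DTD: -->
<title>Caf&#xE9;</title>
@end example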
 896
 897 @node Limitations
 898 @chapter Limitations
 899
 900 nXML mode has some limitations:
 901
 902 @itemize @bullet
 903 @item
 904 DTD support is limited.  Internal parsed general entities declared
 905 in the internal subset are supported provided they do not contain
 906 elements. Other usage of DTDs is ignored.
 907 @item
 908 The restrictions on RELAX NG schemas in section 7 of the RELAX NG
 909 specification are not enforced.
 910 @end itemize
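
To illustrate the first limitation, here is a sketch (with hypothetical
element and entity names) of a document with an internal subset.
According to the description above, the @samp{product} entity should
be supported, whereas @samp{note} falls outside the supported case
because its replacement text contains an element:

@example
<!DOCTYPE doc [
  <!ENTITY product "nXML mode">
  <!ENTITY note "<em>draft</em>">
]>
<doc>&product;</doc>
@end example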
 911
 912 @node GNU Free Documentation License
 913 @appendix GNU Free Documentation License
 914 @include doclicense.texi
 915
 916 @bye