==================================================
 A Record of reStructuredText Syntax Alternatives
==================================================
:Author: David Goodger
:Contact: goodger@users.sourceforge.net
:Revision: $Revision: 684 $
:Date: $Date: 2005-10-06 15:45:31 -0500 (Thu, 06 Oct 2005) $

The following are ideas, alternatives, and justifications that were
considered for reStructuredText syntax, which did not originate with
Setext_ or StructuredText_.  For an analysis of constructs which *did*
originate with StructuredText or Setext, please see `Problems With
StructuredText`_.  See the `reStructuredText Markup Specification`_
for full details of the established syntax.

.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
.. _StructuredText:
   http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage
.. _Problems with StructuredText: problems.html
.. _reStructuredText Markup Specification: reStructuredText.html


.. contents::


... Or Not To Do?
=================

This is the realm of the possible but questionably probable.  These
ideas are kept here as a record of what has been proposed, for
posterity and in case any of them prove to be useful.


Compound Enumerated Lists
-------------------------

Allow for compound enumerators, such as "1.1." or "1.a." or "1(a)", to
allow for nested enumerated lists without indentation?

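To make the idea concrete, here is a minimal Python sketch of how such markers might be split into their components, with the number of components giving the nesting depth.  The grammar it assumes (each component is a number or a single letter, with trailing periods or a parenthesized last component) and the helper name ``split_enumerator`` are hypothetical; no such rule exists in the specification::

    import re

    # Hypothetical grammar: each component is a number or a single
    # letter, so ordinary words such as "etc." are not taken as markers.
    _PART = r"(?:[0-9]+|[A-Za-z])"
    COMPOUND = re.compile(rf"((?:{_PART}\.)+)|({_PART})\(({_PART})\)")


    def split_enumerator(marker):
        """Return the components of a compound enumerator, or None.

        "1.1." and "1.a." use periods; "1(a)" parenthesizes the last
        component.  The tuple length would be the nesting depth.
        """
        match = COMPOUND.fullmatch(marker)
        if match is None:
            return None
        if match.group(1):
            return tuple(match.group(1).rstrip(".").split("."))
        return (match.group(2), match.group(3))


    print(split_enumerator("1.a."))  # ('1', 'a')
    print(split_enumerator("1(a)"))  # ('1', 'a')
    print(split_enumerator("etc."))  # None

A marker such as "1.a." would then identify a second-level item by itself, without any indentation.
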

Sloppy Indentation of List Items
--------------------------------

Perhaps the indentation shouldn't be so strict.  Currently, this is
required::

    1. First line,
       second line.

Anything wrong with this? ::

    1. First line,
     second line.

Problem? ::

    1. First para.

       Block quote.  (no good: requires some indent relative to first
       para)

     Second Para.

    2. Have to carefully define where the literal block ends::

         Literal block

       Literal block?

Hmm...  Non-strict indentation isn't such a good idea.


Lazy Indentation of List Items
------------------------------

Another approach: Going back to the first draft of reStructuredText
(2000-11-27 post to Doc-SIG)::

    - This is the fourth item of the main list (no blank line above).
    The second line of this item is not indented relative to the
    bullet, which precludes it from having a second paragraph.

Change that to *require* a blank line above and below, to reduce
ambiguity.  This "loosening" may be added later, once the parser's
been nailed down.  However, a serious drawback of this approach is
that it limits the content of each list item to a single paragraph.


David's Idea for Lazy Indentation
`````````````````````````````````

Consider a paragraph in a word processor.  It is a single logical line
of text which ends with a newline, soft-wrapped arbitrarily at the
right edge of the page or screen.  We can think of a plaintext
paragraph in the same way, as a single logical line of text, ending
with two newlines (a blank line) instead of one, and which may contain
arbitrary line breaks (newlines) where it was accidentally
hard-wrapped by an application.  We can compensate for the accidental
hard-wrapping by "unwrapping" every unindented second and subsequent
line.  The indentation of the first line of a paragraph or list item
would determine the indentation for the entire element.  Blank lines
would be required between list items when using lazy indentation.

The following example shows the lazy indentation of multiple body
elements::

    - This is the first paragraph
    of the first list item.

      Here is the second paragraph
    of the first list item.

    - This is the first paragraph
    of the second list item.

      Here is the second paragraph
    of the second list item.

A more complex example shows the limitations of lazy indentation::

    - This is the first paragraph
    of the first list item.

      Next is a definition list item:

      Term
          Definition.  The indentation of the term is
    required, as is the indentation of the definition's
    first line.

          When the definition extends to more than
    one line, lazy indentation may occur.  (This is the second
    paragraph of the definition.)

    - This is the first paragraph
    of the second list item.

      - Here is the first paragraph of
    the first item of a nested list.

      So this paragraph would be outside of the nested list,
    but inside the second list item of the outer list.

    But this paragraph is not part of the list at all.

And the ambiguity remains::

    - Look at the hyphen at the beginning of the next line
    - is it a second list item marker, or a dash in the text?

    Similarly, we may want to refer to numbers inside enumerated
    lists:

    1. How many socks in a pair? There are
    2. How many pants in a pair? Exactly
    1. Go figure.

Literal blocks and block quotes would still require consistent
indentation for all their lines.  For block quotes, we might be able
to get away with only requiring that the first line of each contained
element be indented.  For example::

    Here's a paragraph.

        This is a paragraph inside a block quote.
    Second and subsequent lines need not be indented at all.

        - A bullet list inside
    the block quote.

          Second paragraph of the
    bullet list inside the block quote.

Although feasible, this form of lazy indentation has problems.  The
document structure and hierarchy are not obvious from the indentation,
making the source plaintext difficult to read.  It would also make
keeping track of the indentation while writing difficult and
error-prone.  However, these problems may be acceptable for Wikis and
email mode, where we may be able to rely on less complex structure
(few nested lists, for example).

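For illustration only, the "unwrapping" rule of David's idea can be sketched in Python.  This is a minimal sketch under stated assumptions, not code from any parser: the helper names ``unwrap_lazy`` and ``_indent`` are hypothetical, text blocks are separated by blank lines, a later line indented no deeper than the first line of its block is taken to be an accidental hard wrap and joined to the previous logical line, and a more deeply indented line (such as a definition under its term) starts a new logical line::

    def _indent(line):
        """Number of leading spaces on a line."""
        return len(line) - len(line.lstrip(" "))


    def unwrap_lazy(text):
        """Rejoin lines that were hard-wrapped without indentation."""
        blocks = []
        for block in text.split("\n\n"):
            lines = [ln for ln in block.split("\n") if ln.strip()]
            if not lines:
                continue
            base = _indent(lines[0])  # the first line fixes the indentation
            logical = [lines[0].rstrip()]
            for line in lines[1:]:
                if _indent(line) <= base:
                    # Accidental hard wrap: append to the logical line.
                    logical[-1] += " " + line.strip()
                else:
                    # Deeper indentation starts a new logical line.
                    logical.append(line.rstrip())
            blocks.append("\n".join(logical))
        return "\n\n".join(blocks)


    example = (
        "- This is the first paragraph\n"
        "of the first list item.\n"
        "\n"
        "  Here is the second paragraph\n"
        "of the first list item."
    )
    print(unwrap_lazy(example))

Applied to the first list item of the first example, the sketch yields two logical lines, the second indented by two spaces and therefore still inside the list item.  By the same rule, a wrapped line that starts with "- " is always joined as text, so the hyphen ambiguity is settled only by fiat.
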

Field Lists
===========

Prior to the syntax for field lists being finalized, several
alternatives were proposed.

1. Unadorned RFC822_ everywhere::

       Author: Me
       Version: 1

   Advantages: clean, precedent (RFC822-compliant).  Disadvantage:
   ambiguous (these paragraphs are a prime example).

   Conclusion: rejected.

2. Special case: use unadorned RFC822_ for the very first or very last
   text block of a document::

       """
       Author: Me
       Version: 1

       The rest of the document...
       """

   Advantages: clean, precedent (RFC822-compliant).  Disadvantages:
   special case, flat (unnested) field lists only, still ambiguous::

       """
       Usage: cmdname [options] arg1 arg2 ...

       We obviously *don't* want the line above to be interpreted as a
       field list item.  Or do we?
       """

   Conclusion: rejected for the general case, accepted for specific
   contexts (PEPs, email).

3. Use a directive::

       .. fields::

          Author: Me
          Version: 1

   Advantages: explicit and unambiguous, RFC822-compliant.
   Disadvantage: cumbersome.

   Conclusion: rejected for the general case (but such a directive
   could certainly be written).

4. Use Javadoc-style::

       @Author: Me
       @Version: 1
       @param a: integer

   Advantages: unambiguous, precedent, flexible.  Disadvantages:
   non-intuitive, ugly, not RFC822-compliant.

   Conclusion: rejected.

5. Use leading colons::

       :Author: Me
       :Version: 1

   Advantages: unambiguous, obvious (*almost* RFC822-compliant),
   flexible, perhaps even elegant.  Disadvantages: no precedent, not
   quite RFC822-compliant.

   Conclusion: accepted!

6. Use double colons::

       Author:: Me
       Version:: 1

   Advantages: unambiguous, obvious? (*almost* RFC822-compliant),
   flexible, similar to syntax already used for literal blocks and
   directives.  Disadvantages: no precedent, not quite
   RFC822-compliant, similar to syntax already used for literal blocks
   and directives.

   Conclusion: rejected because of the syntax similarity & conflicts.

Why is RFC822 compliance important?  It's a universal Internet
standard, and super obvious.  Also, I'd like to support the PEP format
(ulterior motive: get PEPs to use reStructuredText as their standard).
But it *would* be easy to get used to an alternative (easy even to
convert PEPs; probably harder to convert python-deviants ;-).

Unfortunately, without well-defined context (such as in email headers:
RFC822 only applies before any blank lines), the RFC822 format is
ambiguous.  It is very common in ordinary text.  To implement field
lists unambiguously, we need explicit syntax.

The following question was posed in a footnote:

   Should "bibliographic field lists" be defined at the parser level,
   or at the DPS transformation level?  In other words, are they
   reStructuredText-specific, or would they also be applicable to
   another (many/every other?) syntax?

The answer is that bibliographic fields are a
reStructuredText-specific markup convention.  Other syntaxes may
implement the bibliographic elements explicitly.  For example, there
would be no need for such a transformation for an XML-based markup
syntax.

.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt

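The RFC822 ambiguity discussed above can be made concrete with two simplified regular expressions in Python: one for unadorned RFC822-style fields and one for the leading-colon syntax (alternative 5).  Both patterns and the helper ``match_field`` are hypothetical simplifications for illustration only, not the recognizers used by any reStructuredText parser::

    import re

    # Simplified, hypothetical patterns (not taken from any real parser).
    # Unadorned RFC822 style: "Name: value".
    RFC822_FIELD = re.compile(r"^([A-Za-z][\w-]*):\s+(\S.*)$")
    # Leading-colon style (alternative 5): ":Name: value".
    COLON_FIELD = re.compile(r"^:([^:\s][^:]*):\s+(\S.*)$")


    def match_field(line, pattern):
        """Return (name, body) if the line matches the pattern, else None."""
        match = pattern.match(line)
        return match.groups() if match else None


    usage = "Usage: cmdname [options] arg1 arg2 ..."
    # An ordinary usage line is indistinguishable from an RFC822 field:
    print(match_field(usage, RFC822_FIELD))
    # -> ('Usage', 'cmdname [options] arg1 arg2 ...')
    # ...but it is not mistaken for a leading-colon field:
    print(match_field(usage, COLON_FIELD))          # -> None
    print(match_field(":Author: Me", COLON_FIELD))  # -> ('Author', 'Me')
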


Interpreted Text "Roles"
========================

The original purpose of interpreted text was as a mechanism for
descriptive markup, to describe the nature or role of a word or
phrase.  For example, in XML we could say "<function>len</function>"
to mark up "len" as a function.  It is envisaged that within Python
docstrings (inline documentation in Python module source files, the
primary market for reStructuredText) the role of a piece of
interpreted text can be inferred implicitly from the context of the
docstring within the program source.  For other applications, however,
the role may have to be indicated explicitly.

Interpreted text is enclosed in single backquotes (`).

1. Initially, it was proposed that an explicit role could be indicated
   as a word or phrase within the enclosing backquotes:

   - As a prefix, separated by a colon and whitespace::

         `role: interpreted text`

   - As a suffix, separated by whitespace and a colon::

         `interpreted text :role`

   There are problems with the initial approach:

   - There could be ambiguity with interpreted text containing colons.
     For example, an index entry of "Mission: Impossible" would
     require a backslash-escaped colon.

   - The explicit role is descriptive markup, not content, and will
     not be visible in the processed output.  Putting it inside the
     backquotes doesn't feel right; the *role* isn't being quoted.

2. Tony Ibbs suggested that the role be placed outside the
   backquotes::

       role:`prefix` or `suffix`:role

   This removes the embedded-colons ambiguity, but limits the role
   identifier to a single word (whitespace would be illegal).
   Since roles are not meant to be visible after processing, the lack
   of whitespace support is not important.

   The suggested syntax remains ambiguous with respect to ratios and
   some writing styles.  For example, suppose there is a "signal"
   identifier, and we write::

       ...calculate the `signal`:noise ratio.

   "noise" looks like a role.

3. As an improvement on #2, we can bracket the role with colons::

       :role:`prefix` or `suffix`:role:

   This syntax is similar to that of field lists, which is fine since
   both are doing similar things: describing.

   This is the syntax chosen for reStructuredText.

4. Another alternative is two colons instead of one::

       role::`prefix` or `suffix`::role

   But this is used for analogies ("A:B::C:D": "A is to B as C is to
   D").

   Both alternatives #2 and #4 lack delimiters on both sides of the
   role, making them difficult for the reader to parse.

5. Some kind of bracketing could be used:

   - Parentheses::

         (role)`prefix` or `suffix`(role)

   - Braces::

         {role}`prefix` or `suffix`{role}

   - Square brackets::

         [role]`prefix` or `suffix`[role]

   - Angle brackets::

         <role>`prefix` or `suffix`<role>

     (The overlap of \*ML tags with angle brackets would be too
     confusing and precludes their use.)

Syntax #3 was chosen for reStructuredText.

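To illustrate why bracketing the role with colons helps, here is a simplified Python sketch that recognizes syntax #3 in both prefix and suffix positions, alongside a pattern for alternative #2.  The regular expressions and the ``find_roles`` helper are hypothetical and deliberately simplified (for instance, they ignore the inline markup recognition rules); they are not taken from any parser::

    import re

    # Hypothetical, simplified patterns for syntax #3, with the role
    # name bracketed by colons, before or after the backquoted text.
    PREFIX_ROLE = re.compile(r":(?P<role>[A-Za-z][\w.+-]*):`(?P<text>[^`]+)`")
    SUFFIX_ROLE = re.compile(r"`(?P<text>[^`]+)`:(?P<role>[A-Za-z][\w.+-]*):")
    # Alternative #2 for comparison: a single colon on one side only.
    SUFFIX_ROLE_ALT2 = re.compile(r"`(?P<text>[^`]+)`:(?P<role>[A-Za-z]\w*)")


    def find_roles(line):
        """Return (role, text) pairs recognized with the syntax #3 patterns."""
        found = [(m["role"], m["text"]) for m in PREFIX_ROLE.finditer(line)]
        found += [(m["role"], m["text"]) for m in SUFFIX_ROLE.finditer(line)]
        return found


    ratio = "...calculate the `signal`:noise ratio."
    print(find_roles("the :sub:`biohazard` symbol"))  # [('sub', 'biohazard')]
    print(find_roles(ratio))                          # []
    print(SUFFIX_ROLE_ALT2.search(ratio)["role"])     # noise

With the closing colon required, "noise" in the ratio example is no longer mistaken for a role, whereas the alternative #2 pattern still picks it up.
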

Comments
========

A problem with comments (actually, with all indented constructs) is
that they cannot be followed by an indented block -- a block quote --
without swallowing it up.

I thought that perhaps comments should be one-liners only.  But would
this mean that footnotes, hyperlink targets, and directives must then
also be one-liners?  Not a good solution.

Tony Ibbs suggested a "comment" directive.  I added that we could
limit a comment to a single text block, and that a "multi-block
comment" could use "comment-start" and "comment-end" directives.  This
would remove the indentation incompatibility.  A "comment" directive
automatically suggests "footnote" and (hyperlink) "target" directives
as well.  This could go on forever!  Bad choice.

Garth Kidd suggested that an "empty comment", a ".." explicit markup
start with nothing on the first line (except possibly whitespace) and
a blank line immediately following, could serve as an "unindent".  An
empty comment does **not** swallow up indented blocks following it,
so block quotes are safe.  "A tiny but practical wart."  Accepted.

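The effect of the empty comment can be sketched with a small Python helper that reports how far a comment extends.  The helper ``comment_extent`` is hypothetical and simplified (it treats any blank or indented line after an ordinary comment as part of that comment); it only illustrates the rule stated above::

    def comment_extent(lines, start):
        """Return the index just past the comment beginning at lines[start].

        Simplified: an ordinary comment absorbs every following blank or
        indented line, while an "empty comment" (".." alone, optionally
        with trailing whitespace, followed by a blank line) ends at once.
        """
        first = lines[start]
        if first.rstrip() != ".." and not first.startswith(".. "):
            raise ValueError("not an explicit markup start")
        following = lines[start + 1] if start + 1 < len(lines) else ""
        if first.strip() == ".." and not following.strip():
            return start + 1  # empty comment: absorbs nothing
        end = start + 1
        while end < len(lines) and (
            not lines[end].strip() or lines[end][:1].isspace()
        ):
            end += 1
        while end > start + 1 and not lines[end - 1].strip():
            end -= 1  # trailing blank lines are not part of the comment
        return end


    ordinary = [".. a comment", "", "    Swallowed!", "", "Paragraph."]
    empty = ["..", "", "    A block quote.", "", "Paragraph."]
    print(comment_extent(ordinary, 0))  # 3: the indented line is absorbed
    print(comment_extent(empty, 0))     # 1: the block quote survives
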

Anonymous Hyperlinks
====================

Alan Jaffray came up with this idea, along with the following syntax::

    Search the `Python DOC-SIG mailing list archives`{}_.

    .. _: http://mail.python.org/pipermail/doc-sig/

The idea is sound and useful.  I suggested a "double underscore"
syntax::

    Search the `Python DOC-SIG mailing list archives`__.

    .. __: http://mail.python.org/pipermail/doc-sig/

But perhaps single underscores are okay?  The syntax looks better, but
the hyperlink itself doesn't explicitly say "anonymous"::

    Search the `Python DOC-SIG mailing list archives`_.

    .. _: http://mail.python.org/pipermail/doc-sig/

Mixing anonymous and named hyperlinks becomes confusing.  The order of
targets is not significant for named hyperlinks, but it is for
anonymous hyperlinks::

    Hyperlinks: anonymous_, named_, and another anonymous_.

    .. _named: named
    .. _: anonymous1
    .. _: anonymous2

Without the extra syntax of double underscores, determining which
hyperlink references are anonymous may be difficult.  We'd have to
check which references don't have corresponding targets, and match
those up with anonymous targets.  Keeping to a simple consistent
ordering (as with auto-numbered footnotes) seems simplest.

reStructuredText will use the explicit double-underscore syntax for
anonymous hyperlinks.  An alternative (see `Reworking Explicit
Markup`_ below) for the somewhat awkward ".. __:" syntax is "__"::

    An anonymous__ reference.

    __ http://anonymous

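The ordering rule can be sketched in Python: collect anonymous references and anonymous targets (in either the ``.. __:`` or the short ``__`` form) in document order, then pair the first reference with the first target, the second with the second, and so on.  The regular expressions and the ``link_anonymous`` helper are hypothetical simplifications for illustration, not a real parser's implementation::

    import re

    # Simplified, hypothetical recognizers.
    ANON_REF = re.compile(r"`[^`]+`__|\b\w+__(?!\w)")
    ANON_TARGET = re.compile(r"^(?:\.\. __:|__)(?:\s+(\S.*))?$")


    def link_anonymous(lines):
        """Pair the n-th anonymous reference with the n-th anonymous target.

        A target of None stands for an anonymous internal target.
        """
        refs, targets = [], []
        for line in lines:
            target = ANON_TARGET.match(line.strip())
            if target:
                targets.append(target.group(1))
            else:
                for ref in ANON_REF.findall(line):
                    refs.append(ref.rstrip("_").strip("`"))
        if len(refs) != len(targets):
            raise ValueError(
                f"{len(refs)} anonymous references but {len(targets)} targets")
        return list(zip(refs, targets))


    doc = [
        "Search the `Python DOC-SIG mailing list archives`__,",
        "or follow an anonymous__ reference.",
        "",
        ".. __: http://mail.python.org/pipermail/doc-sig/",
        "__ http://anonymous",
    ]
    for ref, uri in link_anonymous(doc):
        print(ref, "->", uri)

Because matching is purely positional, a missing or extra target is detected as a count mismatch rather than silently linking the wrong reference.
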

Reworking Explicit Markup
=========================

Alan Jaffray came up with the idea of `anonymous hyperlinks`_, added
to reStructuredText.  Subsequently it was asserted that hyperlinks
(especially anonymous hyperlinks) would play an increasingly important
role in reStructuredText documents, and therefore they require a
simpler and more concise syntax.  This prompted a review of the
current and proposed explicit markup syntaxes with regard to
improving usability.

1. Original syntax::

       .. _blah:                     internal hyperlink target
       .. _blah: http://somewhere    external hyperlink target
       .. _blah: blahblah_           indirect hyperlink target
       .. __:                        anonymous internal target
       .. __: http://somewhere       anonymous external target
       .. __: blahblah_              anonymous indirect target
       .. [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   .. Note::

      The comment text was intentionally made to look like a hyperlink
      target.

   Origins:

   * Except for the colon (a delimiter necessary to allow for
     phrase-links), hyperlink target ``.. _blah:`` comes from Setext.
   * Comment syntax from Setext.
   * Footnote syntax from StructuredText ("named links").
   * Directives and anonymous hyperlinks are original to reStructuredText.

   Advantages:

   + Consistent explicit markup indicator: "..".
   + Consistent hyperlink syntax: ".. _" & ":".

   Disadvantages:

   - Anonymous target markup is awkward: ".. __:".
   - The explicit markup indicator ("..") is excessively overloaded?
   - Comment text is limited (can't look like a footnote, hyperlink,
     or directive).  But this is probably not important.

2. Alan Jaffray's proposed syntax #1::

       __ _blah                      internal hyperlink target
       __ blah: http://somewhere     external hyperlink target
       __ blah: blahblah_            indirect hyperlink target
       __                            anonymous internal target
       __ http://somewhere           anonymous external target
       __ blahblah_                  anonymous indirect target
       __ [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   The hyperlink-connoted underscores have become first-level syntax.

   Advantages:

   + Anonymous targets are simpler.
   + All hyperlink targets are one character shorter.

   Disadvantages:

   - Inconsistent internal hyperlink targets.  Unlike all other named
     hyperlink targets, there's no colon.  There's an extra leading
     underscore, but we can't drop it because without it, "blah" looks
     like a relative URI.  Unless we restore the colon::

         __ blah:                      internal hyperlink target

   - Obtrusive markup?

3. Alan Jaffray's proposed syntax #2::

       .. _blah                      internal hyperlink target
       .. blah: http://somewhere     external hyperlink target
       .. blah: blahblah_            indirect hyperlink target
       ..                            anonymous internal target
       .. http://somewhere           anonymous external target
       .. blahblah_                  anonymous indirect target
       .. [blah] http://somewhere    footnote
       !! blah: http://somewhere     directive
       ## blah: http://somewhere     comment

   Leading underscores have been (almost) replaced by "..", while
   comments and directives have gained their own syntax.

   Advantages:

   + Anonymous hyperlinks are simpler.
   + Unique syntax for comments.  Connotation of "comment" from
     some programming languages (including our favorite).
   + Unique syntax for directives.  Connotation of "action!".

   Disadvantages:

   - Inconsistent internal hyperlink targets.  Again, unlike all other
     named hyperlink targets, there's no colon.  There's a leading
     underscore, matching the trailing underscores of references,
     which no other hyperlink targets have.  We can't drop that one
     leading underscore though: without it, "blah" looks like a
     relative URI.  Again, unless we restore the colon::

         .. blah:                      internal hyperlink target

   - All (except for internal) hyperlink targets lack their leading
     underscores, losing the "hyperlink" connotation.

   - Obtrusive syntax for comments.  Alternatives::

         ;; blah: http://somewhere
            (also comment syntax in Lisp & others)
         ,, blah: http://somewhere
            ("comma comma": sounds like "comment"!)

   - Iffy syntax for directives.  Alternatives?

4. Tony Ibbs' proposed syntax::

       .. _blah:                     internal hyperlink target
       .. _blah: http://somewhere    external hyperlink target
       .. _blah: blahblah_           indirect hyperlink target
       ..                            anonymous internal target
       .. http://somewhere           anonymous external target
       .. blahblah_                  anonymous indirect target
       .. [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   This is the same as the current syntax, except for anonymous
   targets which drop their "__: ".

   Advantage:

   + Anonymous targets are simpler.

   Disadvantages:

   - Anonymous targets lack their leading underscores, losing the
     "hyperlink" connotation.
   - Anonymous targets are almost indistinguishable from comments.
     (Better to know "up front".)

5. David Goodger's proposed syntax: Perhaps going back to one of
   Alan's earlier suggestions might be the best solution.  How about
   simply adding "__ " as a synonym for ".. __: " in the original
   syntax?  These would become equivalent::

       .. __:                        anonymous internal target
       .. __: http://somewhere       anonymous external target
       .. __: blahblah_              anonymous indirect target

       __                            anonymous internal target
       __ http://somewhere           anonymous external target
       __ blahblah_                  anonymous indirect target

Alternative 5 has been adopted.

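Under alternative 5, a parser (or a preprocessing step) could treat the two spellings as equivalent by rewriting the short form into the original one.  A minimal Python sketch, using the hypothetical helper name ``normalize_anonymous_target``; it illustrates the equivalence only and does not describe how any parser actually implements it::

    def normalize_anonymous_target(line):
        """Rewrite the short "__" anonymous-target form as ".. __:".

        Lines in any other form are returned unchanged.
        """
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if stripped.rstrip() == "__":
            return indent + ".. __:"  # anonymous internal target
        if stripped.startswith("__ "):
            return indent + ".. __: " + stripped[3:].strip()
        return line


    print(normalize_anonymous_target("__ http://somewhere"))
    # .. __: http://somewhere
    print(normalize_anonymous_target("__init__ is special"))
    # __init__ is special

Requiring whitespace after "__" keeps ordinary text such as ``__init__`` from being mistaken for a target.
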

Backquotes in Phrase-Links
==========================

[From a 2001-06-05 Doc-SIG post in reply to questions from Doug
Hellmann.]

The first draft of the spec, posted to the Doc-SIG in November 2000,
used square brackets for phrase-links.  I changed my mind because:

1. In the first draft, I had already decided on single-backquotes for
   inline literal text.

2. However, I wanted to minimize the necessity for backslash escapes,
   for example when quoting Python repr-equivalent syntax that uses
   backquotes.

3. The processing of identifiers (function/method/attribute/module
   etc. names) into hyperlinks is a useful feature.  PyDoc recognizes
   identifiers heuristically, but it doesn't take much imagination to
   come up with counter-examples where PyDoc's heuristics would result
   in embarrassing failure.  I wanted to do it deterministically, and
   that called for syntax.  I called this construct "interpreted
   text".

4. Leveraging off the ``*emphasis*/**strong**`` syntax led to the
   idea of using double-backquotes as syntax.

5. I worked out some rules for inline markup recognition.

6. In combination with #5, double backquotes lent themselves to inline
   literals, neatly satisfying #2, minimizing backslash escapes.  In
   fact, the spec says that no interpretation of any kind is done
   within double-backquote inline literal text; backslashes do *no*
   escaping within literal text.

7. Single backquotes are then freed up for interpreted text.

8. I already had square brackets required for footnote references.

9. Since interpreted text will typically turn into hyperlinks, it was
   a natural fit to use backquotes as the phrase-quoting syntax for
   trailing-underscore hyperlinks.

The original inspiration for the trailing underscore hyperlink syntax
was Setext.  But for phrases Setext used a very cumbersome
``underscores_between_words_like_this_`` syntax.

The underscores can be viewed as if they were right-pointing arrows:
``-->``.  So ``hyperlink_`` points away from the reference, and
``.. _hyperlink:`` points toward the target.

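Point 6 of the list above can be illustrated with a simplified Python sketch that applies backslash escapes everywhere except inside double-backquote inline literals.  The regular expressions and the ``process_escapes`` helper are hypothetical; real inline markup recognition has more rules (for example, about which characters may surround the start and end strings)::

    import re

    # Simplified: "``" opens and closes an inline literal.
    INLINE_LITERAL = re.compile(r"``(.+?)``")
    ESCAPE = re.compile(r"\\(.)")


    def process_escapes(text):
        """Remove backslash escapes outside inline literals only."""
        out = []
        pos = 0
        for match in INLINE_LITERAL.finditer(text):
            out.append(ESCAPE.sub(r"\1", text[pos:match.start()]))
            out.append(match.group(1))  # literal text: no escaping at all
            pos = match.end()
        out.append(ESCAPE.sub(r"\1", text[pos:]))
        return "".join(out)


    print(process_escapes(r"\*not emphasis\* but ``\*kept\*``"))
    # *not emphasis* but \*kept\*
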

Substitution Mechanism
======================

Substitutions arose out of a Doc-SIG thread begun on 2001-10-28 by
Alan Jaffray, "reStructuredText inline markup".  It reminded me of a
missing piece of the reStructuredText puzzle, first referred to in my
contribution to "Documentation markup & processing / PEPs" (Doc-SIG
2001-06-21).

Substitutions allow the power and flexibility of directives to be
shared by inline text.  They are a way to allow arbitrarily complex
inline objects, while keeping the details out of the flow of text.
They are the equivalent of SGML/XML's named entities.  For example,
here is an inline image (using reference syntax alternative 4d
(vertical bars) and definition alternative 3, the alternatives chosen
for inclusion in the spec)::

    The |biohazard| symbol must be used on containers used to dispose
    of medical waste.

    .. |biohazard| image:: biohazard.png
       [height=20 width=20]

The ``|biohazard|`` substitution reference will be replaced in-line by
whatever the ``.. |biohazard|`` substitution definition generates (in
this case, an image).  A substitution definition contains the
substitution text bracketed with vertical bars, followed by an
embedded inline-compatible directive, such as "image".  A transform is
required to complete the substitution.

Syntax alternatives for the reference:

1. Use the existing interpreted text syntax, with a predefined role
   such as "sub"::

       The `biohazard`:sub: symbol...

   Advantages: existing syntax, explicit.  Disadvantages: verbose,
   obtrusive.

2. Use a variant of the interpreted text syntax, with a new suffix
   akin to the underscore in phrase-link references::

       (a) `name`@
       (b) `name`#
       (c) `name`&
       (d) `name`/
       (e) `name`<
       (f) `name`::
       (g) `name`:

   Due to incompatibility with other constructs and ordinary text
   usage, (f) and (g) are not possible.

3. Use interpreted text syntax with a fixed internal format::

       (a) `:name:`
       (b) `name:`
       (c) `name::`
       (d) `::name::`
       (e) `%name%`
       (f) `#name#`
       (g) `/name/`
       (h) `&name&`
       (i) `|name|`
       (j) `[name]`
       (k) `<name>`
       (l) `&name;`
       (m) `'name'`

   To avoid ML confusion, (k) and (l) are definitely out.  Square
   brackets (j) won't work in the target (the substitution definition
   would be indistinguishable from a footnote).

   The ```/name/``` syntax (g) is reminiscent of "s/find/sub"
   substitution syntax in ed-like languages.  However, it may have a
   misleading association with regexps, and looks like an absolute
   POSIX path.  (i) is visually equivalent and lacking the
   connotations.

   A disadvantage of all of these is that they limit interpreted text,
   albeit only slightly.

4. Use specialized syntax, something new::

       (a) #name#
       (b) @name@
       (c) /name/
       (d) |name|
       (e) <<name>>
       (f) //name//
       (g) ||name||
       (h) ^name^
       (i) [[name]]
       (j) ~name~
       (k) !name!
       (l) =name=
       (m) ?name?
       (n) >name<

   "#" (a) and "@" (b) are obtrusive.  "/" (c) without backquotes
   looks just like a POSIX path; it is likely for such usage to appear
   in text.

   "|" (d) and "^" (h) are feasible.

5. Redefine the trailing underscore syntax.  See definition syntax
   alternative 4, below.

Syntax alternatives for the definition:

1. Use the existing directive syntax, with a predefined directive such
   as "sub".  It contains a further embedded directive resolving to an
   inline-compatible object::

       .. sub:: biohazard
          .. image:: biohazard.png
             [height=20 width=20]

       .. sub:: parrot
          That bird wouldn't *voom* if you put 10,000,000 volts
          through it!

   The advantages and disadvantages are the same as in inline
   alternative 1.

2. Use syntax as in #1, but with the embedded directive compressed::

       .. sub:: biohazard image:: biohazard.png
          [height=20 width=20]

   This is a bit better than alternative 1, but still too much.

3. Use a variant of directive syntax, incorporating the substitution
   text, obviating the need for a special "sub" directive name.  If we
   assume reference alternative 4d (vertical bars), the matching
   definition would look like this::

       .. |biohazard| image:: biohazard.png
          [height=20 width=20]

4. (Suggested by Alan Jaffray in a 2001-11-06 Doc-SIG post.)

   Instead of adding new syntax, redefine the trailing underscore
   syntax to mean "substitution reference" instead of "hyperlink
   reference".  Alan's example::

       I had lunch with Jonathan_ today.  We talked about Zope_.

       .. _Jonathan: lj [user=jhl]
       .. _Zope: http://www.zope.org/

   A problem with the proposed syntax is that URIs which look like
   simple reference names (alphanum plus ".", "-", "_") would be
   indistinguishable from substitution directive names.  A more
   consistent syntax would be::

       I had lunch with Jonathan_ today.  We talked about Zope_.

       .. _Jonathan: lj:: user=jhl
       .. _Zope: http://www.zope.org/

   (``::`` after ``.. _Jonathan: lj``.)

   The "Zope" target is a simple external hyperlink, but the
   "Jonathan" target contains a directive.  Alan proposed that the
   reference text be replaced by whatever the referenced directive
   (the "directive target") produces.  If the directive target
   resolves to a hyperlink, the directive reference becomes a
   hyperlink reference; if it resolves to an icon, the reference is
   replaced by an inline icon.
 858
 859    This seems too indirect and complicated for easy comprehension.
 860
 861    The reference in the text will sometimes become a link, sometimes
 862    not.  Sometimes the reference text will remain, sometimes not.  We
 863    don't know *at the reference*::
 864
 865        This is a `hyperlink reference`_; its text will remain.
 866        This is an `inline icon`_; its text will disappear.
 867
 868    That's a problem.
 869
 870 The syntax that has been incorporated into the spec and parser is
 871 reference alternative 4d with definition alternative 3::
 872
 873     The |biohazard| symbol...
 874
 875     .. |biohazard| image:: biohazard.png
 876        [height=20 width=20]
 877
 878 We can also combine substitution references with hyperlink references,
 879 by appending a "_" (named hyperlink reference) or "__" (anonymous
 880 hyperlink reference) suffix to the substitution reference.  This
 881 allows us to click on an image-link::
 882
 883     The |biohazard|_ symbol...
 884
 885     .. |biohazard| image:: biohazard.png
 886        [height=20 width=20]
 887     .. _biohazard: http://www.cdc.gov/
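
As a rough illustration of the adopted syntax, the following toy Python
tokenizer finds ``|name|`` substitution references and records whether
a ``_`` or ``__`` suffix also makes each one a named or anonymous
hyperlink reference.  The regular expression and the function name
``find_substitution_refs`` are assumptions made for this sketch.  It is
not the Docutils parser, and it ignores the full inline-markup
recognition rules.

```python
import re

# Toy pattern (an assumption, not the Docutils grammar): "|name|", where the
# name neither starts nor ends with whitespace, optionally followed by "_"
# (named hyperlink reference) or "__" (anonymous hyperlink reference).
SUBST_REF = re.compile(r"\|(?P<name>[^|\s](?:[^|]*[^|\s])?)\|(?P<suffix>__?)?")

# Maps the optional suffix to the kind of hyperlink it adds (None: no link).
LINK_KINDS = {None: None, "_": "named", "__": "anonymous"}


def find_substitution_refs(text):
    """Return (name, link_kind) pairs in document order.

    link_kind is None for a plain substitution reference, "named" for
    "|name|_" and "anonymous" for "|name|__".
    """
    return [(m.group("name"), LINK_KINDS[m.group("suffix")])
            for m in SUBST_REF.finditer(text)]


refs = find_substitution_refs("The |biohazard|_ symbol, a |parrot|, and |logo|__.")
# refs == [('biohazard', 'named'), ('parrot', None), ('logo', 'anonymous')]
```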
 888
 889 There have been several suggestions for the naming of these
 890 constructs, originally called "substitution references" and
 891 "substitutions".
 892
 893 1. Candidate names for the reference construct:
 894
 895    (a) substitution reference
 896    (b) tagging reference
 897    (c) inline directive reference
 898    (d) directive reference
 899    (e) indirect inline directive reference
 900    (f) inline directive placeholder
 901    (g) inline directive insertion reference
 902    (h) directive insertion reference
 903    (i) insertion reference
 904    (j) directive macro reference
 905    (k) macro reference
 906    (l) substitution directive reference
 907
 908 2. Candidate names for the definition construct:
 909
 910    (a) substitution
 911    (b) substitution directive
 912    (c) tag
 913    (d) tagged directive
 914    (e) directive target
 915    (f) inline directive
 916    (g) inline directive definition
 917    (h) referenced directive
 918    (i) indirect directive
 919    (j) indirect directive definition
 920    (k) directive definition
 921    (l) indirect inline directive
 922    (m) named directive definition
 923    (n) inline directive insertion definition
 924    (o) directive insertion definition
 925    (p) insertion definition
 926    (q) insertion directive
 927    (r) substitution definition
 928    (s) directive macro definition
 929    (t) macro definition
 930    (u) substitution directive definition
 931    (v) substitution definition
 932
 933 "Inline directive reference" (1c) seems to be an appropriate term at
 934 first, but the term "inline" is redundant in the case of the
 935 reference.  Its counterpart "inline directive definition" (2g) is
 936 awkward, because the directive definition itself is not inline.
 937
 938 "Directive reference" (1d) and "directive definition" (2k) are too
 939 vague.  "Directive definition" could be used to refer to any
 940 directive, not just those used for inline substitutions.
 941
 942 One meaning of the term "macro" (1k, 2s, 2t) is too
 943 programming-language-specific.  Also, macros are typically simple text
 944 substitution mechanisms: the text is substituted first and evaluated
 945 later.  reStructuredText substitution definitions are evaluated in
 946 place at parse time and substituted afterwards.
 947
 948 "Insertion" (1h, 1i, 2n-2q) is almost right, but it implies that
 949 something new is getting added rather than one construct being
 950 replaced by another.
 951
 952 Which brings us back to "substitution".  The overall best names are
 953 "substitution reference" (1a) and "substitution definition" (2v).  A
 954 long way to go to add one word!
 955
 956
 957 Reworking Footnotes
 958 ===================
 959
 960 As a further wrinkle (see `Reworking Explicit Markup`_ above), in the
 961 wee hours of 2002-02-28 I posted several ideas for changes to footnote
 962 syntax:
 963
 964     - Change footnote syntax from ``.. [1]`` to ``_[1]``? ...
 965     - Differentiate (with new DTD elements) author-date "citations"
 966       (``[GVR2002]``) from numbered footnotes? ...
 967     - Render footnote references as superscripts without "[]"? ...
 968
 969 These ideas are all related, and suggest changes in the
 970 reStructuredText syntax as well as the docutils tree model.
 971
 972 The footnote has been used for both true footnotes (asides expanding
 973 on points or defining terms) and for citations (references to external
 974 works).  Rather than dealing with one amalgam construct, we could
 975 separate the current footnote concept into strict footnotes and
 976 citations.  Citations could be interpreted and treated differently
 977 from footnotes.  Footnotes would be limited to numerical labels:
 978 manual ("1") and auto-numbered (anonymous "#", named "#label").
 979
 980 The footnote is the only explicit markup construct (starts with ".. ")
 981 that directly translates to a visible body element.  I've always been
 982 a little bit uncomfortable with the ".. " marker for footnotes because
 983 of this; ".. " has a connotation of "special", but footnotes aren't
 984 especially "special".  Printed texts often put footnotes at the bottom
 985 of the page where the reference occurs (thus "foot note").  Some HTML
designs would leave footnotes to be rendered in the same positions
where they're defined.  Other online and printed designs will gather
footnotes into a section near the end of the document, converting them
to "endnotes" (perhaps using a directive in our case).  This "special
processing" is not an intrinsic property of the footnote itself; it is
a decision made by the document author or processing system.
 993
 994 Citations are almost invariably collected in a section at the end of a
 995 document or section.  Citations "disappear" from where they are
 996 defined and are magically reinserted at some well-defined point.
 997 There's more of a connection to the "special" connotation of the ".. "
 998 syntax.  The point at which the list of citations is inserted could be
 999 defined manually by a directive (e.g., ".. citations::"), and/or have
1000 default behavior (e.g., a section automatically inserted at the end of
1001 the document) that might be influenced by options to the Writer.
1002
1003 Syntax proposals:
1004
1005 + Footnotes:
1006
1007   - Current syntax::
1008
1009         .. [1] Footnote 1
1010         .. [#] Auto-numbered footnote.
1011         .. [#label] Auto-labeled footnote.
1012
1013   - The syntax proposed in the original 2002-02-28 Doc-SIG post:
1014     remove the ".. ", prefix a "_"::
1015
1016         _[1] Footnote 1
1017         _[#] Auto-numbered footnote.
1018         _[#label] Auto-labeled footnote.
1019
1020     The leading underscore syntax (earlier dropped because
1021     ``.. _[1]:`` was too verbose) is a useful reminder that footnotes
1022     are hyperlink targets.
1023
1024   - Minimal syntax: remove the ".. [" and "]", prefix a "_", and
1025     suffix a "."::
1026
1027         _1. Footnote 1.
1028         _#. Auto-numbered footnote.
1029         _#label. Auto-labeled footnote.
1030
    ``_1.``, ``_#.``, and ``_#label.`` are markers, like list markers.

    Footnotes could be rendered something like this in HTML:
1035
1036         | 1. This is a footnote.  The brackets could be dropped
1037         |    from the label, and a vertical bar could set them
1038         |    off from the rest of the document in the HTML.
1039
1040     Two-way hyperlinks on the footnote marker ("1." above) would also
1041     help to differentiate footnotes from enumerated lists.
1042
1043     If converted to endnotes (by a directive/transform), a horizontal
1044     half-line might be used instead.  Page-oriented output formats
1045     would typically use the horizontal line for true footnotes.
1046
1047 + Footnote references:
1048
1049   - Current syntax::
1050
1051         [1]_, [#]_, [#label]_
1052
1053   - Minimal syntax to match the minimal footnote syntax above::
1054
1055         1_, #_, #label_
1056
1057     As a consequence, pure-numeric hyperlink references would not be
1058     possible; they'd be interpreted as footnote references.
1059
1060 + Citation references: no change is proposed from the current footnote
1061   reference syntax::
1062
1063       [GVR2001]_
1064
1065 + Citations:
1066
1067   - Current syntax (footnote syntax)::
1068
1069         .. [GVR2001] Python Documentation; van Rossum, Drake, et al.;
1070            http://www.python.org/doc/
1071
1072   - Possible new syntax::
1073
1074         _[GVR2001] Python Documentation; van Rossum, Drake, et al.;
1075                    http://www.python.org/doc/
1076
1077         _[DJG2002]
1078             Docutils: Python Documentation Utilities project; Goodger
1079             et al.; http://docutils.sourceforge.net/
1080
1081     Without the ".. " marker, subsequent lines would either have to
1082     align as in one of the above, or we'd have to allow loose
1083     alignment (I'd rather not)::
1084
1085         _[GVR2001] Python Documentation; van Rossum, Drake, et al.;
1086             http://www.python.org/doc/
1087
1088 I proposed adopting the "minimal" syntax for footnotes and footnote
1089 references, and adding citations and citation references to
1090 reStructuredText's repertoire.  The current footnote syntax for
1091 citations is better than the alternatives given.
1092
1093 From a reply by Tony Ibbs on 2002-03-01:
1094
1095     However, I think easier with examples, so let's create one::
1096
1097         Fans of Terry Pratchett are perhaps more likely to use
1098         footnotes [1]_ in their own writings than other people
1099         [2]_.  Of course, in *general*, one only sees footnotes
1100         in academic or technical writing - it's use in fiction
1101         and letter writing is not normally considered good
1102         style [4]_, particularly in emails (not a medium that
1103         lends itself to footnotes).
1104
1105         .. [1] That is, little bits of referenced text at the
1106            bottom of the page.
1107         .. [2] Because Terry himself does, of course [3]_.
1108         .. [3] Although he has the distinction of being
1109            *funny* when he does it, and his fans don't always
1110            achieve that aim.
1111         .. [4] Presumably because it detracts from linear
1112            reading of the text - this is, of course, the point.
1113
1114     and look at it with the second syntax proposal::
1115
1116         Fans of Terry Pratchett are perhaps more likely to use
1117         footnotes [1]_ in their own writings than other people
1118         [2]_.  Of course, in *general*, one only sees footnotes
1119         in academic or technical writing - it's use in fiction
1120         and letter writing is not normally considered good
1121         style [4]_, particularly in emails (not a medium that
1122         lends itself to footnotes).
1123
1124         _[1] That is, little bits of referenced text at the
1125              bottom of the page.
1126         _[2] Because Terry himself does, of course [3]_.
1127         _[3] Although he has the distinction of being
1128              *funny* when he does it, and his fans don't always
1129              achieve that aim.
1130         _[4] Presumably because it detracts from linear
1131              reading of the text - this is, of course, the point.
1132
1133     (I note here that if I have gotten the indentation of the
1134     footnotes themselves correct, this is clearly not as nice.  And if
1135     the indentation should be to the left margin instead, I like that
1136     even less).
1137
1138     and the third (new) proposal::
1139
1140         Fans of Terry Pratchett are perhaps more likely to use
1141         footnotes 1_ in their own writings than other people
1142         2_.  Of course, in *general*, one only sees footnotes
1143         in academic or technical writing - it's use in fiction
1144         and letter writing is not normally considered good
1145         style 4_, particularly in emails (not a medium that
1146         lends itself to footnotes).
1147
1148         _1. That is, little bits of referenced text at the
1149             bottom of the page.
1150         _2. Because Terry himself does, of course 3_.
1151         _3. Although he has the distinction of being
1152             *funny* when he does it, and his fans don't always
1153             achieve that aim.
1154         _4. Presumably because it detracts from linear
1155             reading of the text - this is, of course, the point.
1156
1157     I think I don't, in practice, mind the targets too much (the use
1158     of a dot after the number helps a lot here), but I do have a
1159     problem with the body text, in that I don't naturally separate out
1160     the footnotes as different than the rest of the text - instead I
    keep wondering why there are numbers interspersed in the text.  The
1162     use of brackets around the numbers ([ and ]) made me somehow parse
1163     the footnote references as "odd" - i.e., not part of the body text
1164     - and thus both easier to skip, and also (paradoxically) easier to
1165     pick out so that I could follow them.
1166
    Thus, for the moment (and as always susceptible to argument), I'd
1168     say -1 on the new form of footnote reference (i.e., I much prefer
1169     the existing ``[1]_`` over the proposed ``1_``), and ambivalent
1170     over the proposed target change.
1171
1172     That leaves David's problem of wanting to distinguish footnotes
1173     and citations - and the only thing I can propose there is that
1174     footnotes are numeric or # and citations are not (which, as a
1175     human being, I can probably cope with!).
1176
1177 From a reply by Paul Moore on 2002-03-01:
1178
1179     I think the current footnote syntax ``[1]_`` is *exactly* the
1180     right balance of distinctness vs unobtrusiveness.  I very
1181     definitely don't think this should change.
1182
1183     On the target change, it doesn't matter much to me.
1184
1185 From a further reply by Tony Ibbs on 2002-03-01, referring to the
1186 "[1]" form and actual usage in email:
1187
1188     Clearly this is a form people are used to, and thus we should
1189     consider it strongly (in the same way that the usage of ``*..*``
1190     to mean emphasis was taken partly from email practise).
1191
1192     Equally clearly, there is something "magical" for people in the
1193     use of a similar form (i.e., ``[1]``) for both footnote reference
1194     and footnote target - it seems natural to keep them similar.
1195
1196     ...
1197
1198     I think that this established plaintext usage leads me to strongly
1199     believe we should retain square brackets at both ends of a
1200     footnote.  The markup of the reference end (a single trailing
1201     underscore) seems about as minimal as we can get away with.  The
1202     markup of the target end depends on how one envisages the thing -
1203     if ".." means "I am a target" (as I tend to see it), then that's
1204     good, but one can also argue that the "_[1]" syntax has a neat
1205     symmetry with the footnote reference itself, if one wishes (in
1206     which case ".." presumably means "hidden/special" as David seems
1207     to think, which is why one needs a ".." *and* a leading underline
1208     for hyperlink targets.
1209
Given the persuasive arguments voiced, we'll leave footnote and
footnote reference syntax alone, except that these discussions gave
rise to the "auto-symbol footnote" concept, which has been added.
Citations and citation references have also been added.
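
The label forms that came out of this discussion can be summarized with
a small, hypothetical Python classifier.  ``classify_label`` is an
illustration, not a Docutils function, and its character rules are
simplified: numeric labels are manually numbered footnotes, ``#`` and
``#label`` are auto-numbered footnotes, ``*`` is an auto-symbol
footnote, and any other label is treated as a citation, which is the
split Tony Ibbs suggested above.

```python
import re


def classify_label(label):
    """Classify the label inside ".. [label]" or "[label]_".

    Simplified sketch: real reStructuredText reference-name rules are
    stricter than the checks used here.
    """
    if not label or any(ch.isspace() for ch in label):
        raise ValueError(f"invalid label: {label!r}")
    if re.fullmatch(r"[0-9]+", label):
        return "footnote (manually numbered)"
    if label == "#":
        return "footnote (auto-numbered)"
    if label.startswith("#"):
        return "footnote (auto-numbered, labeled)"
    if label == "*":
        return "footnote (auto-symbol)"
    # Anything non-numeric and not "#"/"*"-based is a citation label.
    return "citation"


for example in ["1", "#", "#label", "*", "GVR2001"]:
    print(f"[{example}] -> {classify_label(example)}")
```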
1214
1215
1216 Auto-Enumerated Lists
1217 =====================
1218
1219 The advantage of auto-numbered enumerated lists would be similar to
1220 that of auto-numbered footnotes: lists could be written and rearranged
1221 without having to manually renumber them.  The disadvantages are also
1222 the same: input and output wouldn't match exactly; the markup may be
1223 ugly or confusing (depending on which alternative is chosen).
1224
1225 1. Use the "#" symbol.  Example::
1226
1227        #. Item 1.
1228        #. Item 2.
1229        #. Item 3.
1230
1231    Advantages: simple, explicit.  Disadvantage: enumeration sequence
1232    cannot be specified (limited to arabic numerals); ugly.
1233
1234 2. As a variation on #1, first initialize the enumeration sequence?
1235    For example::
1236
1237        a) Item a.
1238        #) Item b.
1239        #) Item c.
1240
1241    Advantages: simple, explicit, any enumeration sequence possible.
1242    Disadvantages: ugly; perhaps confusing with mixed concrete/abstract
1243    enumerators.
1244
1245 3. Alternative suggested by Fred Bremmer, from experience with MoinMoin::
1246
1247        1. Item 1.
1248        1. Item 2.
1249        1. Item 3.
1250
1251    Advantages: enumeration sequence is explicit (could be multiple
1252    "a." or "(I)" tokens).  Disadvantages: perhaps confusing; otherwise
1253    erroneous input (e.g., a duplicate item "1.") would pass silently,
1254    either causing a problem later in the list (if no blank lines
1255    between items) or creating two lists (with blanks).
1256
1257    Take this input for example::
1258
1259        1. Item 1.
1260
1261        1. Unintentional duplicate of item 1.
1262
1263        2. Item 2.
1264
   Currently the parser will produce two lists, "1" and "1,2" (no
1266    warnings, because of the presence of blank lines).  Using Fred's
1267    notation, the current behavior is "1,1,2 -> 1 1,2" (without blank
1268    lines between items, it would be "1,1,2 -> 1 [WARNING] 1,2").  What
1269    should the behavior be with auto-numbering?
1270
1271    Fred has produced a patch__, whose initial behavior is as follows::
1272
1273        1,1,1   -> 1,2,3
1274        1,2,2   -> 1,2,3
1275        3,3,3   -> 3,4,5
1276        1,2,2,3 -> 1,2,3 [WARNING] 3
1277        1,1,2   -> 1,2 [WARNING] 2
1278
1279    (After the "[WARNING]", the "3" would begin a new list.)
1280
1281    I have mixed feelings about adding this functionality to the spec &
1282    parser.  It would certainly be useful to some users (myself
1283    included; I often have to renumber lists).  Perhaps it's too
1284    clever, asking the parser to guess too much.  What if you *do* want
1285    three one-item lists in a row, each beginning with "1."?  You'd
1286    have to use empty comments to force breaks.  Also, I question
1287    whether "1,2,2 -> 1,2,3" is optimal behavior.
1288
1289    In response, Fred came up with "a stricter and more explicit rule
1290    [which] would be to only auto-number silently if *all* the
1291    enumerators of a list were identical".  In that case::
1292
1293        1,1,1   -> 1,2,3
1294        1,2,2   -> 1,2 [WARNING] 2
1295        3,3,3   -> 3,4,5
1296        1,2,2,3 -> 1,2 [WARNING] 2,3
1297        1,1,2   -> 1,2 [WARNING] 2
1298
1299    Should any start-value be allowed ("3,3,3"), or should
1300    auto-numbered lists be limited to begin with ordinal-1 ("1", "A",
1301    "a", "I", or "i")?
1302
1303    __ http://sourceforge.net/tracker/index.php?func=detail&aid=548802
1304       &group_id=38414&atid=422032
1305
1306 4. Alternative proposed by Tony Ibbs::
1307
1308        #1. First item.
1309        #3. Aha - I edited this in later.
1310        #2. Second item.
1311
1312    The initial proposal required unique enumerators within a list, but
1313    this limits the convenience of a feature of already limited
1314    applicability and convenience.  Not a useful requirement; dropped.
1315
1316    Instead, simply prepend a "#" to a standard list enumerator to
1317    indicate auto-enumeration.  The numbers (or letters) of the
1318    enumerators themselves are not significant, except:
1319
1320    - as a sequence indicator (arabic, roman, alphabetic; upper/lower),
1321
1322    - and perhaps as a start value (first list item).
1323
1324    Advantages: explicit, any enumeration sequence possible.
1325    Disadvantages: a bit ugly.
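
Fred's stricter rule (alternative 3) can be checked against its table
with a short Python sketch.  It assumes arabic enumerators given as
integers, and it assumes that a new list starts wherever the input is
neither all-identical (auto-numbered silently) nor strictly sequential
(an ordinary list).  ``split_auto_lists`` and ``describe`` are
hypothetical names; this is not the code from Fred's patch.

```python
def split_auto_lists(enums):
    """Split a run of arabic enumerators into renumbered lists.

    A list whose enumerators are all identical is auto-numbered from its
    first value; otherwise enumerators must increase by one.  A new list
    (preceded by a warning) starts wherever neither pattern holds.
    """
    lists = []
    i = 0
    while i < len(enums):
        start = enums[i]
        j = i + 1
        if j < len(enums) and enums[j] == start:
            # All-identical run: consume every repeat of the first value.
            while j < len(enums) and enums[j] == start:
                j += 1
        else:
            # Ordinary list: continue while each value is previous + 1.
            while j < len(enums) and enums[j] == enums[j - 1] + 1:
                j += 1
        lists.append(list(range(start, start + (j - i))))
        i = j
    return lists


def describe(enums):
    """Render the result in the notation used above, e.g. "1,2 [WARNING] 2"."""
    return " [WARNING] ".join(
        ",".join(str(n) for n in lst) for lst in split_auto_lists(enums))


# describe([1, 2, 2, 3]) returns "1,2 [WARNING] 2,3"
```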
1326
1327
1328 Inline External Targets
1329 =======================
1330
1331 Currently reStructuredText has two hyperlink syntax variations:
1332
1333 * Named hyperlinks::
1334
1335       This is a named reference_ of one word ("reference").  Here is
1336       a `phrase reference`_.  Phrase references may even cross `line
1337       boundaries`_.
1338
1339       .. _reference: http://www.example.org/reference/
1340       .. _phrase reference: http://www.example.org/phrase_reference/
1341       .. _line boundaries: http://www.example.org/line_boundaries/
1342
1343   + Advantages:
1344
1345     - The plaintext is readable.
1346     - Each target may be reused multiple times (e.g., just write
1347       ``"reference_"`` again).
    - No synchronized ordering of references and targets is necessary.
1349
1350   + Disadvantages:
1351
1352     - The reference text must be repeated as target names; could lead
1353       to mistakes.
1354     - The target URLs may be located far from the references, and hard
1355       to find in the plaintext.
1356
1357 * Anonymous hyperlinks (in current reStructuredText)::
1358
1359       This is an anonymous reference__.  Here is an anonymous
1360       `phrase reference`__.  Phrase references may even cross `line
1361       boundaries`__.
1362
1363       __ http://www.example.org/reference/
1364       __ http://www.example.org/phrase_reference/
1365       __ http://www.example.org/line_boundaries/
1366
1367   + Advantages:
1368
1369     - The plaintext is readable.
1370     - The reference text does not have to be repeated.
1371
1372   + Disadvantages:
1373
1374     - References and targets must be kept in sync.
1375     - Targets cannot be reused.
1376     - The target URLs may be located far from the references.
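
The practical difference between the two variations is how references
find their targets.  In this hypothetical Python sketch
(``resolve_named`` and ``resolve_anonymous`` are illustrative names, not
Docutils APIs), named references are looked up in a table keyed by
normalized name, so one target can serve any number of references,
while anonymous references are simply paired with anonymous targets in
document order, which is why the two must be kept in sync.  The
normalization follows the reStructuredText rule that reference names
are whitespace-neutral and case-insensitive.

```python
def normalize_name(name):
    """Collapse internal whitespace (including line breaks) and ignore case."""
    return " ".join(name.split()).lower()


def resolve_named(references, targets):
    """Map each named reference to its target URI; targets may be reused."""
    table = {normalize_name(name): uri for name, uri in targets.items()}
    resolved = []
    for ref in references:
        key = normalize_name(ref)
        if key not in table:
            raise KeyError(f"no target named {ref!r}")
        resolved.append(table[key])
    return resolved


def resolve_anonymous(references, targets):
    """Pair anonymous references with anonymous targets in document order."""
    if len(references) != len(targets):
        raise ValueError(f"{len(references)} anonymous references, "
                         f"{len(targets)} anonymous targets")
    return list(zip(references, targets))


named_targets = {
    "reference": "http://www.example.org/reference/",
    "phrase reference": "http://www.example.org/phrase_reference/",
    "line boundaries": "http://www.example.org/line_boundaries/",
}
# "line\nboundaries" stands for the phrase reference that crossed a line break.
uris = resolve_named(["reference", "phrase reference", "line\nboundaries"],
                     named_targets)
```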
1377
1378 For comparison and historical background, StructuredText also has two
1379 syntaxes for hyperlinks:
1380
1381 * First, ``"reference text":URL``::
1382
1383       This is a "reference":http://www.example.org/reference/
1384       of one word ("reference").  Here is a "phrase
1385       reference":http://www.example.org/phrase_reference/.
1386
1387 * Second, ``"reference text", http://example.com/absolute_URL``::
1388
1389       This is a "reference", http://www.example.org/reference/
1390       of one word ("reference").  Here is a "phrase reference",
1391       http://www.example.org/phrase_reference/.
1392
1393 Both syntaxes share advantages and disadvantages:
1394
1395 + Advantages:
1396
1397   - The target is specified immediately adjacent to the reference.
1398
1399 + Disadvantages:
1400
1401   - Poor plaintext readability.
1402   - Targets cannot be reused.
1403   - Both syntaxes use double quotes, common in ordinary text.
1404   - In the first syntax, the URL and the last word are stuck
1405     together, exacerbating the line wrap problem.
1406   - The second syntax is too magical; text could easily be written
1407     that way by accident (although only absolute URLs are recognized
1408     here, perhaps because of the potential for ambiguity).
1409
1410 A new type of "inline external hyperlink" has been proposed.
1411
1412 1. On 2002-06-28, Simon Budig proposed__ a new syntax for
1413    reStructuredText hyperlinks::
1414
1415        This is a reference_(http://www.example.org/reference/) of one
1416        word ("reference").  Here is a `phrase
1417        reference`_(http://www.example.org/phrase_reference/).  Are
1418        these examples, (single-underscore), named?  If so, `anonymous
1419        references`__(http://www.example.org/anonymous/) using two
1420        underscores would probably be preferable.
1421
1422    __ http://mail.python.org/pipermail/doc-sig/2002-June/002648.html
1423
1424    The syntax, advantages, and disadvantages are similar to those of
1425    StructuredText.
1426
1427    + Advantages:
1428
1429      - The target is specified immediately adjacent to the reference.
1430
1431    + Disadvantages:
1432
1433      - Poor plaintext readability.
1434      - Targets cannot be reused (unless named, but the semantics are
1435        unclear).
1436
1437    + Problems:
1438
1439      - The ``"`ref`_(URL)"`` syntax forces the last word of the
1440        reference text to be joined to the URL, making a potentially
1441        very long word that can't be wrapped (URLs can be very long).
1442        The reference and the URL should be separate.  This is a
1443        symptom of the following point:
1444
1445      - The syntax produces a single compound construct made up of two
1446        equally important parts, *with syntax in the middle*, *between*
1447        the reference and the target.  This is unprecedented in
1448        reStructuredText.
1449
1450      - The "inline hyperlink" text is *not* a named reference (there's
1451        no lookup by name), so it shouldn't look like one.
1452
1453      - According to the IETF standards RFC 2396 and RFC 2732,
1454        parentheses are legal URI characters and curly braces are legal
1455        email characters, making their use prohibitively difficult.
1456
1457      - The named/anonymous semantics are unclear.
1458
1459 2. After an analysis__ of the syntax of (1) above, we came up with the
1460    following compromise syntax::
1461
1462        This is an anonymous reference__
1463        __<http://www.example.org/reference/> of one word
1464        ("reference").  Here is a `phrase reference`__
1465        __<http://www.example.org/phrase_reference/>.  `Named
1466        references`_ _<http://www.example.org/anonymous/> use single
1467        underscores.
1468
1469    __ http://mail.python.org/pipermail/doc-sig/2002-July/002670.html
1470
1471    The syntax builds on that of the existing "inline internal
1472    targets": ``an _`inline internal target`.``
1473
1474    + Advantages:
1475
1476      - The target is specified immediately adjacent to the reference,
1477        improving maintainability:
1478
1479        - References and targets are easily kept in sync.
1480        - The reference text does not have to be repeated.
1481
1482      - The construct is executed in two parts: references identical to
1483        existing references, and targets that are new but not too big a
1484        stretch from current syntax.
1485
1486      - There's overwhelming precedent for quoting URLs with angle
1487        brackets [#]_.
1488
1489    + Disadvantages:
1490
1491      - Poor plaintext readability.
1492      - Lots of "line noise".
1493      - Targets cannot be reused (unless named; see below).
1494
1495    To alleviate the readability issue slightly, we could allow the
1496    target to appear later, such as after the end of the sentence::
1497
       This is an anonymous reference__ of one word ("reference").
1499        __<http://www.example.org/reference/>  Here is a `phrase
1500        reference`__.  __<http://www.example.org/phrase_reference/>
1501
1502    Problem: this could only work for one reference at a time
1503    (reference/target pairs must be proximate [refA trgA refB trgB],
1504    not interleaved [refA refB trgA trgB] or nested [refA refB trgB
1505    trgA]).  This variation is too problematic; references and inline
1506    external targets will have to be kept imediately adjacent (see (3)
   external targets will have to be kept immediately adjacent (see (3)
1508
1509    The ``"reference__ __<target>"`` syntax is actually for "anonymous
1510    inline external targets", emphasized by the double underscores.  It
1511    follows that single trailing and leading underscores would lead to
1512    *implicitly named* inline external targets.  This would allow the
1513    reuse of targets by name.  So after ``"reference_ _<target>"``,
1514    another ``"reference_"`` would point to the same target.
1515
1516    .. [#]
1517       From RFC 2396 (URI syntax):
1518
1519           The angle-bracket "<" and ">" and double-quote (")
1520           characters are excluded [from URIs] because they are often
1521           used as the delimiters around URI in text documents and
1522           protocol fields.
1523
1524           Using <> angle brackets around each URI is especially
1525           recommended as a delimiting style for URI that contain
1526           whitespace.
1527
1528       From RFC 822 (email headers):
1529
1530           Angle brackets ("<" and ">") are generally used to indicate
1531           the presence of a one machine-usable reference (e.g.,
1532           delimiting mailboxes), possibly including source-routing to
1533           the machine.
1534
1535 3. If it is best for references and inline external targets to be
1536    immediately adjacent, then they might as well be integrated.
1537    Here's an alternative syntax embedding the target URL in the
1538    reference::
1539
1540        This is an anonymous `reference <http://www.example.org
1541        /reference/>`__ of one word ("reference").  Here is a `phrase
1542        reference <http://www.example.org/phrase_reference/>`__.
1543
1544    Advantages and disadvantages are similar to those in (2).
1545    Readability is still an issue, but the syntax is a bit less
1546    heavyweight (reduced line noise).  Backquotes are required, even
1547    for one-word references; the target URL is included within the
1548    reference text, forcing a phrase context.
1549
1550    We'll call this variant "embedded URIs".
1551
1552    Problem: how to refer to a title like "HTML Anchors: <a>" (which
1553    ends with an HTML/SGML/XML tag)?  We could either require more
1554    syntax on the target (like ``"`reference text
1555    __<http://example.com/>`__"``), or require the odd conflicting
1556    title to be escaped (like ``"`HTML Anchors: \<a>`__"``).  The
1557    latter seems preferable, and not too onerous.
1558
1559    Similarly to (2) above, a single trailing underscore would convert
1560    the reference & inline external target from anonymous to implicitly
1561    named, allowing reuse of targets by name.
1562
1563    I think this is the least objectionable of the syntax alternatives.
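
A rough Python sketch of how the embedded-URI form of alternative 3
might be split into reference text and target.  The patterns and the
name ``parse_phrase_refs`` are assumptions for illustration.  The sketch
assumes that whitespace inside the angle brackets is dropped (so a URL
wrapped across lines is rejoined), that a backslash escapes the opening
bracket as proposed for ``"`HTML Anchors: \<a>`__"``, and that a ``__``
suffix marks an anonymous reference while ``_`` marks an implicitly
named one.

```python
import re

# A backquoted phrase followed by "_" (implicitly named) or "__" (anonymous).
PHRASE_REF = re.compile(r"`(?P<body>[^`]+)`(?P<suffix>__?)")
# Reference text, whitespace, then "<uri>" closing the phrase.  Because "<"
# must directly follow whitespace, an escaped "\<" never starts a URI.
EMBEDDED_URI = re.compile(r"(?P<text>.*?)\s+<(?P<uri>[^<>]+)>", re.DOTALL)


def parse_phrase_refs(source):
    """Return (text, uri_or_None, anonymous) for each phrase reference."""
    refs = []
    for m in PHRASE_REF.finditer(source):
        body = m.group("body")
        anonymous = m.group("suffix") == "__"
        embedded = EMBEDDED_URI.fullmatch(body)
        if embedded:
            text = " ".join(embedded.group("text").split())
            uri = "".join(embedded.group("uri").split())  # rejoin wrapped URL
        else:
            # No embedded URI: unescape "\<" and normalize whitespace.
            text = " ".join(body.replace("\\<", "<").split())
            uri = None
        refs.append((text, uri, anonymous))
    return refs
```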
1564
1565 Other syntax variations have been proposed (by Brett Cannon and Benja
1566 Fallenstein)::
1567
1568     `phrase reference`->http://www.example.com
1569
1570     `phrase reference`@http://www.example.com
1571
1572     `phrase reference`__ ->http://www.example.com
1573
1574     `phrase reference` [-> http://www.example.com]
1575
1576     `phrase reference`__ [-> http://www.example.com]
1577
1578     `phrase reference` <http://www.example.com>_
1579
1580 None of these variations are clearly superior to #3 above.  Some have
1581 problems that exclude their use.
1582
1583 With any kind of inline external target syntax it comes down to the
1584 conflict between maintainability and plaintext readability.  I don't
1585 see a major problem with reStructuredText's maintainability, and I
1586 don't want to sacrifice plaintext readability to "improve" it.
1587
1588 The proponents of inline external targets want them for easily
1589 maintainable web pages.  The arguments go something like this:
1590
1591 - Named hyperlinks are difficult to maintain because the reference
1592   text is duplicated as the target name.
1593
1594   To which I said, "So use anonymous hyperlinks."
1595
1596 - Anonymous hyperlinks are difficult to maintain because the
1597   references and targets have to be kept in sync.
1598
1599   "So keep the targets close to the references, grouped after each
1600   paragraph.  Maintenance is trivial."
1601
1602 - But targets grouped after paragraphs break the flow of text.
1603
1604   "Surely less than URLs embedded in the text!  And if the intent is
1605   to produce web pages, not readable plaintext, then who cares about
1606   the flow of text?"
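
The synchronization burden follows from how anonymous hyperlinks are
resolved: the first anonymous reference is matched with the first
anonymous target, the second with the second, and so on, whatever the
reference text says.  The following minimal Python sketch models that
order-based pairing.  The patterns are deliberate simplifications (only
backquoted phrase references followed by two underscores, and explicit
targets beginning with ``.. __:``), and ``pair_anonymous`` is a
hypothetical helper, not part of Docutils::

    import re
    from typing import List, Tuple

    # Simplified, hypothetical patterns: a backquoted phrase followed by
    # "__" is an anonymous reference; a line ".. __: URI" is a target.
    _ANON_REF = re.compile(r"`([^`]+)`__")
    _ANON_TARGET = re.compile(r"^\.\. __: (\S+)[ \t]*$", re.MULTILINE)


    def pair_anonymous(source: str) -> List[Tuple[str, str]]:
        """Pair anonymous references with anonymous targets by position."""
        refs = _ANON_REF.findall(source)
        targets = _ANON_TARGET.findall(source)
        if len(refs) != len(targets):
            raise ValueError(f"{len(refs)} anonymous references "
                             f"but {len(targets)} anonymous targets")
        # The n-th reference gets the n-th target; the text is never
        # compared, so reordering targets silently changes the links.
        return list(zip(refs, targets))

Reordering two targets swaps the links without any error, and only a
missing or extra target is caught, as a count mismatch; hence the
advice to keep each group of targets right after its paragraph.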
1607
1608 Many participants have voiced their objections to the proposed syntax:
1609
1610     Garth Kidd: "I strongly prefer the current way of doing it.
1611     Inline is spectacularly messy, IMHO."
1612
1613     Tony Ibbs: "I vehemently agree... that the inline alternatives
1614     being suggested look messy - there are/were good reasons they've
1615     been taken out...  I don't believe I would gain from the new
1616     syntaxes."
1617
1618     Paul Moore: "I agree as well.  The proposed syntax is far too
1619     punctuation-heavy, and any of the alternatives discussed are
1620     ambiguous or too subtle."
1621
1622 Others have voiced their support:
1623
1624     fantasai: "I agree with Simon.  In many cases, though certainly
1625     not in all, I find parenthesizing the url in plain text flows
1626     better than relegating it to a footnote."
1627
1628     Ken Manheimer: "I'd like to weigh in requesting some kind of easy,
1629     direct inline reference link."
1630
1631 (Interesting that those *against* the proposal have been using
1632 reStructuredText for a while, and those *for* the proposal are either
1633 new to the list ["fantasai", background unknown] or longtime
1634 StructuredText users [Ken Manheimer].)
1635
1636 I was initially ambivalent about, or against, the proposed "inline
1637 external targets".  I value reStructuredText's readability very highly, and
1638 although the proposed syntax offers convenience, I don't know if the
1639 convenience is worth the cost in ugliness.  Does the proposed syntax
1640 compromise readability too much, or should the choice be left up to
1641 the author?  Perhaps if the syntax is *allowed* but its use strongly
1642 *discouraged*, for aesthetic/readability reasons?
1643
1644 After a great deal of thought and much input from users, I've decided
1645 that there are reasonable use cases for this construct.  The
1646 documentation should strongly caution against its use in most
1647 situations, recommending independent block-level targets instead.
1648 Syntax #3 above ("embedded URIs") will be used.
1649
1650
1651 Doctree Representation of Transitions
1652 =====================================
1653
1654 (Although not reStructuredText-specific, this section fits best in
1655 this document.)
1656
1657 Once the "horizontal rule" construct had been added to the `reStructuredText
1658 Markup Specification`_, a decision had to be made about how to represent
1659 the construct in the document tree implementation.  Given this
1660 source::
1661
1662     Document
1663     ========
1664
1665     Paragraph 1
1666
1667     --------
1668
1669     Paragraph 2
1670
1671 The horizontal rule indicates a "transition" (in prose terms) or the
1672 start of a new "division".  Before transitions were implemented, the
1673 parsed document tree would have been::
1674
1675     <document>
1676         <section name="document">
1677             <title>
1678                 Document
1679             <paragraph>
1680                 Paragraph 1
1681             --------               <--- error here
1682             <paragraph>
1683                 Paragraph 2
1684
1685 There are several possibilities for the implementation:
1686
1687 1. Implement horizontal rules as "divisions" or segments.  A
1688    "division" is a title-less, non-hierarchical section.  The first
1689    try at an implementation looked like this::
1690
1691        <document>
1692            <section name="document">
1693                <title>
1694                    Document
1695                <paragraph>
1696                    Paragraph 1
1697                <division>
1698                    <paragraph>
1699                        Paragraph 2
1700
1701    But the two paragraphs are really at the same level; they shouldn't
1702    appear to be at different levels.  There's effectively an invisible
1703    "first division".  The horizontal rule splits the document body
1704    into two segments, which should be treated uniformly.
1705
1706 2. Treating "divisions" uniformly brings us to the second
1707    possibility::
1708
1709        <document>
1710            <section name="document">
1711                <title>
1712                    Document
1713                <division>
1714                    <paragraph>
1715                        Paragraph 1
1716                <division>
1717                    <paragraph>
1718                        Paragraph 2
1719
1720    With this change, documents and sections will directly contain
1721    divisions and sections, but not body elements.  Only divisions will
1722    directly contain body elements.  Even without a horizontal rule
1723    anywhere, the body elements of a document or section would be
1724    contained within a division element.  This makes the document tree
1725    deeper.  This is similar to the way HTML_ treats document contents:
1726    grouped within a ``<body>`` element.
1727
1728 3. Implement them as "transitions", empty elements::
1729
1730        <document>
1731            <section name="document">
1732                <title>
1733                    Document
1734                <paragraph>
1735                    Paragraph 1
1736                <transition>
1737                <paragraph>
1738                    Paragraph 2
1739
1740    A transition would be a "point element", not containing anything,
1741    only identifying a point within the document structure.  This keeps
1742    the document tree flatter, but the idea of a "point element" like
1743    "transition" smells bad.  A transition isn't a thing in itself; it's
1744    the space between two divisions.  However, transitions are a
1745    practical solution.
1746
1747 Solution 3 was chosen for incorporation into the document tree model.
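
The difference between possibility 2 and possibility 3 can be seen in a
toy model.  The following Python sketch is a simplification, not the
Docutils implementation: the node model of ``(tag, text)`` tuples and
the functions ``with_divisions`` and ``with_transitions`` are
hypothetical, and only the body of a single section is handled
(paragraphs separated by blank lines, with a block made solely of four
or more hyphens taken as a horizontal rule)::

    from typing import List, Tuple

    # Hypothetical, simplified node model: (tag, text) tuples.
    Node = Tuple[str, str]


    def _blocks(source: str) -> List[str]:
        """Split a section body into blank-line-separated blocks."""
        return [b.strip() for b in source.split("\n\n") if b.strip()]


    def _is_rule(block: str) -> bool:
        """Assume a rule is four or more hyphens and nothing else."""
        return len(block) >= 4 and set(block) == {"-"}


    def with_divisions(source: str) -> List[List[Node]]:
        """Possibility 2: every paragraph is wrapped in a division."""
        divisions: List[List[Node]] = [[]]
        for block in _blocks(source):
            if _is_rule(block):
                divisions.append([])  # a rule only starts a new division
            else:
                divisions[-1].append(("paragraph", block))
        return divisions


    def with_transitions(source: str) -> List[Node]:
        """Possibility 3: a flat list with empty transition elements."""
        return [("transition", "") if _is_rule(b) else ("paragraph", b)
                for b in _blocks(source)]

For the three-block body of the example source above,
``with_divisions`` returns two one-paragraph divisions, while
``with_transitions`` returns a flat paragraph, transition, paragraph
sequence.  Even a body with no rule at all comes back from
``with_divisions`` wrapped in one division, which is the extra depth
that possibility 2 would have added.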
1748
1749 .. _HTML: http://www.w3.org/MarkUp/
1750
1751 \f
1752 ..
1753    Local Variables:
1754    mode: indented-text
1755    indent-tabs-mode: nil
1756    sentence-end-double-space: t
1757    fill-column: 70
1758    End: