docs/peps/pep-0258.txt

   1 PEP: 258
   2 Title: Docutils Design Specification
   3 Version: $Revision$
   4 Last-Modified: $Date$
   5 Author: David Goodger <goodger@python.org>
   6 Discussions-To: <doc-sig@python.org>
   7 Status: Draft
   8 Type: Standards Track
   9 Content-Type: text/x-rst
  10 Requires: 256, 257
  11 Created: 31-May-2001
  12 Post-History: 13-Jun-2001
  13
  14
  15 ==========
  16  Abstract
  17 ==========
  18
  19 This PEP documents design issues and implementation details for
  20 Docutils, a Python Docstring Processing System (DPS).  The rationale
  21 and high-level concepts of a DPS are documented in PEP 256, "Docstring
  22 Processing System Framework" [#PEP-256]_.  Also see PEP 256 for a
  23 "Road Map to the Docstring PEPs".
  24
  25 Docutils is being designed modularly so that any of its components can
  26 be replaced easily.  In addition, Docutils is not limited to the
  27 processing of Python docstrings; it processes standalone documents as
  28 well, in several contexts.
  29
  30 No changes to the core Python language are required by this PEP.  Its
  31 deliverables consist of a package for the standard library and its
  32 documentation.
  33
  34
  35 ===============
  36  Specification
  37 ===============
  38
  39 Docutils Project Model
  40 ======================
  41
  42 Project components and data flow::
  43
  44                      +---------------------------+
  45                      |        Docutils:          |
  46                      | docutils.core.Publisher,  |
  47                      | docutils.core.publish_*() |
  48                      +---------------------------+
  49                       /            |            \
  50                      /             |             \
  51             1,3,5   /        6     |              \ 7
  52            +--------+       +-------------+       +--------+
  53            | READER | ----> | TRANSFORMER | ====> | WRITER |
  54            +--------+       +-------------+       +--------+
  55             /     \\                                  |
  56            /       \\                                 |
  57      2    /      4  \\                             8  |
  58     +-------+   +--------+                        +--------+
  59     | INPUT |   | PARSER |                        | OUTPUT |
  60     +-------+   +--------+                        +--------+
  61
  62 The numbers above each component indicate the path a document's data
  63 takes.  Double-width lines between Reader & Parser and between
  64 Transformer & Writer indicate that data sent along these paths should
  65 be standard (pure & unextended) Docutils doc trees.  Single-width
  66 lines signify that internal tree extensions or completely unrelated
  67 representations are possible, but they must be supported at both ends.
  68
  69
  70 Publisher
  71 ---------
  72
  73 The ``docutils.core`` module contains a "Publisher" facade class and
  74 several convenience functions: "publish_cmdline()" (for command-line
  75 front ends), "publish_file()" (for programmatic use with file-like
  76 I/O), and "publish_string()" (for programmatic use with string I/O).
  77 The Publisher class encapsulates the high-level logic of a Docutils
  78 system and has overall responsibility for processing, which is
  79 controlled by the ``Publisher.publish()`` method:
  80
  81 1. Set up internal settings (may include config files & command-line
  82    options) and I/O objects.
  83
  84 2. Call the Reader object to read data from the source Input object
  85    and parse the data with the Parser object.  A document object is
  86    returned.
  87
  88 3. Set up and apply transforms via the Transformer object attached to
  89    the document.
  90
  91 4. Call the Writer object which translates the document to the final
  92    output format and writes the formatted data to the destination
  93    Output object.  Depending on the Output object, the output may be
  94    returned from the Writer, and then from the ``publish()`` method.
  95
  96 Calling a ``publish_*`` function (or instantiating a "Publisher"
  97 object) with component names results in default behavior.  For custom
  98 behavior (customizing component settings), create custom component
  99 objects first, and pass *them* to the Publisher or ``publish_*``
 100 convenience functions.
 101
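The four ``publish()`` steps above can be illustrated with a toy model.
This is a minimal sketch under stated assumptions, not the
``docutils.core`` API: ``ToyParser``, ``ToyReader``, ``ToyWriter``,
``publish`` and ``drop_duplicates`` are hypothetical names, the
"document" is a plain dictionary, and step 1 (settings and I/O setup)
is left out.

```python
# Toy stand-ins for Docutils components; none of these classes are the
# real docutils.core, docutils.readers, or docutils.writers APIs.

class ToyParser:
    def parse(self, text, document):
        # Populate the document root: one paragraph per blank-line-separated block.
        for block in text.split("\n\n"):
            if block.strip():
                document["children"].append(("paragraph", " ".join(block.split())))


class ToyReader:
    def __init__(self, parser):
        self.parser = parser

    def read(self, source):
        document = {"children": []}          # fresh document tree root
        self.parser.parse(source, document)
        return document


class ToyWriter:
    def write(self, document):
        # Translate the tree into a (made-up) plain-text output format.
        return "\n".join(f"PARAGRAPH: {text}" for _kind, text in document["children"])


def publish(source, reader, writer, transforms=()):
    # Step 1 (settings and I/O setup) is omitted in this sketch.
    document = reader.read(source)           # step 2: read and parse
    for transform in transforms:             # step 3: apply transforms
        transform(document)
    return writer.write(document)            # step 4: translate and return


def drop_duplicates(document):
    """Hypothetical transform: prune repeated paragraphs in place."""
    seen, unique = set(), []
    for node in document["children"]:
        if node not in seen:
            seen.add(node)
            unique.append(node)
    document["children"][:] = unique


output = publish("Hello\nworld.\n\nHello world.\n\nBye.\n",
                 ToyReader(ToyParser()), ToyWriter(), [drop_duplicates])
print(output)
```

Passing ready-made component objects, as the call at the end does,
mirrors the "custom behavior" route described above; a real front end
would normally pass component names instead.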
 102
 103 Readers
 104 -------
 105
 106 Readers understand the input context (where the data is coming from),
 107 send the whole input or discrete "chunks" to the parser, and provide
 108 the context to bind the chunks together back into a cohesive whole.
 109
 110 Each reader is a module or package exporting a "Reader" class with a
 111 "read" method.  The base "Reader" class can be found in the
 112 ``docutils/readers/__init__.py`` module.
 113
 114 Most Readers will have to be told what parser to use.  So far (see the
 115 list of examples below), only the Python Source Reader ("PySource";
 116 still incomplete) will be able to determine the parser on its own.
 117
 118 Responsibilities:
 119
 120 * Get input text from the source I/O.
 121
 122 * Pass the input text to the parser, along with a fresh `document
 123   tree`_ root.
 124
 125 Examples:
 126
 127 * Standalone (Raw/Plain): Just read a text file and process it.
 128   The reader needs to be told which parser to use.
 129
 130   The "Standalone Reader" has been implemented in module
 131   ``docutils.readers.standalone``.
 132
 133 * Python Source: See `Python Source Reader`_ below.  This Reader is
 134   currently in development in the Docutils sandbox.
 135
 136 * Email: RFC-822 headers, quoted excerpts, signatures, MIME parts.
 137
 138 * PEP: RFC-822 headers, "PEP xxxx" and "RFC xxxx" conversion to URIs.
 139   The "PEP Reader" has been implemented in module
 140   ``docutils.readers.pep``; see PEP 287 and PEP 12.
 141
 142 * Wiki: Global reference lookups of "wiki links" incorporated into
 143   transforms.  (CamelCase only or unrestricted?)  Lazy
 144   indentation?
 145
 146 * Web Page: As standalone, but recognize meta fields as meta tags.
 147   Support for templates of some sort?  (After ``<body>``, before
 148   ``</body>``?)
 149
 150 * FAQ: Structured "question & answer(s)" constructs.
 151
 152 * Compound document: Merge chapters into a book.  Master manifest
 153   file?
 154
 155
 156 Parsers
 157 -------
 158
 159 Parsers analyze their input and produce a Docutils `document tree`_.
 160 They don't know or care anything about the source or destination of
 161 the data.
 162
 163 Each input parser is a module or package exporting a "Parser" class
 164 with a "parse" method.  The base "Parser" class can be found in the
 165 ``docutils/parsers/__init__.py`` module.
 166
 167 Responsibilities: Given raw input text and a doctree root node,
 168 populate the doctree by parsing the input text.
 169
 170 Example: The only parser implemented so far is for the
 171 reStructuredText markup.  It is implemented in the
 172 ``docutils/parsers/rst/`` package.
 173
 174 The development and integration of other parsers is possible and
 175 encouraged.
 176
 177
 178 .. _transforms:
 179
 180 Transformer
 181 -----------
 182
 183 The Transformer class, in ``docutils/transforms/__init__.py``, stores
 184 transforms and applies them to documents.  A transformer object is
 185 attached to every new document tree.  The Publisher_ calls
 186 ``Transformer.apply_transforms()`` to apply all stored transforms to
 187 the document tree.  Transforms change the document tree from one form
 188 to another, add to the tree, or prune it.  Transforms resolve
 189 references and footnote numbers, process interpreted text, and do
 190 other context-sensitive processing.
 191
 192 Some transforms are specific to components (Readers, Parsers, Writers,
 193 Input, Output).  Standard component-specific transforms are specified
 194 in the ``default_transforms`` attribute of component classes.  After
 195 the Reader has finished processing, the Publisher_ calls
 196 ``Transformer.populate_from_components()`` with a list of components
 197 and all default transforms are stored.
 198
 199 Each transform is a class in a module in the ``docutils/transforms/``
 200 package, a subclass of ``docutils.transforms.Transform``.  Transform
 201 classes each have a ``default_priority`` attribute which is used by
 202 the Transformer to apply transforms in order (low to high).  The
 203 default priority can be overridden when adding transforms to the
 204 Transformer object.
 205
 206 Transformer responsibilities:
 207
 208 * Apply transforms to the document tree, in priority order.
 209
 210 * Store a mapping of component type name ('reader', 'writer', etc.) to
 211   component objects.  These are used by certain transforms (such as
 212   "components.Filter") to determine suitability.
 213
 214 Transform responsibilities:
 215
 216 * Modify a doctree in-place, either purely transforming one structure
 217   into another, or adding new structures based on the doctree and/or
 218   external data.
 219
 220 Examples of transforms (in the ``docutils/transforms/`` package):
 221
 222 * frontmatter.DocInfo: Conversion of document metadata (bibliographic
 223   information).
 224
 225 * references.AnonymousHyperlinks: Resolution of anonymous references
 226   to corresponding targets.
 227
 228 * parts.Contents: Generates a table of contents for a document.
 229
 230 * document.Merger: Combining multiple populated doctrees into one.
 231   (Not yet implemented or fully understood.)
 232
 233 * document.Splitter: Splits a document into a tree-structure of
 234   subdocuments, perhaps by section.  It will have to transform
 235   references appropriately.  (Neither implemented nor remotely
 236   understood.)
 237
 238 * components.Filter: Includes or excludes elements which depend on a
 239   specific Docutils component.
 240
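The priority ordering described in this section can be sketched as
follows.  This is an illustrative model only: ``ToyTransform``,
``ToyTransformer``, the three example transforms and their priority
values are assumptions made for the sketch, and the real classes in
``docutils/transforms/__init__.py`` carry more state.

```python
class ToyTransform:
    """Base class: subclasses set default_priority and implement apply()."""
    default_priority = None

    def __init__(self, document):
        self.document = document

    def apply(self):
        raise NotImplementedError


class ToyTransformer:
    """Stores transform classes and applies them low priority first."""

    def __init__(self, document):
        self.document = document
        self.transforms = []
        self._serial = 0                     # keeps insertion order for ties

    def add_transform(self, transform_class, priority=None):
        if priority is None:                 # fall back to the class default
            priority = transform_class.default_priority
        self._serial += 1
        self.transforms.append((priority, self._serial, transform_class))

    def apply_transforms(self):
        # Sort on (priority, serial) only, so classes are never compared.
        for _prio, _serial, cls in sorted(self.transforms, key=lambda t: t[:2]):
            cls(self.document).apply()


class Contents(ToyTransform):
    default_priority = 720                   # made-up value
    def apply(self):
        self.document.append("Contents")


class Resolve(ToyTransform):
    default_priority = 500                   # made-up value
    def apply(self):
        self.document.append("Resolve")


class Cleanup(ToyTransform):
    default_priority = 900                   # made-up value
    def apply(self):
        self.document.append("Cleanup")


applied = []                                 # stands in for a document tree
transformer = ToyTransformer(applied)
transformer.add_transform(Contents)
transformer.add_transform(Resolve)
transformer.add_transform(Cleanup, priority=100)   # override the default
transformer.apply_transforms()
print(applied)
```

The overridden ``Cleanup`` runs first even though it was added last,
because only the priority (and then insertion order) decides the
sequence.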
 241
 242 Writers
 243 -------
 244
 245 Writers produce the final output (HTML, XML, TeX, etc.).  Writers
 246 translate the internal `document tree`_ structure into the final data
 247 format, possibly running Writer-specific transforms_ first.
 248
 249 By the time the document gets to the Writer, it should be in final
 250 form.  The Writer's job is simply (and only) to translate from the
 251 Docutils doctree structure to the target format.  Some small
 252 transforms may be required, but they should be local and
 253 format-specific.
 254
 255 Each writer is a module or package exporting a "Writer" class with a
 256 "write" method.  The base "Writer" class can be found in the
 257 ``docutils/writers/__init__.py`` module.
 258
 259 Responsibilities:
 260
 261 * Translate doctree(s) into specific output formats.
 262
 263   - Transform references into format-native forms.
 264
 265 * Write the translated output to the destination I/O.
 266
 267 Examples:
 268
 269 * XML: Various forms, such as:
 270
 271   - Docutils XML (an expression of the internal document tree,
 272     implemented as ``docutils.writers.docutils_xml``).
 273
 274   - DocBook (being implemented in the Docutils sandbox).
 275
 276 * HTML (XHTML implemented as ``docutils.writers.html4css1``).
 277
 278 * PDF (a ReportLab interface is being developed in the Docutils
 279   sandbox).
 280
 281 * TeX (a LaTeX Writer is being implemented in the sandbox).
 282
 283 * Docutils-native pseudo-XML (implemented as
 284   ``docutils.writers.pseudoxml``, used for testing).
 285
 286 * Plain text
 287
 288 * reStructuredText?
 289
 290
 291 Input/Output
 292 ------------
 293
 294 I/O classes provide a uniform API for low-level input and output.
 295 Subclasses will exist for a variety of input/output mechanisms.
 296 However, they can be considered an implementation detail.  Most
 297 applications should be satisfied using one of the convenience
 298 functions associated with the Publisher_.
 299
 300 I/O classes are currently in the preliminary stages; there's a lot of
 301 work yet to be done.  Issues:
 302
 303 * How to represent multi-file input (files & directories) in the API?
 304
 305 * How to represent multi-file output?  Perhaps "Writer" variants, one
 306   for each output distribution type?  Or Output objects with
 307   associated transforms?
 308
 309 Responsibilities:
 310
 311 * Read data from the input source (Input objects) or write data to the
 312   output destination (Output objects).
 313
 314 Examples of input sources:
 315
 316 * A single file on disk or a stream (implemented as
 317   ``docutils.io.FileInput``).
 318
 319 * Multiple files on disk (``MultiFileInput``?).
 320
 321 * Python source files: modules and packages.
 322
 323 * Python strings, as received from a client application
 324   (implemented as ``docutils.io.StringInput``).
 325
 326 Examples of output destinations:
 327
 328 * A single file on disk or a stream (implemented as
 329   ``docutils.io.FileOutput``).
 330
 331 * A tree of directories and files on disk.
 332
 333 * A Python string, returned to a client application (implemented as
 334   ``docutils.io.StringOutput``).
 335
 336 * No output; useful for programmatic applications where only a portion
 337   of the normal output is to be used (implemented as
 338   ``docutils.io.NullOutput``).
 339
 340 * A single tree-shaped data structure in memory.
 341
 342 * Some other set of data structures in memory.
 343
 344
 345 Docutils Package Structure
 346 ==========================
 347
 348 * Package "docutils".
 349
 350   - Module "__init__.py" contains: class "Component", a base class for
 351     Docutils components; class "SettingsSpec", a base class for
 352     specifying runtime settings (used by docutils.frontend); and class
 353     "TransformSpec", a base class for specifying transforms.
 354
 355   - Module "docutils.core" contains facade class "Publisher" and
 356     convenience functions.  See `Publisher`_ above.
 357
 358   - Module "docutils.frontend" provides runtime settings support, for
 359     programmatic use and front-end tools (including configuration file
 360     support, and command-line argument and option processing).
 361
 362   - Module "docutils.io" provides a uniform API for low-level input
 363     and output.  See `Input/Output`_ above.
 364
 365   - Module "docutils.nodes" contains the Docutils document tree
 366     element class library plus tree-traversal Visitor pattern base
 367     classes.  See `Document Tree`_ below.
 368
 369   - Module "docutils.statemachine" contains a finite state machine
 370     specialized for regular-expression-based text filters and parsers.
 371     The reStructuredText parser implementation is based on this
 372     module.
 373
 374   - Module "docutils.urischemes" contains a mapping of known URI
 375     schemes ("http", "ftp", "mailto", etc.).
 376
 377   - Module "docutils.utils" contains utility functions and classes,
 378     including a logger class ("Reporter"; see `Error Handling`_
 379     below).
 380
 381   - Package "docutils.parsers": markup parsers_.
 382
 383     - Function "get_parser_class(parser_name)" returns a parser class
 384       by name.  Class "Parser" is the base class of specific parsers.
 385       (``docutils/parsers/__init__.py``)
 386
 387     - Package "docutils.parsers.rst": the reStructuredText parser.
 388
 389     - Alternate markup parsers may be added.
 390
 391     See `Parsers`_ above.
 392
 393   - Package "docutils.readers": context-aware input readers.
 394
 395     - Function "get_reader_class(reader_name)" returns a reader class
 396       by name or alias.  Class "Reader" is the base class of specific
 397       readers.  (``docutils/readers/__init__.py``)
 398
 399     - Module "docutils.readers.standalone" reads independent document
 400       files.
 401
 402     - Module "docutils.readers.pep" reads PEPs (Python Enhancement
 403       Proposals).
 404
 405     - Module "docutils.readers.doctree" is used to re-read a
 406       previously stored document tree for reprocessing.
 407
 408     - Readers to be added for: Python source code (structure &
 409       docstrings), email, FAQ, and perhaps Wiki and others.
 410
 411     See `Readers`_ above.
 412
 413   - Package "docutils.writers": output format writers.
 414
 415     - Function "get_writer_class(writer_name)" returns a writer class
 416       by name.  Class "Writer" is the base class of specific writers.
 417       (``docutils/writers/__init__.py``)
 418
 419     - Package "docutils.writers.html4css1" is a simple HyperText
 420       Markup Language document tree writer for HTML 4.01 and CSS1.
 421
 422     - Package "docutils.writers.pep_html" generates HTML from
 423       reStructuredText PEPs.
 424
 425     - Package "docutils.writers.s5_html" generates S5/HTML slide
 426       shows.
 427
 428     - Package "docutils.writers.latex2e" writes LaTeX.
 429
 430     - Package "docutils.writers.newlatex2e" also writes LaTeX; it is a
 431       new implementation.
 432
 433     - Module "docutils.writers.docutils_xml" writes the internal
 434       document tree in XML form.
 435
 436     - Module "docutils.writers.pseudoxml" is a simple internal
 437       document tree writer; it writes indented pseudo-XML.
 438
 439     - Module "docutils.writers.null" is a do-nothing writer; it is
 440       used for specialized purposes such as storing the internal
 441       document tree.
 442
 443     - Writers to be added: HTML 3.2 or 4.01-loose, XML (various forms,
 444       such as DocBook), PDF, plaintext, reStructuredText, and perhaps
 445       others.
 446
 447     Subpackages of "docutils.writers" contain modules and data files
 448     (such as stylesheets) that support the individual writers.
 449
 450     See `Writers`_ above.
 451
 452   - Package "docutils.transforms": tree transform classes.
 453
 454     - Class "Transformer" stores transforms and applies them to
 455       document trees.  (``docutils/transforms/__init__.py``)
 456
 457     - Class "Transform" is the base class of specific transforms.
 458       (``docutils/transforms/__init__.py``)
 459
 460     - Each module contains related transform classes.
 461
 462     See `Transforms`_ above.
 463
 464   - Package "docutils.languages": Language modules contain
 465     language-dependent strings and mappings.  They are named for their
 466     language identifier (as defined in `Choice of Docstring Format`_
 467     below), converting dashes to underscores.
 468
 469     - Function "get_language(language_code)" returns the matching
 470       language module.  (``docutils/languages/__init__.py``)
 471
 472     - Modules: en.py (English), de.py (German), fr.py (French), it.py
 473       (Italian), sk.py (Slovak), sv.py (Swedish).
 474
 475     - Other languages to be added.
 476
 477 * Third-party modules: "extras" directory.  These modules are
 478   installed only if they're not already present in the Python
 479   installation.
 480
 481   - ``extras/roman.py`` contains Roman numeral conversion routines.
 482
 483
 484 Front-End Tools
 485 ===============
 486
 487 The ``tools/`` directory contains several front ends for common
 488 Docutils processing.  See `Docutils Front-End Tools`_ for details.
 489
 490 .. _Docutils Front-End Tools:
 491    http://docutils.sourceforge.net/docs/user/tools.html
 492
 493
 494 Document Tree
 495 =============
 496
 497 A single intermediate data structure is used internally by Docutils,
 498 in the interfaces between components; it is defined in the
 499 ``docutils.nodes`` module.  It is not required that this data
 500 structure be used *internally* by any of the components, just
 501 *between* components as outlined in the diagram in the `Docutils
 502 Project Model`_ above.
 503
 504 Custom node types are allowed, provided that either (a) a transform
 505 converts them to standard Docutils nodes before they reach the Writer
 506 proper, or (b) the custom node is explicitly supported by certain
 507 Writers, and is wrapped in a filtered "pending" node.  An example of
 508 condition (a) is the `Python Source Reader`_ (see below), where a
 509 "stylist" transform converts custom nodes.  The HTML ``<meta>`` tag is
 510 an example of condition (b); it is supported by the HTML Writer but
 511 not by others.  The reStructuredText "meta" directive creates a
 512 "pending" node, which contains knowledge that the embedded "meta" node
 513 can only be handled by HTML-compatible writers.  The "pending" node is
 514 resolved by the ``docutils.transforms.components.Filter`` transform,
 515 which checks that the calling writer supports HTML; if it doesn't, the
 516 "pending" node (and enclosed "meta" node) is removed from the
 517 document.
 518
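The handling of condition (b) can be pictured with a small model.  The
sketch below is an assumption-laden illustration, not the code of
``docutils.transforms.components.Filter``: nodes are plain
dictionaries, and ``resolve_pending`` and the ``"format"`` and
``"node"`` keys are hypothetical names.

```python
def resolve_pending(children, supported_formats):
    """Unwrap or drop "pending" nodes depending on what the writer supports."""
    resolved = []
    for node in children:
        if node.get("type") != "pending":
            resolved.append(node)            # ordinary node: keep as-is
        elif node["format"] in supported_formats:
            resolved.append(node["node"])    # supported: keep the enclosed node
        # Unsupported: the pending node and its enclosed node are dropped.
    return resolved


paragraph = {"type": "paragraph", "text": "Hello."}
meta = {"type": "meta", "name": "keywords", "content": "docutils"}
tree = [paragraph, {"type": "pending", "format": "html", "node": meta}]

for_html = resolve_pending(tree, {"html"})
for_latex = resolve_pending(tree, {"latex"})
print(for_html)
print(for_latex)
```

With an HTML-capable writer the "meta" node survives; with any other
writer it disappears together with its "pending" wrapper.
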
 519 The document tree data structure is similar to a DOM tree, but with
 520 specific node names (classes) instead of DOM's generic nodes. The
 521 schema is documented in an XML DTD (eXtensible Markup Language
 522 Document Type Definition), which comes in two parts:
 523
 524 * the Docutils Generic DTD, docutils.dtd_, and
 525
 526 * the OASIS Exchange Table Model, soextbl.dtd_.
 527
 528 The DTD defines a rich set of elements, suitable for many input and
 529 output formats.  The DTD retains all information necessary to
 530 reconstruct the original input text, or a reasonable facsimile
 531 thereof.
 532
 533 See `The Docutils Document Tree`_ for details (incomplete).
 534
 535
 536 Error Handling
 537 ==============
 538
 539 When the parser encounters an error in markup, it inserts a system
 540 message (DTD element "system_message").  There are five levels of
 541 system messages:
 542
 543 * Level-0, "DEBUG": an internal reporting issue.  There is no effect
 544   on the processing.  Level-0 system messages are handled separately
 545   from the others.
 546
 547 * Level-1, "INFO": a minor issue that can be ignored.  There is little
 548   or no effect on the processing.  Typically level-1 system messages
 549   are not reported.
 550
 551 * Level-2, "WARNING": an issue that should be addressed.  If ignored,
 552   there may be minor problems with the output.  Typically level-2
 553   system messages are reported but do not halt processing.
 554
 555 * Level-3, "ERROR": a major issue that should be addressed.  If
 556   ignored, the output will contain unpredictable errors.  Typically
 557   level-3 system messages are reported but do not halt processing.
 558
 559 * Level-4, "SEVERE": a critical error that must be addressed.
 560   Typically level-4 system messages are turned into exceptions which
 561   do halt processing.  If ignored, the output will contain severe
 562   errors.
 563
 564 Although the initial message levels were devised independently, they
 565 have a strong correspondence to `VMS error condition severity
 566 levels`_; the names in quotes for levels 1 through 4 were borrowed
 567 from VMS.  Error handling has since been influenced by the `log4j
 568 project`_.
 569
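How the five levels typically interact with reporting and halting can
be sketched as below.  This is a simplified model rather than the
"Reporter" class in ``docutils.utils``: ``ToyReporter``,
``ToySystemMessage``, the two threshold parameters and the message
texts are assumptions chosen to match the "typically" wording above
(report from level 2, halt at level 4).

```python
LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "SEVERE")


class ToySystemMessage(Exception):
    """Raised when a message reaches the halt level."""


class ToyReporter:
    def __init__(self, report_level=2, halt_level=4):
        self.report_level = report_level     # lowest level that is reported
        self.halt_level = halt_level         # lowest level that halts processing
        self.reported = []

    def system_message(self, level, text):
        message = f"{LEVELS[level]}/{level}: {text}"
        if level >= self.halt_level:
            raise ToySystemMessage(message)  # turned into an exception
        if level >= self.report_level:
            self.reported.append(message)    # reported, processing continues
        # Lower levels (INFO, and DEBUG in this simplified model) are not reported.
        return message


reporter = ToyReporter()
reporter.system_message(1, "minor issue")
reporter.system_message(2, "should be addressed")
reporter.system_message(3, "major issue")
try:
    reporter.system_message(4, "critical error")
    halted = False
except ToySystemMessage:
    halted = True
print(reporter.reported, halted)
```

Note that the specification says level-0 messages are handled
separately; the sketch does not model that separate path.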
 570
 571 Python Source Reader
 572 ====================
 573
 574 The Python Source Reader ("PySource") is the Docutils component that
 575 reads Python source files, extracts docstrings in context, then
 576 parses, links, and assembles the docstrings into a cohesive whole.  It
 577 is a major and non-trivial component, currently under experimental
 578 development in the Docutils sandbox.  High-level design issues are
 579 presented here.
 580
 581
 582 Processing Model
 583 ----------------
 584
 585 This model will evolve over time, incorporating experience and
 586 discoveries.
 587
 588 1. The PySource Reader uses an Input class to read in Python packages
 589    and modules, into a tree of strings.
 590
 591 2. The Python modules are parsed, converting the tree of strings into
 592    a tree of abstract syntax trees with docstring nodes.
 593
 594 3. The abstract syntax trees are converted into an internal
 595    representation of the packages/modules.  Docstrings are extracted,
 596    as well as code structure details.  See `AST Mining`_ below.
 597    Namespaces are constructed for lookup in step 6.
 598
 599 4. One at a time, the docstrings are parsed, producing standard
 600    Docutils doctrees.
 601
 602 5. PySource assembles all the individual docstrings' doctrees into a
 603    Python-specific custom Docutils tree paralleling the
 604    package/module/class structure; this is a custom Reader-specific
 605    internal representation (see the `Docutils Python Source DTD`_).
 606    Namespaces must be merged: Python identifiers, hyperlink targets.
 607
 608 6. Cross-references from docstrings (interpreted text) to Python
 609    identifiers are resolved according to the Python namespace lookup
 610    rules.  See `Identifier Cross-References`_ below.
 611
 612 7. A "Stylist" transform is applied to the custom doctree (by the
 613    Transformer_), custom nodes are rendered using standard nodes as
 614    primitives, and a standard document tree is emitted.  See `Stylist
 615    Transforms`_ below.
 616
 617 8. Other transforms are applied to the standard doctree by the
 618    Transformer_.
 619
 620 9. The standard doctree is sent to a Writer, which translates the
 621    document into a concrete format (HTML, PDF, etc.).
 622
 623 10. The Writer uses an Output class to write the resulting data to its
 624     destination (disk file, directories and files, etc.).
 625
 626
 627 AST Mining
 628 ----------
 629
 630 Abstract Syntax Tree mining code will be written (or adapted) that
 631 scans a parsed Python module, and returns an ordered tree containing
 632 the names, docstrings (including attribute and additional docstrings;
 633 see below), and additional info (in parentheses below) of all of the
 634 following objects:
 635
 636 * packages
 637 * modules
 638 * module attributes (+ initial values)
 639 * classes (+ inheritance)
 640 * class attributes (+ initial values)
 641 * instance attributes (+ initial values)
 642 * methods (+ parameters & defaults)
 643 * functions (+ parameters & defaults)
 644
 645 (Extract comments too?  For example, comments at the start of a module
 646 would be a good place for bibliographic field lists.)
 647
 648 In order to evaluate interpreted text cross-references, namespaces for
 649 each of the above will also be required.
 650
 651 See the python-dev/docstring-develop thread "AST mining", started on
 652 2001-08-14.
 653
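As a rough illustration of what such mining code could return, the
sketch below walks a parsed module with the standard library ``ast``
module (a swapped-in, present-day tool, not the one this PEP had in
mind).  ``mine`` is a hypothetical helper; it covers only modules,
classes (with base names), methods and functions (with parameter
names), and leaves out attributes, default values and packages.

```python
import ast


def mine(source):
    """Return an ordered list of (kind, name, docstring, extra) tuples."""
    found = []

    def visit(node, prefix):
        for child in node.body:
            if isinstance(child, ast.ClassDef):
                name = prefix + child.name
                # Only simple base names are recorded in this sketch.
                bases = [b.id for b in child.bases if isinstance(b, ast.Name)]
                found.append(("class", name, ast.get_docstring(child), bases))
                visit(child, name + ".")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if isinstance(node, ast.ClassDef) else "function"
                params = [a.arg for a in child.args.args]
                found.append((kind, prefix + child.name,
                              ast.get_docstring(child), params))

    tree = ast.parse(source)
    found.append(("module", "(module)", ast.get_docstring(tree), None))
    visit(tree, "")
    return found


SOURCE = '''\
"""Module docstring."""

class Base:
    """Base class."""

class Child(Base):
    """Child class."""

    def method(self, x, y=1):
        """A method."""

def func(a, b=2):
    """A function."""
'''

mined = mine(SOURCE)
for entry in mined:
    print(entry)
```

Because the source is parsed rather than imported, the order of
definitions is preserved in the result.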
 654
 655 Docstring Extraction Rules
 656 --------------------------
 657
 658 1. What to examine:
 659
 660    a) If the "``__all__``" variable is present in the module being
 661       documented, only identifiers listed in "``__all__``" are
 662       examined for docstrings.
 663
 664    b) In the absence of "``__all__``", all identifiers are examined,
 665       except those whose names are private (names begin with "_" but
 666       don't begin and end with "__").
 667
 668    c) 1a and 1b can be overridden by runtime settings.
 669
 670 2. Where:
 671
 672    Docstrings are string literal expressions, and are recognized in
 673    the following places within Python modules:
 674
 675    a) At the beginning of a module, function definition, class
 676       definition, or method definition, after any comments.  This is
 677       the standard for Python ``__doc__`` attributes.
 678
 679    b) Immediately following a simple assignment at the top level of a
 680       module, class definition, or ``__init__`` method definition,
 681       after any comments.  See `Attribute Docstrings`_ below.
 682
 683    c) Additional string literals found immediately after the
 684       docstrings in (a) and (b) will be recognized, extracted, and
 685       concatenated.  See `Additional Docstrings`_ below.
 686
 687    d) @@@ 2.2-style "properties" with attribute docstrings?  Wait for
 688       syntax?
 689
 690 3. How:
 691
 692    Whenever possible, Python modules should be parsed by Docutils, not
 693    imported.  There are several reasons:
 694
 695    - Importing untrusted code is inherently insecure.
 696
 697    - Information from the source is lost when using introspection to
 698      examine an imported module, such as comments and the order of
 699      definitions.
 700
 701    - Docstrings are to be recognized in places where the byte-code
 702      compiler ignores string literal expressions (2b and 2c above),
 703      meaning importing the module will lose these docstrings.
 704
 705    Of course, standard Python parsing tools such as the "parser"
 706    library module should be used.
 707
 708    When the Python source code for a module is not available
 709    (i.e. only the ``.pyc`` file exists) or for C extension modules, to
 710    access docstrings the module can only be imported, and any
 711    limitations must be lived with.
 712
 713 Since attribute docstrings and additional docstrings are ignored by
 714 the Python byte-code compiler, no namespace pollution or runtime bloat
 715 will result from their use.  They are not assigned to ``__doc__`` or
 716 to any other attribute.  The initial parsing of a module may take a
 717 slight performance hit.
 718
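Rules 1a and 1b can be sketched without importing the module, in line
with rule 3.  The helpers below are hypothetical (``is_private`` and
``names_to_examine`` are not Docutils functions), use the standard
``ast`` module, handle only a literal ``__all__`` assignment, and
ignore the runtime-setting overrides of rule 1c.

```python
import ast


def is_private(name):
    # Private: begins with "_" but does not both begin and end with "__".
    return name.startswith("_") and not (
        name.startswith("__") and name.endswith("__"))


def names_to_examine(source):
    """Return the top-level names whose docstrings would be examined."""
    tree = ast.parse(source)
    defined, exported = [], None
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.append(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if not isinstance(target, ast.Name):
                    continue
                if target.id == "__all__":
                    # Assumes __all__ is a plain list or tuple of strings.
                    exported = ast.literal_eval(node.value)
                else:
                    defined.append(target.id)
    if exported is not None:                 # rule 1a
        return [name for name in defined if name in exported]
    return [name for name in defined if not is_private(name)]   # rule 1b


with_all = names_to_examine(
    "__all__ = ['public']\n\ndef public(): pass\n\ndef other(): pass\n")
without_all = names_to_examine(
    "def public(): pass\n\ndef _hidden(): pass\n\n__version__ = '1.0'\n")
print(with_all, without_all)
```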
 719
 720 Attribute Docstrings
 721 ''''''''''''''''''''
 722
 723 (This is a simplified version of PEP 224 [#PEP-224]_.)
 724
 725 A string literal immediately following an assignment statement is
 726 interpreted by the docstring extraction machinery as the docstring of
 727 the target of the assignment statement, under the following
 728 conditions:
 729
 730 1. The assignment must be in one of the following contexts:
 731
 732    a) At the top level of a module (i.e., not nested inside a compound
 733       statement such as a loop or conditional): a module attribute.
 734
 735    b) At the top level of a class definition: a class attribute.
 736
 737    c) At the top level of the "``__init__``" method definition of a
 738       class: an instance attribute.  Instance attributes assigned in
 739       other methods are assumed to be implementation details.  (@@@
 740       ``__new__`` methods?)
 741
 742    d) A function attribute assignment at the top level of a module or
 743       class definition.
 744
 745    Since each of the above contexts is at the top level (i.e., in the
 746    outermost suite of a definition), it may be necessary to place
 747    dummy assignments for attributes assigned conditionally or in a
 748    loop.
 749
 750 2. The assignment must be to a single target, not to a list or a tuple
 751    of targets.
 752
 753 3. The form of the target:
 754
 755    a) For contexts 1a and 1b above, the target must be a simple
 756       identifier (not a dotted identifier, a subscripted expression,
 757       or a sliced expression).
 758
 759    b) For context 1c above, the target must be of the form
 760       "``self.attrib``", where "``self``" matches the "``__init__``"
 761       method's first parameter (the instance parameter) and "attrib"
 762       is a simple identifier as in 3a.
 763
 764    c) For context 1d above, the target must be of the form
 765       "``name.attrib``", where "``name``" matches an already-defined
 766       function or method name and "attrib" is a simple identifier as
 767       in 3a.
 768
 769 Blank lines may be used after attribute docstrings to emphasize the
 770 connection between the assignment and the docstring.
 771
 772 Examples::
 773
 774     g = 'module attribute (module-global variable)'
 775     """This is g's docstring."""
 776
 777     class AClass:
 778
 779         c = 'class attribute'
 780         """This is AClass.c's docstring."""
 781
 782         def __init__(self):
 783             """Method __init__'s docstring."""
 784
 785             self.i = 'instance attribute'
 786             """This is self.i's docstring."""
 787
 788     def f(x):
 789         """Function f's docstring."""
 790         return x**2
 791
 792     f.a = 1
 793     """Function attribute f.a's docstring."""
 794
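The conditions above can be checked mechanically.  The following is a
minimal sketch using the standard ``ast`` module, not the Docutils
extraction machinery: ``documented`` and ``extract`` are hypothetical
helpers, function attributes are only handled at module level, and any
module-level function name is accepted for context 1d regardless of
where it is defined.

```python
import ast


def is_string_expr(node):
    return (isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str))


def documented(body, owners=()):
    """Yield (target, docstring) for single-target assignments followed by a string."""
    for stmt, following in zip(body, body[1:]):
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and is_string_expr(following)):
            continue
        target = stmt.targets[0]
        if isinstance(target, ast.Name):                      # form 3a
            yield target.id, following.value.value
        elif (isinstance(target, ast.Attribute)               # forms 3b and 3c
              and isinstance(target.value, ast.Name)
              and target.value.id in owners):
            yield f"{target.value.id}.{target.attr}", following.value.value


def extract(source):
    tree = ast.parse(source)
    functions = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    found = dict(documented(tree.body, functions))            # contexts 1a, 1d
    for cls in tree.body:
        if not isinstance(cls, ast.ClassDef):
            continue
        for name, doc in documented(cls.body):                # context 1b
            found[f"{cls.name}.{name}"] = doc
        for func in cls.body:
            if (isinstance(func, ast.FunctionDef)
                    and func.name == "__init__" and func.args.args):
                self_name = func.args.args[0].arg             # instance parameter
                for name, doc in documented(func.body, {self_name}):
                    if name.startswith(self_name + "."):      # context 1c
                        attrib = name[len(self_name) + 1:]
                        found[f"{cls.name}.{attrib}"] = doc
    return found


SOURCE = '''
g = 'module attribute (module-global variable)'
"""This is g's docstring."""

class AClass:

    c = 'class attribute'
    """This is AClass.c's docstring."""

    def __init__(self):
        """Method __init__'s docstring."""

        self.i = 'instance attribute'
        """This is self.i's docstring."""

def f(x):
    """Function f's docstring."""
    return x**2

f.a = 1
"""Function attribute f.a's docstring."""
'''

attribute_docs = extract(SOURCE)
for name, doc in attribute_docs.items():
    print(name, "->", doc)
```

Tuple targets and chained assignments fall through both branches of
``documented`` and are skipped, which corresponds to condition 2.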
 795
 796 Additional Docstrings
 797 '''''''''''''''''''''
 798
 799 (This idea was adapted from PEP 216 [#PEP-216]_.)
 800
 801 Many programmers would like to make extensive use of docstrings for
 802 API documentation.  However, docstrings do take up space in the
 803 running program, so some programmers are reluctant to "bloat up" their
 804 code.  Also, not all API documentation is applicable to interactive
 805 environments, where ``__doc__`` would be displayed.
 806
 807 Docutils' docstring extraction tools will concatenate all string
 808 literal expressions which appear at the beginning of a definition or
 809 after a simple assignment.  Only the first strings in definitions will
 810 be available as ``__doc__``, and can be used for brief usage text
 811 suitable for interactive sessions; subsequent string literals and all
 812 attribute docstrings are ignored by the Python byte-code compiler and
 813 may contain more extensive API information.
 814
 815 Example::
 816
 817     def function(arg):
 818         """This is __doc__, function's docstring."""
 819         """
 820         This is an additional docstring, ignored by the byte-code
 821         compiler, but extracted by Docutils.
 822         """
 823         pass
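
     To illustrate the extraction step, here is a minimal sketch using
     the standard ``ast`` module.  It is not Docutils' implementation;
     the helper name ``leading_docstrings`` is hypothetical, and how the
     collected strings are joined is left open, since the separator is
     not specified here.  It assumes Python 3.8 or later, where string
     literals parse as ``ast.Constant`` nodes::

         import ast


         def leading_docstrings(node):
             """Return every string literal that opens a definition's body."""
             strings = []
             for stmt in node.body:
                 if (isinstance(stmt, ast.Expr)
                         and isinstance(stmt.value, ast.Constant)
                         and isinstance(stmt.value.value, str)):
                     strings.append(stmt.value.value)
                 else:
                     break  # the first non-string statement ends the run
             return strings

     For the function above, the first element of the returned list is
     the text that Python itself exposes as ``__doc__``; the second is
     the additional docstring, which exists only in the source.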
 824
 825 .. topic:: Issue: ``from __future__ import``
 826
 827    This would break "``from __future__ import``" statements introduced
 828    in Python 2.1 for multiple module docstrings (main docstring plus
 829    additional docstring(s)).  The Python Reference Manual specifies:
 830
 831        A future statement must appear near the top of the module.  The
 832        only lines that can appear before a future statement are:
 833
 834        * the module docstring (if any),
 835        * comments,
 836        * blank lines, and
 837        * other future statements.
 838
 839    Resolution?
 840
 841    1. Should we search for docstrings after a ``__future__``
 842       statement?  Very ugly.
 843
 844    2. Redefine ``__future__`` statements to allow multiple preceding
 845       string literals?
 846
 847    3. Or should we not even worry about this?  There probably
 848       shouldn't be ``__future__`` statements in production code, after
 849       all.  Perhaps modules with ``__future__`` statements will simply
 850       have to put up with the single-docstring limitation.
 851
 852
 853 Choice of Docstring Format
 854 --------------------------
 855
 856 Rather than force everyone to use a single docstring format, multiple
 857 input formats are allowed by the processing system.  A special
 858 variable, ``__docformat__``, may appear at the top level of a module
 859 before any function or class definitions.  Over time or through
 860 decree, a standard format or set of formats should emerge.
 861
 862 A module's ``__docformat__`` variable applies only to the objects
 863 defined in the module's file.  In particular, the ``__docformat__``
 864 variable in a package's ``__init__.py`` file does not apply to objects
 865 defined in subpackages and submodules.
 866
 867 The ``__docformat__`` variable is a string containing the name of the
 868 format being used, a case-insensitive string matching the input
 869 parser's module or package name (i.e., the same name as required to
 870 "import" the module or package), or a registered alias.  If no
 871 ``__docformat__`` is specified, the default format is "plaintext" for
 872 now; this may be changed to the standard format if one is ever
 873 established.
 874
 875 The ``__docformat__`` string may contain an optional second field,
 876 separated from the format name (first field) by a single space: a
 877 case-insensitive language identifier as defined in RFC 1766.  A
 878 typical language identifier consists of a 2-letter language code from
 879 `ISO 639`_ (3-letter codes used only if no 2-letter code exists; RFC
 880 1766 is currently being revised to allow 3-letter codes).  If no
 881 language identifier is specified, the default is "en" for English.
 882 The language identifier is passed to the parser and can be used for
 883 language-dependent markup features.
 884
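     The two-field layout and its defaults can be sketched as follows.
     This is only an illustration: ``parse_docformat`` is a hypothetical
     helper, alias lookup is omitted, and the sketch is more lenient
     than the text above in that it accepts any run of whitespace
     between the fields::

         def parse_docformat(value=None):
             """Split a __docformat__ string into (format_name, language)."""
             if not value or not value.strip():
                 # No __docformat__ given: fall back to both defaults.
                 return ("plaintext", "en")
             fields = value.split()
             format_name = fields[0].lower()  # names are case-insensitive
             language = fields[1].lower() if len(fields) > 1 else "en"
             return (format_name, language)

     Under these assumptions, ``parse_docformat("reStructuredText")``
     yields the format name in lower case with the default language
     "en", and a module without ``__docformat__`` falls back to
     "plaintext".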
 885
 886 Identifier Cross-References
 887 ---------------------------
 888
 889 In Python docstrings, interpreted text is used to classify and mark up
 890 program identifiers, such as the names of variables, functions,
 891 classes, and modules.  If the identifier alone is given, its role is
 892 inferred implicitly according to the Python namespace lookup rules.
 893 For functions and methods (even when dynamically assigned),
 894 parentheses ('()') may be included::
 895
 896     This function uses `another()` to do its work.
 897
 898 For class, instance and module attributes, dotted identifiers are used
 899 when necessary.  For example (using reStructuredText markup)::
 900
 901     class Keeper(Storer):
 902
 903         """
 904         Extend `Storer`.  Class attribute `instances` keeps track
 905         of the number of `Keeper` objects instantiated.
 906         """
 907
 908         instances = 0
 909         """How many `Keeper` objects are there?"""
 910
 911         def __init__(self):
 912             """
 913             Extend `Storer.__init__()` to keep track of instances.
 914
 915             Keep count in `Keeper.instances`, data in `self.data`.
 916             """
 917             Storer.__init__(self)
 918             Keeper.instances += 1
 919
 920             self.data = []
 921             """Store data in a list, most recent last."""
 922
 923         def store_data(self, data):
 924             """
 925             Extend `Storer.store_data()`; append new `data` to a
 926             list (in `self.data`).
 927             """
 928             self.data.append(data)
 929
 930 Each identifier quoted with backquotes ("`") will become a
 931 reference to the definition of that identifier.
 932
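     As a rough illustration of the first step, collecting the
     identifiers that a docstring refers to, consider the sketch below.
     It is an assumption-laden simplification: the helper name
     ``referenced_identifiers`` and the regular expression are
     hypothetical, explicit roles are not handled, and resolving each
     name through the Python namespace lookup rules is not attempted::

         import re

         # Text between single backquotes; doubled backquotes (inline
         # literals) are excluded by the lookaround assertions.
         _INTERPRETED = re.compile(r"(?<!`)`([^`]+)`(?!`)")


         def referenced_identifiers(docstring):
             """Return the identifiers marked up as interpreted text."""
             names = []
             for match in _INTERPRETED.finditer(docstring):
                 text = match.group(1).strip()
                 if text.endswith("()"):
                     text = text[:-2]  # optional call parentheses
                 names.append(text)
             return names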
 933
 934 Stylist Transforms
 935 ------------------
 936
 937 Stylist transforms are specialized transforms specific to the PySource
 938 Reader.  The PySource Reader doesn't have to make any decisions as to
 939 style; it just produces a logically constructed document tree, parsed
 940 and linked, including custom node types.  Stylist transforms
 941 understand the custom nodes created by the Reader and convert them
 942 into standard Docutils nodes.
 943
 944 Multiple Stylist transforms may be implemented and one can be chosen
 945 at runtime (through a "--style" or "--stylist" command-line option).
 946 Each Stylist transform implements a different layout or style; thus
 947 the name.  They decouple the context-understanding part of the Reader
 948 from the layout-generating part of processing, resulting in a more
 949 flexible and robust system.  This also serves to "separate style from
 950 content", the SGML/XML ideal.
 951
 952 By keeping the piece of code that does the styling small and modular,
 953 it becomes much easier for people to roll their own styles.  The
 954 "barrier to entry" is too high with existing tools; extracting the
 955 stylist code will lower the barrier considerably.
 956
 957
 958 ==========================
 959  References and Footnotes
 960 ==========================
 961
 962 .. [#PEP-256] PEP 256, Docstring Processing System Framework, Goodger
 963    (http://www.python.org/peps/pep-0256.html)
 964
 965 .. [#PEP-224] PEP 224, Attribute Docstrings, Lemburg
 966    (http://www.python.org/peps/pep-0224.html)
 967
 968 .. [#PEP-216] PEP 216, Docstring Format, Zadka
 969    (http://www.python.org/peps/pep-0216.html)
 970
 971 .. _docutils.dtd:
 972    http://docutils.sourceforge.net/docs/ref/docutils.dtd
 973
 974 .. _soextbl.dtd:
 975    http://docutils.sourceforge.net/docs/ref/soextblx.dtd
 976
 977 .. _The Docutils Document Tree:
 978    http://docutils.sourceforge.net/docs/ref/doctree.html
 979
 980 .. _VMS error condition severity levels:
 981    http://www.openvms.compaq.com:8000/73final/5841/841pro_027.html
 982    #error_cond_severity
 983
 984 .. _log4j project: http://logging.apache.org/log4j/docs/index.html
 985
 986 .. _Docutils Python Source DTD:
 987    http://docutils.sourceforge.net/docs/dev/pysource.dtd
 988
 989 .. _ISO 639: http://www.loc.gov/standards/iso639-2/englangn.html
 990
 991 .. _Python Doc-SIG: http://www.python.org/sigs/doc-sig/
 992
 993
 994
 995 ==================
 996  Project Web Site
 997 ==================
 998
 999 A SourceForge project has been set up for this work at
1000 http://docutils.sourceforge.net/.
1001
1002
1003 ===========
1004  Copyright
1005 ===========
1006
1007 This document has been placed in the public domain.
1008
1009
1010 ==================
1011  Acknowledgements
1012 ==================
1013
1014 This document borrows ideas from the archives of the `Python
1015 Doc-SIG`_.  Thanks to all members past & present.
1016
1017
1018 \f
1019 ..
1020    Local Variables:
1021    mode: indented-text
1022    indent-tabs-mode: nil
1023    sentence-end-double-space: t
1024    fill-column: 70
1025    End: