pylit.py

   1 #!/usr/bin/env python
   2 # -*- coding: iso-8859-1 -*-
   3
   4 # pylit.py
   5 # ********
   6 # Literate programming with reStructuredText
   7 # ++++++++++++++++++++++++++++++++++++++++++
   8 #
   9 # :Date:      $Date: 2011-10-12 12:51:40 +0200 (Mi, 12. Okt 2011) $
  10 # :Revision:  $Revision: 125 $
  11 # :URL:       $URL: svn+ssh://svn.berlios.de/svnroot/repos/pylit/trunk/src/pylit.py $
  12 # :Copyright: © 2005, 2007 Günter Milde.
  13 #             Released without warranty under the terms of the
  14 #             GNU General Public License (v. 2 or later)
  15 #
  16 # ::
  17
  18 """pylit: bidirectional text <-> code converter
  19
  20 Covert between a *text source* with embedded computer code and a *code source*
  21 with embedded documentation.
  22 """
  23
  24 # .. contents::
  25 #
  26 # Frontmatter
  27 # ===========
  28 #
  29 # Changelog
  30 # ---------
  31 #
  32 # .. class:: borderless
  33 #
  34 # ======  ==========  ===========================================================
  35 # 0.1     2005-06-29  Initial version.
  36 # 0.1.1   2005-06-30  First literate version.
  37 # 0.1.2   2005-07-01  Object orientated script using generators.
  38 # 0.1.3   2005-07-10  Two state machine (later added 'header' state).
  39 # 0.2b    2006-12-04  Start of work on version 0.2 (code restructuring).
  40 # 0.2     2007-01-23  Published at http://pylit.berlios.de.
  41 # 0.2.1   2007-01-25  Outsourced non-core documentation to the PyLit pages.
  42 # 0.2.2   2007-01-26  New behaviour of `diff` function.
  43 # 0.2.3   2007-01-29  New `header` methods after suggestion by Riccardo Murri.
  44 # 0.2.4   2007-01-31  Raise Error if code indent is too small.
  45 # 0.2.5   2007-02-05  New command line option --comment-string.
  46 # 0.2.6   2007-02-09  Add section with open questions,
  47 #                     Code2Text: let only blank lines (no comment str)
  48 #                     separate text and code,
  49 #                     fix `Code2Text.header`.
  50 # 0.2.7   2007-02-19  Simplify `Code2Text.header`,
  51 #                     new `iter_strip` method replacing a lot of ``if``-s.
  52 # 0.2.8   2007-02-22  Set `mtime` of outfile to the one of infile.
  53 # 0.3     2007-02-27  New `Code2Text` converter after an idea by Riccardo Murri,
  54 #                     explicit `option_defaults` dict for easier customisation.
  55 # 0.3.1   2007-03-02  Expand hard-tabs to prevent errors in indentation,
  56 #                     `Text2Code` now also works on blocks,
  57 #                     removed dependency on SimpleStates module.
  58 # 0.3.2   2007-03-06  Bug fix: do not set `language` in `option_defaults`
  59 #                     renamed `code_languages` to `languages`.
  60 # 0.3.3   2007-03-16  New language css,
  61 #                     option_defaults -> defaults = optparse.Values(),
  62 #                     simpler PylitOptions: don't store parsed values,
  63 #                     don't parse at initialisation,
  64 #                     OptionValues: return `None` for non-existing attributes,
  65 #                     removed -infile and -outfile, use positional arguments.
  66 # 0.3.4   2007-03-19  Documentation update,
  67 #                     separate `execute` function.
  68 #         2007-03-21  Code cleanup in `Text2Code.__iter__`.
  69 # 0.3.5   2007-03-23  Removed "css" from known languages after learning that
  70 #                     there is no C++ style "// " comment string in CSS2.
  71 # 0.3.6   2007-04-24  Documentation update.
  72 # 0.4     2007-05-18  Implement Converter.__iter__ as stack of iterator
  73 #                     generators. Iterating over a converter instance now
  74 #                     yields lines instead of blocks.
  75 #                     Provide "hooks" for pre- and postprocessing filters.
  76 #                     Rename states to reduce confusion with formats:
  77 #                     "text" -> "documentation", "code" -> "code_block".
  78 # 0.4.1   2007-05-22  Converter.__iter__: cleanup and reorganisation,
  79 #                     rename parent class Converter -> TextCodeConverter.
  80 # 0.4.2   2007-05-23  Merged Text2Code.converter and Code2Text.converter into
  81 #                     TextCodeConverter.converter.
  82 # 0.4.3   2007-05-30  Replaced use of defaults.code_extensions with
  83 #                     values.languages.keys().
  84 #                     Removed spurious `print` statement in code_block_handler.
  85 #                     Added basic support for 'c' and 'css' languages
  86 #                     with `dumb_c_preprocessor`_ and `dumb_c_postprocessor`_.
  87 # 0.5     2007-06-06  Moved `collect_blocks`_ out of `TextCodeConverter`_,
  88 #                     bug fix: collect all trailing blank lines into a block.
  89 #                     Expand tabs with `expandtabs_filter`_.
  90 # 0.6     2007-06-20  Configurable code-block marker (default ``::``)
  91 # 0.6.1   2007-06-28  Bug fix: reset self.code_block_marker_missing.
  92 # 0.7     2007-12-12  prepending an empty string to sys.path in run_doctest()
  93 #                     to allow imports from the current working dir.
  94 # 0.7.1   2008-01-07  If outfile does not exist, do a round-trip conversion
  95 #                     and report differences (as with outfile=='-').
  96 # 0.7.2   2008-01-28  Do not add missing code-block separators with
  97 #                     `doctest_run` on the code source. Keeps lines consistent.
  98 # 0.7.3   2008-04-07  Use value of code_block_marker for insertion of missing
  99 #                     transition marker in Code2Text.code_block_handler
 100 #                     Add "shell" to defaults.languages
 101 # 0.7.4   2008-06-23  Add "latex" to defaults.languages
 102 # 0.7.5   2009-05-14  Bugfix: ignore blank lines in test for end of code block
 103 # 0.7.6   2009-12-15  language-dependent code-block markers (after a
 104 #                     `feature request and patch by jrioux`_),
 105 #                     use DefaultDict for language-dependent defaults,
 106 #                     new defaults setting `add_missing_marker`_.
 107 # 0.7.7   2010-06-23  New command line option --codeindent.
 108 # 0.7.8   2011-03-30  bugfix: do not overwrite custom `add_missing_marker` value,
 109 #                     allow directive options following the 'code' directive.
 110 # 0.7.9   2011-04-05  Decode doctest string if 'magic comment' gives encoding.
 111 # 0.7.10  2013-06-07  Add "lua" to defaults.languages
 112 # ======  ==========  ===========================================================
 113 #
 114 # ::
 115
 116 _version = "0.7.9"
 117
 118 __docformat__ = 'restructuredtext'
 119
 120
 121 # Introduction
 122 # ------------
 123 #
 124 # PyLit is a bidirectional converter between two formats of a computer
 125 # program source:
 126 #
 127 # * a (reStructured) text document with program code embedded in
 128 #   *code blocks*, and
 129 # * a compilable (or executable) code source with *documentation*
 130 #   embedded in comment blocks
 131 #
 132 #
 133 # Requirements
 134 # ------------
 135 #
 136 # ::
 137
 138 import __builtin__, os, sys
 139 import re, optparse
 140
 141
 142 # DefaultDict
 143 # ~~~~~~~~~~~
 144 # As `collections.defaultdict` is only introduced in Python 2.5, we
 145 # define a simplified version of the dictionary with default from
 146 # http://code.activestate.com/recipes/389639/
 147 # ::
 148
 149 class DefaultDict(dict):
 150     """Minimalistic Dictionary with default value."""
 151     def __init__(self, default=None, *args, **kwargs):
 152         self.update(dict(*args, **kwargs))
 153         self.default = default
 154
 155     def __getitem__(self, key):
 156         return self.get(key, self.default)
 157
 158
 159 # Defaults
 160 # ========
 161 #
 162 # The `defaults` object provides a central repository for default
 163 # values and their customisation. ::
 164
 165 defaults = optparse.Values()
 166
 167 # It is used for
 168 #
 169 # * the initialisation of data arguments in TextCodeConverter_ and
 170 #   PylitOptions_
 171 #
 172 # * completion of command line options in `PylitOptions.complete_values`_.
 173 #
 174 # This allows the easy creation of back-ends that customise the
 175 # defaults and then call `main`_ e.g.:
 176 #
 177 # >>> import pylit
 178 # >>> pylit.defaults.comment_string = "## "
 179 # >>> pylit.defaults.codeindent = 4
 180 # >>> pylit.main()
 181 #
 182 # The following default values are defined in pylit.py:
 183 #
 184 # languages
 185 # ---------
 186 #
 187 # Mapping of code file extensions to code language::
 188
 189 defaults.languages  = DefaultDict("python", # fallback language
 190                                   {".c":   "c",
 191                                    ".cc":  "c++",
 192                                    ".css": "css",
 193                                    ".lua": "lua",
 194                                    ".py":  "python",
 195                                    ".sh":  "shell",
 196                                    ".sl":  "slang",
 197                                    ".sty": "latex",
 198                                    ".tex": "latex"
 199                                   })
 200
 201 # Will be overridden by the ``--language`` command line option.
 202 #
 203 # The first argument is the fallback language, used if there is no
 204 # matching extension (e.g. if pylit is used as filter) and no
 205 # ``--language`` is specified. It can be changed programmatically by
 206 # assignment to the ``.default`` attribute, e.g.
 207 #
 208 # >>> defaults.languages.default='c++'
 209 #
 210 #
 211 # .. _text_extension:
 212 #
 213 # text_extensions
 214 # ---------------
 215 #
 216 # List of known extensions of (reStructured) text files. The first
 217 # extension in this list is used by the `_get_outfile_name`_ method to
 218 # generate a text output filename::
 219
 220 defaults.text_extensions = [".txt", ".rst"]
 221
 222
 223 # comment_strings
 224 # ---------------
 225 #
 226 # Comment strings for known languages. Used in Code2Text_ to recognise
 227 # text blocks and in Text2Code_ to format text blocks as comments.
 228 # Defaults to ``'# '``.
 229 #
 230 # **Comment strings include trailing whitespace.** ::
 231
 232 defaults.comment_strings = DefaultDict('# ',
 233                                        {"css":    '// ',
 234                                         "c":      '// ',
 235                                         "c++":    '// ',
 236                                         "lua":    '-- ',
 237                                         "latex":  '% ',
 238                                         "python": '# ',
 239                                         "shell":  '# ',
 240                                         "slang":  '% '
 241                                        })
 242
 243
 244 # header_string
 245 # -------------
 246 #
 247 # Marker string for a header code block in the text source. No trailing
 248 # whitespace needed as indented code follows.
 249 # Must be a valid rst directive that accepts code on the same line, e.g.
 250 # ``'..admonition::'``.
 251 #
 252 # Default is a comment marker::
 253
 254 defaults.header_string = '..'
 255
 256
 257 # .. _code_block_marker:
 258 #
 259 # code_block_markers
 260 # ------------------
 261 #
 262 # Markup at the end of a documentation block.
 263 # Default is Docutils' marker for a `literal block`_::
 264
 265 defaults.code_block_markers = DefaultDict('::')
 266
 267 # The `code_block_marker` string is `inserted into a regular expression`_.
 268 # Language-specific markers can be defined programmatically, e.g. in a
 269 # wrapper script.
 270 #
 271 # In a document where code examples are only one of several uses of
 272 # literal blocks, it is more appropriate to single out the source code
 273 # ,e.g. with the double colon at a separate line ("expanded form")
 274 #
 275 #   ``defaults.code_block_marker.default = ':: *'``
 276 #
 277 # or a dedicated ``.. code-block::`` directive [#]_
 278 #
 279 #   ``defaults.code_block_marker['c++'] = '.. code-block:: *c++'``
 280 #
 281 # The latter form also allows code in different languages kept together
 282 # in one literate source file.
 283 #
 284 # .. [#] The ``.. code-block::`` directive is not (yet) supported by
 285 #    standard Docutils.  It is provided by several add-ons, including
 286 #    the `code-block directive`_ project in the Docutils Sandbox and
 287 #    Sphinx_.
 288 #
 289 #
 290 # strip
 291 # -----
 292 #
 293 # Export to the output format stripping documentation or code blocks::
 294
 295 defaults.strip = False
 296
 297 # strip_marker
 298 # ------------
 299 #
 300 # Strip literal marker from the end of documentation blocks when
 301 # converting  to code format. Makes the code more concise but looses the
 302 # synchronisation of line numbers in text and code formats. Can also be used
 303 # (together with the auto-completion of the code-text conversion) to change
 304 # the `code_block_marker`::
 305
 306 defaults.strip_marker = False
 307
 308 # add_missing_marker
 309 # ------------------
 310 #
 311 # When converting from code format to text format, add a `code_block_marker`
 312 # at the end of documentation blocks if it is missing::
 313
 314 defaults.add_missing_marker = True
 315
 316 # Keep this at ``True``, if you want to re-convert to code format later!
 317 #
 318 #
 319 # .. _defaults.preprocessors:
 320 #
 321 # preprocessors
 322 # -------------
 323 #
 324 # Preprocess the data with language-specific filters_
 325 # Set below in Filters_::
 326
 327 defaults.preprocessors = {}
 328
 329 # .. _defaults.postprocessors:
 330 #
 331 # postprocessors
 332 # --------------
 333 #
 334 # Postprocess the data with language-specific filters_::
 335
 336 defaults.postprocessors = {}
 337
 338 # .. _defaults.codeindent:
 339 #
 340 # codeindent
 341 # ----------
 342 #
 343 # Number of spaces to indent code blocks in `Code2Text.code_block_handler`_::
 344
 345 defaults.codeindent =  2
 346
 347 # In `Text2Code.code_block_handler`_, the codeindent is determined by the
 348 # first recognised code line (header or first indented literal block
 349 # of the text source).
 350 #
 351 # overwrite
 352 # ---------
 353 #
 354 # What to do if the outfile already exists? (ignored if `outfile` == '-')::
 355
 356 defaults.overwrite = 'update'
 357
 358 # Recognised values:
 359 #
 360 #  :'yes':    overwrite eventually existing `outfile`,
 361 #  :'update': fail if the `outfile` is newer than `infile`,
 362 #  :'no':     fail if `outfile` exists.
 363 #
 364 #
 365 # Extensions
 366 # ==========
 367 #
 368 # Try to import optional extensions::
 369
 370 try:
 371     import pylit_elisp
 372 except ImportError:
 373     pass
 374
 375
 376 # Converter Classes
 377 # =================
 378 #
 379 # The converter classes implement a simple state machine to separate and
 380 # transform documentation and code blocks. For this task, only a very limited
 381 # parsing is needed. PyLit's parser assumes:
 382 #
 383 # * `indented literal blocks`_ in a text source are code blocks.
 384 #
 385 # * comment blocks in a code source where every line starts with a matching
 386 #   comment string are documentation blocks.
 387 #
 388 # TextCodeConverter
 389 # -----------------
 390 # ::
 391
 392 class TextCodeConverter(object):
 393     """Parent class for the converters `Text2Code` and `Code2Text`.
 394     """
 395
 396 # The parent class defines data attributes and functions used in both
 397 # `Text2Code`_ converting a text source to executable code source, and
 398 # `Code2Text`_ converting commented code to a text source.
 399 #
 400 # Data attributes
 401 # ~~~~~~~~~~~~~~~
 402 #
 403 # Class default values are fetched from the `defaults`_ object and can be
 404 # overridden by matching keyword arguments during class instantiation. This
 405 # also works with keyword arguments to `get_converter`_ and `main`_, as these
 406 # functions pass on unused keyword args to the instantiation of a converter
 407 # class. ::
 408
 409     language = defaults.languages.default
 410     comment_strings = defaults.comment_strings
 411     comment_string = "" # set in __init__ (if empty)
 412     codeindent =  defaults.codeindent
 413     header_string = defaults.header_string
 414     code_block_markers = defaults.code_block_markers
 415     code_block_marker = "" # set in __init__ (if empty)
 416     strip = defaults.strip
 417     strip_marker = defaults.strip_marker
 418     add_missing_marker = defaults.add_missing_marker
 419     directive_option_regexp = re.compile(r' +:(\w|[-._+:])+:( |$)')
 420     state = "" # type of current block, see `TextCodeConverter.convert`_
 421
 422 # Interface methods
 423 # ~~~~~~~~~~~~~~~~~
 424 #
 425 # .. _TextCodeConverter.__init__:
 426 #
 427 # __init__
 428 # """"""""
 429 #
 430 # Initialising sets the `data` attribute, an iterable object yielding lines of
 431 # the source to convert. [#]_
 432 #
 433 # .. [#] The most common choice of data is a `file` object with the text
 434 #        or code source.
 435 #
 436 #        To convert a string into a suitable object, use its splitlines method
 437 #        like ``"2 lines\nof source".splitlines(True)``.
 438 #
 439 #
 440 # Additional keyword arguments are stored as instance variables,
 441 # overwriting the class defaults::
 442
 443     def __init__(self, data, **keyw):
 444         """data   --  iterable data object
 445                       (list, file, generator, string, ...)
 446            **keyw --  remaining keyword arguments are
 447                       stored as data-attributes
 448         """
 449         self.data = data
 450         self.__dict__.update(keyw)
 451
 452 # If empty, `code_block_marker` and `comment_string` are set according
 453 # to the `language`::
 454
 455         if not self.code_block_marker:
 456             self.code_block_marker = self.code_block_markers[self.language]
 457         if not self.comment_string:
 458             self.comment_string = self.comment_strings[self.language]
 459         self.stripped_comment_string = self.comment_string.rstrip()
 460
 461 # Pre- and postprocessing filters are set (with
 462 # `TextCodeConverter.get_filter`_)::
 463
 464         self.preprocessor = self.get_filter("preprocessors", self.language)
 465         self.postprocessor = self.get_filter("postprocessors", self.language)
 466
 467 # .. _inserted into a regular expression:
 468 #
 469 # Finally, a regular_expression for the `code_block_marker` is compiled
 470 # to find valid cases of `code_block_marker` in a given line and return
 471 # the groups: ``\1 prefix, \2 code_block_marker, \3 remainder`` ::
 472
 473         marker = self.code_block_marker
 474         if marker == '::':
 475             # the default marker may occur at the end of a text line
 476             self.marker_regexp = re.compile('^( *(?!\.\.).*)(::)([ \n]*)$')
 477         else:
 478             # marker must be on a separate line
 479             self.marker_regexp = re.compile('^( *)(%s)(.*\n?)$' % marker)
 480
 481 # .. _TextCodeConverter.__iter__:
 482 #
 483 # __iter__
 484 # """"""""
 485 #
 486 # Return an iterator for the instance. Iteration yields lines of converted
 487 # data.
 488 #
 489 # The iterator is a chain of iterators acting on `self.data` that does
 490 #
 491 # * preprocessing
 492 # * text<->code format conversion
 493 # * postprocessing
 494 #
 495 # Pre- and postprocessing are only performed, if filters for the current
 496 # language are registered in `defaults.preprocessors`_ and|or
 497 # `defaults.postprocessors`_. The filters must accept an iterable as first
 498 # argument and yield the processed input data line-wise.
 499 # ::
 500
 501     def __iter__(self):
 502         """Iterate over input data source and yield converted lines
 503         """
 504         return self.postprocessor(self.convert(self.preprocessor(self.data)))
 505
 506
 507 # .. _TextCodeConverter.__call__:
 508 #
 509 # __call__
 510 # """"""""
 511 # The special `__call__` method allows the use of class instances as callable
 512 # objects. It returns the converted data as list of lines::
 513
 514     def __call__(self):
 515         """Iterate over state-machine and return results as list of lines"""
 516         return [line for line in self]
 517
 518
 519 # .. _TextCodeConverter.__str__:
 520 #
 521 # __str__
 522 # """""""
 523 # Return converted data as string::
 524
 525     def __str__(self):
 526         return "".join(self())
 527
 528
 529 # Helpers and convenience methods
 530 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 531 #
 532 # .. _TextCodeConverter.convert:
 533 #
 534 # convert
 535 # """""""
 536 #
 537 # The `convert` method generates an iterator that does the actual  code <-->
 538 # text format conversion. The converted data is yielded line-wise and the
 539 # instance's `status` argument indicates whether the current line is "header",
 540 # "documentation", or "code_block"::
 541
 542     def convert(self, lines):
 543         """Iterate over lines of a program document and convert
 544         between "text" and "code" format
 545         """
 546
 547 # Initialise internal data arguments. (Done here, so that every new iteration
 548 # re-initialises them.)
 549 #
 550 # `state`
 551 #   the "type" of the currently processed block of lines. One of
 552 #
 553 #   :"":              initial state: check for header,
 554 #   :"header":        leading code block: strip `header_string`,
 555 #   :"documentation": documentation part: comment out,
 556 #   :"code_block":    literal blocks containing source code: unindent.
 557 #
 558 # ::
 559
 560         self.state = ""
 561
 562 # `_codeindent`
 563 #   * Do not confuse the internal attribute `_codeindent` with the configurable
 564 #     `codeindent` (without the leading underscore).
 565 #   * `_codeindent` is set in `Text2Code.code_block_handler`_ to the indent of
 566 #     first non-blank "code_block" line and stripped from all "code_block" lines
 567 #     in the text-to-code conversion,
 568 #   * `codeindent` is set in `__init__` to `defaults.codeindent`_ and added to
 569 #     "code_block" lines in the code-to-text conversion.
 570 #
 571 # ::
 572
 573         self._codeindent = 0
 574
 575 # `_textindent`
 576 #   * set by `Text2Code.documentation_handler`_ to the minimal indent of a
 577 #     documentation block,
 578 #   * used in `Text2Code.set_state`_ to find the end of a code block.
 579 #
 580 # ::
 581
 582         self._textindent = 0
 583
 584 # `_add_code_block_marker`
 585 #   If the last paragraph of a documentation block does not end with a
 586 #   code_block_marker_, it should be added (otherwise, the back-conversion
 587 #   fails.).
 588 #
 589 #   `_add_code_block_marker` is set by `Code2Text.documentation_handler`_
 590 #   and evaluated by `Code2Text.code_block_handler`_, because the
 591 #   documentation_handler does not know whether the next block will be
 592 #   documentation (with no need for a code_block_marker) or a code block.
 593 #
 594 # ::
 595
 596         self._add_code_block_marker = False
 597
 598
 599
 600 # Determine the state of the block and convert with the matching "handler"::
 601
 602         for block in collect_blocks(expandtabs_filter(lines)):
 603             self.set_state(block)
 604             for line in getattr(self, self.state+"_handler")(block):
 605                 yield line
 606
 607
 608 # .. _TextCodeConverter.get_filter:
 609 #
 610 # get_filter
 611 # """"""""""
 612 # ::
 613
 614     def get_filter(self, filter_set, language):
 615         """Return language specific filter"""
 616         if self.__class__ == Text2Code:
 617             key = "text2"+language
 618         elif self.__class__ == Code2Text:
 619             key = language+"2text"
 620         else:
 621             key = ""
 622         try:
 623             return getattr(defaults, filter_set)[key]
 624         except (AttributeError, KeyError):
 625             # print "there is no %r filter in %r"%(key, filter_set)
 626             pass
 627         return identity_filter
 628
 629
 630 # get_indent
 631 # """"""""""
 632 # Return the number of leading spaces in `line`::
 633
 634     def get_indent(self, line):
 635         """Return the indentation of `string`.
 636         """
 637         return len(line) - len(line.lstrip())
 638
 639
 640 # Text2Code
 641 # ---------
 642 #
 643 # The `Text2Code` converter separates *code-blocks* [#]_ from *documentation*.
 644 # Code blocks are unindented, documentation is commented (or filtered, if the
 645 # ``strip`` option is True).
 646 #
 647 # .. [#] Only `indented literal blocks`_ are considered code-blocks. `quoted
 648 #        literal blocks`_, `parsed-literal blocks`_, and `doctest blocks`_ are
 649 #        treated as part of the documentation. This allows the inclusion of
 650 #        examples:
 651 #
 652 #           >>> 23 + 3
 653 #           26
 654 #
 655 #        Mark that there is no double colon before the doctest block in the
 656 #        text source.
 657 #
 658 # The class inherits the interface and helper functions from
 659 # TextCodeConverter_ and adds functions specific to the text-to-code format
 660 # conversion::
 661
 662 class Text2Code(TextCodeConverter):
 663     """Convert a (reStructured) text source to code source
 664     """
 665
 666 # .. _Text2Code.set_state:
 667 #
 668 # set_state
 669 # ~~~~~~~~~
 670 # ::
 671
 672     def set_state(self, block):
 673         """Determine state of `block`. Set `self.state`
 674         """
 675
 676 # `set_state` is used inside an iteration. Hence, if we are out of data, a
 677 # StopItertion exception should be raised::
 678
 679         if not block:
 680             raise StopIteration
 681
 682 # The new state depends on the active state (from the last block) and
 683 # features of the current block. It is either "header", "documentation", or
 684 # "code_block".
 685 #
 686 # If the current state is "" (first block), check for
 687 # the  `header_string` indicating a leading code block::
 688
 689         if self.state == "":
 690             # print "set state for %r"%block
 691             if block[0].startswith(self.header_string):
 692                 self.state = "header"
 693             else:
 694                 self.state = "documentation"
 695
 696 # If the current state is "documentation", the next block is also
 697 # documentation. The end of a documentation part is detected in the
 698 # `Text2Code.documentation_handler`_::
 699
 700         # elif self.state == "documentation":
 701         #    self.state = "documentation"
 702
 703 # A "code_block" ends with the first less indented, non-blank line.
 704 # `_textindent` is set by the documentation handler to the indent of the
 705 # preceding documentation block::
 706
 707         elif self.state in ["code_block", "header"]:
 708             indents = [self.get_indent(line) for line in block
 709                        if line.rstrip()]
 710             # print "set_state:", indents, self._textindent
 711             if indents and min(indents) <= self._textindent:
 712                 self.state = 'documentation'
 713             else:
 714                 self.state = 'code_block'
 715
 716 # TODO: (or not to do?) insert blank line before the first line with too-small
 717 # codeindent using self.ensure_trailing_blank_line(lines, line) (would need
 718 # split and push-back of the documentation part)?
 719 #
 720 # .. _Text2Code.header_handler:
 721 #
 722 # header_handler
 723 # ~~~~~~~~~~~~~~
 724 #
 725 # Sometimes code needs to remain on the first line(s) of the document to be
 726 # valid. The most common example is the "shebang" line that tells a POSIX
 727 # shell how to process an executable file::
 728
 729 #!/usr/bin/env python
 730
 731 # In Python, the special comment to indicate the encoding, e.g.
 732 # ``# -*- coding: iso-8859-1 -*-``, must occur before any other comment
 733 # or code too.
 734 #
 735 # If we want to keep the line numbers in sync for text and code source, the
 736 # reStructured Text markup for these header lines must start at the same line
 737 # as the first header line. Therefore, header lines could not be marked as
 738 # literal block (this would require the ``::`` and an empty line above the
 739 # code_block).
 740 #
 741 # OTOH, a comment may start at the same line as the comment marker and it
 742 # includes subsequent indented lines. Comments are visible in the reStructured
 743 # Text source but hidden in the pretty-printed output.
 744 #
 745 # With a header converted to comment in the text source, everything before
 746 # the first documentation block (i.e. before the first paragraph using the
 747 # matching comment string) will be hidden away (in HTML or PDF output).
 748 #
 749 # This seems a good compromise, the advantages
 750 #
 751 # * line numbers are kept
 752 # * the "normal" code_block conversion rules (indent/unindent by `codeindent` apply
 753 # * greater flexibility: you can hide a repeating header in a project
 754 #   consisting of many source files.
 755 #
 756 # set off the disadvantages
 757 #
 758 # - it may come as surprise if a part of the file is not "printed",
 759 # - one more syntax element to learn for rst newbies to start with pylit,
 760 #   (however, starting from the code source, this will be auto-generated)
 761 #
 762 # In the case that there is no matching comment at all, the complete code
 763 # source will become a comment -- however, in this case it is not very likely
 764 # the source is a literate document anyway.
 765 #
 766 # If needed for the documentation, it is possible to quote the header in (or
 767 # after) the first documentation block, e.g. as `parsed literal`.
 768 # ::
 769
 770     def header_handler(self, lines):
 771         """Format leading code block"""
 772         # strip header string from first line
 773         lines[0] = lines[0].replace(self.header_string, "", 1)
 774         # yield remaining lines formatted as code-block
 775         for line in self.code_block_handler(lines):
 776             yield line
 777
 778
 779 # .. _Text2Code.documentation_handler:
 780 #
 781 # documentation_handler
 782 # ~~~~~~~~~~~~~~~~~~~~~
 783 #
 784 # The 'documentation' handler processes everything that is not recognised as
 785 # "code_block". Documentation is quoted with `self.comment_string`
 786 # (or filtered with `--strip=True`).
 787 #
 788 # If end-of-documentation marker is detected,
 789 #
 790 # * set state to 'code_block'
 791 # * set `self._textindent` (needed by `Text2Code.set_state`_ to find the
 792 #   next "documentation" block)
 793 #
 794 # ::
 795
 796     def documentation_handler(self, lines):
 797         """Convert documentation blocks from text to code format
 798         """
 799         for line in lines:
 800             # test lines following the code-block marker for false positives
 801             if (self.state == "code_block" and line.rstrip()
 802                 and not self.directive_option_regexp.search(line)):
 803                 self.state = "documentation"
 804             # test for end of documentation block
 805             if self.marker_regexp.search(line):
 806                 self.state = "code_block"
 807                 self._textindent = self.get_indent(line)
 808             # yield lines
 809             if self.strip:
 810                 continue
 811             # do not comment blank lines preceding a code block
 812             if self.state == "code_block" and not line.rstrip():
 813                 yield line
 814             else:
 815                 yield self.comment_string + line
 816
 817
 818
 819
 820 # .. _Text2Code.code_block_handler:
 821 #
 822 # code_block_handler
 823 # ~~~~~~~~~~~~~~~~~~
 824 #
 825 # The "code_block" handler is called with an indented literal block. It
 826 # removes leading whitespace up to the indentation of the first code line in
 827 # the file (this deviation from Docutils behaviour allows indented blocks of
 828 # Python code). ::
 829
 830     def code_block_handler(self, block):
 831         """Convert indented literal blocks to source code format
 832         """
 833
 834 # If still unset, determine the indentation of code blocks from first non-blank
 835 # code line::
 836
 837         if self._codeindent == 0:
 838             self._codeindent = self.get_indent(block[0])
 839
 840 # Yield unindented lines after check whether we can safely unindent. If the
 841 # line is less indented then `_codeindent`, something got wrong. ::
 842
 843         for line in block:
 844             if line.lstrip() and self.get_indent(line) < self._codeindent:
 845                 raise ValueError, "code block contains line less indented " \
 846                       "than %d spaces \n%r"%(self._codeindent, block)
 847             yield line.replace(" "*self._codeindent, "", 1)
 848
 849
 850 # Code2Text
 851 # ---------
 852 #
 853 # The `Code2Text` converter does the opposite of `Text2Code`_ -- it processes
 854 # a source in "code format" (i.e. in a programming language), extracts
 855 # documentation from comment blocks, and puts program code in literal blocks.
 856 #
 857 # The class inherits the interface and helper functions from
 858 # TextCodeConverter_ and adds functions specific to the text-to-code  format
 859 # conversion::
 860
 861 class Code2Text(TextCodeConverter):
 862     """Convert code source to text source
 863     """
 864
 865 # set_state
 866 # ~~~~~~~~~
 867 #
 868 # Check if block is "header", "documentation", or "code_block":
 869 #
 870 # A paragraph is "documentation", if every non-blank line starts with a
 871 # matching comment string (including whitespace except for commented blank
 872 # lines) ::
 873
 874     def set_state(self, block):
 875         """Determine state of `block`."""
 876         for line in block:
 877             # skip documentation lines (commented, blank or blank comment)
 878             if (line.startswith(self.comment_string)
 879                 or not line.rstrip()
 880                 or line.rstrip() == self.comment_string.rstrip()
 881                ):
 882                 continue
 883             # non-commented line found:
 884             if self.state == "":
 885                 self.state = "header"
 886             else:
 887                 self.state = "code_block"
 888             break
 889         else:
 890             # no code line found
 891             # keep state if the block is just a blank line
 892             # if len(block) == 1 and self._is_blank_codeline(line):
 893             #     return
 894             self.state = "documentation"
 895
 896
 897 # header_handler
 898 # ~~~~~~~~~~~~~~
 899 #
 900 # Handle a leading code block. (See `Text2Code.header_handler`_ for a
 901 # discussion of the "header" state.) ::
 902
 903     def header_handler(self, lines):
 904         """Format leading code block"""
 905         if self.strip == True:
 906             return
 907         # get iterator over the lines that formats them as code-block
 908         lines = iter(self.code_block_handler(lines))
 909         # prepend header string to first line
 910         yield self.header_string + lines.next()
 911         # yield remaining lines
 912         for line in lines:
 913             yield line
 914
 915 # .. _Code2Text.documentation_handler:
 916 #
 917 # documentation_handler
 918 # ~~~~~~~~~~~~~~~~~~~~~
 919 #
 920 # The *documentation state* handler converts a comment to a documentation
 921 # block by stripping the leading `comment string` from every line::
 922
 923     def documentation_handler(self, block):
 924         """Uncomment documentation blocks in source code
 925         """
 926
 927 # Strip comment strings::
 928
 929         lines = [self.uncomment_line(line) for line in block]
 930
 931 # If the code block is stripped, the literal marker would lead to an
 932 # error when the text is converted with Docutils. Strip it as well. ::
 933
 934         if self.strip or self.strip_marker:
 935             self.strip_code_block_marker(lines)
 936
 937 # Otherwise, check for the `code_block_marker`_ at the end of the
 938 # documentation block (skipping directive options that might follow it)::
 939
 940         elif self.add_missing_marker:
 941             for line in lines[::-1]:
 942                 if self.marker_regexp.search(line):
 943                     self._add_code_block_marker = False
 944                     break
 945                 if (line.rstrip() and
 946                     not self.directive_option_regexp.search(line)):
 947                     self._add_code_block_marker = True
 948                     break
 949             else:
 950                 self._add_code_block_marker = True
 951
 952 # Yield lines::
 953
 954         for line in lines:
 955             yield line
 956
 957 # uncomment_line
 958 # ~~~~~~~~~~~~~~
 959 #
 960 # Return documentation line after stripping comment string. Consider the
 961 # case that a blank line has a comment string without trailing whitespace::
 962
 963     def uncomment_line(self, line):
 964         """Return uncommented documentation line"""
 965         line = line.replace(self.comment_string, "", 1)
 966         if line.rstrip() == self.stripped_comment_string:
 967             line = line.replace(self.stripped_comment_string, "", 1)
 968         return line
 969
 970 # .. _Code2Text.code_block_handler:
 971 #
 972 # code_block_handler
 973 # ~~~~~~~~~~~~~~~~~~
 974 #
 975 # The `code_block` handler returns the code block as indented literal
 976 # block (or filters it, if ``self.strip == True``). The amount of the code
 977 # indentation is controlled by `self.codeindent` (default 2).  ::
 978
 979     def code_block_handler(self, lines):
 980         """Covert code blocks to text format (indent or strip)
 981         """
 982         if self.strip == True:
 983             return
 984         # eventually insert transition marker
 985         if self._add_code_block_marker:
 986             self.state = "documentation"
 987             yield self.code_block_marker + "\n"
 988             yield "\n"
 989             self._add_code_block_marker = False
 990             self.state = "code_block"
 991         for line in lines:
 992             yield " "*self.codeindent + line
 993
 994
 995
 996 # strip_code_block_marker
 997 # ~~~~~~~~~~~~~~~~~~~~~~~
 998 #
 999 # Replace the literal marker with the equivalent of Docutils replace rules
1000 #
1001 # * strip ``::``-line (and preceding blank line) if on a line on its own
1002 # * strip ``::`` if it is preceded by whitespace.
1003 # * convert ``::`` to a single colon if preceded by text
1004 #
1005 # `lines` is a list of documentation lines (with a trailing blank line).
1006 # It is modified in-place::
1007
1008     def strip_code_block_marker(self, lines):
1009         try:
1010             line = lines[-2]
1011         except IndexError:
1012             return # just one line (no trailing blank line)
1013
1014         # match with regexp: `match` is None or has groups
1015         # \1 leading text, \2 code_block_marker, \3 remainder
1016         match = self.marker_regexp.search(line)
1017
1018         if not match:                 # no code_block_marker present
1019             return
1020         if not match.group(1):        # `code_block_marker` on an extra line
1021             del(lines[-2])
1022             # delete preceding line if it is blank
1023             if len(lines) >= 2 and not lines[-2].lstrip():
1024                 del(lines[-2])
1025         elif match.group(1).rstrip() < match.group(1):
1026             # '::' follows whitespace
1027             lines[-2] = match.group(1).rstrip() + match.group(3)
1028         else:                         # '::' follows text
1029             lines[-2] = match.group(1).rstrip() + ':' + match.group(3)
1030
1031 # Filters
1032 # =======
1033 #
1034 # Filters allow pre- and post-processing of the data to bring it in a format
1035 # suitable for the "normal" text<->code conversion. An example is conversion
1036 # of `C` ``/*`` ``*/`` comments into C++ ``//`` comments (and back).
1037 # Another example is the conversion of `C` ``/*`` ``*/`` comments into C++
1038 # ``//`` comments (and back).
1039 #
1040 # Filters are generator functions that return an iterator acting on a
1041 # `data` iterable and yielding processed `data` lines.
1042 #
1043 # identity_filter
1044 # ---------------
1045 #
1046 # The most basic filter is the identity filter, that returns its argument as
1047 # iterator::
1048
1049 def identity_filter(data):
1050     """Return data iterator without any processing"""
1051     return iter(data)
1052
1053 # expandtabs_filter
1054 # -----------------
1055 #
1056 # Expand hard-tabs in every line of `data` (cf. `str.expandtabs`).
1057 #
1058 # This filter is applied to the input data by `TextCodeConverter.convert`_ as
1059 # hard tabs can lead to errors when the indentation is changed. ::
1060
1061 def expandtabs_filter(data):
1062     """Yield data tokens with hard-tabs expanded"""
1063     for line in data:
1064         yield line.expandtabs()
1065
1066
1067 # collect_blocks
1068 # --------------
1069 #
1070 # A filter to aggregate "paragraphs" (blocks separated by blank
1071 # lines). Yields lists of lines::
1072
1073 def collect_blocks(lines):
1074     """collect lines in a list
1075
1076     yield list for each paragraph, i.e. block of lines separated by a
1077     blank line (whitespace only).
1078
1079     Trailing blank lines are collected as well.
1080     """
1081     blank_line_reached = False
1082     block = []
1083     for line in lines:
1084         if blank_line_reached and line.rstrip():
1085             yield block
1086             blank_line_reached = False
1087             block = [line]
1088             continue
1089         if not line.rstrip():
1090             blank_line_reached = True
1091         block.append(line)
1092     yield block
1093
1094
1095
1096 # dumb_c_preprocessor
1097 # -------------------
1098 #
1099 # This is a basic filter to convert `C` to `C++` comments. Works line-wise and
1100 # only converts lines that
1101 #
1102 # * start with "/\* " and end with " \*/" (followed by whitespace only)
1103 #
1104 # A more sophisticated version would also
1105 #
1106 # * convert multi-line comments
1107 #
1108 #   + Keep indentation or strip 3 leading spaces?
1109 #
1110 # * account for nested comments
1111 #
1112 # * only convert comments that are separated from code by a blank line
1113 #
1114 # ::
1115
1116 def dumb_c_preprocessor(data):
1117     """change `C` ``/* `` `` */`` comments into C++ ``// `` comments"""
1118     comment_string = defaults.comment_strings["c++"]
1119     boc_string = "/* "
1120     eoc_string = " */"
1121     for line in data:
1122         if (line.startswith(boc_string)
1123             and line.rstrip().endswith(eoc_string)
1124            ):
1125             line = line.replace(boc_string, comment_string, 1)
1126             line = "".join(line.rsplit(eoc_string, 1))
1127         yield line
1128
1129 # Unfortunately, the `replace` method of strings does not support negative
1130 # numbers for the `count` argument:
1131 #
1132 #   >>> "foo */ baz */ bar".replace(" */", "", -1) == "foo */ baz bar"
1133 #   False
1134 #
1135 # However, there is the `rsplit` method, that can be used together with `join`:
1136 #
1137 #   >>> "".join("foo */ baz */ bar".rsplit(" */", 1)) == "foo */ baz bar"
1138 #   True
1139 #
1140 # dumb_c_postprocessor
1141 # --------------------
1142 #
1143 # Undo the preparations by the dumb_c_preprocessor and re-insert valid comment
1144 # delimiters ::
1145
1146 def dumb_c_postprocessor(data):
1147     """change C++ ``// `` comments into `C` ``/* `` `` */`` comments"""
1148     comment_string = defaults.comment_strings["c++"]
1149     boc_string = "/* "
1150     eoc_string = " */"
1151     for line in data:
1152         if line.rstrip() == comment_string.rstrip():
1153             line = line.replace(comment_string, "", 1)
1154         elif line.startswith(comment_string):
1155             line = line.replace(comment_string, boc_string, 1)
1156             line = line.rstrip() + eoc_string + "\n"
1157         yield line
1158
1159
1160 # register filters
1161 # ----------------
1162 #
1163 # ::
1164
1165 defaults.preprocessors['c2text'] = dumb_c_preprocessor
1166 defaults.preprocessors['css2text'] = dumb_c_preprocessor
1167 defaults.postprocessors['text2c'] = dumb_c_postprocessor
1168 defaults.postprocessors['text2css'] = dumb_c_postprocessor
1169
1170
1171 # Command line use
1172 # ================
1173 #
1174 # Using this script from the command line will convert a file according to its
1175 # extension. This default can be overridden by a couple of options.
1176 #
1177 # Dual source handling
1178 # --------------------
1179 #
1180 # How to determine which source is up-to-date?
1181 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1182 #
1183 # - set modification date of `outfile` to the one of `infile`
1184 #
1185 #   Points out that the source files are 'synchronised'.
1186 #
1187 #   * Are there problems to expect from "backdating" a file? Which?
1188 #
1189 #     Looking at http://www.unix.com/showthread.php?t=20526, it seems
1190 #     perfectly legal to set `mtime` (while leaving `ctime`) as `mtime` is a
1191 #     description of the "actuality" of the data in the file.
1192 #
1193 #   * Should this become a default or an option?
1194 #
1195 # - alternatively move input file to a backup copy (with option: `--replace`)
1196 #
1197 # - check modification date before overwriting
1198 #   (with option: `--overwrite=update`)
1199 #
1200 # - check modification date before editing (implemented as `Jed editor`_
1201 #   function `pylit_check()` in `pylit.sl`_)
1202 #
1203 # .. _Jed editor: http://www.jedsoft.org/jed/
1204 # .. _pylit.sl: http://jedmodes.sourceforge.net/mode/pylit/
1205 #
1206 # Recognised Filename Extensions
1207 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1208 #
1209 # Instead of defining a new extension for "pylit" literate programs,
1210 # by default ``.txt`` will be appended for the text source and stripped by
1211 # the conversion to the code source. I.e. for a Python program foo:
1212 #
1213 # * the code source is called ``foo.py``
1214 # * the text source is called ``foo.py.txt``
1215 # * the html rendering is called ``foo.py.html``
1216 #
1217 #
1218 # OptionValues
1219 # ------------
1220 #
1221 # The following class adds `as_dict`_, `complete`_ and `__getattr__`_
1222 # methods to `optparse.Values`::
1223
1224 class OptionValues(optparse.Values):
1225
1226 # .. _OptionValues.as_dict:
1227 #
1228 # as_dict
1229 # ~~~~~~~
1230 #
1231 # For use as keyword arguments, it is handy to have the options in a
1232 # dictionary. `as_dict` returns a copy of the instances object dictionary::
1233
1234     def as_dict(self):
1235         """Return options as dictionary object"""
1236         return self.__dict__.copy()
1237
1238 # .. _OptionValues.complete:
1239 #
1240 # complete
1241 # ~~~~~~~~
1242 #
1243 # ::
1244
1245     def complete(self, **keyw):
1246         """
1247         Complete the option values with keyword arguments.
1248
1249         Do not overwrite existing values. Only use arguments that do not
1250         have a corresponding attribute in `self`,
1251         """
1252         for key in keyw:
1253             if not self.__dict__.has_key(key):
1254                 setattr(self, key, keyw[key])
1255
1256 # .. _OptionValues.__getattr__:
1257 #
1258 # __getattr__
1259 # ~~~~~~~~~~~
1260 #
1261 # To replace calls using ``options.ensure_value("OPTION", None)`` with the
1262 # more concise ``options.OPTION``, we define `__getattr__` [#]_ ::
1263
1264     def __getattr__(self, name):
1265         """Return default value for non existing options"""
1266         return None
1267
1268
1269 # .. [#] The special method `__getattr__` is only called when an attribute
1270 #        look-up has not found the attribute in the usual places (i.e. it is
1271 #        not an instance attribute nor is it found in the class tree for
1272 #        self).
1273 #
1274 #
1275 # PylitOptions
1276 # ------------
1277 #
1278 # The `PylitOptions` class comprises an option parser and methods for parsing
1279 # and completion of command line options::
1280
1281 class PylitOptions(object):
1282     """Storage and handling of command line options for pylit"""
1283
1284 # Instantiation
1285 # ~~~~~~~~~~~~~
1286 #
1287 # ::
1288
1289     def __init__(self):
1290         """Set up an `OptionParser` instance for pylit command line options
1291
1292         """
1293         p = optparse.OptionParser(usage=main.__doc__, version=_version)
1294
1295         # Conversion settings
1296
1297         p.add_option("-c", "--code2txt", dest="txt2code", action="store_false",
1298                      help="convert code source to text source")
1299         p.add_option("-t", "--txt2code", action="store_true",
1300                      help="convert text source to code source")
1301         p.add_option("--language",
1302                      choices = defaults.languages.values(),
1303                      help="use LANGUAGE native comment style")
1304         p.add_option("--comment-string", dest="comment_string",
1305                      help="documentation block marker in code source "
1306                      "(including trailing whitespace, "
1307                      "default: language dependent)")
1308         p.add_option("-m", "--code-block-marker", dest="code_block_marker",
1309                      help="syntax token starting a code block. (default '::')")
1310         p.add_option("--codeindent", type="int",
1311                      help="Number of spaces to indent code blocks with "
1312                      "text2code (default %d)" % defaults.codeindent)
1313
1314         # Output file handling
1315
1316         p.add_option("--overwrite", action="store",
1317                      choices = ["yes", "update", "no"],
1318                      help="overwrite output file (default 'update')")
1319         p.add_option("--replace", action="store_true",
1320                      help="move infile to a backup copy (appending '~')")
1321         # TODO: do we need this? If yes, make mtime update depend on it!
1322         # p.add_option("--keep-mtime", action="store_true",
1323         #              help="do not set the modification time of the outfile "
1324         #              "to the corresponding value of the infile")
1325         p.add_option("-s", "--strip", action="store_true",
1326                      help='"export" by stripping documentation or code')
1327
1328         # Special actions
1329
1330         p.add_option("-d", "--diff", action="store_true",
1331                      help="test for differences to existing file")
1332         p.add_option("--doctest", action="store_true",
1333                      help="run doctest.testfile() on the text version")
1334         p.add_option("-e", "--execute", action="store_true",
1335                      help="execute code (Python only)")
1336
1337         self.parser = p
1338
1339 # .. _PylitOptions.parse_args:
1340 #
1341 # parse_args
1342 # ~~~~~~~~~~
1343 #
1344 # The `parse_args` method calls the `optparse.OptionParser` on command
1345 # line or provided args and returns the result as `PylitOptions.Values`
1346 # instance. Defaults can be provided as keyword arguments::
1347
1348     def parse_args(self, args=sys.argv[1:], **keyw):
1349         """parse command line arguments using `optparse.OptionParser`
1350
1351            parse_args(args, **keyw) -> OptionValues instance
1352
1353             args --  list of command line arguments.
1354             keyw --  keyword arguments or dictionary of option defaults
1355         """
1356         # parse arguments
1357         (values, args) = self.parser.parse_args(args, OptionValues(keyw))
1358         # Convert FILE and OUTFILE positional args to option values
1359         # (other positional arguments are ignored)
1360         try:
1361             values.infile = args[0]
1362             values.outfile = args[1]
1363         except IndexError:
1364             pass
1365
1366         return values
1367
1368 # .. _PylitOptions.complete_values:
1369 #
1370 # complete_values
1371 # ~~~~~~~~~~~~~~~
1372 #
1373 # Complete an OptionValues instance `values`.  Use module-level defaults and
1374 # context information to set missing option values to sensible defaults (if
1375 # possible) ::
1376
1377     def complete_values(self, values):
1378         """complete option values with module and context sensible defaults
1379
1380         x.complete_values(values) -> values
1381         values -- OptionValues instance
1382         """
1383
1384 # Complete with module-level defaults_::
1385
1386         values.complete(**defaults.__dict__)
1387
1388 # Ensure infile is a string::
1389
1390         values.ensure_value("infile", "")
1391
1392 # Guess conversion direction from `infile` filename::
1393
1394         if values.txt2code is None:
1395             in_extension = os.path.splitext(values.infile)[1]
1396             if in_extension in values.text_extensions:
1397                 values.txt2code = True
1398             elif in_extension in values.languages.keys():
1399                 values.txt2code = False
1400
1401 # Auto-determine the output file name::
1402
1403         values.ensure_value("outfile", self._get_outfile_name(values))
1404
1405 # Second try: Guess conversion direction from outfile filename::
1406
1407         if values.txt2code is None:
1408             out_extension = os.path.splitext(values.outfile)[1]
1409             values.txt2code = not (out_extension in values.text_extensions)
1410
1411 # Set the language of the code::
1412
1413         if values.txt2code is True:
1414             code_extension = os.path.splitext(values.outfile)[1]
1415         elif values.txt2code is False:
1416             code_extension = os.path.splitext(values.infile)[1]
1417         values.ensure_value("language", values.languages[code_extension])
1418
1419         return values
1420
1421 # _get_outfile_name
1422 # ~~~~~~~~~~~~~~~~~
1423 #
1424 # Construct a matching filename for the output file. The output filename is
1425 # constructed from `infile` by the following rules:
1426 #
1427 # * '-' (stdin) results in '-' (stdout)
1428 # * strip the `text_extension`_ (txt2code) or
1429 # * add the `text_extension`_ (code2txt)
1430 # * fallback: if no guess can be made, add ".out"
1431 #
1432 #   .. TODO: use values.outfile_extension if it exists?
1433 #
1434 # ::
1435
1436     def _get_outfile_name(self, values):
1437         """Return a matching output filename for `infile`
1438         """
1439         # if input is stdin, default output is stdout
1440         if values.infile == '-':
1441             return '-'
1442
1443         # Derive from `infile` name: strip or add text extension
1444         (base, ext) = os.path.splitext(values.infile)
1445         if ext in values.text_extensions:
1446             return base # strip
1447         if ext in values.languages.keys() or values.txt2code == False:
1448             return values.infile + values.text_extensions[0] # add
1449         # give up
1450         return values.infile + ".out"
1451
1452 # .. _PylitOptions.__call__:
1453 #
1454 # __call__
1455 # ~~~~~~~~
1456 #
1457 # The special `__call__` method allows to use PylitOptions instances as
1458 # *callables*: Calling an instance parses the argument list to extract option
1459 # values and completes them based on "context-sensitive defaults".  Keyword
1460 # arguments are passed to `PylitOptions.parse_args`_ as default values. ::
1461
1462     def __call__(self, args=sys.argv[1:], **keyw):
1463         """parse and complete command line args return option values
1464         """
1465         values = self.parse_args(args, **keyw)
1466         return self.complete_values(values)
1467
1468
1469
1470 # Helper functions
1471 # ----------------
1472 #
1473 # open_streams
1474 # ~~~~~~~~~~~~
1475 #
1476 # Return file objects for in- and output. If the input path is missing,
1477 # write usage and abort. (An alternative would be to use stdin as default.
1478 # However,  this leaves the uninitiated user with a non-responding application
1479 # if (s)he just tries the script without any arguments) ::
1480
1481 def open_streams(infile = '-', outfile = '-', overwrite='update', **keyw):
1482     """Open and return the input and output stream
1483
1484     open_streams(infile, outfile) -> (in_stream, out_stream)
1485
1486     in_stream   --  file(infile) or sys.stdin
1487     out_stream  --  file(outfile) or sys.stdout
1488     overwrite   --  'yes': overwrite eventually existing `outfile`,
1489                     'update': fail if the `outfile` is newer than `infile`,
1490                     'no': fail if `outfile` exists.
1491
1492                     Irrelevant if `outfile` == '-'.
1493     """
1494     if not infile:
1495         strerror = "Missing input file name ('-' for stdin; -h for help)"
1496         raise IOError, (2, strerror, infile)
1497     if infile == '-':
1498         in_stream = sys.stdin
1499     else:
1500         in_stream = file(infile, 'r')
1501     if outfile == '-':
1502         out_stream = sys.stdout
1503     elif overwrite == 'no' and os.path.exists(outfile):
1504         raise IOError, (1, "Output file exists!", outfile)
1505     elif overwrite == 'update' and is_newer(outfile, infile) is None:
1506         raise IOError, (1, "Output file is as old as input file!", outfile)
1507     elif overwrite == 'update' and is_newer(outfile, infile):
1508         raise IOError, (1, "Output file is newer than input file!", outfile)
1509     else:
1510         out_stream = file(outfile, 'w')
1511     return (in_stream, out_stream)
1512
1513 # is_newer
1514 # ~~~~~~~~
1515 #
1516 # ::
1517
1518 def is_newer(path1, path2):
1519     """Check if `path1` is newer than `path2` (using mtime)
1520
1521     Compare modification time of files at path1 and path2.
1522
1523     Non-existing files are considered oldest: Return False if path1 does not
1524     exist and True if path2 does not exist.
1525
1526     Return None for equal modification time. (This evaluates to False in a
1527     Boolean context but allows a test for equality.)
1528
1529     """
1530     try:
1531         mtime1 = os.path.getmtime(path1)
1532     except OSError:
1533         mtime1 = -1
1534     try:
1535         mtime2 = os.path.getmtime(path2)
1536     except OSError:
1537         mtime2 = -1
1538     # print "mtime1", mtime1, path1, "\n", "mtime2", mtime2, path2
1539
1540     if mtime1 == mtime2:
1541         return None
1542     return mtime1 > mtime2
1543
1544
1545 # get_converter
1546 # ~~~~~~~~~~~~~
1547 #
1548 # Get an instance of the converter state machine::
1549
1550 def get_converter(data, txt2code=True, **keyw):
1551     if txt2code:
1552         return Text2Code(data, **keyw)
1553     else:
1554         return Code2Text(data, **keyw)
1555
1556
1557 # Use cases
1558 # ---------
1559 #
1560 # run_doctest
1561 # ~~~~~~~~~~~
1562 # ::
1563
1564 def run_doctest(infile="-", txt2code=True,
1565                 globs={}, verbose=False, optionflags=0, **keyw):
1566     """run doctest on the text source
1567     """
1568
1569 # Allow imports from the current working dir by prepending an empty string to
1570 # sys.path (see doc of sys.path())::
1571
1572     sys.path.insert(0, '')
1573
1574 # Import classes from the doctest module::
1575
1576     from doctest import DocTestParser, DocTestRunner
1577
1578 # Read in source. Make sure it is in text format, as tests in comments are not
1579 # found by doctest::
1580
1581     (data, out_stream) = open_streams(infile, "-")
1582     if txt2code is False:
1583         keyw.update({'add_missing_marker': False})
1584         converter = Code2Text(data, **keyw)
1585         docstring = str(converter)
1586     else:
1587         docstring = data.read()
1588
1589 # decode doc string if there is a "magic comment" in the first or second line
1590 # (http://docs.python.org/reference/lexical_analysis.html#encoding-declarations)
1591 # ::
1592
1593     firstlines = ' '.join(docstring.splitlines()[:2])
1594     match = re.search('coding[=:]\s*([-\w.]+)', firstlines)
1595     if match:
1596         docencoding = match.group(1)
1597         docstring = docstring.decode(docencoding)
1598
1599 # Use the doctest Advanced API to run all doctests in the source text::
1600
1601     test = DocTestParser().get_doctest(docstring, globs, name="",
1602                                        filename=infile, lineno=0)
1603     runner = DocTestRunner(verbose, optionflags)
1604     runner.run(test)
1605     runner.summarize
1606     # give feedback also if no failures occurred
1607     if not runner.failures:
1608         print "%d failures in %d tests"%(runner.failures, runner.tries)
1609     return runner.failures, runner.tries
1610
1611
1612 # diff
1613 # ~~~~
1614 #
1615 # ::
1616
1617 def diff(infile='-', outfile='-', txt2code=True, **keyw):
1618     """Report differences between converted infile and existing outfile
1619
1620     If outfile does not exist or is '-', do a round-trip conversion and
1621     report differences.
1622     """
1623
1624     import difflib
1625
1626     instream = file(infile)
1627     # for diffing, we need a copy of the data as list::
1628     data = instream.readlines()
1629     # convert
1630     converter = get_converter(data, txt2code, **keyw)
1631     new = converter()
1632
1633     if outfile != '-' and os.path.exists(outfile):
1634         outstream = file(outfile)
1635         old = outstream.readlines()
1636         oldname = outfile
1637         newname = "<conversion of %s>"%infile
1638     else:
1639         old = data
1640         oldname = infile
1641         # back-convert the output data
1642         converter = get_converter(new, not txt2code)
1643         new = converter()
1644         newname = "<round-conversion of %s>"%infile
1645
1646     # find and print the differences
1647     is_different = False
1648     # print type(old), old
1649     # print type(new), new
1650     delta = difflib.unified_diff(old, new,
1651     # delta = difflib.unified_diff(["heute\n", "schon\n"], ["heute\n", "noch\n"],
1652                                       fromfile=oldname, tofile=newname)
1653     for line in delta:
1654         is_different = True
1655         print line,
1656     if not is_different:
1657         print oldname
1658         print newname
1659         print "no differences found"
1660     return is_different
1661
1662
1663 # execute
1664 # ~~~~~~~
1665 #
1666 # Works only for python code.
1667 #
1668 # Does not work with `eval`, as code is not just one expression. ::
1669
1670 def execute(infile="-", txt2code=True, **keyw):
1671     """Execute the input file. Convert first, if it is a text source.
1672     """
1673
1674     data = file(infile)
1675     if txt2code:
1676         data = str(Text2Code(data, **keyw))
1677     # print "executing " + options.infile
1678     exec data
1679
1680
1681 # main
1682 # ----
1683 #
1684 # If this script is called from the command line, the `main` function will
1685 # convert the input (file or stdin) between text and code formats.
1686 #
1687 # Option default values for the conversion can be given as keyword arguments
1688 # to `main`_.  The option defaults will be updated by command line options and
1689 # extended with "intelligent guesses" by `PylitOptions`_ and passed on to
1690 # helper functions and the converter instantiation.
1691 #
1692 # This allows easy customisation for programmatic use -- just call `main`
1693 # with the appropriate keyword options, e.g. ``pylit.main(comment_string="## ")``
1694 #
1695 # ::
1696
1697 def main(args=sys.argv[1:], **defaults):
1698     """%prog [options] INFILE [OUTFILE]
1699
1700     Convert between (reStructured) text source with embedded code,
1701     and code source with embedded documentation (comment blocks)
1702
1703     The special filename '-' stands for standard in and output.
1704     """
1705
1706 # Parse and complete the options::
1707
1708     options = PylitOptions()(args, **defaults)
1709     # print "infile", repr(options.infile)
1710
1711 # Special actions with early return::
1712
1713     if options.doctest:
1714         return run_doctest(**options.as_dict())
1715
1716     if options.diff:
1717         return diff(**options.as_dict())
1718
1719     if options.execute:
1720         return execute(**options.as_dict())
1721
1722 # Open in- and output streams::
1723
1724     try:
1725         (data, out_stream) = open_streams(**options.as_dict())
1726     except IOError, ex:
1727         print "IOError: %s %s" % (ex.filename, ex.strerror)
1728         sys.exit(ex.errno)
1729
1730 # Get a converter instance::
1731
1732     converter = get_converter(data, **options.as_dict())
1733
1734 # Convert and write to out_stream::
1735
1736     out_stream.write(str(converter))
1737
1738     if out_stream is not sys.stdout:
1739         print "extract written to", out_stream.name
1740         out_stream.close()
1741
1742 # If input and output are from files, set the modification time (`mtime`) of
1743 # the output file to the one of the input file to indicate that the contained
1744 # information is equal. [#]_ ::
1745
1746
1747         # print "fractions?", os.stat_float_times()
1748         try:
1749             os.utime(options.outfile, (os.path.getatime(options.outfile),
1750                                        os.path.getmtime(options.infile))
1751                     )
1752         except OSError:
1753             pass
1754
1755     ## print "mtime", os.path.getmtime(options.infile),  options.infile
1756     ## print "mtime", os.path.getmtime(options.outfile), options.outfile
1757
1758
1759 # .. [#] Make sure the corresponding file object (here `out_stream`) is
1760 #        closed, as otherwise the change will be overwritten when `close` is
1761 #        called afterwards (either explicitly or at program exit).
1762 #
1763 #
1764 # Rename the infile to a backup copy if ``--replace`` is set::
1765
1766     if options.replace:
1767         os.rename(options.infile, options.infile + "~")
1768
1769
1770 # Run main, if called from the command line::
1771
1772 if __name__ == '__main__':
1773     main()
1774
1775
1776 # Open questions
1777 # ==============
1778 #
1779 # Open questions and ideas for further development
1780 #
1781 # Clean code
1782 # ----------
1783 #
1784 # * can we gain from using "shutils" over "os.path" and "os"?
1785 # * use pylint or pyChecker to enforce a consistent style?
1786 #
1787 # Options
1788 # -------
1789 #
1790 # * Use templates for the "intelligent guesses" (with Python syntax for string
1791 #   replacement with dicts: ``"hello %(what)s" % {'what': 'world'}``)
1792 #
1793 # * Is it sensible to offer the `header_string` option also as command line
1794 #   option?
1795 #
1796 # treatment of blank lines
1797 # ------------------------
1798 #
1799 # Alternatives: Keep blank lines blank
1800 #
1801 # - "never" (current setting) -> "visually merges" all documentation
1802 #    if there is no interjacent code
1803 #
1804 # - "always" -> disrupts documentation blocks,
1805 #
1806 # - "if empty" (no whitespace). Comment if there is whitespace.
1807 #
1808 #   This would allow non-obstructing markup but unfortunately this is (in
1809 #   most editors) also non-visible markup.
1810 #
1811 # + "if double" (if there is more than one consecutive blank line)
1812 #
1813 #   With this handling, the "visual gap" remains in both, text and code
1814 #   source.
1815 #
1816 #
1817 # Parsing Problems
1818 # ----------------
1819 #
1820 # * Ignore "matching comments" in literal strings?
1821 #
1822 #   Too complicated: Would need a specific detection algorithm for every
1823 #   language that supports multi-line literal strings (C++, PHP, Python)
1824 #
1825 # * Warn if a comment in code will become documentation after round-trip?
1826 #
1827 #
1828 # docstrings in code blocks
1829 # -------------------------
1830 #
1831 # * How to handle docstrings in code blocks? (it would be nice to convert them
1832 #   to rst-text if ``__docformat__ == restructuredtext``)
1833 #
1834 # TODO: Ask at Docutils users|developers
1835 #
1836 # Plug-ins
1837 # --------
1838 #
1839 # Specify a path for user additions and plug-ins. This would require to
1840 # convert Pylit from a pure module to a package...
1841 #
1842 #   6.4.3 Packages in Multiple Directories
1843 #
1844 #   Packages support one more special attribute, __path__. This is initialized
1845 #   to be a list containing the name of the directory holding the package's
1846 #   __init__.py before the code in that file is executed. This
1847 #   variable can be modified; doing so affects future searches for modules and
1848 #   subpackages contained in the package.
1849 #
1850 #   While this feature is not often needed, it can be used to extend the set
1851 #   of modules found in a package.
1852 #
1853 #
1854 # .. References
1855 #
1856 # .. _Docutils: http://docutils.sourceforge.net/
1857 # .. _Sphinx: http://sphinx.pocoo.org
1858 # .. _Pygments: http://pygments.org/
1859 # .. _code-block directive:
1860 #     http://docutils.sourceforge.net/sandbox/code-block-directive/
1861 # .. _literal block:
1862 # .. _literal blocks:
1863 #     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#literal-blocks
1864 # .. _indented literal block:
1865 # .. _indented literal blocks:
1866 #     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#indented-literal-blocks
1867 # .. _quoted literal block:
1868 # .. _quoted literal blocks:
1869 #     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#quoted-literal-blocks
1870 # .. _parsed-literal blocks:
1871 #     http://docutils.sf.net/docs/ref/rst/directives.html#parsed-literal-block
1872 # .. _doctest block:
1873 # .. _doctest blocks:
1874 #     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#doctest-blocks
1875 #
1876 # .. _feature request and patch by jrioux:
1877 #     http://developer.berlios.de/feature/?func=detailfeature&feature_id=4890&group_id=7974