rstdocs/examples/pylit.py.txt

   1 ..  #!/usr/bin/env python
   2   # -*- coding: iso-8859-1 -*-
   3
   4 ===============================================================
   5 pylit.py: Literate programming with reStructuredText
   6 ===============================================================
   7
   8 :Date:      $Date$
   9 :Version:   SVN-Revision $Revision$
  10 :URL:       $URL$
  11 :Copyright: 2005, 2007 Guenter Milde.
  12             Released under the terms of the GNU General Public License
  13             (v. 2 or later)
  14
  15 .. sectnum::
  16 .. contents::
  17
  18 Frontmatter
  19 ===========
  20
  21 Changelog
  22 ---------
  23
  24 :2005-06-29: Initial version.
  25 :2005-06-30: First literate version.
  26 :2005-07-01: Object-oriented script using generators.
  27 :2005-07-10: Two state machine (later added 'header' state).
  28 :2006-12-04: Start of work on version 0.2 (code restructuring).
  29 :2007-01-23: 0.2   Published at http://pylit.berlios.de.
  30 :2007-01-25: 0.2.1 Outsourced non-core documentation to the PyLit pages.
  31 :2007-01-26: 0.2.2 New behaviour of `diff` function.
  32 :2007-01-29: 0.2.3 New `header` methods after suggestion by Riccardo Murri.
  33 :2007-01-31: 0.2.4 Raise Error if code indent is too small.
  34 :2007-02-05: 0.2.5 New command line option --comment-string.
  35 :2007-02-09: 0.2.6 Add section with open questions,
  36                    Code2Text: let only blank lines (no comment str)
  37                    separate text and code,
  38                    fix `Code2Text.header`.
  39 :2007-02-19: 0.2.7 Simplify `Code2Text.header`,
  40                    new `iter_strip` method replacing a lot of ``if``-s.
  41 :2007-02-22: 0.2.8 Set `mtime` of outfile to the one of infile.
  42 :2007-02-27: 0.3   New `Code2Text` converter after an idea by Riccardo Murri,
  43                    explicit `option_defaults` dict for easier customization.
  44 :2007-03-02: 0.3.1 Expand hard-tabs to prevent errors in indentation,
  45                    `Text2Code` now also works on blocks,
  46                    removed dependency on SimpleStates module.
  47 :2007-03-06: 0.3.2 Bugfix: do not set `language` in `option_defaults`
  48                    renamed `code_languages` to `languages`.
  49 :2007-03-16: 0.3.3 New language css,
  50                    option_defaults -> defaults = optparse.Values(),
  51                    simpler PylitOptions: don't store parsed values,
  52                    don't parse at initialization,
  53                    OptionValues: return `None` for non-existing attributes,
  54                    removed -infile and -outfile, use positional arguments.
  55 :2007-03-19: 0.3.4 Documentation update,
  56                    separate `execute` function.
  57 :2007-03-21:       Code cleanup in `Text2Code.__iter__`.
  58 :2007-03-23: 0.3.5 Removed "css" from known languages after learning that
  59                    there is no C++ style "// " comment string in CSS2.
  60 :2007-04-24: 0.3.6 Documentation update.
  61 :2007-05-18: 0.4   Implement Converter.__iter__ as stack of iterator
  62                    generators. Iterating over a converter instance now
  63                    yields lines instead of blocks.
  64                    Provide "hooks" for pre- and postprocessing filters.
  65                    Rename states to avoid confusion with formats:
  66                    "text" -> "documentation", "code" -> "code_block".
  67 :2007-05-22: 0.4.1 Converter.__iter__: cleanup and reorganization,
  68                    rename parent class Converter -> TextCodeConverter.
  69 :2007-05-23: 0.4.2 Merged Text2Code.converter and Code2Text.converter into
  70                    TextCodeConverter.converter.
  71 :2007-05-30: 0.4.3 Replaced use of defaults.code_extensions with
  72                    values.languages.keys().
  73                    Removed spurious `print` statement in code_block_handler.
  74                    Added basic support for 'c' and 'css' languages
  75                    with `dumb_c_preprocessor`_ and `dumb_c_postprocessor`_.
  76 :2007-06-06: 0.5   Moved `collect_blocks`_ out of `TextCodeConverter`_,
  77                    bugfix: collect all trailing blank lines into a block.
  78                    Expand tabs with `expandtabs_filter`_.
  79 :2007-06-20: 0.6   Configurable code-block marker (default ``::``)
  80
  81 ::
  82
  83   """pylit: bidirectional converter between a *text source* with embedded
  84   computer code and a *code source* with embedded documentation.
  85   """
  86
  87   __docformat__ = 'restructuredtext'
  88
  89   _version = "0.6"
  90
  91
  92 Introduction
  93 ------------
  94
  95 PyLit is a bidirectional converter between two formats of a computer
  96 program source:
  97
  98 * a (reStructured) text document with program code embedded in
  99   *code blocks*, and
 100 * a compilable (or executable) code source with *documentation* embedded in
 101   comment blocks
 102
 103
 104 Requirements
 105 ------------
 106
 107 ::
 108
 109   import re
 110   import os
 111   import sys
 112   import optparse
 113
 114 Customisation
 115 =============
 116
 117 defaults
 118 --------
 119
 120 The `defaults` object provides a central repository for default values
 121 and their customisation. ::
 122
 123   defaults = optparse.Values()
 124
 125 It is used for
 126
 127 * the initialization of data arguments in TextCodeConverter_ and
 128   PylitOptions_
 129
 130 * completion of command line options in `PylitOptions.complete_values`_.
 131
 132 This allows the easy creation of custom back-ends that customise the
 133 defaults and then call main_ e.g.:
 134
 135   >>> import pylit
 136   >>> pylit.defaults.comment_string = "## "
 137   >>> pylit.defaults.codeindent = 4
 138   >>> pylit.main()
 139
 140 The following default values are defined in pylit.py:
 141
 142 defaults.languages
 143 ~~~~~~~~~~~~~~~~~~
 144
 145 Mapping of code file extension to code language.
 146 Used by `OptionValues.complete`_ to set the `defaults.language`.
 147 The ``--language`` command line option or setting ``defaults.language`` in
 148 programmatic use override this auto-setting feature. ::
 149
 150   defaults.languages  = {".py": "python",
 151                          ".sl": "slang",
 152                          ".css": "css",
 153                          ".c": "c",
 154                          ".cc": "c++"}
 155
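The extension lookup described above can be sketched as follows. This is a minimal illustration, not the code used by `OptionValues.complete`: the helper name `guess_language` is hypothetical, and the two module-level names are local copies of the defaults shown in this section. The file name passed in is assumed to be the name of the code source.

```python
import os.path

# Local copies of the defaults shown above (assumption: not imported from pylit).
languages = {".py": "python", ".sl": "slang", ".css": "css",
             ".c": "c", ".cc": "c++"}
fallback_language = "python"

def guess_language(filename, language=None):
    """Hypothetical helper: an explicit language wins, then the file
    extension, then the fallback language."""
    if language:
        return language
    extension = os.path.splitext(filename)[1]
    return languages.get(extension, fallback_language)

print(guess_language("hello.c"))         # c
print(guess_language("-"))               # python (no extension, e.g. a filter)
print(guess_language("hello.c", "c++"))  # c++ (explicit setting overrides)
```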
 156
 157 defaults.fallback_language
 158 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 159
 160 Language to use if there is no matching extension (e.g. if pylit is used as a
 161 filter) and no `language` is specified::
 162
 163   defaults.fallback_language = "python"
 164
 165 defaults.text_extensions
 166 ~~~~~~~~~~~~~~~~~~~~~~~~
 167
 168 List of known extensions of (reStructured) text files.
 169 Used by `OptionValues._get_outfile` to auto-determine the output filename.
 170 ::
 171
 172   defaults.text_extensions = [".txt"]
 173
 174
 175 defaults.comment_strings
 176 ~~~~~~~~~~~~~~~~~~~~~~~~
 177
 178 Dictionary of comment strings for known languages. Comment strings include
 179 trailing whitespace. ::
 180
 181   defaults.comment_strings = {"python": '# ',
 182                               "slang":  '% ',
 183                               "css":    '// ',
 184                               "c":      '// ',
 185                               "c++":    '// '}
 186
 187 Used in Code2Text_ to recognise text blocks and in Text2Code_ to format
 188 text blocks as comments.
 189
 190 defaults.header_string
 191 ~~~~~~~~~~~~~~~~~~~~~~
 192
 193 Marker string for a header code block in the text source. No trailing
 194 whitespace needed as indented code follows.
 195 Must be a valid rst directive that accepts code on the same line, e.g.
 196 ``'..admonition::'``.
 197
 198 Default is a comment marker::
 199
 200   defaults.header_string = '..'
 201
 202 defaults.code_block_marker
 203 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 204
 205 Marker string for a code block in the text source.
 206
 207 Default is a literal-block marker::
 208
 209   defaults.code_block_marker = '::'
 210
 211 In a document where code examples are only one of several uses of literal
 212 blocks, it is more appropriate to single out the sourcecode with a dedicated
 213 "code-block" directive.
 214
 215 Some highlight plug-ins require a special "sourcecode" or "code-block"
 216 directive instead of the ``::`` literal block marker. Actually,
 217 syntax highlighting is possible without changes to docutils with the Pygments_
 218 package using a "code-block" directive. See the `syntax highlight`_ section
 219 in the features documentation.
 220
 221 The `code_block_marker` string is used in a regular expression. Examples for
 222 alternative forms are ``.. code-block::`` or ``.. code-block:: .* python``.
 223 The second example can differentiate between Python code blocks and
 224 code-blocks in other languages.
 225
 226 Another use would be to mark some code-blocks inactive allowing a literate
 227 source to contain code-blocks that should become active only in some cases.
 228
 229
 230
 231 defaults.strip
 232 ~~~~~~~~~~~~~~
 233
 234 Export to the output format stripping documentation or code blocks::
 235
 236   defaults.strip = False
 237
 238 defaults.strip_marker
 239 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 240
 241 Strip literal marker from the end of documentation blocks when
 242 converting to code format. Makes the code more concise but loses the
 243 synchronization of line numbers in text and code formats. Can also be used
 244 (together with the auto-completion of the code-text conversion) to change
 245 the `code_block_marker`::
 246
 247   defaults.strip_marker = False
 248
 249 defaults.preprocessors
 250 ~~~~~~~~~~~~~~~~~~~~~~
 251
 252 Preprocess the data with language-specific filters_.
 253 Set below in Filters_::
 254
 255   defaults.preprocessors = {}
 256
 257 defaults.postprocessors
 258 ~~~~~~~~~~~~~~~~~~~~~~~
 259
 260 Postprocess the data with language-specific filters_::
 261
 262   defaults.postprocessors = {}
 263
 264 defaults.codeindent
 265 ~~~~~~~~~~~~~~~~~~~
 266
 267 Number of spaces to indent code blocks in `Code2Text.code_block_handler`_::
 268
 269   defaults.codeindent =  2
 270
 271 In `Text2Code.code_block_handler`_, the codeindent is determined by the
 272 first recognized code line (header or first indented literal block
 273 of the text source).
 274
 275 defaults.overwrite
 276 ~~~~~~~~~~~~~~~~~~
 277
 278 What to do if the outfile already exists? (ignored if `outfile` == '-')::
 279
 280   defaults.overwrite = 'update'
 281
 282 Recognized values:
 283
 284  :'yes':    overwrite `outfile` if it already exists,
 285  :'update': fail if the `outfile` is newer than `infile`,
 286  :'no':     fail if `outfile` exists.
 287
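The three `overwrite` values can be read as a small decision rule. The sketch below is an assumption about how such a check might look, not the function pylit uses: the name `may_write` is hypothetical, and modification times are compared with `os.path.getmtime`.

```python
import os

def may_write(infile, outfile, overwrite="update"):
    """Hypothetical check: return True if `outfile` may be written."""
    # Writing to stdout ('-') or to a new file is always allowed.
    if outfile == "-" or not os.path.exists(outfile):
        return True
    if overwrite == "yes":
        return True
    if overwrite == "update":
        # Refuse if the existing outfile is newer than the infile.
        return os.path.getmtime(outfile) <= os.path.getmtime(infile)
    # 'no' (or any unknown value): never replace an existing file.
    return False
```

With 'update', an `outfile` that was edited after the last conversion is protected, while a stale one is regenerated.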
 288
 289 Extensions
 290 ----------
 291
 292 Try to import optional extensions::
 293
 294   try:
 295       import pylit_elisp
 296   except ImportError:
 297       pass
 298
 299
 300 Converter Classes
 301 =================
 302
 303 The converter classes implement a simple state machine to separate and
 304 transform documentation and code blocks. For this task, only a very limited
 305 parsing is needed. PyLit's parser assumes:
 306
 307 * `indented literal blocks`_ in a text source are code blocks.
 308
 309 * comment blocks in a code source where every line starts with a matching
 310   comment string are documentation blocks.
 311
 312 TextCodeConverter
 313 -----------------
 314 ::
 315
 316   class TextCodeConverter(object):
 317       """Parent class for the converters `Text2Code` and `Code2Text`.
 318       """
 319
 320 The parent class defines data attributes and functions used in both
 321 `Text2Code`_ converting a text source to executable code source, and
 322 `Code2Text`_ converting commented code to a text source.
 323
 324 Data attributes
 325 ~~~~~~~~~~~~~~~
 326
 327 Class default values are fetched from the `defaults`_ object and can be
 328 overridden by matching keyword arguments during class instantiation. This
 329 also works with keyword arguments to `get_converter`_ and `main`_, as these
 330 functions pass on unused keyword args to the instantiation of a converter
 331 class. ::
 332
 333       language = defaults.fallback_language
 334       comment_strings = defaults.comment_strings
 335       comment_string = "" # set in __init__ (if empty)
 336       codeindent =  defaults.codeindent
 337       header_string = defaults.header_string
 338       code_block_marker = defaults.code_block_marker
 339       strip = defaults.strip
 340       strip_marker = defaults.strip_marker
 341       state = "" # type of current block, see `TextCodeConverter.convert`_
 342
 343 Interface methods
 344 ~~~~~~~~~~~~~~~~~
 345
 346 TextCodeConverter.__init__
 347 """"""""""""""""""""""""""
 348
 349 Initializing sets the `data` attribute, an iterable object yielding lines of
 350 the source to convert. [1]_
 351
 352 Additional keyword arguments are stored as instance variables, overwriting
 353 the class defaults. If still empty, `comment_string` is set according to the
 354 `language`.
 355
 356 ::
 357
 358       def __init__(self, data, **keyw):
 359           """data   --  iterable data object
 360                         (list, file, generator, string, ...)
 361              **keyw --  remaining keyword arguments are
 362                         stored as data-attributes
 363           """
 364           self.data = data
 365           self.__dict__.update(keyw)
 366           if not self.comment_string:
 367               self.comment_string = self.comment_strings[self.language]
 368
 369 Pre- and postprocessing filters are set (with
 370 `TextCodeConverter.get_filter`_)::
 371
 372           self.preprocessor = self.get_filter("preprocessors", self.language)
 373           self.postprocessor = self.get_filter("postprocessors", self.language)
 374
 375 Finally, the regular expression for the `code_block_marker` is compiled to
 376 find valid cases of code_block_marker in a given line and return the groups:
 377
 378 \1 prefix, \2 code_block_marker, \3 remainder
 379 ::
 380
 381           marker = self.code_block_marker
 382           if marker == '::':
 383               self.marker_regexp = re.compile(r'^( *(?!\.\.).*)(%s)([ \n]*)$'
 384                                               % marker)
 385           else:
 386               # assume code_block_marker is a directive like '.. code-block::'
 387               self.marker_regexp = re.compile('^( *)(%s)(.*)$' % marker)
 388
 389 .. [1] The most common choice of data is a `file` object with the text
 390        or code source.
 391
 392        To convert a string into a suitable object, use its splitlines method
 393        like ``"2 lines\nof source".splitlines(True)``.
 394
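To see which groups the two patterns return, the following sketch rebuilds them outside the class. The function name `compile_marker_regexp` is hypothetical; the patterns are copied from `__init__` above and written as raw strings.

```python
import re

def compile_marker_regexp(marker="::"):
    """Rebuild the two patterns from __init__ (sketch, not imported from pylit)."""
    if marker == "::":
        return re.compile(r"^( *(?!\.\.).*)(%s)([ \n]*)$" % marker)
    # assume the marker is a directive like '.. code-block::'
    return re.compile(r"^( *)(%s)(.*)$" % marker)

default_re = compile_marker_regexp()
match = default_re.search("Some text::\n")
print(match.groups())                       # ('Some text', '::', '\n')
print(default_re.search(".. note::\n"))     # None: a directive is not a marker

directive_re = compile_marker_regexp(".. code-block::")
print(directive_re.search("  .. code-block:: python\n").groups())
# ('  ', '.. code-block::', ' python')
```

Because the marker is inserted into the pattern unescaped, characters such as ``.`` keep their regular expression meaning.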
 395
 396 TextCodeConverter.__iter__
 397 """"""""""""""""""""""""""
 398
 399 Return an iterator for the instance. Iteration yields lines of converted
 400 data.
 401
 402 The iterator is a chain of iterators acting on `self.data` that does
 403
 404 * preprocessing
 405 * text<->code format conversion
 406 * postprocessing
 407
 408 Pre- and postprocessing are only performed, if filters for the current
 409 language are registered in `defaults.preprocessors`_ and|or
 410 `defaults.postprocessors`_. The filters must accept an iterable as first
 411 argument and yield the processed input data linewise.
 412 ::
 413
 414       def __iter__(self):
 415           """Iterate over input data source and yield converted lines
 416           """
 417           return self.postprocessor(self.convert(self.preprocessor(self.data)))
 418
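The chaining of iterators can be tried in isolation. In this sketch all names except `identity_filter` are hypothetical, and the body of `identity_filter` is an assumption, as its definition is not shown in this part of the source. An upper-casing generator stands in for the real `convert` step.

```python
def identity_filter(lines):
    """Pass the lines through unchanged (assumed behaviour)."""
    return lines

def strip_trailing_whitespace(lines):
    """Hypothetical preprocessor: drop trailing blanks from every line."""
    for line in lines:
        yield line.rstrip() + "\n"

def upper_case(lines):
    """Hypothetical stand-in for the format conversion step."""
    for line in lines:
        yield line.upper()

def chain(data, preprocessor=identity_filter, convert=upper_case,
          postprocessor=identity_filter):
    # Same nesting as in __iter__: post(convert(pre(data)))
    return postprocessor(convert(preprocessor(data)))

result = list(chain(["a  \n", "b\n"], preprocessor=strip_trailing_whitespace))
print(result)   # ['A\n', 'B\n']
```

Nothing is processed until the outermost iterator is consumed, so every line passes through all three stages in turn.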
 419
 420 TextCodeConverter.__call__
 421 """"""""""""""""""""""""""
 422 The special `__call__` method allows the use of class instances as callable
 423 objects. It returns the converted data as list of lines::
 424
 425       def __call__(self):
 426           """Iterate over state-machine and return results as list of lines"""
 427           return [line for line in self]
 428
 429
 430 TextCodeConverter.__str__
 431 """""""""""""""""""""""""
 432 Return converted data as string::
 433
 434       def __str__(self):
 435           return "".join(self())
 436
 437
 438 Helpers and convenience methods
 439 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 440
 441 TextCodeConverter.convert
 442 """""""""""""""""""""""""
 443
 444 The `convert` method generates an iterator that does the actual  code <-->
 445 text format conversion. The converted data is yielded line-wise and the
 446 instance's `state` attribute indicates whether the current line is "header",
 447 "documentation", or "code_block"::
 448
 449       def convert(self, lines):
 450           """Iterate over lines of a program document and convert
 451           between "text" and "code" format
 452           """
 453
 454 Initialise internal data arguments. (Done here, so that every new iteration
 455 re-initialises them.)
 456
 457 `state`
 458   the "type" of the currently processed block of lines. One of
 459
 460   :"":              initial state: check for header,
 461   :"header":        leading code block: strip `header_string`,
 462   :"documentation": documentation part: comment out,
 463   :"code_block":    literal blocks containing source code: unindent.
 464
 465 `_codeindent`
 466   * Do not confuse the internal attribute `_codeindent` with the configurable
 467     `codeindent` (without the leading underscore).
 468   * `_codeindent` is set in `Text2Code.code_block_handler`_ to the indent of
 469     first non-blank "code_block" line and stripped from all "code_block" lines
 470     in the text-to-code conversion,
 471   * `codeindent` is set in `__init__` to `defaults.codeindent`_ and added to
 472     "code_block" lines in the code-to-text conversion.
 473
 474 `_textindent`
 475   * set by `Text2Code.documentation_handler`_ to the minimal indent of a
 476     documentation block,
 477   * used in `Text2Code.set_state`_ to find the end of a code block.
 478
 479 `code_block_marker_missing`
 480   If the last paragraph of a documentation block does not end with a
 481   "code_block_marker" (the literal-block marker ``::``), it must
 482   be added (otherwise, the back-conversion fails).
 483
 484   `code_block_marker_missing` is set by `Code2Text.documentation_handler`_
 485   and evaluated by `Code2Text.code_block_handler`_, because the
 486   documentation_handler does not know whether the next block will be
 487   documentation (with no need for a code_block_marker) or a code block.
 488
 489 ::
 490
 491           self.state = ""
 492           self._codeindent = 0
 493           self._textindent = 0
 494           self.code_block_marker_missing = False
 495
 496 Determine the state of the block and convert with the matching "handler"::
 497
 498           for block in collect_blocks(expandtabs_filter(lines)):
 499               self.set_state(block)
 500               for line in getattr(self, self.state+"_handler")(block):
 501                   yield line
 502
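The dispatch on `self.state + "_handler"` is easier to follow in a reduced form. The class below is a hypothetical toy with two states and a simplistic `set_state` rule; it only illustrates the `getattr` lookup and is not a replacement for the converters.

```python
class TinyMachine(object):
    """Hypothetical two-state machine illustrating handler dispatch."""

    def __init__(self):
        self.state = ""

    def set_state(self, block):
        # Toy rule: a block whose first line starts with "#" is documentation.
        if block[0].startswith("#"):
            self.state = "documentation"
        else:
            self.state = "code_block"

    def documentation_handler(self, block):
        for line in block:
            yield line[2:]          # drop the "# " prefix

    def code_block_handler(self, block):
        for line in block:
            yield "  " + line       # indent by two spaces

    def convert(self, blocks):
        for block in blocks:
            self.set_state(block)
            handler = getattr(self, self.state + "_handler")
            for line in handler(block):
                yield line

out = list(TinyMachine().convert([["# Title\n"], ["x = 1\n"]]))
print(out)   # ['Title\n', '  x = 1\n']
```

Adding a state then only requires a new method with the matching name.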
 503
 504 TextCodeConverter.get_filter
 505 """"""""""""""""""""""""""""
 506 ::
 507
 508       def get_filter(self, filter_set, language):
 509           """Return language specific filter"""
 510           if self.__class__ == Text2Code:
 511               key = "text2"+language
 512           elif self.__class__ == Code2Text:
 513               key = language+"2text"
 514           else:
 515               key = ""
 516           try:
 517               return getattr(defaults, filter_set)[key]
 518           except (AttributeError, KeyError):
 519               # print "there is no %r filter in %r"%(key, filter_set)
 520               pass
 521           return identity_filter
 522
 523
 524 TextCodeConverter.get_indent
 525 """"""""""""""""""""""""""""
 526 Return the number of leading spaces in `line`::
 527
 528       def get_indent(self, line):
 529           """Return the indentation of `line`.
 530           """
 531           return len(line) - len(line.lstrip())
 532
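A few sample values, computed with a stand-alone copy of the method body (a sketch; the free function below is not part of pylit). Note that `lstrip()` also removes the newline, so a blank line reports an indent of 1 rather than 0.

```python
def get_indent(line):
    """Stand-alone copy of the method body shown above."""
    return len(line) - len(line.lstrip())

print(get_indent("    x = 1\n"))   # 4
print(get_indent("x = 1\n"))       # 0
print(get_indent("\n"))            # 1 (the newline counts as whitespace)
```

Callers that must ignore blank lines therefore need a separate test such as `line.lstrip()`.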
 533
 534 Text2Code
 535 ---------
 536
 537 The `Text2Code` converter separates *code-blocks* [#]_ from *documentation*.
 538 Code blocks are unindented, documentation is commented (or filtered, if the
 539 ``strip`` option is True).
 540
 541 .. [#] Only `indented literal blocks`_ are considered code-blocks. `quoted
 542        literal blocks`_, `parsed-literal blocks`_, and `doctest blocks`_ are
 543        treated as part of the documentation. This allows the inclusion of
 544        examples:
 545
 546           >>> 23 + 3
 547           26
 548
 549        Note that there is no double colon before the doctest block in the
 550        text source.
 551
 552 The class inherits the interface and helper functions from
 553 TextCodeConverter_ and adds functions specific to the text-to-code format
 554 conversion::
 555
 556   class Text2Code(TextCodeConverter):
 557       """Convert a (reStructured) text source to code source
 558       """
 559
 560 Text2Code.set_state
 561 ~~~~~~~~~~~~~~~~~~~~~
 562 ::
 563
 564       def set_state(self, block):
 565           """Determine state of `block`. Set `self.state`
 566           """
 567
 568 `set_state` is used inside an iteration. Hence, if we are out of data, a
 569 StopIteration exception should be raised::
 570
 571           if not block:
 572               raise StopIteration
 573
 574 The new state depends on the active state (from the last block) and
 575 features of the current block. It is either "header", "documentation", or
 576 "code_block".
 577
 578 If the current state is "" (first block), check for
 579 the  `header_string` indicating a leading code block::
 580
 581           if self.state == "":
 582               # print "set state for %r"%block
 583               if block[0].startswith(self.header_string):
 584                   self.state = "header"
 585               else:
 586                   self.state = "documentation"
 587
 588 If the current state is "documentation", the next block is also
 589 documentation. The end of a documentation part is detected in the
 590 `Text2Code.documentation_handler`_::
 591
 592           # elif self.state == "documentation":
 593           #    self.state = "documentation"
 594
 595 A "code_block" ends with the first less indented, nonblank line.
 596 `_textindent` is set by the documentation handler to the indent of the
 597 preceding documentation block::
 598
 599           elif self.state in ["code_block", "header"]:
 600               indents = [self.get_indent(line) for line in block]
 601               # print "set_state:", indents, self._textindent
 602               if indents and min(indents) <= self._textindent:
 603                   self.state = 'documentation'
 604               else:
 605                   self.state = 'code_block'
 606
 607 TODO: (or not to do?) insert blank line before the first line with too-small
 608 codeindent using self.ensure_trailing_blank_line(lines, line) (would need
 609 split and push-back of the documentation part)?
 610
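The indentation rule for leaving a code block can be exercised with a small stand-alone function. The name `next_state` is hypothetical; the comparison is the one used in `set_state` above.

```python
def next_state(block, textindent):
    """Hypothetical helper: classify the block that follows a code block."""
    indents = [len(line) - len(line.lstrip()) for line in block]
    if indents and min(indents) <= textindent:
        return "documentation"
    return "code_block"

print(next_state(["    x = 1\n", "    y = 2\n"], 0))   # code_block
print(next_state(["Back to the text.\n"], 0))          # documentation
print(next_state(["  still nested text\n"], 2))        # documentation
```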
 611 Text2Code.header_handler
 612 ~~~~~~~~~~~~~~~~~~~~~~~~
 613
 614 Sometimes code needs to remain on the first line(s) of the document to be
 615 valid. The most common example is the "shebang" line that tells a POSIX
 616 shell how to process an executable file::
 617
 618   #!/usr/bin/env python
 619
 620 In Python, the special comment to indicate the encoding, e.g.
 621 ``# -*- coding: iso-8859-1 -*-``, must occur before any other comment
 622 or code too.
 623
 624 If we want to keep the line numbers in sync for text and code source, the
 625 reStructured Text markup for these header lines must start at the same line
 626 as the first header line. Therefore, header lines cannot be marked as a
 627 literal block (this would require the ``::`` and an empty line above the
 628 code_block).
 629
 630 OTOH, a comment may start at the same line as the comment marker and it
 631 includes subsequent indented lines. Comments are visible in the reStructured
 632 Text source but hidden in the pretty-printed output.
 633
 634 With a header converted to comment in the text source, everything before
 635 the first documentation block (i.e. before the first paragraph using the
 636 matching comment string) will be hidden away (in HTML or PDF output).
 637
 638 This seems a good compromise: the advantages
 639
 640 * line numbers are kept,
 641 * the "normal" code_block conversion rules (indent/unindent by `codeindent`) apply,
 642 * greater flexibility: you can hide a repeating header in a project
 643   consisting of many source files
 644
 645 outweigh the disadvantages
 646
 647 - it may come as a surprise if a part of the file is not "printed",
 648 - one more syntax element to learn for rst newbies starting with pylit
 649   (however, starting from the code source, this will be auto-generated).
 650
 651 In the case that there is no matching comment at all, the complete code
 652 source will become a comment -- however, in this case it is not very likely
 653 the source is a literate document anyway.
 654
 655 If needed for the documentation, it is possible to quote the header in (or
 656 after) the first documentation block, e.g. as `parsed literal`.
 657 ::
 658
 659       def header_handler(self, lines):
 660           """Format leading code block"""
 661           # strip header string from first line
 662           lines[0] = lines[0].replace(self.header_string, "", 1)
 663           # yield remaining lines formatted as code-block
 664           for line in self.code_block_handler(lines):
 665               yield line
 666
 667
 668 Text2Code.documentation_handler
 669 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 670
 671 The 'documentation' handler processes everything that is not recognized as
 672 "code_block". Documentation is quoted with `self.comment_string`
 673 (or filtered with `--strip=True`). ::
 674
 675       def documentation_handler(self, lines):
 676           """Convert documentation blocks from text to code format
 677           """
 678
 679 Test for the end of the documentation block: does the second-to-last line end
 680 with `::` but is neither a comment nor a directive?
 681
 682 If end-of-documentation marker is detected,
 683
 684 * set state to 'code_block'
 685 * set `self._textindent` (needed by `Text2Code.set_state`_ to find the
 686   next "documentation" block)
 687 * do not comment the last line (the blank line separating documentation
 688   and code blocks).
 689
 690 ::
 691
 692           endnum = len(lines) - 2
 693           for (num, line) in enumerate(lines):
 694               if not self.strip:
 695                   if self.state == "code_block":
 696                       yield line
 697                   else:
 698                       yield self.comment_string + line
 699               if (num == endnum and self.marker_regexp.search(line)):
 700                   self.state = "code_block"
 701                   self._textindent = self.get_indent(line)
 702
 703 TODO: Ensure a trailing blank line? Would need to test all documentation
 704 lines for end-of-documentation marker and add a line by calling the
 705 `ensure_trailing_blank_line` method (which also issues a warning)
 706
 707
 708 Text2Code.code_block_handler
 709 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 710
 711 The "code_block" handler is called with an indented literal block. It
 712 removes leading whitespace up to the indentation of the first code line in
 713 the file (this deviation from docutils behaviour allows indented blocks of
 714 Python code). ::
 715
 716       def code_block_handler(self, block):
 717           """Convert indented literal blocks to source code format
 718           """
 719
 720 If still unset, determine the indentation of code blocks from first non-blank
 721 code line::
 722
 723           if self._codeindent == 0:
 724               self._codeindent = self.get_indent(block[0])
 725
 726 Yield unindented lines after checking whether we can safely unindent. If the
 727 line is less indented than `_codeindent`, something went wrong. ::
 728
 729           for line in block:
 730               if line.lstrip() and self.get_indent(line) < self._codeindent:
 731                   raise ValueError("code block contains line less indented "
 732                         "than %d spaces \n%r"%(self._codeindent, block))
 733               yield line.replace(" "*self._codeindent, "", 1)
 734
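The unindent step, including the safety check, can be tried on a short block. The function `unindent` is a hypothetical stand-alone version of the loop above with a shortened error message.

```python
def unindent(block, codeindent):
    """Hypothetical stand-alone version of the unindent loop."""
    for line in block:
        indent = len(line) - len(line.lstrip())
        if line.lstrip() and indent < codeindent:
            raise ValueError("line less indented than %d spaces: %r"
                             % (codeindent, line))
        # Remove the first run of `codeindent` spaces only.
        yield line.replace(" " * codeindent, "", 1)

block = ["  def f():\n", "      return 1\n", "\n"]
print(list(unindent(block, 2)))
# ['def f():\n', '    return 1\n', '\n']
```

Indentation beyond `codeindent` survives, which is what keeps nested Python code valid after the conversion.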
 735
 736 Code2Text
 737 ---------
 738
 739 The `Code2Text` converter does the opposite of `Text2Code`_ -- it processes
 740 a source in "code format" (i.e. in a programming language), extracts
 741 documentation from comment blocks, and puts program code in literal blocks.
 742
 743 The class inherits the interface and helper functions from
 744 TextCodeConverter_ and adds functions specific to the code-to-text format
 745 conversion::
 746
 747   class Code2Text(TextCodeConverter):
 748       """Convert code source to text source
 749       """
 750
 751 Code2Text.set_state
 752 ~~~~~~~~~~~~~~~~~~~
 753
 754 Check if block is "header", "documentation", or "code_block":
 755
 756 A paragraph is "documentation", if every non-blank line starts with a
 757 matching comment string (including whitespace except for commented blank
 758 lines) ::
 759
 760       def set_state(self, block):
 761           """Determine state of `block`."""
 762           for line in block:
 763               # skip documentation lines (commented, blank or blank comment)
 764               if (line.startswith(self.comment_string)
 765                   or not line.rstrip()
 766                   or line.rstrip() == self.comment_string.rstrip()
 767                  ):
 768                   continue
 769               # non-commented line found:
 770               if self.state == "":
 771                   self.state = "header"
 772               else:
 773                   self.state = "code_block"
 774               break
 775           else:
 776               # no code line found
 777               # keep state if the block is just a blank line
 778               # if len(block) == 1 and self._is_blank_codeline(line):
 779               #     return
 780               self.state = "documentation"
 781
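The test applied to every line can be isolated in a predicate. The name `is_documentation` is hypothetical; the three conditions are those of the loop above, with "# " assumed as the comment string.

```python
def is_documentation(block, comment_string="# "):
    """Hypothetical predicate: True if no line of `block` is code."""
    stripped = comment_string.rstrip()
    for line in block:
        if (line.startswith(comment_string)     # commented line
                or not line.rstrip()            # blank line
                or line.rstrip() == stripped):  # blank comment, e.g. "#"
            continue
        return False
    return True

print(is_documentation(["# Title\n", "#\n", "\n"]))   # True
print(is_documentation(["# comment\n", "x = 1\n"]))   # False
print(is_documentation(["#no space\n"]))              # False
```

The last case shows that a comment without the trailing space of the comment string is treated as code.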
 782
 783 Code2Text.header_handler
 784 ~~~~~~~~~~~~~~~~~~~~~~~~
 785
 786 Handle a leading code block. (See `Text2Code.header_handler`_ for a
 787 discussion of the "header" state.) ::
 788
 789       def header_handler(self, lines):
 790           """Format leading code block"""
 791           if self.strip == True:
 792               return
 793           # get iterator over the lines that formats them as code-block
 794           lines = iter(self.code_block_handler(lines))
 795           # prepend header string to first line
 796           yield self.header_string + next(lines)
 797           # yield remaining lines
 798           for line in lines:
 799               yield line
 800
 801 Code2Text.documentation_handler
 802 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 803
 804 The *documentation state* handler converts a comment to a documentation
 805 block by stripping the leading `comment string` from every line::
 806
 807       def documentation_handler(self, lines):
 808           """Uncomment documentation blocks in source code
 809           """
 810
 811 If the code block is stripped, the literal marker would lead to an error
 812 when the text is converted with docutils. Strip it as well. Otherwise, check
 813 for the `code_block_marker` (default ``::``) at the end of the documentation
 814 block::
 815
 816           if self.strip or self.strip_marker:
 817               self.strip_code_block_marker(lines)
 818           else:
 819               try:
 820                   self.code_block_marker_missing = \
 821                       not self.marker_regexp.search(lines[-2])
 822               except IndexError:  # len(lines) < 2, e.g. last line of document
 823                   self.code_block_marker_missing = True
 824
 825 Strip comment strings and yield lines. Consider the case that a blank line
 826 has a comment string without trailing whitespace::
 827
 828           stripped_comment_string = self.comment_string.rstrip()
 829
 830           for line in lines:
 831               line = line.replace(self.comment_string, "", 1)
 832               if line.rstrip() == stripped_comment_string:
 833                   line = line.replace(stripped_comment_string, "", 1)
 834               yield line
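
The stripping rule can be tried in isolation. The following is a minimal
Python 3 sketch, not part of PyLit itself: the function name
`strip_comment_string` and the sample lines are made up for illustration,
and the comment string is assumed to be ``"# "``:

```python
def strip_comment_string(lines, comment_string="# "):
    """Yield `lines` with the leading comment string removed (illustrative)."""
    stripped_comment_string = comment_string.rstrip()
    for line in lines:
        # remove the first occurrence of the full comment string
        line = line.replace(comment_string, "", 1)
        # a "blank" comment line may lack the trailing space
        if line.rstrip() == stripped_comment_string:
            line = line.replace(stripped_comment_string, "", 1)
        yield line


result = list(strip_comment_string(["# Some text\n", "#\n", "# more text\n"]))
```

The second sample line consists of the comment character only; it is the
case covered by the extra test against the right-stripped comment string.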
 835
 836
 837 Code2Text.code_block_handler
 838 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 839
 840 The `code_block` handler returns the code block as an indented literal
 841 block (or filters it out, if ``self.strip == True``). The amount of code
 842 indentation is controlled by `self.codeindent` (default 2). ::
 843
 844       def code_block_handler(self, lines):
 845           """Convert code blocks to text format (indent or strip)
 846           """
 847           if self.strip == True:
 848               return
 849           # insert transition marker if it is missing
 850           if self.code_block_marker_missing:
 851               self.state = "documentation"
 852               yield "::\n"
 853               yield "\n"
 854               self.state = "code_block"
 855           for line in lines:
 856               yield " "*self.codeindent + line
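
As an illustration of the output format, here is a small stand-alone
Python 3 sketch. The function `indent_code_block` and its arguments are
hypothetical stand-ins for the handler and the instance attributes
`codeindent` and `code_block_marker_missing`:

```python
def indent_code_block(lines, codeindent=2, marker_missing=False):
    """Yield `lines` as an indented literal block (illustrative)."""
    if marker_missing:
        # supply the literal block marker the preceding text did not end with
        yield "::\n"
        yield "\n"
    for line in lines:
        yield " " * codeindent + line


result = list(indent_code_block(["x = 1\n", "print(x)\n"], marker_missing=True))
```

With the default indentation of 2, `result` starts with the marker line and
a blank line, followed by the two code lines indented by two spaces.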
 857
 858
 859
 860 Code2Text.strip_code_block_marker
 861 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 862
 863 Replace the literal marker according to the docutils replacement rules:
 864
 865 * strip `::`-line (and preceding blank line) if it is on a line of its own
 866 * strip `::` if it is preceded by whitespace.
 867 * convert `::` to a single colon if preceded by text
 868
 869 `lines` should be a list of documentation lines (with a trailing blank line).
 870 It is modified in-place::
 871
 872       def strip_code_block_marker(self, lines):
 873           try:
 874               line = lines[-2]
 875           except IndexError:
 876               return # just one line (no trailing blank line)
 877
 878           # match with regexp: `match` is None or has groups
 879           # \1 leading text, \2 code_block_marker, \3 remainder
 880           match = self.marker_regexp.search(line)
 881
 882           if not match:                 # no code_block_marker present
 883               return
 884           if not match.group(1):        # `code_block_marker` on an extra line
 885               del(lines[-2])
 886               # delete preceding line if it is blank
 887               if len(lines) >= 2 and not lines[-2].lstrip():
 888                   del(lines[-2])
 889           elif match.group(1).rstrip() < match.group(1):
 890               # '::' follows whitespace
 891               lines[-2] = match.group(1).rstrip() + match.group(3)
 892           else:                         # '::' follows text
 893               lines[-2] = match.group(1).rstrip() + ':' + match.group(3)
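
The three rules can be checked with a stand-alone Python 3 sketch. The
regular expression below is an assumption made for this example (the real
`marker_regexp` is defined elsewhere in this file and may differ), and
`strip_marker` is a hypothetical free-function version of the method:

```python
import re

# Assumed pattern: \1 leading text, \2 marker, \3 trailing whitespace
marker_regexp = re.compile(r"^(.*?)(::)(\s*)$")


def strip_marker(lines):
    """Apply the replacement rules to `lines` in place (illustrative)."""
    if len(lines) < 2:
        return  # just one line (no trailing blank line)
    match = marker_regexp.search(lines[-2])
    if not match:
        return  # no marker present
    text, _marker, rest = match.groups()
    if not text:
        # marker on a line of its own: drop it and a preceding blank line
        del lines[-2]
        if len(lines) >= 2 and not lines[-2].strip():
            del lines[-2]
    elif text != text.rstrip():
        # marker follows whitespace: drop the marker
        lines[-2] = text.rstrip() + rest
    else:
        # marker follows text: keep a single colon
        lines[-2] = text + ":" + rest


own_line = ["Text\n", "\n", "::\n", "\n"]
after_space = ["Example ::\n", "\n"]
after_text = ["Example::\n", "\n"]
for block in (own_line, after_space, after_text):
    strip_marker(block)
```

After the loop, each sample list holds the documentation block as it would
appear in a stripped text source.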
 894
 895 Filters
 896 =======
 897
 898 Filters allow pre- and post-processing of the data to bring it into a format
 899 suitable for the "normal" text<->code conversion. An example is the conversion
 900 of `C` ``/*`` ``*/`` comments into C++ ``//`` comments (and back).
 903
 904 Filters are generator functions that return an iterator acting on a
 905 `data` iterable and yielding processed `data` lines.
 906
 907 identity_filter
 908 ---------------
 909
 910 The most basic filter is the identity filter, which returns its argument as
 911 an iterator::
 912
 913   def identity_filter(data):
 914       """Return data iterator without any processing"""
 915       return iter(data)
 916
 917 expandtabs_filter
 918 -----------------
 919
 920 Expand hard-tabs in every line of `data` (cf. `str.expandtabs`).
 921
 922 This filter is applied to the input data by `TextCodeConverter.convert`_ as
 923 hard tabs can lead to errors when the indentation is changed. ::
 924
 925   def expandtabs_filter(data):
 926       """Yield data tokens with hard-tabs expanded"""
 927       for line in data:
 928           yield line.expandtabs()
 929
 930
 931 collect_blocks
 932 --------------
 933
 934 A filter to aggregate "paragraphs" (blocks separated by blank
 935 lines). Yields lists of lines::
 936
 937   def collect_blocks(lines):
 938       """collect lines in a list
 939
 940       yield list for each paragraph, i.e. block of lines separated by a
 941       blank line (whitespace only).
 942
 943       Trailing blank lines are collected as well.
 944       """
 945       blank_line_reached = False
 946       block = []
 947       for line in lines:
 948           if blank_line_reached and line.rstrip():
 949               yield block
 950               blank_line_reached = False
 951               block = [line]
 952               continue
 953           if not line.rstrip():
 954               blank_line_reached = True
 955           block.append(line)
 956       yield block
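
To see how trailing blank lines are grouped, the generator can be run on a
short sample. The function is restated here (for Python 3) only so that the
example runs on its own; the sample input is made up:

```python
def collect_blocks(lines):
    """Yield a list of lines for each paragraph, trailing blanks included."""
    blank_line_reached = False
    block = []
    for line in lines:
        if blank_line_reached and line.rstrip():
            # first non-blank line after blank lines starts a new block
            yield block
            blank_line_reached = False
            block = [line]
            continue
        if not line.rstrip():
            blank_line_reached = True
        block.append(line)
    yield block


blocks = list(collect_blocks(["a\n", "b\n", "\n", "\n", "c\n"]))
empty = list(collect_blocks([]))
```

Both blank lines end up in the first block. Note that an empty input still
yields one (empty) block, because the final ``yield`` is unconditional.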
 957
 958
 959
 960 dumb_c_preprocessor
 961 -------------------
 962
 963 This is a basic filter to convert `C` to `C++` comments. Works line-wise and
 964 only converts lines that
 965
 966 * start with "/\* " and end with " \*/" (followed by whitespace only)
 967
 968 A more sophisticated version would also
 969
 970 * convert multi-line comments
 971
 972   + Keep indentation or strip 3 leading spaces?
 973
 974 * account for nested comments
 975
 976 * only convert comments that are separated from code by a blank line
 977
 978 ::
 979
 980   def dumb_c_preprocessor(data):
 981       """change `C` ``/* `` `` */`` comments into C++ ``// `` comments"""
 982       comment_string = defaults.comment_strings["c++"]
 983       boc_string = "/* "
 984       eoc_string = " */"
 985       for line in data:
 986           if (line.startswith(boc_string)
 987               and line.rstrip().endswith(eoc_string)
 988              ):
 989               line = line.replace(boc_string, comment_string, 1)
 990               line = "".join(line.rsplit(eoc_string, 1))
 991           yield line
 992
 993 Unfortunately, the `replace` method of strings does not support negative
 994 numbers for the `count` argument:
 995
 996 >>> "foo */ baz */ bar".replace(" */", "", -1) == "foo */ baz bar"
     False
 997
 998 However, there is the `rsplit` method, which can be used together with `join`:
 999
1000 >>> "".join("foo */ baz */ bar".rsplit(" */", 1)) == "foo */ baz bar"
     True
1001
1002
1003 dumb_c_postprocessor
1004 --------------------
1005
1006 Undo the preparations by the dumb_c_preprocessor and re-insert valid comment
1007 delimiters ::
1008
1009   def dumb_c_postprocessor(data):
1010       """change C++ ``// `` comments into `C` ``/* `` `` */`` comments"""
1011       comment_string = defaults.comment_strings["c++"]
1012       boc_string = "/* "
1013       eoc_string = " */"
1014       for line in data:
1015           if line.rstrip() == comment_string.rstrip():
1016               line = line.replace(comment_string, "", 1)
1017           elif line.startswith(comment_string):
1018               line = line.replace(comment_string, boc_string, 1)
1019               line = line.rstrip() + eoc_string + "\n"
1020           yield line
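
A round trip through both filters should reproduce a single-line comment.
The following Python 3 sketch restates the two filters under the assumption
that ``defaults.comment_strings["c++"]`` is ``"// "``; the names `to_cpp`
and `to_c` and the sample source are hypothetical:

```python
COMMENT = "// "  # assumed value of defaults.comment_strings["c++"]


def to_cpp(data):
    """Turn single-line ``/* ... */`` comments into ``// ...`` comments."""
    for line in data:
        if line.startswith("/* ") and line.rstrip().endswith(" */"):
            line = line.replace("/* ", COMMENT, 1)
            # drop only the last end-of-comment string
            line = "".join(line.rsplit(" */", 1))
        yield line


def to_c(data):
    """Turn ``// ...`` comments back into ``/* ... */`` comments."""
    for line in data:
        if line.rstrip() == COMMENT.rstrip():
            line = line.replace(COMMENT, "", 1)
        elif line.startswith(COMMENT):
            line = line.replace(COMMENT, "/* ", 1)
            line = line.rstrip() + " */" + "\n"
        yield line


source = ["/* A comment */\n", "int x = 1;\n"]
converted = list(to_cpp(source))
restored = list(to_c(converted))
```

Code lines pass through unchanged in both directions, so `restored` equals
`source` for this input.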
1021
1022
1023 register filters
1024 ----------------
1025
1026 ::
1027
1028   defaults.preprocessors['c2text'] = dumb_c_preprocessor
1029   defaults.preprocessors['css2text'] = dumb_c_preprocessor
1030   defaults.postprocessors['text2c'] = dumb_c_postprocessor
1031   defaults.postprocessors['text2css'] = dumb_c_postprocessor
1032
1033
1034 Command line use
1035 ================
1036
1037 Using this script from the command line will convert a file according to its
1038 extension. This default can be overridden by a couple of options.
1039
1040 Dual source handling
1041 --------------------
1042
1043 How to determine which source is up-to-date?
1044 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1045
1046 - set modification date of `outfile` to the one of `infile`
1047
1048   This indicates that the source files are 'synchronized'.
1049
1050   * Are problems to be expected from "backdating" a file? Which?
1051
1052     Looking at http://www.unix.com/showthread.php?t=20526, it seems
1053     perfectly legal to set `mtime` (while leaving `ctime`) as `mtime` is a
1054     description of the "actuality" of the data in the file.
1055
1056   * Should this become a default or an option?
1057
1058 - alternatively move input file to a backup copy (with option: `--replace`)
1059
1060 - check modification date before overwriting
1061   (with option: `--overwrite=update`)
1062
1063 - check modification date before editing (implemented as `Jed editor`_
1064   function `pylit_check()` in `pylit.sl`_)
1065
1066 .. _Jed editor: http://www.jedsoft.org/jed/
1067 .. _pylit.sl: http://jedmodes.sourceforge.net/mode/pylit/
1068
1069 Recognised Filename Extensions
1070 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1071
1072 Instead of defining a new extension for "pylit" literate programs,
1073 by default ``.txt`` will be appended for the text source and stripped by
1074 the conversion to the code source. I.e. for a Python program foo:
1075
1076 * the code source is called ``foo.py``
1077 * the text source is called ``foo.py.txt``
1078 * the html rendering is called ``foo.py.html``
1079
1080
1081 OptionValues
1082 ------------
1083
1084 The following class adds `as_dict` and `__getattr__` methods to
1085 `optparse.Values`::
1086
1087   class OptionValues(optparse.Values):
1088
1089 OptionValues.as_dict
1090 ~~~~~~~~~~~~~~~~~~~~
1091
1092 For use as keyword arguments, it is handy to have the options in a
1093 dictionary. `as_dict` returns a copy of the instance's object dictionary::
1094
1095       def as_dict(self):
1096           """Return options as dictionary object"""
1097           return self.__dict__.copy()
1098
1099 OptionValues.complete
1100 ~~~~~~~~~~~~~~~~~~~~~
1101
1102 ::
1103
1104       def complete(self, **keyw):
1105           """
1106           Complete the option values with keyword arguments.
1107
1108           Do not overwrite existing values. Only use arguments that do not
1109           have a corresponding attribute in `self`.
1110           """
1111           for key in keyw:
1112               if key not in self.__dict__:
1113                   setattr(self, key, keyw[key])
1114
1115 OptionValues.__getattr__
1116 ~~~~~~~~~~~~~~~~~~~~~~~~
1117
1118 To replace calls using ``options.ensure_value("OPTION", None)`` with the
1119 more concise ``options.OPTION``, we define `__getattr__` [#]_ ::
1120
1121       def __getattr__(self, name):
1122           """Return default value for nonexistent options"""
1123           return None
1124
1125
1126 .. [#] The special method `__getattr__` is only called when an attribute
1127        lookup has not found the attribute in the usual places (i.e. it is
1128        not an instance attribute nor is it found in the class tree for
1129        self).
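
The effect of the fallback can be seen in a short Python 3 session. This is
a reduced sketch containing only the `__getattr__` method; the option names
used are examples:

```python
import optparse


class OptionValues(optparse.Values):
    def __getattr__(self, name):
        # only reached when the normal attribute lookup fails
        return None


values = OptionValues({"infile": "foo.py"})
existing = values.infile          # found in the instance dictionary
missing = values.outfile          # falls back to __getattr__
always_there = hasattr(values, "no_such_option")
```

A side effect worth keeping in mind: since every lookup succeeds, `hasattr`
is true for any name, so tests for unset options have to compare against
``None`` instead.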
1130
1131
1132 PylitOptions
1133 ------------
1134
1135 The `PylitOptions` class comprises an option parser and methods for parsing
1136 and completion of command line options::
1137
1138   class PylitOptions(object):
1139       """Storage and handling of command line options for pylit"""
1140
1141 Instantiation
1142 ~~~~~~~~~~~~~
1143
1144 ::
1145
1146       def __init__(self):
1147           """Set up an `OptionParser` instance for pylit command line options
1148
1149           """
1150           p = optparse.OptionParser(usage=main.__doc__, version=_version)
1151           # add the options
1152           p.add_option("-c", "--code2txt", dest="txt2code", action="store_false",
1153                        help="convert code source to text source")
1154           p.add_option("--comment-string", dest="comment_string",
1155                        help="documentation block marker (default '# ' (for python))" )
1156           p.add_option("-d", "--diff", action="store_true",
1157                        help="test for differences to existing file")
1158           p.add_option("--doctest", action="store_true",
1159                        help="run doctest.testfile() on the text version")
1160           p.add_option("-e", "--execute", action="store_true",
1161                        help="execute code (Python only)")
1162           p.add_option("--language", action="store",
1163                        choices = defaults.languages.values(),
1164                        help="use LANGUAGE native comment style")
1165           p.add_option("--overwrite", action="store",
1166                        choices = ["yes", "update", "no"],
1167                        help="overwrite output file (default 'update')")
1168           p.add_option("--replace", action="store_true",
1169                        help="move infile to a backup copy (appending '~')")
1170           p.add_option("-s", "--strip", action="store_true",
1171                        help="export by stripping documentation or code")
1172           p.add_option("-t", "--txt2code", action="store_true",
1173                        help="convert text source to code source")
1174           self.parser = p
1175
1176
1177 PylitOptions.parse_args
1178 ~~~~~~~~~~~~~~~~~~~~~~~
1179
1180 The `parse_args` method calls the `optparse.OptionParser` on the command
1181 line or the provided args and returns the result as an `OptionValues`
1182 instance. Defaults can be provided as keyword arguments::
1183
1184       def parse_args(self, args=sys.argv[1:], **keyw):
1185           """parse command line arguments using `optparse.OptionParser`
1186
1187              parse_args(args, **keyw) -> OptionValues instance
1188
1189               args --  list of command line arguments.
1190               keyw --  keyword arguments or dictionary of option defaults
1191           """
1192           # parse arguments
1193           (values, args) = self.parser.parse_args(args, OptionValues(keyw))
1194           # Convert FILE and OUTFILE positional args to option values
1195           # (other positional arguments are ignored)
1196           try:
1197               values.infile = args[0]
1198               values.outfile = args[1]
1199           except IndexError:
1200               pass
1201
1202           return values
1203
1204 PylitOptions.complete_values
1205 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1206
1207 Complete an OptionValues instance `values`.  Use module-level defaults and
1208 context information to set missing option values to sensible defaults (if
1209 possible) ::
1210
1211       def complete_values(self, values):
1212           """complete option values with module-level and context-sensitive defaults
1213
1214           x.complete_values(values) -> values
1215           values -- OptionValues instance
1216           """
1217
1218 Complete with module-level defaults_::
1219
1220           values.complete(**defaults.__dict__)
1221
1222 Ensure infile is a string::
1223
1224           values.ensure_value("infile", "")
1225
1226 Guess conversion direction from `infile` filename::
1227
1228           if values.txt2code is None:
1229               in_extension = os.path.splitext(values.infile)[1]
1230               if in_extension in values.text_extensions:
1231                   values.txt2code = True
1232               elif in_extension in values.languages.keys():
1233                   values.txt2code = False
1234
1235 Auto-determine the output file name::
1236
1237           values.ensure_value("outfile", self._get_outfile_name(values))
1238
1239 Second try: Guess conversion direction from outfile filename::
1240
1241           if values.txt2code is None:
1242               out_extension = os.path.splitext(values.outfile)[1]
1243               values.txt2code = not (out_extension in values.text_extensions)
1244
1245 Set the language of the code::
1246
1247           if values.txt2code is True:
1248               code_extension = os.path.splitext(values.outfile)[1]
1249           elif values.txt2code is False:
1250               code_extension = os.path.splitext(values.infile)[1]
1251           values.ensure_value("language",
1252                               values.languages.get(code_extension,
1253                                                    values.fallback_language))
1254
1255           return values
1256
1257 PylitOptions._get_outfile_name
1258 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1259
1260 Construct a matching filename for the output file. The output filename is
1261 constructed from `infile` by the following rules:
1262
1263 * '-' (stdin) results in '-' (stdout)
1264 * strip the `txt_extension` (txt2code) or
1265 * add a `txt_extension` (code2txt)
1266 * fallback: if no guess can be made, add ".out"
1267
1268   .. TODO: use values.outfile_extension if it exists?
1269
1270 ::
1271
1272       def _get_outfile_name(self, values):
1273           """Return a matching output filename for `infile`
1274           """
1275           # if input is stdin, default output is stdout
1276           if values.infile == '-':
1277               return '-'
1278
1279           # Derive from `infile` name: strip or add text extension
1280           (base, ext) = os.path.splitext(values.infile)
1281           if ext in values.text_extensions:
1282               return base # strip
1283           if ext in values.languages.keys() or values.txt2code == False:
1284               return values.infile + values.text_extensions[0] # add
1285           # give up
1286           return values.infile + ".out"
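
The naming rules can be exercised without an `OptionValues` instance. In the
following Python 3 sketch, the free function `get_outfile_name` and the
extension tuples are assumptions standing in for `values.text_extensions`
and the keys of `values.languages`:

```python
import os.path


def get_outfile_name(infile, text_extensions=(".txt",),
                     code_extensions=(".py",), txt2code=None):
    """Return a matching output file name for `infile` (illustrative)."""
    if infile == "-":
        return "-"                            # stdin -> stdout
    base, ext = os.path.splitext(infile)
    if ext in text_extensions:
        return base                           # strip text extension
    if ext in code_extensions or txt2code is False:
        return infile + text_extensions[0]    # add text extension
    return infile + ".out"                    # give up


names = [get_outfile_name("-"),
         get_outfile_name("foo.py.txt"),
         get_outfile_name("foo.py"),
         get_outfile_name("foo.dat"),
         get_outfile_name("foo.dat", txt2code=False)]
```

The last two calls show the fallback: an unknown extension gets ``.out``
unless the conversion direction is explicitly set to code-to-text.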
1287
1288 PylitOptions.__call__
1289 ~~~~~~~~~~~~~~~~~~~~~
1290
1291 The special `__call__` method allows using PylitOptions instances as
1292 *callables*: Calling an instance parses the argument list to extract option
1293 values and completes them based on "context-sensitive defaults".  Keyword
1294 arguments are passed to `PylitOptions.parse_args`_ as default values. ::
1295
1296       def __call__(self, args=sys.argv[1:], **keyw):
1297           """parse and complete command line args, return option values
1298           """
1299           values = self.parse_args(args, **keyw)
1300           return self.complete_values(values)
1301
1302
1303
1304 Helper functions
1305 ----------------
1306
1307 open_streams
1308 ~~~~~~~~~~~~
1309
1310 Return file objects for in- and output. If the input path is missing,
1311 write usage and abort. (An alternative would be to use stdin as default.
1312 However, this leaves the uninitiated user with a non-responding application
1313 if (s)he just tries the script without any arguments.) ::
1314
1315   def open_streams(infile = '-', outfile = '-', overwrite='update', **keyw):
1316       """Open and return the input and output stream
1317
1318       open_streams(infile, outfile) -> (in_stream, out_stream)
1319
1320       in_stream   --  file(infile) or sys.stdin
1321       out_stream  --  file(outfile) or sys.stdout
1322       overwrite   --  'yes': overwrite `outfile` if it exists,
1323                       'update': fail if the `outfile` is newer than `infile`,
1324                       'no': fail if `outfile` exists.
1325
1326                       Irrelevant if `outfile` == '-'.
1327       """
1328       if not infile:
1329           strerror = "Missing input file name ('-' for stdin; -h for help)"
1330           raise IOError, (2, strerror, infile)
1331       if infile == '-':
1332           in_stream = sys.stdin
1333       else:
1334           in_stream = file(infile, 'r')
1335       if outfile == '-':
1336           out_stream = sys.stdout
1337       elif overwrite == 'no' and os.path.exists(outfile):
1338           raise IOError, (1, "Output file exists!", outfile)
1339       elif overwrite == 'update' and is_newer(outfile, infile):
1340           raise IOError, (1, "Output file is newer than input file!", outfile)
1341       else:
1342           out_stream = file(outfile, 'w')
1343       return (in_stream, out_stream)
1344
1345 is_newer
1346 ~~~~~~~~
1347
1348 ::
1349
1350   def is_newer(path1, path2):
1351       """Check if `path1` is newer than `path2` (using mtime)
1352
1353       Compare modification time of files at path1 and path2.
1354
1355       Non-existing files are considered oldest: Return False if path1 does not
1356       exist and True if path2 does not exist.
1357
1358       Return None for equal modification time. (This evaluates to False in a
1359       boolean context but allows a test for equality.)
1360
1361       """
1362       try:
1363           mtime1 = os.path.getmtime(path1)
1364       except OSError:
1365           mtime1 = -1
1366       try:
1367           mtime2 = os.path.getmtime(path2)
1368       except OSError:
1369           mtime2 = -1
1370       # print "mtime1", mtime1, path1, "\n", "mtime2", mtime2, path2
1371
1372       if mtime1 == mtime2:
1373           return None
1374       return mtime1 > mtime2
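
The three possible return values can be demonstrated with temporary files.
This Python 3 sketch restates the function in condensed form; the file names
and time stamps are arbitrary test values:

```python
import os
import tempfile


def is_newer(path1, path2):
    """Check if `path1` is newer than `path2` (None if equally old)."""
    def mtime(path):
        try:
            return os.path.getmtime(path)
        except OSError:
            return -1                 # non-existing files count as oldest
    mtime1, mtime2 = mtime(path1), mtime(path2)
    if mtime1 == mtime2:
        return None
    return mtime1 > mtime2


with tempfile.TemporaryDirectory() as tmpdir:
    old = os.path.join(tmpdir, "old.txt")
    new = os.path.join(tmpdir, "new.txt")
    missing = os.path.join(tmpdir, "missing.txt")
    for path in (old, new):
        with open(path, "w") as stream:
            stream.write("data\n")
    os.utime(old, (1000, 1000))       # (atime, mtime) in seconds
    os.utime(new, (2000, 2000))
    results = (is_newer(new, old), is_newer(old, new), is_newer(old, old),
               is_newer(missing, old), is_newer(old, missing))
```

Because ``None`` is false in a boolean context, callers that only ask "is it
newer?" need no special case for equal time stamps.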
1375
1376
1377 get_converter
1378 ~~~~~~~~~~~~~
1379
1380 Get an instance of the converter state machine::
1381
1382   def get_converter(data, txt2code=True, **keyw):
1383       if txt2code:
1384           return Text2Code(data, **keyw)
1385       else:
1386           return Code2Text(data, **keyw)
1387
1388
1389 Use cases
1390 ---------
1391
1392 run_doctest
1393 ~~~~~~~~~~~
1394
1395 ::
1396
1397   def run_doctest(infile="-", txt2code=True,
1398                   globs={}, verbose=False, optionflags=0, **keyw):
1399       """run doctest on the text source
1400       """
1401       from doctest import DocTestParser, DocTestRunner
1402       (data, out_stream) = open_streams(infile, "-")
1403
1404 If source is code, convert to text, as tests in comments are not found by
1405 doctest::
1406
1407       if txt2code is False:
1408           converter = Code2Text(data, **keyw)
1409           docstring = str(converter)
1410       else:
1411           docstring = data.read()
1412
1413 Use the doctest Advanced API to do all doctests in a given string::
1414
1415       test = DocTestParser().get_doctest(docstring, globs=globs, name="",
1416                                              filename=infile, lineno=0)
1417       runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
1418       runner.run(test)
1419       runner.summarize()
1420       if not runner.failures:
1421           print "%d failures in %d tests"%(runner.failures, runner.tries)
1422       return runner.failures, runner.tries
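
The same doctest calls work on any string. A minimal Python 3 sketch,
in which the sample text, the test name and the file name are placeholders:

```python
from doctest import DocTestParser, DocTestRunner

text = """
Some documentation with an embedded test:

>>> 1 + 1
2
"""

# collect all examples of the string in one DocTest object
test = DocTestParser().get_doctest(text, globs={}, name="example",
                                   filename="<string>", lineno=0)
runner = DocTestRunner(verbose=False)
results = runner.run(test)
```

`runner.run` returns the number of failed and attempted examples; the same
counts are kept in `runner.failures` and `runner.tries`.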
1423
1424
1425 diff
1426 ~~~~
1427
1428 ::
1429
1430   def diff(infile='-', outfile='-', txt2code=True, **keyw):
1431       """Report differences between converted infile and existing outfile
1432
1433       If outfile is '-', do a round-trip conversion and report differences
1434       """
1435
1436       import difflib
1437
1438       instream = file(infile)
1439       # for diffing, we need a copy of the data as list::
1440       data = instream.readlines()
1441       # convert
1442       converter = get_converter(data, txt2code, **keyw)
1443       new = converter()
1444
1445       if outfile != '-':
1446           outstream = file(outfile)
1447           old = outstream.readlines()
1448           oldname = outfile
1449           newname = "<conversion of %s>"%infile
1450       else:
1451           old = data
1452           oldname = infile
1453           # back-convert the output data
1454           converter = get_converter(new, not txt2code)
1455           new = converter()
1456           newname = "<round-conversion of %s>"%infile
1457
1458       # find and print the differences
1459       is_different = False
1460       # print type(old), old
1461       # print type(new), new
1462       delta = difflib.unified_diff(old, new,
1463       # delta = difflib.unified_diff(["heute\n", "schon\n"], ["heute\n", "noch\n"],
1464                                         fromfile=oldname, tofile=newname)
1465       for line in delta:
1466           is_different = True
1467           print line,
1468       if not is_different:
1469           print oldname
1470           print newname
1471           print "no differences found"
1472       return is_different
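
The comparison itself relies on `difflib.unified_diff`, which returns a
generator of diff lines and yields nothing for equal input. A short Python 3
sketch with made-up data and file names:

```python
import difflib

old = ["first line\n", "second line\n"]
new = ["first line\n", "changed line\n"]

# materialize the generator to be able to test for "no differences"
delta = list(difflib.unified_diff(old, new,
                                  fromfile="old.txt", tofile="new.txt"))
is_different = bool(delta)
unchanged = list(difflib.unified_diff(old, old))
```

The `diff` function above detects differences while iterating instead of
building a list, which avoids holding the whole delta in memory.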
1473
1474
1475 execute
1476 ~~~~~~~
1477
1478 Works only for python code.
1479
1480 Does not work with `eval`, as code is not just one expression. ::
1481
1482   def execute(infile="-", txt2code=True, **keyw):
1483       """Execute the input file. Convert first, if it is a text source.
1484       """
1485
1486       data = file(infile)
1487       if txt2code:
1488           data = str(Text2Code(data, **keyw))
1489       # print "executing " + options.infile
1490       exec data
1491
1492
1493 main
1494 ----
1495
1496 If this script is called from the command line, the `main` function will
1497 convert the input (file or stdin) between text and code formats.
1498
1499 Option default values for the conversion can be given as keyword arguments
1500 to `main`_.  The option defaults will be updated by command line options and
1501 extended with "intelligent guesses" by `PylitOptions`_ and passed on to
1502 helper functions and the converter instantiation.
1503
1504 This allows easy customization for programmatic use -- just call `main`
1505 with the appropriate keyword options, e.g.:
1506
1507 >>> main(comment_string="## ")
1508
1509 ::
1510
1511   def main(args=sys.argv[1:], **defaults):
1512       """%prog [options] INFILE [OUTFILE]
1513
1514       Convert between (reStructured) text source with embedded code,
1515       and code source with embedded documentation (comment blocks)
1516
1517       The special filename '-' stands for standard in and output.
1518       """
1519
1520 Parse and complete the options::
1521
1522       options = PylitOptions()(args, **defaults)
1523       # print "infile", repr(options.infile)
1524
1525 Special actions with early return::
1526
1527       if options.doctest:
1528           return run_doctest(**options.as_dict())
1529
1530       if options.diff:
1531           return diff(**options.as_dict())
1532
1533       if options.execute:
1534           return execute(**options.as_dict())
1535
1536 Open in- and output streams::
1537
1538       try:
1539           (data, out_stream) = open_streams(**options.as_dict())
1540       except IOError, ex:
1541           print "IOError: %s %s" % (ex.filename, ex.strerror)
1542           sys.exit(ex.errno)
1543
1544 Get a converter instance::
1545
1546       converter = get_converter(data, **options.as_dict())
1547
1548 Convert and write to out_stream::
1549
1550       out_stream.write(str(converter))
1551
1552       if out_stream is not sys.stdout:
1553           print "extract written to", out_stream.name
1554           out_stream.close()
1555
1556 If input and output are from files, set the modification time (`mtime`) of
1557 the output file to the one of the input file to indicate that the contained
1558 information is equal. [#]_ ::
1559
1560           try:
1561               os.utime(options.outfile, (os.path.getatime(options.outfile),
1562                                          os.path.getmtime(options.infile))
1563                       )
1564           except OSError:
1565               pass
1566
1567       ## print "mtime", os.path.getmtime(options.infile),  options.infile
1568       ## print "mtime", os.path.getmtime(options.outfile), options.outfile
1569
1570
1571 .. [#] Make sure the corresponding file object (here `out_stream`) is
1572        closed, as otherwise the change will be overwritten when `close` is
1573        called afterwards (either explicitly or at program exit).
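
The "backdating" step can be tried on temporary files. In this Python 3
sketch the file names and the time stamp are arbitrary; both files are
closed before `os.utime` is called, as the footnote requires:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    infile = os.path.join(tmpdir, "foo.py.txt")
    outfile = os.path.join(tmpdir, "foo.py")
    for path in (infile, outfile):
        with open(path, "w") as stream:   # closed on leaving the block
            stream.write("x = 1\n")
    os.utime(infile, (1000, 1000))        # pretend infile is old
    # keep the access time of outfile, copy the mtime of infile
    os.utime(outfile, (os.path.getatime(outfile),
                       os.path.getmtime(infile)))
    synchronized = os.path.getmtime(outfile) == os.path.getmtime(infile)
```

With equal modification times, a later call with ``--overwrite=update``
treats neither file as newer than the other.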
1574
1575
1576 Rename the infile to a backup copy if ``--replace`` is set::
1577
1578       if options.replace:
1579           os.rename(options.infile, options.infile + "~")
1580
1581
1582 Run main, if called from the command line::
1583
1584   if __name__ == '__main__':
1585       main()
1586
1587
1588 Open questions
1589 ==============
1590
1591 Open questions and ideas for further development
1592
1593 Clean code
1594 ----------
1595
1596 * can we gain from using "shutil" over "os.path" and "os"?
1597 * use pylint or PyChecker to enforce a consistent style?
1598
1599 Options
1600 -------
1601
1602 * Use templates for the "intelligent guesses" (with Python syntax for string
1603   replacement with dicts: ``"hello %(what)s" % {'what': 'world'}``)
1604
1605 * Is it sensible to offer the `header_string` option also as command line
1606   option?
1607
1608 * treatment of blank lines:
1609
1610   * Alternatives: Keep blank lines blank
1611
1612     + "always",
1613
1614     + "if empty" (no whitespace). Comment if there is whitespace.
1615
1616       This would allow non-obstructing markup but unfortunately this is (in
1617       most editors) also non-visible markup -> bad.
1618
1619     + "if double" (if there is more than one consecutive blank line)
1620
1621     + "never" (current setting)
1622
1623   So the setting could be something like::
1624
1625     defaults.keep_blank_lines = { "python": "if double",
1626                                   "elisp": "always"}
1627
1628
1629 Parsing Problems
1630 ----------------------
1631
1632 * Ignore "matching comments" in literal strings?
1633
1634   Too complicated: Would need a specific detection algorithm for every
1635   language that supports multi-line literal strings (C++, PHP, Python)
1636
1637 * Warn if a comment in code will become documentation after round-trip?
1638
1639
1640 docstrings in code blocks
1641 -------------------------
1642
1643 * How to handle docstrings in code blocks? (it would be nice to convert them
1644   to rst-text if ``__docformat__ == restructuredtext``)
1645
1646 TODO: Ask at docutils users|developers
1647
1648 .. References
1649
1650 .. _docutils:
1651     http://docutils.sourceforge.net/
1652 .. _indented literal block:
1653 .. _indented literal blocks:
1654     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#indented-literal-blocks
1655 .. _quoted literal block:
1656 .. _quoted literal blocks:
1657     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#quoted-literal-blocks
1658 .. _doctest block:
1659 .. _doctest blocks:
1660     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#doctest-blocks
1661 .. _pygments: http://pygments.org/
1662 .. _syntax highlight: ../features/syntax-highlight.html
1663 .. _parsed-literal blocks:
1664     http://docutils.sf.net/docs/ref/rst/directives.html#parsed-literal-block