pylit.py

   1 #!/usr/bin/env python
   2 # -*- coding: utf8 -*-
   3
   4 """pylit: bidirectional text <-> code converter
   5
   6 Covert between a *text source* with embedded computer code
   7 and a *code source* with embedded documentation.
   8 """
   9
  10 from __future__ import print_function
  11
  12 # pylit.py
  13 # ********
  14 # Literate programming with reStructuredText
  15 # ++++++++++++++++++++++++++++++++++++++++++
  16 #
  17 # :Copyright: © 2005, 2007, 2015, 2021 Günter Milde.
  18 #             Released without warranty under the terms of the
  19 #             GNU General Public License (v. 3 or later)
  20 #
  21 # .. contents::
  22 #
  23 # Frontmatter
  24 # ===========
  25 #
  26 # Changelog
  27 # ---------
  28 #
  29 # .. class:: borderless
  30 #
  31 # ======  ==========  ==========================================================
  32 # 0.1     2005-06-29  Initial version.
  33 # 0.1.1   2005-06-30  First literate version.
  34 # 0.1.2   2005-07-01  Object oriented script using generators.
  35 # 0.1.3   2005-07-10  Two state machine (later added 'header' state).
  36 # 0.2b    2006-12-04  Start of work on version 0.2 (code restructuring).
  37 # 0.2     2007-01-23  Published at ``pylit.berlios.de``.
  38 # 0.2.1   2007-01-25  Outsourced non-core documentation to the PyLit pages.
  39 # 0.2.2   2007-01-26  New behaviour of `diff` function.
  40 # 0.2.3   2007-01-29  New `header` methods after suggestion by Riccardo Murri.
  41 # 0.2.4   2007-01-31  Raise Error if code indent is too small.
  42 # 0.2.5   2007-02-05  New command line option --comment-string.
  43 # 0.2.6   2007-02-09  Add section with open questions,
  44 #                     Code2Text: let only blank lines (no comment str)
  45 #                     separate text and code,
  46 #                     fix `Code2Text.header`.
  47 # 0.2.7   2007-02-19  Simplify `Code2Text.header`,
  48 #                     new `iter_strip` method replacing a lot of ``if``-s.
  49 # 0.2.8   2007-02-22  Set `mtime` of outfile to the one of infile.
  50 # 0.3     2007-02-27  New `Code2Text` converter after an idea by Riccardo Murri,
  51 #                     explicit `option_defaults` dict for easier customisation.
  52 # 0.3.1   2007-03-02  Expand hard-tabs to prevent errors in indentation,
  53 #                     `Text2Code` now also works on blocks,
  54 #                     removed dependency on SimpleStates module.
  55 # 0.3.2   2007-03-06  Bug fix: do not set `language` in `option_defaults`
  56 #                     renamed `code_languages` to `languages`.
  57 # 0.3.3   2007-03-16  New language css,
  58 #                     option_defaults -> defaults = optparse.Values(),
  59 #                     simpler PylitOptions: don't store parsed values,
  60 #                     don't parse at initialisation,
  61 #                     OptionValues: return `None` for non-existing attributes,
  62 #                     removed -infile and -outfile, use positional arguments.
  63 # 0.3.4   2007-03-19  Documentation update,
  64 #                     separate `execute` function.
  65 #         2007-03-21  Code cleanup in `Text2Code.__iter__`.
  66 # 0.3.5   2007-03-23  Removed "css" from known languages after learning that
  67 #                     there is no C++ style "// " comment string in CSS2.
  68 # 0.3.6   2007-04-24  Documentation update.
  69 # 0.4     2007-05-18  Implement Converter.__iter__ as stack of iterator
  70 #                     generators. Iterating over a converter instance now
  71 #                     yields lines instead of blocks.
  72 #                     Provide "hooks" for pre- and postprocessing filters.
  73 #                     Rename states to reduce confusion with formats:
  74 #                     "text" -> "documentation", "code" -> "code_block".
  75 # 0.4.1   2007-05-22  Converter.__iter__: cleanup and reorganisation,
  76 #                     rename parent class Converter -> TextCodeConverter.
  77 # 0.4.2   2007-05-23  Merged Text2Code.converter and Code2Text.converter into
  78 #                     TextCodeConverter.converter.
  79 # 0.4.3   2007-05-30  Replaced use of defaults.code_extensions with
  80 #                     values.languages.keys().
  81 #                     Removed spurious `print` statement in code_block_handler.
  82 #                     Added basic support for 'c' and 'css' languages
  83 #                     with `dumb_c_preprocessor`_ and `dumb_c_postprocessor`_.
  84 # 0.5     2007-06-06  Moved `collect_blocks`_ out of `TextCodeConverter`_,
  85 #                     bug fix: collect all trailing blank lines into a block.
  86 #                     Expand tabs with `expandtabs_filter`_.
  87 # 0.6     2007-06-20  Configurable code-block marker (default ``::``)
  88 # 0.6.1   2007-06-28  Bug fix: reset self.code_block_marker_missing.
  89 # 0.7     2007-12-12  prepending an empty string to sys.path in run_doctest()
  90 #                     to allow imports from the current working dir.
  91 # 0.7.1   2008-01-07  If outfile does not exist, do a round-trip conversion
  92 #                     and report differences (as with outfile=='-').
  93 # 0.7.2   2008-01-28  Do not add missing code-block separators with
  94 #                     `doctest_run` on the code source. Keeps lines consistent.
  95 # 0.7.3   2008-04-07  Use value of code_block_marker for insertion of missing
  96 #                     transition marker in Code2Text.code_block_handler
  97 #                     Add "shell" to defaults.languages
  98 # 0.7.4   2008-06-23  Add "latex" to defaults.languages
  99 # 0.7.5   2009-05-14  Bugfix: ignore blank lines in test for end of code block
 100 # 0.7.6   2009-12-15  language-dependent code-block markers (after a
 101 #                     feature request and patch by `jrioux`),
 102 #                     use DefaultDict for language-dependent defaults,
 103 #                     new defaults setting `add_missing_marker`_.
 104 # 0.7.7   2010-06-23  New command line option --codeindent.
 105 # 0.7.8   2011-03-30  Do not overwrite custom `add_missing_marker` value,
 106 #                     allow directive options following the 'code' directive.
 107 # 0.7.9   2011-04-05  Decode doctest string if 'magic comment' gives encoding.
 108 # 0.7.10  2013-06-07  Add "lua" to defaults.languages
 109 # 0.7.11  2020-10-10  Return 0, if input and output file are of same age.
 110 # 0.8.0   unpublishd  Fix ``--execute`` behaviour and tests.
 111 # ..                  Change default `codeindent` to 2.
 112 # ..                  Switch to `argparse`. Remove class `OptionValues`
 113 # ======  ==========  ==========================================================
 114 #
 115 # ::
 116
 117 __version__ = "0.8.0dev"
 118
 119 __docformat__ = 'restructuredtext'
 120
 121
 122 # Introduction
 123 # ------------
 124 #
 125 # PyLit is a bidirectional converter between two formats of a computer
 126 # program source:
 127 #
 128 # * a (reStructured) text document with program code embedded in
 129 #   *code blocks*, and
 130 # * a compilable (or executable) code source with *documentation*
 131 #   embedded in comment blocks
 132 #
 133 #
 134 # Requirements
 135 # ------------
 136 #
 137 # ::
 138
 139 import argparse
 140 import optparse
 141 import os
 142 import re
 143 import sys
 144
 145
 146 # DefaultDict
 147 # ~~~~~~~~~~~
 148 #
 149 # As `collections.defaultdict` adds key/value pairs when the default
 150 # constructor is called,  we  define an alternative that does not mutate the
 151 # dict as side-effect. ::
 152
 153 class DefaultDict(dict):
 154     """Dictionary with default value."""
 155
 156     default = 'python'
 157
 158     def __missing__(self, key):
 159         # cf. file:///usr/share/doc/python3/html/library/stdtypes.html#dict
 160         return self.default
 161
 162
 163 # defaults
 164 # ========
 165 #
 166 # The `defaults` object provides a central repository for default
 167 # values and their customisation. ::
 168
 169 defaults = argparse.Namespace()
 170
 171 # It is used for
 172 #
 173 # * the initialisation of data arguments in TextCodeConverter_ and
 174 #   PylitOptions_
 175 #
 176 # * completion of command line options in `PylitOptions.complete_values()`_.
 177 #
 178 # This allows the easy creation of back-ends that customise the
 179 # defaults and then call `main`_ e.g.:
 180 #
 181 # >>> import pylit
 182 # >>> pylit.defaults.comment_string = "## "
 183 # >>> pylit.defaults.codeindent = 4
 184 # >>> pylit.main()
 185 # 0 failures in 0 tests
 186 # (0, 0)
 187 #
 188 # The following default values are defined in pylit.py:
 189 #
 190 # languages
 191 # ---------
 192 #
 193 # Mapping of code file extensions to code language::
 194
 195 defaults.languages  = DefaultDict({".c":   "c",
 196                                    ".cc":  "c++",
 197                                    ".css": "css",
 198                                    ".lua": "lua",
 199                                    ".py":  "python",
 200                                    ".sh":  "shell",
 201                                    ".sl":  "slang",
 202                                    ".sty": "latex",
 203                                    ".tex": "latex"
 204                                   })
 205 defaults.languages.default = 'python'
 206
 207 # The result can be overridden by the ``--language`` command line option.
 208 #
 209 # The fallback language, used if there is no matching extension (e.g. if pylit
 210 # is used as filter) and no ``--language`` is specified is ``"python"``.
 211 # It can be changed programmatically by changing the ``.default``
 212 # attribute, e.g.
 213 #
 214 # >>> pylit.defaults.languages['.parrot']
 215 # 'python'
 216 # >>> pylit.defaults.languages.default = 'c++'
 217 # >>> pylit.defaults.languages['.camel']
 218 # 'c++'
 219 #
 220 # .. _text_extension:
 221 #
 222 # text_extensions
 223 # ---------------
 224 #
 225 # List of known extensions of (reStructured) text files. The first
 226 # extension in this list is used by the `_get_outfile_name`_ method to
 227 # generate a text output filename::
 228
 229 defaults.text_extensions = [".txt", ".rst"]
 230
 231
 232 # comment_strings
 233 # ---------------
 234 #
 235 # Comment strings for known languages. Used in Code2Text_ to recognise
 236 # text blocks and in Text2Code_ to format text blocks as comments.
 237 # Defaults to ``'# '``.
 238 #
 239 # **Comment strings include trailing whitespace.** ::
 240
 241 defaults.comment_strings = DefaultDict({"css":    '// ',
 242                                         "c":      '// ',
 243                                         "c++":    '// ',
 244                                         "lua":    '-- ',
 245                                         "latex":  '% ',
 246                                         "python": '# ',
 247                                         "shell":  '# ',
 248                                         "slang":  '% '
 249                                        })
 250 defaults.comment_strings.default = '# '
 251
 252 # header_string
 253 # -------------
 254 #
 255 # Marker string for a header code block in the text source. No trailing
 256 # whitespace needed as indented code follows.
 257 # Must be a valid rst directive that accepts code on the same line, e.g.
 258 # ``'..admonition::'``.
 259 #
 260 # Default is a comment marker::
 261
 262 defaults.header_string = '..'
 263
 264
 265 # .. _code_block_marker:
 266 #
 267 # code_block_markers
 268 # ------------------
 269 #
 270 # Markup at the end of a documentation block.
 271 # Default is Docutils' marker for a `literal block`_::
 272
 273 defaults.code_block_markers = DefaultDict()
 274 defaults.code_block_markers.default = '::'
 275
 276 # The `code_block_marker` string is `inserted into a regular expression`_.
 277 # Language-specific markers can be defined programmatically, e.g. in a
 278 # wrapper script.
 279 #
 280 # In a document where code examples are only one of several uses of
 281 # literal blocks, it is more appropriate to single out the source code
 282 # ,e.g. with the double colon at a separate line ("expanded form")
 283 #
 284 #   ``defaults.code_block_marker.default = ':: *'``
 285 #
 286 # or a dedicated ``.. code-block::`` directive [#]_
 287 #
 288 #   ``defaults.code_block_marker['c++'] = '.. code-block:: *c++'``
 289 #
 290 # The latter form also allows code in different languages kept together
 291 # in one literate source file.
 292 #
 293 # .. [#] The ``.. code-block::`` directive is not (yet) supported by
 294 #    standard Docutils.  It is provided by several add-ons, including
 295 #    the `code-block directive`_ project in the Docutils Sandbox and
 296 #    Sphinx_.
 297 #
 298 #
 299 # strip
 300 # -----
 301 #
 302 # Export to the output format stripping documentation or code blocks::
 303
 304 defaults.strip = False
 305
 306 # strip_marker
 307 # ------------
 308 #
 309 # Strip literal marker from the end of documentation blocks when
 310 # converting  to code format. Makes the code more concise but looses the
 311 # synchronisation of line numbers in text and code formats. Can also be used
 312 # (together with the auto-completion of the code-text conversion) to change
 313 # the `code_block_marker`::
 314
 315 defaults.strip_marker = False
 316
 317 # add_missing_marker
 318 # ------------------
 319 #
 320 # When converting from code format to text format, add a `code_block_marker`
 321 # at the end of documentation blocks if it is missing::
 322
 323 defaults.add_missing_marker = True
 324
 325 # Keep this at ``True``, if you want to re-convert to code format later!
 326 #
 327 #
 328 # .. _defaults.preprocessors:
 329 #
 330 # preprocessors
 331 # -------------
 332 #
 333 # Preprocess the data with language-specific filters_
 334 # Set below in Filters_::
 335
 336 defaults.preprocessors = {}
 337
 338 # .. _defaults.postprocessors:
 339 #
 340 # postprocessors
 341 # --------------
 342 #
 343 # Postprocess the data with language-specific filters_::
 344
 345 defaults.postprocessors = {}
 346
 347 # .. _defaults.codeindent:
 348 #
 349 # codeindent
 350 # ----------
 351 #
 352 # Number of spaces to indent code blocks in `Code2Text.code_block_handler`_::
 353
 354 defaults.codeindent = 2
 355
 356 # In `Text2Code.code_block_handler`_, the codeindent is determined by the
 357 # first recognised code line (header or first indented literal block
 358 # of the text source).
 359 #
 360 # overwrite
 361 # ---------
 362 #
 363 # What to do if the outfile already exists? (ignored if `outfile` == '-')::
 364
 365 defaults.overwrite = 'update'
 366
 367 # Recognised values:
 368 #
 369 #  :'yes':    overwrite eventually existing `outfile`,
 370 #  :'update': fail if the `outfile` is newer than `infile`,
 371 #             TODO: silently stop if both are of same age
 372 #  :'no':     fail if `outfile` exists.
 373 #
 374 #
 375 # Actions: execute, doctest, diff
 376 # -------------------------------
 377 # If true, these actions replace the default action (txt<->code conversion).
 378 # See also `PylitOptions`_. ::
 379
 380 defaults.execute = False
 381 defaults.doctest = False
 382 defaults.diff = False
 383
 384 # Initial values
 385 # --------------
 386 #
 387 # The following settings are auto-determined if None
 388 # (see `PylitOptions.complete_values()`_).
 389 # Initialize them here as they will not be set by
 390 # `ArgumentParser.parse_args()`_::
 391
 392 # defaults.infile = ''   # required
 393 defaults.outfile = None
 394 defaults.language = None
 395 defaults.comment_string = None
 396 defaults.replace = None
 397 defaults.code_block_marker = None
 398 defaults.txt2code = None
 399
 400
 401 # Extensions
 402 # ==========
 403 #
 404 # Try to import optional extensions::
 405
 406 try:
 407     import pylit_elisp
 408 except ImportError:
 409     pass
 410
 411
 412 # Converter Classes
 413 # =================
 414 #
 415 # The converter classes implement a simple state machine to separate and
 416 # transform documentation and code blocks. For this task, only a very limited
 417 # parsing is needed. PyLit's parser assumes:
 418 #
 419 # * `indented literal blocks`_ in a text source are code blocks.
 420 #
 421 # * comment blocks in a code source where every line starts with a matching
 422 #   comment string are documentation blocks.
 423 #
 424 # TextCodeConverter
 425 # -----------------
 426 # ::
 427
 428 class TextCodeConverter(object):
 429     """Parent class for the converters `Text2Code` and `Code2Text`.
 430     """
 431
 432 # The parent class defines data attributes and functions used in both
 433 # `Text2Code`_ converting a text source to executable code source, and
 434 # `Code2Text`_ converting commented code to a text source.
 435 #
 436 # Data attributes
 437 # ~~~~~~~~~~~~~~~
 438 #
 439 # Class default values are fetched from the `defaults`_ object and can be
 440 # overridden by matching keyword arguments during class instantiation. This
 441 # also works with keyword arguments to `get_converter`_ and `main`_, as these
 442 # functions pass on unused keyword args to the instantiation of a converter
 443 # class. ::
 444
 445     language = defaults.languages[None]
 446     comment_strings = defaults.comment_strings
 447     comment_string = "" # set in __init__ (if empty)
 448     codeindent =  defaults.codeindent
 449     header_string = defaults.header_string
 450     code_block_markers = defaults.code_block_markers
 451     code_block_marker = "" # set in __init__ (if empty)
 452     strip = defaults.strip
 453     strip_marker = defaults.strip_marker
 454     add_missing_marker = defaults.add_missing_marker
 455     directive_option_regexp = re.compile(r' +:(\w|[-._+:])+:( |$)')
 456     state = "" # type of current block, see `TextCodeConverter.convert`_
 457
 458 # Interface methods
 459 # ~~~~~~~~~~~~~~~~~
 460 #
 461 # .. _TextCodeConverter.__init__:
 462 #
 463 # __init__
 464 # """"""""
 465 #
 466 # Initialising sets the `data` attribute, an iterable object yielding lines of
 467 # the source to convert. [#]_
 468 #
 469 # .. [#] The most common choice of data is a `file` object with the text
 470 #        or code source.
 471 #
 472 #        To convert a string into a suitable object, use its splitlines()
 473 #        method like ``"2 lines\nof source".splitlines(True)``.
 474 #
 475 #
 476 # Additional keyword arguments are stored as instance variables,
 477 # overwriting the class defaults::
 478
 479     def __init__(self, data, **keyw):
 480         """data   --  iterable data object
 481                       (list, file, generator, string, ...)
 482            **keyw --  remaining keyword arguments are
 483                       stored as data-attributes
 484         """
 485         self.data = data
 486         self.__dict__.update(keyw)
 487
 488 # If empty, `code_block_marker` and `comment_string` are set according
 489 # to the `language`::
 490
 491         if not self.code_block_marker:
 492             self.code_block_marker = self.code_block_markers[self.language]
 493         if not self.comment_string:
 494             self.comment_string = self.comment_strings[self.language]
 495         self.stripped_comment_string = self.comment_string.rstrip()
 496
 497 # Pre- and postprocessing filters are set (with
 498 # `TextCodeConverter.get_filter`_)::
 499
 500         self.preprocessor = self.get_filter("preprocessors", self.language)
 501         self.postprocessor = self.get_filter("postprocessors", self.language)
 502
 503 # .. _inserted into a regular expression:
 504 #
 505 # Finally, a regular_expression for the `code_block_marker` is compiled
 506 # to find valid cases of `code_block_marker` in a given line and return
 507 # the groups: ``\1 prefix, \2 code_block_marker, \3 remainder`` ::
 508
 509         marker = self.code_block_marker
 510         if marker == '::':
 511             # the default marker may occur at the end of a text line
 512             self.marker_regexp = re.compile('^( *(?!\.\.).*)(::)([ \n]*)$')
 513         else:
 514             # marker must be on a separate line
 515             self.marker_regexp = re.compile('^( *)(%s)(.*\n?)$' % marker)
 516
 517 # .. _TextCodeConverter.__iter__:
 518 #
 519 # __iter__
 520 # """"""""
 521 #
 522 # Return an iterator for the instance. Iteration yields lines of converted
 523 # data.
 524 #
 525 # The iterator is a chain of iterators acting on `self.data` that does
 526 #
 527 # * preprocessing
 528 # * text<->code format conversion
 529 # * postprocessing
 530 #
 531 # Pre- and postprocessing are only performed, if filters for the current
 532 # language are registered in `defaults.preprocessors`_ and|or
 533 # `defaults.postprocessors`_. The filters must accept an iterable as first
 534 # argument and yield the processed input data line-wise.
 535 # ::
 536
 537     def __iter__(self):
 538         """Iterate over input data source and yield converted lines
 539         """
 540         return self.postprocessor(self.convert(self.preprocessor(self.data)))
 541
 542
 543 # .. _TextCodeConverter.__call__:
 544 #
 545 # __call__
 546 # """"""""
 547 # The special `__call__` method allows the use of class instances as callable
 548 # objects. It returns the converted data as list of lines::
 549
 550     def __call__(self):
 551         """Iterate over state-machine and return results as list of lines"""
 552         return [line for line in self]
 553
 554
 555 # .. _TextCodeConverter.__str__:
 556 #
 557 # __str__
 558 # """""""
 559 # Return converted data as string::
 560
 561     def __str__(self):
 562         return "".join(self())
 563
 564
 565 # Helpers and convenience methods
 566 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 567 #
 568 # .. _TextCodeConverter.convert:
 569 #
 570 # convert
 571 # """""""
 572 #
 573 # The `convert` method generates an iterator that does the actual  code <-->
 574 # text format conversion. The converted data is yielded line-wise and the
 575 # instance's `status` argument indicates whether the current line is "header",
 576 # "documentation", or "code_block"::
 577
 578     def convert(self, lines):
 579         """Iterate over lines of a program document and convert
 580         between "text" and "code" format
 581         """
 582
 583 # Initialise internal data arguments. (Done here, so that every new iteration
 584 # re-initialises them.)
 585 #
 586 # `state`
 587 #   the "type" of the currently processed block of lines. One of
 588 #
 589 #   :"":              initial state: check for header,
 590 #   :"header":        leading code block: strip `header_string`,
 591 #   :"documentation": documentation part: comment out,
 592 #   :"code_block":    literal blocks containing source code: unindent.
 593 #
 594 # ::
 595
 596         self.state = ""
 597
 598 # `_codeindent`
 599 #   * Do not confuse the internal attribute `_codeindent` with the configurable
 600 #     `codeindent` (without the leading underscore).
 601 #   * `_codeindent` is set in `Text2Code.code_block_handler`_ to the indent of
 602 #     first non-blank "code_block" line and stripped from all "code_block" lines
 603 #     in the text-to-code conversion,
 604 #   * `codeindent` is set in `__init__` to `defaults.codeindent`_ and added to
 605 #     "code_block" lines in the code-to-text conversion.
 606 #
 607 # ::
 608
 609         self._codeindent = 0
 610
 611 # `_textindent`
 612 #   * set by `Text2Code.documentation_handler`_ to the minimal indent of a
 613 #     documentation block,
 614 #   * used in `Text2Code.set_state`_ to find the end of a code block.
 615 #
 616 # ::
 617
 618         self._textindent = 0
 619
 620 # `_add_code_block_marker`
 621 #   If the last paragraph of a documentation block does not end with a
 622 #   code_block_marker_, it should be added (otherwise, the back-conversion
 623 #   fails.).
 624 #
 625 #   `_add_code_block_marker` is set by `Code2Text.documentation_handler`_
 626 #   and evaluated by `Code2Text.code_block_handler`_, because the
 627 #   documentation_handler does not know whether the next block will be
 628 #   documentation (with no need for a code_block_marker) or a code block.
 629 #
 630 # ::
 631
 632         self._add_code_block_marker = False
 633
 634
 635
 636 # Determine the state of the block and convert with the matching "handler"::
 637
 638         for block in collect_blocks(expandtabs_filter(lines)):
 639             try:
 640                 self.set_state(block)
 641             except StopIteration:
 642                 return
 643             for line in getattr(self, self.state+"_handler")(block):
 644                 yield line
 645
 646
 647 # .. _TextCodeConverter.get_filter:
 648 #
 649 # get_filter
 650 # """"""""""
 651 # ::
 652
 653     def get_filter(self, filter_set, language):
 654         """Return language specific filter"""
 655         if self.__class__ == Text2Code:
 656             key = "text2"+language
 657         elif self.__class__ == Code2Text:
 658             key = language+"2text"
 659         else:
 660             key = ""
 661         try:
 662             return getattr(defaults, filter_set)[key]
 663         except (AttributeError, KeyError, TypeError):
 664             # print("there is no %r filter in %r"%(key, filter_set))
 665             pass
 666         return identity_filter
 667
 668
 669 # get_indent
 670 # """"""""""
 671 # Return the number of leading spaces in `line`::
 672
 673     def get_indent(self, line):
 674         """Return the indentation of `string`.
 675         """
 676         return len(line) - len(line.lstrip())
 677
 678
 679 # Text2Code
 680 # ---------
 681 #
 682 # The `Text2Code` converter separates *code-blocks* [#]_ from *documentation*.
 683 # Code blocks are unindented, documentation is commented (or filtered, if the
 684 # ``strip`` option is True).
 685 #
 686 # .. [#] Only `indented literal blocks`_ are considered code-blocks. `quoted
 687 #        literal blocks`_, `parsed-literal blocks`_, and `doctest blocks`_ are
 688 #        treated as part of the documentation. This allows the inclusion of
 689 #        examples:
 690 #
 691 #           >>> 23 + 3
 692 #           26
 693 #
 694 #        Mark that there is no double colon before the doctest block in the
 695 #        text source.
 696 #
 697 # The class inherits the interface and helper functions from
 698 # TextCodeConverter_ and adds functions specific to the text-to-code format
 699 # conversion::
 700
 701 class Text2Code(TextCodeConverter):
 702     """Convert a (reStructured) text source to code source
 703     """
 704
 705 # .. _Text2Code.set_state:
 706 #
 707 # set_state
 708 # ~~~~~~~~~
 709 # ::
 710
 711     def set_state(self, block):
 712         """Determine state of `block`. Set `self.state`
 713         """
 714
 715 # `set_state` is used inside an iteration. Hence, if we are out of data, a
 716 # StopItertion exception should be raised::
 717
 718         if not block:
 719             raise StopIteration
 720
 721 # The new state depends on the active state (from the last block) and
 722 # features of the current block. It is either "header", "documentation", or
 723 # "code_block".
 724 #
 725 # If the current state is "" (first block), check for
 726 # the  `header_string` indicating a leading code block::
 727
 728         if self.state == "":
 729             # print("set state for %r"%block)
 730             if block[0].startswith(self.header_string):
 731                 self.state = "header"
 732             else:
 733                 self.state = "documentation"
 734
 735 # If the current state is "documentation", the next block is also
 736 # documentation. The end of a documentation part is detected in the
 737 # `Text2Code.documentation_handler`_::
 738
 739         # elif self.state == "documentation":
 740         #    self.state = "documentation"
 741
 742 # A "code_block" ends with the first less indented, non-blank line.
 743 # `_textindent` is set by the documentation handler to the indent of the
 744 # preceding documentation block::
 745
 746         elif self.state in ["code_block", "header"]:
 747             indents = [self.get_indent(line) for line in block
 748                        if line.rstrip()]
 749             # print("set_state:", indents, self._textindent)
 750             if indents and min(indents) <= self._textindent:
 751                 self.state = 'documentation'
 752             else:
 753                 self.state = 'code_block'
 754
 755 # TODO: (or not to do?) insert blank line before the first line with too-small
 756 # codeindent using self.ensure_trailing_blank_line(lines, line) (would need
 757 # split and push-back of the documentation part)?
 758 #
 759 # .. _Text2Code.header_handler:
 760 #
 761 # header_handler
 762 # ~~~~~~~~~~~~~~
 763 #
 764 # Sometimes code needs to remain on the first line(s) of the document to be
 765 # valid. The most common example is the "shebang" line that tells a POSIX
 766 # shell how to process an executable file::
 767
 768 #!/usr/bin/env python
 769
 770 # In Python, the special comment to indicate the encoding, e.g.
 771 # ``# -*- coding: iso-8859-1 -*-``, must occur before any other comment
 772 # or code too.
 773 #
 774 # If we want to keep the line numbers in sync for text and code source, the
 775 # reStructured Text markup for these header lines must start at the same line
 776 # as the first header line. Therefore, header lines could not be marked as
 777 # literal block (this would require the ``::`` and an empty line above the
 778 # code_block).
 779 #
 780 # OTOH, a comment may start at the same line as the comment marker and it
 781 # includes subsequent indented lines. Comments are visible in the reStructured
 782 # Text source but hidden in the pretty-printed output.
 783 #
 784 # With a header converted to comment in the text source, everything before
 785 # the first documentation block (i.e. before the first paragraph using the
 786 # matching comment string) will be hidden away (in HTML or PDF output).
 787 #
 788 # This seems a good compromise, the advantages
 789 #
 790 # * line numbers are kept
 791 # * the "normal" code_block conversion rules (indent/unindent by `codeindent` apply
 792 # * greater flexibility: you can hide a repeating header in a project
 793 #   consisting of many source files.
 794 #
 795 # set off the disadvantages
 796 #
 797 # - it may come as surprise if a part of the file is not "printed",
 798 # - one more syntax element to learn for rst newbies to start with pylit,
 799 #   (however, starting from the code source, this will be auto-generated)
 800 #
 801 # In the case that there is no matching comment at all, the complete code
 802 # source will become a comment -- however, in this case it is not very likely
 803 # the source is a literate document anyway.
 804 #
 805 # If needed for the documentation, it is possible to quote the header in (or
 806 # after) the first documentation block, e.g. as `parsed literal`.
 807 # ::
 808
 809     def header_handler(self, lines):
 810         """Format leading code block"""
 811         # strip header string from first line
 812         lines[0] = lines[0].replace(self.header_string, "", 1)
 813         # yield remaining lines formatted as code-block
 814         for line in self.code_block_handler(lines):
 815             yield line
 816
 817
 818 # .. _Text2Code.documentation_handler:
 819 #
 820 # documentation_handler
 821 # ~~~~~~~~~~~~~~~~~~~~~
 822 #
 823 # The 'documentation' handler processes everything that is not recognised as
 824 # "code_block". Documentation is quoted with `self.comment_string`
 825 # (or filtered with `--strip=True`).
 826 #
 827 # If end-of-documentation marker is detected,
 828 #
 829 # * set state to 'code_block'
 830 # * set `self._textindent` (needed by `Text2Code.set_state`_ to find the
 831 #   next "documentation" block)
 832 #
 833 # ::
 834
 835     def documentation_handler(self, lines):
 836         """Convert documentation blocks from text to code format
 837         """
 838         for line in lines:
 839             # test lines following the code-block marker for false positives
 840             if (self.state == "code_block" and line.rstrip()
 841                 and not self.directive_option_regexp.search(line)):
 842                 self.state = "documentation"
 843             # test for end of documentation block
 844             if self.marker_regexp.search(line):
 845                 self.state = "code_block"
 846                 self._textindent = self.get_indent(line)
 847             # yield lines
 848             if self.strip:
 849                 continue
 850             # do not comment blank lines preceding a code block
 851             if line.rstrip():
 852                 yield self.comment_string + line
 853             else:
 854                 if self.state == "code_block":
 855                     yield line
 856                 else:
 857                     yield self.comment_string.rstrip() + line
 858
 859
 860
 861 # .. _Text2Code.code_block_handler:
 862 #
 863 # code_block_handler
 864 # ~~~~~~~~~~~~~~~~~~
 865 #
 866 # The "code_block" handler is called with an indented literal block. It
 867 # removes leading whitespace up to the indentation of the first code line in
 868 # the file (this deviation from Docutils behaviour allows indented blocks of
 869 # Python code). ::
 870
 871     def code_block_handler(self, block):
 872         """Convert indented literal blocks to source code format
 873         """
 874
 875 # If still unset, determine the indentation of code blocks from first non-blank
 876 # code line::
 877
 878         if self._codeindent == 0:
 879             self._codeindent = self.get_indent(block[0])
 880
 881 # Yield unindented lines after check whether we can safely unindent. If the
 882 # line is less indented then `_codeindent`, something got wrong. ::
 883
 884         for line in block:
 885             if line.lstrip() and self.get_indent(line) < self._codeindent:
 886                 raise ValueError("code block contains line less indented "
 887                             "than %d spaces \n%r"%(self._codeindent, block))
 888             yield line.replace(" "*self._codeindent, "", 1)
 889
 890
 891 # Code2Text
 892 # ---------
 893 #
 894 # The `Code2Text` converter does the opposite of `Text2Code`_ -- it processes
 895 # a source in "code format" (i.e. in a programming language), extracts
 896 # documentation from comment blocks, and puts program code in literal blocks.
 897 #
 898 # The class inherits the interface and helper functions from
 899 # TextCodeConverter_ and adds functions specific to the text-to-code  format
 900 # conversion::
 901
 902 class Code2Text(TextCodeConverter):
 903     """Convert code source to text source
 904     """
 905
 906 # set_state
 907 # ~~~~~~~~~
 908 #
 909 # Check if block is "header", "documentation", or "code_block":
 910 #
 911 # A paragraph is "documentation", if every non-blank line starts with a
 912 # matching comment string (including whitespace except for commented blank
 913 # lines) ::
 914
 915     def set_state(self, block):
 916         """Determine state of `block`."""
 917         for line in block:
 918             # skip documentation lines (commented, blank or blank comment)
 919             if (line.startswith(self.comment_string)
 920                 or not line.rstrip()
 921                 or line.rstrip() == self.comment_string.rstrip()
 922                ):
 923                 continue
 924             # non-commented line found:
 925             if self.state == "":
 926                 self.state = "header"
 927             else:
 928                 self.state = "code_block"
 929             break
 930         else:
 931             # no code line found
 932             # keep state if the block is just a blank line
 933             # if len(block) == 1 and self._is_blank_codeline(line):
 934             #     return
 935             self.state = "documentation"
 936
 937
 938 # header_handler
 939 # ~~~~~~~~~~~~~~
 940 #
 941 # Handle a leading code block. (See `Text2Code.header_handler`_ for a
 942 # discussion of the "header" state.) ::
 943
 944     def header_handler(self, lines):
 945         """Format leading code block"""
 946         if self.strip == True:
 947             return
 948         # get iterator over the lines that formats them as code-block
 949         lines = iter(self.code_block_handler(lines))
 950         # prepend header string to first line
 951         yield self.header_string + next(lines)
 952         # yield remaining lines
 953         for line in lines:
 954             yield line
 955
 956 # .. _Code2Text.documentation_handler:
 957 #
 958 # documentation_handler
 959 # ~~~~~~~~~~~~~~~~~~~~~
 960 #
 961 # The *documentation state* handler converts a comment to a documentation
 962 # block by stripping the leading `comment string` from every line::
 963
 964     def documentation_handler(self, block):
 965         """Uncomment documentation blocks in source code
 966         """
 967
 968 # Strip comment strings::
 969
 970         lines = [self.uncomment_line(line) for line in block]
 971
 972 # If the code block is stripped, the literal marker would lead to an
 973 # error when the text is converted with Docutils. Strip it as well. ::
 974
 975         if self.strip or self.strip_marker:
 976             self.strip_code_block_marker(lines)
 977
 978 # Otherwise, check for the `code_block_marker`_ at the end of the
 979 # documentation block (skipping directive options that might follow it)::
 980
 981         elif self.add_missing_marker:
 982             for line in lines[::-1]:
 983                 if self.marker_regexp.search(line):
 984                     self._add_code_block_marker = False
 985                     break
 986                 if (line.rstrip() and
 987                     not self.directive_option_regexp.search(line)):
 988                     self._add_code_block_marker = True
 989                     break
 990             else:
 991                 self._add_code_block_marker = True
 992
 993 # Yield lines::
 994
 995         for line in lines:
 996             yield line
 997
 998 # uncomment_line
 999 # ~~~~~~~~~~~~~~
1000 #
1001 # Return documentation line after stripping comment string. Consider the
1002 # case that a blank line has a comment string without trailing whitespace::
1003
1004     def uncomment_line(self, line):
1005         """Return uncommented documentation line"""
1006         line = line.replace(self.comment_string, "", 1)
1007         if line.rstrip() == self.stripped_comment_string:
1008             line = line.replace(self.stripped_comment_string, "", 1)
1009         return line
1010
1011 # .. _Code2Text.code_block_handler:
1012 #
1013 # code_block_handler
1014 # ~~~~~~~~~~~~~~~~~~
1015 #
1016 # The `code_block` handler returns the code block as indented literal
1017 # block (or filters it, if ``self.strip == True``). The amount of the code
1018 # indentation is controlled by `self.codeindent` (default 2).  ::
1019
1020     def code_block_handler(self, lines):
1021         """Covert code blocks to text format (indent or strip)
1022         """
1023         if self.strip == True:
1024             return
1025         # eventually insert transition marker
1026         if self._add_code_block_marker:
1027             self.state = "documentation"
1028             yield self.code_block_marker + "\n"
1029             yield "\n"
1030             self._add_code_block_marker = False
1031             self.state = "code_block"
1032         for line in lines:
1033             yield " "*self.codeindent + line
1034
1035
1036
1037 # strip_code_block_marker
1038 # ~~~~~~~~~~~~~~~~~~~~~~~
1039 #
1040 # Replace the literal marker with the equivalent of Docutils replace rules
1041 #
1042 # * strip ``::``-line (and preceding blank line) if on a line on its own
1043 # * strip ``::`` if it is preceded by whitespace.
1044 # * convert ``::`` to a single colon if preceded by text
1045 #
1046 # `lines` is a list of documentation lines (with a trailing blank line).
1047 # It is modified in-place::
1048
1049     def strip_code_block_marker(self, lines):
1050         try:
1051             line = lines[-2]
1052         except IndexError:
1053             return # just one line (no trailing blank line)
1054
1055         # match with regexp: `match` is None or has groups
1056         # \1 leading text, \2 code_block_marker, \3 remainder
1057         match = self.marker_regexp.search(line)
1058
1059         if not match:                 # no code_block_marker present
1060             return
1061         if not match.group(1):        # `code_block_marker` on an extra line
1062             del(lines[-2])
1063             # delete preceding line if it is blank
1064             if len(lines) >= 2 and not lines[-2].lstrip():
1065                 del(lines[-2])
1066         elif match.group(1).rstrip() < match.group(1):
1067             # '::' follows whitespace
1068             lines[-2] = match.group(1).rstrip() + match.group(3)
1069         else:                         # '::' follows text
1070             lines[-2] = match.group(1).rstrip() + ':' + match.group(3)
1071
1072 # Filters
1073 # =======
1074 #
1075 # Filters allow pre- and post-processing of the data to bring it in a format
1076 # suitable for the "normal" text<->code conversion. An example is conversion
1077 # of `C` ``/*`` ``*/`` comments into C++ ``//`` comments (and back).
1078 # Another example is the conversion of `C` ``/*`` ``*/`` comments into C++
1079 # ``//`` comments (and back).
1080 #
1081 # Filters are generator functions that return an iterator acting on a
1082 # `data` iterable and yielding processed `data` lines.
1083 #
1084 # identity_filter
1085 # ---------------
1086 #
1087 # The most basic filter is the identity filter, that returns its argument as
1088 # iterator::
1089
1090 def identity_filter(data):
1091     """Return data iterator without any processing"""
1092     return iter(data)
1093
1094 # expandtabs_filter
1095 # -----------------
1096 #
1097 # Expand hard-tabs in every line of `data` (cf. `str.expandtabs`).
1098 #
1099 # This filter is applied to the input data by `TextCodeConverter.convert`_ as
1100 # hard tabs can lead to errors when the indentation is changed. ::
1101
1102 def expandtabs_filter(data):
1103     """Yield data tokens with hard-tabs expanded"""
1104     for line in data:
1105         yield line.expandtabs()
1106
1107
1108 # collect_blocks
1109 # --------------
1110 #
1111 # A filter to aggregate "paragraphs" (blocks separated by blank
1112 # lines). Yields lists of lines::
1113
1114 def collect_blocks(lines):
1115     """collect lines in a list
1116
1117     yield list for each paragraph, i.e. block of lines separated by a
1118     blank line (whitespace only).
1119
1120     Trailing blank lines are collected as well.
1121     """
1122     blank_line_reached = False
1123     block = []
1124     for line in lines:
1125         if blank_line_reached and line.rstrip():
1126             yield block
1127             blank_line_reached = False
1128             block = [line]
1129             continue
1130         if not line.rstrip():
1131             blank_line_reached = True
1132         block.append(line)
1133     yield block
1134
1135
1136
1137 # dumb_c_preprocessor
1138 # -------------------
1139 #
1140 # This is a basic filter to convert `C` to `C++` comments. Works line-wise and
1141 # only converts lines that
1142 #
1143 # * start with "/\* " and end with " \*/" (followed by whitespace only)
1144 #
1145 # A more sophisticated version would also
1146 #
1147 # * convert multi-line comments
1148 #
1149 #   + Keep indentation or strip 3 leading spaces?
1150 #
1151 # * account for nested comments
1152 #
1153 # * only convert comments that are separated from code by a blank line
1154 #
1155 # ::
1156
1157 def dumb_c_preprocessor(data):
1158     """change `C` ``/* `` `` */`` comments into C++ ``// `` comments"""
1159     comment_string = defaults.comment_strings["c++"]
1160     boc_string = "/* "
1161     eoc_string = " */"
1162     for line in data:
1163         if (line.startswith(boc_string)
1164             and line.rstrip().endswith(eoc_string)
1165            ):
1166             line = line.replace(boc_string, comment_string, 1)
1167             line = "".join(line.rsplit(eoc_string, 1))
1168         yield line
1169
1170 # Unfortunately, the `replace` method of strings does not support negative
1171 # numbers for the `count` argument:
1172 #
1173 #   >>> "foo */ baz */ bar".replace(" */", "", -1) == "foo */ baz bar"
1174 #   False
1175 #
1176 # However, there is the `rsplit` method, that can be used together with `join`:
1177 #
1178 #   >>> "".join("foo */ baz */ bar".rsplit(" */", 1)) == "foo */ baz bar"
1179 #   True
1180 #
1181 # dumb_c_postprocessor
1182 # --------------------
1183 #
1184 # Undo the preparations by the dumb_c_preprocessor and re-insert valid comment
1185 # delimiters ::
1186
1187 def dumb_c_postprocessor(data):
1188     """change C++ ``// `` comments into `C` ``/* `` `` */`` comments"""
1189     comment_string = defaults.comment_strings["c++"]
1190     boc_string = "/* "
1191     eoc_string = " */"
1192     for line in data:
1193         if line.rstrip() == comment_string.rstrip():
1194             line = line.replace(comment_string, "", 1)
1195         elif line.startswith(comment_string):
1196             line = line.replace(comment_string, boc_string, 1)
1197             line = line.rstrip() + eoc_string + "\n"
1198         yield line
1199
1200
1201 # register filters
1202 # ----------------
1203 #
1204 # ::
1205
1206 defaults.preprocessors['c2text'] = dumb_c_preprocessor
1207 defaults.preprocessors['css2text'] = dumb_c_preprocessor
1208 defaults.postprocessors['text2c'] = dumb_c_postprocessor
1209 defaults.postprocessors['text2css'] = dumb_c_postprocessor
1210
1211
1212 # Command line use
1213 # ================
1214 #
1215 # Using this script from the command line will convert a file according to its
1216 # extension. This default can be overridden by a couple of options.
1217 #
1218 # Dual source handling
1219 # --------------------
1220 #
1221 # How to determine which source is up-to-date?
1222 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1223 #
1224 # - set modification date of `outfile` to the one of `infile`
1225 #
1226 #   Points out that the source files are 'synchronised'.
1227 #
1228 #   * Are there problems to expect from "backdating" a file? Which?
1229 #
1230 #     Looking at http://www.unix.com/showthread.php?t=20526, it seems
1231 #     perfectly legal to set `mtime` (while leaving `ctime`) as `mtime` is a
1232 #     description of the "actuality" of the data in the file.
1233 #
1234 #   * Should this become a default or an option?
1235 #
1236 # - alternatively move input file to a backup copy (with option: `--replace`)
1237 #
1238 # - check modification date before overwriting
1239 #   (with option: `--overwrite=update`)
1240 #
1241 # - check modification date before editing (implemented as `Jed editor`_
1242 #   function `pylit_check()` in `pylit.sl`_)
1243 #
1244 # .. _Jed editor: http://www.jedsoft.org/jed/
1245 # .. _pylit.sl: http://jedmodes.sourceforge.net/mode/pylit/
1246 #
1247 # Recognised Filename Extensions
1248 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1249 #
1250 # Instead of defining a new extension for "pylit" literate programs,
1251 # by default ``.txt`` will be appended for the text source and stripped by
1252 # the conversion to the code source. I.e. for a Python program foo:
1253 #
1254 # * the code source is called ``foo.py``
1255 # * the text source is called ``foo.py.txt``
1256 # * the html rendering is called ``foo.py.html``
1257 #
1258 #
1259 # PylitOptions
1260 # ------------
1261 #
1262 # The `PylitOptions` class comprises an option parser and methods for parsing
1263 # and completion of command line options::
1264
1265 class PylitOptions(object):
1266     """Storage and handling of command line options for pylit"""
1267
1268 # Instantiation
1269 # ~~~~~~~~~~~~~
1270 #
1271 # ::
1272
1273     def __init__(self):
1274         """Set up an `OptionParser` instance for pylit command line options
1275         """
1276         p = argparse.ArgumentParser(usage=main.__doc__)
1277
1278 # Positional arguments (I/O):
1279 #
1280 # The "infile" argument is required unless there is a value in the
1281 # `namespace` passed to `p.parse_args()`.
1282 # We need to cheat here, because this behaviour is not supported by
1283 # `argparse` (cf. `issue 29670`_).
1284 #
1285 # The default value is set to `argparse.SUPPRESS` to prevent overwriting an
1286 # existing value in the `namespace` (cf. `issue 28734`_). ::
1287
1288         p.add_argument('infile', metavar='INFILE',
1289                        nargs='?', default=argparse.SUPPRESS,
1290                        help='input file ("-" for stdin)')
1291         p.add_argument('outfile', metavar='OUTFILE',
1292                        nargs='?', default=argparse.SUPPRESS,
1293                        help=u'output file, default: auto-determined')
1294
1295 # Conversion settings::
1296
1297         p.add_argument("-c", "--code2txt", action="store_false",
1298                        dest="txt2code",
1299                        help="convert code source to text source")
1300         p.add_argument("-t", "--txt2code", action="store_true",
1301                        help="convert text source to code source")
1302         p.add_argument("--language",
1303                        choices = list(defaults.comment_strings.keys()),
1304                        help="use LANGUAGE's native comment style")
1305         p.add_argument("--comment-string", dest="comment_string",
1306                        help="documentation block marker in code source "
1307                        "(including trailing whitespace, "
1308                        "default: language dependent)")
1309         p.add_argument("-m", "--code-block-marker", dest="code_block_marker",
1310                        help="syntax token starting a code block. (default '::')")
1311         p.add_argument("--codeindent", type=int,
1312                        help="Number of spaces to indent code blocks with "
1313                        "code2text (default %d)" % defaults.codeindent)
1314
1315 # Output file handling::
1316
1317         p.add_argument("--overwrite", action="store",
1318                        choices = ["yes", "update", "no"],
1319                        help="overwrite output file (default 'update')")
1320         p.add_argument("--replace", action="store_true",
1321                        help="move infile to a backup copy (appending '~')")
1322         # TODO: do we need this? If yes, make mtime update depend on it!
1323         # p.add_argument("--keep-mtime", action="store_true",
1324         #              help="do not set the modification time of the outfile "
1325         #              "to the corresponding value of the infile")
1326         p.add_argument("-s", "--strip", action="store_true",
1327                        help='"export" by stripping documentation or code')
1328
1329 # Actions::
1330
1331         p.add_argument("-d", "--diff", action="store_true",
1332                        help="test for differences to existing file")
1333         p.add_argument("--doctest", action="store_true",
1334                        help="run doctest.testfile() on the text version")
1335         p.add_argument("-e", "--execute", action="store_true",
1336                        help="execute code (Python only)")
1337         p.add_argument('-v', '--version', action='version',
1338                        version=__version__)
1339
1340         self.parser = p
1341
1342
1343
1344 # .. _PylitOptions.parse_args:
1345 #
1346 # parse_args
1347 # ~~~~~~~~~~
1348 #
1349 # The `parse_args` method calls the `optparse.OptionParser` on command
1350 # line or provided args and returns the result as `PylitOptions.Values`
1351 # instance. ::
1352
1353     def parse_args(self, args=sys.argv[1:], values=None):
1354         """Parse command line arguments using `optparse.OptionParser`.
1355
1356            parse_args(args, **keyw) -> OptionValues instance
1357
1358            args --  list of command line arguments.
1359            values -- object to store the option's values
1360         """
1361         # parse arguments
1362         values = self.parser.parse_args(args, values)
1363
1364         return values
1365
1366 # .. _PylitOptions.complete_values():
1367 #
1368 # complete_values
1369 # ~~~~~~~~~~~~~~~
1370 #
1371 # Complete an OptionValues instance `values`.  Use module-level defaults and
1372 # context information to set missing option values to sensible defaults (if
1373 # possible) ::
1374
1375     def complete_values(self, values):
1376         """complete option values with module and context sensible defaults
1377
1378         x.complete_values(values) -> values
1379         values -- OptionValues instance
1380         """
1381
1382 # The "infile" argument is required but may be pre-set in the `namespace`
1383 # passed to `p.parse_args()` (cf. `issue 29670`_::
1384
1385         try:
1386             values.infile
1387         except AttributeError:
1388             self.parser.error('the following argument is required: infile')
1389
1390 # Guess conversion direction from `infile` filename::
1391
1392         if getattr(values, 'txt2code', None) is None:
1393             in_extension = os.path.splitext(values.infile)[1]
1394             if in_extension in defaults.text_extensions:
1395                 values.txt2code = True
1396             elif in_extension in defaults.languages.keys():
1397                 values.txt2code = False
1398             else:
1399                 values.txt2code = None
1400
1401 # Auto-determine the output file name::
1402
1403         if not values.outfile:
1404             values.outfile = self._get_outfile_name(values)
1405
1406 # Second try: Guess conversion direction from outfile filename::
1407
1408         if values.txt2code is None:
1409             out_extension = os.path.splitext(values.outfile)[1]
1410             values.txt2code = not (out_extension in defaults.text_extensions)
1411
1412 # Set the language of the code::
1413
1414         if values.language is None:
1415             if values.txt2code is True:
1416                 code_extension = os.path.splitext(values.outfile)[1]
1417             elif values.txt2code is False:
1418                 code_extension = os.path.splitext(values.infile)[1]
1419             values.language = defaults.languages[code_extension]
1420
1421         return values
1422
1423
1424 # _get_outfile_name
1425 # ~~~~~~~~~~~~~~~~~
1426 #
1427 # Construct a matching filename for the output file. The output filename is
1428 # constructed from `infile` by the following rules:
1429 #
1430 # * '-' (stdin) results in '-' (stdout)
1431 # * strip the `text_extension`_ (txt2code) or
1432 # * add the `text_extension`_ (code2txt)
1433 # * fallback: if no guess can be made, add ".out"
1434 #
1435 #   .. TODO: use values.outfile_extension if it exists?
1436 #
1437 # ::
1438
1439     def _get_outfile_name(self, values):
1440         """Return a matching output filename for `infile`
1441         """
1442         # if input is stdin, default output is stdout
1443         if values.infile == '-':
1444             return '-'
1445
1446         # Derive from `infile` name: strip or add text extension
1447         (base, ext) = os.path.splitext(values.infile)
1448         if ext in defaults.text_extensions:
1449             return base # strip
1450         if ext and ext in defaults.languages or values.txt2code == False:
1451             return values.infile + defaults.text_extensions[0] # add
1452         # give up
1453         return values.infile + ".out"
1454
1455
1456 # .. _PylitOptions.__call__():
1457 #
1458 # __call__
1459 # ~~~~~~~~
1460 #
1461 # Use PylitOptions instances as *callables*: Calling a `PylitOptions` instance
1462 # parses the argument list to extract option values and completes them based
1463 # on "context-sensitive defaults".
1464 # Keyword arguments overwrite the `defaults`_
1465 # and are overwritten by command line options.
1466 #
1467 # Attention: passing a `namespace` to `ArgumentParser.parse_args()` has a
1468 # side-effect:
1469 #
1470 #   […] if you give an existing object, the option defaults will not be
1471 #   initialized on it
1472 #
1473 #   -- https://docs.python.org/dev/library/optparse.html#parsing-arguments
1474 #
1475 # .. The argument is renamed from `values` to `namespace` in Python 3.
1476 #    Positional argument defaults are initialized unless the default value
1477 #    `argparse.SUPPRESS` is specified.
1478 #
1479 # ::
1480
1481     def __call__(self, args=sys.argv[1:], **kwargs):
1482         """parse and complete command line args, return option values
1483         """
1484         settings = vars(defaults).copy()  # don't change global settings
1485         settings.update(kwargs)
1486         settings = argparse.Namespace(**settings)
1487
1488         settings = self.parse_args(args, settings)
1489         settings = self.complete_values(settings)
1490         # print(f'{settings.outfile=}')
1491         # for k,v in vars(settings).items():
1492         #    print(k,v)
1493         return settings
1494
1495 # Helper functions
1496 # ----------------
1497 #
1498 # open_streams
1499 # ~~~~~~~~~~~~
1500 #
1501 # Return file objects for in- and output. If the input path is missing,
1502 # write usage and abort. (An alternative would be to use stdin as default.
1503 # However,  this leaves the uninitiated user with a non-responding application
1504 # if (s)he just tries the script without any arguments) ::
1505
1506 def open_streams(infile = '-', outfile = '-', overwrite='update', **keyw):
1507     """Open and return the input and output stream
1508
1509     open_streams(infile, outfile) -> (in_stream, out_stream)
1510
1511     in_stream   --  file(infile) or sys.stdin
1512     out_stream  --  file(outfile) or sys.stdout
1513     overwrite   --  'yes': overwrite eventually existing `outfile`,
1514                     'update': fail if the `outfile` is newer than `infile`,
1515                     'no': fail if `outfile` exists.
1516
1517                     Irrelevant if `outfile` == '-'.
1518     """
1519     if overwrite not in ('yes', 'no', 'update'):
1520         raise ValueError('Argument "overwrite" must be "yes", "no",'
1521                          ' or update, not "%s".' % overwrite)
1522     if not infile:
1523         strerror = "Missing input file name ('-' for stdin; -h for help)"
1524         raise IOError(2, strerror, infile)
1525     if infile == '-':
1526         in_stream = sys.stdin
1527     else:
1528         in_stream = open(infile, 'r')
1529     if outfile == '-':
1530         out_stream = sys.stdout
1531     elif overwrite == 'no' and os.path.exists(outfile):
1532         raise IOError(17, "Output file exists!", outfile)
1533     elif overwrite == 'update' and is_newer(outfile, infile) is None:
1534         raise IOError(0, "Output file is as old as input file!", outfile)
1535     elif overwrite == 'update' and is_newer(outfile, infile):
1536         raise IOError(1, "Output file is newer than input file!", outfile)
1537     else:
1538         out_stream = open(outfile, 'w')
1539     return (in_stream, out_stream)
1540
1541 # is_newer
1542 # ~~~~~~~~
1543 #
1544 # ::
1545
1546 def is_newer(path1, path2):
1547     """Check if `path1` is newer than `path2` (using mtime)
1548
1549     Compare modification time of files at path1 and path2.
1550
1551     Non-existing files are considered oldest: Return False if path1 does not
1552     exist and True if path2 does not exist.
1553
1554     Return None if the modification time differs less than 1/10 second.
1555     (This evaluates to False in a Boolean context but allows a test
1556     for equality.)
1557     """
1558     try:
1559         mtime1 = os.path.getmtime(path1)
1560     except OSError:
1561         mtime1 = -1
1562     try:
1563         mtime2 = os.path.getmtime(path2)
1564     except OSError:
1565         mtime2 = -1
1566     if abs(mtime1 - mtime2) < 0.1:
1567         return None
1568     return mtime1 > mtime2
1569
1570
1571 # get_converter
1572 # ~~~~~~~~~~~~~
1573 #
1574 # Get an instance of the converter state machine::
1575
1576 def get_converter(data, txt2code=True, **keyw):
1577     if txt2code:
1578         return Text2Code(data, **keyw)
1579     else:
1580         return Code2Text(data, **keyw)
1581
1582
1583 # Use cases
1584 # ---------
1585 #
1586 # run_doctest
1587 # ~~~~~~~~~~~
1588 # ::
1589
1590 def run_doctest(infile="-", txt2code=True,
1591                 globs={}, verbose=False, optionflags=0, **keyw):
1592     """run doctest on the text source
1593     """
1594
1595 # Allow imports from the current working dir by prepending an empty string to
1596 # sys.path (see doc of sys.path())::
1597
1598     sys.path.insert(0, '')
1599
1600 # Import classes from the doctest module::
1601
1602     from doctest import DocTestParser, DocTestRunner
1603
1604 # Read in source. Make sure it is in text format, as tests in comments are not
1605 # found by doctest::
1606
1607     (data, out_stream) = open_streams(infile, "-")
1608     if txt2code is False:
1609         keyw.update({'add_missing_marker': False})
1610         converter = Code2Text(data, **keyw)
1611         docstring = str(converter)
1612     else:
1613         docstring = data.read()
1614
1615 # decode doc string if there is a "magic comment" in the first or second line
1616 # (http://docs.python.org/reference/lexical_analysis.html#encoding-declarations)
1617 # ::
1618
1619     if sys.version_info < (3,0):
1620         firstlines = ' '.join(docstring.splitlines()[:2])
1621         match = re.search('coding[=:]\s*([-\w.]+)', firstlines)
1622         if match:
1623             docencoding = match.group(1)
1624             docstring = docstring.decode(docencoding)
1625
1626 # Use the doctest Advanced API to run all doctests in the source text::
1627
1628     test = DocTestParser().get_doctest(docstring, globs, name="",
1629                                        filename=infile, lineno=0)
1630     runner = DocTestRunner(verbose, optionflags)
1631     runner.run(test)
1632     runner.summarize()
1633     # give feedback also if no failures occurred
1634     if not runner.failures:
1635         print("%d failures in %d tests"%(runner.failures, runner.tries))
1636     return runner.failures, runner.tries
1637
1638
1639 # diff
1640 # ~~~~
1641 #
1642 # ::
1643
1644 def diff(infile='-', outfile='-', txt2code=True, **keyw):
1645     """Report differences between converted infile and existing outfile
1646
1647     If outfile does not exist or is '-', do a round-trip conversion and
1648     report differences.
1649     """
1650
1651     import difflib
1652
1653     instream = open(infile)
1654     # for diffing, we need a copy of the data as list::
1655     data = instream.readlines()
1656     # convert
1657     converter = get_converter(data, txt2code, **keyw)
1658     new = converter()
1659
1660     if outfile != '-' and os.path.exists(outfile):
1661         outstream = open(outfile)
1662         old = outstream.readlines()
1663         oldname = outfile
1664         newname = "<conversion of %s>"%infile
1665     else:
1666         old = data
1667         oldname = infile
1668         # back-convert the output data
1669         converter = get_converter(new, not txt2code)
1670         new = converter()
1671         newname = "<round-conversion of %s>"%infile
1672
1673     # find and print the differences
1674     is_different = False
1675     # print(type(old), old)
1676     # print(type(new), new)
1677     delta = difflib.unified_diff(old, new,
1678     # delta = difflib.unified_diff(["heute\n", "schon\n"], ["heute\n", "noch\n"],
1679                                       fromfile=oldname, tofile=newname)
1680     for line in delta:
1681         is_different = True
1682         print(line, end=' ')
1683     if not is_different:
1684         print(oldname)
1685         print(newname)
1686         print("no differences found")
1687     return is_different
1688
1689
1690 # execute
1691 # ~~~~~~~
1692 #
1693 # Works only for python code.
1694 #
1695 # Does not work with `eval`, as code is not just one expression. ::
1696
1697 def execute(infile="-", txt2code=True, **keyw):
1698     """Execute the input file. Convert first, if it is a text source.
1699     """
1700
1701     with open(infile) as f:
1702         data = f.readlines()
1703     if txt2code:
1704         data = str(Text2Code(data, **keyw))
1705     exec(''.join(data))
1706
1707
1708 # main
1709 # ----
1710 #
1711 # If this script is called from the command line, the `main` function will
1712 # convert the input (file or stdin) between text and code formats.
1713 #
1714 # Setting values for the conversion can be given as keyword arguments
1715 # to `main`_.  The option defaults will be updated by command line options and
1716 # extended with "intelligent guesses" by `PylitOptions`_ and passed on to
1717 # helper functions and the converter instantiation.
1718 #
1719 # This allows easy customisation for programmatic use -- just call `main`
1720 # with the appropriate keyword options, e.g. ``pylit.main(comment_string="## ")``
1721 #
1722 # ::
1723
1724 def main(args=sys.argv[1:], **settings):
1725     """%(prog)s [options] INFILE [OUTFILE]
1726
1727     Convert between (reStructured) text source with embedded code,
1728     and code source with embedded documentation (comment blocks)
1729
1730     The special filename '-' stands for standard in- and output.
1731     """
1732
1733 # Parse and complete the options::
1734
1735     settings = PylitOptions()(args, **settings)
1736
1737 # Special actions with early return::
1738
1739     if settings.doctest:
1740         return run_doctest(**vars(settings).copy())
1741
1742     if settings.diff:
1743         return diff(**vars(settings).copy())
1744
1745     if settings.execute:
1746         return execute(**vars(settings).copy())
1747
1748 # Open in- and output streams::
1749
1750     try:
1751         (data, out_stream) = open_streams(**vars(settings).copy())
1752     except IOError as ex:
1753         print("IOError: %s %s" % (ex.filename, ex.strerror))
1754         sys.exit(ex.errno)
1755
1756 # Get a converter instance::
1757
1758     converter = get_converter(data, **vars(settings).copy())
1759
1760 # Convert and write to out_stream::
1761
1762     out_stream.write(str(converter))
1763
1764     if out_stream is not sys.stdout:
1765         print('output written to %r' % out_stream.name)
1766         out_stream.close()
1767
1768 # If input and output are from files, set the modification time (`mtime`) of
1769 # the output file to the one of the input file to indicate that the contained
1770 # information is equal. [#]_ ::
1771
1772
1773         # print("fractions?", os.stat_float_times())
1774         try:
1775             os.utime(settings.outfile, (os.path.getatime(settings.outfile),
1776                                        os.path.getmtime(settings.infile))
1777                     )
1778         except OSError:
1779             pass
1780
1781     ## print("mtime", os.path.getmtime(settings.infile),  settings.infile)
1782     ## print("mtime", os.path.getmtime(settings.outfile), settings.outfile)
1783
1784
1785 # .. [#] Make sure the corresponding file object (here `out_stream`) is
1786 #        closed, as otherwise the change will be overwritten when `close` is
1787 #        called afterwards (either explicitly or at program exit).
1788 #
1789 #
1790 # Rename the infile to a backup copy if ``--replace`` is set::
1791
1792     if settings.replace:
1793         os.rename(settings.infile, settings.infile + "~")
1794
1795
1796 # Run main, if called from the command line::
1797
1798 if __name__ == '__main__':
1799     main()
1800
1801
1802 # Open questions
1803 # ==============
1804 #
1805 # Open questions and ideas for further development
1806 #
1807 # Clean code
1808 # ----------
1809 #
1810 # * can we gain from using "shutils" over "os.path" and "os"?
1811 # * use pylint or pyChecker to enforce a consistent style?
1812 #
1813 # Options
1814 # -------
1815 #
1816 # * Use templates for the "intelligent guesses" (with Python syntax for string
1817 #   replacement with dicts: ``"hello %(what)s" % {'what': 'world'}``)
1818 #
1819 # * Is it sensible to offer the `header_string` option also as command line
1820 #   option?
1821 #
1822 # treatment of blank lines
1823 # ------------------------
1824 #
1825 # Alternatives: Keep blank lines blank
1826 #
1827 # - "never" (current setting) -> "visually merges" all documentation
1828 #    if there is no interjacent code
1829 #
1830 # - "always" -> disrupts documentation blocks,
1831 #
1832 # - "if empty" (no whitespace). Comment if there is whitespace.
1833 #
1834 #   This would allow non-obstructing markup but unfortunately this is (in
1835 #   most editors) also non-visible markup.
1836 #
1837 # + "if double" (if there is more than one consecutive blank line)
1838 #
1839 #   With this handling, the "visual gap" remains in both, text and code
1840 #   source.
1841 #
1842 #
1843 # Parsing Problems
1844 # ----------------
1845 #
1846 # * Ignore "matching comments" in literal strings?
1847 #
1848 #   Too complicated: Would need a specific detection algorithm for every
1849 #   language that supports multi-line literal strings (C++, PHP, Python)
1850 #
1851 # * Warn if a comment in code will become documentation after round-trip?
1852 #
1853 #
1854 # docstrings in code blocks
1855 # -------------------------
1856 #
1857 # * How to handle docstrings in code blocks? (it would be nice to convert them
1858 #   to rst-text if ``__docformat__ == restructuredtext``)
1859 #
1860 # TODO: Ask at Docutils users|developers
1861 #
1862 # Plug-ins
1863 # --------
1864 #
1865 # Specify a path for user additions and plug-ins. This would require to
1866 # convert Pylit from a pure module to a package...
1867 #
1868 #   6.4.3 Packages in Multiple Directories
1869 #
1870 #   Packages support one more special attribute, __path__. This is initialized
1871 #   to be a list containing the name of the directory holding the package's
1872 #   __init__.py before the code in that file is executed. This
1873 #   variable can be modified; doing so affects future searches for modules and
1874 #   subpackages contained in the package.
1875 #
1876 #   While this feature is not often needed, it can be used to extend the set
1877 #   of modules found in a package.
1878 #
1879 #
1880 # .. References
1881 #
1882 # .. _Docutils: http://docutils.sourceforge.net/
1883 # .. _Sphinx: http://sphinx.pocoo.org
1884 # .. _Pygments: http://pygments.org/
1885 # .. _code-block directive:
1886 #     http://docutils.sourceforge.net/sandbox/code-block-directive/
1887 # .. _literal block:
1888 # .. _literal blocks:
1889 #     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#literal-blocks
1890 # .. _indented literal block:
1891 # .. _indented literal blocks:
1892 #     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#indented-literal-blocks
1893 # .. _quoted literal block:
1894 # .. _quoted literal blocks:
1895 #     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#quoted-literal-blocks
1896 # .. _parsed-literal blocks:
1897 #     http://docutils.sf.net/docs/ref/rst/directives.html#parsed-literal-block
1898 # .. _doctest block:
1899 # .. _doctest blocks:
1900 #     http://docutils.sf.net/docs/ref/rst/restructuredtext.html#doctest-blocks
1901 # .. _issue 28734: https://bugs.python.org/issue28734
1902 # .. _issue 29670: https://bugs.python.org/issue29670
1903 # .. _ArgumentParser.parse_args():
1904 #     https://docs.python.org/dev/library/argparse.html#the-parse-args-method