1 .. #!/usr/bin/env python
2 # -*- coding: iso-8859-1 -*-
4 ===============================================================
5 pylit.py: Literate programming with reStructuredText
6 ===============================================================
9 :Version: SVN-Revision $Revision$
11 :Copyright: 2005, 2007 Guenter Milde.
12 Released under the terms of the GNU General Public License
24 :2005-06-29: Initial version.
25 :2005-06-30: First literate version.
:2005-07-01: Object-oriented script using generators.
27 :2005-07-10: Two state machine (later added 'header' state).
28 :2006-12-04: Start of work on version 0.2 (code restructuring).
29 :2007-01-23: 0.2 Published at http://pylit.berlios.de.
30 :2007-01-25: 0.2.1 Outsourced non-core documentation to the PyLit pages.
31 :2007-01-26: 0.2.2 New behaviour of `diff` function.
32 :2007-01-29: 0.2.3 New `header` methods after suggestion by Riccardo Murri.
33 :2007-01-31: 0.2.4 Raise Error if code indent is too small.
34 :2007-02-05: 0.2.5 New command line option --comment-string.
35 :2007-02-09: 0.2.6 Add section with open questions,
36 Code2Text: let only blank lines (no comment str)
37 separate text and code,
38 fix `Code2Text.header`.
39 :2007-02-19: 0.2.7 Simplify `Code2Text.header`,
40 new `iter_strip` method replacing a lot of ``if``-s.
41 :2007-02-22: 0.2.8 Set `mtime` of outfile to the one of infile.
42 :2007-02-27: 0.3 New `Code2Text` converter after an idea by Riccardo Murri,
                   explicit `option_defaults` dict for easier customization.
44 :2007-03-02: 0.3.1 Expand hard-tabs to prevent errors in indentation,
45 `Text2Code` now also works on blocks,
46 removed dependency on SimpleStates module.
47 :2007-03-06: 0.3.2 Bugfix: do not set `language` in `option_defaults`
48 renamed `code_languages` to `languages`.
49 :2007-03-16: 0.3.3 New language css,
50 option_defaults -> defaults = optparse.Values(),
51 simpler PylitOptions: don't store parsed values,
52 don't parse at initialization,
53 OptionValues: return `None` for non-existing attributes,
54 removed -infile and -outfile, use positional arguments.
55 :2007-03-19: 0.3.4 Documentation update,
56 separate `execute` function.
57 :2007-03-21: Code cleanup in `Text2Code.__iter__`.
58 :2007-03-23: 0.3.5 Removed "css" from languages after learning that
59 there is no C++ style "// " comment in css2.
60 :2007-04-24: 0.3.6 Documentation update.
61 :2007-05-18: 0.4 Implement Converter.__iter__ as stack of iterator
62 generators. Iterating over a converter instance now
63 yields lines instead of blocks.
64 Provide "hooks" for pre- and postprocessing filters.
65 Rename states to avoid confusion with formats:
66 "text" -> "documentation", "code" -> "code_block".
67 :2007-05-22: 0.4.1 Converter.__iter__: cleanup and reorganization,
68 rename Converter -> TextCodeConverter.
73 """pylit: bidirectional converter between a *text source* with embedded
74 computer code and a *code source* with embedded documentation.
77 __docformat__ = 'restructuredtext'
85 PyLit is a bidirectional converter between two formats of a computer
88 * a (reStructured) text document with program code embedded in
90 * a compilable (or executable) code source with *documentation* embedded in
110 The `defaults` object provides a central repository for default values
111 and their customisation. ::
113 defaults = optparse.Values()
117 * the initialization of data arguments in TextCodeConverter_ and
120 * completion of command line options in `PylitOptions.complete_values`_.
122 This allows the easy creation of custom back-ends that customise the
123 defaults and then call main_ e.g.:
126 >>> defaults.comment_string = "## "
127 >>> defaults.codeindent = 4
130 The following default values are defined in pylit.py:
135 Mapping of code file extension to code language::
137 defaults.languages = {".py": "python",
141 Used by `OptionValues.complete`_ to set the `defaults.language`. The
142 ``--language`` command line option or setting ``defaults.language`` in
143 programmatic use overrides this auto-setting feature.
145 defaults.fallback_language
146 ~~~~~~~~~~~~~~~~~~~~~~~~~~
148 Language to use, if there is no matching extension (e.g. if pylit is used as
149 filter) and no `language` is specified::
151 defaults.fallback_language = "python"
153 defaults.code_extensions
154 ~~~~~~~~~~~~~~~~~~~~~~~~
156 List of known extensions for source code files::
158 defaults.code_extensions = defaults.languages.keys()
160 Used in `OptionValues.complete`_ to auto-determine the conversion direction
161 from the input and output file names.
163 defaults.text_extensions
164 ~~~~~~~~~~~~~~~~~~~~~~~~
166 List of known extensions of (reStructured) text files::
168 defaults.text_extensions = [".txt"]
170 Used by `OptionValues._get_outfile` to auto-determine the output filename.
172 defaults.comment_strings
173 ~~~~~~~~~~~~~~~~~~~~~~~~
175 Dictionary of comment strings for known languages. Comment strings include
176 trailing whitespace. ::
178 defaults.comment_strings = {"python": '# ',
182 Used in Code2Text_ to recognise text blocks and in Text2Code_ to format
183 text blocks as comments.
185 defaults.header_string
186 ~~~~~~~~~~~~~~~~~~~~~~
188 Marker string for a header code block in the text source. No trailing
189 whitespace needed as indented code follows. Default is a comment marker::
191 defaults.header_string = '..'
Must be a valid rst directive that accepts code on the same line, e.g.
``'.. admonition::'``.
199 Export to the output format stripping documentation or code blocks::
201 defaults.strip = False
203 defaults.preprocessors
204 ~~~~~~~~~~~~~~~~~~~~~~
206 Preprocess the data with language-specific filters_::
208 defaults.preprocessors = {}
210 defaults.postprocessors
211 ~~~~~~~~~~~~~~~~~~~~~~~
213 Postprocess the data with language-specific filters_::
215 defaults.postprocessors = {}
220 Number of spaces to indent code blocks in `Code2Text.code_block_handler`_::
222 defaults.codeindent = 2
224 In `Text2Code.code_block_handler`_, the codeindent is determined by the
225 first recognized code line (leading comment or first indented literal block
231 What to do if the outfile already exists? (ignored if `outfile` == '-')::
233 defaults.overwrite = 'update'
:'yes': overwrite `outfile` if it already exists,
238 :'update': fail if the `outfile` is newer than `infile`,
239 :'no': fail if `outfile` exists.
245 The converter classes implement a simple state machine to separate and
246 transform documentation and code blocks. For this task, only a very limited
247 parsing is needed. PyLit's parser assumes:
249 * indented literal blocks in a text source are code blocks.
251 * comment lines that start with a matching comment string in a code source
252 are documentation blocks.
258 class TextCodeConverter(object):
259 """Parent class for the converters `Text2Code` and `Code2Text`.
262 The parent class defines data attributes and functions used in both
265 converting a text source to executable code source, and
267 converting commented code to a text source.
272 Class default values are fetched from the `defaults`_ object and can be
273 overridden by matching keyword arguments during class instantiation. This
274 also works with keyword arguments to `get_converter`_ and `main`_, as these
275 functions pass on unused keyword args to the instantiation of a converter
278 language = defaults.fallback_language
279 comment_strings = defaults.comment_strings
280 comment_string = "" # set in __init__
281 codeindent = defaults.codeindent
282 header_string = defaults.header_string
283 strip = defaults.strip
285 TextCodeConverter.__init__
286 ~~~~~~~~~~~~~~~~~~~~~~~~~~
288 Initializing sets up the `data` attribute, an iterable object yielding lines
289 of the source to convert. [1]_ Additional keyword arguments are stored as
290 data attributes, overwriting the class defaults. If not given as keyword
291 argument, `comment_string` is set to the language's default comment
294 def __init__(self, data, **keyw):
295 """data -- iterable data object
296 (list, file, generator, string, ...)
297 **keyw -- remaining keyword arguments are
298 stored as data-attributes
301 self.__dict__.update(keyw)
302 if not self.comment_string:
303 self.comment_string = self.comment_strings[self.language]
304 self.preprocessor = self.get_filter("preprocessors", self.language)
305 self.postprocessor = self.get_filter("postprocessors", self.language)
307 .. [1] The most common choice of data is a `file` object with the text
310 To convert a string into a suitable object, use its splitlines method
311 with the optional `keepends` argument set to True.
314 TextCodeConverter.__iter__
315 ~~~~~~~~~~~~~~~~~~~~~~~~~~
317 Return an iterator for `self`. Iteration yields lines of converted data.
319 The iterator is a chain of iterators acting on `self.data` that does
322 * text<->code format conversion
328 """Iterate over input data source and yield converted lines
330 return self.postprocessor(self.convert(self.preprocessor(self.data)))
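The nesting of iterators can be tried out outside the class. The following is a minimal Python 3 sketch, not PyLit's implementation: `strip_trailing_whitespace`, `comment_out` and `run_chain` are hypothetical names, and `comment_out` merely stands in for a real `convert` method.

```python
def identity_filter(data):
    """Return the data unchanged (used when no filter is configured)."""
    return data


def strip_trailing_whitespace(lines):
    """Hypothetical preprocessor: remove trailing whitespace, keep the newline."""
    for line in lines:
        yield line.rstrip() + "\n"


def comment_out(lines, comment_string="# "):
    """Stand-in for `convert`: prefix every line with the comment string."""
    for line in lines:
        yield comment_string + line


def run_chain(data, preprocessor=identity_filter, postprocessor=identity_filter):
    """Nest the iterators in the order postprocessor(convert(preprocessor(data)))."""
    return postprocessor(comment_out(preprocessor(data)))


# Nothing is processed until the chain is consumed, here by list().
converted = list(run_chain(["Title   \n", "=====\n"],
                           preprocessor=strip_trailing_whitespace))
```

Because every stage is lazy, a line passes through all three stages before the next line is read from `data`.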
333 TextCodeConverter.__call__
334 ~~~~~~~~~~~~~~~~~~~~~~~~~~
335 The special `__call__` method allows the use of class instances as callable
336 objects. It returns the converted data as list of lines::
339 """Iterate over state-machine and return results as list of lines"""
340 return [line for line in self]
343 TextCodeConverter.__str__
344 ~~~~~~~~~~~~~~~~~~~~~~~~~
345 Return converted data as string::
348 return "".join(self())
351 TextCodeConverter.get_filter
352 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
354 Filter the data by wrapping it in a language-specific pre- or
355 post-processing iterator. The filter must accept an iterable as first
356 argument and yield the processed input data linewise::
358 def get_filter(self, filter_set, language):
359 """Return language specific filter"""
360 if self.__class__ == Text2Code:
361 key = "text2"+language
362 elif self.__class__ == Code2Text:
363 key = language+"2text"
367 return getattr(defaults, filter_set)[key]
368 except (AttributeError, KeyError):
369 # print "there is no %r filter in %r"%(key, filter_set)
371 return identity_filter
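The key scheme used for the lookup ("text2" + language in one direction, language + "2text" in the other) can be sketched with a plain dictionary. This is an illustration in Python 3 under stated assumptions: `dos2unix_filter`, `lookup_filter` and the `direction` argument are hypothetical, and the dictionary only plays the role of `defaults.preprocessors`.

```python
def identity_filter(data):
    """Fallback filter: return the data unchanged."""
    return data


def dos2unix_filter(data):
    """Hypothetical preprocessor: convert CRLF line endings to LF."""
    for line in data:
        yield line.replace("\r\n", "\n")


# Plays the role of `defaults.preprocessors` (assumed content).
preprocessors = {"text2python": dos2unix_filter}


def lookup_filter(filter_set, direction, language):
    """Return the filter registered for this direction and language."""
    if direction == "text2code":
        key = "text2" + language
    else:
        key = language + "2text"
    # A missing key falls back to the identity filter.
    return filter_set.get(key, identity_filter)
```

A custom back-end would register its filter under the matching key before running the conversion.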
374 TextCodeConverter.get_indent
375 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
376 Return the number of leading spaces in `line` after expanding tabs ::
378 def get_indent(self, line):
379 """Return the indentation of `string`.
381 # line = line.expandtabs() # now done in `collect_blocks`
382 return len(line) - len(line.lstrip())
385 TextCodeConverter.collect_blocks
386 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
388 A generator function to aggregate "paragraphs" (blocks separated by blank
391 def collect_blocks(self, lines):
392 """collect lines in a list
          yield list for each paragraph, i.e. block of lines separated by a
395 blank line (whitespace only).
397 Also expand hard-tabs as these will lead to errors in indentation
398 (see `str.expandtabs`).
402 block.append(line.expandtabs())
403 if not line.rstrip():
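As a stand-alone illustration of the paragraph aggregation, here is a minimal Python 3 sketch written as a plain function. The name `collect_paragraphs` is hypothetical, and skipping an empty final block is an assumption of this sketch rather than documented behaviour.

```python
def collect_paragraphs(lines):
    """Yield lists of lines; each list ends with its blank separator line."""
    block = []
    for line in lines:
        # Expand hard tabs so that indentation can be measured in spaces.
        block.append(line.expandtabs())
        if not line.rstrip():        # whitespace-only line closes the paragraph
            yield block
            block = []
    if block:                        # last paragraph without trailing blank line
        yield block


paragraphs = list(collect_paragraphs(["a\n", "\tb\n", "\n", "c\n"]))
```

The blank line stays attached to the paragraph it terminates, so joining all yielded blocks reproduces the (tab-expanded) input.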
412 The `Text2Code` class separates code blocks (indented literal blocks) from
413 documentation. Code blocks are unindented, documentation is commented
(or filtered, if the ``strip`` option is True).
416 Only `indented literal blocks` are extracted. `quoted literal blocks` and
417 `pydoc blocks` are treated as text. This allows the easy inclusion of
.. [#] Note that there is no double colon before the doctest block in
426 Using the full blown docutils_ rst parser would introduce a large overhead
427 and slow down the conversion.
430 class Text2Code(TextCodeConverter):
431 """Convert a (reStructured) text source to code source
436 This is the core state machine of the converter class::
438 def convert(self, lines):
439 """Iterate over lists of text lines and convert them to code format
441 # Initialize data arguments
443 # Done here, so that every new iteration re-initializes them.
446 # :"header": first block -> check for leading `header_string`
447 # :"documentation": documentation part: comment out
448 # :"code_block": literal blocks containing source code: unindent
455 * stripped from all 'code_block' lines.
456 * set in `Text2Code.code_block_handler`_ to the indent of first non-blank
461 self._codeindent = None
464 * set by `Text2Code.documentation_handler`_ to the minimal indent of a
466 * used in `Text2Code.set_state`_ to find the end of a code block
472 Determine the state of the block and convert with the matching "handler"::
474 for block in self.collect_blocks(lines):
475 self.set_state(block)
476 for line in getattr(self, self.state+"_handler")(block):
481 ~~~~~~~~~~~~~~~~~~~~~
484 def set_state(self, block):
485 """Determine state of `block`. Set `self.state`
491 The new state depends on the active state (from the last block) and
492 features of the current block. It is either "header", "documentation", or
495 If the current state is "" (first block), check for
496 the `header_string` indicating a leading code block::
499 # print "set state for %r"%block
500 if block[0].startswith(self.header_string):
501 self.state = "header"
503 self.state = "documentation"
505 If the current state is "documentation", the next block is also
506 documentation. The end of a documentation part is detected in the
507 `Text2Code.documentation_handler`::
509 # elif self.state == "documentation":
510 # self.state = "documentation"
512 A "code_block" ends with the first less indented, nonblank line.
513 `_textindent` is set by the documentation handler to the indent of the
514 preceding documentation block::
516 elif self.state in ["code_block", "header"]:
517 indents = [self.get_indent(line) for line in block]
518 if indents and min(indents) <= self._textindent:
519 self.state = 'documentation'
521 self.state = 'code_block'
523 TODO: (or not to do?) insert blank line before the first line with too-small
524 codeindent using self.ensure_trailing_blank_line(lines, line) (would need
525 split and push-back of the documentation part)?
527 Text2Code.header_handler
528 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
532 def header_handler(self, lines):
533 """Format leading code block"""
534 # strip header string from first line
535 lines[0] = lines[0].replace(self.header_string, "", 1)
536 # yield remaining lines formatted as code-block
537 for line in self.code_block_handler(lines):
541 Text2Code.documentation_handler
542 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The 'documentation' handler processes everything that is not an indented
literal block. Documentation is commented out with `self.comment_string` or
filtered (with ``--strip``). ::
548 def documentation_handler(self, lines):
549 """Convert documentation blocks from text to code format
552 lines = [self.comment_string + line for line in lines]
Test for the end of the documentation block: does the second-to-last line end with
`::` without being a comment or a directive?
556 If end-of-documentation marker is detected,
558 * set state to 'code_block'
559 * set `self._textindent` (needed by `Text2Code.set_state`_ to find the
560 next "documentation" block)
561 * remove the comment from the last line again (it's a separator between documentation
564 TODO: allow different code marking directives (for syntax color etc)
          except IndexError:  # len(lines) < 2, e.g. last line of document
572 if (line.rstrip().endswith("::")
573 and not line.lstrip().startswith("..")):
574 self.state = "code_block"
575 self._textindent = self.get_indent(line)
576 lines[-1] = lines[-1].replace(self.comment_string, "", 1)
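The end-of-documentation test can be isolated in a small predicate. This is a hedged Python 3 sketch: the function name `announces_code_block` is hypothetical, and it only reproduces the check on the second-to-last line, not the state changes.

```python
def announces_code_block(lines):
    """Return True if the paragraph `lines` ends with a literal-block marker.

    `lines` is one paragraph including its trailing blank line, so the
    last text line is `lines[-2]`.
    """
    try:
        line = lines[-2]
    except IndexError:               # fewer than two lines, e.g. end of document
        return False
    # '::' at the end counts, unless the line is a comment or directive ('..').
    return line.rstrip().endswith("::") and not line.lstrip().startswith("..")
```

A paragraph such as `["An example::\n", "\n"]` passes the test, while `[".. note::\n", "\n"]` does not, because a directive also ends with a double colon.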
584 TODO: Ensure a trailing blank line? Would need to test all
585 documentation lines for end-of-documentation marker and add a line by calling the
586 `ensure_trailing_blank_line` method (which also issues a warning)
589 Text2Code.code_block_handler
590 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
592 The "code_block" handler is called with an indented literal block. It
593 removes leading whitespace up to the indentation of the first code line in
594 the file (this deviation from docutils behaviour allows indented blocks of
597 def code_block_handler(self, block):
598 """Convert indented literal blocks to source code format
601 If still unset, determine the indentation of code blocks from first non-blank
604 if self._codeindent is None:
605 self._codeindent = self.get_indent(block[0])
607 Yield unindented lines::
Check if we can safely unindent. If the line is less indented than
`_codeindent`, something went wrong. ::
614 if line.lstrip() and self.get_indent(line) < self._codeindent:
                  raise ValueError("code block contains line less indented "
                                   "than %d spaces \n%r" % (self._codeindent, block))
617 yield line.replace(" "*self._codeindent, "", 1)
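The unindent step with its safety check can be sketched as a plain generator. This is an illustration in Python 3, assuming the indentation is passed in explicitly; `unindent` is a hypothetical name.

```python
def unindent(block, codeindent):
    """Yield the lines of `block` with `codeindent` leading spaces removed."""
    for line in block:
        indent = len(line) - len(line.lstrip())
        # Blank lines are exempt from the check; they carry no indentation.
        if line.lstrip() and indent < codeindent:
            raise ValueError("code block contains line less indented "
                             "than %d spaces \n%r" % (codeindent, block))
        yield line.replace(" " * codeindent, "", 1)


code = list(unindent(["  a = 1\n", "    b = 2\n", "\n"], 2))
```

Deeper indentation inside the block is preserved relative to the first code line, which is what keeps nested Python code valid after conversion.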
623 The `Code2Text` class does the opposite of `Text2Code`_ -- it processes
624 valid source code, extracts documentation from comment blocks, and puts
625 program code in literal blocks.
The class is derived from the TextCodeConverter state machine and adds a
`convert` method as well as handlers for the "documentation" and "code_block"
631 class Code2Text(TextCodeConverter):
632 """Convert code source to text source
639 def convert(self, lines):
641 (re) set initial state. The leading block can be either "documentation" or
642 "header". This will be set by `Code2Text.set_state`_.
648 If the last paragraph of a documentation block does not end with a
649 "code_block_marker" (by default, the literal-block marker ``::``), it
must be added (otherwise, the back-conversion fails).
651 `code_block_marker_missing` is set by `Code2Text.documentation_handler`_
652 and evaluated by `Code2Text.code_block_handler`_. ::
654 self.code_block_marker_missing = False
Determine the state of the block and return it processed with the matching
659 for block in self.collect_blocks(lines):
660 self.set_state(block)
661 for line in getattr(self, self.state+"_handler")(block):
668 Check if block is "header", "documentation", or "code_block":
670 A paragraph is "documentation", if every non-blank line starts with a matching
671 comment string (including whitespace except for commented blank lines) ::
673 def set_state(self, block):
674 """Determine state of `block`."""
676 # skip documentation lines (commented or blank)
677 if line.startswith(self.comment_string):
679 if not line.rstrip(): # blank line
681 if line.rstrip() == self.comment_string.rstrip(): # blank comment
683 # non-documentation line found: the block is "header" or "code_block"
685 self.state = "header"
687 self.state = "code_block"
690 self.state = "documentation"
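The rule for recognising a documentation paragraph can be written as a predicate. This Python 3 sketch uses the hypothetical name `is_documentation` and covers only the test itself, not the distinction between "header" and "code_block".

```python
def is_documentation(block, comment_string="# "):
    """Return True if every line is commented, blank, or a blank comment."""
    for line in block:
        if line.startswith(comment_string):
            continue                 # regular comment line
        if not line.rstrip():
            continue                 # blank line
        if line.rstrip() == comment_string.rstrip():
            continue                 # comment marker without trailing space
        return False                 # first non-documentation line found
    return True
```

The third test matters because editors often strip trailing whitespace, which turns `"# "` on an otherwise empty line into `"#"`.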
693 Code2Text.header_handler
694 ~~~~~~~~~~~~~~~~~~~~~~~~
696 Sometimes code needs to remain on the first line(s) of the document to be
697 valid. The most common example is the "shebang" line that tells a POSIX
698 shell how to process an executable file::
700 #!/usr/bin/env python
In Python, the ``# -*- coding: iso-8859-1 -*-`` declaration must occur on the
first or second line of the file.
If we want to keep the line numbers in sync for text and code source, the
reStructuredText markup for these header lines must start at the same line
as the first header line. Therefore, header lines cannot be marked as a
literal block (this would require the ``::`` and an empty line above the code block).
On the other hand, a comment may start at the same line as the comment marker, and it
includes subsequent indented lines. Comments are visible in the reStructuredText
source but hidden in the pretty-printed output.
714 With a header converted to comment in the text source, everything before the
715 first documentation block (i.e. before the first paragraph using the matching comment
716 string) will be hidden away (in HTML or PDF output).
This seems a good compromise; the advantages

* line numbers are kept
* the "normal" code_block conversion rules (indent/unindent by `codeindent`) apply
* greater flexibility: you can hide a repeating header in a project
  consisting of many source files.

outweigh the disadvantages

- it may come as a surprise if a part of the file is not "printed",
- one more syntax element to learn for rst newcomers starting with pylit
  (however, starting from the code source, this will be auto-generated).
731 In the case that there is no matching comment at all, the complete code
732 source will become a comment -- however, in this case it is not very likely
733 the source is a literate document anyway.
735 If needed for the documentation, it is possible to quote the header in (or
736 after) the first documentation block, e.g. as `parsed literal`.
740 def header_handler(self, lines):
741 """Format leading code block"""
742 if self.strip == True:
744 # get iterator over the lines that formats them as code-block
745 lines = iter(self.code_block_handler(lines))
746 # prepend header string to first line
          yield self.header_string + next(lines)
748 # yield remaining lines
752 Code2Text.documentation_handler
753 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
755 The *documentation state* handler converts a comment to a documentation block by
756 stripping the leading `comment string` from every line::
758 def documentation_handler(self, lines):
759 """Uncomment documentation blocks in source code
762 If the code block is stripped, the literal marker would lead to
763 an error when the text is converted with docutils. Strip it as well.
764 Otherwise, check for the code block marker (``::``) at the end of
765 the documentation block::
768 self.strip_literal_marker(lines)
771 self.code_block_marker_missing = not(lines[-2].rstrip().endswith("::"))
              except IndexError:  # len(lines) < 2, e.g. last line of document
773 self.code_block_marker_missing = True
775 Strip comment strings and yield lines. Consider the case that a blank line
776 has a comment string without trailing whitespace::
778 stripped_comment_string = self.comment_string.rstrip()
781 line = line.replace(self.comment_string, "", 1)
782 if line.rstrip() == stripped_comment_string:
783 line = line.replace(stripped_comment_string, "", 1)
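The two-step stripping of comment strings can be followed in isolation. This is a minimal Python 3 sketch with the hypothetical name `uncomment`; it mirrors the two `replace` calls but leaves out the literal-marker handling.

```python
def uncomment(lines, comment_string="# "):
    """Yield `lines` with the leading comment string removed."""
    stripped_comment_string = comment_string.rstrip()
    for line in lines:
        # Normal case: the comment string including its trailing space.
        line = line.replace(comment_string, "", 1)
        # Blank comment: only the bare marker is left on the line.
        if line.rstrip() == stripped_comment_string:
            line = line.replace(stripped_comment_string, "", 1)
        yield line


text = list(uncomment(["# Title\n", "#\n", "# text\n"]))
```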
787 Code2Text.code_block_handler
788 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The `code_block` handler returns the code block as indented literal
block (or filters it, if ``self.strip == True``). The amount of the code
indentation is controlled by `self.codeindent` (default 2). ::
794 def code_block_handler(self, lines):
          """Convert code blocks to text format (indent or strip)
797 if self.strip == True:
          # insert transition marker if needed
800 if self.code_block_marker_missing:
801 self.state = "documentation"
804 self.state = "code_block"
806 yield " "*self.codeindent + line
810 Code2Text.strip_literal_marker
811 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
813 Replace the literal marker with the equivalent of docutils replace rules
815 * strip `::`-line (and preceding blank line) if on a line on its own
816 * strip `::` if it is preceded by whitespace.
817 * convert `::` to a single colon if preceded by text
819 `lines` should be a list of documentation lines (with a trailing blank line).
820 It is modified in-place::
822 def strip_literal_marker(self, lines):
          except IndexError:  # len(lines) < 2
828 # split at rightmost '::'
830 (head, tail) = line.rsplit('::', 1)
831 except ValueError: # only one part (no '::')
834 # '::' on an extra line
837 # delete preceding line if it is blank
838 if len(lines) >= 2 and not lines[-2].lstrip():
840 # '::' follows whitespace
841 elif head.rstrip() < head:
843 lines[-2] = "".join((head, tail))
846 lines[-2] = ":".join((head, tail))
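The three replacement rules can be checked with a stand-alone version of the method. This is a Python 3 sketch written as a plain function; it tests for trailing whitespace with ``head != head.rstrip()``, which is an equivalent but more explicit spelling chosen for this illustration.

```python
def strip_literal_marker(lines):
    """Remove or shorten a trailing '::' in the paragraph `lines` (in place)."""
    try:
        line = lines[-2]             # last text line before the blank line
    except IndexError:               # len(lines) < 2
        return
    try:
        head, tail = line.rsplit("::", 1)   # split at rightmost '::'
    except ValueError:               # no '::' in the line
        return
    if not head.strip():             # '::' on a line of its own
        del lines[-2]
        if len(lines) >= 2 and not lines[-2].lstrip():
            del lines[-2]            # drop the blank line that preceded it
    elif head != head.rstrip():      # '::' follows whitespace: drop it
        lines[-2] = head.rstrip() + tail
    else:                            # '::' follows text: keep a single colon
        lines[-2] = ":".join((head, tail))
```

For example, `["Example::\n", "\n"]` becomes `["Example:\n", "\n"]`, and `["Example ::\n", "\n"]` becomes `["Example\n", "\n"]`.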
Filters allow pre- and post-processing of the data to bring it into a format
852 suitable for the "normal" text<->code conversion. An example is conversion
853 of `C` ``/*`` ``*/`` comments into C++ ``//`` comments (and back).
855 Filters are generator functions that return an iterator that acts on a
856 `data` iterable and returns processed `data` items (lines).
858 The most basic filter is the identity filter, that returns its argument as
861 def identity_filter(data):
867 Using this script from the command line will convert a file according to its
868 extension. This default can be overridden by a couple of options.
873 How to determine which source is up-to-date?
874 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- set modification date of `outfile` to the one of `infile`
878 Points out that the source files are 'synchronized'.
880 * Are there problems to expect from "backdating" a file? Which?
882 Looking at http://www.unix.com/showthread.php?t=20526, it seems
883 perfectly legal to set `mtime` (while leaving `ctime`) as `mtime` is a
884 description of the "actuality" of the data in the file.
886 * Should this become a default or an option?
888 - alternatively move input file to a backup copy (with option: `--replace`)
890 - check modification date before overwriting
891 (with option: `--overwrite=update`)
893 - check modification date before editing (implemented as `Jed editor`_
894 function `pylit_check()` in `pylit.sl`_)
896 .. _Jed editor: http://www.jedsoft.org/jed/
897 .. _pylit.sl: http://jedmodes.sourceforge.net/mode/pylit/
899 Recognised Filename Extensions
900 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Instead of defining a new extension for "pylit" literate programs,
903 by default ``.txt`` will be appended for the text source and stripped by
904 the conversion to the code source. I.e. for a Python program foo:
906 * the code source is called ``foo.py``
907 * the text source is called ``foo.py.txt``
908 * the html rendering is called ``foo.py.html``
914 The following class adds `as_dict` and `__getattr__` methods to
917 class OptionValues(optparse.Values):
922 For use as keyword arguments, it is handy to have the options in a
dictionary. `as_dict` returns a copy of the instance's object dictionary::
926 """Return options as dictionary object"""
927 return self.__dict__.copy()
929 OptionValues.complete
930 ~~~~~~~~~~~~~~~~~~~~~
934 def complete(self, **keyw):
936 Complete the option values with keyword arguments.
938 Do not overwrite existing values. Only use arguments that do not
939 have a corresponding attribute in `self`,
              if key not in self.__dict__:
943 setattr(self, key, keyw[key])
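The "do not overwrite" behaviour of `complete` is easy to verify interactively. The following Python 3 sketch subclasses `optparse.Values` with only this one method; the option names and values used are illustrative.

```python
import optparse


class OptionValues(optparse.Values):
    """Reduced sketch: only the `complete` method is reproduced here."""

    def complete(self, **keyw):
        """Add keyword arguments as attributes unless already set."""
        for key in keyw:
            if key not in self.__dict__:
                setattr(self, key, keyw[key])


# `language` was set explicitly, so the default passed to complete() loses.
values = OptionValues({"language": "python"})
values.complete(language="c++", codeindent=2)
```

After the call, `values.language` is still `"python"`, while `values.codeindent` has been filled in with `2`.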
945 OptionValues.__getattr__
946 ~~~~~~~~~~~~~~~~~~~~~~~~
948 To replace calls using ``options.ensure_value("OPTION", None)`` with the
949 more concise ``options.OPTION``, we define `__getattr__` [#]_ ::
951 def __getattr__(self, name):
952 """Return default value for non existing options"""
956 .. [#] The special method `__getattr__` is only called when an attribute
957 lookup has not found the attribute in the usual places (i.e. it is
958 not an instance attribute nor is it found in the class tree for
965 The `PylitOptions` class comprises an option parser and methods for parsing
966 and completion of command line options::
968 class PylitOptions(object):
969 """Storage and handling of command line options for pylit"""
977 """Set up an `OptionParser` instance for pylit command line options
980 p = optparse.OptionParser(usage=main.__doc__, version=_version)
982 p.add_option("-c", "--code2txt", dest="txt2code", action="store_false",
983 help="convert code source to text source")
984 p.add_option("--comment-string", dest="comment_string",
985 help="documentation block marker (default '# ' (for python))" )
986 p.add_option("-d", "--diff", action="store_true",
987 help="test for differences to existing file")
988 p.add_option("--doctest", action="store_true",
989 help="run doctest.testfile() on the text version")
990 p.add_option("-e", "--execute", action="store_true",
991 help="execute code (Python only)")
992 p.add_option("--language", action="store",
993 choices = defaults.languages.values(),
994 help="use LANGUAGE native comment style")
995 p.add_option("--overwrite", action="store",
996 choices = ["yes", "update", "no"],
997 help="overwrite output file (default 'update')")
998 p.add_option("--replace", action="store_true",
999 help="move infile to a backup copy (appending '~')")
1000 p.add_option("-s", "--strip", action="store_true",
1001 help="export by stripping documentation or code")
1002 p.add_option("-t", "--txt2code", action="store_true",
1003 help="convert text source to code source")
1007 PylitOptions.parse_args
1008 ~~~~~~~~~~~~~~~~~~~~~~~
The `parse_args` method calls the `optparse.OptionParser` on the command
line or the provided args and returns the result as an `OptionValues`
instance. Defaults can be provided as keyword arguments::
1014 def parse_args(self, args=sys.argv[1:], **keyw):
1015 """parse command line arguments using `optparse.OptionParser`
1017 parse_args(args, **keyw) -> OptionValues instance
1019 args -- list of command line arguments.
1020 keyw -- keyword arguments or dictionary of option defaults
1023 (values, args) = self.parser.parse_args(args, OptionValues(keyw))
1024 # Convert FILE and OUTFILE positional args to option values
1025 # (other positional arguments are ignored)
1027 values.infile = args[0]
1028 values.outfile = args[1]
1034 PylitOptions.complete_values
1035 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1037 Complete an OptionValues instance `values`. Use module-level defaults and
1038 context information to set missing option values to sensible defaults (if
1041 def complete_values(self, values):
1042 """complete option values with module and context sensible defaults
1044 x.complete_values(values) -> values
1045 values -- OptionValues instance
1048 Complete with module-level defaults_::
1050 values.complete(**defaults.__dict__)
1052 Ensure infile is a string::
1054 values.ensure_value("infile", "")
1056 Guess conversion direction from `infile` filename::
1058 if values.txt2code is None:
1059 in_extension = os.path.splitext(values.infile)[1]
1060 if in_extension in values.text_extensions:
1061 values.txt2code = True
1062 elif in_extension in values.code_extensions:
1063 values.txt2code = False
1065 Auto-determine the output file name::
1067 values.ensure_value("outfile", self._get_outfile_name(values))
1069 Second try: Guess conversion direction from outfile filename::
1071 if values.txt2code is None:
1072 out_extension = os.path.splitext(values.outfile)[1]
1073 values.txt2code = not (out_extension in values.text_extensions)
1075 Set the language of the code::
1077 if values.txt2code is True:
1078 code_extension = os.path.splitext(values.outfile)[1]
1079 elif values.txt2code is False:
1080 code_extension = os.path.splitext(values.infile)[1]
1081 values.ensure_value("language",
1082 values.languages.get(code_extension,
1083 values.fallback_language))
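The two-stage guess of the conversion direction and the language lookup can be condensed into two plain functions. This is a Python 3 sketch under stated assumptions: `guess_direction` and `guess_language` are hypothetical names, and the extension tables passed in contain only illustrative entries.

```python
import os.path


def guess_direction(infile, outfile=None,
                    text_extensions=(".txt",), code_extensions=(".py",)):
    """Return True for text->code, False for code->text, None if unknown."""
    in_extension = os.path.splitext(infile)[1]
    if in_extension in text_extensions:
        return True
    if in_extension in code_extensions:
        return False
    if outfile is not None:          # second try: look at the output name
        out_extension = os.path.splitext(outfile)[1]
        return out_extension not in text_extensions
    return None


def guess_language(infile, outfile, txt2code, languages, fallback="python"):
    """Look up the language by the extension of the code file."""
    code_file = outfile if txt2code else infile
    return languages.get(os.path.splitext(code_file)[1], fallback)
```

For a text source `foo.py.txt`, the direction is text to code, and the language follows from the `.py` extension of the output file.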
1087 PylitOptions._get_outfile_name
1088 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1090 Construct a matching filename for the output file. The output filename is
1091 constructed from `infile` by the following rules:
1093 * '-' (stdin) results in '-' (stdout)
1094 * strip the `txt_extension` (txt2code) or
* add a `txt_extension` (code2txt)
1096 * fallback: if no guess can be made, add ".out"
1098 .. TODO: use values.outfile_extension if it exists?
1102 def _get_outfile_name(self, values):
1103 """Return a matching output filename for `infile`
1105 # if input is stdin, default output is stdout
1106 if values.infile == '-':
1109 # Derive from `infile` name: strip or add text extension
1110 (base, ext) = os.path.splitext(values.infile)
1111 if ext in values.text_extensions:
1113 if ext in values.code_extensions or values.txt2code == False:
1114 return values.infile + values.text_extensions[0] # add
1116 return values.infile + ".out"
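The naming rules can be exercised with a function that takes its settings as arguments instead of a `values` object. This is a Python 3 sketch; `get_outfile_name` without the leading underscore and the default extension tuples are assumptions made for the illustration.

```python
import os.path


def get_outfile_name(infile, txt2code=None,
                     text_extensions=(".txt",), code_extensions=(".py",)):
    """Return a matching output filename for `infile`."""
    if infile == "-":                # stdin -> stdout
        return "-"
    base, ext = os.path.splitext(infile)
    if ext in text_extensions:       # text source: strip the text extension
        return base
    if ext in code_extensions or txt2code is False:
        return infile + text_extensions[0]   # code source: add it
    return infile + ".out"           # fallback: no guess possible
```

With these defaults, `foo.py` maps to `foo.py.txt` and back again, so a round trip restores the original file name.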
1118 PylitOptions.__call__
1119 ~~~~~~~~~~~~~~~~~~~~~
The special `__call__` method allows the use of PylitOptions instances as
1122 *callables*: Calling an instance parses the argument list to extract option
1123 values and completes them based on "context-sensitive defaults". Keyword
1124 arguments are passed to `PylitOptions.parse_args`_ as default values. ::
1126 def __call__(self, args=sys.argv[1:], **keyw):
1127 """parse and complete command line args return option values
1129 values = self.parse_args(args, **keyw)
1130 return self.complete_values(values)
1140 Return file objects for in- and output. If the input path is missing,
1141 write usage and abort. (An alternative would be to use stdin as default.
1142 However, this leaves the uninitiated user with a non-responding application
1143 if (s)he just tries the script without any arguments) ::
1145 def open_streams(infile = '-', outfile = '-', overwrite='update', **keyw):
1146 """Open and return the input and output stream
1148 open_streams(infile, outfile) -> (in_stream, out_stream)
1150 in_stream -- file(infile) or sys.stdin
1151 out_stream -- file(outfile) or sys.stdout
1152 overwrite -- 'yes': overwrite `outfile` if it already exists,
1153 'update': fail if the `outfile` is newer than `infile`,
1154 'no': fail if `outfile` exists.
1156 Irrelevant if `outfile` == '-'.
1159 strerror = "Missing input file name ('-' for stdin; -h for help)"
1160 raise IOError, (2, strerror, infile)
1162 in_stream = sys.stdin
1164 in_stream = file(infile, 'r')
1166 out_stream = sys.stdout
1167 elif overwrite == 'no' and os.path.exists(outfile):
1168 raise IOError, (1, "Output file exists!", outfile)
1169 elif overwrite == 'update' and is_newer(outfile, infile):
1170 raise IOError, (1, "Output file is newer than input file!", outfile)
1172 out_stream = file(outfile, 'w')
1173 return (in_stream, out_stream)
1180 def is_newer(path1, path2):
1181 """Check if `path1` is newer than `path2` (using mtime)
1183 Compare modification time of files at path1 and path2.
1185 Non-existing files are considered oldest: Return False if path1 does not
1186 exist and True if path2 does not exist.
1188 Return None for equal modification time. (This evaluates to False in a
1189 boolean context but allows a test for equality.)
1193 mtime1 = os.path.getmtime(path1)
1197 mtime2 = os.path.getmtime(path2)
1200 # print "mtime1", mtime1, path1, "\n", "mtime2", mtime2, path2
1202 if mtime1 == mtime2:
1204 return mtime1 > mtime2
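The three possible results (``True``, ``False``, ``None``) are easiest to see with files whose modification times are set by hand. The sketch below is a Python 3 re-statement of the behaviour described in the docstring; treating a missing file as having mtime ``-1`` is an assumption made for this example:

```python
import os
import tempfile


def is_newer(path1, path2):
    """Sketch: True/False by mtime, None if both mtimes are equal."""
    try:
        mtime1 = os.path.getmtime(path1)
    except OSError:
        mtime1 = -1          # assumption: a missing file counts as oldest
    try:
        mtime2 = os.path.getmtime(path2)
    except OSError:
        mtime2 = -1
    if mtime1 == mtime2:
        return None
    return mtime1 > mtime2


with tempfile.TemporaryDirectory() as folder:
    older = os.path.join(folder, "older.txt")
    newer = os.path.join(folder, "newer.txt")
    missing = os.path.join(folder, "missing.txt")
    for path, stamp in ((older, 100), (newer, 200)):
        with open(path, "w") as stream:
            stream.write("x\n")
        os.utime(path, (stamp, stamp))   # set atime and mtime explicitly
    results = [is_newer(newer, older), is_newer(older, newer),
               is_newer(older, older), is_newer(missing, older),
               is_newer(older, missing)]

print(results)  # [True, False, None, False, True]
```

Because ``None`` is false in a boolean context, ``overwrite == 'update'`` still permits writing when both files carry the same mtime.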
1210 Get an instance of the converter state machine::
1212 def get_converter(data, txt2code=True, **keyw):
1214 return Text2Code(data, **keyw)
1216 return Code2Text(data, **keyw)
1227 def run_doctest(infile="-", txt2code=True,
1228 globs={}, verbose=False, optionflags=0, **keyw):
1229 """run doctest on the text source
1231 from doctest import DocTestParser, DocTestRunner
1232 (data, out_stream) = open_streams(infile, "-")
1234 If source is code, convert to text, as tests in comments are not found by
1237 if txt2code is False:
1238 converter = Code2Text(data, **keyw)
1239 docstring = str(converter)
1241 docstring = data.read()
1243 Use the doctest Advanced API to do all doctests in a given string::
1245 test = DocTestParser().get_doctest(docstring, globs=globs, name="",
1246 filename=infile, lineno=0)
1247 runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
1250 if not runner.failures:
1251 print "%d failures in %d tests"%(runner.failures, runner.tries)
1252 return runner.failures, runner.tries
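For readers unfamiliar with the doctest Advanced API, here is a small self-contained sketch in Python 3. The sample text and the names ``text``, ``failed`` and ``attempted`` are made up for the illustration; the doctest calls are the same ones used in `run_doctest`:

```python
from doctest import DocTestParser, DocTestRunner

# Plain text with an embedded interactive session, as found in a text source.
text = """
Some documentation with an embedded session:

>>> 1 + 1
2
>>> "py" + "lit"
'pylit'
"""

# Extract all examples from the string into one DocTest object ...
test = DocTestParser().get_doctest(text, globs={}, name="example",
                                   filename="<string>", lineno=0)
# ... and run them; `run` returns a (failed, attempted) named tuple.
runner = DocTestRunner(verbose=False)
failed, attempted = runner.run(test)
print("%d failures in %d tests" % (failed, attempted))
```

The parser works on any string, so the text does not need to be a docstring of a module or function.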
1260 def diff(infile='-', outfile='-', txt2code=True, **keyw):
1261 """Report differences between converted infile and existing outfile
1263 If outfile is '-', do a round-trip conversion and report differences
1268 instream = file(infile)
1269 # for diffing, we need a copy of the data as list::
1270 data = instream.readlines()
1272 converter = get_converter(data, txt2code, **keyw)
1276 outstream = file(outfile)
1277 old = outstream.readlines()
1279 newname = "<conversion of %s>"%infile
1283 # back-convert the output data
1284 converter = get_converter(new, not txt2code)
1286 newname = "<round-conversion of %s>"%infile
1288 # find and print the differences
1289 is_different = False
1290 # print type(old), old
1291 # print type(new), new
1292 delta = difflib.unified_diff(old, new,
1293 # delta = difflib.unified_diff(["heute\n", "schon\n"], ["heute\n", "noch\n"],
1294 fromfile=oldname, tofile=newname)
1298 if not is_different:
1301 print "no differences found"
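The way `difflib.unified_diff` signals "no differences" can be checked with two short line lists. This sketch is Python 3 and reuses the sample lines from the commented-out debugging call above; the file names are placeholders:

```python
import difflib

old = ["heute\n", "schon\n"]
new = ["heute\n", "noch\n"]

# unified_diff returns a generator of diff lines; it yields nothing
# at all if both sequences are equal.
delta = list(difflib.unified_diff(old, new,
                                  fromfile="old.txt",
                                  tofile="<conversion of new.txt>"))
is_different = bool(delta)

for line in delta:
    print(line, end="")
if not is_different:
    print("no differences found")
```

Since the result is a generator, emptiness can only be detected by consuming it, either by collecting it in a list as here or by setting a flag inside the printing loop.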
1308 Works only for python code.
1310 Does not work with `eval`, as code is not just one expression. ::
1312 def execute(infile="-", txt2code=True, **keyw):
1313 """Execute the input file. Convert first, if it is a text source.
1318 data = str(Text2Code(data, **keyw))
1319 # print "executing " + options.infile
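The difference between `eval` and `exec` for multi-statement sources can be shown in a few lines. This is a Python 3 sketch (where ``exec`` is a function, not a statement as in Python 2); the variable names are chosen for the example only:

```python
source = "x = 6\ny = 7\nresult = x * y\n"

# exec runs a sequence of statements in the given namespace.
namespace = {}
exec(source, namespace)
print(namespace["result"])

# eval accepts a single expression only and rejects statements.
try:
    eval(source)
    eval_failed = False
except SyntaxError:
    eval_failed = True
print(eval_failed)
```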
1326 If this script is called from the command line, the `main` function will
1327 convert the input (file or stdin) between text and code formats.
1329 Option default values for the conversion can be given as keyword arguments
1330 to `main`_. The option defaults will be updated by command line options and
1331 extended with "intelligent guesses" by `PylitOptions`_ and passed on to
1332 helper functions and the converter instantiation.
1334 This allows easy customization for programmatic use -- just call `main`
1335 with the appropriate keyword options, e.g.:
1337 >>> main(comment_string="## ")
1341 def main(args=sys.argv[1:], **defaults):
1342 """%prog [options] INFILE [OUTFILE]
1344 Convert between (reStructured) text source with embedded code,
1345 and code source with embedded documentation (comment blocks)
1347 The special filename '-' stands for standard input and output.
1350 Parse and complete the options::
1352 options = PylitOptions()(args, **defaults)
1353 # print "infile", repr(options.infile)
1355 Special actions with early return::
1358 return run_doctest(**options.as_dict())
1361 return diff(**options.as_dict())
1364 return execute(**options.as_dict())
1366 Open in- and output streams::
1369 (data, out_stream) = open_streams(**options.as_dict())
1371 print "IOError: %s %s" % (ex.filename, ex.strerror)
1374 Get a converter instance::
1376 converter = get_converter(data, **options.as_dict())
1378 Convert and write to out_stream::
1380 out_stream.write(str(converter))
1382 if out_stream is not sys.stdout:
1383 print "extract written to", out_stream.name
1386 If input and output are from files, set the modification time (`mtime`) of
1387 the output file to the one of the input file to indicate that the contained
1388 information is equal. [#]_ ::
1391 os.utime(options.outfile, (os.path.getatime(options.outfile),
1392 os.path.getmtime(options.infile))
1397 ## print "mtime", os.path.getmtime(options.infile), options.infile
1398 ## print "mtime", os.path.getmtime(options.outfile), options.outfile
1401 .. [#] Make sure the corresponding file object (here `out_stream`) is
1402 closed, as otherwise the change will be overwritten when `close` is
1403 called afterwards (either explicitly or at program exit).
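The order of operations described in the footnote, close first and then set the time stamp, can be sketched as follows. The example is Python 3, works in a temporary directory, and uses made-up file names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    infile = os.path.join(folder, "example.py.txt")
    outfile = os.path.join(folder, "example.py")

    with open(infile, "w") as stream:
        stream.write("text source\n")
    # Give the input a fixed, old time stamp so the effect is visible.
    os.utime(infile, (1000000000, 1000000000))

    with open(outfile, "w") as out_stream:
        out_stream.write("code source\n")
    # `out_stream` is closed here, so the mtime set below is not
    # overwritten by a later flush of pending writes.
    os.utime(outfile, (os.path.getatime(outfile),
                       os.path.getmtime(infile)))

    same_mtime = os.path.getmtime(outfile) == os.path.getmtime(infile)

print(same_mtime)
```

With equal time stamps, a later run with ``overwrite='update'`` treats neither file as newer than the other.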
1406 Rename the infile to a backup copy if ``--replace`` is set::
1409 os.rename(options.infile, options.infile + "~")
1412 Run main, if called from the command line::
1414 if __name__ == '__main__':
1421 Open questions and ideas for further development
1426 * can we gain from using "shutil" over "os.path" and "os"?
1427 * use pylint or pyChecker to enforce a consistent style?
1432 * Use templates for the "intelligent guesses" (with Python syntax for string
1433 replacement with dicts: ``"hello %(what)s" % {'what': 'world'}``)
1435 * Is it sensible to offer the `header_string` option also as command line
1441 ----------------------
1443 * How can I include a literal block that should not be in the
1444 executable code (e.g. an example, an earlier version or variant)?
1448 - Use a `parsed-literal block`_ directive if there is no "accidental"
1449 markup in the literal code
1451 - Use a `line block`_ directive or the `line block syntax`_
1452 and mark all lines as `inline literals`_.
1454 - Python session examples and doctests can use `doctest block`_ syntax
1456 No double colon! Start first line of block with ``>>>``.
1459 Not implemented yet:
1461 - use a dedicated `code-block directive`_ or a distinct directive for
1462 ordinary literal blocks.
1464 * Ignore "matching comments" in literal strings?
1466 Too complicated: Would need a specific detection algorithm for every
1467 language that supports multi-line literal strings (C++, PHP, Python)
1469 * Warn if a comment in code will become documentation after round-trip?
1471 code-block directive
1472 --------------------
1474 In a document where code examples are only one of several uses of literal
1475 blocks, it would be more appropriate to single out the sourcecode with a
1476 dedicated "code-block" directive.
1478 Some highlight plug-ins require a special "sourcecode" or "code-block"
1479 directive instead of the ``::`` literal block marker. Actually,
1480 syntax highlighting is possible without changes to docutils with the Pygments_
1481 package using a "code-block" directive. See the `syntax highlight`_ section
1482 in the features documentation.
1486 * provide a "code-block-marker" string option.
1488 * correctly handle the case of ``code_block_marker == '::'`` and conversion
1489 of ``::`` to a different "code_block_marker" -- consider minimized forms.
1491 docstrings in code blocks
1492 --------------------------
1494 * How to handle docstrings in code blocks? (it would be nice to convert them
1495 to rst-text if ``__docformat__ == restructuredtext``)
1497 TODO: Ask at docutils users|developers
1501 .. _docutils: http://docutils.sourceforge.net/
1503 http://docutils.sf.net/docs/ref/rst/restructuredtext.html#doctest-blocks
1504 .. _parsed-literal block:
1505 http://docutils.sf.net/docs/ref/rst/directives.html#parsed-literal-block
1507 http://docutils.sourceforge.net/docs/ref/rst/directives.html#line-block
1508 .. _line block syntax:
1509 http://docutils.sf.net/docs/ref/rst/restructuredtext.html#line-blocks
1510 .. _inline literals:
1511 http://docutils.sf.net/docs/ref/rst/restructuredtext.html#inline-literals
1512 .. _pygments: http://pygments.org/
1513 .. _syntax highlight: ../features/syntax-highlight.html