1 .. #!/usr/bin/env python
2 # -*- coding: iso-8859-1 -*-
4 ===============================================================
5 pylit.py: Literate programming with reStructuredText
6 ===============================================================
9 :Version: SVN-Revision $Revision$
11 :Copyright: 2005, 2007 Guenter Milde.
12 Released under the terms of the GNU General Public License
24 :2005-06-29: Initial version.
25 :2005-06-30: First literate version.
26 :2005-07-01: Object orientated script using generators.
27 :2005-07-10: Two state machine (later added 'header' state).
28 :2006-12-04: Start of work on version 0.2 (code restructuring).
29 :2007-01-23: 0.2 Published at http://pylit.berlios.de.
30 :2007-01-25: 0.2.1 Outsourced non-core documentation to the PyLit pages.
31 :2007-01-26: 0.2.2 New behaviour of `diff` function.
32 :2007-01-29: 0.2.3 New `header` methods after suggestion by Riccardo Murri.
33 :2007-01-31: 0.2.4 Raise Error if code indent is too small.
34 :2007-02-05: 0.2.5 New command line option --comment-string.
35 :2007-02-09: 0.2.6 Add section with open questions,
36 Code2Text: let only blank lines (no comment str)
37 separate text and code,
38 fix `Code2Text.header`.
39 :2007-02-19: 0.2.7 Simplify `Code2Text.header`,
40 new `iter_strip` method replacing a lot of ``if``-s.
41 :2007-02-22: 0.2.8 Set `mtime` of outfile to the one of infile.
42 :2007-02-27: 0.3 New `Code2Text` converter after an idea by Riccardo Murri,
43 explicit `option_defaults` dict for easier customization.
44 :2007-03-02: 0.3.1 Expand hard-tabs to prevent errors in indentation,
45 `Text2Code` now also works on blocks,
46 removed dependency on SimpleStates module.
47 :2007-03-06: 0.3.2 Bugfix: do not set `language` in `option_defaults`
48 renamed `code_languages` to `languages`.
49 :2007-03-16: 0.3.3 New language css,
50 option_defaults -> defaults = optparse.Values(),
51 simpler PylitOptions: don't store parsed values,
52 don't parse at initialization,
53 OptionValues: return `None` for non-existing attributes,
54 removed -infile and -outfile, use positional arguments.
55 :2007-03-19: 0.3.4 Documentation update,
56 separate `execute` function.
57 :2007-03-21: Code cleanup in `Text2Code.__iter__`.
58 :2007-03-23: 0.3.5 Removed "css" from known languages after learning that
59 there is no C++ style "// " comment string in CSS2.
60 :2007-04-24: 0.3.6 Documentation update.
61 :2007-05-18: 0.4 Implement Converter.__iter__ as stack of iterator
62 generators. Iterating over a converter instance now
63 yields lines instead of blocks.
64 Provide "hooks" for pre- and postprocessing filters.
65 Rename states to avoid confusion with formats:
66 "text" -> "documentation", "code" -> "code_block".
67 :2007-05-22: 0.4.1 Converter.__iter__: cleanup and reorganization,
68 rename parent class Converter -> TextCodeConverter.
69 :2007-05-23: 0.4.2 Merged Text2Code.converter and Code2Text.converter into
70 TextCodeConverter.converter.
71 :2007-05-30: 0.4.3 Replaced use of defaults.code_extensions with
72 values.languages.keys().
73 Removed spurious `print` statement in code_block_handler.
74 Added basic support for 'c' and 'css' languages
75 with `dumb_c_preprocessor`_ and `dumb_c_postprocessor`_.
76 :2007-06-06: 0.5 Moved `collect_blocks`_ out of `TextCodeConverter`_,
77 bugfix: collect all trailing blank lines into a block.
78 Expand tabs with `expandtabs_filter`_.
79 :2007-06-20: 0.6 Configurable code-block marker (default ``::``)
83 """pylit: bidirectional converter between a *text source* with embedded
84 computer code and a *code source* with embedded documentation.
87 __docformat__ = 'restructuredtext'
95 PyLit is a bidirectional converter between two formats of a computer
98 * a (reStructured) text document with program code embedded in
100 * a compilable (or executable) code source with *documentation* embedded in
120 The `defaults` object provides a central repository for default values
121 and their customisation. ::
123 defaults = optparse.Values()
127 * the initialization of data arguments in TextCodeConverter_ and
130 * completion of command line options in `PylitOptions.complete_values`_.
132 This allows the easy creation of custom back-ends that customise the
133 defaults and then call main_ e.g.:
136 >>> defaults.comment_string = "## "
137 >>> defaults.codeindent = 4
140 The following default values are defined in pylit.py:
145 Mapping of code file extension to code language.
146 Used by `OptionValues.complete`_ to set the `defaults.language`.
147 The ``--language`` command line option or setting ``defaults.language`` in
148 programmatic use override this auto-setting feature. ::
150 defaults.languages = {".py": "python",
157 defaults.fallback_language
158 ~~~~~~~~~~~~~~~~~~~~~~~~~~
160 Language to use, if there is no matching extension (e.g. if pylit is used as
161 filter) and no `language` is specified::
163 defaults.fallback_language = "python"
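The lookup order described above (explicit language, then file extension, then fallback) can be sketched as follows. This is a minimal Python 3 illustration, not the script's own code: the helper name `guess_language` and the ``".c"`` entry are assumptions made for the example.

```python
import os

# Stand-ins for `defaults.languages` and `defaults.fallback_language`.
# Only the ".py" entry is shown in the text above; ".c" is an assumed extra.
languages = {".py": "python", ".c": "c"}
fallback_language = "python"


def guess_language(filename, language=None):
    """Return the explicit language, else the extension's, else the fallback."""
    if language:
        # an explicit setting overrides the auto-detection
        return language
    extension = os.path.splitext(filename)[1]
    return languages.get(extension, fallback_language)
```

With this sketch, a filter invocation reading from ``-`` has no extension and therefore ends up with the fallback language.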
165 defaults.text_extensions
166 ~~~~~~~~~~~~~~~~~~~~~~~~
168 List of known extensions of (reStructured) text files.
169 Used by `OptionValues._get_outfile` to auto-determine the output filename.
172 defaults.text_extensions = [".txt"]
175 defaults.comment_strings
176 ~~~~~~~~~~~~~~~~~~~~~~~~
178 Dictionary of comment strings for known languages. Comment strings include
179 trailing whitespace. ::
181 defaults.comment_strings = {"python": '# ',
187 Used in Code2Text_ to recognise text blocks and in Text2Code_ to format
188 text blocks as comments.
190 defaults.header_string
191 ~~~~~~~~~~~~~~~~~~~~~~
193 Marker string for a header code block in the text source. No trailing
194 whitespace needed as indented code follows.
195 Must be a valid rst directive that accepts code on the same line, e.g.
196 ``'..admonition::'``.
198 Default is a comment marker::
200 defaults.header_string = '..'
202 defaults.code_block_marker
203 ~~~~~~~~~~~~~~~~~~~~~~~~~~
205 Marker string for a code block in the text source.
207 Default is a literal-block marker::
209 defaults.code_block_marker = '::'
211 In a document where code examples are only one of several uses of literal
212 blocks, it is more appropriate to single out the sourcecode with a dedicated
213 "code-block" directive.
215 Some highlight plug-ins require a special "sourcecode" or "code-block"
216 directive instead of the ``::`` literal block marker. Actually,
217 syntax-highlight is possible without changes to docutils with the Pygments_
218 package using a "code-block" directive. See the `syntax highlight`_ section
219 in the features documentation.
221 The `code_block_marker` string is used in a regular expression. Examples for
222 alternative forms are ``.. code-block::`` or ``.. code-block:: .* python``.
223 The second example can differentiate between Python code blocks and
224 code-blocks in other languages.
226 Another use would be to mark some code-blocks inactive allowing a literate
227 source to contain code-blocks that should become active only in some cases.
234 Export to the output format stripping documentation or code blocks::
236 defaults.strip = False
238 defaults.strip_marker
239 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
241 Strip literal marker from the end of documentation blocks when
242 converting to code format. Makes the code more concise but loses the
243 synchronization of line numbers in text and code formats. Can also be used
244 (together with the auto-completion of the code-text conversion) to change
245 the `code_block_marker`::
247 defaults.strip_marker = False
249 defaults.preprocessors
250 ~~~~~~~~~~~~~~~~~~~~~~
252 Preprocess the data with language-specific filters_.
253 Set below in Filters_::
255 defaults.preprocessors = {}
257 defaults.postprocessors
258 ~~~~~~~~~~~~~~~~~~~~~~~
260 Postprocess the data with language-specific filters_::
262 defaults.postprocessors = {}
267 Number of spaces to indent code blocks in `Code2Text.code_block_handler`_::
269 defaults.codeindent = 2
271 In `Text2Code.code_block_handler`_, the codeindent is determined by the
272 first recognized code line (header or first indented literal block
278 What to do if the outfile already exists? (ignored if `outfile` == '-')::
280 defaults.overwrite = 'update'
284 :'yes': overwrite `outfile` if it already exists,
285 :'update': fail if the `outfile` is newer than `infile`,
286 :'no': fail if `outfile` exists.
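The three policies can be read as a small decision function. The sketch below is a Python 3 illustration under assumptions: `may_write` is a hypothetical helper, and the comparison of modification times via `os.path.getmtime` is one plausible way to decide whether the `outfile` is "newer".

```python
import os


def may_write(infile, outfile, overwrite="update"):
    """Return True if `outfile` may be written under the given policy."""
    if outfile == "-" or not os.path.exists(outfile):
        return True  # stdout or a new file: nothing to protect
    if overwrite == "yes":
        return True
    if overwrite == "update":
        # refuse only if the existing outfile is newer than the infile
        return os.path.getmtime(outfile) <= os.path.getmtime(infile)
    return False  # 'no': never replace an existing file
```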
292 Try to import optional extensions::
303 The converter classes implement a simple state machine to separate and
304 transform documentation and code blocks. For this task, only a very limited
305 parsing is needed. PyLit's parser assumes:
307 * `indented literal blocks`_ in a text source are code blocks.
309 * comment blocks in a code source where every line starts with a matching
310 comment string are documentation blocks.
316 class TextCodeConverter(object):
317 """Parent class for the converters `Text2Code` and `Code2Text`.
320 The parent class defines data attributes and functions used in both
321 `Text2Code`_ converting a text source to executable code source, and
322 `Code2Text`_ converting commented code to a text source.
327 Class default values are fetched from the `defaults`_ object and can be
328 overridden by matching keyword arguments during class instantiation. This
329 also works with keyword arguments to `get_converter`_ and `main`_, as these
330 functions pass on unused keyword args to the instantiation of a converter
333 language = defaults.fallback_language
334 comment_strings = defaults.comment_strings
335 comment_string = "" # set in __init__ (if empty)
336 codeindent = defaults.codeindent
337 header_string = defaults.header_string
338 code_block_marker = defaults.code_block_marker
339 strip = defaults.strip
340 strip_marker = defaults.strip_marker
341 state = "" # type of current block, see `TextCodeConverter.convert`_
346 TextCodeConverter.__init__
347 """"""""""""""""""""""""""
349 Initializing sets the `data` attribute, an iterable object yielding lines of
350 the source to convert. [1]_
352 Additional keyword arguments are stored as instance variables, overwriting
353 the class defaults. If still empty, `comment_string` is set according to the
358 def __init__(self, data, **keyw):
359 """data -- iterable data object
360 (list, file, generator, string, ...)
361 **keyw -- remaining keyword arguments are
362 stored as data-attributes
365 self.__dict__.update(keyw)
366 if not self.comment_string:
367 self.comment_string = self.comment_strings[self.language]
369 Pre- and postprocessing filters are set (with
370 `TextCodeConverter.get_filter`_)::
372 self.preprocessor = self.get_filter("preprocessors", self.language)
373 self.postprocessor = self.get_filter("postprocessors", self.language)
375 Finally, the regular expression for the `code_block_marker` is compiled to
376 find valid cases of code_block_marker in a given line and return the groups:
378 \1 prefix, \2 code_block_marker, \3 remainder
381 marker = self.code_block_marker
383 self.marker_regexp = re.compile('^( *(?!\.\.).*)(%s)([ \n]*)$'
386 # assume code_block_marker is a directive like '.. code-block::'
387 self.marker_regexp = re.compile('^( *)(%s)(.*)$' % marker)
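To see what the three groups contain, the two patterns can be tried in isolation. The following Python 3 sketch reuses the pattern strings shown above; the wrapper function `compile_marker_regexp` and the test on ``marker == "::"`` that selects the first pattern are assumptions for the example.

```python
import re


def compile_marker_regexp(code_block_marker="::"):
    """Compile a pattern with groups: prefix, marker, remainder."""
    if code_block_marker == "::":
        # default marker: may end a text line, but the line must not
        # start with ".." (comment or directive)
        return re.compile(r"^( *(?!\.\.).*)(%s)([ \n]*)$" % code_block_marker)
    # otherwise treat the marker as a directive like '.. code-block::'
    return re.compile(r"^( *)(%s)(.*)$" % code_block_marker)


default = compile_marker_regexp()
directive = compile_marker_regexp(".. code-block::")

# prefix, marker and remainder of a paragraph ending with '::'
groups = default.search("Some text::\n").groups()
```

Because the marker is interpolated into the pattern unescaped, characters such as ``.`` act as regular-expression wildcards, which is what makes forms like ``.. code-block:: .* python`` possible.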
389 .. [1] The most common choice of data is a `file` object with the text
392 To convert a string into a suitable object, use its splitlines method
393 like ``"2 lines\nof source".splitlines(True)``.
396 TextCodeConverter.__iter__
397 """"""""""""""""""""""""""
399 Return an iterator for the instance. Iteration yields lines of converted
402 The iterator is a chain of iterators acting on `self.data` that does
405 * text<->code format conversion
408 Pre- and postprocessing are only performed if filters for the current
409 language are registered in `defaults.preprocessors`_ and/or
410 `defaults.postprocessors`_. The filters must accept an iterable as first
411 argument and yield the processed input data linewise.
415 """Iterate over input data source and yield converted lines
417 return self.postprocessor(self.convert(self.preprocessor(self.data)))
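The nesting of generators can be tried out on its own. In this Python 3 sketch, `upper_filter`, `convert` and `run_chain` are hypothetical stand-ins; only the order of the chain (preprocessor, conversion, postprocessor) is taken from the text above.

```python
def identity_filter(data):
    """Yield every line unchanged (the no-op filter)."""
    for line in data:
        yield line


def upper_filter(data):
    """Hypothetical preprocessor: upper-case every line."""
    for line in data:
        yield line.upper()


def convert(lines):
    """Hypothetical stand-in for the conversion step: comment out lines."""
    for line in lines:
        yield "# " + line


def run_chain(data, preprocessor=identity_filter, postprocessor=identity_filter):
    """Nest the generators; nothing is processed until iteration starts."""
    return postprocessor(convert(preprocessor(data)))
```

Since each stage is a generator, the input is consumed lazily, one line at a time, which also works for file objects and other one-pass iterables.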
420 TextCodeConverter.__call__
421 """"""""""""""""""""""""""
422 The special `__call__` method allows the use of class instances as callable
423 objects. It returns the converted data as list of lines::
426 """Iterate over state-machine and return results as list of lines"""
427 return [line for line in self]
430 TextCodeConverter.__str__
431 """""""""""""""""""""""""
432 Return converted data as string::
435 return "".join(self())
438 Helpers and convenience methods
439 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
441 TextCodeConverter.convert
442 """""""""""""""""""""""""
444 The `convert` method generates an iterator that does the actual code <-->
445 text format conversion. The converted data is yielded line-wise and the
446 instance's `state` attribute indicates whether the current line is "header",
447 "documentation", or "code_block"::
449 def convert(self, lines):
450 """Iterate over lines of a program document and convert
451 between "text" and "code" format
454 Initialise internal data arguments. (Done here, so that every new iteration
455 re-initialises them.)
458 the "type" of the currently processed block of lines. One of
460 :"": initial state: check for header,
461 :"header": leading code block: strip `header_string`,
462 :"documentation": documentation part: comment out,
463 :"code_block": literal blocks containing source code: unindent.
466 * Do not confuse the internal attribute `_codeindent` with the configurable
467 `codeindent` (without the leading underscore).
468 * `_codeindent` is set in `Text2Code.code_block_handler`_ to the indent of
469 first non-blank "code_block" line and stripped from all "code_block" lines
470 in the text-to-code conversion,
471 * `codeindent` is set in `__init__` to `defaults.codeindent`_ and added to
472 "code_block" lines in the code-to-text conversion.
475 * set by `Text2Code.documentation_handler`_ to the minimal indent of a
477 * used in `Text2Code.set_state`_ to find the end of a code block.
479 `code_block_marker_missing`
480 If the last paragraph of a documentation block does not end with a
481 "code_block_marker" (the literal-block marker ``::``), it must
482 be added (otherwise, the back-conversion fails).
484 `code_block_marker_missing` is set by `Code2Text.documentation_handler`_
485 and evaluated by `Code2Text.code_block_handler`_, because the
486 documentation_handler does not know whether the next block will be
487 documentation (with no need for a code_block_marker) or a code block.
494 self.code_block_marker_missing = False
496 Determine the state of the block and convert with the matching "handler"::
498 for block in collect_blocks(expandtabs_filter(lines)):
499 self.set_state(block)
500 for line in getattr(self, self.state+"_handler")(block):
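The dispatch by handler name can be illustrated with a toy converter. Everything in this Python 3 sketch is hypothetical (class name, the rule in `set_state`, the handler bodies); it only demonstrates the ``getattr(self, self.state + "_handler")`` lookup.

```python
class MiniConverter:
    """Hypothetical two-state converter illustrating handler dispatch."""

    state = ""

    def set_state(self, block):
        # toy rule: indented blocks are code, everything else documentation
        if block[0].startswith(" "):
            self.state = "code_block"
        else:
            self.state = "documentation"

    def documentation_handler(self, block):
        for line in block:
            yield "# " + line

    def code_block_handler(self, block):
        for line in block:
            yield line.lstrip(" ")

    def convert(self, blocks):
        for block in blocks:
            self.set_state(block)
            # look up "<state>_handler" by name and delegate to it
            handler = getattr(self, self.state + "_handler")
            for line in handler(block):
                yield line
```

Adding a new state then only requires a new method whose name ends in ``_handler``; the loop itself stays unchanged.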
504 TextCodeConverter.get_filter
505 """"""""""""""""""""""""""""
508 def get_filter(self, filter_set, language):
509 """Return language specific filter"""
510 if self.__class__ == Text2Code:
511 key = "text2"+language
512 elif self.__class__ == Code2Text:
513 key = language+"2text"
517 return getattr(defaults, filter_set)[key]
518 except (AttributeError, KeyError):
519 # print "there is no %r filter in %r"%(key, filter_set)
521 return identity_filter
524 TextCodeConverter.get_indent
525 """"""""""""""""""""""""""""
526 Return the number of leading spaces in `line`::
528 def get_indent(self, line):
529 """Return the indentation of `line`.
531 return len(line) - len(line.lstrip())
537 The `Text2Code` converter separates *code-blocks* [#]_ from *documentation*.
538 Code blocks are unindented, documentation is commented (or filtered, if the
539 ``strip`` option is True).
541 .. [#] Only `indented literal blocks`_ are considered code-blocks. `quoted
542 literal blocks`_, `parsed-literal blocks`_, and `doctest blocks`_ are
543 treated as part of the documentation. This allows the inclusion of
549 Note that there is no double colon before the doctest block in the
552 The class inherits the interface and helper functions from
553 TextCodeConverter_ and adds functions specific to the text-to-code format
556 class Text2Code(TextCodeConverter):
557 """Convert a (reStructured) text source to code source
561 ~~~~~~~~~~~~~~~~~~~~~
564 def set_state(self, block):
565 """Determine state of `block`. Set `self.state`
568 `set_state` is used inside an iteration. Hence, if we are out of data, a
569 StopIteration exception should be raised::
574 The new state depends on the active state (from the last block) and
575 features of the current block. It is either "header", "documentation", or
578 If the current state is "" (first block), check for
579 the `header_string` indicating a leading code block::
582 # print "set state for %r"%block
583 if block[0].startswith(self.header_string):
584 self.state = "header"
586 self.state = "documentation"
588 If the current state is "documentation", the next block is also
589 documentation. The end of a documentation part is detected in the
590 `Text2Code.documentation_handler`_::
592 # elif self.state == "documentation":
593 # self.state = "documentation"
595 A "code_block" ends with the first less indented, nonblank line.
596 `_textindent` is set by the documentation handler to the indent of the
597 preceding documentation block::
599 elif self.state in ["code_block", "header"]:
600 indents = [self.get_indent(line) for line in block]
601 # print "set_state:", indents, self._textindent
602 if indents and min(indents) <= self._textindent:
603 self.state = 'documentation'
605 self.state = 'code_block'
607 TODO: (or not to do?) insert blank line before the first line with too-small
608 codeindent using self.ensure_trailing_blank_line(lines, line) (would need
609 split and push-back of the documentation part)?
611 Text2Code.header_handler
612 ~~~~~~~~~~~~~~~~~~~~~~~~
614 Sometimes code needs to remain on the first line(s) of the document to be
615 valid. The most common example is the "shebang" line that tells a POSIX
616 shell how to process an executable file::
618 #!/usr/bin/env python
620 In Python, the special comment to indicate the encoding, e.g.
621 ``# -*- coding: iso-8859-1 -*-``, must occur before any other comment
624 If we want to keep the line numbers in sync for text and code source, the
625 reStructured Text markup for these header lines must start at the same line
626 as the first header line. Therefore, header lines cannot be marked as
627 a literal block (this would require the ``::`` and an empty line above the
630 OTOH, a comment may start at the same line as the comment marker and it
631 includes subsequent indented lines. Comments are visible in the reStructured
632 Text source but hidden in the pretty-printed output.
634 With a header converted to comment in the text source, everything before
635 the first documentation block (i.e. before the first paragraph using the
636 matching comment string) will be hidden away (in HTML or PDF output).
638 This seems a good compromise; the advantages
640 * line numbers are kept
641 * the "normal" code_block conversion rules (indent/unindent by `codeindent`) apply
642 * greater flexibility: you can hide a repeating header in a project
643 consisting of many source files.
645 outweigh the disadvantages
647 - it may come as a surprise if a part of the file is not "printed",
648 - one more syntax element to learn for rst newbies to start with pylit,
649 (however, starting from the code source, this will be auto-generated)
651 In the case that there is no matching comment at all, the complete code
652 source will become a comment -- however, in this case it is not very likely
653 the source is a literate document anyway.
655 If needed for the documentation, it is possible to quote the header in (or
656 after) the first documentation block, e.g. as `parsed literal`.
659 def header_handler(self, lines):
660 """Format leading code block"""
661 # strip header string from first line
662 lines[0] = lines[0].replace(self.header_string, "", 1)
663 # yield remaining lines formatted as code-block
664 for line in self.code_block_handler(lines):
668 Text2Code.documentation_handler
669 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
671 The 'documentation' handler processes everything that is not recognized as
672 "code_block". Documentation is quoted with `self.comment_string`
673 (or filtered with `--strip=True`). ::
675 def documentation_handler(self, lines):
676 """Convert documentation blocks from text to code format
679 Test for the end of the documentation block: does the second last line end
680 with `::` but is neither a comment nor a directive?
682 If an end-of-documentation marker is detected,
684 * set state to 'code_block'
685 * set `self._textindent` (needed by `Text2Code.set_state`_ to find the
686 next "documentation" block)
687 * do not comment the last line (the blank line separating documentation
692 endnum = len(lines) - 2
693 for (num, line) in enumerate(lines):
695 if self.state == "code_block":
698 yield self.comment_string + line
699 if (num == endnum and self.marker_regexp.search(line)):
700 self.state = "code_block"
701 self._textindent = self.get_indent(line)
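The end-of-documentation test can be followed step by step in a standalone function. This Python 3 sketch is an approximation: it returns the results instead of yielding lines and setting instance attributes, ignores the `strip` option, and uses the default ``::`` pattern; the name `comment_out` is hypothetical.

```python
import re

# default code-block marker pattern: prefix, '::', remainder
MARKER = re.compile(r"^( *(?!\.\.).*)(::)([ \n]*)$")


def comment_out(lines, comment_string="# "):
    """Return (commented lines, next state, text indent)."""
    out, next_state, textindent = [], "documentation", 0
    endnum = len(lines) - 2  # index of the second-to-last line
    for num, line in enumerate(lines):
        if next_state == "code_block":
            out.append(line)  # the separating blank line stays uncommented
        else:
            out.append(comment_string + line)
        if num == endnum and MARKER.search(line):
            next_state = "code_block"
            textindent = len(line) - len(line.lstrip())
    return out, next_state, textindent
```

For the block ``["An example::\n", "\n"]`` the marker is found in the second-to-last line, so the trailing blank line is passed through without a comment string.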
703 TODO: Ensure a trailing blank line? Would need to test all documentation
704 lines for end-of-documentation marker and add a line by calling the
705 `ensure_trailing_blank_line` method (which also issues a warning)
708 Text2Code.code_block_handler
709 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
711 The "code_block" handler is called with an indented literal block. It
712 removes leading whitespace up to the indentation of the first code line in
713 the file (this deviation from docutils behaviour allows indented blocks of
716 def code_block_handler(self, block):
717 """Convert indented literal blocks to source code format
720 If still unset, determine the indentation of code blocks from first non-blank
723 if self._codeindent == 0:
724 self._codeindent = self.get_indent(block[0])
726 Yield unindented lines after checking whether we can safely unindent. If the
727 line is less indented than `_codeindent`, something went wrong. ::
730 if line.lstrip() and self.get_indent(line) < self._codeindent:
731 raise ValueError("code block contains line less indented "
732 "than %d spaces \n%r" % (self._codeindent, block))
733 yield line.replace(" "*self._codeindent, "", 1)
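The unindent step, including the safety check, can be sketched as a plain function. This Python 3 version returns a list instead of yielding and takes the indent as an argument; the name `unindent` is an assumption for the example.

```python
def unindent(block, codeindent):
    """Strip `codeindent` leading spaces from every line of a code block."""
    result = []
    for line in block:
        indent = len(line) - len(line.lstrip())
        # blank lines are exempt from the check
        if line.lstrip() and indent < codeindent:
            raise ValueError("code block contains line less indented "
                             "than %d spaces \n%r" % (codeindent, block))
        result.append(line.replace(" " * codeindent, "", 1))
    return result
```

Lines that are indented deeper than `codeindent` keep their additional indentation, so nested code survives the conversion.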
739 The `Code2Text` converter does the opposite of `Text2Code`_ -- it processes
740 a source in "code format" (i.e. in a programming language), extracts
741 documentation from comment blocks, and puts program code in literal blocks.
743 The class inherits the interface and helper functions from
744 TextCodeConverter_ and adds functions specific to the code-to-text format
747 class Code2Text(TextCodeConverter):
748 """Convert code source to text source
754 Check if block is "header", "documentation", or "code_block":
756 A paragraph is "documentation", if every non-blank line starts with a
757 matching comment string (including whitespace except for commented blank
760 def set_state(self, block):
761 """Determine state of `block`."""
763 # skip documentation lines (commented, blank or blank comment)
764 if (line.startswith(self.comment_string)
766 or line.rstrip() == self.comment_string.rstrip()
769 # non-commented line found:
771 self.state = "header"
773 self.state = "code_block"
777 # keep state if the block is just a blank line
778 # if len(block) == 1 and self._is_blank_codeline(line):
780 self.state = "documentation"
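The rule for recognising a documentation block can be isolated in a predicate. This Python 3 sketch follows the comment in the code above ("commented, blank or blank comment"); the function name `is_documentation` is hypothetical and the header/code_block distinction is left out.

```python
def is_documentation(block, comment_string="# "):
    """True if every line is commented, blank, or a bare comment marker."""
    for line in block:
        if (line.startswith(comment_string)
                or not line.rstrip()
                or line.rstrip() == comment_string.rstrip()):
            continue  # documentation line, keep checking
        return False  # non-commented line found
    return True
```

Note that a line like ``#no space`` does not count as documentation with the default comment string, because the trailing whitespace of the comment string is part of the match.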
783 Code2Text.header_handler
784 ~~~~~~~~~~~~~~~~~~~~~~~~
786 Handle a leading code block. (See `Text2Code.header_handler`_ for a
787 discussion of the "header" state.) ::
789 def header_handler(self, lines):
790 """Format leading code block"""
791 if self.strip == True:
793 # get iterator over the lines that formats them as code-block
794 lines = iter(self.code_block_handler(lines))
795 # prepend header string to first line
796 yield self.header_string + next(lines)
797 # yield remaining lines
801 Code2Text.documentation_handler
802 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
804 The *documentation state* handler converts a comment to a documentation
805 block by stripping the leading `comment string` from every line::
807 def documentation_handler(self, lines):
808 """Uncomment documentation blocks in source code
811 If the code block is stripped, the literal marker would lead to an error
812 when the text is converted with docutils. Strip it as well. Otherwise, check
813 for the `code_block_marker` (default ``::``) at the end of the documentation
816 if self.strip or self.strip_marker:
817 self.strip_code_block_marker(lines)
820 self.code_block_marker_missing = \
821 not self.marker_regexp.search(lines[-2])
822 except IndexError: # len(lines) < 2, e.g. last line of document
823 self.code_block_marker_missing = True
825 Strip comment strings and yield lines. Consider the case that a blank line
826 has a comment string without trailing whitespace::
828 stripped_comment_string = self.comment_string.rstrip()
831 line = line.replace(self.comment_string, "", 1)
832 if line.rstrip() == stripped_comment_string:
833 line = line.replace(stripped_comment_string, "", 1)
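The two-step stripping (full comment string first, then the bare marker on otherwise empty lines) can be tried separately. A minimal Python 3 sketch, with the hypothetical name `uncomment` and a list as return value:

```python
def uncomment(lines, comment_string="# "):
    """Strip the leading comment string from every documentation line."""
    stripped_comment_string = comment_string.rstrip()
    result = []
    for line in lines:
        line = line.replace(comment_string, "", 1)
        # a blank comment line may lack the trailing whitespace
        if line.rstrip() == stripped_comment_string:
            line = line.replace(stripped_comment_string, "", 1)
        result.append(line)
    return result
```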
837 Code2Text.code_block_handler
838 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
840 The `code_block` handler returns the code block as indented literal
841 block (or filters it, if ``self.strip == True``). The amount of the code
842 indentation is controlled by `self.codeindent` (default 2). ::
844 def code_block_handler(self, lines):
845 """Convert code blocks to text format (indent or strip)
847 if self.strip == True:
849 # insert transition marker if needed
850 if self.code_block_marker_missing:
851 self.state = "documentation"
854 self.state = "code_block"
856 yield " "*self.codeindent + line
860 Code2Text.strip_code_block_marker
861 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
863 Replace the literal marker according to the docutils replacement rules:
865 * strip `::`-line (and preceding blank line) if on a line on its own
866 * strip `::` if it is preceded by whitespace.
867 * convert `::` to a single colon if preceded by text
869 `lines` should be a list of documentation lines (with a trailing blank line).
870 It is modified in-place::
872 def strip_code_block_marker(self, lines):
876 return # just one line (no trailing blank line)
878 # match with regexp: `match` is None or has groups
879 # \1 leading text, \2 code_block_marker, \3 remainder
880 match = self.marker_regexp.search(line)
882 if not match: # no code_block_marker present
884 if not match.group(1): # `code_block_marker` on an extra line
886 # delete preceding line if it is blank
887 if len(lines) >= 2 and not lines[-2].lstrip():
889 elif match.group(1).rstrip() < match.group(1):
890 # '::' follows whitespace
891 lines[-2] = match.group(1).rstrip() + match.group(3)
892 else: # '::' follows text
893 lines[-2] = match.group(1).rstrip() + ':' + match.group(3)
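The three rules can be checked with a standalone function. This Python 3 sketch assumes the default ``::`` marker pattern and a hypothetical function name `strip_marker`; like the method above it modifies the list in place, and it additionally returns the list for convenience.

```python
import re

# default code-block marker pattern: prefix, '::', remainder
MARKER = re.compile(r"^( *(?!\.\.).*)(::)([ \n]*)$")


def strip_marker(lines):
    """Apply the replacement rules to lines[-2]; modifies `lines` in place."""
    if len(lines) < 2:
        return lines  # just one line (no trailing blank line)
    match = MARKER.search(lines[-2])
    if not match:
        return lines  # no marker present
    prefix, remainder = match.group(1), match.group(3)
    if not prefix:  # '::' on a line of its own
        del lines[-2]
        if len(lines) >= 2 and not lines[-2].strip():
            del lines[-2]  # drop the preceding blank line as well
    elif prefix != prefix.rstrip():  # '::' preceded by whitespace
        lines[-2] = prefix.rstrip() + remainder
    else:  # '::' preceded by text
        lines[-2] = prefix.rstrip() + ":" + remainder
    return lines
```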
898 Filters allow pre- and post-processing of the data to bring it in a format
899 suitable for the "normal" text<->code conversion. An example is conversion
900 of `C` ``/*`` ``*/`` comments into C++ ``//`` comments (and back).
904 Filters are generator functions that return an iterator acting on a
905 `data` iterable and yielding processed `data` lines.
910 The most basic filter is the identity filter, which returns its argument as
913 def identity_filter(data):
914 """Return data iterator without any processing"""
920 Expand hard-tabs in every line of `data` (cf. `str.expandtabs`).
922 This filter is applied to the input data by `TextCodeConverter.convert`_ as
923 hard tabs can lead to errors when the indentation is changed. ::
925 def expandtabs_filter(data):
926 """Yield data tokens with hard-tabs expanded"""
928 yield line.expandtabs()
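A short Python 3 illustration of the effect: `str.expandtabs` pads to the next multiple of eight columns by default, so the number of inserted spaces depends on where the tab occurs. The function name `expand_tabs` is chosen for the example only.

```python
def expand_tabs(data):
    """Yield every line with hard tabs replaced by spaces."""
    for line in data:
        yield line.expandtabs()


# a leading tab becomes 8 spaces, a tab at column 2 becomes 6 spaces
expanded = list(expand_tabs(["\tx = 1\n", "ab\ty\n"]))
```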
934 A filter to aggregate "paragraphs" (blocks separated by blank
935 lines). Yields lists of lines::
937 def collect_blocks(lines):
938 """collect lines in a list
940 yield list for each paragraph, i.e. block of lines separated by a
941 blank line (whitespace only).
943 Trailing blank lines are collected as well.
945 blank_line_reached = False
948 if blank_line_reached and line.rstrip():
950 blank_line_reached = False
953 if not line.rstrip():
954 blank_line_reached = True
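The grouping behaviour, including the treatment of trailing blank lines, can be reproduced with the following Python 3 sketch. It is written to match the description above rather than copied from the script; the name `collect_paragraphs` is an assumption.

```python
def collect_paragraphs(lines):
    """Yield lists of lines; blank lines stay with the preceding block."""
    blank_line_reached = False
    block = []
    for line in lines:
        if blank_line_reached and line.rstrip():
            # first non-blank line after a blank one starts a new block
            yield block
            blank_line_reached = False
            block = [line]
            continue
        if not line.rstrip():
            blank_line_reached = True
        block.append(line)
    yield block


blocks = list(collect_paragraphs(["a\n", "b\n", "\n", "\n", "c\n"]))
```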
963 This is a basic filter to convert `C` to `C++` comments. Works line-wise and
964 only converts lines that
966 * start with "/\* " and end with " \*/" (followed by whitespace only)
968 A more sophisticated version would also
970 * convert multi-line comments
972 + Keep indentation or strip 3 leading spaces?
974 * account for nested comments
976 * only convert comments that are separated from code by a blank line
980 def dumb_c_preprocessor(data):
981 """change `C` ``/* `` `` */`` comments into C++ ``// `` comments"""
982 comment_string = defaults.comment_strings["c++"]
986 if (line.startswith(boc_string)
987 and line.rstrip().endswith(eoc_string)
989 line = line.replace(boc_string, comment_string, 1)
990 line = "".join(line.rsplit(eoc_string, 1))
993 Unfortunately, the `replace` method of strings does not support negative
994 numbers for the `count` argument:
996 >>> "foo */ baz */ bar".replace(" */", "", -1) == "foo */ baz bar"
998 However, there is the `rsplit` method, which can be used together with `join`:
1000 >>> "".join("foo */ baz */ bar".rsplit(" */", 1)) == "foo */ baz bar"
1003 dumb_c_postprocessor
1004 --------------------
1006 Undo the preparations by the dumb_c_preprocessor and re-insert valid comment
1009 def dumb_c_postprocessor(data):
1010 """change C++ ``// `` comments into `C` ``/* `` `` */`` comments"""
1011 comment_string = defaults.comment_strings["c++"]
1015 if line.rstrip() == comment_string.rstrip():
1016 line = line.replace(comment_string, "", 1)
1017 elif line.startswith(comment_string):
1018 line = line.replace(comment_string, boc_string, 1)
1019 line = line.rstrip() + eoc_string + "\n"
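The pre- and postprocessor are meant to be inverse operations for single-line comments. The Python 3 sketch below checks this round trip; the function names are hypothetical and the C++ comment string is assumed to be ``"// "``.

```python
COMMENT_STRING = "// "  # assumed value of defaults.comment_strings["c++"]
BOC_STRING = "/* "      # begin of a C comment
EOC_STRING = " */"      # end of a C comment


def c_to_cpp(data):
    """Turn single-line ``/* ... */`` comments into ``// ...`` comments."""
    for line in data:
        if line.startswith(BOC_STRING) and line.rstrip().endswith(EOC_STRING):
            line = line.replace(BOC_STRING, COMMENT_STRING, 1)
            # remove only the last occurrence of the end marker
            line = "".join(line.rsplit(EOC_STRING, 1))
        yield line


def cpp_to_c(data):
    """Turn ``// ...`` comments back into ``/* ... */`` comments."""
    for line in data:
        if line.rstrip() == COMMENT_STRING.rstrip():
            line = line.replace(COMMENT_STRING, "", 1)
        elif line.startswith(COMMENT_STRING):
            line = line.replace(COMMENT_STRING, BOC_STRING, 1)
            line = line.rstrip() + EOC_STRING + "\n"
        yield line


source = ["/* doc */\n", "int x;\n"]
converted = list(c_to_cpp(source))
restored = list(cpp_to_c(converted))
```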
1028 defaults.preprocessors['c2text'] = dumb_c_preprocessor
1029 defaults.preprocessors['css2text'] = dumb_c_preprocessor
1030 defaults.postprocessors['text2c'] = dumb_c_postprocessor
1031 defaults.postprocessors['text2css'] = dumb_c_postprocessor
1037 Using this script from the command line will convert a file according to its
1038 extension. This default can be overridden by a couple of options.
1040 Dual source handling
1041 --------------------
1043 How to determine which source is up-to-date?
1044 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1046 - set modification date of `outfile` to the one of `infile`
1048 Points out that the source files are 'synchronized'.
1050 * Are there problems to expect from "backdating" a file? Which?
1052 Looking at http://www.unix.com/showthread.php?t=20526, it seems
1053 perfectly legal to set `mtime` (while leaving `ctime`) as `mtime` is a
1054 description of the "actuality" of the data in the file.
1056 * Should this become a default or an option?
1058 - alternatively move input file to a backup copy (with option: `--replace`)
1060 - check modification date before overwriting
1061 (with option: `--overwrite=update`)
1063 - check modification date before editing (implemented as `Jed editor`_
1064 function `pylit_check()` in `pylit.sl`_)
1066 .. _Jed editor: http://www.jedsoft.org/jed/
1067 .. _pylit.sl: http://jedmodes.sourceforge.net/mode/pylit/
1069 Recognised Filename Extensions
1070 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1072 Instead of defining a new extension for "pylit" literate programs,
1073 by default ``.txt`` will be appended for the text source and stripped by
1074 the conversion to the code source. I.e. for a Python program foo:
1076 * the code source is called ``foo.py``
1077 * the text source is called ``foo.py.txt``
1078 * the html rendering is called ``foo.py.html``
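The naming convention for the two sources can be expressed as a small helper. This is a Python 3 sketch; `guess_outfile` is a hypothetical name and the list mirrors the `defaults.text_extensions` value given earlier.

```python
import os

text_extensions = [".txt"]  # mirrors defaults.text_extensions


def guess_outfile(infile, text_extension=".txt"):
    """Strip a known text extension, or append one to a code source."""
    base, extension = os.path.splitext(infile)
    if extension in text_extensions:
        return base  # text source -> code source
    return infile + text_extension  # code source -> text source
```

Applying the helper twice returns the original name, which matches the bidirectional use of the converter.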
1084 The following class adds `as_dict` and `__getattr__` methods to
1087 class OptionValues(optparse.Values):
1089 OptionValues.as_dict
1090 ~~~~~~~~~~~~~~~~~~~~
1092 For use as keyword arguments, it is handy to have the options in a
1093 dictionary. `as_dict` returns a copy of the instance's object dictionary::
1096 """Return options as dictionary object"""
1097 return self.__dict__.copy()
1099 OptionValues.complete
1100 ~~~~~~~~~~~~~~~~~~~~~
1104 def complete(self, **keyw):
1106 Complete the option values with keyword arguments.
1108 Do not overwrite existing values. Only use arguments that do not
1109 have a corresponding attribute in `self`,
1112 if not self.__dict__.has_key(key):
1113 setattr(self, key, keyw[key])
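The "do not overwrite" rule can be tried out in isolation. The following is a minimal sketch, not the class defined in this module: the name `DemoValues` and the sample option values are made up for illustration, and `in` is used instead of `has_key` so that the sketch also runs on current Python versions.

```python
import optparse


class DemoValues(optparse.Values):
    """Hypothetical stand-in for `OptionValues` (illustration only)."""

    def complete(self, **keyw):
        # Only add keys that are not yet instance attributes.
        for key in keyw:
            if key not in self.__dict__:
                setattr(self, key, keyw[key])


values = DemoValues({"language": "c"})
values.complete(language="python", comment_string="# ")
print(values.language)        # the existing value is kept
print(values.comment_string)  # the missing value is filled in
```

Because existing attributes win, values given on the command line take precedence over module-level defaults that are applied later.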
1115 OptionValues.__getattr__
1116 ~~~~~~~~~~~~~~~~~~~~~~~~
1118 To replace calls using ``options.ensure_value("OPTION", None)`` with the
1119 more concise ``options.OPTION``, we define `__getattr__` [#]_ ::
1121 def __getattr__(self, name):
1122 """Return default value for non existing options"""
1126 .. [#] The special method `__getattr__` is only called when an attribute
1127 lookup has not found the attribute in the usual places (i.e. it is
1128 not an instance attribute nor is it found in the class tree for
1135 The `PylitOptions` class comprises an option parser and methods for parsing
1136 and completion of command line options::
1138 class PylitOptions(object):
1139 """Storage and handling of command line options for pylit"""
1147 """Set up an `OptionParser` instance for pylit command line options
1150 p = optparse.OptionParser(usage=main.__doc__, version=_version)
1152 p.add_option("-c", "--code2txt", dest="txt2code", action="store_false",
1153 help="convert code source to text source")
1154 p.add_option("--comment-string", dest="comment_string",
1155 help="documentation block marker (default '# ' (for python))" )
1156 p.add_option("-d", "--diff", action="store_true",
1157 help="test for differences to existing file")
1158 p.add_option("--doctest", action="store_true",
1159 help="run doctest.testfile() on the text version")
1160 p.add_option("-e", "--execute", action="store_true",
1161 help="execute code (Python only)")
1162 p.add_option("--language", action="store",
1163 choices = defaults.languages.values(),
1164 help="use LANGUAGE native comment style")
1165 p.add_option("--overwrite", action="store",
1166 choices = ["yes", "update", "no"],
1167 help="overwrite output file (default 'update')")
1168 p.add_option("--replace", action="store_true",
1169 help="move infile to a backup copy (appending '~')")
1170 p.add_option("-s", "--strip", action="store_true",
1171 help="export by stripping documentation or code")
1172 p.add_option("-t", "--txt2code", action="store_true",
1173 help="convert text source to code source")
1177 PylitOptions.parse_args
1178 ~~~~~~~~~~~~~~~~~~~~~~~
1180 The `parse_args` method calls the `optparse.OptionParser` on the command
1181 line or the provided args and returns the result as an `OptionValues`
1182 instance. Defaults can be provided as keyword arguments::
1184 def parse_args(self, args=sys.argv[1:], **keyw):
1185 """parse command line arguments using `optparse.OptionParser`
1187 parse_args(args, **keyw) -> OptionValues instance
1189 args -- list of command line arguments.
1190 keyw -- keyword arguments or dictionary of option defaults
1193 (values, args) = self.parser.parse_args(args, OptionValues(keyw))
1194 # Convert FILE and OUTFILE positional args to option values
1195 # (other positional arguments are ignored)
1197 values.infile = args[0]
1198 values.outfile = args[1]
1204 PylitOptions.complete_values
1205 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1207 Complete an OptionValues instance `values`. Use module-level defaults and
1208 context information to set missing option values to sensible defaults (if
1211 def complete_values(self, values):
1212 """complete option values with module-level and context-sensitive defaults
1214 x.complete_values(values) -> values
1215 values -- OptionValues instance
1218 Complete with module-level defaults_::
1220 values.complete(**defaults.__dict__)
1222 Ensure infile is a string::
1224 values.ensure_value("infile", "")
1226 Guess conversion direction from `infile` filename::
1228 if values.txt2code is None:
1229 in_extension = os.path.splitext(values.infile)[1]
1230 if in_extension in values.text_extensions:
1231 values.txt2code = True
1232 elif in_extension in values.languages.keys():
1233 values.txt2code = False
1235 Auto-determine the output file name::
1237 values.ensure_value("outfile", self._get_outfile_name(values))
1239 Second try: Guess conversion direction from outfile filename::
1241 if values.txt2code is None:
1242 out_extension = os.path.splitext(values.outfile)[1]
1243 values.txt2code = not (out_extension in values.text_extensions)
1245 Set the language of the code::
1247 if values.txt2code is True:
1248 code_extension = os.path.splitext(values.outfile)[1]
1249 elif values.txt2code is False:
1250 code_extension = os.path.splitext(values.infile)[1]
1251 values.ensure_value("language",
1252 values.languages.get(code_extension,
1253 values.fallback_language))
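The guessing steps above can be condensed into two small functions. This is a sketch under stated assumptions, not the method itself: the function names `guess_direction` and `guess_language` are hypothetical, and the sample values for the text extensions, the language table and the fallback language are assumed for illustration.

```python
import os.path

# Assumed sample settings; the real ones live in the module-level defaults.
TEXT_EXTENSIONS = [".txt"]
LANGUAGES = {".py": "python", ".css": "css"}


def guess_direction(infile, outfile=None):
    """Return True for text->code, False for code->text, None if unknown."""
    ext = os.path.splitext(infile)[1]
    if ext in TEXT_EXTENSIONS:
        return True
    if ext in LANGUAGES:
        return False
    # Second try: decide from the output file name, if there is one.
    if outfile is not None:
        out_ext = os.path.splitext(outfile)[1]
        return out_ext not in TEXT_EXTENSIONS
    return None


def guess_language(infile, outfile, txt2code, fallback="python"):
    """Look up the language by the extension of the code file."""
    code_file = outfile if txt2code else infile
    return LANGUAGES.get(os.path.splitext(code_file)[1], fallback)


print(guess_direction("foo.py.txt"))
print(guess_language("foo.css.txt", "foo.css", txt2code=True))
```

The code file is the output when converting text to code and the input otherwise, which is why the language lookup needs the conversion direction first.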
1257 PylitOptions._get_outfile_name
1258 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1260 Construct a matching filename for the output file. The output filename is
1261 constructed from `infile` by the following rules:
1263 * '-' (stdin) results in '-' (stdout)
1264 * strip the `txt_extension` (txt2code) or
1265 * add a `txt_extension` (code2txt)
1266 * fallback: if no guess can be made, add ".out"
1268 .. TODO: use values.outfile_extension if it exists?
1272 def _get_outfile_name(self, values):
1273 """Return a matching output filename for `infile`
1275 # if input is stdin, default output is stdout
1276 if values.infile == '-':
1279 # Derive from `infile` name: strip or add text extension
1280 (base, ext) = os.path.splitext(values.infile)
1281 if ext in values.text_extensions:
1283 if ext in values.languages.keys() or values.txt2code == False:
1284 return values.infile + values.text_extensions[0] # add
1286 return values.infile + ".out"
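The naming rules listed above can be checked with a stand-alone function. This is only a sketch: `outfile_name` is a hypothetical helper that takes plain arguments instead of an option values instance, and the extension settings are assumed sample values.

```python
import os.path

TEXT_EXTENSIONS = [".txt"]      # assumed sample value
LANGUAGES = {".py": "python"}   # assumed sample value


def outfile_name(infile, txt2code=None):
    """Return a matching output filename for `infile`."""
    if infile == "-":                       # stdin -> stdout
        return "-"
    base, ext = os.path.splitext(infile)
    if ext in TEXT_EXTENSIONS:
        return base                         # strip the text extension
    if ext in LANGUAGES or txt2code is False:
        return infile + TEXT_EXTENSIONS[0]  # add the text extension
    return infile + ".out"                  # fallback


for name in ("-", "foo.py.txt", "foo.py", "foo.xyz"):
    print("%s -> %s" % (name, outfile_name(name)))
```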
1288 PylitOptions.__call__
1289 ~~~~~~~~~~~~~~~~~~~~~
1291 The special `__call__` method allows PylitOptions instances to be used as
1292 *callables*: Calling an instance parses the argument list to extract option
1293 values and completes them based on "context-sensitive defaults". Keyword
1294 arguments are passed to `PylitOptions.parse_args`_ as default values. ::
1296 def __call__(self, args=sys.argv[1:], **keyw):
1297 """parse and complete command line args, return option values
1299 values = self.parse_args(args, **keyw)
1300 return self.complete_values(values)
1310 Return file objects for in- and output. If the input path is missing,
1311 write usage and abort. (An alternative would be to use stdin as default.
1312 However, this would leave an uninitiated user with an unresponsive
1313 application if they just try the script without any arguments.) ::
1315 def open_streams(infile = '-', outfile = '-', overwrite='update', **keyw):
1316 """Open and return the input and output stream
1318 open_streams(infile, outfile) -> (in_stream, out_stream)
1320 in_stream -- file(infile) or sys.stdin
1321 out_stream -- file(outfile) or sys.stdout
1322 overwrite -- 'yes': overwrite `outfile` if it exists,
1323 'update': fail if the `outfile` is newer than `infile`,
1324 'no': fail if `outfile` exists.
1326 Irrelevant if `outfile` == '-'.
1329 strerror = "Missing input file name ('-' for stdin; -h for help)"
1330 raise IOError, (2, strerror, infile)
1332 in_stream = sys.stdin
1334 in_stream = file(infile, 'r')
1336 out_stream = sys.stdout
1337 elif overwrite == 'no' and os.path.exists(outfile):
1338 raise IOError, (1, "Output file exists!", outfile)
1339 elif overwrite == 'update' and is_newer(outfile, infile):
1340 raise IOError, (1, "Output file is newer than input file!", outfile)
1342 out_stream = file(outfile, 'w')
1343 return (in_stream, out_stream)
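The three `overwrite` policies described in the docstring above boil down to a small decision table. The following is a minimal sketch, not the code used by `open_streams`; the helper name `may_overwrite` and its boolean parameters are hypothetical and chosen for illustration only.

```python
def may_overwrite(overwrite, outfile_exists, outfile_is_newer):
    """Return True if the policy permits writing to the output file."""
    if overwrite == "no":
        return not outfile_exists
    if overwrite == "update":
        return not outfile_is_newer
    return True  # 'yes': always overwrite


for policy in ("yes", "update", "no"):
    # Case shown: outfile exists and was modified after infile.
    print("%-6s -> %s" % (policy, may_overwrite(policy, True, True)))
```

With 'update' as the default, an output file that was edited after the last conversion is protected, while an unchanged one is silently refreshed.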
1350 def is_newer(path1, path2):
1351 """Check if `path1` is newer than `path2` (using mtime)
1353 Compare modification time of files at path1 and path2.
1355 Non-existing files are considered oldest: Return False if path1 does not
1356 exist and True if path2 does not exist.
1358 Return None for equal modification time. (This evaluates to False in a
1359 boolean context but allows a test for equality.)
1363 mtime1 = os.path.getmtime(path1)
1367 mtime2 = os.path.getmtime(path2)
1370 # print "mtime1", mtime1, path1, "\n", "mtime2", mtime2, path2
1372 if mtime1 == mtime2:
1374 return mtime1 > mtime2
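The three possible return values can be observed with two temporary files. This is a sketch, not the function above: the name `newer` is hypothetical, and treating a missing file as having a modification time of -1 is an assumption made here to get the "oldest" behaviour described in the docstring.

```python
import os
import tempfile


def newer(path1, path2):
    """Return True/False, or None if both modification times are equal."""
    def mtime(path):
        try:
            return os.path.getmtime(path)
        except OSError:
            return -1   # assumption: a missing file counts as "oldest"
    mtime1, mtime2 = mtime(path1), mtime(path2)
    if mtime1 == mtime2:
        return None
    return mtime1 > mtime2


tmpdir = tempfile.mkdtemp()
text_file = os.path.join(tmpdir, "foo.py.txt")
code_file = os.path.join(tmpdir, "foo.py")
for path in (text_file, code_file):
    with open(path, "w") as f:
        f.write("\n")
    os.utime(path, (1000, 1000))    # same timestamp: "synchronized"

print(newer(text_file, code_file) is None)
os.utime(code_file, (2000, 2000))   # simulate a later edit of the code
print(newer(code_file, text_file))
```

Testing the result with `is None` distinguishes "synchronized" from "older", although both are false in a boolean context.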
1380 Get an instance of the converter state machine::
1382 def get_converter(data, txt2code=True, **keyw):
1384 return Text2Code(data, **keyw)
1386 return Code2Text(data, **keyw)
1397 def run_doctest(infile="-", txt2code=True,
1398 globs={}, verbose=False, optionflags=0, **keyw):
1399 """run doctest on the text source
1401 from doctest import DocTestParser, DocTestRunner
1402 (data, out_stream) = open_streams(infile, "-")
1404 If source is code, convert to text, as tests in comments are not found by
1407 if txt2code is False:
1408 converter = Code2Text(data, **keyw)
1409 docstring = str(converter)
1411 docstring = data.read()
1413 Use the doctest Advanced API to do all doctests in a given string::
1415 test = DocTestParser().get_doctest(docstring, globs={}, name="",
1416 filename=infile, lineno=0)
1417 runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
1420 if not runner.failures:
1421 print "%d failures in %d tests"%(runner.failures, runner.tries)
1422 return runner.failures, runner.tries
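The doctest Advanced API calls used above can be tried on a plain string. The following is a minimal sketch with made-up sample text; the names `text` and `"sample"` are placeholders, and no conversion step is involved.

```python
from doctest import DocTestParser, DocTestRunner

text = '''
Some documentation with embedded examples:

>>> 1 + 1
2
>>> "py" + "lit"
'pylit'
'''

# Parse all examples in the string into one DocTest object ...
test = DocTestParser().get_doctest(text, globs={}, name="sample",
                                   filename="<string>", lineno=0)
# ... and run them; the runner keeps count of tries and failures.
runner = DocTestRunner(verbose=False)
runner.run(test)
print("%d failures in %d tests" % (runner.failures, runner.tries))
```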
1430 def diff(infile='-', outfile='-', txt2code=True, **keyw):
1431 """Report differences between converted infile and existing outfile
1433 If outfile is '-', do a round-trip conversion and report differences
1438 instream = file(infile)
1439 # for diffing, we need a copy of the data as list::
1440 data = instream.readlines()
1442 converter = get_converter(data, txt2code, **keyw)
1446 outstream = file(outfile)
1447 old = outstream.readlines()
1449 newname = "<conversion of %s>"%infile
1453 # back-convert the output data
1454 converter = get_converter(new, not txt2code)
1456 newname = "<round-conversion of %s>"%infile
1458 # find and print the differences
1459 is_different = False
1460 # print type(old), old
1461 # print type(new), new
1462 delta = difflib.unified_diff(old, new,
1463 # delta = difflib.unified_diff(["heute\n", "schon\n"], ["heute\n", "noch\n"],
1464 fromfile=oldname, tofile=newname)
1468 if not is_different:
1471 print "no differences found"
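The core of the comparison, iterating over the generator returned by `difflib.unified_diff` and remembering whether it produced any line, can be sketched as follows. The function name `report_differences` and the sample data are hypothetical; the sketch works on lists of lines and leaves out file handling and conversion.

```python
import difflib
import sys


def report_differences(old, new, oldname="old", newname="new"):
    """Print a unified diff; return True if the line lists differ."""
    is_different = False
    for line in difflib.unified_diff(old, new,
                                     fromfile=oldname, tofile=newname):
        is_different = True
        sys.stdout.write(line)
    if not is_different:
        sys.stdout.write("no differences found\n")
    return is_different


report_differences(["a = 1\n", "b = 2\n"], ["a = 1\n", "b = 3\n"])
report_differences(["a = 1\n"], ["a = 1\n"])
```

For identical input the generator yields nothing at all, so the flag stays False without any further comparison.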
1478 Works only for Python code.
1480 Does not work with `eval`, as the code is not just a single expression. ::
1482 def execute(infile="-", txt2code=True, **keyw):
1483 """Execute the input file. Convert first, if it is a text source.
1488 data = str(Text2Code(data, **keyw))
1489 # print "executing " + options.infile
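The difference between executing statements and evaluating an expression can be demonstrated with a two-line program. This is an illustrative sketch only; the sample `source` string and the explicit `namespace` dictionary are assumptions and not part of the function above.

```python
source = "x = 6\ny = x * 7\n"

# Statements can be executed; results end up in the given namespace.
namespace = {}
exec(source, namespace)
print(namespace["y"])

# `eval` accepts a single expression only and rejects the assignments.
try:
    eval(source)
    eval_failed = False
except SyntaxError:
    eval_failed = True
print(eval_failed)
```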
1496 If this script is called from the command line, the `main` function will
1497 convert the input (file or stdin) between text and code formats.
1499 Option default values for the conversion can be given as keyword arguments
1500 to `main`_. The option defaults will be updated by command line options and
1501 extended with "intelligent guesses" by `PylitOptions`_ and passed on to
1502 helper functions and the converter instantiation.
1504 This allows easy customization for programmatic use -- just call `main`
1505 with the appropriate keyword options, e.g.:
1507 >>> main(comment_string="## ")
1511 def main(args=sys.argv[1:], **defaults):
1512 """%prog [options] INFILE [OUTFILE]
1514 Convert between (reStructured) text source with embedded code,
1515 and code source with embedded documentation (comment blocks)
1517 The special filename '-' stands for standard input and output.
1520 Parse and complete the options::
1522 options = PylitOptions()(args, **defaults)
1523 # print "infile", repr(options.infile)
1525 Special actions with early return::
1528 return run_doctest(**options.as_dict())
1531 return diff(**options.as_dict())
1534 return execute(**options.as_dict())
1536 Open in- and output streams::
1539 (data, out_stream) = open_streams(**options.as_dict())
1541 print "IOError: %s %s" % (ex.filename, ex.strerror)
1544 Get a converter instance::
1546 converter = get_converter(data, **options.as_dict())
1548 Convert and write to out_stream::
1550 out_stream.write(str(converter))
1552 if out_stream is not sys.stdout:
1553 print "extract written to", out_stream.name
1556 If input and output are from files, set the modification time (`mtime`) of
1557 the output file to the one of the input file to indicate that the contained
1558 information is equal. [#]_ ::
1561 os.utime(options.outfile, (os.path.getatime(options.outfile),
1562 os.path.getmtime(options.infile))
1567 ## print "mtime", os.path.getmtime(options.infile), options.infile
1568 ## print "mtime", os.path.getmtime(options.outfile), options.outfile
1571 .. [#] Make sure the corresponding file object (here `out_stream`) is
1572 closed, as otherwise the change will be overwritten when `close` is
1573 called afterwards (either explicitly or at program exit).
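The timestamp synchronization can be tried with two temporary files. This is a minimal sketch with made-up file names and contents; the point is that the output file is closed before `os.utime` is called.

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
infile = os.path.join(tmpdir, "foo.py.txt")
outfile = os.path.join(tmpdir, "foo.py")

with open(infile, "w") as f:
    f.write("text source\n")
os.utime(infile, (1000, 1000))          # pretend infile is old

# The `with` block closes the file *before* the timestamp is changed.
with open(outfile, "w") as out_stream:
    out_stream.write("code source\n")

# Keep the access time of outfile, copy the modification time of infile.
os.utime(outfile, (os.path.getatime(outfile), os.path.getmtime(infile)))
print(os.path.getmtime(infile) == os.path.getmtime(outfile))
```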
1576 Rename the infile to a backup copy if ``--replace`` is set::
1579 os.rename(options.infile, options.infile + "~")
1582 Run main, if called from the command line::
1584 if __name__ == '__main__':
1591 Open questions and ideas for further development
1596 * can we gain from using "shutil" over "os.path" and "os"?
1597 * use pylint or PyChecker to enforce a consistent style?
1602 * Use templates for the "intelligent guesses" (with Python syntax for string
1603 replacement with dicts: ``"hello %(what)s" % {'what': 'world'}``)
1605 * Is it sensible to offer the `header_string` option also as command line
1608 * treatment of blank lines:
1610 * Alternatives: Keep blank lines blank
1614 + "if empty" (no whitespace). Comment if there is whitespace.
1616 This would allow unobtrusive markup, but unfortunately this markup is
1617 also invisible in most editors -> bad.
1619 + "if double" (if there is more than one consecutive blank line)
1621 + "never" (current setting)
1623 So the setting could be something like::
1625 defaults.keep_blank_lines = { "python": "if double",
1630 ----------------------
1632 * Ignore "matching comments" in literal strings?
1634 Too complicated: Would need a specific detection algorithm for every
1635 language that supports multi-line literal strings (C++, PHP, Python)
1637 * Warn if a comment in code will become documentation after round-trip?
1640 docstrings in code blocks
1641 --------------------------
1643 * How to handle docstrings in code blocks? (it would be nice to convert them
1644 to rst-text if ``__docformat__ == restructuredtext``)
1646 TODO: Ask at docutils users|developers
1651 http://docutils.sourceforge.net/
1652 .. _indented literal block:
1653 .. _indented literal blocks:
1654 http://docutils.sf.net/docs/ref/rst/restructuredtext.html#indented-literal-blocks
1655 .. _quoted literal block:
1656 .. _quoted literal blocks:
1657 http://docutils.sf.net/docs/ref/rst/restructuredtext.html#quoted-literal-blocks
1660 http://docutils.sf.net/docs/ref/rst/restructuredtext.html#doctest-blocks
1661 .. _pygments: http://pygments.org/
1662 .. _syntax highlight: ../features/syntax-highlight.html
1663 .. _parsed-literal blocks:
1664 http://docutils.sf.net/docs/ref/rst/directives.html#parsed-literal-block