\section{\module{multifile} ---
         Support for files containing distinct parts}

\declaremodule{standard}{multifile}
\modulesynopsis{Support for reading files which contain distinct
                parts, such as some MIME data.}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} being returned by
\method{readline()} when a given delimiter pattern is encountered.  The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods
it can be easily adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file.  You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts.  To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument to false; this
will prevent using the input object's \method{seek()} and
\method{tell()} methods.
\end{classdesc}

In \class{MultiFile}'s view of the world, text is composed of three
kinds of lines: data, section-dividers, and end-markers.
\class{MultiFile} is designed to support parsing messages that may
have multiple nested message parts, each with its own pattern for
section-divider and end-marker lines.

\begin{seealso}
  \seemodule{email}{Comprehensive email handling package; supersedes
                    the \module{multifile} module.}
\end{seealso}

\subsection{MultiFile Objects \label{MultiFile-objects}}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{readline}{}
Read a line.  If the line is data (not a section-divider, end-marker,
or real EOF), return it.  If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 if the match
is an end-marker or 0 if it is not.  If the line matches any other
stacked boundary, raise \exception{Error}.  On encountering end-of-file
on the underlying stream object, the method raises \exception{Error}
unless all boundaries have been popped.
\end{methoddesc}

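The boundary test that \method{readline()} performs against the stack
of pushed boundaries can be sketched as a standalone function, assuming
the default MIME decorations.  The function \function{check_line} and
the local \exception{Error} class are hypothetical stand-ins written
for illustration, not part of the \module{multifile} module, and real
end-of-file handling is omitted.

\begin{verbatim}
class Error(Exception):
    """Stand-in for the module's Error exception in this sketch."""


def check_line(line, stack):
    """Hypothetical model of readline()'s boundary test.

    Returns (line, None) for a data line, or ('', last) when the line
    matches the most recently pushed boundary; last is 1 for an
    end-marker and 0 for a section-divider.  Raises Error when the line
    matches any other boundary on the stack.
    """
    if line[:2] != '--':                   # default is_data() fast guard
        return line, None
    marker = line.rstrip()                 # ignore trailing whitespace
    # Check the innermost (most recently pushed) boundary first.
    for depth, sep in enumerate(reversed(stack)):
        if marker == '--' + sep:
            last = 0
        elif marker == '--' + sep + '--':
            last = 1
        else:
            continue
        if depth > 0:
            raise Error('line matches an outer boundary: %r' % marker)
        return '', last
    return line, None                      # starts with '--' but is data


# Usage, with 'inner' pushed after 'outer':
stack = ['outer', 'inner']
check_line('body text\n', stack)           # ('body text\n', None)
check_line('--inner\n', stack)             # ('', 0)
check_line('--inner--\r\n', stack)         # ('', 1)
# check_line('--outer\n', stack) raises Error
\end{verbatim}

Only the innermost boundary may end the current part; meeting an outer
one first means the inner part was never terminated.
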
\begin{methoddesc}{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines up to the next section and return them as a single
(multiline) string.  Note that this method does not take a size
argument.
\end{methoddesc}

\begin{methoddesc}{seek}{pos\optional{, whence}}
Seek.  Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file
seek.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}

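Section-relative positions amount to subtracting the absolute offset at
which the current section starts.  The following sketch shows only that
arithmetic: \class{SectionView} is a hypothetical wrapper around a
seekable file object, not part of \module{multifile}, and it leaves a
\var{whence} value of 2 unimplemented.

\begin{verbatim}
import io


class SectionView(object):
    """Hypothetical wrapper making seek()/tell() section-relative.

    It only models the offset arithmetic; it does not detect
    boundaries or track where sections start by itself.
    """

    def __init__(self, fp):
        self.fp = fp
        self.start = fp.tell()      # absolute offset of the section start

    def tell(self):
        # Absolute position minus the section start.
        return self.fp.tell() - self.start

    def seek(self, pos, whence=0):
        if whence == 0:             # relative to the section start
            target = self.start + pos
        elif whence == 1:           # relative to the current position
            target = self.fp.tell() + pos
        else:
            raise ValueError('only whence 0 and 1 are sketched here')
        self.fp.seek(target)


# Usage: position the stream just after a section-divider line.
fp = io.StringIO(u'header\n--B\npart text\n')
fp.readline()                       # 'header\n'  (7 characters)
fp.readline()                       # '--B\n'     (4 characters)
view = SectionView(fp)              # section starts at absolute offset 11
first = fp.readline()               # 'part text\n'
offset = view.tell()                # 10, not the absolute 21
view.seek(0)                        # back to the start of the section
again = fp.readline()               # 'part text\n' once more
\end{verbatim}
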
\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return true if
there is such a section, false if an end-marker is seen.  Re-enable
the most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary.  As written, it tests for a prefix other than \code{'-}\code{-'}
at the start of the line (which all MIME boundaries have), but it is
declared so it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}

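Why an over-cautious \method{is_data()} only costs speed can be checked
with a small standalone sketch, assuming the default MIME decorations.
The helpers \function{classify}, \function{default_is_data} and
\function{never_data} are hypothetical: classifying the same lines with
the default guard and with a guard that always returns false gives
identical results, because every line the guard lets through still
faces the full boundary comparison.

\begin{verbatim}
def default_is_data(line):
    # Mirrors the documented default: MIME markers start with '--'.
    return line[:2] != '--'


def never_data(line):
    # Pessimistic guard: sends every line to the full boundary test.
    return False


def classify(line, boundary, is_data):
    """Hypothetical classifier: 'data', 'section-divider' or 'end-marker'."""
    if is_data(line):
        return 'data'                      # fast path, no comparisons
    marker = line.rstrip()
    if marker == '--' + boundary:
        return 'section-divider'
    if marker == '--' + boundary + '--':
        return 'end-marker'
    return 'data'                          # full test found no boundary


lines = ['Hello\n', '--B\n', 'a-b\n', '-- \n', '--B--\n']
fast = [classify(l, 'B', default_is_data) for l in lines]
slow = [classify(l, 'B', never_data) for l in lines]
# Both are ['data', 'section-divider', 'data', 'data', 'end-marker'].
\end{verbatim}
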
\begin{methoddesc}{push}{str}
Push a boundary string.  When a decorated version of this boundary
is found as an input line, it will be interpreted as a section-divider
or end-marker (depending on the decoration, see \rfc{2045}).  All
subsequent reads will return the empty string to indicate end-of-file,
until a call to \method{pop()} removes the boundary or a
\method{next()} call re-enables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise \exception{Error}.
\end{methoddesc}

\begin{methoddesc}{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted
as EOF.
\end{methoddesc}

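The interplay of \method{push()}, \method{next()}, \method{readline()}
and \method{pop()} can be modelled with a much-simplified class.
\class{MiniMultiFile} is hypothetical: it reads from a list of lines,
checks only the most recently pushed boundary, and has no
\method{seek()} or \method{tell()}, so it illustrates the documented
control flow rather than reproducing the module.

\begin{verbatim}
class Error(Exception):
    """Stand-in for the module's Error exception in this sketch."""


class MiniMultiFile(object):
    """Hypothetical, heavily simplified model of the boundary stack."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.stack = []
        self.at_boundary = False    # True while readline() reports EOF
        self.last = 0               # 1 if that EOF was an end-marker

    def push(self, sep):
        self.stack.append(sep)

    def pop(self):
        self.stack.pop()
        self.at_boundary = False    # the boundary no longer acts as EOF
        self.last = 0

    def readline(self):
        if self.at_boundary:
            return ''               # keep reporting EOF for this part
        if not self.lines:
            if self.stack:
                raise Error('EOF with boundaries still pushed')
            return ''
        line = self.lines.pop(0)
        if self.stack:
            sep = self.stack[-1]
            marker = line.rstrip()  # trailing whitespace is ignored
            if marker in ('--' + sep, '--' + sep + '--'):
                self.at_boundary = True
                self.last = int(marker == '--' + sep + '--')
                return ''
        return line

    def next(self):
        while self.readline():      # skip the rest of the current part
            pass
        if self.last:
            return False            # an end-marker was seen
        self.at_boundary = False    # re-enable reading the next part
        return True


message = ['preamble\n', '--B\n', 'first\n', '--B\n', 'second\n',
           '--B--\n', 'epilogue\n']
mf = MiniMultiFile(message)
mf.push('B')
parts = []
while mf.next():                    # the first call skips the preamble
    parts.append(''.join(iter(mf.readline, '')))
mf.pop()
rest = mf.readline()
# parts == ['first\n', 'second\n'] and rest == 'epilogue\n'
\end{verbatim}

After \method{pop()}, the lines following the end-marker are ordinary
data again, which is how an epilogue becomes readable.
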
\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'-}\code{-'} (which MIME section boundaries have) but
it is declared so it can be overridden in derived classes.  This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace.
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'-}\code{-'} and appends \code{'-}\code{-'} (like a
MIME-multipart end-of-message marker) but it is declared so it can be
overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}

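Because \method{is_data()}, \method{section_divider()} and
\method{end_marker()} are ordinary methods, a derived class can
recognize a different marker syntax.  The sketch below models that
pattern without the \module{multifile} module itself:
\class{MimeRules}, \class{BracketRules} and the \code{[[name]]} marker
syntax are hypothetical, chosen only to show an override.

\begin{verbatim}
class MimeRules(object):
    """Hypothetical stand-in for MultiFile's three overridable hooks."""

    def is_data(self, line):
        return line[:2] != '--'            # cheap guard, as documented

    def section_divider(self, sep):
        return '--' + sep

    def end_marker(self, sep):
        return '--' + sep + '--'

    def kind(self, line, sep):
        """Classify line against boundary sep using the hooks above."""
        if self.is_data(line):
            return 'data'
        marker = line.rstrip()             # so hooks need not add LF/CR-LF
        if marker == self.section_divider(sep):
            return 'section-divider'
        if marker == self.end_marker(sep):
            return 'end-marker'
        return 'data'


class BracketRules(MimeRules):
    """Hypothetical format with '[[name]]' dividers and '[[/name]]' ends."""

    def is_data(self, line):
        return not line.startswith('[[')

    def section_divider(self, sep):
        return '[[' + sep + ']]'

    def end_marker(self, sep):
        return '[[/' + sep + ']]'


mime = MimeRules()
custom = BracketRules()
mime.kind('--B\r\n', 'B')          # 'section-divider' (CR-LF ignored)
custom.kind('[[B]]\n', 'B')        # 'section-divider'
custom.kind('[[/B]]  \n', 'B')     # 'end-marker'
custom.kind('--B\n', 'B')          # 'data' under the custom rules
\end{verbatim}
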
Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
True if the last end-of-file was for an end-of-message marker.
\end{memberdesc}

\subsection{\class{MultiFile} Example \label{multifile-example}}
\sectionauthor{Skip Montanaro}{skip@mojam.com}

\begin{verbatim}
import mimetools
import multifile
import StringIO

def extract_mime_part_matching(stream, mimetype):
    """Return the first element in a multipart MIME message on stream
    matching mimetype, or '' if no part matches."""

    msg = mimetools.Message(stream)
    if not msg.gettype().startswith("multipart/"):
        return ''

    mf = multifile.MultiFile(stream)
    mf.push(msg.getparam("boundary"))
    result = ''
    # The first next() call skips the preamble; each later call
    # moves past the following section-divider.
    while mf.next():
        submsg = mimetools.Message(mf)
        data = StringIO.StringIO()
        try:
            mimetools.decode(mf, data, submsg.getencoding())
        except ValueError:
            # Unsupported Content-Transfer-Encoding: skip this part.
            continue
        if submsg.gettype() == mimetype:
            result = data.getvalue()
            break
    mf.pop()
    return result
\end{verbatim}
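For comparison, the same task can be sketched with the \module{email}
package that supersedes this module, written for Python 3, where
\module{multifile} no longer exists.  \function{find_part_payload} is a
hypothetical name.  Unlike the example above, it takes the message as a
string, descends into nested multipart parts via \method{walk()}, and
returns the decoded payload as bytes.

\begin{verbatim}
import email


def find_part_payload(text, mimetype):
    """Return the decoded payload of the first part of type mimetype.

    Sketch using the email package; returns b'' when nothing matches.
    """
    msg = email.message_from_string(text)
    for part in msg.walk():                # depth-first over all parts
        if part.is_multipart():
            continue                       # containers hold no payload
        if part.get_content_type() == mimetype:
            # decode=True undoes base64/quoted-printable and yields bytes.
            return part.get_payload(decode=True)
    return b''


sample = (
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    '\n'
    '--XYZ\n'
    'Content-Type: text/plain\n'
    '\n'
    'plain body\n'
    '--XYZ\n'
    'Content-Type: text/html\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'PGI+aGk8L2I+\n'
    '--XYZ--\n'
)
html = find_part_payload(sample, 'text/html')   # b'<b>hi</b>'
\end{verbatim}
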