Added WatchedFileHandler (based on SF patch #1598415)
[python.git] / Doc / lib / librfc822.tex
blob4ca3734db59034786071d800d276fb23d50abac9
1 \section{\module{rfc822} ---
2 Parse RFC 2822 mail headers}
4 \declaremodule{standard}{rfc822}
5 \modulesynopsis{Parse \rfc{2822} style mail messages.}
7 \deprecated{2.3}{The \refmodule{email} package should be used in
8 preference to the \module{rfc822} module. This
9 module is present only to maintain backward
10 compatibility.}
12 This module defines a class, \class{Message}, which represents an
13 ``email message'' as defined by the Internet standard
14 \rfc{2822}.\footnote{This module originally conformed to \rfc{822},
15 hence the name. Since then, \rfc{2822} has been released as an
16 update to \rfc{822}. This module should be considered
17 \rfc{2822}-conformant, especially in cases where the
18 syntax or semantics have changed since \rfc{822}.} Such messages
19 consist of a collection of message headers, and a message body. This
20 module also defines a helper class
21 \class{AddressList} for parsing \rfc{2822} addresses. Please refer to
22 the RFC for information on the specific syntax of \rfc{2822} messages.
24 The \refmodule{mailbox}\refstmodindex{mailbox} module provides classes
25 to read mailboxes produced by various end-user mail programs.
27 \begin{classdesc}{Message}{file\optional{, seekable}}
28 A \class{Message} instance is instantiated with an input object as
29 parameter. Message relies only on the input object having a
30 \method{readline()} method; in particular, ordinary file objects
31 qualify. Instantiation reads headers from the input object up to a
32 delimiter line (normally a blank line) and stores them in the
33 instance. The message body, following the headers, is not consumed.
35 This class can work with any input object that supports a
36 \method{readline()} method. If the input object has seek and tell
37 capability, the \method{rewindbody()} method will work; also, illegal
38 lines will be pushed back onto the input stream. If the input object
39 lacks seek but has an \method{unread()} method that can push back a
40 line of input, \class{Message} will use that to push back illegal
41 lines. Thus this class can be used to parse messages coming from a
42 buffered stream.
44 The optional \var{seekable} argument is provided as a workaround for
45 certain stdio libraries in which \cfunction{tell()} discards buffered
46 data before discovering that the \cfunction{lseek()} system call
47 doesn't work. For maximum portability, you should set the seekable
48 argument to zero to prevent that initial \method{tell()} when passing
49 in an unseekable object such as a file object created from a socket
50 object.
52 Input lines as read from the file may either be terminated by CR-LF or
53 by a single linefeed; a terminating CR-LF is replaced by a single
54 linefeed before the line is stored.
56 All header matching is done independent of upper or lower case;
57 e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
58 \code{\var{m}['FROM']} all yield the same result.
59 \end{classdesc}
61 \begin{classdesc}{AddressList}{field}
62 You may instantiate the \class{AddressList} helper class using a single
63 string parameter, a comma-separated list of \rfc{2822} addresses to be
64 parsed. (The parameter \code{None} yields an empty list.)
65 \end{classdesc}
67 \begin{funcdesc}{quote}{str}
68 Return a new string with backslashes in \var{str} replaced by two
69 backslashes and double quotes replaced by backslash-double quote.
70 \end{funcdesc}
72 \begin{funcdesc}{unquote}{str}
73 Return a new string which is an \emph{unquoted} version of \var{str}.
74 If \var{str} ends and begins with double quotes, they are stripped
75 off. Likewise if \var{str} ends and begins with angle brackets, they
76 are stripped off.
77 \end{funcdesc}
79 \begin{funcdesc}{parseaddr}{address}
80 Parse \var{address}, which should be the value of some
81 address-containing field such as \mailheader{To} or \mailheader{Cc},
82 into its constituent ``realname'' and ``email address'' parts.
83 Returns a tuple of that information, unless the parse fails, in which
84 case a 2-tuple \code{(None, None)} is returned.
85 \end{funcdesc}
87 \begin{funcdesc}{dump_address_pair}{pair}
88 The inverse of \method{parseaddr()}, this takes a 2-tuple of the form
89 \code{(\var{realname}, \var{email_address})} and returns the string
90 value suitable for a \mailheader{To} or \mailheader{Cc} header. If
91 the first element of \var{pair} is false, then the second element is
92 returned unmodified.
93 \end{funcdesc}
95 \begin{funcdesc}{parsedate}{date}
96 Attempts to parse a date according to the rules in \rfc{2822}.
97 however, some mailers don't follow that format as specified, so
98 \function{parsedate()} tries to guess correctly in such cases.
99 \var{date} is a string containing an \rfc{2822} date, such as
100 \code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
101 the date, \function{parsedate()} returns a 9-tuple that can be passed
102 directly to \function{time.mktime()}; otherwise \code{None} will be
103 returned. Note that fields 6, 7, and 8 of the result tuple are not
104 usable.
105 \end{funcdesc}
107 \begin{funcdesc}{parsedate_tz}{date}
108 Performs the same function as \function{parsedate()}, but returns
109 either \code{None} or a 10-tuple; the first 9 elements make up a tuple
110 that can be passed directly to \function{time.mktime()}, and the tenth
111 is the offset of the date's timezone from UTC (which is the official
112 term for Greenwich Mean Time). (Note that the sign of the timezone
113 offset is the opposite of the sign of the \code{time.timezone}
114 variable for the same timezone; the latter variable follows the
115 \POSIX{} standard while this module follows \rfc{2822}.) If the input
116 string has no timezone, the last element of the tuple returned is
117 \code{None}. Note that fields 6, 7, and 8 of the result tuple are not
118 usable.
119 \end{funcdesc}
121 \begin{funcdesc}{mktime_tz}{tuple}
122 Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
123 timestamp. If the timezone item in the tuple is \code{None}, assume
124 local time. Minor deficiency: this first interprets the first 8
125 elements as a local time and then compensates for the timezone
126 difference; this may yield a slight error around daylight savings time
127 switch dates. Not enough to worry about for common use.
128 \end{funcdesc}
131 \begin{seealso}
132 \seemodule{email}{Comprehensive email handling package; supersedes
133 the \module{rfc822} module.}
134 \seemodule{mailbox}{Classes to read various mailbox formats produced
135 by end-user mail programs.}
136 \seemodule{mimetools}{Subclass of \class{rfc822.Message} that
137 handles MIME encoded messages.}
138 \end{seealso}
141 \subsection{Message Objects \label{message-objects}}
143 A \class{Message} instance has the following methods:
145 \begin{methoddesc}{rewindbody}{}
146 Seek to the start of the message body. This only works if the file
147 object is seekable.
148 \end{methoddesc}
150 \begin{methoddesc}{isheader}{line}
151 Returns a line's canonicalized fieldname (the dictionary key that will
152 be used to index it) if the line is a legal \rfc{2822} header; otherwise
153 returns \code{None} (implying that parsing should stop here and the
154 line be pushed back on the input stream). It is sometimes useful to
155 override this method in a subclass.
156 \end{methoddesc}
158 \begin{methoddesc}{islast}{line}
159 Return true if the given line is a delimiter on which Message should
160 stop. The delimiter line is consumed, and the file object's read
161 location positioned immediately after it. By default this method just
162 checks that the line is blank, but you can override it in a subclass.
163 \end{methoddesc}
165 \begin{methoddesc}{iscomment}{line}
166 Return \code{True} if the given line should be ignored entirely, just skipped.
167 By default this is a stub that always returns \code{False}, but you can
168 override it in a subclass.
169 \end{methoddesc}
171 \begin{methoddesc}{getallmatchingheaders}{name}
172 Return a list of lines consisting of all headers matching
173 \var{name}, if any. Each physical line, whether it is a continuation
174 line or not, is a separate list item. Return the empty list if no
175 header matches \var{name}.
176 \end{methoddesc}
178 \begin{methoddesc}{getfirstmatchingheader}{name}
179 Return a list of lines comprising the first header matching
180 \var{name}, and its continuation line(s), if any. Return
181 \code{None} if there is no header matching \var{name}.
182 \end{methoddesc}
184 \begin{methoddesc}{getrawheader}{name}
185 Return a single string consisting of the text after the colon in the
186 first header matching \var{name}. This includes leading whitespace,
187 the trailing linefeed, and internal linefeeds and whitespace if there
188 any continuation line(s) were present. Return \code{None} if there is
189 no header matching \var{name}.
190 \end{methoddesc}
192 \begin{methoddesc}{getheader}{name\optional{, default}}
193 Like \code{getrawheader(\var{name})}, but strip leading and trailing
194 whitespace. Internal whitespace is not stripped. The optional
195 \var{default} argument can be used to specify a different default to
196 be returned when there is no header matching \var{name}.
197 \end{methoddesc}
199 \begin{methoddesc}{get}{name\optional{, default}}
200 An alias for \method{getheader()}, to make the interface more compatible
201 with regular dictionaries.
202 \end{methoddesc}
204 \begin{methoddesc}{getaddr}{name}
205 Return a pair \code{(\var{full name}, \var{email address})} parsed
206 from the string returned by \code{getheader(\var{name})}. If no
207 header matching \var{name} exists, return \code{(None, None)};
208 otherwise both the full name and the address are (possibly empty)
209 strings.
211 Example: If \var{m}'s first \mailheader{From} header contains the
212 string \code{'jack@cwi.nl (Jack Jansen)'}, then
213 \code{m.getaddr('From')} will yield the pair
214 \code{('Jack Jansen', 'jack@cwi.nl')}.
215 If the header contained
216 \code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
217 exact same result.
218 \end{methoddesc}
220 \begin{methoddesc}{getaddrlist}{name}
221 This is similar to \code{getaddr(\var{list})}, but parses a header
222 containing a list of email addresses (e.g.\ a \mailheader{To} header) and
223 returns a list of \code{(\var{full name}, \var{email address})} pairs
224 (even if there was only one address in the header). If there is no
225 header matching \var{name}, return an empty list.
227 If multiple headers exist that match the named header (e.g. if there
228 are several \mailheader{Cc} headers), all are parsed for addresses.
229 Any continuation lines the named headers contain are also parsed.
230 \end{methoddesc}
232 \begin{methoddesc}{getdate}{name}
233 Retrieve a header using \method{getheader()} and parse it into a 9-tuple
234 compatible with \function{time.mktime()}; note that fields 6, 7, and 8
235 are not usable. If there is no header matching
236 \var{name}, or it is unparsable, return \code{None}.
238 Date parsing appears to be a black art, and not all mailers adhere to
239 the standard. While it has been tested and found correct on a large
240 collection of email from many sources, it is still possible that this
241 function may occasionally yield an incorrect result.
242 \end{methoddesc}
244 \begin{methoddesc}{getdate_tz}{name}
245 Retrieve a header using \method{getheader()} and parse it into a
246 10-tuple; the first 9 elements will make a tuple compatible with
247 \function{time.mktime()}, and the 10th is a number giving the offset
248 of the date's timezone from UTC. Note that fields 6, 7, and 8
249 are not usable. Similarly to \method{getdate()}, if
250 there is no header matching \var{name}, or it is unparsable, return
251 \code{None}.
252 \end{methoddesc}
254 \class{Message} instances also support a limited mapping interface.
255 In particular: \code{\var{m}[name]} is like
256 \code{\var{m}.getheader(name)} but raises \exception{KeyError} if
257 there is no matching header; and \code{len(\var{m})},
258 \code{\var{m}.get(\var{name}\optional{, \var{default}})},
259 \code{\var{m}.has_key(\var{name})}, \code{\var{m}.keys()},
260 \code{\var{m}.values()} \code{\var{m}.items()}, and
261 \code{\var{m}.setdefault(\var{name}\optional{, \var{default}})} act as
262 expected, with the one difference that \method{setdefault()} uses
263 an empty string as the default value. \class{Message} instances
264 also support the mapping writable interface \code{\var{m}[name] =
265 value} and \code{del \var{m}[name]}. \class{Message} objects do not
266 support the \method{clear()}, \method{copy()}, \method{popitem()}, or
267 \method{update()} methods of the mapping interface. (Support for
268 \method{get()} and \method{setdefault()} was only added in Python
269 2.2.)
271 Finally, \class{Message} instances have some public instance variables:
273 \begin{memberdesc}{headers}
274 A list containing the entire set of header lines, in the order in
275 which they were read (except that setitem calls may disturb this
276 order). Each line contains a trailing newline. The
277 blank line terminating the headers is not contained in the list.
278 \end{memberdesc}
280 \begin{memberdesc}{fp}
281 The file or file-like object passed at instantiation time. This can
282 be used to read the message content.
283 \end{memberdesc}
285 \begin{memberdesc}{unixfrom}
286 The \UNIX{} \samp{From~} line, if the message had one, or an empty
287 string. This is needed to regenerate the message in some contexts,
288 such as an \code{mbox}-style mailbox file.
289 \end{memberdesc}
292 \subsection{AddressList Objects \label{addresslist-objects}}
294 An \class{AddressList} instance has the following methods:
296 \begin{methoddesc}{__len__}{}
297 Return the number of addresses in the address list.
298 \end{methoddesc}
300 \begin{methoddesc}{__str__}{}
301 Return a canonicalized string representation of the address list.
302 Addresses are rendered in "name" <host@domain> form, comma-separated.
303 \end{methoddesc}
305 \begin{methoddesc}{__add__}{alist}
306 Return a new \class{AddressList} instance that contains all addresses
307 in both \class{AddressList} operands, with duplicates removed (set
308 union).
309 \end{methoddesc}
311 \begin{methoddesc}{__iadd__}{alist}
312 In-place version of \method{__add__()}; turns this \class{AddressList}
313 instance into the union of itself and the right-hand instance,
314 \var{alist}.
315 \end{methoddesc}
317 \begin{methoddesc}{__sub__}{alist}
318 Return a new \class{AddressList} instance that contains every address
319 in the left-hand \class{AddressList} operand that is not present in
320 the right-hand address operand (set difference).
321 \end{methoddesc}
323 \begin{methoddesc}{__isub__}{alist}
324 In-place version of \method{__sub__()}, removing addresses in this
325 list which are also in \var{alist}.
326 \end{methoddesc}
329 Finally, \class{AddressList} instances have one public instance variable:
331 \begin{memberdesc}{addresslist}
332 A list of tuple string pairs, one per address. In each member, the
333 first is the canonicalized name part, the second is the
334 actual route-address (\character{@}-separated username-host.domain
335 pair).
336 \end{memberdesc}