The scrubber is an integral part of Mailman, both in the normal delivery of
messages and in components such as the archiver.  Its primary purpose is to
scrub attachments from messages so that binary goop doesn't end up in an
archive.

>>> from email import message_from_string
>>> from Mailman.Handlers.Scrubber import process, save_attachment
>>> from Mailman.configuration import config
>>> mlist = config.db.list_manager.create(u'_xtest@example.com')
>>> mlist.preferred_language = u'en'

Helper functions for getting the attachment data.

>>> import os, re
>>> def read_attachment(filename, remove=True):
...     path = os.path.join(config.PRIVATE_ARCHIVE_FILE_DIR,
...                         mlist.fqdn_listname, filename)
...     with open(path, 'rb') as fp:
...         data = fp.read()
...     if remove:  # assumed: delete the file once it has been read
...         os.unlink(path)
...     return data

>>> from urlparse import urlparse
>>> def read_url_from_message(msg):
...     url = None
...     for line in msg.get_payload().splitlines():
...         mo = re.match('URL: <(?P<url>[^>]+)>', line)
...         if mo:
...             url = mo.group('url')
...             break
...     path = '/'.join(urlparse(url).path.split('/')[3:])
...     return read_attachment(path)
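
The path arithmetic in read_url_from_message() is easier to follow with a
concrete URL.  The sketch below is a standalone Python 3 illustration, so it
uses urllib.parse rather than the Python 2 urlparse module.  The helper name
archive_relative_path() is hypothetical, and the code assumes the archive URL
layout seen in this document: /pipermail/<listname>/<relative path>.

```python
import re
from urllib.parse import urlparse

URL_LINE = re.compile(r'URL: <(?P<url>[^>]+)>')

def archive_relative_path(payload):
    """Return the path below the list's archive directory, or None."""
    for line in payload.splitlines():
        mo = URL_LINE.match(line)
        if mo:
            # The URL path is '/pipermail/<listname>/<rest>'.  Splitting on
            # '/' gives ['', 'pipermail', '<listname>', ...], so [3:] keeps
            # only the components below the list's directory.
            parts = urlparse(mo.group('url')).path.split('/')
            return '/'.join(parts[3:])
    return None

payload = (
    'A non-text attachment was scrubbed...\n'
    'URL: <http://www.example.com/pipermail/_xtest@example.com/dir/xtest.gif>\n'
)
relative = archive_relative_path(payload)
```

Here relative is 'dir/xtest.gif', which is the kind of value the helper hands
to read_attachment().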

The Scrubber handler exposes a function called save_attachment() which can be
used to strip various types of attachments and store them in the archive
directory.  This is a public interface used by components outside the normal
delivery pipeline.

Site administrators can decide whether the scrubber should use the attachment
filename suggested in the message's Content-Disposition: header.  The setting
is SCRUBBER_DONT_USE_ATTACHMENT_FILENAME; when it is False, the suggested
filename is used whenever the header carries one (yes, this is an unfortunate
double negative).

>>> config.SCRUBBER_DONT_USE_ATTACHMENT_FILENAME = False
>>> msg = message_from_string(u"""\
... Content-Type: image/gif; name="xtest.gif"
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment; filename="xtest.gif"
...
... R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
... """)
>>> save_attachment(mlist, msg, 'dir')
u'<http://www.example.com/pipermail/_xtest@example.com/dir/xtest.gif>'
>>> data = read_attachment('dir/xtest.gif')

Saving the attachment does not alter the original message.

>>> print msg.as_string()
Content-Type: image/gif; name="xtest.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="xtest.gif"
<BLANKLINE>
R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
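
One reason the message can stay untouched is that reading a decoded payload is
a non-destructive operation in the standard email package.  The following is a
minimal Python 3 sketch, independent of Mailman; the variable names are
illustrative, and it does not claim to show how save_attachment() itself is
implemented.

```python
from email import message_from_string

msg = message_from_string("""\
Content-Type: image/gif; name="xtest.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="xtest.gif"

R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
""")

before = msg.as_string()
# decode=True undoes the base64 transfer encoding and returns bytes.
data = msg.get_payload(decode=True)
after = msg.as_string()

unchanged = before == after        # the message text is the same
is_gif = data[:6] == b'GIF87a'     # the decoded bytes start with a GIF header
```

The decoded bytes could then be written to the archive directory while the
message object keeps its original base64 body.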

The site administrator can also configure Mailman to ignore the
Content-Disposition: filename.  This is the default for reasons described in
the Defaults.py.in file.

>>> config.SCRUBBER_DONT_USE_ATTACHMENT_FILENAME = True
>>> msg = message_from_string(u"""\
... Content-Type: image/gif; name="xtest.gif"
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment; filename="xtest.gif"
...
... R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
... """)
>>> save_attachment(mlist, msg, 'dir')
u'<http://www.example.com/pipermail/_xtest@example.com/dir/attachment.gif>'
>>> data = read_attachment('dir/xtest.gif')
Traceback (most recent call last):
IOError: [Errno ...] No such file or directory:
    u'.../archives/private/_xtest@example.com/dir/xtest.gif'
>>> data = read_attachment('dir/attachment.gif')
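
The two configurations above differ only in how the stored filename is picked.
The following is a rough Python 3 sketch of such a policy, not Mailman's actual
code: pick_filename() and its dont_use_attachment_filename parameter are
hypothetical names, and the generic name 'attachment' plus an extension guessed
from the content type simply mirrors the URLs shown above.

```python
import mimetypes
import os
from email import message_from_string

def pick_filename(part, dont_use_attachment_filename=True):
    """Choose the name under which an attachment would be stored."""
    suggested = part.get_filename()
    if suggested and not dont_use_attachment_filename:
        # Keep only the final path component so that a hostile name such as
        # '../../etc/passwd' cannot escape the target directory.
        return os.path.basename(suggested)
    # Otherwise use a generic name with an extension guessed from the type.
    ext = mimetypes.guess_extension(part.get_content_type()) or '.bin'
    return 'attachment' + ext

part = message_from_string("""\
Content-Type: image/gif; name="xtest.gif"
Content-Disposition: attachment; filename="xtest.gif"

""")

trusted = pick_filename(part, dont_use_attachment_filename=False)
generic = pick_filename(part, dont_use_attachment_filename=True)
```

With the sender's name trusted the result is 'xtest.gif'; otherwise it is
'attachment.gif'.  Defaulting to the generic name is the more conservative
choice because the suggested filename is controlled by the sender.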

Scrubbing image attachments
---------------------------

When scrubbing image attachments, the original message is modified to include
a reference to the attachment file as available through the on-line archive.

>>> msg = message_from_string(u"""\
... MIME-Version: 1.0
... Content-Type: multipart/mixed; boundary="BOUNDARY"
...
... --BOUNDARY
... Content-type: text/plain; charset=us-ascii
...
... This is a message.
... --BOUNDARY
... Content-Type: image/gif; name="xtest.gif"
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment; filename="xtest.gif"
...
... R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
... --BOUNDARY--
... """)
>>> msgdata = {}

The Scrubber.process() function is different from other handler process
functions in that it returns the scrubbed message.

>>> scrubbed_msg = process(mlist, msg, msgdata)
>>> scrubbed_msg is msg
True
>>> print scrubbed_msg.as_string()
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
-------------- next part --------------
A non-text attachment was scrubbed...
URL: <http://www.example.com/pipermail/_xtest@example.com/attachments/.../attachment.gif>

This is the same as the transformed message originally passed in.

>>> print msg.as_string()
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
-------------- next part --------------
A non-text attachment was scrubbed...
URL: <http://www.example.com/pipermail/_xtest@example.com/attachments/.../attachment.gif>

The URL will point to the attachment sitting in the archive.

>>> data = read_url_from_message(msg)
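
To make the transformation concrete, here is a simplified Python 3 sketch of
replacing an attachment with a placeholder.  It is an illustration under stated
assumptions rather than the Scrubber's implementation: scrub_to_text() and the
url_for callback are hypothetical, nothing is written to disk, the example URL
is made up, and the placeholder carries only the description and URL lines.

```python
from email import message_from_string

SEPARATOR = '-------------- next part --------------'

def scrub_to_text(msg, url_for):
    """Flatten msg to text, replacing each attachment with a stub.

    url_for is a hypothetical callback that returns the archive URL for a
    MIME part; saving the part itself is left to the caller.
    """
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        inline_text = (part.get_content_type() == 'text/plain'
                       and part.get_content_disposition() != 'attachment')
        if inline_text:
            chunks.append(part.get_payload())
        else:
            kind = 'text' if part.get_content_maintype() == 'text' else 'non-text'
            chunks.append('A %s attachment was scrubbed...\n'
                          'URL: <%s>\n' % (kind, url_for(part)))
    return ('\n' + SEPARATOR + '\n').join(chunks)

msg = message_from_string("""\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset=us-ascii

This is a message.
--BOUNDARY
Content-Type: image/gif; name="xtest.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="xtest.gif"

R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
--BOUNDARY--
""")

text = scrub_to_text(msg, lambda part: 'http://www.example.com/attachment.gif')
```

The resulting text keeps the inline body, drops the base64 data, and points at
the URL supplied by the callback.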

Scrubbing text attachments
--------------------------

Like image attachments, text attachments are also scrubbed, but the
placeholder is slightly different.

>>> msg = message_from_string(u"""\
... MIME-Version: 1.0
... Content-Type: multipart/mixed; boundary="BOUNDARY"
...
... --BOUNDARY
... Content-type: text/plain; charset=us-ascii; format=flowed; delsp=no
...
... This is a message.
... --BOUNDARY
... Content-type: text/plain; name="xtext.txt"
... Content-Disposition: attachment; filename="xtext.txt"
...
... This is a text attachment.
... --BOUNDARY--
... """)
>>> scrubbed_msg = process(mlist, msg, {})
>>> print scrubbed_msg.as_string()
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; format="flowed"; delsp="no"
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
URL: <http://www.example.com/pipermail/_xtest@example.com/attachments/.../attachment.txt>
>>> read_url_from_message(msg)
'This is a text attachment.'