The scrubber is an integral part of Mailman, both in the normal delivery of
messages and in components such as the archiver. Its primary purpose is to
scrub attachments from messages so that binary goop doesn't end up in an
archive.

>>> mlist = create_list('_xtest@example.com')
>>> mlist.preferred_language = 'en'

Helper functions for getting the attachment data.

>>> import os
>>> import re
>>> def read_attachment(filename, remove=True):
...     path = os.path.join(config.PRIVATE_ARCHIVE_FILE_DIR,
...                         mlist.fqdn_listname, filename)

>>> from urlparse import urlparse
>>> def read_url_from_message(msg):
...     for line in msg.get_payload().splitlines():
...         mo = re.match('URL: <(?P<url>[^>]+)>', line)
...         # Skip lines that do not carry an attachment URL.
...         if mo:
...             url = mo.group('url')
...             path = '/'.join(urlparse(url).path.split('/')[3:])
...             return read_attachment(path)
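
The path arithmetic in read_url_from_message() is easier to follow in
isolation. The following is a minimal standalone sketch, written for Python 3
with only the standard library. The name archive_relative_path is hypothetical
and not part of Mailman. It shows how dropping the first three path components
of an archive URL leaves the path below the list's archive directory.

```python
import re
from urllib.parse import urlparse

# Same pattern as in the helper above: a placeholder line of the form
# "URL: <...>".
URL_LINE = re.compile(r'URL: <(?P<url>[^>]+)>')


def archive_relative_path(line):
    """Return the path below the list's archive directory, or None."""
    mo = URL_LINE.match(line)
    if mo is None:
        return None
    # The URL path looks like '/pipermail/<list>/<subdirs>/<file>'.  Splitting
    # on '/' yields ['', 'pipermail', '<list>', ...], so [3:] keeps the rest.
    parts = urlparse(mo.group('url')).path.split('/')
    return '/'.join(parts[3:])


print(archive_relative_path(
    'URL: <http://www.example.com/pipermail/_xtest@example.com/dir/xtest.gif>'))
```

Lines that do not match the pattern yield None, so a caller can skip them.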

The Scrubber handler exposes a function called save_attachment() which can be
used to strip various types of attachments and store them in the archive
directory. This is a public interface used by components outside the normal
delivery of messages.

Site administrators can decide whether the scrubber should use the attachment
filename suggested in the message's Content-Disposition: header or not. If
enabled, the filename will be used when this header attribute is present (yes,
this is an unfortunate double negative).

>>> config.push('test config', """
... use_attachment_filename: yes
... """)
>>> msg = message_from_string("""\
... Content-Type: image/gif; name="xtest.gif"
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment; filename="xtest.gif"
...
... R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
... """)
>>> from mailman.pipeline.scrubber import save_attachment
>>> print save_attachment(mlist, msg, 'dir')
<http://www.example.com/pipermail/_xtest@example.com/dir/xtest.gif>
>>> data = read_attachment('dir/xtest.gif')

Saving the attachment does not alter the original message.

>>> print msg.as_string()
Content-Type: image/gif; name="xtest.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="xtest.gif"
<BLANKLINE>
R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
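
For reference, the body of this test message is a tiny base64-encoded GIF. The
sketch below uses the Python 3 standard library email package rather than
Mailman's own code. It shows one way to recover the raw bytes from such a
message, and that decoding leaves the message object itself untouched.

```python
from email import message_from_string

msg = message_from_string("""\
Content-Type: image/gif; name="xtest.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="xtest.gif"

R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
""")

before = msg.as_string()
# decode=True undoes the Content-Transfer-Encoding and returns bytes.
raw = msg.get_payload(decode=True)
after = msg.as_string()

print(raw[:6])          # the GIF signature
print(len(raw))         # size of the decoded image in bytes
print(before == after)  # decoding did not modify the message
```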

The site administrator can also configure Mailman to ignore the
Content-Disposition: filename. This is the default.

>>> config.pop('test config')
>>> config.push('test config', """
... use_attachment_filename: no
... """)
>>> msg = message_from_string("""\
... Content-Type: image/gif; name="xtest.gif"
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment; filename="xtest.gif"
...
... R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
... """)
>>> print save_attachment(mlist, msg, 'dir')
<http://www.example.com/pipermail/_xtest@example.com/dir/attachment.gif>
>>> data = read_attachment('dir/xtest.gif')
Traceback (most recent call last):
IOError: [Errno ...] No such file or directory:
    u'.../archives/private/_xtest@example.com/dir/xtest.gif'
>>> data = read_attachment('dir/attachment.gif')
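
The two configurations above differ only in how the stored file is named. As a
rough illustration, the decision can be sketched in Python 3 with the standard
library. The function choose_filename and the '.bin' fallback are assumptions
made for this example and do not describe how save_attachment() is actually
implemented.

```python
import mimetypes
import os
from email import message_from_string


def choose_filename(part, use_attachment_filename):
    """Pick a file name for a saved attachment (illustrative only)."""
    # Derive the extension from the MIME type; '.bin' is an assumed fallback.
    ext = mimetypes.guess_extension(part.get_content_type()) or '.bin'
    if use_attachment_filename:
        # get_filename() reads the Content-Disposition filename parameter and
        # falls back to the Content-Type name parameter.
        suggested = part.get_filename()
        if suggested:
            # Keep only the final path component of the suggested name.
            return os.path.basename(suggested)
    return 'attachment' + ext


part = message_from_string("""\
Content-Type: image/gif; name="xtest.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="xtest.gif"

R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
""")

print(choose_filename(part, True))
print(choose_filename(part, False))
```

With the setting enabled the sender's suggested name is used. With it disabled
a generic name is built from the content type, which matches the file names
seen in the two examples above.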

Scrubbing image attachments
===========================

When scrubbing image attachments, the original message is modified to include
a reference to the attachment file, which is available through the on-line
archive.

>>> msg = message_from_string("""\
... MIME-Version: 1.0
... Content-Type: multipart/mixed; boundary="BOUNDARY"
...
... --BOUNDARY
... Content-type: text/plain; charset=us-ascii
...
... This is a message.
... --BOUNDARY
... Content-Type: image/gif; name="xtest.gif"
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment; filename="xtest.gif"
...
... R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
... --BOUNDARY--
... """)

The Scrubber.process() function differs from other handler process
functions in that it returns the scrubbed message.

>>> from mailman.pipeline.scrubber import process
>>> msgdata = {}
>>> scrubbed_msg = process(mlist, msg, msgdata)
>>> scrubbed_msg is msg
True
>>> print scrubbed_msg.as_string()
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
-------------- next part --------------
A non-text attachment was scrubbed...
URL: <http://www.example.com/pipermail/_xtest@example.com/attachments/.../attachment.gif>

This is the same as the transformed message originally passed in.

>>> print msg.as_string()
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
-------------- next part --------------
A non-text attachment was scrubbed...
URL: <http://www.example.com/pipermail/_xtest@example.com/attachments/.../attachment.gif>

The URL points to the attachment stored in the archive.

>>> data = read_url_from_message(msg)
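
The overall effect of scrubbing an image can be approximated with a short
standalone sketch in Python 3 using only the standard library. This is a
simplified illustration, not Mailman's implementation. The function
scrub_sketch and the url_for callback are hypothetical, the URL passed in is
made up, and nothing is written to an archive.

```python
from email import message_from_string

SEPARATOR = '-------------- next part --------------\n'


def scrub_sketch(msg, url_for):
    """Flatten msg to text, replacing non-text parts with placeholders."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_maintype() == 'text':
            chunks.append(part.get_payload() + '\n')
        else:
            # url_for is a caller-supplied (hypothetical) function that says
            # where the removed part can be found.
            chunks.append('A non-text attachment was scrubbed...\n'
                          'URL: <%s>\n' % url_for(part))
    return SEPARATOR.join(chunks)


msg = message_from_string("""\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-type: text/plain; charset=us-ascii

This is a message.
--BOUNDARY
Content-Type: image/gif; name="xtest.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="xtest.gif"

R0lGODdhAQABAIAAAAAAAAAAACwAAAAAAQABAAACAQUAOw==
--BOUNDARY--
""")

out = scrub_sketch(
    msg, lambda part: 'http://www.example.com/hypothetical/attachment.gif')
print(out)
```

The text part survives as is, while the base64 image data is replaced by a
short placeholder that carries the URL.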

Scrubbing text attachments
==========================

Like image attachments, text attachments are also scrubbed, but the
placeholder is slightly different.

>>> msg = message_from_string("""\
... MIME-Version: 1.0
... Content-Type: multipart/mixed; boundary="BOUNDARY"
...
... --BOUNDARY
... Content-type: text/plain; charset=us-ascii; format=flowed; delsp=no
...
... This is a message.
... --BOUNDARY
... Content-type: text/plain; name="xtext.txt"
... Content-Disposition: attachment; filename="xtext.txt"
...
... This is a text attachment.
... --BOUNDARY--
... """)
>>> scrubbed_msg = process(mlist, msg, {})
>>> print scrubbed_msg.as_string()
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; format="flowed"; delsp="no"
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
URL: <http://www.example.com/pipermail/_xtest@example.com/attachments/.../attachment.txt>
>>> read_url_from_message(msg)
'This is a text attachment.'
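
The "charset-unspecified" wording in the placeholder fits the test message:
the attached part declares no charset parameter, while the main body does. The
following sketch, in Python 3 with the standard library email package, only
inspects those headers. It makes no claim about how Mailman itself decides
which placeholder to use.

```python
from email import message_from_string

msg = message_from_string("""\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-type: text/plain; charset=us-ascii; format=flowed; delsp=no

This is a message.
--BOUNDARY
Content-type: text/plain; name="xtext.txt"
Content-Disposition: attachment; filename="xtext.txt"

This is a text attachment.
--BOUNDARY--
""")

# A multipart message's payload is the list of its sub-parts.
body, attachment = msg.get_payload()

print(body.get_content_charset())        # charset declared on the body
print(body.get_param('format'))          # extra Content-Type parameter
print(attachment.get_content_charset())  # None: no charset was declared
print(attachment.get_filename())         # name suggested by the sender
```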

>>> config.pop('test config')