This is an example of how to write WSGI middleware with WebOb. The
specific example adds a simple comment form to HTML web pages; any
HTML page served through the middleware gets a comment form added to
it and shows any existing comments.

The finished code for this is available in
`docs/comment-example-code/example.py
<http://svn.pythonpaste.org/Paste/WebOb/trunk/docs/comment-example-code/example.py>`_
-- you can run that file as a script to try it out.

Instantiating Middleware
------------------------

Middleware of any complexity at all is usually best created as a
class, with its configuration passed as arguments to that class.

Every middleware needs an application (``app``) that it wraps. This
middleware also needs a location to store the comments; we'll put them
all in a single directory.

.. code-block:: python

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

When you use this middleware, you'll use it like:

.. code-block:: python

    app = ... make the application ...
    app = Commenter(app, storage_dir='./comments')

For our application we'll use a simple static file server that is
included with `Paste <http://pythonpaste.org>`_ (use ``pip install
Paste`` to install it). The setup is all at the bottom of
``example.py``, and looks like this:

.. code-block:: python

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY')
        parser.add_option(
            '-p', '--port', default=8080, dest='port', type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data', default='./comments', dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print('Serving on http://localhost:%s' % options.port)
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print('^C')

I won't explain it here, but basically it takes some options, creates
an application that serves static files
(``StaticURLParser(base_dir)``), wraps it with ``Commenter(app,
options.comment_data)``, then serves that.

While we've created the class structure for the middleware, it doesn't
actually do anything. Here's a kind of minimal version of the
middleware (using WebOb):

.. code-block:: python

    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

This doesn't modify the response in any way. You could write it like
this:

.. code-block:: python

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)

But it won't be as convenient later. First, let's create a little bit
of infrastructure for our middleware. We need to save and load
per-URL data (the comments themselves). We'll keep them in pickles,
where each URL has a pickle file named after the URL (URL-quoted with
no characters left unescaped, so ``http://localhost:8080/index.html``
becomes ``http%3A%2F%2Flocalhost%3A8080%2Findex.html``).

.. code-block:: python

    # Python 2 names; on Python 3 use pickle and urllib.parse.quote
    import urllib
    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                return []
            else:
                f = open(filename, 'rb')
                data = load(f)
                f.close()
                return data

        def save_data(self, url, data):
            filename = self.url_filename(url)
            f = open(filename, 'wb')
            dump(data, f)
            f.close()

        def url_filename(self, url):
            # Quoting every reserved character (including "/") makes
            # the filename safe
            return os.path.join(self.storage_dir, urllib.quote(url, ''))

You can get the full request URL with ``req.url``, so to get the
comment data with these methods you do ``data =
self.get_data(req.url)``.

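For readers on Python 3, the same storage idea can be sketched with only the standard library. This is an illustration rather than the tutorial's code: ``CommentStore`` is a hypothetical stand-alone class, and it uses ``pickle`` and ``urllib.parse.quote`` in place of the Python 2 modules.

```python
import os
import pickle
import tempfile
from urllib.parse import quote


class CommentStore:
    """Hypothetical stand-alone version of the storage helpers."""

    def __init__(self, storage_dir):
        self.storage_dir = storage_dir
        os.makedirs(storage_dir, exist_ok=True)

    def url_filename(self, url):
        # safe='' also quotes "/", so the whole URL becomes one filename.
        return os.path.join(self.storage_dir, quote(url, safe=''))

    def get_data(self, url):
        filename = self.url_filename(url)
        if not os.path.exists(filename):
            return []  # no comments stored for this URL yet
        with open(filename, 'rb') as f:
            return pickle.load(f)

    def save_data(self, url, data):
        with open(self.url_filename(url), 'wb') as f:
            pickle.dump(data, f)


store = CommentStore(tempfile.mkdtemp())
url = 'http://localhost:8080/index.html'
store.save_data(url, [{'name': 'John Doe'}])
```

Calling ``store.get_data(url)`` afterwards returns the saved list, and a URL with no stored comments yields an empty list.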
Now we'll update the ``__call__`` method to filter *some* responses,
and get the comment data for those. We don't want to change responses
that were error responses (anything but ``200``), nor do we want to
filter responses that aren't HTML. So we get:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_int != 200:
                return resp(environ, start_response)
            data = self.get_data(req.url)
            ... do stuff with data, update resp ...
            return resp(environ, start_response)

So far we're punting on actually adding the comments to the page. We
also haven't defined what ``data`` will hold. Let's say it's a list
of dictionaries, where each dictionary looks like ``{'name': 'John
Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great
...'}``.

We'll also need a simple method to add stuff to the page. We'll use a
regular expression to find the end of the page and put text in:

.. code-block:: python

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]

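To see how the regular expression behaves, here is the same logic as a free function run on two small inputs. It is a sketch for Python 3 strings; the sample pages are made up.

```python
import re

_end_body_re = re.compile(r'</body.*?>', re.I | re.S)


def add_to_end(html, extra_html):
    """Insert extra_html just before </body>, or append it if there is none."""
    match = _end_body_re.search(html)
    if not match:
        return html + extra_html
    return html[:match.start()] + extra_html + html[match.start():]


# re.I makes the match case-insensitive, so </BODY> is found too.
page = '<html><body><p>Hi</p></BODY></html>'
with_form = add_to_end(page, '<hr>')

# Without a closing body tag the extra HTML is simply appended.
fragment = add_to_end('<p>No body tag</p>', '<hr>')
```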
And then we'll use it like:

.. code-block:: python

    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)

We get the body, update it, and put it back in the response. Setting
``resp.body`` also updates ``Content-Length``. Then we define:

.. code-block:: python

    import time
    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later), and a section for
each comment. Note that ``html_escape`` is the same as ``cgi.escape``
and just turns ``&`` into ``&amp;``, etc.

Because we put in the comment text without quoting it, the page is
susceptible to a `Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack. Fixing
that is beyond the scope of this tutorial; you could quote it or clean
it with something like `lxml.html.clean
<http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.

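The simpler of the two options, quoting, might look like the following sketch. It uses Python 3's ``html.escape`` rather than WebOb's ``html_escape``, and ``format_comment_body`` is a hypothetical helper, not part of ``example.py``. Escaping removes all markup from comments, which is the price of not needing an HTML cleaner.

```python
from html import escape


def format_comment_body(raw_text):
    # Hypothetical helper: escape markup first, then keep line breaks visible.
    return '<p>%s</p>' % escape(raw_text).replace('\n', '<br>\n')


rendered = format_comment_body('<script>alert("x")</script>\nNice page & thanks')
```

The script tag ends up as inert text (``&lt;script&gt;...``) instead of being executed by the browser.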
All of those pieces *display* comments, but still no one can actually
make comments. To handle this we'll take a little piece of the URL
space for our own, everything under ``/.comments``, so when someone
POSTs there it will add a comment.

When the request comes in there are two parts to the path:
``SCRIPT_NAME`` and ``PATH_INFO``. Everything in ``SCRIPT_NAME`` has
already been parsed, and everything in ``PATH_INFO`` has yet to be
parsed. That means that the URL *without* ``PATH_INFO`` is the path
to the middleware; we can intercept anything else below
``SCRIPT_NAME`` but nothing above it. The name for the URL without
``PATH_INFO`` is ``req.application_url``. We have to capture it early
to make sure it doesn't change (since the WSGI application we are
wrapping may update ``SCRIPT_NAME`` and ``PATH_INFO``).

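The split can be explored with the standard library's ``wsgiref.util``, without WebOb. This is only a sketch: the host, the ``/blog`` mount point, and the variable names are made up, and ``application_uri`` serves as a rough stand-in for ``req.application_url``.

```python
from wsgiref.util import application_uri, setup_testing_defaults, shift_path_info

environ = {
    'HTTP_HOST': 'localhost:8080',
    'SCRIPT_NAME': '/blog',             # already consumed by whatever mounted us
    'PATH_INFO': '/.comments/extra',    # still to be interpreted
}
setup_testing_defaults(environ)  # fills in scheme, method, etc. for testing

# URL up to and including SCRIPT_NAME, captured before anything changes it.
base_url = application_uri(environ)

# A manual "peek" at the next path segment, leaving environ untouched.
first_segment = environ['PATH_INFO'].lstrip('/').split('/', 1)[0]

# shift_path_info moves one segment from PATH_INFO onto SCRIPT_NAME, which is
# the kind of update a wrapped application may make to the shared environ.
shifted = shift_path_info(environ)
url_after_shift = application_uri(environ)
```

After the shift, ``url_after_shift`` no longer equals ``base_url``, which is why the middleware records its base URL before calling the wrapped application.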
So here's what this all looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base path of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_int != 200:
                # Not an HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the path where the middleware is located (if you run
the example server, it will be ``http://localhost:PORT/``). We use
``req.path_info_peek()`` to look at the next segment of the URL --
what comes after ``base_url``. If it is ``.comments`` then we handle it
internally and don't pass the request on.

We also put in a little guard, ``resp.decode_content()``, in case the
application returns a gzipped response.

Then we get the data, add the comments, add the *form* to make new
comments, and return the result.

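The gzip guard matters because HTML spliced into a compressed byte stream would corrupt it. The idea can be illustrated with the standard library alone; this is not WebOb's implementation, and ``undo_gzip`` and the plain header dictionary are hypothetical.

```python
import gzip


def undo_gzip(headers, body):
    """Return (headers, body) with a gzip Content-Encoding removed, if present."""
    if headers.get('Content-Encoding', '').lower() != 'gzip':
        return headers, body
    plain = gzip.decompress(body)
    # Drop the encoding header and fix the length for the uncompressed body.
    new_headers = {k: v for k, v in headers.items() if k != 'Content-Encoding'}
    new_headers['Content-Length'] = str(len(plain))
    return new_headers, plain


page = b'<html><body>Hello</body></html>'
zipped_headers = {'Content-Type': 'text/html', 'Content-Encoding': 'gzip'}
headers, body = undo_gzip(zipped_headers, gzip.compress(page))
```

Once the body is plain bytes again, inserting the comments and the form is safe.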
Here's what the form looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
             <input type="hidden" name="url" value="%s">
             <table width="100%%">
              <tr><td>Name:</td>
                  <td><input type="text" name="name" style="width: 100%%"></td></tr>
              <tr><td>URL:</td>
                  <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
             </table>
             Comments:<br>
             <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
             <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting. It submits a form with the keys ``url`` (the
URL being commented on), ``name``, ``homepage``, and ``comments``.

If you look at the method call, what we do is call the method then
treat the result as a WSGI application:

.. code-block:: python

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block:: python

    response = self.process_comment(req)
    return response(environ, start_response)

A common pattern in WSGI middleware that *doesn't* use WebOb is to
just do:

.. code-block:: python

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want
to; modifying a traditional WSGI response/application output requires
changing your logic flow considerably.

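To make the comparison concrete, here is roughly what response modification looks like in plain WSGI. It is a simplified sketch using only the standard library: ``AppendFooter`` and ``hello_app`` are made-up names, and the legacy ``write()`` callable is ignored, which a production middleware could not do.

```python
from wsgiref.util import setup_testing_defaults


class AppendFooter:
    """Plain-WSGI middleware (no WebOb) that appends text to 200 HTML pages."""

    def __init__(self, app, footer=b'<!-- footer -->'):
        self.app = app
        self.footer = footer

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the body is known.
            captured['status'] = status
            captured['headers'] = headers
            return lambda data: None  # legacy write() callable, ignored here

        app_iter = self.app(environ, capture)
        try:
            body = b''.join(app_iter)
        finally:
            if hasattr(app_iter, 'close'):
                app_iter.close()

        headers = captured['headers']
        content_type = dict((k.lower(), v) for k, v in headers).get('content-type', '')
        if captured['status'].startswith('200') and content_type.startswith('text/html'):
            body += self.footer
            # Content-Length has to be recomputed by hand.
            headers = [(k, v) for k, v in headers if k.lower() != 'content-length']
            headers.append(('Content-Length', str(len(body))))
        start_response(captured['status'], headers)
        return [body]


def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html'), ('Content-Length', '5')])
    return [b'hello']


environ = {}
setup_testing_defaults(environ)
seen = {}


def record(status, headers, exc_info=None):
    seen['status'], seen['headers'] = status, headers


result = AppendFooter(hello_app)(environ, record)
```

The wrapper has to intercept ``start_response``, buffer the body, and recompute ``Content-Length`` itself, all of which ``req.get_response()`` and ``resp.body`` handle in the WebOb version.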
Here's the actual processing code:

.. code-block:: python

    import time
    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError as e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            data = self.get_data(url)
            data.append(dict(name=name, homepage=homepage,
                             comments=comments, time=time.gmtime()))
            self.save_data(url, data)
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is
somehow malformed), or a redirect back to the original page.

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and
``HTTPSeeOther``) are Response subclasses that can be used to quickly
create responses for these non-200 cases where the response body
usually doesn't matter much.

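The two outcomes can be traced without WebOb by working on the raw form body. This is a standard-library sketch of the decision only: ``comment_outcome`` and ``REQUIRED`` are hypothetical names, and nothing is saved to disk.

```python
from urllib.parse import parse_qs

REQUIRED = ('url', 'name', 'homepage', 'comments')


def comment_outcome(form_body):
    """Return (status, headers, fields) for a urlencoded comment submission."""
    # keep_blank_values=True so an empty field still counts as present.
    parsed = parse_qs(form_body, keep_blank_values=True)
    fields = {key: values[0] for key, values in parsed.items()}
    missing = [key for key in REQUIRED if key not in fields]
    if missing:
        return '400 Bad Request', [('Content-Type', 'text/plain')], None
    # 303 tells the browser to GET the original page after the POST.
    location = fields['url'] + '#comment-area'
    return '303 See Other', [('Location', location)], fields


ok = comment_outcome(
    'url=http%3A%2F%2Flocalhost%3A8080%2Findex.html&name=Ann&homepage=&comments=Hi')
bad = comment_outcome('name=Ann')
```

A complete submission yields a redirect to the commented page's ``#comment-area`` anchor, and an incomplete one yields a 400.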
This shows how to make response-modifying middleware, which is
probably the most difficult kind of middleware to write with WSGI --
modifying the request is quite simple in comparison, as you simply
update ``environ``.