Comment Example
===============

.. contents::

Introduction
------------

This is an example of how to write WSGI middleware with WebOb.  The
middleware adds a simple comment system to HTML web pages: every HTML
page served through it gets a comment form added to it, along with any
comments already left on that page.

Code
----

The finished code for this is available in
`docs/comment-example-code/example.py
<http://svn.pythonpaste.org/Paste/WebOb/trunk/docs/comment-example-code/example.py>`_;
you can run that file as a script to try it out.

Instantiating Middleware
------------------------

Middleware that does anything non-trivial is usually best written as a
class that takes its configuration as arguments to its constructor.

Every middleware needs an application (``app``) that it wraps.  This
middleware also needs a location to store the comments; we'll put them
all in a single directory.

.. code-block:: python

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

To use the middleware, wrap an existing application with it:

.. code-block:: python

    app = ...  # make the application you want to wrap
    app = Commenter(app, storage_dir='./comments')

For the wrapped application we'll use a simple static file server that
is included with `Paste <http://pythonpaste.org>`_ (use ``easy_install
Paste`` to install it).  The setup is all at the bottom of
``example.py``, and looks like this:

.. code-block:: python

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY'
            )
        parser.add_option(
            '-p', '--port',
            default=8080,
            dest='port',
            type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data',
            default='./comments',
            dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print 'Serving on http://localhost:%s' % options.port
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print '^C'

I won't go through it in detail here, but in short it parses the
command-line options, creates an application that serves static files
(``StaticURLParser(base_dir)``), wraps it with ``Commenter(app,
options.comment_data)``, and serves the result with ``wsgiref``'s
simple server.
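
To exercise a WSGI stack without starting a server at all, you can
call it directly with a fake ``environ``.  The sketch below is Python 3
and uses only the standard library's ``wsgiref.util``; ``hello_app``
(a stand-in for ``StaticURLParser``) and ``call_wsgi`` are hypothetical
helpers, not part of ``example.py``:

.. code-block:: python

    from wsgiref.util import setup_testing_defaults


    def hello_app(environ, start_response):
        # Hypothetical stand-in for StaticURLParser: one fixed HTML page
        start_response('200 OK', [('Content-Type', 'text/html')])
        return [b'<html><body>Hello</body></html>']


    def call_wsgi(app, path='/'):
        """Call a WSGI app in-process; return (status, headers, body)."""
        environ = {'SCRIPT_NAME': '', 'PATH_INFO': path}
        setup_testing_defaults(environ)  # adds SERVER_NAME, wsgi.input, ...
        captured = {}

        def start_response(status, headers, exc_info=None):
            captured['status'] = status
            captured['headers'] = headers
            return lambda data: None  # the write() callable, unused here

        result = app(environ, start_response)
        try:
            body = b''.join(result)
        finally:
            # WSGI requires close() to be called if the iterable has it
            if hasattr(result, 'close'):
                result.close()
        return captured['status'], captured['headers'], body


    status, headers, body = call_wsgi(hello_app, '/index.html')
    print(status)  # 200 OK

Because middleware is itself just a WSGI application, the same kind of
helper can, in principle, drive a wrapped stack such as
``Commenter(app, storage_dir)``.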

The Middleware
--------------

The class we've written so far doesn't actually do anything yet.
Here's a minimal working version of the middleware (using WebOb):

.. code-block:: python

    import os
    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

``req.get_response(self.app)`` calls the wrapped application and
returns its output as a ``Response`` object, which we then send on
unchanged.  This doesn't modify the response in any way.  Without WebOb
you could write the same thing as:

.. code-block:: python

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)

That version won't be as convenient later, though.  First, let's
create a little infrastructure for our middleware.  We need to save
and load per-URL data (the comments themselves).  We'll keep them in
pickles, one file per URL, named after the URL quoted with
``urllib.quote(url, '')``, which escapes every reserved character
including ``/``.  So ``http://localhost:8080/index.html`` becomes
``http%3A%2F%2Flocalhost%3A8080%2Findex.html``.

.. code-block:: python

    import os
    import urllib
    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                return []
            else:
                f = open(filename, 'rb')
                data = load(f)
                f.close()
                return data

        def save_data(self, url, data):
            filename = self.url_filename(url)
            f = open(filename, 'wb')
            dump(data, f)
            f.close()

        def url_filename(self, url):
            # Quoting with no safe characters (so even '/' is escaped)
            # turns the whole URL into a single, safe filename
            return os.path.join(self.storage_dir, urllib.quote(url, ''))
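
For readers on Python 3, here is a rough standard-library equivalent of
these helpers, written as plain functions so it runs on its own (a
sketch, not part of ``example.py``).  There, ``urllib.quote`` lives at
``urllib.parse.quote`` and ``cPickle`` is simply ``pickle``; a
temporary directory stands in for ``storage_dir``:

.. code-block:: python

    import os
    import pickle
    import tempfile
    from urllib.parse import quote


    def url_filename(storage_dir, url):
        # safe='' quotes every reserved character, including '/',
        # so the whole URL becomes one flat filename
        return os.path.join(storage_dir, quote(url, safe=''))


    def get_data(storage_dir, url):
        filename = url_filename(storage_dir, url)
        if not os.path.exists(filename):
            return []  # no comments stored for this URL yet
        with open(filename, 'rb') as f:
            return pickle.load(f)


    def save_data(storage_dir, url, data):
        with open(url_filename(storage_dir, url), 'wb') as f:
            pickle.dump(data, f)


    storage_dir = tempfile.mkdtemp()
    url = 'http://localhost:8080/index.html'
    print(os.path.basename(url_filename(storage_dir, url)))
    # http%3A%2F%2Flocalhost%3A8080%2Findex.html
    save_data(storage_dir, url, [{'name': 'John Doe'}])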

You can get the full request URL with ``req.url``, so to get the
comment data for the current page you call
``self.get_data(req.url)``.
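
``req.url`` is rebuilt from the WSGI ``environ``: the scheme, host,
``SCRIPT_NAME``, ``PATH_INFO`` and query string.  The standard
library's ``wsgiref.util.request_uri`` does a similar reconstruction,
which makes for a quick Python 3 illustration (an analogue, not WebOb
itself):

.. code-block:: python

    from wsgiref.util import request_uri

    environ = {
        'wsgi.url_scheme': 'http',
        'HTTP_HOST': 'localhost:8080',
        'SCRIPT_NAME': '',
        'PATH_INFO': '/index.html',
        'QUERY_STRING': 'page=2',
    }
    # Full URL, including the query string
    print(request_uri(environ))
    # Same URL without the query string
    print(request_uri(environ, include_query=False))

Because the query string is part of ``req.url``, this storage scheme
gives ``/index.html?page=2`` its own comment file, separate from
``/index.html``.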

Now we'll update the ``__call__`` method to filter *some* responses
and get the comment data for those.  We don't want to change responses
other than ``200 OK`` (errors, redirects, and so on), nor do we want to
touch responses that aren't HTML.  So we get:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_int != 200:
                return resp(environ, start_response)
            data = self.get_data(req.url)
            # ... do stuff with data, update resp ...
            return resp(environ, start_response)
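
``resp.content_type`` is the media type *without* parameters, so a
``Content-Type: text/html; charset=UTF-8`` response still counts as
HTML, and ``resp.status_int`` is the numeric part of the status line.
If you had to make the same check on raw WSGI ``status`` and
``headers`` values, a helper might look like the following sketch
(``should_filter`` is a hypothetical name; it also lowercases the media
type, since media types are case-insensitive):

.. code-block:: python

    def should_filter(status, headers):
        """Return True for a 200 response whose media type is text/html."""
        status_int = int(status.split(None, 1)[0])  # "200 OK" -> 200
        content_type = None
        for name, value in headers:
            # Header names are case-insensitive
            if name.lower() == 'content-type':
                # Drop parameters such as "; charset=UTF-8"
                content_type = value.split(';', 1)[0].strip().lower()
                break
        return status_int == 200 and content_type == 'text/html'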

So far we're punting on actually adding the comments to the page.  We
also haven't defined what ``data`` will hold.  Let's say it's a list
of dictionaries, where each dictionary looks like ``{'name': 'John
Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great
site!'}``, plus a ``'time'`` key holding the ``time.gmtime()`` value
recorded when the comment was posted.

We'll also need a simple method to add HTML to the page.  It uses a
regular expression to find the closing ``</body>`` tag and inserts the
text just before it, or appends the text if there is no such tag:

.. code-block:: python

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]
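
To see how ``add_to_end`` behaves, here is the same logic pulled out of
the class as a standalone Python 3 function.  The ``re.I`` flag makes
the match case-insensitive, so ``</BODY>`` is found too; with no
closing tag at all, the text is simply appended:

.. code-block:: python

    import re

    _end_body_re = re.compile(r'</body.*?>', re.I | re.S)


    def add_to_end(html, extra_html):
        """Insert extra_html just before </body>, or append it if absent."""
        match = _end_body_re.search(html)
        if not match:
            return html + extra_html
        return html[:match.start()] + extra_html + html[match.start():]


    print(add_to_end('<html><BODY>Hi</BODY></html>', '<hr>'))
    # <html><BODY>Hi<hr></BODY></html>
    print(add_to_end('Hi', '<hr>'))
    # Hi<hr>

On Python 3, WebOb's ``resp.body`` is ``bytes``, so there you would
work with ``resp.text`` (or use a ``bytes`` pattern) instead.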

And then we'll use it like this:

.. code-block:: python

    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)

We get the body, update it, and put it back in the response.  Assigning
to ``resp.body`` also updates the ``Content-Length`` header.  Then we
define ``format_comments``:

.. code-block:: python

    import time
    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later) and a section for
each comment.  Note that ``html_escape`` works like ``cgi.escape``: it
turns ``&`` into ``&amp;``, ``<`` into ``&lt;``, and so on.

Because we insert the comment text itself without escaping it, the
page is susceptible to a `Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack.  Fixing
that is beyond the scope of this tutorial; you could escape the text,
or clean it with something like `lxml.html.clean
<http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.
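
One way to do the escaping, sketched here in Python 3 with the standard
library's ``html.escape`` standing in for ``html_escape``, is to escape
the comment text as well.  The trade-off is that commenters can no
longer post any HTML at all; a sanitizer such as ``lxml.html.clean`` is
the option if you want to allow some:

.. code-block:: python

    import html
    import time


    def format_comments(comments):
        """Render comments as HTML, escaping everything user-supplied."""
        if not comments:
            return ''
        text = ['<hr>',
                '<h2><a name="comment-area"></a>Comments (%s):</h2>'
                % len(comments)]
        for comment in comments:
            # Note: escaping does not stop a "javascript:" homepage URL;
            # a stricter version would also check the URL scheme.
            text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                html.escape(comment['homepage']), html.escape(comment['name']),
                time.strftime('%c', comment['time'])))
            # Escaped, so any markup in the comment is shown as plain text
            text.append('<p>%s</p>' % html.escape(comment['comments']))
        return ''.join(text)


    comments = [{'name': 'Mallory', 'homepage': 'http://example.com',
                 'comments': '<script>alert(1)</script>',
                 'time': time.gmtime(0)}]
    print(format_comments(comments))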

Accepting Comments
------------------

All of those pieces *display* comments, but still no one can actually
make comments.  To handle this we'll take a little piece of the URL
space for our own, everything under ``/.comments``; when someone POSTs
there, we add a comment.

When the request comes in there are two parts to the path:
``SCRIPT_NAME`` and ``PATH_INFO``.  Everything in ``SCRIPT_NAME`` has
already been consumed on the way to this middleware, and everything in
``PATH_INFO`` has yet to be handled.  That means that the URL *without*
``PATH_INFO`` is the URL of the middleware itself; we can intercept
anything below ``SCRIPT_NAME`` but nothing above it.  WebOb calls the
URL without ``PATH_INFO`` ``req.application_url``.  We have to capture
it before calling the wrapped application, because that application
may update ``SCRIPT_NAME`` and ``PATH_INFO`` (they live in the same
``environ`` dictionary).
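
To make the ``SCRIPT_NAME``/``PATH_INFO`` split concrete, here is a
Python 3 sketch using only the standard library.  ``peek_segment`` is a
hypothetical helper approximating what ``req.path_info_peek()``
returns, while ``wsgiref.util.shift_path_info`` is the standard
library's function for *consuming* a segment, moving it from
``PATH_INFO`` to ``SCRIPT_NAME`` the way a dispatching application
might:

.. code-block:: python

    from wsgiref.util import shift_path_info


    def peek_segment(environ):
        """Return the next PATH_INFO segment without modifying environ."""
        path = environ.get('PATH_INFO', '')
        if not path:
            return None
        return path.lstrip('/').split('/', 1)[0]


    environ = {'SCRIPT_NAME': '', 'PATH_INFO': '/blog/index.html'}
    print(peek_segment(environ))  # blog
    print(environ['PATH_INFO'])   # /blog/index.html (unchanged)

    # A downstream application dispatching on the first segment:
    print(shift_path_info(environ))                      # blog
    print(environ['SCRIPT_NAME'], environ['PATH_INFO'])  # /blog /index.html

After the shift, anything reading ``SCRIPT_NAME`` sees ``/blog``,
which is why the middleware records ``req.application_url`` before
calling the wrapped application.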

Here's what this all looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base URL of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_int != 200:
                # Not an HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the URL where the middleware is located (if you run the
example server, it will be ``http://localhost:PORT``, with no trailing
slash).  We use ``req.path_info_peek()`` to look at the next segment of
``PATH_INFO``, the part that comes right after ``base_url``, without
consuming it.  If that segment is ``.comments``, we handle the request
internally and don't pass it on.

We also put in a little guard, ``resp.decode_content()``, in case the
application returns a gzipped response; otherwise we would be searching
compressed bytes for ``</body>``.

Then we get the data, add the comments, add the *form* to make new
comments, and return the result.

submit_form
~~~~~~~~~~~

Here's what the form looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
             <input type="hidden" name="url" value="%s">
             <table width="100%%">
              <tr><td>Name:</td>
                  <td><input type="text" name="name" style="width: 100%%"></td></tr>
              <tr><td>URL:</td>
                  <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
             </table>
             Comments:<br>
             <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
             <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting.  It submits a form with the keys ``url`` (the
URL being commented on), ``name``, ``homepage``, and ``comments``.
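
Submitting this form sends an ``application/x-www-form-urlencoded``
body with those four fields.  The Python 3 sketch below builds such a
body with made-up values and parses it back with the standard library,
which is roughly the work ``req.params`` does for you (``req.params``
also merges in any query-string parameters):

.. code-block:: python

    from urllib.parse import parse_qs, urlencode

    # Roughly what the browser POSTs to <base_url>/.comments
    body = urlencode({
        'url': 'http://localhost:8080/index.html',
        'name': 'John Doe',
        'homepage': '',  # the homepage field may be left empty
        'comments': 'Great site!',
    })

    # parse_qs maps each key to a list (a key may repeat); it drops blank
    # values by default, so keep_blank_values=True keeps 'homepage'.
    fields = {key: values[0]
              for key, values in parse_qs(body, keep_blank_values=True).items()}
    print(fields)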

process_comment
~~~~~~~~~~~~~~~

Look at how ``__call__`` uses this method: it calls
``process_comment`` and then treats the result as a WSGI application:

.. code-block:: python

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block:: python

    response = self.process_comment(req)
    return response(environ, start_response)
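
This works because a WebOb ``Response`` is itself a WSGI application:
calling it with ``environ`` and ``start_response`` sends its status,
headers, and body.  The toy Python 3 class below illustrates the idea;
``TinyResponse`` and ``make_reply`` are hypothetical and much simpler
than WebOb's real ``Response``:

.. code-block:: python

    class TinyResponse(object):
        """A response object that is also a WSGI application (toy version)."""

        def __init__(self, body=b'', status='200 OK', content_type='text/plain'):
            self.status = status
            self.headers = [('Content-Type', content_type)]
            self.body = body

        def __call__(self, environ, start_response):
            # Calling the object sends the stored response, like any WSGI app
            headers = self.headers + [('Content-Length', str(len(self.body)))]
            start_response(self.status, headers)
            return [self.body]


    def make_reply(environ):
        # Hypothetical handler: returns a response object instead of
        # calling start_response itself, so the caller can still change it
        return TinyResponse(b'Thanks!')


    reply = make_reply({})
    reply.status = '202 Accepted'  # modified after creation, before sending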

A common pattern in WSGI middleware that *doesn't* use WebOb is to
just do:

.. code-block:: python

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want
to; modifying the output of a traditional WSGI application requires
changing your logic flow considerably, since you have to intercept
``start_response`` and collect the body yourself.
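
Here is a rough sketch of what that involves: response filtering written
against bare WSGI with no WebOb (Python 3; ``AppendFooter`` is
hypothetical, and it simply appends its footer rather than looking for
``</body>``).  It must substitute its own ``start_response`` to capture
the status and headers, collect the body from the returned iterable,
and fix up ``Content-Length`` itself.  For brevity it doesn't support
the ``write()`` callable or ``exc_info``, which a robust version would
need to handle:

.. code-block:: python

    class AppendFooter(object):
        """Append HTML to 200 text/html responses using only bare WSGI."""

        def __init__(self, app, footer=b'<p>footer</p>'):
            self.app = app
            self.footer = footer

        def __call__(self, environ, start_response):
            captured = {}

            def capture(status, headers, exc_info=None):
                # Hold on to status/headers; the real start_response is
                # called only once we've seen (and maybe changed) the body
                captured['status'] = status
                captured['headers'] = headers
                return lambda data: None  # write() is not supported here

            result = self.app(environ, capture)
            try:
                body = b''.join(result)
            finally:
                if hasattr(result, 'close'):
                    result.close()

            status, headers = captured['status'], captured['headers']
            content_type = ''
            for name, value in headers:
                if name.lower() == 'content-type':
                    content_type = value.split(';', 1)[0].strip()
            if status.split(None, 1)[0] == '200' and content_type == 'text/html':
                body += self.footer
                # The old Content-Length is now wrong, so replace it
                headers = [(name, value) for name, value in headers
                           if name.lower() != 'content-length']
                headers.append(('Content-Length', str(len(body))))
            start_response(status, headers)
            return [body]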

Here's the actual processing code:

.. code-block:: python

    import time
    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError as e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            data = self.get_data(url)
            data.append(dict(
                name=name,
                homepage=homepage,
                comments=comments,
                time=time.gmtime()))
            self.save_data(url, data)
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is
somehow malformed), or a redirect back to the original page.
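
Stripped of WebOb, the decision ``process_comment`` makes can be
sketched as a plain Python 3 function that returns a status line,
headers, and body.  ``comment_response`` is a hypothetical name; the
real code returns ``webob.exc`` response objects instead, and this
sketch leaves out saving the comment:

.. code-block:: python

    REQUIRED = ('url', 'name', 'homepage', 'comments')


    def comment_response(params):
        """Return (status, headers, body) for a submitted comment form."""
        missing = [key for key in REQUIRED if key not in params]
        if missing:
            # Corresponds to exc.HTTPBadRequest: a malformed submission
            return ('400 Bad Request', [('Content-Type', 'text/plain')],
                    'Missing parameter: %s' % missing[0])
        # ... append the comment and save it here ...
        # 303 See Other makes the browser GET the page again, so
        # reloading it doesn't resubmit the POST
        location = params['url'] + '#comment-area'
        return ('303 See Other', [('Location', location)], '')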

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and
``HTTPSeeOther``) are ``Response`` subclasses that can be used to
quickly create responses for these non-200 cases where the response
body usually doesn't matter much.

Conclusion
----------

This shows how to make response-modifying middleware, which is
probably the most difficult kind of middleware to write with WSGI.
Modifying the request is quite simple in comparison, as you simply
update ``environ``.
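
For contrast, a request-modifying middleware only has to change
``environ`` before passing it on.  A minimal Python 3 sketch, using a
hypothetical ``StripPrefix`` class that moves a fixed path prefix (for
example ``/site``) from ``PATH_INFO`` to ``SCRIPT_NAME`` so the wrapped
application sees paths relative to where it is mounted:

.. code-block:: python

    class StripPrefix(object):
        """Move a fixed prefix from PATH_INFO to SCRIPT_NAME, then call the app."""

        def __init__(self, app, prefix):
            self.app = app
            self.prefix = prefix.rstrip('/')

        def __call__(self, environ, start_response):
            path = environ.get('PATH_INFO', '')
            if path == self.prefix or path.startswith(self.prefix + '/'):
                # Only environ changes; the response passes through untouched
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + self.prefix
                environ['PATH_INFO'] = path[len(self.prefix):]
            return self.app(environ, start_response)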