lib/cherrypy/_cpreqbody.py

   1 """Request body processing for CherryPy.
   2
   3 .. versionadded:: 3.2
   4
   5 Application authors have complete control over the parsing of HTTP request
   6 entities. In short, :attr:`cherrypy.request.body<cherrypy._cprequest.Request.body>`
   7 is now always set to an instance of :class:`RequestBody<cherrypy._cpreqbody.RequestBody>`,
   8 and *that* class is a subclass of :class:`Entity<cherrypy._cpreqbody.Entity>`.
   9
  10 When an HTTP request includes an entity body, it is often desirable to
  11 provide that information to applications in a form other than the raw bytes.
  12 Different content types demand different approaches. Examples:
  13
  14  * For a GIF file, we want the raw bytes in a stream.
  15  * An HTML form is better parsed into its component fields, and each text field
  16    decoded from bytes to unicode.
  17  * A JSON body should be deserialized into a Python dict or list.
  18
  19 When the request contains a Content-Type header, the media type is used as a
  20 key to look up a value in the
  21 :attr:`request.body.processors<cherrypy._cpreqbody.Entity.processors>` dict.
  22 If the full media
  23 type is not found, then the major type is tried; for example, if no processor
  24 is found for the 'image/jpeg' type, then we look for one registered under the
  25 major type 'image'. If neither the full type nor the major type has a matching
  26 processor, then a default processor is used
  27 (:func:`default_proc<cherrypy._cpreqbody.Entity.default_proc>`). For most
  28 types, this means no processing is done, and the body is left unread as a
  29 raw byte stream. Processors are configurable in an 'on_start_resource' hook.
  30
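As a rough illustration of the lookup order just described (full media type,
then major type, then a default), here is a minimal, self-contained sketch.
The ``find_processor`` helper and the string labels standing in for processor
functions are hypothetical; they are not part of CherryPy.

```python
def find_processor(processors, media_type, default=None):
    # Try the full media type first, e.g. 'image/jpeg'.
    if media_type in processors:
        return processors[media_type]
    # Fall back to the major type, e.g. 'image'.
    major_type = media_type.split('/', 1)[0]
    return processors.get(major_type, default)


# Hypothetical processors, represented by labels for illustration only.
processors = {'multipart/form-data': 'form-data processor',
              'multipart': 'generic multipart processor'}

full_match = find_processor(processors, 'multipart/form-data')
major_match = find_processor(processors, 'multipart/mixed')
no_match = find_processor(processors, 'image/jpeg', default='default_proc')
```

Here ``multipart/mixed`` has no exact entry, so it falls through to the
``multipart`` entry, while ``image/jpeg`` matches nothing and gets the default.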
  31 Some processors, especially those for the 'text' types, attempt to decode bytes
  32 to unicode. If the Content-Type request header includes a 'charset' parameter,
  33 this is used to decode the entity. Otherwise, one or more default charsets may
  34 be attempted, although this decision is up to each processor. If a processor
  35 successfully decodes an Entity or Part, it should set the
  36 :attr:`charset<cherrypy._cpreqbody.Entity.charset>` attribute
  37 on the Entity or Part to the name of the successful charset, so that
  38 applications can easily re-encode or transcode the value if they wish.
  39
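The try-each-charset approach can be sketched as follows. This is only an
illustration under simple assumptions: ``decode_with_charsets`` is a
hypothetical helper, and a real processor would record the winning charset on
the Entity or Part rather than return it.

```python
def decode_with_charsets(raw, attempt_charsets):
    # Try each charset in order; return the text and the charset that worked.
    for charset in attempt_charsets:
        try:
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            continue
    raise ValueError('Could not decode with any of: %r' % (attempt_charsets,))


# The bytes for 'caf' followed by U+00E9 in UTF-8; not valid US-ASCII.
raw = bytes(bytearray([0x63, 0x61, 0x66, 0xC3, 0xA9]))
text, charset = decode_with_charsets(raw, ['us-ascii', 'utf-8'])
```

Because the first charset fails on the non-ASCII bytes, the second one is
tried, and its name is what an application would later use to re-encode.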
  40 If the Content-Type of the request entity is of major type 'multipart', then
  41 the above parsing process, and possibly a decoding process, is performed for
  42 each part.
  43
  44 For both the full entity and multipart parts, a Content-Disposition header may
  45 be used to fill :attr:`name<cherrypy._cpreqbody.Entity.name>` and
  46 :attr:`filename<cherrypy._cpreqbody.Entity.filename>` attributes on the
  47 request.body or the Part.
  48
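To make the ``name`` and ``filename`` extraction concrete, here is a
deliberately simplified sketch. ``parse_disposition`` is a hypothetical helper,
not CherryPy's parser, and it assumes parameter values never contain a
semicolon.

```python
def parse_disposition(header_value):
    # Simplified: split on ';' and strip one pair of surrounding double quotes.
    # Real headers may contain quoted ';' characters, which this ignores.
    pieces = [p.strip() for p in header_value.split(';')]
    params = {}
    for piece in pieces[1:]:
        if '=' not in piece:
            continue
        key, value = piece.split('=', 1)
        value = value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        params[key.strip().lower()] = value
    return params.get('name'), params.get('filename')


name, filename = parse_disposition(
    'form-data; name="avatar"; filename="me.gif"')
field_name, no_filename = parse_disposition('form-data; name="title"')
```

A part without a ``filename`` parameter yields ``None`` for it, which is how a
regular form field can be told apart from a file upload.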
  49 .. _custombodyprocessors:
  50
  51 Custom Processors
  52 =================
  53
  54 You can add your own processors for any specific or major MIME type. Simply add
  55 it to the :attr:`processors<cherrypy._cpreqbody.Entity.processors>` dict in a
  56 hook/tool that runs at ``on_start_resource`` or ``before_request_body``.
  57 Here is the built-in JSON tool as an example::
  58
  59     def json_in(force=True, debug=False):
  60         request = cherrypy.serving.request
  61         def json_processor(entity):
  62             \"""Read application/json data into request.json.\"""
  63             if not entity.headers.get("Content-Length", ""):
  64                 raise cherrypy.HTTPError(411)
  65
  66             body = entity.fp.read()
  67             try:
  68                 request.json = json_decode(body)
  69             except ValueError:
  70                 raise cherrypy.HTTPError(400, 'Invalid JSON document')
  71         if force:
  72             request.body.processors.clear()
  73             request.body.default_proc = cherrypy.HTTPError(
  74                 415, 'Expected an application/json content type')
  75         request.body.processors['application/json'] = json_processor
  76
  77 We begin by defining a new ``json_processor`` function to stick in the ``processors``
  78 dictionary. All processor functions take a single argument, the ``Entity`` instance
  79 they are to process. The processor is called whenever a request arrives (for those
  80 URIs where the tool is turned on) with a ``Content-Type`` of
  81 "application/json".
  82
  83 First, it checks for a ``Content-Length`` header (raising 411 if it is missing or
  84 empty), then reads the remaining bytes on the socket. The ``fp`` object knows its
  85 own length, so it won't hang waiting for data that never arrives; it returns once
  86 all data has been read. Then, we decode those bytes using Python's built-in ``json``
  87 module, and stick the decoded result onto ``request.json``. If it cannot be decoded,
  88 we raise 400.
  89
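The same steps can be tried outside a running server with a stand-in entity.
This is a sketch under stated assumptions: ``FakeEntity`` and
``sketch_json_processor`` are hypothetical names, ``ValueError`` stands in for
``cherrypy.HTTPError``, and the body is assumed to be UTF-8.

```python
import io
import json


class FakeEntity(object):
    # Hypothetical stand-in for an Entity: just headers and a readable fp.
    def __init__(self, body, headers):
        self.fp = io.BytesIO(body)
        self.headers = headers


def sketch_json_processor(entity):
    # Require a Content-Length, mirroring the 411 check.
    if not entity.headers.get('Content-Length', ''):
        raise ValueError('411 Length Required')
    body = entity.fp.read()
    try:
        # Assumes UTF-8; decoding and parsing errors are both ValueErrors.
        return json.loads(body.decode('utf-8'))
    except ValueError:
        raise ValueError('400 Invalid JSON document')


body = b'{"answer": 42}'
decoded = sketch_json_processor(
    FakeEntity(body, {'Content-Length': str(len(body))}))
```

Unlike the real tool, this sketch returns the decoded value instead of
attaching it to ``request.json``, which keeps it free of any request state.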
  90 If the "force" argument is True (the default), the ``Tool`` clears the ``processors``
  91 dict so that request entities of other ``Content-Types`` aren't parsed at all. With no
  92 entry for those types, the ``default_proc`` method of ``cherrypy.request.body`` is
  93 called. By default it does nothing, leaving the body for the page handler to read.
  94 In our case we want to raise 415 instead, so we replace ``request.body.default_proc``
  95 with the error (``HTTPError`` instances, when called, raise themselves).
  96
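The trick of replacing a method with an exception instance relies on the
instance being callable. A minimal sketch of that pattern, using a
hypothetical ``CallableError`` class and ``Body`` stand-in rather than
CherryPy's own ``HTTPError`` and request body:

```python
class CallableError(Exception):
    # Hypothetical exception whose instances raise themselves when called.
    def __call__(self, *args, **kwargs):
        raise self


class Body(object):
    # Hypothetical stand-in for a request body with a no-op default.
    def default_proc(self):
        return 'left unread'


body = Body()
before = body.default_proc()

# Shadow the method on this one instance with a callable exception.
body.default_proc = CallableError('415 Expected an application/json content type')
try:
    body.default_proc()
    outcome = 'no error'
except CallableError as exc:
    outcome = str(exc)
```

Because the attribute is set on the instance, other ``Body`` objects keep the
original no-op behaviour.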
  97 To install a custom processor without making a ``Tool``, just add the config entry::
  98
  99     request.body.processors = {'application/json': json_processor}
 100
 101 Note that you can only replace the ``processors`` dict wholesale this way, not update the existing one.
 102 """
 103
 104 import re
 105 import sys
 106 import tempfile
 107 from urllib import unquote_plus
 108
 109 import cherrypy
 110 from cherrypy._cpcompat import basestring, ntob, ntou
 111 from cherrypy.lib import httputil
 112
 113
 114 # -------------------------------- Processors -------------------------------- #
 115
 116 def process_urlencoded(entity):
 117     """Read application/x-www-form-urlencoded data into entity.params."""
 118     qs = entity.fp.read()
 119     for charset in entity.attempt_charsets:
 120         try:
 121             params = {}
 122             for aparam in qs.split(ntob('&')):
 123                 for pair in aparam.split(ntob(';')):
 124                     if not pair:
 125                         continue
 126
 127                     atoms = pair.split(ntob('='), 1)
 128                     if len(atoms) == 1:
 129                         atoms.append(ntob(''))
 130
 131                     key = unquote_plus(atoms[0]).decode(charset)
 132                     value = unquote_plus(atoms[1]).decode(charset)
 133
 134                     if key in params:
 135                         if not isinstance(params[key], list):
 136                             params[key] = [params[key]]
 137                         params[key].append(value)
 138                     else:
 139                         params[key] = value
 140         except UnicodeDecodeError:
 141             pass
 142         else:
 143             entity.charset = charset
 144             break
 145     else:
 146         raise cherrypy.HTTPError(
 147             400, "The request entity could not be decoded. The following "
 148             "charsets were attempted: %s" % repr(entity.attempt_charsets))
 149
 150     # Now that all values have been successfully parsed and decoded,
 151     # apply them to the entity.params dict.
 152     for key, value in params.items():
 153         if key in entity.params:
 154             if not isinstance(entity.params[key], list):
 155                 entity.params[key] = [entity.params[key]]
 156             entity.params[key].append(value)
 157         else:
 158             entity.params[key] = value
 159
 160
 161 def process_multipart(entity):
 162     """Read all multipart parts into entity.parts."""
 163     ib = ""
 164     if 'boundary' in entity.content_type.params:
 165         # http://tools.ietf.org/html/rfc2046#section-5.1.1
 166         # "The grammar for parameters on the Content-type field is such that it
 167         # is often necessary to enclose the boundary parameter values in quotes
 168         # on the Content-type line"
 169         ib = entity.content_type.params['boundary'].strip('"')
 170
 171     if not re.match("^[ -~]{0,200}[!-~]$", ib):
 172         raise ValueError('Invalid boundary in multipart form: %r' % (ib,))
 173
 174     ib = ('--' + ib).encode('ascii')
 175
 176     # Find the first marker
 177     while True:
 178         b = entity.readline()
 179         if not b:
 180             return
 181
 182         b = b.strip()
 183         if b == ib:
 184             break
 185
 186     # Read all parts
 187     while True:
 188         part = entity.part_class.from_fp(entity.fp, ib)
 189         entity.parts.append(part)
 190         part.process()
 191         if part.fp.done:
 192             break
 193
 194 def process_multipart_form_data(entity):
 195     """Read all multipart/form-data parts into entity.parts or entity.params."""
 196     process_multipart(entity)
 197
 198     kept_parts = []
 199     for part in entity.parts:
 200         if part.name is None:
 201             kept_parts.append(part)
 202         else:
 203             if part.filename is None:
 204                 # It's a regular field
 205                 value = part.fullvalue()
 206             else:
 207                 # It's a file upload. Retain the whole part so consumer code
 208                 # has access to its .file and .filename attributes.
 209                 value = part
 210
 211             if part.name in entity.params:
 212                 if not isinstance(entity.params[part.name], list):
 213                     entity.params[part.name] = [entity.params[part.name]]
 214                 entity.params[part.name].append(value)
 215             else:
 216                 entity.params[part.name] = value
 217
 218     entity.parts = kept_parts
 219
 220 def _old_process_multipart(entity):
 221     """The behavior of 3.2 and lower. Deprecated and will be changed in 3.3."""
 222     process_multipart(entity)
 223
 224     params = entity.params
 225
 226     for part in entity.parts:
 227         if part.name is None:
 228             key = ntou('parts')
 229         else:
 230             key = part.name
 231
 232         if part.filename is None:
 233             # It's a regular field
 234             value = part.fullvalue()
 235         else:
 236             # It's a file upload. Retain the whole part so consumer code
 237             # has access to its .file and .filename attributes.
 238             value = part
 239
 240         if key in params:
 241             if not isinstance(params[key], list):
 242                 params[key] = [params[key]]
 243             params[key].append(value)
 244         else:
 245             params[key] = value
 246
 247
 248
 249 # --------------------------------- Entities --------------------------------- #
 250
 251
 252 class Entity(object):
 253     """An HTTP request body, or MIME multipart body.
 254
 255     This class collects information about the HTTP request entity. When a
 256     given entity is of MIME type "multipart", each part is parsed into its own
 257     Entity instance, and the set of parts stored in
 258     :attr:`entity.parts<cherrypy._cpreqbody.Entity.parts>`.
 259
 260     Between the ``before_request_body`` and ``before_handler`` tools, CherryPy
 261     tries to process the request body (if any) by calling
 262     :func:`request.body.process<cherrypy._cpreqbody.RequestBody.process>`.
 263     This uses the ``content_type`` of the Entity to look up a suitable processor
 264     in :attr:`Entity.processors<cherrypy._cpreqbody.Entity.processors>`, a dict.
 265     If a matching processor cannot be found for the complete Content-Type,
 266     it tries again using the major type. For example, if a request with an
 267     entity of type "image/jpeg" arrives, but no processor can be found for
 268     that complete type, then one is sought for the major type "image". If a
 269     processor is still not found, then the
 270     :func:`default_proc<cherrypy._cpreqbody.Entity.default_proc>` method of the
 271     Entity is called (which does nothing by default; you can override this too).
 272
 273     CherryPy includes processors for the "application/x-www-form-urlencoded"
 274     type, the "multipart/form-data" type, and the "multipart" major type.
 275     CherryPy 3.2 processes these types almost exactly as older versions.
 276     Parts are passed as arguments to the page handler using their
 277     ``Content-Disposition.name`` if given, otherwise in a generic "parts"
 278     argument. Each such part is either a string, or the
 279     :class:`Part<cherrypy._cpreqbody.Part>` itself if it's a file. (In this
 280     case it will have ``file`` and ``filename`` attributes, or possibly a
 281     ``value`` attribute). Each Part is itself a subclass of
 282     Entity, and has its own ``process`` method and ``processors`` dict.
 283
 284     There is a separate processor for the "multipart" major type which is more
 285     flexible, and simply stores all multipart parts in
 286     :attr:`request.body.parts<cherrypy._cpreqbody.Entity.parts>`. You can
 287     enable it with::
 288
 289         cherrypy.request.body.processors['multipart'] = _cpreqbody.process_multipart
 290
 291     in an ``on_start_resource`` tool.
 292     """
 293
 294     # http://tools.ietf.org/html/rfc2046#section-4.1.2:
 295     # "The default character set, which must be assumed in the
 296     # absence of a charset parameter, is US-ASCII."
 297     # However, many browsers send data in utf-8 with no charset.
 298     attempt_charsets = ['utf-8']
 299     """A list of strings, each of which should be a known encoding.
 300
 301     When the Content-Type of the request body warrants it, each of the given
 302     encodings will be tried in order. The first one to successfully decode the
 303     entity without raising an error is stored as
 304     :attr:`entity.charset<cherrypy._cpreqbody.Entity.charset>`. This defaults
 305     to ``['utf-8']`` (plus 'ISO-8859-1' for "text/\*" types, as required by
 306     `HTTP/1.1 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_),
 307     but ``['us-ascii', 'utf-8']`` for multipart parts.
 308     """
 309
 310     charset = None
 311     """The successful decoding; see "attempt_charsets" above."""
 312
 313     content_type = None
 314     """The value of the Content-Type request header.
 315
 316     If the Entity is part of a multipart payload, this will be the Content-Type
 317     given in the MIME headers for this part.
 318     """
 319
 320     default_content_type = 'application/x-www-form-urlencoded'
 321     """This defines a default ``Content-Type`` to use if no Content-Type header
 322     is given. The empty string is used for RequestBody, which results in the
 323     request body not being read or parsed at all. This is by design; a missing
 324     ``Content-Type`` header in the HTTP request entity is an error at best,
 325     and a security hole at worst. For multipart parts, however, the MIME spec
 326     declares that a part with no Content-Type defaults to "text/plain"
 327     (see :class:`Part<cherrypy._cpreqbody.Part>`).
 328     """
 329
 330     filename = None
 331     """The ``Content-Disposition.filename`` header, if available."""
 332
 333     fp = None
 334     """The readable socket file object."""
 335
 336     headers = None
 337     """A dict of request/multipart header names and values.
 338
 339     This is a copy of the ``request.headers`` for the ``request.body``;
 340     for multipart parts, it is the set of headers for that part.
 341     """
 342
 343     length = None
 344     """The value of the ``Content-Length`` header, if provided."""
 345
 346     name = None
 347     """The "name" parameter of the ``Content-Disposition`` header, if any."""
 348
 349     params = None
 350     """
 351     If the request Content-Type is 'application/x-www-form-urlencoded' or
 352     multipart, this will be a dict of the params pulled from the entity
 353     body; that is, it will be the portion of request.params that comes
 354     from the message body (sometimes called "POST params", although they
 355     can be sent with various HTTP method verbs). This value is set between
 356     the 'before_request_body' and 'before_handler' hooks (assuming that
 357     process_request_body is True)."""
 358
 359     processors = {'application/x-www-form-urlencoded': process_urlencoded,
 360                   'multipart/form-data': process_multipart_form_data,
 361                   'multipart': process_multipart,
 362                   }
 363     """A dict of Content-Type names to processor methods."""
 364
 365     parts = None
 366     """A list of Part instances if ``Content-Type`` is of major type "multipart"."""
 367
 368     part_class = None
 369     """The class used for multipart parts.
 370
 371     You can replace this with custom subclasses to alter the processing of
 372     multipart parts.
 373     """
 374
 375     def __init__(self, fp, headers, params=None, parts=None):
 376         # Make an instance-specific copy of the class processors
 377         # so Tools, etc. can replace them per-request.
 378         self.processors = self.processors.copy()
 379
 380         self.fp = fp
 381         self.headers = headers
 382
 383         if params is None:
 384             params = {}
 385         self.params = params
 386
 387         if parts is None:
 388             parts = []
 389         self.parts = parts
 390
 391         # Content-Type
 392         self.content_type = headers.elements('Content-Type')
 393         if self.content_type:
 394             self.content_type = self.content_type[0]
 395         else:
 396             self.content_type = httputil.HeaderElement.from_str(
 397                 self.default_content_type)
 398
 399         # Copy the class 'attempt_charsets', prepending any Content-Type charset
 400         dec = self.content_type.params.get("charset", None)
 401         if dec:
 402             #dec = dec.decode('ISO-8859-1')
 403             self.attempt_charsets = [dec] + [c for c in self.attempt_charsets
 404                                              if c != dec]
 405         else:
 406             self.attempt_charsets = self.attempt_charsets[:]
 407
 408         # Length
 409         self.length = None
 410         clen = headers.get('Content-Length', None)
 411         # If Transfer-Encoding is 'chunked', ignore any Content-Length.
 412         if clen is not None and 'chunked' not in headers.get('Transfer-Encoding', ''):
 413             try:
 414                 self.length = int(clen)
 415             except ValueError:
 416                 pass
 417
 418         # Content-Disposition
 419         self.name = None
 420         self.filename = None
 421         disp = headers.elements('Content-Disposition')
 422         if disp:
 423             disp = disp[0]
 424             if 'name' in disp.params:
 425                 self.name = disp.params['name']
 426                 if self.name.startswith('"') and self.name.endswith('"'):
 427                     self.name = self.name[1:-1]
 428             if 'filename' in disp.params:
 429                 self.filename = disp.params['filename']
 430                 if self.filename.startswith('"') and self.filename.endswith('"'):
 431                     self.filename = self.filename[1:-1]
 432
 433     # The 'type' attribute is deprecated in 3.2; remove it in 3.3.
 434     type = property(lambda self: self.content_type,
 435         doc="""A deprecated alias for :attr:`content_type<cherrypy._cpreqbody.Entity.content_type>`.""")
 436
 437     def read(self, size=None, fp_out=None):
 438         return self.fp.read(size, fp_out)
 439
 440     def readline(self, size=None):
 441         return self.fp.readline(size)
 442
 443     def readlines(self, sizehint=None):
 444         return self.fp.readlines(sizehint)
 445
 446     def __iter__(self):
 447         return self
 448
 449     def next(self):
 450         line = self.readline()
 451         if not line:
 452             raise StopIteration
 453         return line
 454
 455     def read_into_file(self, fp_out=None):
 456         """Read the request body into fp_out (or make_file() if None). Return fp_out."""
 457         if fp_out is None:
 458             fp_out = self.make_file()
 459         self.read(fp_out=fp_out)
 460         return fp_out
 461
 462     def make_file(self):
 463         """Return a file-like object into which the request body will be read.
 464
 465         By default, this will return a TemporaryFile. Override as needed.
 466         See also :attr:`cherrypy._cpreqbody.Part.maxrambytes`."""
 467         return tempfile.TemporaryFile()
 468
 469     def fullvalue(self):
 470         """Return this entity as a string, whether stored in a file or not."""
 471         if self.file:
 472             # It was stored in a tempfile. Read it.
 473             self.file.seek(0)
 474             value = self.file.read()
 475             self.file.seek(0)
 476         else:
 477             value = self.value
 478         return value
 479
 480     def process(self):
 481         """Execute the best-match processor for the given media type."""
 482         proc = None
 483         ct = self.content_type.value
 484         try:
 485             proc = self.processors[ct]
 486         except KeyError:
 487             toptype = ct.split('/', 1)[0]
 488             try:
 489                 proc = self.processors[toptype]
 490             except KeyError:
 491                 pass
 492         if proc is None:
 493             self.default_proc()
 494         else:
 495             proc(self)
 496
 497     def default_proc(self):
 498         """Called if a more-specific processor is not found for the ``Content-Type``."""
 499         # Leave the fp alone for someone else to read. This works fine
 500         # for request.body, but the Part subclasses need to override this
 501         # so they can move on to the next part.
 502         pass
 503
 504
 505 class Part(Entity):
 506     """A MIME part entity, part of a multipart entity."""
 507
 508     # "The default character set, which must be assumed in the absence of a
 509     # charset parameter, is US-ASCII."
 510     attempt_charsets = ['us-ascii', 'utf-8']
 511     """A list of strings, each of which should be a known encoding.
 512
 513     When the Content-Type of the request body warrants it, each of the given
 514     encodings will be tried in order. The first one to successfully decode the
 515     entity without raising an error is stored as
 516     :attr:`entity.charset<cherrypy._cpreqbody.Entity.charset>`. This defaults
 517     to ``['utf-8']`` (plus 'ISO-8859-1' for "text/\*" types, as required by
 518     `HTTP/1.1 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_),
 519     but ``['us-ascii', 'utf-8']`` for multipart parts.
 520     """
 521
 522     boundary = None
 523     """The MIME multipart boundary."""
 524
 525     default_content_type = 'text/plain'
 526     """This defines a default ``Content-Type`` to use if no Content-Type header
 527     is given. The empty string is used for RequestBody, which results in the
 528     request body not being read or parsed at all. This is by design; a missing
 529     ``Content-Type`` header in the HTTP request entity is an error at best,
 530     and a security hole at worst. For multipart parts, however (this class),
 531     the MIME spec declares that a part with no Content-Type defaults to
 532     "text/plain".
 533     """
 534
 535     # This is the default in stdlib cgi. We may want to increase it.
 536     maxrambytes = 1000
 537     """The threshold of bytes after which point the ``Part`` will store its data
 538     in a file (generated by :func:`make_file<cherrypy._cpreqbody.Entity.make_file>`)
 539     instead of a string. Defaults to 1000, just like the :mod:`cgi` module in
 540     Python's standard library.
 541     """
 542
 543     def __init__(self, fp, headers, boundary):
 544         Entity.__init__(self, fp, headers)
 545         self.boundary = boundary
 546         self.file = None
 547         self.value = None
 548
 549     def from_fp(cls, fp, boundary):
 550         headers = cls.read_headers(fp)
 551         return cls(fp, headers, boundary)
 552     from_fp = classmethod(from_fp)
 553
 554     def read_headers(cls, fp):
 555         headers = httputil.HeaderMap()
 556         while True:
 557             line = fp.readline()
 558             if not line:
 559                 # No more data--illegal end of headers
 560                 raise EOFError("Illegal end of headers.")
 561
 562             if line == ntob('\r\n'):
 563                 # Normal end of headers
 564                 break
 565             if not line.endswith(ntob('\r\n')):
 566                 raise ValueError("MIME requires CRLF terminators: %r" % line)
 567
 568             if line[0] in ntob(' \t'):
 569                 # It's a continuation line.
 570                 v = line.strip().decode('ISO-8859-1')
 571             else:
 572                 k, v = line.split(ntob(":"), 1)
 573                 k = k.strip().decode('ISO-8859-1')
 574                 v = v.strip().decode('ISO-8859-1')
 575
 576             existing = headers.get(k)
 577             if existing:
 578                 v = ", ".join((existing, v))
 579             headers[k] = v
 580
 581         return headers
 582     read_headers = classmethod(read_headers)
 583
 584     def read_lines_to_boundary(self, fp_out=None):
 585         """Read bytes from self.fp and return or write them to a file.
 586
 587         If the 'fp_out' argument is None (the default), all bytes read are
 588         returned in a single byte string.
 589
 590         If the 'fp_out' argument is not None, it must be a file-like object that
 591         supports the 'write' method; all bytes read will be written to the fp,
 592         and that fp is returned.
 593         """
 594         endmarker = self.boundary + ntob("--")
 595         delim = ntob("")
 596         prev_lf = True
 597         lines = []
 598         seen = 0
 599         while True:
 600             line = self.fp.readline(1<<16)
 601             if not line:
 602                 raise EOFError("Illegal end of multipart body.")
 603             if line.startswith(ntob("--")) and prev_lf:
 604                 strippedline = line.strip()
 605                 if strippedline == self.boundary:
 606                     break
 607                 if strippedline == endmarker:
 608                     self.fp.finish()
 609                     break
 610
 611             line = delim + line
 612
 613             if line.endswith(ntob("\r\n")):
 614                 delim = ntob("\r\n")
 615                 line = line[:-2]
 616                 prev_lf = True
 617             elif line.endswith(ntob("\n")):
 618                 delim = ntob("\n")
 619                 line = line[:-1]
 620                 prev_lf = True
 621             else:
 622                 delim = ntob("")
 623                 prev_lf = False
 624
 625             if fp_out is None:
 626                 lines.append(line)
 627                 seen += len(line)
 628                 if seen > self.maxrambytes:
 629                     fp_out = self.make_file()
 630                     for line in lines:
 631                         fp_out.write(line)
 632             else:
 633                 fp_out.write(line)
 634
 635         if fp_out is None:
 636             result = ntob('').join(lines)
 637             for charset in self.attempt_charsets:
 638                 try:
 639                     result = result.decode(charset)
 640                 except UnicodeDecodeError:
 641                     pass
 642                 else:
 643                     self.charset = charset
 644                     return result
 645             else:
 646                 raise cherrypy.HTTPError(
 647                     400, "The request entity could not be decoded. The following "
 648                     "charsets were attempted: %s" % repr(self.attempt_charsets))
 649         else:
 650             fp_out.seek(0)
 651             return fp_out
 652
 653     def default_proc(self):
 654         """Called if a more-specific processor is not found for the ``Content-Type``."""
 655         if self.filename:
 656             # Always read into a file if a .filename was given.
 657             self.file = self.read_into_file()
 658         else:
 659             result = self.read_lines_to_boundary()
 660             if isinstance(result, basestring):
 661                 self.value = result
 662             else:
 663                 self.file = result
 664
 665     def read_into_file(self, fp_out=None):
 666         """Read the request body into fp_out (or make_file() if None). Return fp_out."""
 667         if fp_out is None:
 668             fp_out = self.make_file()
 669         self.read_lines_to_boundary(fp_out=fp_out)
 670         return fp_out
 671
 672 Entity.part_class = Part
 673
 674
 675 class Infinity(object):
 676     def __cmp__(self, other):
 677         return 1
 678     def __sub__(self, other):
 679         return self
 680 inf = Infinity()
 681
 682
 683 comma_separated_headers = ['Accept', 'Accept-Charset', 'Accept-Encoding',
 684     'Accept-Language', 'Accept-Ranges', 'Allow', 'Cache-Control', 'Connection',
 685     'Content-Encoding', 'Content-Language', 'Expect', 'If-Match',
 686     'If-None-Match', 'Pragma', 'Proxy-Authenticate', 'Te', 'Trailer',
 687     'Transfer-Encoding', 'Upgrade', 'Vary', 'Via', 'Warning', 'Www-Authenticate']
 688
 689
 690 class SizedReader:
 691
 692     def __init__(self, fp, length, maxbytes, bufsize=8192, has_trailers=False):
 693         # Wrap our fp in a buffer so peek() works
 694         self.fp = fp
 695         self.length = length
 696         self.maxbytes = maxbytes
 697         self.buffer = ntob('')
 698         self.bufsize = bufsize
 699         self.bytes_read = 0
 700         self.done = False
 701         self.has_trailers = has_trailers
 702
 703     def read(self, size=None, fp_out=None):
 704         """Read bytes from the request body and return or write them to a file.
 705
 706         A number of bytes less than or equal to the 'size' argument is read
 707         off the socket. The actual number of bytes read is tracked in
 708         self.bytes_read. The number may be smaller than 'size' when 1) the
 709         client sends fewer bytes, or 2) the 'Content-Length' request header
 710         specifies fewer bytes than requested. If the total number of bytes
 711         read exceeds self.maxbytes, 413 is raised instead.
 712
 713         If the 'fp_out' argument is None (the default), all bytes read are
 714         returned in a single byte string.
 715
 716         If the 'fp_out' argument is not None, it must be a file-like object that
 717         supports the 'write' method; all bytes read will be written to the fp,
 718         and None is returned.
 719         """
 720
 721         if self.length is None:
            if size is None:
                remaining = inf
            else:
                remaining = size
        else:
            remaining = self.length - self.bytes_read
            if size and size < remaining:
                remaining = size
        if remaining == 0:
            self.finish()
            if fp_out is None:
                return ntob('')
            else:
                return None

        chunks = []

        # Read bytes from the buffer.
        if self.buffer:
            if remaining is inf:
                data = self.buffer
                self.buffer = ntob('')
            else:
                data = self.buffer[:remaining]
                self.buffer = self.buffer[remaining:]
            datalen = len(data)
            remaining -= datalen

            # Check lengths.
            self.bytes_read += datalen
            if self.maxbytes and self.bytes_read > self.maxbytes:
                raise cherrypy.HTTPError(413)

            # Store the data.
            if fp_out is None:
                chunks.append(data)
            else:
                fp_out.write(data)

        # Read bytes from the socket.
        while remaining > 0:
            chunksize = min(remaining, self.bufsize)
            try:
                data = self.fp.read(chunksize)
            except Exception:
                e = sys.exc_info()[1]
                if e.__class__.__name__ == 'MaxSizeExceeded':
                    # Post data is too big
                    raise cherrypy.HTTPError(
                        413, "Maximum request length: %r" % e.args[1])
                else:
                    raise
            if not data:
                self.finish()
                break
            datalen = len(data)
            remaining -= datalen

            # Check lengths.
            self.bytes_read += datalen
            if self.maxbytes and self.bytes_read > self.maxbytes:
                raise cherrypy.HTTPError(413)

            # Store the data.
            if fp_out is None:
                chunks.append(data)
            else:
                fp_out.write(data)

        if fp_out is None:
            return ntob('').join(chunks)

    def readline(self, size=None):
        """Read a line from the request body and return it."""
        chunks = []
        while size is None or size > 0:
            chunksize = self.bufsize
            if size is not None and size < self.bufsize:
                chunksize = size
            data = self.read(chunksize)
            if not data:
                break
            pos = data.find(ntob('\n')) + 1
            if pos:
                chunks.append(data[:pos])
                # Push the bytes after the newline back onto the front of
                # the buffer so that the next read sees them first.
                remainder = data[pos:]
                self.buffer = remainder + self.buffer
                self.bytes_read -= len(remainder)
                break
            else:
                chunks.append(data)
                if size is not None:
                    # Count the bytes returned so far against 'size'.
                    size -= len(data)
        return ntob('').join(chunks)

    def readlines(self, sizehint=None):
        """Read lines from the request body and return them."""
        if self.length is not None:
            if sizehint is None:
                sizehint = self.length - self.bytes_read
            else:
                sizehint = min(sizehint, self.length - self.bytes_read)

        lines = []
        seen = 0
        while True:
            line = self.readline()
            if not line:
                break
            lines.append(line)
            seen += len(line)
            # sizehint stays None when neither the caller nor a
            # Content-Length header supplies a limit; read to the end then.
            if sizehint is not None and seen >= sizehint:
                break
        return lines

    def finish(self):
        self.done = True
        if self.has_trailers and hasattr(self.fp, 'read_trailer_lines'):
            self.trailers = {}

            try:
                for line in self.fp.read_trailer_lines():
                    if line[0] in ntob(' \t'):
                        # It's a continuation line.
                        v = line.strip()
                    else:
                        try:
                            k, v = line.split(ntob(":"), 1)
                        except ValueError:
                            raise ValueError("Illegal header line.")
                        k = k.strip().title()
                        v = v.strip()

                    if k in comma_separated_headers:
                        # Fold repeated headers into one comma-separated value.
                        existing = self.trailers.get(k)
                        if existing:
                            v = ntob(", ").join((existing, v))
                    self.trailers[k] = v
            except Exception:
                e = sys.exc_info()[1]
                if e.__class__.__name__ == 'MaxSizeExceeded':
                    # Post data is too big
                    raise cherrypy.HTTPError(
                        413, "Maximum request length: %r" % e.args[1])
                else:
                    raise
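

# The socket loop in SizedReader.read() above can be summarized by the
# following minimal sketch. It is illustrative only and is not used by
# CherryPy. The names _sketch_bounded_read and _SketchBodyTooLarge are
# hypothetical, and _SketchBodyTooLarge merely stands in for
# cherrypy.HTTPError(413). The sketch assumes 'fp' is any object with a
# read(n) method that returns a byte string, and that 'length' is known.
class _SketchBodyTooLarge(Exception):
    """Hypothetical stand-in for cherrypy.HTTPError(413)."""


def _sketch_bounded_read(fp, length, maxbytes=None, bufsize=8192):
    """Read at most 'length' bytes from 'fp' in chunks of 'bufsize'."""
    chunks = []
    bytes_read = 0
    remaining = length
    while remaining > 0:
        # Never ask for more than the declared length or the buffer size.
        data = fp.read(min(remaining, bufsize))
        if not data:
            # The client sent fewer bytes than it declared.
            break
        bytes_read += len(data)
        if maxbytes and bytes_read > maxbytes:
            raise _SketchBodyTooLarge(maxbytes)
        chunks.append(data)
        remaining -= len(data)
    return b''.join(chunks)

# Possible usage, with io.BytesIO standing in for the socket file:
#     _sketch_bounded_read(io.BytesIO(b'hello world'), length=5)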


class RequestBody(Entity):
    """The entity of the HTTP request."""

    bufsize = 8 * 1024
    """The buffer size used when reading the socket."""

    # Don't parse the request body at all if the client didn't provide
    # a Content-Type header. See http://www.cherrypy.org/ticket/790
    default_content_type = ''
    """This defines a default ``Content-Type`` to use if no Content-Type header
    is given. The empty string is used for RequestBody, which results in the
    request body not being read or parsed at all. This is by design; a missing
    ``Content-Type`` header in the HTTP request entity is an error at best,
    and a security hole at worst. For multipart parts, however, the MIME spec
    declares that a part with no Content-Type defaults to "text/plain"
    (see :class:`Part<cherrypy._cpreqbody.Part>`).
    """

    maxbytes = None
    """Raise ``cherrypy.HTTPError(413)`` if more bytes than this are read
    from the socket."""

    def __init__(self, fp, headers, params=None, request_params=None):
        Entity.__init__(self, fp, headers, params)

        # http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1
        # When no explicit charset parameter is provided by the
        # sender, media subtypes of the "text" type are defined
        # to have a default charset value of "ISO-8859-1" when
        # received via HTTP.
        if self.content_type.value.startswith('text/'):
            for c in ('ISO-8859-1', 'iso-8859-1', 'Latin-1', 'latin-1'):
                if c in self.attempt_charsets:
                    break
            else:
                self.attempt_charsets.append('ISO-8859-1')

        # Temporary fix while deprecating passing .parts as .params.
        self.processors['multipart'] = _old_process_multipart

        if request_params is None:
            request_params = {}
        self.request_params = request_params

    def process(self):
        """Process the request entity based on its Content-Type."""
        # "The presence of a message-body in a request is signaled by the
        # inclusion of a Content-Length or Transfer-Encoding header field in
        # the request's message-headers."
        # It is possible to send a POST request with no body, for example;
        # however, app developers are responsible for setting
        # cherrypy.request.process_body to False in that case so that this
        # method isn't called.
        h = cherrypy.serving.request.headers
        if 'Content-Length' not in h and 'Transfer-Encoding' not in h:
            raise cherrypy.HTTPError(411)

        self.fp = SizedReader(self.fp, self.length,
                              self.maxbytes, bufsize=self.bufsize,
                              has_trailers='Trailer' in h)
        super(RequestBody, self).process()

        # Body params should also be part of request_params; add them here.
        request_params = self.request_params
        for key, value in self.params.items():
            # Python 2 only: keyword arguments must be byte strings (type 'str').
            if isinstance(key, unicode):
                key = key.encode('ISO-8859-1')

            if key in request_params:
                if not isinstance(request_params[key], list):
                    request_params[key] = [request_params[key]]
                request_params[key].append(value)
            else:
                request_params[key] = value
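

# The merge at the end of RequestBody.process() above can be summarized by
# the following minimal sketch. It is illustrative only and is not used by
# CherryPy. The name _sketch_merge_params is hypothetical, and the sketch
# leaves out the Python 2 key encoding step. It assumes both arguments are
# plain dicts.
def _sketch_merge_params(request_params, body_params):
    """Fold 'body_params' into 'request_params' and return the latter."""
    for key, value in body_params.items():
        if key in request_params:
            # A repeated key is promoted to a list, then appended to.
            if not isinstance(request_params[key], list):
                request_params[key] = [request_params[key]]
            request_params[key].append(value)
        else:
            request_params[key] = value
    return request_params

# Possible usage: a query string value and a form value for the same key
# end up side by side in one list.
#     _sketch_merge_params({'a': '1'}, {'a': '2', 'b': '3'})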