
   1 \section{\module{cgi} ---
   2          Common Gateway Interface support.}
   3 \declaremodule{standard}{cgi}
   4
   5 \modulesynopsis{Common Gateway Interface support, used to interpret
   6 forms in server-side scripts.}
   7
   8 \indexii{WWW}{server}
   9 \indexii{CGI}{protocol}
  10 \indexii{HTTP}{protocol}
  11 \indexii{MIME}{headers}
  12 \index{URL}
  13
  14
  15 Support module for Common Gateway Interface (CGI) scripts.%
  16 \index{Common Gateway Interface}
  17
  18 This module defines a number of utilities for use by CGI scripts
  19 written in Python.
  20
  21 \subsection{Introduction}
  22 \nodename{cgi-intro}
  23
  24 A CGI script is invoked by an HTTP server, usually to process user
  25 input submitted through an HTML \code{<FORM>} or \code{<ISINDEX>} element.
  26
  27 Most often, CGI scripts live in the server's special \file{cgi-bin}
  28 directory.  The HTTP server places all sorts of information about the
  29 request (such as the client's hostname, the requested URL, the query
  30 string, and lots of other goodies) in the script's shell environment,
  31 executes the script, and sends the script's output back to the client.
  32
  33 The script's input is connected to the client too, and sometimes the
  34 form data is read this way; at other times the form data is passed via
  35 the ``query string'' part of the URL.  This module is intended
  36 to take care of the different cases and provide a simpler interface to
  37 the Python script.  It also provides a number of utilities that help
  38 in debugging scripts, and the latest addition is support for file
  39 uploads from a form (if your browser supports it).
  40
  41 The output of a CGI script should consist of two sections, separated
  42 by a blank line.  The first section contains a number of headers,
  43 telling the client what kind of data is following.  Python code to
  44 generate a minimal header section looks like this:
  45
\begin{verbatim}
print "Content-Type: text/html"     # HTML is following
print                               # blank line, end of headers
\end{verbatim}
  50
  51 The second section is usually HTML, which allows the client software
  52 to display nicely formatted text with header, in-line images, etc.
  53 Here's Python code that prints a simple piece of HTML:
  54
\begin{verbatim}
print "<TITLE>CGI script output</TITLE>"
print "<H1>This is my first CGI script</H1>"
print "Hello, world!"
\end{verbatim}
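
The two sections can also be assembled in one place.  The following is
only a sketch: the helper \function{build_response()} is a hypothetical
name, not part of the \module{cgi} module, and the code assumes a
Python 3 interpreter rather than the \keyword{print} statement used
elsewhere in this section.

```python
import sys


def build_response(title, body):
    """Return a full CGI response: header section, blank line, HTML body."""
    headers = "Content-Type: text/html\n"     # HTML is following
    html = "<TITLE>%s</TITLE>\n<H1>%s</H1>\n%s\n" % (title, title, body)
    return headers + "\n" + html              # blank line ends the headers


if __name__ == "__main__":
    sys.stdout.write(build_response("CGI script output", "Hello, world!"))
```

Building the whole response as one string keeps the blank line between
headers and body from being forgotten.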
  60
  61 \subsection{Using the cgi module}
  62 \nodename{Using the cgi module}
  63
  64 Begin by writing \samp{import cgi}.  Do not use \samp{from cgi import
  65 *} --- the module defines all sorts of names for its own use or for
  66 backward compatibility that you don't want in your namespace.
  67
  68 When you write a new script, consider adding the line:
  69
\begin{verbatim}
import cgitb; cgitb.enable()
\end{verbatim}
  73
  74 This activates a special exception handler that will display detailed
  75 reports in the Web browser if any errors occur.  If you'd rather not
  76 show the guts of your program to users of your script, you can have
  77 the reports saved to files instead, with a line like this:
  78
\begin{verbatim}
import cgitb; cgitb.enable(display=0, logdir="/tmp")
\end{verbatim}
  82
  83 It's very helpful to use this feature during script development.
  84 The reports produced by \refmodule{cgitb} provide information that
  85 can save you a lot of time in tracking down bugs.  You can always
  86 remove the \code{cgitb} line later when you have tested your script
  87 and are confident that it works correctly.
  88
To get at submitted form data,
it's best to use the \class{FieldStorage} class; the other classes
defined in this module are provided mostly for backward compatibility.
Instantiate \class{FieldStorage} exactly once, without arguments.  This
reads the form contents from standard input or the environment
(depending on the value of various environment variables set according
to the CGI standard).  Because instantiation may consume standard
input, a second instance would find nothing left to read.
  97
  98 The \class{FieldStorage} instance can be indexed like a Python
  99 dictionary, and also supports the standard dictionary methods
 100 \method{has_key()} and \method{keys()}.  The built-in \function{len()}
 101 is also supported.  Form fields containing empty strings are ignored
 102 and do not appear in the dictionary; to keep such values, provide
 103 a true value for the optional \var{keep_blank_values} keyword
 104 parameter when creating the \class{FieldStorage} instance.
 105
 106 For instance, the following code (which assumes that the
 107 \mailheader{Content-Type} header and blank line have already been
 108 printed) checks that the fields \code{name} and \code{addr} are both
 109 set to a non-empty string:
 110
\begin{verbatim}
form = cgi.FieldStorage()
if not (form.has_key("name") and form.has_key("addr")):
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
else:
    print "<p>name:", form["name"].value
    print "<p>addr:", form["addr"].value
    # ...further form processing here...
\end{verbatim}
 121
 122 Here the fields, accessed through \samp{form[\var{key}]}, are
 123 themselves instances of \class{FieldStorage} (or
 124 \class{MiniFieldStorage}, depending on the form encoding).
 125 The \member{value} attribute of the instance yields the string value
 126 of the field.  The \method{getvalue()} method returns this string value
 127 directly; it also accepts an optional second argument as a default to
 128 return if the requested key is not present.
 129
 130 If the submitted form data contains more than one field with the same
 131 name, the object retrieved by \samp{form[\var{key}]} is not a
 132 \class{FieldStorage} or \class{MiniFieldStorage}
 133 instance but a list of such instances.  Similarly, in this situation,
 134 \samp{form.getvalue(\var{key})} would return a list of strings.
If you expect this possibility
(when your HTML form contains multiple fields with the same name), use
the \method{getlist()} method, which always returns a list of values (so that you
do not need to special-case the single-item case).  For example, this
code concatenates any number of username fields, separated by
commas:
 141
\begin{verbatim}
value = form.getlist("username")
usernames = ",".join(value)
\end{verbatim}
 146
If a field represents an uploaded file, accessing the value via the
\member{value} attribute or the \method{getvalue()} method reads the
entire file into memory as a string.  This may not be what you want.
You can test for an uploaded file by testing either the \member{filename}
attribute or the \member{file} attribute.  You can then read the data at
leisure from the \member{file} attribute:
 153
\begin{verbatim}
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
\end{verbatim}
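
For large uploads, the same idea can be applied with fixed-size reads
so that memory use stays bounded even when the file contains very long
lines.  This is a minimal sketch, assuming Python 3 and assuming the
\member{file} attribute behaves like a binary file object; the helper
\function{count_lines()} is hypothetical, and an \class{io.BytesIO}
object stands in for the upload.

```python
import io


def count_lines(fileobj, chunk_size=64 * 1024):
    """Count newline bytes in a binary file-like object, one chunk at a time."""
    count = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:          # an empty read means end of file
            break
        count += chunk.count(b"\n")
    return count


# io.BytesIO stands in for the ``file`` attribute of an uploaded field.
upload = io.BytesIO(b"first line\nsecond line\nthird line\n")
print(count_lines(upload))
```

Note that this counts newline characters, so a final line without a
trailing newline is not counted, unlike the \method{readline()} loop.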
 164
 165 The file upload draft standard entertains the possibility of uploading
 166 multiple files from one field (using a recursive
 167 \mimetype{multipart/*} encoding).  When this occurs, the item will be
 168 a dictionary-like \class{FieldStorage} item.  This can be determined
 169 by testing its \member{type} attribute, which should be
 170 \mimetype{multipart/form-data} (or perhaps another MIME type matching
 171 \mimetype{multipart/*}).  In this case, it can be iterated over
 172 recursively just like the top-level form object.
 173
 174 When a form is submitted in the ``old'' format (as the query string or
 175 as a single data part of type
 176 \mimetype{application/x-www-form-urlencoded}), the items will actually
 177 be instances of the class \class{MiniFieldStorage}.  In this case, the
 178 \member{list}, \member{file}, and \member{filename} attributes are
 179 always \code{None}.
 180
 181
 182 \subsection{Higher Level Interface}
 183
 184 \versionadded{2.2}  % XXX: Is this true ?
 185
 186 The previous section explains how to read CGI form data using the
 187 \class{FieldStorage} class.  This section describes a higher level
 188 interface which was added to this class to allow one to do it in a
 189 more readable and intuitive way.  The interface doesn't make the
 190 techniques described in previous sections obsolete --- they are still
 191 useful to process file uploads efficiently, for example.
 192
 193 The interface consists of two simple methods. Using the methods
 194 you can process form data in a generic way, without the need to worry
 195 whether only one or more values were posted under one name.
 196
In the previous section, you learned to write the following code any time
you expected a user to post more than one value under one name:
 199
\begin{verbatim}
item = form.getvalue("item")
if isinstance(item, list):
    # The user is requesting more than one item.
    pass
else:
    # The user is requesting only one item.
    pass
\end{verbatim}
 207
This situation is common, for example, when a form contains a group of
checkboxes with the same name:
 210
\begin{verbatim}
<input type="checkbox" name="item" value="1" />
<input type="checkbox" name="item" value="2" />
\end{verbatim}
 215
 216 In most situations, however, there's only one form control with a
 217 particular name in a form and then you expect and need only one value
 218 associated with this name.  So you write a script containing for
 219 example this code:
 220
\begin{verbatim}
user = form.getvalue("user").upper()
\end{verbatim}
 224
The problem with the code is that you should never expect that a
client will provide valid input to your scripts.  For example, if a
curious user appends another \samp{user=foo} pair to the query string,
then the script would crash, because in this situation the
\code{getvalue("user")} method call returns a list instead of a
string.  Calling the \method{upper()} method on a list is not valid
(since lists do not have a method of this name) and results in an
\exception{AttributeError} exception.
 233
 234 Therefore, the appropriate way to read form data values was to always
 235 use the code which checks whether the obtained value is a single value
 236 or a list of values.  That's annoying and leads to less readable
 237 scripts.
 238
 239 A more convenient approach is to use the methods \method{getfirst()}
 240 and \method{getlist()} provided by this higher level interface.
 241
 242 \begin{methoddesc}[FieldStorage]{getfirst}{name\optional{, default}}
  This method always returns only one value associated with form field
  \var{name}.  If more than one value was posted under that name, the
  method returns only the first.  Please note that the order
 246   in which the values are received may vary from browser to browser
 247   and should not be counted on.\footnote{Note that some recent
 248       versions of the HTML specification do state what order the
 249       field values should be supplied in, but knowing whether a
 250       request was received from a conforming browser, or even from a
 251       browser at all, is tedious and error-prone.}  If no such form
 252   field or value exists then the method returns the value specified by
 253   the optional parameter \var{default}.  This parameter defaults to
 254   \code{None} if not specified.
 255 \end{methoddesc}
 256
 257 \begin{methoddesc}[FieldStorage]{getlist}{name}
 258   This method always returns a list of values associated with form
 259   field \var{name}.  The method returns an empty list if no such form
 260   field or value exists for \var{name}.  It returns a list consisting
 261   of one item if only one such value exists.
 262 \end{methoddesc}
 263
 264 Using these methods you can write nice compact code:
 265
\begin{verbatim}
import cgi
form = cgi.FieldStorage()
user = form.getfirst("user", "").upper()    # This way it's safe.
for item in form.getlist("item"):
    do_something(item)
\end{verbatim}
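
To make the behaviour of the two methods concrete, the sketch below
expresses the same rules as plain functions over a
\method{getvalue()}-style result (\code{None}, a single string, or a
list of strings).  The names \function{first_value()} and
\function{as_list()} are hypothetical and this is not the module's
actual implementation; the code assumes Python 3.

```python
def first_value(value, default=None):
    """Collapse a getvalue()-style result to a single value."""
    if value is None:
        return default
    if isinstance(value, list):
        return value[0] if value else default
    return value


def as_list(value):
    """Expand a getvalue()-style result to a list, possibly empty."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


print(first_value(["joe", "foo"], "").upper())   # JOE
print(as_list("1"))                              # ['1']
```

Either helper removes the need for an \function{isinstance()} check at
each place where a field is read.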
 273
 274
 275 \subsection{Old classes}
 276
 277 These classes, present in earlier versions of the \module{cgi} module,
 278 are still supported for backward compatibility.  New applications
 279 should use the \class{FieldStorage} class.
 280
\class{SvFormContentDict} stores single value form content as a
dictionary; it assumes each field name occurs in the form only once.
 283
 284 \class{FormContentDict} stores multiple value form content as a
 285 dictionary (the form items are lists of values).  Useful if your form
 286 contains multiple fields with the same name.
 287
Other classes (\class{FormContent}, \class{InterpFormContentDict}) are
present for backwards compatibility with really old applications only.
If you still use these and would be inconvenienced if they
disappeared from a future version of this module, drop me a note.
 292
 293
 294 \subsection{Functions}
 295 \nodename{Functions in cgi module}
 296
 297 These are useful if you want more control, or if you want to employ
 298 some of the algorithms implemented in this module in other
 299 circumstances.
 300
 301 \begin{funcdesc}{parse}{fp\optional{, keep_blank_values\optional{,
 302                         strict_parsing}}}
 303   Parse a query in the environment or from a file (the file defaults
 304   to \code{sys.stdin}).  The \var{keep_blank_values} and
 305   \var{strict_parsing} parameters are passed to \function{parse_qs()}
 306   unchanged.
 307 \end{funcdesc}
 308
 309 \begin{funcdesc}{parse_qs}{qs\optional{, keep_blank_values\optional{,
 310                            strict_parsing}}}
 311 Parse a query string given as a string argument (data of type
 312 \mimetype{application/x-www-form-urlencoded}).  Data are
 313 returned as a dictionary.  The dictionary keys are the unique query
 314 variable names and the values are lists of values for each name.
 315
 316 The optional argument \var{keep_blank_values} is
 317 a flag indicating whether blank values in
 318 URL encoded queries should be treated as blank strings.
 319 A true value indicates that blanks should be retained as
 320 blank strings.  The default false value indicates that
 321 blank values are to be ignored and treated as if they were
 322 not included.
 323
 324 The optional argument \var{strict_parsing} is a flag indicating what
 325 to do with parsing errors.  If false (the default), errors
 326 are silently ignored.  If true, errors raise a \exception{ValueError}
 327 exception.
 328
 329 Use the \function{\refmodule{urllib}.urlencode()} function to convert
 330 such dictionaries into query strings.
 331
 332 \end{funcdesc}
 333
 334 \begin{funcdesc}{parse_qsl}{qs\optional{, keep_blank_values\optional{,
 335                             strict_parsing}}}
 336 Parse a query string given as a string argument (data of type
 337 \mimetype{application/x-www-form-urlencoded}).  Data are
 338 returned as a list of name, value pairs.
 339
 340 The optional argument \var{keep_blank_values} is
 341 a flag indicating whether blank values in
 342 URL encoded queries should be treated as blank strings.
 343 A true value indicates that blanks should be retained as
 344 blank strings.  The default false value indicates that
 345 blank values are to be ignored and treated as if they were
 346 not included.
 347
 348 The optional argument \var{strict_parsing} is a flag indicating what
 349 to do with parsing errors.  If false (the default), errors
 350 are silently ignored.  If true, errors raise a \exception{ValueError}
 351 exception.
 352
 353 Use the \function{\refmodule{urllib}.urlencode()} function to convert
 354 such lists of pairs into query strings.
 355 \end{funcdesc}
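
The effect of the two flags and the difference between the two return
shapes can be seen in a short session.  This sketch assumes a Python 3
interpreter, where functions named \function{parse_qs()},
\function{parse_qsl()} and \function{urlencode()} with the same
parameters are provided by \module{urllib.parse}; the query string is
made up for illustration.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

query = "name=Joe+Blow&item=1&item=2&note="

print(parse_qs(query))
# {'name': ['Joe Blow'], 'item': ['1', '2']}
print(parse_qs(query, keep_blank_values=True))
# {'name': ['Joe Blow'], 'item': ['1', '2'], 'note': ['']}
print(parse_qsl(query))
# [('name', 'Joe Blow'), ('item', '1'), ('item', '2')]

# doseq=True expands each list into repeated name=value pairs.
print(urlencode(parse_qs(query), doseq=True))
# name=Joe+Blow&item=1&item=2
```

With \var{strict_parsing} set to a true value, a field without an
equals sign (for example the query string \samp{novalue}) raises
\exception{ValueError} instead of being skipped.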
 356
 357 \begin{funcdesc}{parse_multipart}{fp, pdict}
 358 Parse input of type \mimetype{multipart/form-data} (for
 359 file uploads).  Arguments are \var{fp} for the input file and
 360 \var{pdict} for a dictionary containing other parameters in
 361 the \mailheader{Content-Type} header.
 362
Returns a dictionary just like \function{parse_qs()}: keys are the
field names, and each value is a list of values for that field.  This is
easy to use but not much good if you are expecting megabytes to be
uploaded; in that case, use the \class{FieldStorage} class instead,
which is much more flexible.
 368
 369 Note that this does not parse nested multipart parts --- use
 370 \class{FieldStorage} for that.
 371 \end{funcdesc}
 372
 373 \begin{funcdesc}{parse_header}{string}
 374 Parse a MIME header (such as \mailheader{Content-Type}) into a main
 375 value and a dictionary of parameters.
 376 \end{funcdesc}
 377
 378 \begin{funcdesc}{test}{}
 379 Robust test CGI script, usable as main program.
 380 Writes minimal HTTP headers and formats all information provided to
 381 the script in HTML form.
 382 \end{funcdesc}
 383
 384 \begin{funcdesc}{print_environ}{}
 385 Format the shell environment in HTML.
 386 \end{funcdesc}
 387
 388 \begin{funcdesc}{print_form}{form}
 389 Format a form in HTML.
 390 \end{funcdesc}
 391
 392 \begin{funcdesc}{print_directory}{}
 393 Format the current directory in HTML.
 394 \end{funcdesc}
 395
 396 \begin{funcdesc}{print_environ_usage}{}
 397 Print a list of useful (used by CGI) environment variables in
 398 HTML.
 399 \end{funcdesc}
 400
 401 \begin{funcdesc}{escape}{s\optional{, quote}}
 402 Convert the characters
 403 \character{\&}, \character{<} and \character{>} in string \var{s} to
 404 HTML-safe sequences.  Use this if you need to display text that might
 405 contain such characters in HTML.  If the optional flag \var{quote} is
 406 true, the quotation mark character (\character{"}) is also translated;
 407 this helps for inclusion in an HTML attribute value, as in \code{<A
 408 HREF="...">}.  If the value to be quoted might include single- or
 409 double-quote characters, or both, consider using the
 410 \function{quoteattr()} function in the \refmodule{xml.sax.saxutils}
 411 module instead.
 412 \end{funcdesc}
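
The translation described above amounts to a few string replacements.
The following sketch, assuming Python 3, uses a hypothetical function
\function{escape_html()} to show the idea; it is an illustration of the
described behaviour, not the module's source code.

```python
def escape_html(s, quote=False):
    """Replace &, < and > (and optionally ") with HTML-safe sequences."""
    s = s.replace("&", "&amp;")    # must come first, or later entities get re-escaped
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


print(escape_html('<A HREF="a&b">'))
# &lt;A HREF="a&amp;b"&gt;
print(escape_html('<A HREF="a&b">', quote=True))
# &lt;A HREF=&quot;a&amp;b&quot;&gt;
```

Note that single quotes are left untouched in this sketch, which is why
attribute values delimited by single quotes need a different tool.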
 413
 414
 415 \subsection{Caring about security \label{cgi-security}}
 416
 417 \indexii{CGI}{security}
 418
 419 There's one important rule: if you invoke an external program (via the
\function{os.system()} or \function{os.popen()} functions, or others
 421 with similar functionality), make very sure you don't pass arbitrary
 422 strings received from the client to the shell.  This is a well-known
 423 security hole whereby clever hackers anywhere on the Web can exploit a
 424 gullible CGI script to invoke arbitrary shell commands.  Even parts of
 425 the URL or field names cannot be trusted, since the request doesn't
 426 have to come from your form!
 427
 428 To be on the safe side, if you must pass a string gotten from a form
 429 to a shell command, you should make sure the string contains only
 430 alphanumeric characters, dashes, underscores, and periods.
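
One way to apply this rule is to check each value against a whitelist
pattern before it gets anywhere near a shell command.  This is a
minimal sketch, assuming Python 3; the function name
\function{is_shell_safe()} is hypothetical, and the pattern accepts
exactly the characters listed above and nothing else.

```python
import re

# Letters, digits, dash, underscore and period only; at least one character.
_SAFE_ARGUMENT = re.compile(r"\A[A-Za-z0-9._-]+\Z")


def is_shell_safe(value):
    """Return True if value contains only the characters listed above."""
    return _SAFE_ARGUMENT.match(value) is not None


for candidate in ("report-2024.txt", "x; rm -rf /", "$(id)", ""):
    print(repr(candidate), is_shell_safe(candidate))
```

The pattern is anchored with \code{\e A} and \code{\e Z} so that a
trailing newline, which \code{\$} would tolerate, is rejected as well.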
 431
 432
 433 \subsection{Installing your CGI script on a \UNIX\ system}
 434
 435 Read the documentation for your HTTP server and check with your local
 436 system administrator to find the directory where CGI scripts should be
 437 installed; usually this is in a directory \file{cgi-bin} in the server tree.
 438
 439 Make sure that your script is readable and executable by ``others''; the
 440 \UNIX{} file mode should be \code{0755} octal (use \samp{chmod 0755
 441 \var{filename}}).  Make sure that the first line of the script contains
 442 \code{\#!} starting in column 1 followed by the pathname of the Python
 443 interpreter, for instance:
 444
\begin{verbatim}
#!/usr/local/bin/python
\end{verbatim}
 448
 449 Make sure the Python interpreter exists and is executable by ``others''.
 450
 451 Make sure that any files your script needs to read or write are
 452 readable or writable, respectively, by ``others'' --- their mode
 453 should be \code{0644} for readable and \code{0666} for writable.  This
 454 is because, for security reasons, the HTTP server executes your script
 455 as user ``nobody'', without any special privileges.  It can only read
 456 (write, execute) files that everybody can read (write, execute).  The
 457 current directory at execution time is also different (it is usually
 458 the server's cgi-bin directory) and the set of environment variables
 459 is also different from what you get when you log in.  In particular, don't
 460 count on the shell's search path for executables (\envvar{PATH}) or
 461 the Python module search path (\envvar{PYTHONPATH}) to be set to
 462 anything interesting.
 463
 464 If you need to load modules from a directory which is not on Python's
 465 default module search path, you can change the path in your script,
 466 before importing other modules.  For example:
 467
\begin{verbatim}
import sys
sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")
\end{verbatim}
 473
 474 (This way, the directory inserted last will be searched first!)
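
The ordering can be checked without touching the real search path.  In
this small sketch, which assumes Python 3, a plain list stands in for
\code{sys.path}, and \file{/default/lib/python} is a made-up entry
representing whatever was on the path beforehand.

```python
# A plain list stands in for sys.path so the example has no side effects.
search_path = ["/default/lib/python"]
search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")

print(search_path)
# ['/usr/local/lib/python', '/usr/home/joe/lib/python', '/default/lib/python']
```

Each call places its directory in front of everything already present,
so the calls should be written in reverse order of priority.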
 475
 476 Instructions for non-\UNIX{} systems will vary; check your HTTP server's
 477 documentation (it will usually have a section on CGI scripts).
 478
 479
 480 \subsection{Testing your CGI script}
 481
 482 Unfortunately, a CGI script will generally not run when you try it
 483 from the command line, and a script that works perfectly from the
 484 command line may fail mysteriously when run from the server.  There's
 485 one reason why you should still test your script from the command
 486 line: if it contains a syntax error, the Python interpreter won't
 487 execute it at all, and the HTTP server will most likely send a cryptic
 488 error to the client.
 489
 490 Assuming your script has no syntax errors, yet it does not work, you
 491 have no choice but to read the next section.
 492
 493
 494 \subsection{Debugging CGI scripts} \indexii{CGI}{debugging}
 495
 496 First of all, check for trivial installation errors --- reading the
 497 section above on installing your CGI script carefully can save you a
 498 lot of time.  If you wonder whether you have understood the
 499 installation procedure correctly, try installing a copy of this module
 500 file (\file{cgi.py}) as a CGI script.  When invoked as a script, the file
 501 will dump its environment and the contents of the form in HTML form.
Give it the right mode, etc., and send it a request.  If it's installed
 503 in the standard \file{cgi-bin} directory, it should be possible to send it a
 504 request by entering a URL into your browser of the form:
 505
\begin{verbatim}
http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
\end{verbatim}
 509
If this gives an error of type 404, the server cannot find the script;
perhaps you need to install it in a different directory.  If it
 512 gives another error, there's an installation problem that
 513 you should fix before trying to go any further.  If you get a nicely
 514 formatted listing of the environment and form content (in this
 515 example, the fields should be listed as ``addr'' with value ``At Home''
 516 and ``name'' with value ``Joe Blow''), the \file{cgi.py} script has been
 517 installed correctly.  If you follow the same procedure for your own
 518 script, you should now be able to debug it.
 519
 520 The next step could be to call the \module{cgi} module's
 521 \function{test()} function from your script: replace its main code
 522 with the single statement
 523
\begin{verbatim}
cgi.test()
\end{verbatim}
 527
 528 This should produce the same results as those gotten from installing
 529 the \file{cgi.py} file itself.
 530
When an ordinary Python script raises an unhandled exception (for
whatever reason: a typo in a module name, a file that can't be
 533 opened, etc.), the Python interpreter prints a nice traceback and
 534 exits.  While the Python interpreter will still do this when your CGI
 535 script raises an exception, most likely the traceback will end up in
 536 one of the HTTP server's log files, or be discarded altogether.
 537
 538 Fortunately, once you have managed to get your script to execute
 539 \emph{some} code, you can easily send tracebacks to the Web browser
 540 using the \refmodule{cgitb} module.  If you haven't done so already,
 541 just add the line:
 542
\begin{verbatim}
import cgitb; cgitb.enable()
\end{verbatim}
 546
 547 to the top of your script.  Then try running it again; when a
 548 problem occurs, you should see a detailed report that will
 549 likely make apparent the cause of the crash.
 550
 551 If you suspect that there may be a problem in importing the
 552 \refmodule{cgitb} module, you can use an even more robust approach
 553 (which only uses built-in modules):
 554
\begin{verbatim}
import sys
sys.stderr = sys.stdout
print "Content-Type: text/plain"
print
# ...your code here...
\end{verbatim}
 562
 563 This relies on the Python interpreter to print the traceback.  The
 564 content type of the output is set to plain text, which disables all
 565 HTML processing.  If your script works, the raw HTML will be displayed
 566 by your client.  If it raises an exception, most likely after the
 567 first two lines have been printed, a traceback will be displayed.
 568 Because no HTML interpretation is going on, the traceback will be
 569 readable.
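
A variation on the same idea is to catch the exception yourself and
write the traceback into the response.  The sketch below assumes
Python 3; \function{run_reporting_errors()} and
\function{broken_script()} are hypothetical names, and an
\class{io.StringIO} buffer stands in for the output stream connected to
the HTTP server.

```python
import io
import sys
import traceback


def run_reporting_errors(main, out=sys.stdout):
    """Emit plain-text headers, run main(out), and print any traceback to out."""
    out.write("Content-Type: text/plain\n\n")
    try:
        main(out)
    except Exception:
        # format_exc() returns the traceback text the interpreter would print.
        out.write(traceback.format_exc())


def broken_script(out):
    out.write("starting\n")
    raise ValueError("bad form data")   # simulated failure


buffer = io.StringIO()
run_reporting_errors(broken_script, buffer)
print(buffer.getvalue())
```

As with the redirection of \code{sys.stderr}, this only helps once the
script runs far enough to execute the wrapper; a syntax error in the
script itself still has to be found from the command line.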
 570
 571
 572 \subsection{Common problems and solutions}
 573
 574 \begin{itemize}
 575 \item Most HTTP servers buffer the output from CGI scripts until the
 576 script is completed.  This means that it is not possible to display a
 577 progress report on the client's display while the script is running.
 578
 579 \item Check the installation instructions above.
 580
 581 \item Check the HTTP server's log files.  (\samp{tail -f logfile} in a
 582 separate window may be useful!)
 583
 584 \item Always check a script for syntax errors first, by doing something
 585 like \samp{python script.py}.
 586
 587 \item If your script does not have any syntax errors, try adding
 588 \samp{import cgitb; cgitb.enable()} to the top of the script.
 589
 590 \item When invoking external programs, make sure they can be found.
 591 Usually, this means using absolute path names --- \envvar{PATH} is
 592 usually not set to a very useful value in a CGI script.
 593
 594 \item When reading or writing external files, make sure they can be read
 595 or written by the userid under which your CGI script will be running:
 596 this is typically the userid under which the web server is running, or some
 597 explicitly specified userid for a web server's \samp{suexec} feature.
 598
 599 \item Don't try to give a CGI script a set-uid mode.  This doesn't work on
 600 most systems, and is a security liability as well.
 601 \end{itemize}
 602