:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
10 The :mod:`urllib2` module defines functions and classes which help in opening
11 URLs (mostly HTTP) in a complex world --- basic and digest authentication,
12 redirections, cookies and more.
14 The :mod:`urllib2` module defines the following functions:
17 .. function:: urlopen(url[, data][, timeout])
19 Open the URL *url*, which can be either a string or a :class:`Request` object.
21 *data* may be a string specifying additional data to send to the server, or
22 ``None`` if no such data is needed. Currently HTTP requests are the only ones
23 that use *data*; the HTTP request will be a POST instead of a GET when the
24 *data* parameter is provided. *data* should be a buffer in the standard
25 :mimetype:`application/x-www-form-urlencoded` format. The
26 :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
27 returns a string in this format.
The optional *timeout* parameter specifies a timeout in seconds for the
connection attempt (if not specified, or passed as ``None``, the global default
timeout setting will be used). This only works for HTTP, HTTPS, FTP and
FTPS connections.
34 This function returns a file-like object with two additional methods:
36 * :meth:`geturl` --- return the URL of the resource retrieved
* :meth:`info` --- return the meta-information of the page, as a dictionary-like
  object
41 Raises :exc:`URLError` on errors.
43 Note that ``None`` may be returned if no handler handles the request (though the
44 default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
45 ensure this never happens).
.. versionchanged:: 2.6
   *timeout* was added.
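As a rough illustration of the return value described above, the following sketch opens a ``file:`` URL so that it runs without network access. The scratch file, its contents, and the Python 3 import fallback are assumptions made for this example only.

```python
import os
import tempfile

try:
    import urllib2                        # Python 2, as documented here
    from urllib import pathname2url
except ImportError:                       # assumption: Python 3 fallback
    import urllib.request as urllib2
    from urllib.request import pathname2url

# Write a small scratch file so the example needs no network access.
handle, path = tempfile.mkstemp(suffix='.txt')
os.write(handle, b'hello from a local file\n')
os.close(handle)

response = urllib2.urlopen('file://' + pathname2url(path))
try:
    body = response.read()                        # file-like interface
    final_url = response.geturl()                 # URL actually retrieved
    length = response.info()['Content-Length']    # dictionary-like headers
finally:
    response.close()
    os.remove(path)

print(body)
print(final_url)
print(length)
```

The same three calls (``read()``, ``geturl()`` and ``info()``) apply to responses for other URL schemes.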
51 .. function:: install_opener(opener)
53 Install an :class:`OpenerDirector` instance as the default global opener.
Installing an opener is only necessary if you want :func:`urlopen` to use that opener;
55 otherwise, simply call :meth:`OpenerDirector.open` instead of :func:`urlopen`.
56 The code does not check for a real :class:`OpenerDirector`, and any class with
57 the appropriate interface will work.
60 .. function:: build_opener([handler, ...])
62 Return an :class:`OpenerDirector` instance, which chains the handlers in the
63 order given. *handler*\s can be either instances of :class:`BaseHandler`, or
64 subclasses of :class:`BaseHandler` (in which case it must be possible to call
65 the constructor without any parameters). Instances of the following classes
66 will be in front of the *handler*\s, unless the *handler*\s contain them,
67 instances of them or subclasses of them: :class:`ProxyHandler`,
68 :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
69 :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
70 :class:`HTTPErrorProcessor`.
72 If the Python installation has SSL support (i.e., if the :mod:`ssl` module can be imported),
73 :class:`HTTPSHandler` will also be added.
Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
:attr:`handler_order` attribute to modify its position in the handlers
list.
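A minimal sketch of :func:`build_opener` with a custom handler class follows. ``EarlyProcessor`` is a hypothetical name, and the Python 3 import fallback is an assumption added so the example also runs on newer interpreters. The ``handlers`` list it inspects is an internal detail of the opener, used here only to make the ordering visible.

```python
try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


class EarlyProcessor(urllib2.BaseHandler):
    """Hypothetical processor that passes HTTP requests through unchanged."""

    # Handlers are sorted by this value; a small number sorts early.
    handler_order = 50

    def http_request(self, req):
        return req


# A class (not an instance) may be passed; build_opener() instantiates it.
opener = urllib2.build_opener(EarlyProcessor)

# ``handlers`` is the opener's internal list, inspected here only to show
# the resulting order.
names = [handler.__class__.__name__ for handler in opener.handlers]
print(names[0])                  # EarlyProcessor
print('HTTPHandler' in names)    # True
```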
79 The following exceptions are raised as appropriate:
82 .. exception:: URLError
84 The handlers raise this exception (or derived exceptions) when they run into a
85 problem. It is a subclass of :exc:`IOError`.
88 .. exception:: HTTPError
A subclass of :exc:`URLError`, it can also function as a non-exceptional
file-like return value (the same thing that :func:`urlopen` returns). This
is useful when handling exotic HTTP errors, such as requests for
authentication.
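Because :exc:`HTTPError` derives from :exc:`URLError`, calling code usually catches the more specific exception first. The sketch below assumes a hypothetical helper named ``describe`` and uses a deliberately unknown URL scheme so that it fails without touching the network. The Python 3 import fallback is likewise an assumption.

```python
try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


def describe(url):
    """Hypothetical helper: summarise the outcome of opening *url*."""
    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError as exc:    # must come before URLError
        return 'HTTP error %d' % exc.code
    except urllib2.URLError as exc:
        return 'failed: %s' % exc.reason
    response.close()
    return 'ok'


# No handler knows this scheme, so UnknownHandler raises URLError.
outcome = describe('nosuchscheme://example/')
print(outcome)

print(issubclass(urllib2.HTTPError, urllib2.URLError))   # True
print(issubclass(urllib2.URLError, IOError))             # True
```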
95 The following classes are provided:
.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])
100 This class is an abstraction of a URL request.
102 *url* should be a string containing a valid URL.
104 *data* may be a string specifying additional data to send to the server, or
105 ``None`` if no such data is needed. Currently HTTP requests are the only ones
106 that use *data*; the HTTP request will be a POST instead of a GET when the
107 *data* parameter is provided. *data* should be a buffer in the standard
108 :mimetype:`application/x-www-form-urlencoded` format. The
109 :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
110 returns a string in this format.
112 *headers* should be a dictionary, and will be treated as if :meth:`add_header`
113 was called with each key and value as arguments.
The final two arguments are only of interest for correct handling of third-party
HTTP cookies:
118 *origin_req_host* should be the request-host of the origin transaction, as
119 defined by :rfc:`2965`. It defaults to ``cookielib.request_host(self)``. This
120 is the host name or IP address of the original request that was initiated by the
121 user. For example, if the request is for an image in an HTML document, this
122 should be the request-host of the request for the page containing the image.
124 *unverifiable* should indicate whether the request is unverifiable, as defined
125 by RFC 2965. It defaults to False. An unverifiable request is one whose URL
126 the user did not have the option to approve. For example, if the request is for
127 an image in an HTML document, and the user had no option to approve the
128 automatic fetching of the image, this should be true.
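The effect of the *data* and *headers* arguments can be sketched without sending anything. The URL and form fields below are placeholders, and the Python 3 import fallback (where the encoded data must be a byte string) is an assumption.

```python
try:
    import urllib2                      # Python 2, as documented here
    from urllib import urlencode
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2
    from urllib.parse import urlencode

# A sequence of 2-tuples keeps the field order predictable.
form = urlencode([('name', 'Jane Doe'), ('lang', 'python')])
print(form)                             # name=Jane+Doe&lang=python

plain = urllib2.Request('http://www.example.com/search')
post = urllib2.Request('http://www.example.com/search',
                       data=form.encode('ascii'),
                       headers={'Referer': 'http://www.python.org/'})

# Supplying data switches the request method from GET to POST.
methods = (plain.get_method(), post.get_method())
print(methods)                          # ('GET', 'POST')
print(post.has_header('Referer'))       # True
```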
131 .. class:: OpenerDirector()
133 The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
134 together. It manages the chaining of handlers, and recovery from errors.
137 .. class:: BaseHandler()
139 This is the base class for all registered handlers --- and handles only the
140 simple mechanics of registration.
143 .. class:: HTTPDefaultErrorHandler()
145 A class which defines a default handler for HTTP error responses; all responses
146 are turned into :exc:`HTTPError` exceptions.
149 .. class:: HTTPRedirectHandler()
151 A class to handle redirections.
154 .. class:: HTTPCookieProcessor([cookiejar])
156 A class to handle HTTP Cookies.
159 .. class:: ProxyHandler([proxies])
161 Cause requests to go through a proxy. If *proxies* is given, it must be a
162 dictionary mapping protocol names to URLs of proxies. The default is to read the
163 list of proxies from the environment variables :envvar:`<protocol>_proxy`.
166 .. class:: HTTPPasswordMgr()
168 Keep a database of ``(realm, uri) -> (user, password)`` mappings.
171 .. class:: HTTPPasswordMgrWithDefaultRealm()
Keep a database of ``(realm, uri) -> (user, password)`` mappings. A realm of
``None`` is considered a catch-all realm, which is searched if no other realm
fits.
178 .. class:: AbstractBasicAuthHandler([password_mgr])
180 This is a mixin class that helps with HTTP authentication, both to the remote
181 host and to a proxy. *password_mgr*, if given, should be something that is
182 compatible with :class:`HTTPPasswordMgr`; refer to section
:ref:`http-password-mgr` for information on the interface that must be
supported.
187 .. class:: HTTPBasicAuthHandler([password_mgr])
189 Handle authentication with the remote host. *password_mgr*, if given, should be
190 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
:ref:`http-password-mgr` for information on the interface that must be
supported.
195 .. class:: ProxyBasicAuthHandler([password_mgr])
197 Handle authentication with the proxy. *password_mgr*, if given, should be
198 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
:ref:`http-password-mgr` for information on the interface that must be
supported.
203 .. class:: AbstractDigestAuthHandler([password_mgr])
205 This is a mixin class that helps with HTTP authentication, both to the remote
206 host and to a proxy. *password_mgr*, if given, should be something that is
207 compatible with :class:`HTTPPasswordMgr`; refer to section
:ref:`http-password-mgr` for information on the interface that must be
supported.
212 .. class:: HTTPDigestAuthHandler([password_mgr])
214 Handle authentication with the remote host. *password_mgr*, if given, should be
215 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
:ref:`http-password-mgr` for information on the interface that must be
supported.
220 .. class:: ProxyDigestAuthHandler([password_mgr])
222 Handle authentication with the proxy. *password_mgr*, if given, should be
223 something that is compatible with :class:`HTTPPasswordMgr`; refer to section
:ref:`http-password-mgr` for information on the interface that must be
supported.
228 .. class:: HTTPHandler()
230 A class to handle opening of HTTP URLs.
233 .. class:: HTTPSHandler()
235 A class to handle opening of HTTPS URLs.
.. class:: FileHandler()

Open local files.

.. class:: FTPHandler()

Open FTP URLs.
248 .. class:: CacheFTPHandler()
250 Open FTP URLs, keeping a cache of open FTP connections to minimize delays.
253 .. class:: UnknownHandler()
255 A catch-all class to handle unknown URLs.
Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.
267 .. method:: Request.add_data(data)
269 Set the :class:`Request` data to *data*. This is ignored by all handlers except
270 HTTP handlers --- and there it should be a byte string, and will change the
271 request to be ``POST`` rather than ``GET``.
274 .. method:: Request.get_method()
276 Return a string indicating the HTTP request method. This is only meaningful for
277 HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.
280 .. method:: Request.has_data()
282 Return whether the instance has a non-\ ``None`` data.
285 .. method:: Request.get_data()
287 Return the instance's data.
290 .. method:: Request.add_header(key, val)
292 Add another header to the request. Headers are currently ignored by all
293 handlers except HTTP handlers, where they are added to the list of headers sent
294 to the server. Note that there cannot be more than one header with the same
295 name, and later calls will overwrite previous calls in case the *key* collides.
296 Currently, this is no loss of HTTP functionality, since all headers which have
297 meaning when used more than once have a (header-specific) way of gaining the
298 same functionality using only one header.
301 .. method:: Request.add_unredirected_header(key, header)
303 Add a header that will not be added to a redirected request.
305 .. versionadded:: 2.4
308 .. method:: Request.has_header(header)
Return whether the instance has the named header (checks both regular and
unredirected).
313 .. versionadded:: 2.4
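The header methods above can be tried on a request that is never sent. This sketch uses placeholder header values; ``get_header()`` is a :class:`Request` method that is not described in this section, and the Python 3 import fallback is an assumption.

```python
try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

req = urllib2.Request('http://www.example.com/')

# A second call with the same key replaces the first value.
req.add_header('Referer', 'http://www.python.org/')
req.add_header('Referer', 'http://docs.python.org/')

# This header is kept separately and is not copied to redirected requests.
req.add_unredirected_header('Authorization', 'Basic placeholder')

referer = req.get_header('Referer')
print(referer)                              # http://docs.python.org/
print(req.has_header('Authorization'))      # True (unredirected headers count)
print(req.has_header('Cookie'))             # False
print(req.get_full_url())                   # http://www.example.com/
```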
316 .. method:: Request.get_full_url()
318 Return the URL given in the constructor.
321 .. method:: Request.get_type()
323 Return the type of the URL --- also known as the scheme.
326 .. method:: Request.get_host()
328 Return the host to which a connection will be made.
331 .. method:: Request.get_selector()
333 Return the selector --- the part of the URL that is sent to the server.
336 .. method:: Request.set_proxy(host, type)
338 Prepare the request by connecting to a proxy server. The *host* and *type* will
339 replace those of the instance, and the instance's selector will be the original
340 URL given in the constructor.
343 .. method:: Request.get_origin_req_host()
345 Return the request-host of the origin transaction, as defined by :rfc:`2965`.
346 See the documentation for the :class:`Request` constructor.
349 .. method:: Request.is_unverifiable()
351 Return whether the request is unverifiable, as defined by RFC 2965. See the
352 documentation for the :class:`Request` constructor.
355 .. _opener-director-objects:
357 OpenerDirector Objects
358 ----------------------
360 :class:`OpenerDirector` instances have the following methods:
363 .. method:: OpenerDirector.add_handler(handler)
*handler* should be an instance of :class:`BaseHandler`. The following methods
are searched, and added to the possible chains (note that HTTP errors are a
special case).

* :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
  URLs.

* :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
  errors with HTTP error code *type*.

* :meth:`protocol_error` --- signal that the handler knows how to handle errors
  from (non-\ ``http``) *protocol*.

* :meth:`protocol_request` --- signal that the handler knows how to pre-process
  *protocol* requests.

* :meth:`protocol_response` --- signal that the handler knows how to
  post-process *protocol* responses.
385 .. method:: OpenerDirector.open(url[, data][, timeout])
387 Open the given *url* (which can be a request object or a string), optionally
388 passing the given *data*. Arguments, return values and exceptions raised are the
389 same as those of :func:`urlopen` (which simply calls the :meth:`open` method on
390 the currently installed global :class:`OpenerDirector`). The optional *timeout*
391 parameter specifies a timeout in seconds for the connection attempt (if not
specified, or passed as ``None``, the global default timeout setting will be used;
this only works for HTTP, HTTPS, FTP and FTPS connections).
.. versionchanged:: 2.6
   *timeout* was added.
399 .. method:: OpenerDirector.error(proto[, arg[, ...]])
401 Handle an error of the given protocol. This will call the registered error
402 handlers for the given protocol with the given arguments (which are protocol
403 specific). The HTTP protocol is a special case which uses the HTTP response
404 code to determine the specific error handler; refer to the :meth:`http_error_\*`
405 methods of the handler classes.
407 Return values and exceptions raised are the same as those of :func:`urlopen`.
409 OpenerDirector objects open URLs in three stages:
411 The order in which these methods are called within each stage is determined by
412 sorting the handler instances.
414 #. Every handler with a method named like :meth:`protocol_request` has that
415 method called to pre-process the request.
417 #. Handlers with a method named like :meth:`protocol_open` are called to handle
418 the request. This stage ends when a handler either returns a non-\ :const:`None`
value (i.e. a response), or raises an exception (usually :exc:`URLError`).
420 Exceptions are allowed to propagate.
422 In fact, the above algorithm is first tried for methods named
423 :meth:`default_open`. If all such methods return :const:`None`, the algorithm
424 is repeated for methods named like :meth:`protocol_open`. If all such methods
425 return :const:`None`, the algorithm is repeated for methods named
426 :meth:`unknown_open`.
428 Note that the implementation of these methods may involve calls of the parent
429 :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.
431 #. Every handler with a method named like :meth:`protocol_response` has that
432 method called to post-process the response.
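The three stages can be observed with a made-up ``demo`` scheme, which keeps the sketch free of network access. ``TraceProcessor``, ``DemoHandler`` and the scheme name are hypothetical, and the Python 3 import fallback is an assumption.

```python
import io

try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

calls = []


class TraceProcessor(urllib2.BaseHandler):
    """Hypothetical processor recording stages 1 and 3."""

    def demo_request(self, req):
        calls.append('request')
        return req

    def demo_response(self, req, response):
        calls.append('response')
        return response


class DemoHandler(urllib2.BaseHandler):
    """Hypothetical handler recording stage 2 and producing a response."""

    def demo_open(self, req):
        calls.append('open')
        return io.BytesIO(b'payload')


# A bare OpenerDirector has no default handlers installed.
opener = urllib2.OpenerDirector()
opener.add_handler(TraceProcessor())
opener.add_handler(DemoHandler())

body = opener.open('demo:anything').read()
print(calls)    # ['request', 'open', 'response']
```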
.. _base-handler-objects:

BaseHandler Objects
-------------------
440 :class:`BaseHandler` objects provide a couple of methods that are directly
441 useful, and others that are meant to be used by derived classes. These are
442 intended for direct use:
445 .. method:: BaseHandler.add_parent(director)
447 Add a director as parent.
450 .. method:: BaseHandler.close()
454 The following members and methods should only be used by classes derived from
455 :class:`BaseHandler`.
459 The convention has been adopted that subclasses defining
460 :meth:`protocol_request` or :meth:`protocol_response` methods are named
461 :class:`\*Processor`; all others are named :class:`\*Handler`.
464 .. attribute:: BaseHandler.parent
466 A valid :class:`OpenerDirector`, which can be used to open using a different
467 protocol, or handle errors.
470 .. method:: BaseHandler.default_open(req)
472 This method is *not* defined in :class:`BaseHandler`, but subclasses should
473 define it if they want to catch all URLs.
475 This method, if implemented, will be called by the parent
476 :class:`OpenerDirector`. It should return a file-like object as described in
477 the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
478 It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
479 example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).
481 This method will be called before any protocol-specific open method.
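A minimal sketch of a catch-all handler follows, assuming a hypothetical ``CannedHandler`` that serves a few fixed pages from a dictionary and returns ``None`` for everything else so that later handlers get a chance. The URLs and the Python 3 import fallback are assumptions.

```python
import io

try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


class CannedHandler(urllib2.BaseHandler):
    """Hypothetical handler serving fixed bodies for known URLs."""

    def __init__(self, pages):
        self.pages = pages

    def default_open(self, req):
        body = self.pages.get(req.get_full_url())
        if body is None:
            return None                 # let other handlers try
        return io.BytesIO(body)


opener = urllib2.OpenerDirector()
opener.add_handler(CannedHandler({'http://www.example.com/': b'canned page'}))
opener.add_handler(urllib2.UnknownHandler())

# Served by default_open() before any protocol-specific handler is consulted.
hit = opener.open('http://www.example.com/').read()

# Not in the dictionary: default_open() returns None, no http handler is
# installed in this opener, so UnknownHandler raises URLError.
try:
    opener.open('http://www.example.com/other')
    miss = None
except urllib2.URLError as exc:
    miss = str(exc.reason)

print(hit)
print(miss)
```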
484 .. method:: BaseHandler.protocol_open(req)
487 This method is *not* defined in :class:`BaseHandler`, but subclasses should
488 define it if they want to handle URLs with the given protocol.
490 This method, if defined, will be called by the parent :class:`OpenerDirector`.
491 Return values should be the same as for :meth:`default_open`.
494 .. method:: BaseHandler.unknown_open(req)
This method is *not* defined in :class:`BaseHandler`, but subclasses should
define it if they want to catch all URLs with no specific registered handler to
open it.
500 This method, if implemented, will be called by the :attr:`parent`
501 :class:`OpenerDirector`. Return values should be the same as for
502 :meth:`default_open`.
505 .. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)
507 This method is *not* defined in :class:`BaseHandler`, but subclasses should
508 override it if they intend to provide a catch-all for otherwise unhandled HTTP
509 errors. It will be called automatically by the :class:`OpenerDirector` getting
510 the error, and should not normally be called in other circumstances.
512 *req* will be a :class:`Request` object, *fp* will be a file-like object with
513 the HTTP error body, *code* will be the three-digit code of the error, *msg*
514 will be the user-visible explanation of the code and *hdrs* will be a mapping
515 object with the headers of the error.
Return values and exceptions raised should be the same as those of
:func:`urlopen`.
521 .. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)
523 *nnn* should be a three-digit HTTP error code. This method is also not defined
524 in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
525 subclass, when an HTTP error with code *nnn* occurs.
527 Subclasses should override this method to handle specific HTTP errors.
529 Arguments, return values and exceptions raised should be the same as for
530 :meth:`http_error_default`.
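The dispatch between :meth:`http_error_nnn` and :meth:`http_error_default` can be sketched by calling :meth:`OpenerDirector.error` directly, which avoids any network traffic. ``NotFoundHandler``, the URL and the placeholder body are hypothetical, and the Python 3 import fallback is an assumption.

```python
import io

try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2


class NotFoundHandler(urllib2.BaseHandler):
    """Hypothetical handler that answers 404 with a placeholder body."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return io.BytesIO(b'placeholder page')


opener = urllib2.OpenerDirector()
opener.add_handler(NotFoundHandler())
opener.add_handler(urllib2.HTTPDefaultErrorHandler())

req = urllib2.Request('http://www.example.com/missing')

# Code 404 is routed to http_error_404().
page = opener.error('http', req, io.BytesIO(b''), 404, 'Not Found', {}).read()

# Code 500 has no specific handler, so http_error_default() raises HTTPError.
try:
    opener.error('http', req, io.BytesIO(b''), 500, 'Server Error', {})
    raised = None
except urllib2.HTTPError as exc:
    raised = exc.code

print(page)
print(raised)
```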
533 .. method:: BaseHandler.protocol_request(req)
536 This method is *not* defined in :class:`BaseHandler`, but subclasses should
537 define it if they want to pre-process requests of the given protocol.
539 This method, if defined, will be called by the parent :class:`OpenerDirector`.
540 *req* will be a :class:`Request` object. The return value should be a
541 :class:`Request` object.
544 .. method:: BaseHandler.protocol_response(req, response)
547 This method is *not* defined in :class:`BaseHandler`, but subclasses should
548 define it if they want to post-process responses of the given protocol.
550 This method, if defined, will be called by the parent :class:`OpenerDirector`.
*req* will be a :class:`Request` object. *response* will be an object
implementing the same interface as the return value of :func:`urlopen`. The
return value should implement the same interface as the return value of
:func:`urlopen`.
557 .. _http-redirect-handler:
559 HTTPRedirectHandler Objects
560 ---------------------------
564 Some HTTP redirections require action from this module's client code. If this
565 is the case, :exc:`HTTPError` is raised. See :rfc:`2616` for details of the
566 precise meanings of the various redirection codes.
.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)
571 Return a :class:`Request` or ``None`` in response to a redirect. This is called
572 by the default implementations of the :meth:`http_error_30\*` methods when a
573 redirection is received from the server. If a redirection should take place,
574 return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
575 redirect. Otherwise, raise :exc:`HTTPError` if no other handler should try to
576 handle this URL, or return ``None`` if you can't but another handler might.
580 The default implementation of this method does not strictly follow :rfc:`2616`,
581 which says that 301 and 302 responses to ``POST`` requests must not be
582 automatically redirected without confirmation by the user. In reality, browsers
583 do allow automatic redirection of these responses, changing the POST to a
584 ``GET``, and the default implementation reproduces this behavior.
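The default behaviour can be checked by calling :meth:`redirect_request` directly on a handler instance. In this sketch the URLs are placeholders and the Python 3 import fallback is an assumption; the final positional argument is the redirect target URL that the implementation expects.

```python
import io

try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

handler = urllib2.HTTPRedirectHandler()
post = urllib2.Request('http://www.example.com/form', b'a=1')

# A 302 response to a POST is followed, but as a GET without the data.
followed = handler.redirect_request(post, io.BytesIO(b''), 302, 'Found', {},
                                    'http://www.example.com/done')
print(followed.get_method())        # GET
print(followed.get_full_url())      # http://www.example.com/done

# A 307 response to a POST is not redirected automatically.
try:
    handler.redirect_request(post, io.BytesIO(b''), 307, 'Temporary Redirect',
                             {}, 'http://www.example.com/done')
    refused = False
except urllib2.HTTPError:
    refused = True
print(refused)                      # True
```

A subclass that overrides this method can apply a stricter or looser policy.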
587 .. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)
589 Redirect to the ``Location:`` URL. This method is called by the parent
590 :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.
593 .. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
595 The same as :meth:`http_error_301`, but called for the 'found' response.
598 .. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
600 The same as :meth:`http_error_301`, but called for the 'see other' response.
603 .. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
The same as :meth:`http_error_301`, but called for the 'temporary redirect'
response.
609 .. _http-cookie-processor:
611 HTTPCookieProcessor Objects
612 ---------------------------
614 .. versionadded:: 2.4
616 :class:`HTTPCookieProcessor` instances have one attribute:
619 .. attribute:: HTTPCookieProcessor.cookiejar
621 The :class:`cookielib.CookieJar` in which cookies are stored.
ProxyHandler Objects
--------------------

.. method:: ProxyHandler.protocol_open(request)
633 The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
634 *protocol* which has a proxy in the *proxies* dictionary given in the
635 constructor. The method will modify requests to go through the proxy, by
636 calling ``request.set_proxy()``, and call the next handler in the chain to
637 actually execute the protocol.
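The per-protocol methods can be seen by constructing a handler with an explicit *proxies* dictionary. The proxy URL below is a placeholder, nothing is sent over the network, and the Python 3 import fallback is an assumption.

```python
try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

# Only the ``http`` protocol is given a proxy here.
handler = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128/'})

has_http_open = hasattr(handler, 'http_open')
has_ftp_open = hasattr(handler, 'ftp_open')
print(has_http_open)    # True
print(has_ftp_open)     # False

# Passing the instance to build_opener() replaces the default ProxyHandler.
opener = urllib2.build_opener(handler)
```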
640 .. _http-password-mgr:
642 HTTPPasswordMgr Objects
643 -----------------------
645 These methods are available on :class:`HTTPPasswordMgr` and
646 :class:`HTTPPasswordMgrWithDefaultRealm` objects.
649 .. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)
*uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
*passwd* must be strings. This causes ``(user, passwd)`` to be used as
authentication tokens when authentication is requested for *realm* and for a
URI at or below any of the given URIs.
657 .. method:: HTTPPasswordMgr.find_user_password(realm, authuri)
659 Get user/password for given realm and URI, if any. This method will return
660 ``(None, None)`` if there is no matching user/password.
662 For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
663 searched if the given *realm* has no matching user/password.
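A short sketch of the lookup rules follows, using placeholder credentials and URLs. The Python 3 import fallback is an assumption.

```python
try:
    import urllib2                      # Python 2, as documented here
except ImportError:                     # assumption: Python 3 fallback
    import urllib.request as urllib2

manager = urllib2.HTTPPasswordMgrWithDefaultRealm()

# A realm of None registers the credentials for the catch-all realm.
manager.add_password(None, 'http://www.example.com/private/',
                     'alice', 'placeholder-password')

# A URI below the registered one matches, whatever realm is asked for.
found = manager.find_user_password(
    'Some Realm', 'http://www.example.com/private/page.html')
print(found)        # ('alice', 'placeholder-password')

# A different host has no credentials.
missing = manager.find_user_password('Some Realm', 'http://other.example.org/')
print(missing)      # (None, None)
```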
666 .. _abstract-basic-auth-handler:
668 AbstractBasicAuthHandler Objects
669 --------------------------------
672 .. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
674 Handle an authentication request by getting a user/password pair, and re-trying
675 the request. *authreq* should be the name of the header where the information
676 about the realm is included in the request, *host* specifies the URL and path to
677 authenticate for, *req* should be the (failed) :class:`Request` object, and
678 *headers* should be the error headers.
680 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an
681 authority component (e.g. ``"http://python.org/"``). In either case, the
682 authority must not contain a userinfo component (so, ``"python.org"`` and
683 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not).
686 .. _http-basic-auth-handler:
688 HTTPBasicAuthHandler Objects
689 ----------------------------
692 .. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
694 Retry the request with authentication information, if available.
697 .. _proxy-basic-auth-handler:
699 ProxyBasicAuthHandler Objects
700 -----------------------------
703 .. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)
705 Retry the request with authentication information, if available.
708 .. _abstract-digest-auth-handler:
710 AbstractDigestAuthHandler Objects
711 ---------------------------------
714 .. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
716 *authreq* should be the name of the header where the information about the realm
717 is included in the request, *host* should be the host to authenticate to, *req*
should be the (failed) :class:`Request` object, and *headers* should be the
error headers.
722 .. _http-digest-auth-handler:
724 HTTPDigestAuthHandler Objects
725 -----------------------------
728 .. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)
730 Retry the request with authentication information, if available.
733 .. _proxy-digest-auth-handler:
735 ProxyDigestAuthHandler Objects
736 ------------------------------
739 .. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)
741 Retry the request with authentication information, if available.
.. _http-handler-objects:

HTTPHandler Objects
-------------------

.. method:: HTTPHandler.http_open(req)

Send an HTTP request, which can be either GET or POST, depending on
``req.has_data()``.
.. _https-handler-objects:

HTTPSHandler Objects
--------------------

.. method:: HTTPSHandler.https_open(req)

Send an HTTPS request, which can be either GET or POST, depending on
``req.has_data()``.
.. _file-handler-objects:

FileHandler Objects
-------------------
774 .. method:: FileHandler.file_open(req)
776 Open the file locally, if there is no host name, or the host name is
777 ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
778 using :attr:`parent`.
.. _ftp-handler-objects:

FTPHandler Objects
------------------
787 .. method:: FTPHandler.ftp_open(req)
789 Open the FTP file indicated by *req*. The login is always done with empty
790 username and password.
793 .. _cacheftp-handler-objects:
795 CacheFTPHandler Objects
796 -----------------------
798 :class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
799 following additional methods:
802 .. method:: CacheFTPHandler.setTimeout(t)
804 Set timeout of connections to *t* seconds.
807 .. method:: CacheFTPHandler.setMaxConns(m)
809 Set maximum number of cached connections to *m*.
812 .. _unknown-handler-objects:
814 UnknownHandler Objects
815 ----------------------
818 .. method:: UnknownHandler.unknown_open()
820 Raise a :exc:`URLError` exception.
823 .. _http-error-processor-objects:
825 HTTPErrorProcessor Objects
826 --------------------------
828 .. versionadded:: 2.4
.. method:: HTTPErrorProcessor.http_response(request, response)

Process HTTP error responses.

For a 200 status code, the response object is returned immediately.

For non-200 status codes, this simply passes the job on to the
:meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
839 Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
840 :exc:`HTTPError` if no other handler handles the error.
.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html
Here we are sending a data-stream to the stdin of a CGI and reading the data it
returns to us. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"
The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   # Echo whatever arrives on stdin back to the client as plain text.
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data
Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')
889 :func:`build_opener` provides many handlers by default, including a
890 :class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
891 variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
892 involved. For example, the :envvar:`http_proxy` environment variable is read to
893 obtain the HTTP proxy's URL.
This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')
Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)
:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')
924 Also, remember that a few standard headers (:mailheader:`Content-Length`,
925 :mailheader:`Content-Type` and :mailheader:`Host`) are added when the
926 :class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).