\section{\module{urllib2} ---
         extensible library for opening URLs}

\declaremodule{standard}{urllib2}
\moduleauthor{Jeremy Hylton}{jhylton@users.sourceforge.net}
\sectionauthor{Moshe Zadka}{moshez@users.sourceforge.net}
\modulesynopsis{An extensible library for opening URLs using a variety of
                protocols}
The \module{urllib2} module defines functions and classes which help
in opening URLs (mostly HTTP) in a complex world --- basic and digest
authentication, redirections, cookies and more.

The \module{urllib2} module defines the following functions:
\begin{funcdesc}{urlopen}{url\optional{, data}}
Open the URL \var{url}, which can be either a string or a \class{Request}
object.

\var{data} may be a string specifying additional data to send to the
server, or \code{None} if no such data is needed.
Currently HTTP requests are the only ones that use \var{data};
the HTTP request will be a POST instead of a GET when the \var{data}
parameter is provided.  \var{data} should be a buffer in the standard
\mimetype{application/x-www-form-urlencoded} format.  The
\function{urllib.urlencode()} function takes a mapping or sequence of
2-tuples and returns a string in this format.
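As an illustration, the sketch below builds such a body with \function{urlencode()` and checks that attaching it to a \class{Request} selects \code{POST}. The field names and URL are made up, and nothing is sent over the network. The sketch assumes that on Python~3 the same classes can be imported from \module{urllib.request} and \function{urlencode()} from \module{urllib.parse}. The fallback import handles this.

\begin{verbatim}
try:
    import urllib2                          # Python 2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2        # Python 3 home of these classes
    from urllib.parse import urlencode

# A sequence of 2-tuples (or a mapping) becomes one
# application/x-www-form-urlencoded string.
form = urlencode([('name', 'Guido'), ('lang', 'Python')])
print(form)                 # name=Guido&lang=Python

# Passing data makes the request a POST; Python 3 wants bytes here.
req = urllib2.Request('http://www.example.com/form', form.encode('ascii'))
print(req.get_method())     # POST
\end{verbatim}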
This function returns a file-like object with two additional methods:

\begin{itemize}
  \item \method{geturl()} --- return the URL of the resource retrieved
  \item \method{info()} --- return the meta-information of the page, as
                            a dictionary-like object
\end{itemize}

Raises \exception{URLError} on errors.

Note that \code{None} may be returned if no handler handles the
request (though the default installed global \class{OpenerDirector}
uses \class{UnknownHandler} to ensure this never happens).
\end{funcdesc}
\begin{funcdesc}{install_opener}{opener}
Install an \class{OpenerDirector} instance as the default global
opener.  Installing an opener is only necessary if you want
\function{urlopen()} to use that opener; otherwise, simply call
\method{OpenerDirector.open()} instead of \function{urlopen()}.  The
code does not check for a real \class{OpenerDirector}, and any class
with the appropriate interface will work.
\end{funcdesc}
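Here is a minimal sketch of installing an opener. It assumes a made-up \code{demo:} URL scheme, so no network access is needed. \class{DemoHandler} is hypothetical. \class{addinfourl} is the response wrapper that \module{urllib2} imports and uses for some of its own handlers, and it supplies \method{geturl()} and \method{info()}. On Python~3, the fallback import finds the same names in \module{urllib.request}.

\begin{verbatim}
import email
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3


class DemoHandler(urllib2.BaseHandler):
    """Hypothetical handler serving canned bytes for demo:// URLs."""

    def demo_open(self, req):
        url = req.get_full_url()
        body = io.BytesIO(b'hello from ' + url.encode('ascii'))
        headers = email.message_from_string('Content-Type: text/plain\n\n')
        # addinfourl provides the geturl() and info() methods.
        return urllib2.addinfourl(body, headers, url)


# Install globally so that plain urlopen() goes through this opener.
urllib2.install_opener(urllib2.build_opener(DemoHandler))

f = urllib2.urlopen('demo://example/page')
print(f.geturl())                 # demo://example/page
print(f.info()['Content-Type'])   # text/plain
print(f.read())
\end{verbatim}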
\begin{funcdesc}{build_opener}{\optional{handler, \moreargs}}
Return an \class{OpenerDirector} instance, which chains the
handlers in the order given.  \var{handler}s can be either instances
of \class{BaseHandler}, or subclasses of \class{BaseHandler} (in
which case it must be possible to call the constructor without
any parameters).  Instances of the following classes will be placed in
front of the \var{handler}s, unless the \var{handler}s already include
one of these classes, a subclass of it, or an instance of either:
\class{ProxyHandler}, \class{UnknownHandler}, \class{HTTPHandler},
\class{HTTPDefaultErrorHandler}, \class{HTTPRedirectHandler},
\class{FTPHandler}, \class{FileHandler}, \class{HTTPErrorProcessor}.

If the Python installation has SSL support (\function{socket.ssl()}
exists), \class{HTTPSHandler} will also be added.

Beginning in Python 2.3, a \class{BaseHandler} subclass may also
change its \member{handler_order} member variable to modify its
position in the handlers list; handlers with lower values are
consulted first.
\end{funcdesc}
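The sketch below passes a \class{HTTPRedirectHandler} subclass to \function{build_opener()}. The subclass name is hypothetical. As a result, the default \class{HTTPRedirectHandler} is left out, and the opener's handler list stays sorted by \member{handler_order}. The sketch inspects the opener's \member{handlers} attribute, which is an implementation detail rather than documented interface. On Python~3 it falls back to \module{urllib.request}.

\begin{verbatim}
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3


class MyRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; it replaces the default HTTPRedirectHandler."""
    handler_order = 400   # lower values are consulted before the default 500


opener = urllib2.build_opener(MyRedirectHandler)   # the class is instantiated

redirectors = [h for h in opener.handlers
               if isinstance(h, urllib2.HTTPRedirectHandler)]
print([type(h).__name__ for h in redirectors])   # ['MyRedirectHandler']

# Handlers are kept sorted by handler_order.
orders = [h.handler_order for h in opener.handlers]
print(orders == sorted(orders))                    # True
\end{verbatim}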
The following exceptions are raised as appropriate:

\begin{excdesc}{URLError}
The handlers raise this exception (or derived exceptions) when they
run into a problem.  It is a subclass of \exception{IOError}.
\end{excdesc}
\begin{excdesc}{HTTPError}
A subclass of \exception{URLError}, it can also function as a
non-exceptional file-like return value (the same thing that
\function{urlopen()} returns).  This is useful when handling exotic
HTTP errors, such as requests for authentication.
\end{excdesc}
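The sketch below shows \exception{HTTPError} doubling as a response. The \code{denied:} scheme and its handler are hypothetical stand-ins for a server that answers 401, which lets the example run offline. \exception{HTTPError} must be caught before \exception{URLError}, because it is a subclass. On Python~3, the fallback import finds both exceptions re-exported by \module{urllib.request}.

\begin{verbatim}
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 (re-exports the exceptions)


class DeniedHandler(urllib2.BaseHandler):
    """Hypothetical handler that refuses every request, like a 401 reply."""

    def denied_open(self, req):
        raise urllib2.HTTPError(req.get_full_url(), 401, 'Unauthorized',
                                {}, io.BytesIO(b'please log in'))


opener = urllib2.build_opener(DeniedHandler)
try:
    opener.open('denied://example/')
except urllib2.HTTPError as e:        # must come before URLError
    status, body = e.code, e.read()   # the error is also a readable response
except urllib2.URLError as e:
    status, body = None, b''          # other failures carry no HTTP status

print(status)   # 401
\end{verbatim}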
\begin{excdesc}{GopherError}
A subclass of \exception{URLError}, this is the error raised by the
Gopher handler.
\end{excdesc}

The following classes are provided:
\begin{classdesc}{Request}{url\optional{, data}\optional{, headers}
    \optional{, origin_req_host}\optional{, unverifiable}}
This class is an abstraction of a URL request.

\var{url} should be a string containing a valid URL.

\var{data} may be a string specifying additional data to send to the
server, or \code{None} if no such data is needed.
Currently HTTP requests are the only ones that use \var{data};
the HTTP request will be a POST instead of a GET when the \var{data}
parameter is provided.  \var{data} should be a buffer in the standard
\mimetype{application/x-www-form-urlencoded} format.  The
\function{urllib.urlencode()} function takes a mapping or sequence of
2-tuples and returns a string in this format.

\var{headers} should be a dictionary, and will be treated as if
\method{add_header()} was called with each key and value as arguments.

The final two arguments are only of interest for correct handling of
third-party HTTP cookies:

\var{origin_req_host} should be the request-host of the origin
transaction, as defined by \rfc{2965}.  It defaults to
\code{cookielib.request_host(self)}.  This is the host name or IP
address of the original request that was initiated by the user.  For
example, if the request is for an image in an HTML document, this
should be the request-host of the request for the page containing the
image.

\var{unverifiable} should indicate whether the request is
unverifiable, as defined by \rfc{2965}.  It defaults to \code{False}.
An unverifiable request is one whose URL the user did not have the
option to approve.  For example, if the request is for an image in an
HTML document, and the user had no option to approve the automatic
fetching of the image, this should be true.
\end{classdesc}
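Here is a short sketch of the constructor arguments, using hypothetical URLs. It also shows a detail that matters when checking headers later: header names are stored with \method{capitalize()} applied, so \code{'User-Agent'} is found only as \code{'User-agent'}. The code reads \member{origin_req_host} and \member{unverifiable} as plain attributes, which is how Python~3's \module{urllib.request} (the fallback import) exposes them.

\begin{verbatim}
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3

# An image fetched automatically while rendering a page (hypothetical URLs).
req = urllib2.Request(
    'http://images.example.com/logo.png',
    headers={'User-Agent': 'example-client/0.1'},
    origin_req_host='www.example.com',   # host of the page embedding the image
    unverifiable=True,                    # the user never approved this fetch
)

# Header names are stored via str.capitalize().
print(req.has_header('User-agent'))   # True
print(req.has_header('User-Agent'))   # False
print(req.get_method())               # GET, since no data was given
print(req.origin_req_host)            # www.example.com
\end{verbatim}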
\begin{classdesc}{OpenerDirector}{}
The \class{OpenerDirector} class opens URLs via \class{BaseHandler}s
chained together.  It manages the chaining of handlers, and recovery
from errors.
\end{classdesc}

\begin{classdesc}{BaseHandler}{}
This is the base class for all registered handlers --- and handles only
the simple mechanics of registration.
\end{classdesc}

\begin{classdesc}{HTTPDefaultErrorHandler}{}
A class which defines a default handler for HTTP error responses; all
responses are turned into \exception{HTTPError} exceptions.
\end{classdesc}

\begin{classdesc}{HTTPRedirectHandler}{}
A class to handle redirections.
\end{classdesc}

\begin{classdesc}{HTTPCookieProcessor}{\optional{cookiejar}}
A class to handle HTTP Cookies.
\end{classdesc}

\begin{classdesc}{ProxyHandler}{\optional{proxies}}
Cause requests to go through a proxy.
If \var{proxies} is given, it must be a dictionary mapping
protocol names to URLs of proxies.
The default is to read the list of proxies from the environment
variables \envvar{<protocol>_proxy}.
\end{classdesc}
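Here is a sketch of both ways to construct a \class{ProxyHandler}. The proxy host is hypothetical. The per-protocol \method{\var{protocol}_open()} methods it checks for are described in section~\ref{proxy-handler}. An empty mapping is a simple way to bypass any \envvar{<protocol>_proxy} settings.

\begin{verbatim}
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3

# Explicit mapping from scheme to proxy URL (hypothetical proxy host).
proxied = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128/'})
print(hasattr(proxied, 'http_open'))   # True: one <protocol>_open per entry
print(hasattr(proxied, 'ftp_open'))    # False: ftp is not in the mapping

# An empty mapping means "no proxies", whatever the environment says.
direct = urllib2.ProxyHandler({})
opener = urllib2.build_opener(direct)  # replaces the default ProxyHandler
\end{verbatim}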
\begin{classdesc}{HTTPPasswordMgr}{}
Keep a database of
\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})}
mappings.
\end{classdesc}

\begin{classdesc}{HTTPPasswordMgrWithDefaultRealm}{}
Keep a database of
\code{(\var{realm}, \var{uri}) -> (\var{user}, \var{password})} mappings.
A realm of \code{None} is considered a catch-all realm, which is searched
if no other realm fits.
\end{classdesc}

\begin{classdesc}{AbstractBasicAuthHandler}{\optional{password_mgr}}
This is a mixin class that helps with HTTP authentication, both
to the remote host and to a proxy.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{HTTPBasicAuthHandler}{\optional{password_mgr}}
Handle authentication with the remote host.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{ProxyBasicAuthHandler}{\optional{password_mgr}}
Handle authentication with the proxy.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{AbstractDigestAuthHandler}{\optional{password_mgr}}
This is a mixin class that helps with HTTP authentication, both
to the remote host and to a proxy.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{HTTPDigestAuthHandler}{\optional{password_mgr}}
Handle authentication with the remote host.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}

\begin{classdesc}{ProxyDigestAuthHandler}{\optional{password_mgr}}
Handle authentication with the proxy.
\var{password_mgr}, if given, should be something that is compatible
with \class{HTTPPasswordMgr}; refer to section~\ref{http-password-mgr}
for information on the interface that must be supported.
\end{classdesc}
\begin{classdesc}{HTTPHandler}{}
A class to handle opening of HTTP URLs.
\end{classdesc}

\begin{classdesc}{HTTPSHandler}{}
A class to handle opening of HTTPS URLs.
\end{classdesc}

\begin{classdesc}{FileHandler}{}
Open local files.
\end{classdesc}

\begin{classdesc}{FTPHandler}{}
Open FTP URLs.
\end{classdesc}

\begin{classdesc}{CacheFTPHandler}{}
Open FTP URLs, keeping a cache of open FTP connections to minimize
delays.
\end{classdesc}

\begin{classdesc}{GopherHandler}{}
Open gopher URLs.
\end{classdesc}

\begin{classdesc}{UnknownHandler}{}
A catch-all class to handle unknown URLs.
\end{classdesc}
\subsection{Request Objects \label{request-objects}}

The following methods describe all of \class{Request}'s public interface,
and so all must be overridden in subclasses.

\begin{methoddesc}[Request]{add_data}{data}
Set the \class{Request} data to \var{data}.  This is ignored by all
handlers except HTTP handlers --- and there it should be a byte
string, and will change the request to be \code{POST} rather than
\code{GET}.
\end{methoddesc}
\begin{methoddesc}[Request]{get_method}{}
Return a string indicating the HTTP request method.  This is only
meaningful for HTTP requests, and currently always returns
\code{'GET'} or \code{'POST'}.
\end{methoddesc}

\begin{methoddesc}[Request]{has_data}{}
Return whether the instance has a non-\code{None} data.
\end{methoddesc}

\begin{methoddesc}[Request]{get_data}{}
Return the instance's data.
\end{methoddesc}
\begin{methoddesc}[Request]{add_header}{key, val}
Add another header to the request.  Headers are currently ignored by
all handlers except HTTP handlers, where they are added to the list
of headers sent to the server.  Note that there cannot be more than
one header with the same name, and later calls will overwrite
previous calls in case the \var{key} collides.  Currently, this is
no loss of HTTP functionality, since all headers which have meaning
when used more than once have a (header-specific) way of gaining the
same functionality using only one header.
\end{methoddesc}
\begin{methoddesc}[Request]{add_unredirected_header}{key, header}
Add a header that will not be added to a redirected request.
\end{methoddesc}

\begin{methoddesc}[Request]{has_header}{header}
Return whether the instance has the named header (checks both regular
and unredirected).
\end{methoddesc}

\begin{methoddesc}[Request]{get_full_url}{}
Return the URL given in the constructor.
\end{methoddesc}

\begin{methoddesc}[Request]{get_type}{}
Return the type of the URL --- also known as the scheme.
\end{methoddesc}

\begin{methoddesc}[Request]{get_host}{}
Return the host to which a connection will be made.
\end{methoddesc}

\begin{methoddesc}[Request]{get_selector}{}
Return the selector --- the part of the URL that is sent to
the server.
\end{methoddesc}

\begin{methoddesc}[Request]{set_proxy}{host, type}
Prepare the request by connecting to a proxy server.  The \var{host}
and \var{type} will replace those of the instance, and the instance's
selector will be the original URL given in the constructor.
\end{methoddesc}

\begin{methoddesc}[Request]{get_origin_req_host}{}
Return the request-host of the origin transaction, as defined by
\rfc{2965}.  See the documentation for the \class{Request}
constructor.
\end{methoddesc}

\begin{methoddesc}[Request]{is_unverifiable}{}
Return whether the request is unverifiable, as defined by \rfc{2965}.
See the documentation for the \class{Request} constructor.
\end{methoddesc}
\subsection{OpenerDirector Objects \label{opener-director-objects}}

\class{OpenerDirector} instances have the following methods:

\begin{methoddesc}[OpenerDirector]{add_handler}{handler}
\var{handler} should be an instance of \class{BaseHandler}.  The
following methods are searched, and added to the possible chains (note
that HTTP errors are a special case).

\begin{itemize}
  \item \method{\var{protocol}_open()} ---
    signal that the handler knows how to open \var{protocol} URLs.
  \item \method{http_error_\var{type}()} ---
    signal that the handler knows how to handle HTTP errors with HTTP
    error code \var{type}.
  \item \method{\var{protocol}_error()} ---
    signal that the handler knows how to handle errors from
    (non-\code{http}) \var{protocol}.
  \item \method{\var{protocol}_request()} ---
    signal that the handler knows how to pre-process \var{protocol}
    requests.
  \item \method{\var{protocol}_response()} ---
    signal that the handler knows how to post-process \var{protocol}
    responses.
\end{itemize}
\end{methoddesc}
\begin{methoddesc}[OpenerDirector]{open}{url\optional{, data}}
Open the given \var{url} (which can be a request object or a string),
optionally passing the given \var{data}.
Arguments, return values and exceptions raised are the same as those
of \function{urlopen()} (which simply calls the \method{open()} method
on the currently installed global \class{OpenerDirector}).
\end{methoddesc}

\begin{methoddesc}[OpenerDirector]{error}{proto\optional{,
                                          arg\optional{, \moreargs}}}
Handle an error of the given protocol.  This will call the registered
error handlers for the given protocol with the given arguments (which
are protocol specific).  The HTTP protocol is a special case which
uses the HTTP response code to determine the specific error handler;
refer to the \method{http_error_*()} methods of the handler classes.

Return values and exceptions raised are the same as those
of \function{urlopen()}.
\end{methoddesc}
OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is
determined by sorting the handler instances on their
\member{handler_order} attribute.

\begin{enumerate}
  \item Every handler with a method named like
        \method{\var{protocol}_request()} has that method called to
        pre-process the request.

  \item Handlers with a method named like
        \method{\var{protocol}_open()} are called to handle the request.
        This stage ends when a handler either returns a
        non-\constant{None} value (i.e.\ a response), or raises an exception
        (usually \exception{URLError}).  Exceptions are allowed to propagate.

        In fact, the above algorithm is first tried for methods named
        \method{default_open()}.  If all such methods return
        \constant{None}, the algorithm is repeated for methods named like
        \method{\var{protocol}_open()}.  If all such methods return
        \constant{None}, the algorithm is repeated for methods named
        \method{unknown_open()}.

        Note that the implementation of these methods may involve calls of
        the parent \class{OpenerDirector} instance's \method{open()} and
        \method{error()} methods.

  \item Every handler with a method named like
        \method{\var{protocol}_response()} has that method called to
        post-process the response.
\end{enumerate}
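The three stages can be traced with two hypothetical handlers for a made-up \code{trace:} scheme. A bare \class{OpenerDirector} is used so that no default handlers take part. The processor follows the \class{*Processor} naming convention described in section~\ref{base-handler-objects}.

\begin{verbatim}
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3

calls = []   # records the order in which the opener invokes handler methods


class TraceProcessor(urllib2.BaseHandler):
    """Hypothetical processor for the made-up 'trace' scheme."""

    def trace_request(self, req):             # stage 1: pre-process
        calls.append('trace_request')
        return req

    def trace_response(self, req, response):  # stage 3: post-process
        calls.append('trace_response')
        return response


class TraceHandler(urllib2.BaseHandler):
    """Hypothetical handler that produces the response."""

    def trace_open(self, req):                 # stage 2: open
        calls.append('trace_open')
        return urllib2.addinfourl(io.BytesIO(b'ok'), {}, req.get_full_url())


opener = urllib2.OpenerDirector()   # no default handlers
opener.add_handler(TraceProcessor())
opener.add_handler(TraceHandler())
opener.open('trace://example/')
print(calls)   # ['trace_request', 'trace_open', 'trace_response']
\end{verbatim}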
\subsection{BaseHandler Objects \label{base-handler-objects}}

\class{BaseHandler} objects provide a couple of methods that are
directly useful, and others that are meant to be used by derived
classes.  These are intended for direct use:

\begin{methoddesc}[BaseHandler]{add_parent}{director}
Add a director as parent.
\end{methoddesc}

\begin{methoddesc}[BaseHandler]{close}{}
Remove any parents.
\end{methoddesc}

The following members and methods should only be used by classes
derived from \class{BaseHandler}.  \note{The convention has been
adopted that subclasses defining \method{\var{protocol}_request()} or
\method{\var{protocol}_response()} methods are named
\class{*Processor}; all others are named \class{*Handler}.}

\begin{memberdesc}[BaseHandler]{parent}
A valid \class{OpenerDirector}, which can be used to open using a
different protocol, or handle errors.
\end{memberdesc}
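To show what \member{parent} is for, the hypothetical handler below rewrites a made-up \code{alias:} URL into an equally made-up \code{mirror:} URL. It then asks its parent \class{OpenerDirector} to open that URL instead, much as \class{FileHandler} hands non-local files over to FTP.

\begin{verbatim}
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3


class MirrorHandler(urllib2.BaseHandler):
    """Hypothetical: serves mirror:// URLs with canned bytes."""

    def mirror_open(self, req):
        return urllib2.addinfourl(io.BytesIO(b'mirrored'), {},
                                  req.get_full_url())


class AliasHandler(urllib2.BaseHandler):
    """Hypothetical: rewrites alias:// to mirror:// and re-opens via parent."""

    def alias_open(self, req):
        target = 'mirror' + req.get_full_url()[len('alias'):]
        return self.parent.open(target)   # parent is set by add_handler()


opener = urllib2.build_opener(MirrorHandler, AliasHandler)
f = opener.open('alias://example/file')
print(f.geturl())   # mirror://example/file
\end{verbatim}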
\begin{methoddesc}[BaseHandler]{default_open}{req}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to catch all URLs.

This method, if implemented, will be called by the parent
\class{OpenerDirector}.  It should return a file-like object as
described in the return value of the \method{open()} of
\class{OpenerDirector}, or \code{None}.  On failure it should raise
\exception{URLError}, unless a truly exceptional thing happens (for
example, \exception{MemoryError} should not be mapped to
\exception{URLError}).

This method will be called before any protocol-specific open method.
\end{methoddesc}
\begin{methoddescni}[BaseHandler]{\var{protocol}_open}{req}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to handle URLs with the given
protocol.

This method, if defined, will be called by the parent
\class{OpenerDirector}.  Return values should be the same as for
\method{default_open()}.
\end{methoddescni}
\begin{methoddesc}[BaseHandler]{unknown_open}{req}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to catch all URLs with no
specific registered handler to open it.

This method, if implemented, will be called by the \member{parent}
\class{OpenerDirector}.  Return values should be the same as for
\method{default_open()}.
\end{methoddesc}
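The sketch below first shows the default behaviour, where \class{UnknownHandler} raises \exception{URLError} for a scheme that no handler claims. It then adds a hypothetical catch-all that serves an empty document instead. The catch-all lowers its \member{handler_order}. Otherwise \class{UnknownHandler}, which keeps the default order of 500 and is registered first, would be consulted first and raise.

\begin{verbatim}
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3

# Default behaviour: nothing claims the scheme, so UnknownHandler raises.
caught = None
try:
    urllib2.build_opener().open('nosuchscheme://example/')
except urllib2.URLError as e:
    caught = e
print(caught is not None)   # True


class FallbackHandler(urllib2.BaseHandler):
    """Hypothetical catch-all that serves an empty document."""
    handler_order = 400   # consulted before UnknownHandler (order 500)

    def unknown_open(self, req):
        return urllib2.addinfourl(io.BytesIO(b''), {}, req.get_full_url())


fallback_opener = urllib2.build_opener(FallbackHandler)
f = fallback_opener.open('nosuchscheme://example/')
print(f.geturl())   # nosuchscheme://example/
\end{verbatim}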
\begin{methoddesc}[BaseHandler]{http_error_default}{req, fp, code, msg, hdrs}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they intend to provide a catch-all
for otherwise unhandled HTTP errors.  It will be called automatically
by the \class{OpenerDirector} getting the error, and should not
normally be called in other circumstances.

\var{req} will be a \class{Request} object, \var{fp} will be a
file-like object with the HTTP error body, \var{code} will be the
three-digit code of the error, \var{msg} will be the user-visible
explanation of the code and \var{hdrs} will be a mapping object with
the headers of the error.

Return values and exceptions raised should be the same as those
of \function{urlopen()}.
\end{methoddesc}
\begin{methoddesc}[BaseHandler]{http_error_\var{nnn}}{req, fp, code, msg, hdrs}
\var{nnn} should be a three-digit HTTP error code.  This method is
also not defined in \class{BaseHandler}, but will be called, if it
exists, on an instance of a subclass, when an HTTP error with code
\var{nnn} occurs.

Subclasses should define this method to handle specific HTTP
errors.

Arguments, return values and exceptions raised should be the same as
for \method{http_error_default()}.
\end{methoddesc}
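Error dispatch can be exercised without a server by calling \method{OpenerDirector.error()} directly, as the opener itself would after an HTTP failure. In this sketch the handler, URL and bodies are hypothetical. A 404 is turned into an ordinary response by \method{http_error_404()}. A 500 has no specific method, so it falls through to \method{http_error_default()} and is raised as \exception{HTTPError}.

\begin{verbatim}
import email
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3


class SoftNotFoundHandler(urllib2.BaseHandler):
    """Hypothetical: answer 404 with the error body instead of raising."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return urllib2.addinfourl(fp, hdrs, req.get_full_url())


opener = urllib2.build_opener(SoftNotFoundHandler)
req = urllib2.Request('http://www.example.com/missing')
hdrs = email.message_from_string('Content-Type: text/plain\n\n')

# The code argument (404 here) selects http_error_404().
resp = opener.error('http', req, io.BytesIO(b'not here'), 404,
                    'Not Found', hdrs)
print(resp.read())

# No http_error_500() exists, so HTTPDefaultErrorHandler raises HTTPError.
status = None
try:
    opener.error('http', req, io.BytesIO(b'oops'), 500, 'Server Error', hdrs)
except urllib2.HTTPError as e:
    status = e.code
print(status)   # 500
\end{verbatim}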
\begin{methoddescni}[BaseHandler]{\var{protocol}_request}{req}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to pre-process requests of
the given protocol.

This method, if defined, will be called by the parent
\class{OpenerDirector}.  \var{req} will be a \class{Request} object.
The return value should be a \class{Request} object.
\end{methoddescni}

\begin{methoddescni}[BaseHandler]{\var{protocol}_response}{req, response}
This method is \emph{not} defined in \class{BaseHandler}, but
subclasses should define it if they want to post-process responses of
the given protocol.

This method, if defined, will be called by the parent
\class{OpenerDirector}.  \var{req} will be a \class{Request} object.
\var{response} will be an object implementing the same interface as
the return value of \function{urlopen()}.  The return value should
implement the same interface as the return value of
\function{urlopen()}.
\end{methoddescni}
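Here is a sketch of a request processor that follows the \class{*Processor} naming convention. The class name and the \code{Accept} value are hypothetical. Binding the same function to \method{https_request()} covers both schemes. The processor can also be called directly to check its effect.

\begin{verbatim}
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3


class DefaultAcceptProcessor(urllib2.BaseHandler):
    """Hypothetical processor: adds an Accept header unless one is present."""

    def http_request(self, req):
        # 'Accept' is unchanged by the capitalize() applied to header names.
        if not req.has_header('Accept'):
            req.add_header('Accept', 'text/html')
        return req   # a processor must hand back a Request

    https_request = http_request   # same treatment for HTTPS


opener = urllib2.build_opener(DefaultAcceptProcessor)   # registered for use
req = urllib2.Request('http://www.example.com/')
processed = DefaultAcceptProcessor().http_request(req)
print(processed.get_header('Accept'))   # text/html
\end{verbatim}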
\subsection{HTTPRedirectHandler Objects \label{http-redirect-handler}}

\note{Some HTTP redirections require action from this module's client
  code.  If this is the case, \exception{HTTPError} is raised.  See
  \rfc{2616} for details of the precise meanings of the various
  redirection codes.}

\begin{methoddesc}[HTTPRedirectHandler]{redirect_request}{req,
                                                  fp, code, msg, hdrs, newurl}
Return a \class{Request} or \code{None} in response to a redirect to
\var{newurl}.  This is called by the default implementations of the
\method{http_error_30*()} methods when a redirection is received from
the server.  If a redirection should take place, return a new
\class{Request} to allow \method{http_error_30*()} to perform the
redirect.  Otherwise, raise \exception{HTTPError} if no other handler
should try to handle this URL, or return \code{None} if you can't but
another handler might.

The default implementation of this method does not strictly
follow \rfc{2616}, which says that 301 and 302 responses to \code{POST}
requests must not be automatically redirected without confirmation by
the user.  In reality, browsers do allow automatic redirection of
these responses, changing the POST to a \code{GET}, and the default
implementation reproduces this behavior.
\end{methoddesc}
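The default POST-to-GET behaviour, and a subclass that declines to redirect, can be checked without a server by calling \method{redirect_request()} directly. The sketch passes the redirection target as the last argument, which is how the implementation receives it. It also passes \code{None} for \var{fp}, since no response body is involved. The URLs and the subclass are hypothetical.

\begin{verbatim}
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3

post = urllib2.Request('http://www.example.com/submit', b'q=1')   # a POST
handler = urllib2.HTTPRedirectHandler()

# 302 after a POST: a new GET request for the target URL is returned.
new = handler.redirect_request(post, None, 302, 'Found', {},
                               'http://www.example.com/done')
print(new.get_full_url())   # http://www.example.com/done
print(new.get_method())     # GET

# 307 after a POST is not followed automatically: HTTPError is raised.
refused = None
try:
    handler.redirect_request(post, None, 307, 'Temporary Redirect', {},
                             'http://www.example.com/done')
except urllib2.HTTPError as e:
    refused = e.code
print(refused)              # 307


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical: decline every redirect but let other handlers try."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        return None
\end{verbatim}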
\begin{methoddesc}[HTTPRedirectHandler]{http_error_301}{req,
                                                  fp, code, msg, hdrs}
Redirect to the \code{Location:} URL.  This method is called by
the parent \class{OpenerDirector} when getting an HTTP
`moved permanently' response.
\end{methoddesc}

\begin{methoddesc}[HTTPRedirectHandler]{http_error_302}{req,
                                                  fp, code, msg, hdrs}
The same as \method{http_error_301()}, but called for the
`found' response.
\end{methoddesc}

\begin{methoddesc}[HTTPRedirectHandler]{http_error_303}{req,
                                                  fp, code, msg, hdrs}
The same as \method{http_error_301()}, but called for the
`see other' response.
\end{methoddesc}

\begin{methoddesc}[HTTPRedirectHandler]{http_error_307}{req,
                                                  fp, code, msg, hdrs}
The same as \method{http_error_301()}, but called for the
`temporary redirect' response.
\end{methoddesc}
\subsection{HTTPCookieProcessor Objects \label{http-cookie-processor}}

\class{HTTPCookieProcessor} instances have one attribute:

\begin{memberdesc}{cookiejar}
The \class{cookielib.CookieJar} in which cookies are stored.
\end{memberdesc}
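Here is a minimal sketch of wiring a cookie jar into an opener; no request is made. On Python~3, \module{cookielib} became \module{http.cookiejar}, and the fallback import accounts for this.

\begin{verbatim}
try:
    import urllib2, cookielib                 # Python 2
except ImportError:
    import urllib.request as urllib2          # Python 3
    import http.cookiejar as cookielib

jar = cookielib.CookieJar()
processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(processor)

# Requests made through `opener` get matching cookies from `jar`, and
# Set-Cookie headers in responses are stored back into it.
print(processor.cookiejar is jar)   # True
print(len(jar))                     # 0: nothing has been stored yet

# Without an argument, the processor creates its own empty CookieJar.
print(isinstance(urllib2.HTTPCookieProcessor().cookiejar,
                 cookielib.CookieJar))   # True
\end{verbatim}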
\subsection{ProxyHandler Objects \label{proxy-handler}}

\begin{methoddescni}[ProxyHandler]{\var{protocol}_open}{request}
The \class{ProxyHandler} will have a method
\method{\var{protocol}_open()} for every \var{protocol} which has a
proxy in the \var{proxies} dictionary given in the constructor.  The
method will modify requests to go through the proxy, by calling
\code{request.set_proxy()}, and call the next handler in the chain to
actually execute the protocol.
\end{methoddescni}
\subsection{HTTPPasswordMgr Objects \label{http-password-mgr}}

These methods are available on \class{HTTPPasswordMgr} and
\class{HTTPPasswordMgrWithDefaultRealm} objects.

\begin{methoddesc}[HTTPPasswordMgr]{add_password}{realm, uri, user, passwd}
\var{uri} can be either a single URI, or a sequence of URIs.  \var{realm},
\var{user} and \var{passwd} must be strings.  This causes
\code{(\var{user}, \var{passwd})} to be used as authentication tokens
when authentication for \var{realm} and a super-URI of any of the
given URIs is required.
\end{methoddesc}
\begin{methoddesc}[HTTPPasswordMgr]{find_user_password}{realm, authuri}
Get user/password for given realm and URI, if any.  This method will
return \code{(None, None)} if there is no matching user/password.

For \class{HTTPPasswordMgrWithDefaultRealm} objects, the realm
\code{None} will be searched if the given \var{realm} has no matching
user/password.
\end{methoddesc}
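The lookup rules can be seen in a small sketch, in which the realm name, credentials and URLs are hypothetical. Registering under realm \code{None} makes the entry a catch-all. A URI matches when it lies underneath a registered URI on the same host and port. The finished manager is then handed to an authentication handler.

\begin{verbatim}
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Realm None is the catch-all; the credentials and URI are hypothetical.
mgr.add_password(None, 'http://www.example.com/private/', 'joe', 'secret')

# Any realm, any URI underneath the registered one:
print(mgr.find_user_password('Staff Only',
                             'http://www.example.com/private/report'))
# A different host matches nothing:
print(mgr.find_user_password('Staff Only', 'http://www.example.org/'))

# The manager is then handed to an authentication handler.
auth = urllib2.HTTPBasicAuthHandler(mgr)
opener = urllib2.build_opener(auth)
\end{verbatim}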
\subsection{AbstractBasicAuthHandler Objects
            \label{abstract-basic-auth-handler}}

\begin{methoddesc}[AbstractBasicAuthHandler]{http_error_auth_reqed}{authreq, host, req, headers}
Handle an authentication request by getting a user/password pair, and
re-trying the request.  \var{authreq} should be the name of the header
where the information about the realm is included in the request,
\var{host} specifies the URL and path to authenticate for, \var{req}
should be the (failed) \class{Request} object, and \var{headers}
should be the error headers.

\var{host} is either an authority (e.g.\ \code{"python.org"}) or a URL
containing an authority component (e.g.\ \code{"http://python.org/"}).
In either case, the authority must not contain a userinfo component
(so, \code{"python.org"} and \code{"python.org:80"} are fine,
\code{"joe:password@python.org"} is not).
\end{methoddesc}
\subsection{HTTPBasicAuthHandler Objects
            \label{http-basic-auth-handler}}

\begin{methoddesc}[HTTPBasicAuthHandler]{http_error_401}{req, fp, code,
                                                        msg, hdrs}
Retry the request with authentication information, if available.
\end{methoddesc}

\subsection{ProxyBasicAuthHandler Objects
            \label{proxy-basic-auth-handler}}

\begin{methoddesc}[ProxyBasicAuthHandler]{http_error_407}{req, fp, code,
                                                        msg, hdrs}
Retry the request with authentication information, if available.
\end{methoddesc}
\subsection{AbstractDigestAuthHandler Objects
            \label{abstract-digest-auth-handler}}

\begin{methoddesc}[AbstractDigestAuthHandler]{http_error_auth_reqed}{authreq, host, req, headers}
\var{authreq} should be the name of the header where the information about
the realm is included in the request, \var{host} should be the host to
authenticate to, \var{req} should be the (failed) \class{Request}
object, and \var{headers} should be the error headers.
\end{methoddesc}
\subsection{HTTPDigestAuthHandler Objects
            \label{http-digest-auth-handler}}

\begin{methoddesc}[HTTPDigestAuthHandler]{http_error_401}{req, fp, code,
                                                        msg, hdrs}
Retry the request with authentication information, if available.
\end{methoddesc}

\subsection{ProxyDigestAuthHandler Objects
            \label{proxy-digest-auth-handler}}

\begin{methoddesc}[ProxyDigestAuthHandler]{http_error_407}{req, fp, code,
                                                        msg, hdrs}
Retry the request with authentication information, if available.
\end{methoddesc}
\subsection{HTTPHandler Objects \label{http-handler-objects}}

\begin{methoddesc}[HTTPHandler]{http_open}{req}
Send an HTTP request, which can be either GET or POST, depending on
\code{\var{req}.has_data()}.
\end{methoddesc}

\subsection{HTTPSHandler Objects \label{https-handler-objects}}

\begin{methoddesc}[HTTPSHandler]{https_open}{req}
Send an HTTPS request, which can be either GET or POST, depending on
\code{\var{req}.has_data()}.
\end{methoddesc}
\subsection{FileHandler Objects \label{file-handler-objects}}

\begin{methoddesc}[FileHandler]{file_open}{req}
Open the file locally, if there is no host name, or
the host name is \code{'localhost'}.  Change the
protocol to \code{ftp} otherwise, and retry opening
it using \member{parent}.
\end{methoddesc}

\subsection{FTPHandler Objects \label{ftp-handler-objects}}

\begin{methoddesc}[FTPHandler]{ftp_open}{req}
Open the FTP file indicated by \var{req}.
The login is always done with empty username and password.
\end{methoddesc}
\subsection{CacheFTPHandler Objects \label{cacheftp-handler-objects}}

\class{CacheFTPHandler} objects are \class{FTPHandler} objects with
the following additional methods:

\begin{methoddesc}[CacheFTPHandler]{setTimeout}{t}
Set timeout of connections to \var{t} seconds.
\end{methoddesc}

\begin{methoddesc}[CacheFTPHandler]{setMaxConns}{m}
Set maximum number of cached connections to \var{m}.
\end{methoddesc}
\subsection{GopherHandler Objects \label{gopher-handler}}

\begin{methoddesc}[GopherHandler]{gopher_open}{req}
Open the gopher resource indicated by \var{req}.
\end{methoddesc}

\subsection{UnknownHandler Objects \label{unknown-handler-objects}}

\begin{methoddesc}[UnknownHandler]{unknown_open}{req}
Raise a \exception{URLError} exception.
\end{methoddesc}
\subsection{HTTPErrorProcessor Objects \label{http-error-processor-objects}}

\begin{methoddesc}[HTTPErrorProcessor]{http_response}{request, response}
Process HTTP error responses.

For 200 status codes, the response object is returned immediately.

For non-200 status codes, this simply passes the job on to the
\method{\var{protocol}_error_\var{code}()} handler methods, via
\method{OpenerDirector.error()}.  Eventually,
\class{urllib2.HTTPDefaultErrorHandler} will raise an
\exception{HTTPError} if no other handler handles the error.
\end{methoddesc}
\subsection{Examples\label{urllib2-examples}}

This example gets the python.org main page and displays the first 100
bytes of it:

\begin{verbatim}
>>> import urllib2
>>> f = urllib2.urlopen('http://www.python.org/')
>>> print f.read(100)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<?xml-stylesheet href="./css/ht2html
\end{verbatim}
Here we are sending a data-stream to the stdin of a CGI and reading
the data it returns to us.  Note that this example will only work when
the Python installation supports SSL.

\begin{verbatim}
>>> import urllib2
>>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
...                       data='This data is passed to stdin of the CGI')
>>> f = urllib2.urlopen(req)
>>> print f.read()
Got Data: "This data is passed to stdin of the CGI"
\end{verbatim}
The code for the sample CGI used in the above example is:

\begin{verbatim}
#!/usr/bin/env python
import sys
data = sys.stdin.read()
print 'Content-type: text/plain\n\nGot Data: "%s"' % data
\end{verbatim}
Use of Basic HTTP Authentication:

\begin{verbatim}
import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')
\end{verbatim}
\function{build_opener()} provides many handlers by default, including a
\class{ProxyHandler}.  By default, \class{ProxyHandler} uses the
environment variables named \code{<scheme>_proxy}, where \code{<scheme>}
is the URL scheme involved.  For example, the \envvar{http_proxy}
environment variable is read to obtain the HTTP proxy's URL.
This example replaces the default \class{ProxyHandler} with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support
with \class{ProxyBasicAuthHandler}.

\begin{verbatim}
import urllib2
proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
opener.open('http://www.example.com/login.html')
\end{verbatim}
To add HTTP headers to a request, use the \var{headers} argument to the
\class{Request} constructor, or call \method{add_header()}:

\begin{verbatim}
import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
\end{verbatim}
\class{OpenerDirector} automatically adds a \mailheader{User-Agent}
header to every \class{Request}.  To change this:

\begin{verbatim}
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')
\end{verbatim}
Also, remember that a few standard headers
(\mailheader{Content-Length}, \mailheader{Content-Type} and
\mailheader{Host}) are added when the \class{Request} is passed to
\function{urlopen()} (or \method{OpenerDirector.open()}).