\section{\module{robotparser} ---
         Parser for robots.txt}

\declaremodule{standard}{robotparser}
\modulesynopsis{Loads a \protect\file{robots.txt} file and
                answers questions about fetchability of other URLs.}
\sectionauthor{Skip Montanaro}{skip@mojam.com}

\index{WWW}
\index{World Wide Web}
\index{URL}
\index{robots.txt}

This module provides a single class, \class{RobotFileParser}, which
answers questions about whether or not a particular user agent can fetch
a URL on the Web site that published the \file{robots.txt} file.  For
more details on the structure of \file{robots.txt} files, see
\url{http://www.robotstxt.org/wc/norobots.html}.

\begin{classdesc}{RobotFileParser}{}

This class provides a set of methods to read, parse and answer questions
about a single \file{robots.txt} file.

\begin{methoddesc}{set_url}{url}
Sets the URL referring to a \file{robots.txt} file.
\end{methoddesc}

\begin{methoddesc}{read}{}
Reads the \file{robots.txt} URL and feeds it to the parser.
\end{methoddesc}

\begin{methoddesc}{parse}{lines}
Parses the given \var{lines}, a list of lines from a \file{robots.txt}
file, and updates the rules used by \method{can_fetch()}.
\end{methoddesc}
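
If the \file{robots.txt} file is retrieved by some other means (for
example, through a caching proxy), its lines can be handed to
\method{parse()} directly instead of calling \method{read()}.  A minimal
sketch, assuming a hypothetical site at \code{www.example.com}:

\begin{verbatim}
>>> import robotparser, urllib
>>> rp = robotparser.RobotFileParser()
>>> # Fetch the file ourselves and feed its lines to the parser.
>>> lines = urllib.urlopen("http://www.example.com/robots.txt").readlines()
>>> rp.parse(lines)
\end{verbatim}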

\begin{methoddesc}{can_fetch}{useragent, url}
Returns \code{True} if the \var{useragent} is allowed to fetch the
\var{url} according to the rules contained in the parsed
\file{robots.txt} file.
\end{methoddesc}

\begin{methoddesc}{mtime}{}
Returns the time the \file{robots.txt} file was last fetched.  This is
useful for long-running web spiders that need to check for new
\file{robots.txt} files periodically.
\end{methoddesc}

\begin{methoddesc}{modified}{}
Sets the time the \file{robots.txt} file was last fetched to the current
time.
\end{methoddesc}

\end{classdesc}

The following example demonstrates basic use of the
\class{RobotFileParser} class.

\begin{verbatim}
>>> import robotparser
>>> rp = robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True
\end{verbatim}
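
For a long-running spider, \method{mtime()} and \method{modified()} can
be combined to re-fetch \file{robots.txt} once the cached copy grows
stale.  The sketch below is illustrative only: the \code{allowed()}
helper and the one-hour \code{MAX\_AGE} threshold are not part of the
module.

\begin{verbatim}
import robotparser
import time

MAX_AGE = 3600     # seconds before robots.txt is considered stale

rp = robotparser.RobotFileParser()
rp.set_url("http://www.musi-cal.com/robots.txt")
rp.read()
rp.modified()      # record when the file was fetched

def allowed(url, useragent="*"):
    # Re-fetch robots.txt when the cached copy is older than MAX_AGE.
    if time.time() - rp.mtime() > MAX_AGE:
        rp.read()
        rp.modified()
    return rp.can_fetch(useragent, url)
\end{verbatim}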