<title>libwww-perl-5</title>

<h1 align=center>LIBWWW-PERL-5</h1>
<h2>Introduction</h2>

<p>The libwww-perl-5 library is a collection of perl modules that
provide a simple and consistent programming interface to the
World-Wide Web. The library also contains modules that are of more
general use.

<p>This article gives an introduction to the library and ...

<p>The main focus of the library is to provide functions that allow
you to write WWW clients, so libwww-perl is often presented as a WWW
client library. The main features of the library are:
<ul>

<li> Contains various reusable components (modules) that can be
used separately.

<li> Provides an object oriented model of HTTP-style communication.
Within this framework we currently support access to
http,
gopher,
ftp,
file, and
mailto
resources.

<li> Supports basic authorization

<li> Transparent redirect handling

<li> Supports proxies

<li> URL handling (both absolute &amp; relative)

<li> RobotRules (a parser for robots.txt files)

<li> MailCap handling

<li> HTML parser and formatter (PS and plain text)

<li> The library can be used through the full object oriented interface
or through a very simple procedural interface.

<li> A simple command line client application that is called
<em>request</em>.

<li> The library can cooperate with Tk.
A simple Tk-based GUI browser is distributed with the Tk
extension for perl.
</ul>
<h2>HTTP style communication</h2>

The libwww-perl library is based on HTTP style communication. What
does that mean? This is a quote from the <a
href="http://www.w3.org/pub/WWW/Protocols/">HTTP specification</a>
document:

<blockquote>
<p>The HTTP protocol is based on a request/response paradigm. A client
establishes a connection with a server and sends a request to the
server in the form of a request method, URI, and protocol version,
followed by a MIME-like message containing request modifiers, client
information, and possible body content. The server responds with a
status line, including the message's protocol version and a success or
error code, followed by a MIME-like message containing server
information, entity metainformation, and possible body content.
</blockquote>
<p>What this means to libwww-perl is that communication always takes
place by creating and configuring a <em>request</em> object. This
object is then passed to a server and we get a <em>response</em>
object in return that we can examine. The same simple model is used
for any kind of service we want to access.

<p>If we want to fetch a document from a remote file server we send it
a request that contains a name for that document and the response
contains the document itself. If we want to send a mail message to
somebody then we send the request object which contains our message to
the mail server and the response object will contain an acknowledgment
that tells us that the message has been accepted and will be forwarded
to the recipients.

<p>It is as simple as that!
<h3>Request object</h3>

The request object has the class name <em>HTTP::Request</em> in
libwww-perl. The fact that the class name uses <em>HTTP::</em> as a
name prefix only implies that we use this model of communication. It
does not limit the kind of services we can try to send this
<em>request</em> to.

The main attributes of <em>HTTP::Request</em> objects are:

<ul>

<li> The <b>method</b> is a short string that tells what kind of
request this is. The most common methods are <em>GET</em>,
<em>POST</em> and <em>HEAD</em>.

<li> The <b>url</b> is a string denoting the protocol, server and
the name of the "document" we want to access. The url might
also encode various other parameters. This is the name of the
resource we want to access.

<li> The <b>headers</b> contain additional information about the
request and can also be used to describe the content. The headers
are a set of keyword/value pairs.

<li> The <b>content</b> is an arbitrary amount of binary data.

</ul>
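<p>As a small illustration of these four attributes, here is a sketch
that builds a request by hand and prints each attribute back. It
assumes a current libwww-perl release, where the accessor for the url
is called <em>uri()</em>. The search URL and the form data are made up
for the example.

<hr>
<pre>
#!/local/bin/perl -w
use strict;
use HTTP::Request;

# Method and url are given to the constructor (hypothetical search URL)
my $request = HTTP::Request->new('POST', 'http://www.perl.com/cgi-bin/search');

# Headers are keyword/value pairs; this one describes the content
$request->header('Content-Type', 'application/x-www-form-urlencoded');

# The content is whatever data we want to send along (made-up form data)
$request->content('query=libwww');

print "Method:  ", $request->method, "\n";
print "URL:     ", $request->uri, "\n";
print "Type:    ", $request->header('Content-Type'), "\n";
print "Content: ", $request->content, "\n";
</pre>
<hr>

<p>Nothing is sent over the network here. The request object is just a
passive container until it is handed to something that can deliver it.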
<h3>Response object</h3>

The response object has the class name <em>HTTP::Response</em> in
libwww-perl. The main attributes of objects of this class are:

<ul>
<li> The <b>code</b> is a numerical value that encodes the overall
outcome of the request.

<li> The <b>message</b> is a short (human readable) string that
corresponds to the <em>code</em>.

<li> The <b>headers</b> contain additional information about the
response and they describe the content.

<li> The <b>content</b> is an arbitrary amount of binary data.

</ul>

Since we don't want to handle the <em>code</em> directly in our
programs, the libwww-perl response object has methods that can be used
to query the kind of code present:

<ul>

<li> <b>isSuccess</b>
<li> <b>isRedirect</b>
<li> <b>isError</b>

</ul>
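<p>To see how these queries sort the codes, the following sketch
creates a few response objects by hand and asks each one what kind it
is. It is written for a current libwww-perl release, where the three
methods are spelled <em>is_success</em>, <em>is_redirect</em> and
<em>is_error</em>. The three status codes are picked only as examples.

<hr>
<pre>
#!/local/bin/perl -w
use strict;
use HTTP::Response;
use HTTP::Status qw(status_message);

# Build responses locally; no server is contacted
foreach my $code (200, 302, 404) {
    my $response = HTTP::Response->new($code, status_message($code));

    my $kind = $response->is_success  ? 'success'
             : $response->is_redirect ? 'redirect'
             : $response->is_error    ? 'error'
             :                          'other';

    print $response->code, " ", $response->message, ": $kind\n";
}
</pre>
<hr>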
<h3>User Agent</h3>

Ok, I have created this nice <em>request</em> object. What do I do
with it?

<p>The answer is that you pass it on to the <em>user agent</em> object
and it will take care of all the things that need to be done
(low-level communication and error handling) and the user agent will
give you back a <em>response</em> object. The user agent represents
your application on the network and it provides you with an interface
that can accept <em>requests</em> and will return <em>responses</em>.

<p><i>There should be a nice figure here explaining this. It should
show the UA as an interface layer between the application code and the
network.</i>

<p>The libwww-perl class name for the user agent is
<em>LWP::UserAgent</em>. Every libwww-perl application that wants to
communicate should create at least one object of this kind. The main
method provided by this object is <em>request()</em>. This method
takes an <em>HTTP::Request</em> object as argument and will return an
<em>HTTP::Response</em> object.

<p>The <em>LWP::UserAgent</em> has many other attributes that let you
configure how it will interact with the network and with your
application code.
<ul>

<li> The <b>timeout</b> specifies how much time we give remote servers
to create a response before the library creates an internal
<em>timeout</em> response.
<li> The <b>agent</b> specifies the name that your application should
present itself as on the network.
<li> The <b>useAlarm</b> specifies whether it is ok for the user agent to
use the alarm(3) system to implement timeouts.
<li> The <b>useEval</b> specifies whether the agent should raise an
exception (<em>die</em> in perl) if an error condition occurs.

<li> The <b>proxy</b> and <b>noProxy</b> specify when communication
should go through a <a
href="http://www.w3.org/pub/WWW/Proxies/">proxy server</a>.

<li> The <b>credentials</b> provide a way to set up usernames and
passwords that are needed to access certain services.

</ul>
<p>Many applications want even more control over how they
interact with the network, and they get this by specializing
<em>LWP::UserAgent</em> through sub-classing.
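<p>A sub-class could look something like the sketch below. It assumes
a current libwww-perl release, where the methods an application may
override are named <em>redirect_ok</em> and
<em>get_basic_credentials</em>. The class name, the application name
and the username/password pair are invented for the example.

<hr>
<pre>
#!/local/bin/perl -w
use strict;

package MyAgent;                 # hypothetical subclass name
use LWP::UserAgent;
our @ISA = ('LWP::UserAgent');

# Refuse to follow redirects, so the caller sees the redirect response
sub redirect_ok { return 0 }

# Hand out one fixed (made-up) username and password for every realm
sub get_basic_credentials {
    my ($self, $realm, $uri, $is_proxy) = @_;
    return ('guest', 'secret');
}

package main;

my $ua = MyAgent->new;
$ua->timeout(30);                # seconds to wait for a response
$ua->agent('MyBrowser/0.1');     # hypothetical application name

print "Agent:   ", $ua->agent, "\n";
print "Timeout: ", $ua->timeout, "\n";

my ($user, $password) =
    $ua->get_basic_credentials('Some realm', 'http://www.perl.com/', 0);
print "Login:   $user\n";
</pre>
<hr>

<p>An object of this class is used exactly like a plain
<em>LWP::UserAgent</em>. The overridden methods are called by the
library itself whenever a redirect or an authentication challenge
comes back from a server.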
<!-- I don't want to describe these!!!
<ul>

<li> simpleRequest()
<li> redirectOK()
<li> credentials()
<li> getBasicCredentials()
<li> mirror

</ul>
-->
<h1>Examples</h1>

Let's turn to a few examples to illustrate the concepts described
above. You should be able to run these examples directly, provided that
you have both perl and libwww-perl <a
href="install.html">installed</a> on your system. If you store the
examples in files you might want to change the first line (#!....) to
reflect the location of the perl interpreter on your system.
<a name="ex1"><h3>Example 1</h3></a>
<hr>
<pre>
#!/local/bin/perl -w

require LWP;

# The user agent sends requests and hands back responses
$ua = new LWP::UserAgent;

# Ask for the document, and tell the server we would like HTML
$request = new HTTP::Request 'GET', 'http://www.perl.com/perl/';
$request->header('Accept', 'text/html');

$response = $ua->request($request);

if ($response->isSuccess) {
    die "This is bad" if $response->header('Content-Type') ne 'text/html';
    print $response->content;
} else {
    die "Request failed: " . $response->code . " " . $response->message . "\n";
}
</pre>
<hr>
This example shows a simple application that fetches an HTML document
with the name <a
href="http://www.perl.com/perl/">http://www.perl.com/perl/</a> from
the network and then prints it out (without reformatting). What is
going on is the following:

<ul>

<li> First the statement <em>"require LWP;"</em> is needed to make the
libwww-perl classes available to the application.

<li> The next thing that happens is that we create a user agent
object and assign the reference to this object to the variable
<em>$ua</em>.

<li> Then we create a request object and assign it to the
<em>$request</em> variable. The request object is initialized
with the method <em>GET</em> and the URL <em>http://www....</em>.

<li> The next thing that happens is that we configure the request by
adding an <em>Accept</em> header to it. This header informs the
server serving this request that we want an HTML document back.

<li> Then we hand the request object over to the user agent and we
receive a response object in return. We assign the response
object to the $response variable.

<li> Then we check the response to see that it really was a
successful response. If it was, we make sure that the content type
is <em>text/html</em> and print the content (i.e. the
document) that comes with the response.

<li> If it was not a successful response we print an error message
and die. <em>You might want to try to change the URL so that
you get an unsuccessful response back!</em>

</ul>

Was this complicated for something as simple as retrieving a simple
file from a network server? Let's take a look at how we can make the
same thing much simpler.
<a name="ex2"><h3>Example 2</h3></a>

<hr>
<pre>
#!/local/bin/perl -w
use LWP::Simple;
getprint 'http://www.perl.com/perl/';
</pre>
<hr>
In this example we have used a module called <em>LWP::Simple</em>.
This two-line program essentially does the same as the code in <a
href="#ex1">example 1</a>. The <em>LWP::Simple</em> module provides a
very simplified procedural interface to the libwww-perl library. After
you have executed the <em>use LWP::Simple;</em> statement you have
access to the following routines:

<ul>
<li> <em>get($url)</em><br> Takes a URL as an argument and returns the
content. Returns <em>undef</em> if an error occurred.

<li> <em>getprint($url)</em>
<li> <em>head($url)</em> --&gt;
($content_type,
$document_size,
$modified_time, $expires, $server)
<li> <em>mirror($url, $file)</em>

</ul>
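<p>Here is a sketch that calls three of these routines in turn. It
assumes a current libwww-perl release, in which <em>head()</em>
returns an empty list on failure and <em>mirror()</em> returns the
status code of the response. The local file name is made up for the
example.

<hr>
<pre>
#!/local/bin/perl -w
use strict;
use LWP::Simple;

my $url = 'http://www.perl.com/perl/';

# get() returns the document, or undef if something went wrong
my $content = get($url);
if (defined $content) {
    print "Fetched ", length($content), " bytes\n";
} else {
    print "Could not fetch $url\n";
}

# head() fetches only the headers and returns some of them as a list
my ($type, $size, $mtime, $expires, $server) = head($url);
print "Content type: $type\n" if defined $type;

# mirror() stores the document in a local file (hypothetical name)
my $status = mirror($url, 'perl.html');
print "mirror() returned status $status\n";
</pre>
<hr>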
<p>The LWP::Simple module is also suitable for direct invocation from
the command line. The following command is equivalent to the script
above:

<pre>
perl -MLWP::Simple -e "getprint 'http://www.perl.com/perl/'"
</pre>
<p>Let's write a more "complete" web browser...
<hr>
<pre>
#!/local/bin/perl -w
use LWP::Simple;
getprint shift || die "Usage: $0 &lt;url&gt;\n";
</pre>
<hr>
<a name="ex3"><h3>Example 3</h3></a>

Process data as it arrives from the network. Use a callback routine.
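<p>A callback-based fetch might look like the sketch below. It assumes
a current libwww-perl release, where <em>request()</em> accepts a code
reference and a chunk size hint as extra arguments and calls the
routine with the data, the response object and the protocol object.
The variable names and the chunk size are picked only for the
example.

<hr>
<pre>
#!/local/bin/perl -w
use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(30);

my $request = HTTP::Request->new('GET', 'http://www.perl.com/perl/');

my $received = 0;

# Called once for every chunk of data that arrives
my $handle_chunk = sub {
    my ($data, $response, $protocol) = @_;
    $received += length($data);
    print STDERR "Got ", length($data), " bytes ($received so far)\n";
};

# The third argument is a hint for the preferred chunk size in bytes
my $response = $ua->request($request, $handle_chunk, 4096);

if ($response->is_success) {
    print "Done: $received bytes in total\n";
} else {
    print "Request failed: ", $response->status_line, "\n";
}
</pre>
<hr>

<p>When a callback is used the data is handed to the routine instead
of being collected in the response object, so the application never
has to hold the whole document in memory.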
<h3>Example 4</h3>

Let's try to reformat the document using the HTML formatter.

<p>Let's write an even more "complete" web browser...
<hr>
<pre>
#!/local/bin/perl -w
use LWP::Simple;
use HTML::Parse;
print parse_html(get shift || die "Usage: $0 &lt;url&gt;\n")->format;
</pre>
<hr>

<p>Invoke a viewer for the document (MailCap)
<h3>Example 5</h3>

<ul>

<li> A robot

<li> Postscript output using the font metrics modules

<li> Base64/Quoted printable

<li> URL handling

<li> HTTP headers

</ul>
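<p>For the URL handling item in the list above, a sketch could look
like this. It assumes that the <em>URI</em> module, which current
libwww-perl releases depend on, is installed. All the links other than
the base document are made up for the example.

<hr>
<pre>
#!/local/bin/perl -w
use strict;
use URI;

my $base = 'http://www.perl.com/perl/index.html';

# Turn relative links (as found in a document) into absolute URLs
my @links = ('faq.html', '../images/logo.gif', '/CPAN/', 'ftp://ftp.perl.com/pub/');
foreach my $link (@links) {
    my $abs = URI->new_abs($link, $base);
    print "$link => $abs\n";
}

# And the other way around: make an absolute URL relative to the base
my $url = URI->new('http://www.perl.com/perl/faq.html');
print "Relative form: ", $url->rel($base), "\n";

# The parts of a URL are available through separate methods
print "Scheme: ", $url->scheme, "\n";
print "Host:   ", $url->host, "\n";
print "Path:   ", $url->path, "\n";
</pre>
<hr>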