<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE>CGIservlet manual</TITLE>
</HEAD>
<BODY>
<H1 ALIGN="CENTER">CGIservlet</H1>
<P>
An HTTPd "connector" for running CGI scripts on unix systems as WWW
accessible Web sites. The servlet starts a true HTTP daemon that channels
HTTP requests to forked daughter processes. CGIservlet.pl is NOT a
full-fledged server. Moreover, this servlet is definitely NOT intended
as a replacement for a real server (e.g., Apache). Its design goal was
SIMPLICITY, and not mileage.
</P>
<P>
Note that an HTTP server can be accessed on your local machine WITHOUT
internet access (but WITH a DNS?):
use "http://localhost[:port]/[path]" or "http://127.0.0.1[:port]/[path]"
as the URL. It is also easy to restrict access to the servlet to localhost
users (i.e., the computer running the servlet).
</P>
<P>
Suggested uses:
</P>
<ul>
<li>
<P>
A testbed for CGI-scripts and document-trees outside the primary server.
When developing new scripts and services, you don't want to mess up your
current Web-site. CGIservlet is an easy way to start a temporary (private)
server. CGIservlet lets you test separate HTTP server components, e.g.,
user authentication, in isolation.
</P>
<li>
<P>
A special purpose temporary server (WWW everywhere/anytime).
We run identification and other experiments over the inter-/intra-net using
CGI-scripts. This means a lot of development and changes and only little
actual run-time. The people doing this do not want "scripting" access to our
departmental server with all its restrictions and security. So we need a
small, lightweight, easy-to-configure server that can be run by each
investigator on her own account (and risk).
</P>
<li>
<P>
Interactive WWW presentations.
Not everyone is content with the features of "standard" office presentation
software. HTML and its associated browsers are an alternative (especially
under Linux). However, you need a server to realize the full interactive
nature of the WWW. CGIservlet with the necessary scripts can be run from
a floppy (a Web server in 100 kB). The CGIservlet can actually run a
(small) web site from RAM, without disk access (if you do NOT use the
2&gt;pid.log redirection on startup).
With the "localhost" or "127.0.0.1" address in your browser you can use it
standalone.
</P>
</ul>
<P>
When the servlet is started with the -r option, it only accepts requests
from "localhost" or "127.0.0.1" (the default), or only from the addresses
listed after the -r switch.
</P>
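<P>
The matching rule that -r applies (client addresses are compared from the
front, host names from the back) can be illustrated with a small Python
sketch. This is not CGIservlet's own Perl code: the function name
client_allowed is hypothetical, and treating empty pattern lists as
"no restriction" is an assumption.
</P>

```python
def client_allowed(remote_addr, remote_host, addr_patterns, host_patterns):
    """Return True if the client passes a -r style restriction."""
    # Assumption: with no patterns configured, nothing is restricted.
    if not addr_patterns and not host_patterns:
        return True
    # Addresses match front to back, like Remote-addr =~ /^\Q$pattern\E/
    if any(remote_addr.startswith(p) for p in addr_patterns):
        return True
    # Host names match back to front, like Remote-host =~ /\Q$pattern\E$/
    return any(remote_host.endswith(p) for p in host_patterns)


if __name__ == "__main__":
    print(client_allowed("127.0.0.1", "localhost", ["127.0.0.1"], []))
    print(client_allowed("10.0.0.7", "www.example.org", ["127.0.0.1"], []))
```

<P>
Because only the front of the address is compared, a partial pattern such
as "192.168." admits a whole range of addresses, and "127.0.0.1" would also
match "127.0.0.10".
</P>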
<P>
Running demos and more information can be found at
<A HREF="http://www.fon.hum.uva.nl/rob/OSS/OSS.html">
http://www.fon.hum.uva.nl/rob/OSS/OSS.html</A>
</P>
<H2 ALIGN="CENTER">Inner workings</H2>
<P>
Whenever an HTTP request is received, the specified CGI script is
started inside a child process as if it were inside a real server (e.g.,
Apache). The environment variables are set more or less as in Apache.
Note that CGIservlet only uses a SINGLE script for ALL requests.
No attempt at security is made; it is the script's responsibility to
check access rights and the validity of the request.<br>
When no scripts are given, CGIservlet runs as a bare-bones WWW server
configurable to execute scripts (the default setting is as a STATIC server).
</P>
<PRE>
Use: CGIservlet.pl -&lt;switch&gt; &lt;argument&gt; 2&gt;pid.log &amp; (sh)
     CGIservlet.pl -&lt;switch&gt; &lt;argument&gt; &gt;&amp;pid.log &amp; (csh)
</PRE>
<P>
The servlet prints its pid and port number on STDERR. It is
advised to store these in a separate file (this will become the
error log). <BR>
NOTE: When running CGIservlet from a Memory Image (i.e., RAM),
do NOT redirect the error output to a file, but use something
like MAILTO!
</P>
<PRE>
Stop: sh pid.log (kills the server process)
</PRE>
<P>
The first line in the file that receives STDERR output is a command
to stop CGIservlet.
</P>
<P>
Examples:
</P>
<PRE>
CGIservlet.pl -p 2345 -d /cgi-bin/CGIscriptor.pl -t /WWW 2&gt;pid.log &amp;
CGIservlet.pl -p 8080 -b 'require "CGIscriptor.pl";' -t $PWD -e 'Handle_Request();' 2&gt;pid.log &amp;
</PRE>
<P>
The following example settings implement a static WWW server using 'cat'
(and prohibiting queries):
<dl>
<dt>-p 8080
<dt>-t `pwd`
<dt>-b ''
<dt>-e
'exit if $ENV{QUERY_STRING};$ENV{PATH_INFO}=~/\.([\w]+)$/; "Content-type: ".$mimeType{uc($1)}."\n\n";'
<dt>-d 'cat -u -s'
<dt>-w '/index.html'
<dt>-c 32
<dt>-l 512
</dl>
This is identical to the (static) behaviour of CGIservlet when
-e '' -d '' -x '' is used.
<br>
The CGIservlet command should be run from the intended server-root directory.
</P>
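<P>
What the -e argument of the static example does (refuse requests that carry
a query, then derive the Content-type header from the file extension) can
be sketched in Python as follows. This is only an illustration of that
logic: the MIME_TYPE table is a made-up excerpt, the function name
static_header is hypothetical, and returning None for unknown or missing
extensions is an assumption that the Perl one-liner does not spell out.
</P>

```python
import re

# Hypothetical excerpt of a MIME table keyed by upper-case extension,
# standing in for the %mimeType hash used in the -e argument.
MIME_TYPE = {
    "HTML": "text/html",
    "TXT": "text/plain",
    "GIF": "image/gif",
}


def static_header(path_info, query_string=""):
    """Return the Content-type header for a static file, or None to refuse."""
    # Mirrors: exit if $ENV{QUERY_STRING};
    if query_string:
        return None
    # Mirrors: $ENV{PATH_INFO} =~ /\.([\w]+)$/
    match = re.search(r"\.(\w+)$", path_info)
    if match is None:
        return None  # assumption: no extension means no header
    mime = MIME_TYPE.get(match.group(1).upper())
    if mime is None:
        return None  # assumption: unknown extensions are refused
    return "Content-type: " + mime + "\n\n"


if __name__ == "__main__":
    print(repr(static_header("/index.html")))
```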
<P>
Another setting will use a package 'CGIscriptor.pl' with a function
'HandleRequest()' to implement an interactive WWW server with inline Perl
scripting:
<dl>
<dt>-p 8080
<dt>-t `pwd`
<dt>-b 'require "CGIscriptor.pl";'
<dt>-e 'HandleRequest();'
<dt>-d ''
<dt>-w '/index.html'
<dt>-c 32
<dt>-l 32767
</dl>
</P>
<P>
Look in the source code or in the CGIservletSETUP.pl file for the current
default settings.
</P>
<H2 ALIGN="CENTER">Command-line switches</H2>
<P>
There are many switches to tailor the workings of CGIservlet.pl.
Some are fairly esoteric and you should only look for them if you
need something special urgently. When building a Web site,
the specific options you need will "suggest" themselves (e.g., port
number, script, or server-root directory). Most default settings
should work fine.
</P>
<P>
You can add your own configuration in a file
called 'CGIservletSETUP.pl'. This file will be executed ("eval"-ed)
after the default setup, but before the command line options take
effect. CGIservlet looks for the SETUP file in the startup directory
and in the CGIscriptor subdirectory.<br>
(Note that the $beginarg variable is evaluated AFTER the setup file).
</P>
<P>
In any case, it is best to change the default settings instead of
using the option switches. All defaults are put in a single block.
</P>
<P>
<H3>switches and arguments</H3>
</P>
<dl>
<dt>Really important
<ul>
<li>-p[ort] port number<br>
For example -p 2345<br>
The port CGIservlet listens to. Default: -p 8080
<li>-a[lias] Alias1 RealURL1 ...<br>
For example -a '/Stimulus.aifc' '/catAIFC.xmr'<br>
Replaces the given Alias URL path by its real URL path. Accepts full
regular expressions too (identified by NON-URL characters).<br>
That is, on each request it performs (in order):<br>
<pre>
if($AliasTranslation{$Path})
{
    $Path = $AliasTranslation{$Path};
}
elsif(@RegAliasTranslation)
{
    my $i;
    for($i=0; $i&lt;scalar(@RegAliasTranslation); ++$i)
    {
        my $Alias = $RegAliasTranslation[$i];
        my $RealURL = $RegURLTranslation[$i];
        last if ($Path =~ s#$Alias#$RealURL#g);
    }
}
</pre>
<p>
The effects can be quite drastic, so be
careful. Note also that entering many regular expression
aliases could slow down your servlet. Checking stops after
the first match.<br>
Full regular expression alias translations are done in the
order given! They are recognized as Aliases containing
regexp (i.e., non-URL) operator characters like '^' and
'$'.<br>
Note: The command line is NOT a good place for entering
Aliases; change the source code or add aliases to
CGIservletSETUP.pl.
</p>
<li>--help<br>
Prints the manual
</ul>
<dt>Script related
<ul>
<li>-b[egin] perl commands<br>
For example -b 'require "CGIscriptor.pl";' or
'require "/WWW/cgi-bin/XMLelement.pl";'<br>
Perl commands evaluated at startup
<li>-d[o] perl script file<BR>
For example -d '/WWW/cgi-bin/CGIscriptor.pl'<br>
The actual CGI-script, started as a perl {do "scriptfile"} command.
The PATH_INFO and the QUERY are pushed on @ARGV.
<li>-x shell command
<li>-qx shell command
<li>-exec shell command<br>
OS shell script or command, e.g., -x 'CGIscriptor.pl' or
-x '/WWW/cgi-bin/my-script' <br>
The actual CGI-script, started as `my-script \'$Path\' \'$QueryString\'`.
-qx and -exec[ute] are aliases of -x. For security reasons, paths or
queries containing single quotes (') are rejected.
<li>-e[val] perl commands<br>
For example -e 'Handle_Request();' <br>
The argument is evaluated as perl code. The actual CGI-script
can be loaded once with -b 'require module.pm' and you only have to
call the central function(s).
</ul>
<dt>WWW-tree related
<ul>
<li>-t[extroot] path<br>
For example -t "$PWD" or -t "/WWW/documents"<br>
The root of the server hierarchy. Defaults to the working directory
at startup time (`pwd`)
<li>-w[elcome] filepath<br>
For example -w "/index.html" (default)<br>
The default welcome page used when no path is entered. Note that
this path can point to anything (or nothing real at all).
</ul>
<dt>Security related<br>
The following arguments supply some rudimentary security. It is the
responsibility of the script to ensure that the requests are indeed
"legal".
<ul>
<li>-c[hildren] maximum child processes<br>
For example -c 32<br>
The maximum number of subprocesses started. If there are more requests,
the oldest requests are "killed". This should take care of "zombie"
processes and server overloading. Note that new requests will be
serviced irrespective of the success of killing their older siblings.
<li>-xtime maximum running time of a child<br>
For example -xtime 36000<br>
The maximum time a child may run in seconds. After a new request has
been serviced, all children that have run for longer than this time
will be killed. This stops runaway processes, often connected to
web-crawlers.
<li>-l[ength] maximum length of request in bytes<br>
For example -l 32768 <br>
This prevents overloading the server with enormous queries. Reading of
requests simply stops when this limit is reached. This DOES affect
POST requests. If the combined length of the COMPLETE HTTP request,
including headers, exceeds this limit, the whole request is dropped.
<li>-r[estrict] [Remote-address [Remote-host]]<br>
For example -r 127.0.0.1 (default of -r)<br>
A space separated list of client IP addresses and/or domain names that
should be serviced. Default, i.e., '-r' without any addresses or domain
names, is the localhost IP address '127.0.0.1'.<br>
When using CGIservlet for local purposes only (e.g., development or a
presentation), it would be unsafe to allow others to access the servlet.
If -r is used (or the corresponding @RemoteAddr or @RemoteHost lists are
filled in the source code), all requests from clients whose Remote-address
or Remote-host do not match the indicated addresses will be rejected.
Partial addresses and domain names are allowed. Matching is done according
to Remote-addr =~ /^\Q$pattern\E/ (front to back) and
Remote-host =~ /\Q$pattern\E$/ (back to front)
<li>-m[emory]<br>
No arguments.<br>
Reads the complete Web site into memory and runs from this image.
Set $UseRAMimage = 1; to activate memory-only running.<br>
Note that running OS shell scripts from this
image makes any "security" related claims very shaky.<br>
</ul>
<dt>Speedup
<ul>
<li>-n[oname]<br>
No arguments. <br>
Retrieving the domain name of the Client (i.e., Remote-host) is a
very slow process and normally useless. To skip it, enter this
option. Note that you cannot use '-r Remote-host' anymore after
you enter -n; only IP addresses will work.
</ul>
</dl>
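<P>
The alias lookup listed under the -a switch (exact aliases first, then the
regular expression aliases in the order given, stopping at the first one
that matches) can be restated as a short Python sketch. It is an
illustration only, not the servlet's code; the function name
translate_alias and the example aliases other than '/Stimulus.aifc' are
hypothetical.
</P>

```python
import re


def translate_alias(path, exact_aliases, regex_aliases):
    """Translate a request path the way the -a description outlines.

    exact_aliases: dict mapping an alias path to its real URL path.
    regex_aliases: ordered list of (pattern, real_url) pairs.
    """
    # Exact aliases take precedence, like if($AliasTranslation{$Path})
    real = exact_aliases.get(path)
    if real:
        return real
    # Regular expression aliases are tried in the order given.
    for pattern, real_url in regex_aliases:
        # Replace every match; the lambda keeps real_url literal.
        new_path, count = re.subn(pattern, lambda m: real_url, path)
        if count:
            return new_path  # checking stops after the first match
    return path


if __name__ == "__main__":
    exact = {"/Stimulus.aifc": "/catAIFC.xmr"}
    regex = [(r"^/old/", "/new/"), (r"\.htm$", ".html")]
    print(translate_alias("/old/page.htm", exact, regex))
```

<P>
In the usage example only the first regular expression alias is applied to
"/old/page.htm", which shows why the order of the aliases matters.
</P>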
<H3 ALIGN="CENTER">Configuration with the <a href="/CGIservletSETUP.pl">CGIservletSETUP.pl</a> file</H3>
<P>
You can add your own configuration in a file
called '<a href="/CGIservletSETUP.pl">CGIservletSETUP.pl</a>'.
This file will be executed ("eval"-ed)
after the default setup, but before the command line options take
effect. CGIservlet looks for the SETUP file in the startup directory
and in the CGIservlet and CGIscriptor subdirectories.
(Note that the $beginarg variable is evaluated even later).
</P>
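<P>
The order in which settings take effect (built-in defaults, then the SETUP
file, then the command line) amounts to later layers overriding earlier
ones. A minimal Python sketch of that precedence, with hypothetical setting
names and a hypothetical function name effective_settings:
</P>

```python
def effective_settings(defaults, setup_file, command_line):
    """Merge three layers of settings; later layers override earlier ones."""
    settings = dict(defaults)      # 1. built-in defaults
    settings.update(setup_file)    # 2. values set by the SETUP file
    settings.update(command_line)  # 3. command-line switches win
    return settings


if __name__ == "__main__":
    # The keys below are illustrative names, not CGIservlet variables.
    defaults = {"port": 8080, "welcome": "/index.html", "children": 32}
    setup_file = {"port": 2345, "children": 16}
    command_line = {"port": 9000}
    print(effective_settings(defaults, setup_file, command_line))
```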
<H3 ALIGN="CENTER">Changing POST to GET requests</H3>
<P>
CGIservlet normally only handles requests with the GET method. Processing
the input from POST requests is left to the reading application. POST
requests add some extra complexity to processing requests. Sometimes,
the reading application doesn't handle POST requests. CGIservlet
already has to manage the HTTP request. Therefore, it can easily
handle the POST request. If the variable $POSTtoGET is set to any
non-false value, the content of the whole POST request is added to the
QUERY_STRING environment variable (preceded by a '&amp;' if necessary).
The content-length is set to 0. If $POSTtoGET equals 'GET', the method
will also be changed to 'GET'.
</P>
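<P>
The POST-to-GET rewrite can be sketched in Python as follows. This is an
illustration of the behaviour described above, not the servlet's Perl code.
The function name rewrite_post is hypothetical, and reading "if necessary"
as "when the query string is not empty" is an assumption.
</P>

```python
def rewrite_post(method, query_string, body, post_to_get):
    """Return (method, query_string, content_length) after the rewrite.

    post_to_get plays the role of $POSTtoGET: any true value moves the
    POST content into the query string, and 'GET' also changes the method.
    """
    if method != "POST" or not post_to_get:
        # Nothing to rewrite: the request is passed on unchanged.
        return method, query_string, len(body)
    # Assumption: '&' is only needed when both parts are non-empty.
    if query_string and body:
        query_string += "&"
    query_string += body
    if post_to_get == "GET":
        method = "GET"
    # The content-length is set to 0 after the rewrite.
    return method, query_string, 0


if __name__ == "__main__":
    print(rewrite_post("POST", "a=1", "b=2", "GET"))
```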
<H3>remarks:</H3>
<P>
All of the arguments of -d, -x, and -e are processed sequentially
in this order. This might not be what you want, so you should be
careful when using multiple executable arguments.
If none of the executable arguments is DEFINED (i.e., they are entered
as -d '' -e '' -x ''), each request is treated as a simple
text-retrieval. THIS CAN BE A SECURITY RISK!
</P>
<P>
The wiring of an interactive web-server, which also calls shell
scripts with the extension '.cgi', is in place. You can
"activate" it by changing the "$ExecuteOSshell = 0;" line to
"$ExecuteOSshell = 1;".<br>
If you have trouble doing this, it might be a good idea
to reconsider using a dynamic web server. Executing shell
scripts inside a web server is a rather dangerous practice.
</P>
<P>
CGIservlet can run its "standard" web server from memory.
At startup, all files are read into a hash table. Upon
request, the contents of the file are placed in the
environment variable: CGI_FILE_CONTENTS.<br>
No further disk access is necessary. This means that:
<ol>
<li> CGIservlet can run a WWW site from a removable disk,
e.g., a floppy
<li> The web servlet can run without any read or write privilege.
<li> The integrity of the Web-site contents can be secured at the
level you want
</ol>
To compress the memory (RAM) image, you should hook the
compression function to <br>
$CompressRAMimage = sub { return shift;}; <br>
and the decompression function to <br>
$DecompressRAMimage = sub { return shift;}; <br>
</P>
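<P>
The idea behind the compression hooks (files are stored through one
function when the image is built and read back through its inverse on each
request) can be sketched in Python. This is only an illustration: the
function names build_image and fetch are hypothetical, and zlib is used
here as one possible compressor, not as what CGIservlet uses.
</P>

```python
import zlib


def build_image(files, compress):
    """Store every file's contents in a dict, passed through compress."""
    return {path: compress(content) for path, content in files.items()}


def fetch(image, path, decompress):
    """Return the original contents for path, or None if it is not stored."""
    stored = image.get(path)
    if stored is None:
        return None
    return decompress(stored)


if __name__ == "__main__":
    files = {"/index.html": b"<html>" + b"a" * 1000 + b"</html>"}
    # Identity hooks correspond to the defaults: sub { return shift; }
    plain = build_image(files, lambda data: data)
    packed = build_image(files, zlib.compress)
    print(len(plain["/index.html"]), len(packed["/index.html"]))
    print(fetch(packed, "/index.html", zlib.decompress) == files["/index.html"])
```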
<H2 ALIGN="CENTER">license</H2>
<P>
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
</P>
<P>
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
</P>
<P>
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.
</P>
<PRE>
Author: Rob van Son
email: R.J.J.H.vanSon@uva.nl
Institute of Phonetic Sciences/ACLC
University of Amsterdam

copying freely from the mhttpd server by Jerry LeVan (levan@eagle.eku.edu)
Date: 15 Jan 2002
Ver: 1.3
Env: Perl 5.002 and later

Note: CGIservlet.pl was directly inspired by Jerry LeVan's
(levan@eagle.eku.edu) simple mhttpd server which again was
inspired by work of others. CGIservlet is used as a bare bones
socket server for a single CGI script at a time.
</PRE>
</BODY>
</HTML>