1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2 <HTML>
4 <HEAD>
6 <TITLE>CGIservlet manual</TITLE>
9 </HEAD>
11 <BODY>
13 <H1 ALIGN="CENTER">CGIservlet</H1>
15 <P>
An HTTPd "connector" for running CGI scripts on Unix systems as WWW
accessible Web sites. The servlet starts a true HTTP daemon that channels
HTTP requests to forked daughter processes. CGIservlet.pl is NOT a
full-fledged server. Moreover, this servlet is definitely NOT intended
as a replacement for a real server (e.g., Apache). Its design goal was
SIMPLICITY, not mileage.
22 </P>
24 <P>
Note that an HTTP server can be accessed on your local machine WITHOUT
internet access (but WITH a DNS?):
use "http://localhost[:port]/[path]" or "http://127.0.0.1[:port]/[path]"
as the URL. It is also easy to restrict access to the servlet to localhost
users (i.e., the computer running the servlet).
30 </P>
32 <P>
33 Suggested uses:
34 </P>
36 <ul>
37 <li>
39 <P>
40 A testbed for CGI-scripts and document-trees outside the primary server.
41 When developing new scripts and services, you don't want to mess up your
42 current Web-site. CGIservlet is an easy way to start a temporary (private)
server. CGIservlet lets you test separate HTTP server components, e.g.,
44 user authentication, in isolation.
45 </P>
47 <li>
49 <P>
50 A special purpose temporary server (WWW everywhere/anytime).
51 We run identification and other experiments over the inter-/intra-net using
52 CGI-scripts. This means a lot of development and changes and only little
53 actual run-time. The people doing this do not want "scripting" access to our
54 departmental server with all its restrictions and security. So we need a
small, lightweight, easy-to-configure server that can be run by each
56 investigator on her own account (and risk).
57 </P>
59 <li>
61 <P>
62 Interactive WWW presentations.
63 Not everyone is content with the features of "standard" office presentation
64 software. HTML and its associated browsers are an alternative (especially
65 under Linux). However, you need a server to realize the full interactive
66 nature of the WWW. CGIservlet with the necessary scripts can be run from
a floppy (a Web server in 100 kB). The CGIservlet can actually run a
(small) web site from RAM, without disk access (if you do NOT use the
2>pid.log redirection on startup).
70 With the "localhost" or "127.0.0.1" id in your browser you can use it
71 standalone.
72 </P>
74 </ul>
76 <P>
77 When the servlet is started with the -r option, only requests from "localhost"
78 or "127.0.0.1" are accepted (default) or from addresses indicated after the
79 -r switch.
80 </P>
82 <P>
Running demos and more information can be found at
84 <A HREF="http://www.fon.hum.uva.nl/rob/OSS/OSS.html">
85 http://www.fon.hum.uva.nl/rob/OSS/OSS.html</A>
86 </P>
88 <H2 ALIGN="CENTER">Inner workings</H2>
90 <P>
Whenever an HTTP request is received, the specified CGI script is
started inside a child process as if it were inside a real server (e.g.,
Apache). The environment variables are set more or less as in Apache.
Note that CGIservlet only uses a SINGLE script for ALL requests.
No attempts at security are made; it is the script's responsibility to
check access rights and the validity of the request.<br>
When no scripts are given, CGIservlet runs as a bare-bones WWW server
configurable to execute scripts (the default setting is as a STATIC server).
99 </P>
101 <PRE>
102 Use: CGIservlet.pl -&lt;switch&gt; &lt;argument&gt; 2&gt;pid.log & (sh)
103 CGIservlet.pl -&lt;switch&gt; &lt;argument&gt; &gt;&pid.log & (csh)
104 </PRE>
The servlet prints its pid and port number on STDERR. It is
advised to store these in a separate file (this will become the
error log). <BR>
NOTE: When running CGIservlet from a Memory Image (i.e., RAM),
do NOT redirect the error output to a file, but use something
like MAILTO!
113 </P>
115 <PRE>
116 Stop: sh pid.log (kills the server process)
117 </PRE>
120 The first line in the file that receives STDERR output is a command
121 to stop CGIservlet.
122 </P>
125 examples:
126 </P>
128 <PRE>
129 CGIservlet.pl -p 2345 -d /cgi-bin/CGIscriptor.pl -t /WWW 2>pid.log &
130 CGIservlet.pl -p 8080 -b 'require "CGIscriptor.pl";' -t $PWD -e 'Handle_Request();' 2>pid.log &
131 </PRE>
134 The following example settings implement a static WWW server using 'cat'
135 (and prohibiting Queries):
136 <dl>
137 <dt>-p 8080
138 <dt>-t `pwd`
139 <dt>-b ''
140 <dt>-e
141 'exit if $ENV{QUERY_STRING};$ENV{PATH_INFO}=~/\.([\w]+)$/; "Content-type: ".$mimeType{uc($1)}."\n\n";'
142 <dt>-d 'cat -u -s'
143 <dt>-w '/index.html'
144 <dt>-c 32
145 <dt>-l 512
146 </dl>
148 This is identical to the (static) behaviour of CGIservlet when
149 -e '' -d '' -x '' is used.
150 <br>
151 The CGIservlet command should be run from the intended server-root directory.
152 </P>
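<P>
The -e argument of the static example above derives the Content-type header
from the file extension of PATH_INFO. The following is a minimal sketch of
that lookup, not the code CGIservlet itself runs. The function name
content_type_header, the three-entry %mimeType table, and the fallback to
text/plain are assumptions made for illustration; the real table lives in
the CGIservlet source.
</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, abbreviated MIME table. The keys are upper-case file
# extensions, as the uc($1) in the -e argument implies.
my %mimeType = (
    HTML => 'text/html',
    TXT  => 'text/plain',
    GIF  => 'image/gif',
);

# Return the Content-type header for a request, or undef when the
# request carries a query (the static example refuses those).
sub content_type_header {
    my ($path_info, $query_string) = @_;
    return if defined($query_string) && length($query_string);

    # Capture the extension after the last dot of the path
    my ($ext) = $path_info =~ /\.(\w+)$/;

    # Assumed fallback for unknown or missing extensions
    my $type = (defined($ext) && $mimeType{uc $ext}) || 'text/plain';
    return "Content-type: $type\n\n";
}

# Prints the text/html header followed by an empty line
print content_type_header('/index.html', '');
```

<P>
Returning undef instead of calling exit keeps the sketch testable outside a
child process; inside the -e argument the original simply exits.
</P>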
Another setting will use a package 'CGIscriptor.pl' with a function
'Handle_Request()' to implement an interactive WWW server with inline Perl
scripting:
<dl>
<dt>-p 8080
<dt>-t `pwd`
<dt>-b 'require "CGIscriptor.pl";'
<dt>-e 'Handle_Request();'
163 <dt>-d ''
164 <dt>-w '/index.html'
165 <dt>-c 32
166 <dt>-l 32767
167 </dl>
168 </P>
171 Look in the source code or in the CGIservletSETUP.pl file for the current
172 default settings.
173 </P>
175 <H2 ALIGN="CENTER">Command-line switches</H2>
178 There are many switches to tailor the workings of CGIservlet.pl.
179 Some are fairly esoteric and you should only look for them if you
180 need something special urgently. When building a Web site,
181 the specific options you need will "suggest" themselves (e.g., port
182 number, script, or server-root directory). Most default settings
183 should work fine.
184 </P>
187 You can add your own configuration in a file
188 called 'CGIservletSETUP.pl'. This file will be executed ("eval"-ed)
189 after the default setup, but before the command line options take
190 effect. CGIservlet looks for the SETUP file in the startup directory
191 and in the CGIscriptor subdirectory.<br>
192 (Note that the $beginarg variable is evaluated AFTER the setup file).
193 </P>
196 In any case, it is best to change the default settings instead of
197 using the option switches. All defaults are put in a single block.
198 </P>
201 <H3>switches and arguments</H3>
202 </P>
204 <dl>
<dt>Really important
<ul>
<li>-p[ort] port number<br>
For example -p 2345<br>
Obviously the port CGIservlet listens to. Default: -p 8080
211 <li>-a[lias] Alias1 RealURL1 ...<br>
212 For example -a '/Stimulus.aifc' '/catAIFC.xmr'<br>
213 Replaces the given Alias URL path by its real URL path. Accepts full
214 regular expressions too (identified by NON-URL characters).<br>
215 That is, on each request it performs (in order):<br>
<pre>
if($AliasTranslation{$Path})
{
    $Path = $AliasTranslation{$Path};
}
elsif(@RegAliasTranslation)
{
    # Try the regexp aliases in the order given; stop at the first match
    my $i;
    for($i=0; $i&lt;scalar(@RegAliasTranslation); ++$i)
    {
        my $Alias = $RegAliasTranslation[$i];
        my $RealURL = $RegURLTranslation[$i];
        last if ($Path =~ s#$Alias#$RealURL#g);
    };
};
</pre>
The effects can be quite drastic, so be
careful. Note also that entering many regular expression
aliases could slow down your servlet. Checking stops after
the first match.<br>
Full regular expression alias translations are done in the
order given! They are recognized as aliases containing
regexp operator characters (i.e., non-URL characters) like '^' and
'$'.<br>
Note: The command line is NOT a good place for entering
aliases; change the code below or add aliases to
CGIservletSETUP.pl.
244 </p>
246 <li>--help<br>
247 Prints the manual
249 </ul>
251 <dt>Script related
252 <ul>
253 <li>-b[egin] perl commands<br>
254 For example -b 'require "CGIscriptor.pl";' or
255 'require "/WWW/cgi-bin/XMLelement.pl";'<br>
256 Perl commands evaluated at startup
258 <li>-d[o] perl script file<BR>
259 For example -d '/WWW/cgi-bin/CGIscriptor.pl'<br>
260 The actual CGI-script started as a perl {do "scriptfile"} command.
261 The PATH_INFO and the QUERY are pushed on @ARGV.
263 <li>-x shell command
264 <li>-qx shell command
265 <li>-exec shell command<br>
266 OS shell script or command, e.g., -x 'CGIscriptor.pl' or
267 -x '/WWW/cgi-bin/my-script' <br>
268 The actual CGI-script started as `my-script \'$Path\' \'$QueryString\'`.
-qx and -exec[ute] are aliases of -x. For security reasons, paths or
queries containing '-quotes are rejected.
272 <li>-e[val] perl commands<br>
273 For example -e 'Handle_Request();' <br>
274 The argument is evaluated as perl code. The actual CGI-script
275 can be loaded once with -b 'require module.pm' and you only have to
276 call the central function(s).
277 </ul>
279 <dt>WWW-tree related
280 <ul>
<li>-t[extroot] path<br>
For example -t "$PWD" or -t "/WWW/documents"<br>
The root of the server hierarchy. Defaults to the working directory
at startup time (`pwd`)
286 <li>-w[elcome] filepath
287 For example -w "/index.html" (default)<br>
288 The default welcome page used when no path is entered. Note that
289 this path can point to anything (or nothing real at all).
290 </ul>
291 <dt>Security related<br>
292 The following arguments supply some rudimentary security. It is the
293 responsibility of the script to ensure that the requests are indeed
294 "legal".
295 <ul>
296 <li>-c[hildren] maximum child processes<br>
297 For example -c 32<br>
298 The maximum number of subprocesses started. If there are more requests,
299 the oldest requests are "killed". This should take care of "zombie"
300 processes and server overloading. Note that new requests will be
serviced irrespective of the success of killing its older siblings.
303 <li>-xtime maximum running time of a child<br>
304 For example -xtime 36000<br>
305 The maximum time a child may run in seconds. After a new request has
been serviced, all children that have run for longer than this time
307 will be killed. This stops runaway processes, often connected to
308 web-crawlers.
310 <li>-l[ength] maximum length of request in bytes<br>
311 For example -l 32768 <br>
312 This prevents overloading the server with enormous queries. Reading of
313 requests simply stops when this limit is reached. This DOES affect
314 POST requests. If the combined length of the COMPLETE HTTP request,
315 including headers, exceeds this limit, the whole request is dropped.
317 <li>-r[estrict] [Remote-address [Remote-host]]<br>
318 For example -r 127.0.0.1 (default of -r)<br>
319 A space separated list of client IP addresses and/or domain names that
320 should be serviced. Default, i.e., '-r' without any addresses or domain
321 names, is the localhost IP address '127.0.0.1'.<br>
322 When using CGIservlet for local purposes only (e.g., development or a
323 presentation), it would be unsafe to allow others to access the servlet.
324 If -r is used (or the corresponding @RemoteAddr or @RemoteHost lists are
325 filled in the code below), all requests from clients whose Remote-address
326 or Remote-host do not match the indicated addresses will be rejected.
327 Partial addresses and domain names are allowed. Matching is done according
328 to Remote-addr =~ /^\Q$pattern\E/ (front to back) and
329 Remote-host =~ /\Q$pattern\E$/ (back to front)
331 <li>--env name=value,name=value<br>
Note the double dash. Defines $ENV{name}=value for every pair. These are
internally stored in %UserEnv, e.g., $UserEnv{name}=value; the environment is set anew
in the child with every request. That is, changes in %ENV are not stored.
336 <li>--USEFAT<br>
Note the double dash. Defines $ENV{USEFAT}=1. This is used to signal that
CGIservlet runs from an MS FAT file system without file permissions.
340 <li>-m[emory]<br>
341 No arguments.<br>
342 Reads complete Web site into memory and runs from this image.
343 Set $UseRAMimage = 1; to activate memory-only running.<br>
Note that running OS shell scripts from this
345 image makes any "security" related claims very shaky.<br>
346 </ul>
347 <dt>Speedup
348 <ul>
349 <li>-n[oname]<br>
350 No arguments. <br>
351 Retrieving the domain name of the Client (i.e., Remote-host) is a
352 very slow process and normally useless. To skip it, enter this
353 option. Note that you cannot use '-r Remote-host' anymore after
354 you enter -n, only IP addresses will work.
355 </ul>
356 </dl>
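<P>
The matching rules given for -r above can be sketched as a small function.
This is an illustration under stated assumptions, not the servlet's own
code: the name client_allowed and its argument order are hypothetical, and
serving every client when both lists are empty is an assumption about the
behaviour without -r. The two regular expressions are the ones quoted in
the manual.
</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decide whether a client may be served.
#   $addr          client IP address
#   $host          client domain name (undef when lookup is skipped, cf. -n)
#   $addr_patterns array ref, plays the role of @RemoteAddr
#   $host_patterns array ref, plays the role of @RemoteHost
sub client_allowed {
    my ($addr, $host, $addr_patterns, $host_patterns) = @_;

    # Assumption: no restrictions configured means everyone is served
    return 1 unless @$addr_patterns || @$host_patterns;

    # IP addresses are matched front to back (prefix match)
    for my $pattern (@$addr_patterns) {
        return 1 if $addr =~ /^\Q$pattern\E/;
    }
    # Domain names are matched back to front (suffix match)
    for my $pattern (@$host_patterns) {
        return 1 if defined($host) && $host =~ /\Q$pattern\E$/;
    }
    return 0;
}

# A partial address and a partial domain name
print client_allowed('192.168.1.17', 'pc17.example.org',
                     ['192.168.'], ['.example.org']), "\n";
```

<P>
Because addresses are matched as prefixes, a pattern such as '127.0.0.1'
also matches '127.0.0.10'. Ending a partial address with a dot, as in
'192.168.', avoids this kind of accidental match.
</P>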
358 <H3 ALIGN="CENTER">Configuration with the <a href="/CGIservletSETUP.pl">CGIservletSETUP.pl</a> file</H3>
361 You can add your own configuration in a file
362 called '<a href="/CGIservletSETUP.pl">CGIservletSETUP.pl</a>'.
363 This file will be executed ("eval"-ed)
364 after the default setup, but before the command line options take
365 effect. CGIservlet looks for the SETUP file in the startup directory
366 and in the CGIservlet and CGIscriptor subdirectories.
367 (Note that the $beginarg variable is evaluated even later).
368 </P>
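<P>
As an illustration, a CGIservletSETUP.pl file could look like the sketch
below. It only assigns to variables that are named elsewhere in this manual
(%AliasTranslation, @RegAliasTranslation, @RegURLTranslation, @RemoteAddr,
$POSTtoGET, %UserEnv). The '/old/' to '/new/' alias and the SITE_MODE
entry are invented examples, and the exact way CGIservlet declares these
variables is an assumption; check the defaults block in the source before
relying on it.
</P>

```perl
# CGIservletSETUP.pl: illustrative sketch only.
# No "use strict" here: the file is eval-ed by CGIservlet and is assumed
# to assign to variables that the servlet has already set up.

# Exact alias, taken from the -a example in the manual
$AliasTranslation{'/Stimulus.aifc'} = '/catAIFC.xmr';

# Regular expression alias (hypothetical paths). The two lists are
# parallel: entry i of one belongs to entry i of the other.
push(@RegAliasTranslation, '^/old/');
push(@RegURLTranslation,   '/new/');

# Serve the local machine only (same effect as a bare -r)
@RemoteAddr = ('127.0.0.1');

# Fold POST content into QUERY_STRING and change the method to GET
$POSTtoGET = 'GET';

# Extra environment variable for every child (hypothetical name)
$UserEnv{'SITE_MODE'} = 'test';

1;
```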
371 <H3 ALIGN="CENTER">Changing POST to GET requests</H3>
374 CGIservlet normally only handles requests with the GET method. Processing
375 the input from POST requests is left to the reading application. POST
376 requests add some extra complexity to processing requests. Sometimes,
377 the reading application doesn't handle POST requests. CGIservlet
378 already has to manage the HTTP request. Therefore, it can easily
379 handle the POST request. If the variable $POSTtoGET is set to any
non-false value, the content of the whole POST request is added to the
QUERY_STRING environment variable (preceded by a '&' if necessary).
382 The content-length is set to 0. If $POSTtoGET equals 'GET', the method
383 will also be changed to 'GET'.
384 </P>
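<P>
The conversion described above can be sketched as follows. This is a
minimal illustration, not the servlet's implementation: the function
post_to_get and the hash-reference interface are hypothetical, while the
environment variable names are the standard CGI ones.
</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a copy of the CGI environment with the POST content folded
# into QUERY_STRING, following the $POSTtoGET rules in the manual.
sub post_to_get {
    my ($env, $post_body, $POSTtoGET) = @_;

    # Leave everything untouched unless the feature is switched on
    # and this really is a POST request
    return $env unless $POSTtoGET
        && ($env->{REQUEST_METHOD} || '') eq 'POST';

    my %new = %$env;
    my $query = defined($new{QUERY_STRING}) ? $new{QUERY_STRING} : '';

    # Separate an existing query from the POST content with '&'
    $query .= '&' if length($query) && length($post_body);
    $new{QUERY_STRING}   = $query . $post_body;
    $new{CONTENT_LENGTH} = 0;

    # Only the special value 'GET' also rewrites the method
    $new{REQUEST_METHOD} = 'GET' if $POSTtoGET eq 'GET';
    return \%new;
}

my $env = post_to_get(
    { REQUEST_METHOD => 'POST', QUERY_STRING => 'a=1', CONTENT_LENGTH => 7 },
    'b=2&c=3',
    'GET',
);
print "$env->{REQUEST_METHOD} $env->{QUERY_STRING}\n";   # prints "GET a=1&b=2&c=3"
```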
386 <H3>remarks:</H3>
389 All of the arguments of -d, -x, and -e are processed sequentially
390 in this order. This might not be what you want so you should be
careful when using multiple executable arguments.
392 If none of the executable arguments is DEFINED (i.e., they are entered
393 as -d '' -e '' -x ''), each request is treated as a simple
394 text-retrieval. THIS CAN BE A SECURITY RISK!
395 </P>
398 The wiring of an interactive web-server, which also calls shell
399 scripts with the extension '.cgi', is in place. You can
400 "activate" it by changing the "$ExecuteOSshell = 0;" line to
401 "$ExecuteOSshell = 1;".<br>
402 If you have trouble doing this, it might be a good idea
403 to reconsider using a dynamic web server. Executing shell
scripts inside a web server is a rather dangerous practice.
405 </P>
408 CGIservlet can run its "standard" web server from memory.
409 At startup, all files are read into a hash table. Upon
410 request, the contents of the file are placed in the
411 environment variable: CGI_FILE_CONTENTS.<br>
412 No further disk access is necessary. This means that:
413 <ol>
414 <li> CGIservlet can run a WWW site from a removable disk,
415 e.g., a floppy
416 <li> The web servlet can run without any read or write privilege.
417 <li> The integrity of the Web-site contents can be secured at the
418 level you want
419 </ol>
To compress the memory (RAM) image, you should hook the
421 compression function to <br>
422 $CompressRAMimage = sub { return shift;}; <br>
423 and the decompression function to <br>
424 $DecompressRAMimage = sub { return shift;}; <br>
425 </P>
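<P>
As a sketch of such hooks, the two code references could wrap the compress
and uncompress functions of the Compress::Zlib module. Using this module is
a choice made for the example, not something CGIservlet prescribes; it is
bundled with recent Perl releases and has to be installed separately on old
ones. The sketch assumes that each hook receives the file contents as its
single argument and returns the converted contents, as the identity
placeholders above suggest.
</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib ();

# In CGIservletSETUP.pl these would be assigned without "my";
# they are declared here so that the sketch runs on its own.
my $CompressRAMimage   = sub { return Compress::Zlib::compress(shift); };
my $DecompressRAMimage = sub { return Compress::Zlib::uncompress(shift); };

# Round trip on a repetitive page, which compresses well
my $page     = "<HTML><BODY>" . ("Hello " x 100) . "</BODY></HTML>";
my $packed   = $CompressRAMimage->($page);
my $restored = $DecompressRAMimage->($packed);

printf "%d bytes stored as %d bytes\n", length($page), length($packed);
print "round trip ok\n" if $restored eq $page;
```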
427 <H2 ALIGN="CENTER">license</H2>
430 This program is free software; you can redistribute it and/or
431 modify it under the terms of the GNU General Public License
432 as published by the Free Software Foundation; either version 2
433 of the License, or (at your option) any later version.
434 </P>
437 This program is distributed in the hope that it will be useful,
438 but WITHOUT ANY WARRANTY; without even the implied warranty of
439 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
440 GNU General Public License for more details.
441 </P>
444 You should have received a copy of the GNU General Public License
445 along with this program; if not, write to the Free Software
446 Foundation, Inc., 59 Temple Place - Suite 330,
447 Boston, MA 02111-1307, USA.
448 </P>
450 <PRE>
451 Author: Rob van Son
452 email:
453 R.J.J.H.vanSon@gmail.com
454 Institute of Phonetic Sciences/ACLC
455 University of Amsterdam
457 copying freely from the mhttpd server by Jerry LeVan (levan@eagle.eku.edu)
458 Date: 15 Jan 2002
459 Ver: 1.3
460 Env: Perl 5.002 and later
462 Note: CGIservlet.pl was directly inspired by Jerry LeVan's
463 (levan@eagle.eku.edu) simple mhttpd server which again was
464 inspired by work of others. CGIservlet is used as a bare bones
465 socket server for a single CGI script at a time.
466 </PRE>
468 </BODY>
470 </HTML>