1 <?xml version="1.0" encoding="UTF-8" ?>
2 <!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
3 <?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
4 <!-- $LastChangedRevision$ -->
7 Licensed to the Apache Software Foundation (ASF) under one or more
8 contributor license agreements. See the NOTICE file distributed with
9 this work for additional information regarding copyright ownership.
10 The ASF licenses this file to You under the Apache License, Version 2.0
11 (the "License"); you may not use this file except in compliance with
12 the License. You may obtain a copy of the License at
14 http://www.apache.org/licenses/LICENSE-2.0
16 Unless required by applicable law or agreed to in writing, software
17 distributed under the License is distributed on an "AS IS" BASIS,
18 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19 See the License for the specific language governing permissions and
20 limitations under the License.
23 <manualpage metafile="rewrite_guide_advanced.xml.meta">
24 <parentdocument href="./">Rewrite</parentdocument>
26 <title>URL Rewriting Guide - Advanced topics</title>
30 <p>This document supplements the <module>mod_rewrite</module>
31 <a href="../mod/mod_rewrite.html">reference documentation</a>.
32 It describes how one can use Apache's <module>mod_rewrite</module>
33 to solve typical URL-based problems with which webmasters are
34 commonly confronted. We give detailed descriptions on how to
35 solve each problem by configuring URL rewriting rulesets.</p>
37 <note type="warning">ATTENTION: Depending on your server configuration
38 it may be necessary to adjust the examples for your
39 situation, e.g., adding the <code>[PT]</code> flag if
40 using <module>mod_alias</module> and
<module>mod_userdir</module>, etc., or rewriting a ruleset
to work in <code>.htaccess</code> context instead
of per-server context. Always try to understand what a
44 particular ruleset really does before you use it; this
45 avoids many problems.</note>
48 <seealso><a href="../mod/mod_rewrite.html">Module
49 documentation</a></seealso>
50 <seealso><a href="rewrite_intro.html">mod_rewrite
51 introduction</a></seealso>
52 <seealso><a href="rewrite_guide.html">Rewrite Guide - useful
53 examples</a></seealso>
54 <seealso><a href="rewrite_tech.html">Technical details</a></seealso>
57 <section id="cluster">
59 <title>Web Cluster with Consistent URL Space</title>
65 <p>We want to create a homogeneous and consistent URL
66 layout across all WWW servers on an Intranet web cluster, i.e.,
67 all URLs (by definition server-local and thus
68 server-dependent!) become server <em>independent</em>!
69 What we want is to give the WWW namespace a single consistent
70 layout: no URL should refer to
71 any particular target server. The cluster itself
72 should connect users automatically to a physical target
73 host as needed, invisibly.</p>
<p>First, the knowledge of the target servers comes from
(distributed) external maps which contain information on
where our users, groups, and entities reside, in the form
of plain-text key/value pairs.</p>
90 <p>We put them into files <code>map.xxx-to-host</code>.
91 Second we need to instruct all servers to redirect URLs
103 http://physical-host/u/user/anypath
104 http://physical-host/g/group/anypath
105 http://physical-host/e/entity/anypath
<p>even when a given URL path is not locally valid on a
particular server. The following ruleset does this for us
with the help of the map files (assuming that server0 is a
default server which will be used if a user has no entry
in the map):</p>
116 RewriteMap user-to-host txt:/path/to/map.user-to-host
117 RewriteMap group-to-host txt:/path/to/map.group-to-host
118 RewriteMap entity-to-host txt:/path/to/map.entity-to-host
120 RewriteRule ^/u/<strong>([^/]+)</strong>/?(.*) http://<strong>${user-to-host:$1|server0}</strong>/u/$1/$2
121 RewriteRule ^/g/<strong>([^/]+)</strong>/?(.*) http://<strong>${group-to-host:$1|server0}</strong>/g/$1/$2
122 RewriteRule ^/e/<strong>([^/]+)</strong>/?(.*) http://<strong>${entity-to-host:$1|server0}</strong>/e/$1/$2
124 RewriteRule ^/([uge])/([^/]+)/?$ /$1/$2/.www/
RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3
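For reference, a <code>txt:</code> map file is plain text with one key/value pair per line. A minimal sketch of <code>map.user-to-host</code> follows; the usernames and hostnames are hypothetical:

```apache
##  map.user-to-host -- sketch with hypothetical entries
##  key (username)    value (physical host)
larry     server1.intranet.dom
moe       server2.intranet.dom
##  users without an entry fall back to the |server0 default in the ruleset
```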
132 <section id="structuredhomedirs">
134 <title>Structured Homedirs</title>
137 <dt>Description:</dt>
140 <p>Some sites with thousands of users use a
141 structured homedir layout, <em>i.e.</em> each homedir is in a
142 subdirectory which begins (for instance) with the first
143 character of the username. So, <code>/~foo/anypath</code>
144 is <code>/home/<strong>f</strong>/foo/.www/anypath</code>
145 while <code>/~bar/anypath</code> is
146 <code>/home/<strong>b</strong>/bar/.www/anypath</code>.</p>
152 <p>We use the following ruleset to expand the tilde URLs
153 into the above layout.</p>
157 RewriteRule ^/~(<strong>([a-z])</strong>[a-z0-9]+)(.*) /home/<strong>$2</strong>/$1/.www$3
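To see how the capture groups line up, trace a hypothetical request through the rule:

```apache
#  Request:  /~foo/anypath
#  $1 = foo       (the whole username, outer group)
#  $2 = f         (its first character, inner group)
#  $3 = /anypath
#  Result:   /home/f/foo/.www/anypath
```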
164 <section id="filereorg">
166 <title>Filesystem Reorganization</title>
169 <dt>Description:</dt>
172 <p>This really is a hardcore example: a killer application
173 which heavily uses per-directory
174 <code>RewriteRules</code> to get a smooth look and feel
175 on the Web while its data structure is never touched or
176 adjusted. Background: <strong><em>net.sw</em></strong> is
177 my archive of freely available Unix software packages,
178 which I started to collect in 1992. It is both my hobby
179 and job to do this, because while I'm studying computer
180 science I have also worked for many years as a system and
181 network administrator in my spare time. Every week I need
182 some sort of software so I created a deep hierarchy of
183 directories where I stored the packages:</p>
186 drwxrwxr-x 2 netsw users 512 Aug 3 18:39 Audio/
187 drwxrwxr-x 2 netsw users 512 Jul 9 14:37 Benchmark/
188 drwxrwxr-x 12 netsw users 512 Jul 9 00:34 Crypto/
189 drwxrwxr-x 5 netsw users 512 Jul 9 00:41 Database/
190 drwxrwxr-x 4 netsw users 512 Jul 30 19:25 Dicts/
191 drwxrwxr-x 10 netsw users 512 Jul 9 01:54 Graphic/
192 drwxrwxr-x 5 netsw users 512 Jul 9 01:58 Hackers/
193 drwxrwxr-x 8 netsw users 512 Jul 9 03:19 InfoSys/
194 drwxrwxr-x 3 netsw users 512 Jul 9 03:21 Math/
195 drwxrwxr-x 3 netsw users 512 Jul 9 03:24 Misc/
196 drwxrwxr-x 9 netsw users 512 Aug 1 16:33 Network/
197 drwxrwxr-x 2 netsw users 512 Jul 9 05:53 Office/
198 drwxrwxr-x 7 netsw users 512 Jul 9 09:24 SoftEng/
199 drwxrwxr-x 7 netsw users 512 Jul 9 12:17 System/
200 drwxrwxr-x 12 netsw users 512 Aug 3 20:15 Typesetting/
201 drwxrwxr-x 10 netsw users 512 Jul 9 14:08 X11/
204 <p>In July 1996 I decided to make this archive public to
205 the world via a nice Web interface. "Nice" means that I
206 wanted to offer an interface where you can browse
207 directly through the archive hierarchy. And "nice" means
208 that I didn't want to change anything inside this
209 hierarchy - not even by putting some CGI scripts at the
210 top of it. Why? Because the above structure should later be
211 accessible via FTP as well, and I didn't want any
212 Web or CGI stuff mixed in there.</p>
218 <p>The solution has two parts: The first is a set of CGI
219 scripts which create all the pages at all directory
220 levels on-the-fly. I put them under
221 <code>/e/netsw/.www/</code> as follows:</p>
224 -rw-r--r-- 1 netsw users 1318 Aug 1 18:10 .wwwacl
225 drwxr-xr-x 18 netsw users 512 Aug 5 15:51 DATA/
226 -rw-rw-rw- 1 netsw users 372982 Aug 5 16:35 LOGFILE
227 -rw-r--r-- 1 netsw users 659 Aug 4 09:27 TODO
228 -rw-r--r-- 1 netsw users 5697 Aug 1 18:01 netsw-about.html
229 -rwxr-xr-x 1 netsw users 579 Aug 2 10:33 netsw-access.pl
230 -rwxr-xr-x 1 netsw users 1532 Aug 1 17:35 netsw-changes.cgi
231 -rwxr-xr-x 1 netsw users 2866 Aug 5 14:49 netsw-home.cgi
232 drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/
233 -rwxr-xr-x 1 netsw users 24050 Aug 5 15:49 netsw-lsdir.cgi
234 -rwxr-xr-x 1 netsw users 1589 Aug 3 18:43 netsw-search.cgi
235 -rwxr-xr-x 1 netsw users 1885 Aug 1 17:41 netsw-tree.cgi
236 -rw-r--r-- 1 netsw users 234 Jul 30 16:35 netsw-unlimit.lst
239 <p>The <code>DATA/</code> subdirectory holds the above
240 directory structure, <em>i.e.</em> the real
241 <strong><em>net.sw</em></strong> stuff, and gets
242 automatically updated via <code>rdist</code> from time to
243 time. The second part of the problem remains: how to link
244 these two structures together into one smooth-looking URL
245 tree? We want to hide the <code>DATA/</code> directory
246 from the user while running the appropriate CGI scripts
247 for the various URLs. Here is the solution: first I put
248 the following into the per-directory configuration file
249 in the <directive module="core">DocumentRoot</directive>
250 of the server to rewrite the public URL path
251 <code>/net.sw/</code> to the internal path
252 <code>/e/netsw</code>:</p>
RewriteRule ^net\.sw$ net.sw/ [R]
RewriteRule ^net\.sw/(.*)$ e/netsw/$1
259 <p>The first rule is for requests which miss the trailing
260 slash! The second rule does the real thing. And then
261 comes the killer configuration which stays in the
262 per-directory config file
263 <code>/e/netsw/.www/.wwwacl</code>:</p>
266 Options ExecCGI FollowSymLinks Includes MultiViews
270 # we are reached via /net.sw/ prefix
273 # first we rewrite the root dir to
274 # the handling cgi script
275 RewriteRule ^$ netsw-home.cgi [L]
276 RewriteRule ^index\.html$ netsw-home.cgi [L]
278 # strip out the subdirs when
279 # the browser requests us from perdir pages
280 RewriteRule ^.+/(netsw-[^/]+/.+)$ $1 [L]
282 # and now break the rewriting for local files
283 RewriteRule ^netsw-home\.cgi.* - [L]
284 RewriteRule ^netsw-changes\.cgi.* - [L]
285 RewriteRule ^netsw-search\.cgi.* - [L]
286 RewriteRule ^netsw-tree\.cgi$ - [L]
287 RewriteRule ^netsw-about\.html$ - [L]
288 RewriteRule ^netsw-img/.*$ - [L]
290 # anything else is a subdir which gets handled
291 # by another cgi script
292 RewriteRule !^netsw-lsdir\.cgi.* - [C]
293 RewriteRule (.*) netsw-lsdir.cgi/$1
296 <p>Some hints for interpretation:</p>
299 <li>Notice the <code>L</code> (last) flag and no
300 substitution field ('<code>-</code>') in the fourth part</li>
302 <li>Notice the <code>!</code> (not) character and
303 the <code>C</code> (chain) flag at the first rule
304 in the last part</li>
306 <li>Notice the catch-all pattern in the last rule</li>
313 <section id="redirect404">
315 <title>Redirect Failing URLs to Another Web Server</title>
318 <dt>Description:</dt>
321 <p>A typical FAQ about URL rewriting is how to redirect
322 failing requests on webserver A to webserver B. Usually
323 this is done via <directive module="core"
324 >ErrorDocument</directive> CGI scripts in Perl, but
325 there is also a <module>mod_rewrite</module> solution.
But note that this performs more poorly than using an
<directive module="core">ErrorDocument</directive>
CGI script!
334 <p>The first solution has the best performance but less
335 flexibility, and is less safe:</p>
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} <strong>!-f</strong>
340 RewriteRule ^(.+) http://<strong>webserverB</strong>.dom/$1
343 <p>The problem here is that this will only work for pages
344 inside the <directive module="core">DocumentRoot</directive>. While you can add more
345 Conditions (for instance to also handle homedirs, etc.)
346 there is a better variant:</p>
350 RewriteCond %{REQUEST_URI} <strong>!-U</strong>
351 RewriteRule ^(.+) http://<strong>webserverB</strong>.dom/$1
354 <p>This uses the URL look-ahead feature of <module>mod_rewrite</module>.
355 The result is that this will work for all types of URLs
356 and is safe. But it does have a performance impact on
357 the web server, because for every request there is one
358 more internal subrequest. So, if your web server runs on a
359 powerful CPU, use this one. If it is a slow machine, use
360 the first approach or better an <directive module="core"
361 >ErrorDocument</directive> CGI script.</p>
367 <section id="archive-access-multiplexer">
369 <title>Archive Access Multiplexer</title>
372 <dt>Description:</dt>
375 <p>Do you know the great CPAN (Comprehensive Perl Archive
376 Network) under <a href="http://www.perl.com/CPAN"
377 >http://www.perl.com/CPAN</a>?
378 CPAN automatically redirects browsers to one of many FTP
379 servers around the world (generally one near the requesting
380 client); each server carries a full CPAN mirror. This is
381 effectively an FTP access multiplexing service.
382 CPAN runs via CGI scripts, but how could a similar approach
383 be implemented via <module>mod_rewrite</module>?</p>
389 <p>First we notice that as of version 3.0.0,
390 <module>mod_rewrite</module> can
391 also use the "<code>ftp:</code>" scheme on redirects.
392 And second, the location approximation can be done by a
393 <directive module="mod_rewrite">RewriteMap</directive>
394 over the top-level domain of the client.
395 With a tricky chained ruleset we can use this top-level
396 domain as a key to our multiplexing map.</p>
400 RewriteMap multiplex txt:/path/to/map.cxan
401 RewriteRule ^/CxAN/(.*) %{REMOTE_HOST}::$1 [C]
402 RewriteRule ^.+\.<strong>([a-zA-Z]+)</strong>::(.*)$ ${multiplex:<strong>$1</strong>|ftp.default.dom}$2 [R,L]
407 ## map.cxan -- Multiplexing Map for CxAN
410 de ftp://ftp.cxan.de/CxAN/
411 uk ftp://ftp.cxan.uk/CxAN/
412 com ftp://ftp.cxan.com/CxAN/
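Walking a hypothetical request through the chained rules makes the mechanism clearer:

```apache
#  Client foo.bar.de requests /CxAN/src/coolstuff.tar.gz
#  Rule 1:  /CxAN/src/coolstuff.tar.gz -> foo.bar.de::src/coolstuff.tar.gz  [C]
#  Rule 2:  captures "de" as $1 and looks it up in the multiplex map
#  Redirect: ftp://ftp.cxan.de/CxAN/src/coolstuff.tar.gz
```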
421 <section id="browser-dependent-content">
423 <title>Browser Dependent Content</title>
426 <dt>Description:</dt>
<p>At least for important top-level pages it is sometimes
necessary to provide optimal browser-dependent content,
i.e., one has to provide one version for
current browsers, a different version for Lynx and text-mode
browsers, and another for all other browsers.</p>
439 <p>We cannot use content negotiation because the browsers do
440 not provide their type in that form. Instead we have to
441 act on the HTTP header "User-Agent". The following config
442 does the following: If the HTTP header "User-Agent"
443 begins with "Mozilla/3", the page <code>foo.html</code>
444 is rewritten to <code>foo.NS.html</code> and the
445 rewriting stops. If the browser is "Lynx" or "Mozilla" of
446 version 1 or 2, the URL becomes <code>foo.20.html</code>.
447 All other browsers receive page <code>foo.32.html</code>.
448 This is done with the following ruleset:</p>
451 RewriteCond %{HTTP_USER_AGENT} ^<strong>Mozilla/3</strong>.*
452 RewriteRule ^foo\.html$ foo.<strong>NS</strong>.html [<strong>L</strong>]
454 RewriteCond %{HTTP_USER_AGENT} ^<strong>Lynx/</strong>.* [OR]
455 RewriteCond %{HTTP_USER_AGENT} ^<strong>Mozilla/[12]</strong>.*
456 RewriteRule ^foo\.html$ foo.<strong>20</strong>.html [<strong>L</strong>]
458 RewriteRule ^foo\.html$ foo.<strong>32</strong>.html [<strong>L</strong>]
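The effect, for a few hypothetical User-Agent values:

```apache
#  User-Agent: Mozilla/3.04 ...  ->  foo.NS.html  (first rule, stops with [L])
#  User-Agent: Lynx/2.8 ...      ->  foo.20.html  (second rule, via the [OR] conditions)
#  User-Agent: Mozilla/2.0 ...   ->  foo.20.html
#  anything else                 ->  foo.32.html  (unconditional fallback rule)
```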
465 <section id="dynamic-mirror">
467 <title>Dynamic Mirror</title>
470 <dt>Description:</dt>
473 <p>Assume there are nice web pages on remote hosts we want
474 to bring into our namespace. For FTP servers we would use
475 the <code>mirror</code> program which actually maintains an
476 explicit up-to-date copy of the remote data on the local
477 machine. For a web server we could use the program
478 <code>webcopy</code> which runs via HTTP. But both
479 techniques have a major drawback: The local copy is
480 always only as up-to-date as the last time we ran the program. It
481 would be much better if the mirror was not a static one we
482 have to establish explicitly. Instead we want a dynamic
483 mirror with data which gets updated automatically
484 as needed on the remote host(s).</p>
490 <p>To provide this feature we map the remote web page or even
491 the complete remote web area to our namespace by the use
492 of the <dfn>Proxy Throughput</dfn> feature
493 (flag <code>[P]</code>):</p>
498 RewriteRule ^<strong>hotsheet/</strong>(.*)$ <strong>http://www.tstimpreso.com/hotsheet/</strong>$1 [<strong>P</strong>]
504 RewriteRule ^<strong>usa-news\.html</strong>$ <strong>http://www.quux-corp.com/news/index.html</strong> [<strong>P</strong>]
511 <section id="reverse-dynamic-mirror">
513 <title>Reverse Dynamic Mirror</title>
516 <dt>Description:</dt>
525 RewriteCond /mirror/of/remotesite/$1 -U
526 RewriteRule ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
533 <section id="retrieve-missing-data">
535 <title>Retrieve Missing Data from Intranet</title>
538 <dt>Description:</dt>
541 <p>This is a tricky way of virtually running a corporate
542 (external) Internet web server
543 (<code>www.quux-corp.dom</code>), while actually keeping
544 and maintaining its data on an (internal) Intranet web server
545 (<code>www2.quux-corp.dom</code>) which is protected by a
546 firewall. The trick is that the external web server retrieves
547 the requested data on-the-fly from the internal
554 <p>First, we must make sure that our firewall still
555 protects the internal web server and only the
556 external web server is allowed to retrieve data from it.
557 On a packet-filtering firewall, for instance, we could
558 configure a firewall ruleset like the following:</p>
561 <strong>ALLOW</strong> Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port <strong>80</strong>
562 <strong>DENY</strong> Host * Port * --> Host www2.quux-corp.dom Port <strong>80</strong>
565 <p>Just adjust it to your actual configuration syntax.
566 Now we can establish the <module>mod_rewrite</module>
567 rules which request the missing data in the background
568 through the proxy throughput feature:</p>
571 RewriteRule ^/~([^/]+)/?(.*) /home/$1/.www/$2 [C]
572 # REQUEST_FILENAME usage below is correct in this per-server context example
573 # because the rule that references REQUEST_FILENAME is chained to a rule that
574 # sets REQUEST_FILENAME.
575 RewriteCond %{REQUEST_FILENAME} <strong>!-f</strong>
576 RewriteCond %{REQUEST_FILENAME} <strong>!-d</strong>
577 RewriteRule ^/home/([^/]+)/.www/?(.*) http://<strong>www2</strong>.quux-corp.dom/~$1/pub/$2 [<strong>P</strong>]
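A hypothetical request then flows like this:

```apache
#  Request on www.quux-corp.dom:  /~user/anypath
#  Rule 1 rewrites it to:         /home/user/.www/anypath   [C]
#  If that file or directory does not exist locally (the two conditions),
#  the chained rule fetches it through the proxy:
#      http://www2.quux-corp.dom/~user/pub/anypath   [P]
```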
584 <section id="load-balancing">
586 <title>Load Balancing</title>
589 <dt>Description:</dt>
592 <p>Suppose we want to load balance the traffic to
593 <code>www.example.com</code> over <code>www[0-5].example.com</code>
594 (a total of 6 servers). How can this be done?</p>
600 <p>There are many possible solutions for this problem.
601 We will first discuss a common DNS-based method,
602 and then one based on <module>mod_rewrite</module>:</p>
606 <strong>DNS Round-Robin</strong>
608 <p>The simplest method for load-balancing is to use
Here you just configure <code>www[0-5].example.com</code>
611 as usual in your DNS with A (address) records, e.g.,</p>
622 <p>Then you additionally add the following entries:</p>
632 <p>Now when <code>www.example.com</code> gets
633 resolved, <code>BIND</code> gives out <code>www0-www5</code>
634 - but in a permutated (rotated) order every time.
635 This way the clients are spread over the various
636 servers. But notice that this is not a perfect load
637 balancing scheme, because DNS resolutions are
638 cached by clients and other nameservers, so
639 once a client has resolved <code>www.example.com</code>
640 to a particular <code>wwwN.example.com</code>, all its
641 subsequent requests will continue to go to the same
642 IP (and thus a single server), rather than being
distributed across the other available servers. But the
overall result is okay because the requests are collectively
spread over the various web servers.</p>
650 <strong>DNS Load-Balancing</strong>
652 <p>A sophisticated DNS-based method for
653 load-balancing is to use the program
654 <code>lbnamed</code> which can be found at <a
655 href="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">
656 http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</a>.
It is a Perl 5 program which, in conjunction with auxiliary
tools, provides real load-balancing via DNS.
663 <strong>Proxy Throughput Round-Robin</strong>
665 <p>In this variant we use <module>mod_rewrite</module>
666 and its proxy throughput feature. First we dedicate
667 <code>www0.example.com</code> to be actually
668 <code>www.example.com</code> by using a single</p>
671 www IN CNAME www0.example.com.
674 <p>entry in the DNS. Then we convert
675 <code>www0.example.com</code> to a proxy-only server,
676 i.e., we configure this machine so all arriving URLs
677 are simply passed through its internal proxy to one of
678 the 5 other servers (<code>www1-www5</code>). To
679 accomplish this we first establish a ruleset which
680 contacts a load balancing script <code>lb.pl</code>
685 RewriteMap lb prg:/path/to/lb.pl
686 RewriteRule ^/(.+)$ ${lb:$1} [P,L]
689 <p>Then we write <code>lb.pl</code>:</p>
#!/path/to/perl
##
##  lb.pl -- load balancing script
##
$| = 1;                   # unbuffered output, required for prg: rewrite maps
$name   = "www";          # the hostname base
$first  = 1;              # the first server (not 0 here, because 0 is myself)
$last   = 5;              # the last server in the round-robin
$domain = "example.com";  # the domainname
$cnt = 0;
while (&lt;STDIN&gt;) {
    $cnt = (($cnt+1) % ($last+1-$first));
    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    print "http://$server/$_";
}
##EOF##
<note>A last notice: Why is this useful? Isn't
<code>www0.example.com</code> still overloaded? Yes,
it is, but with plain proxy
throughput requests only! All SSI, CGI, ePerl, etc.
processing is handled on the other machines.
For a complicated site, this may work well. The biggest
720 risk here is that www0 is now a single point of failure --
721 if it crashes, the other servers are inaccessible.</note>
725 <strong>Dedicated Load Balancers</strong>
727 <p>There are more sophisticated solutions, as well. Cisco,
728 F5, and several other companies sell hardware load
729 balancers (typically used in pairs for redundancy), which
730 offer sophisticated load balancing and auto-failover
731 features. There are software packages which offer similar
732 features on commodity hardware, as well. If you have
733 enough money or need, check these out. The <a
734 href="http://vegan.net/lb/">lb-l mailing list</a> is a
735 good place to research.</p>
743 <section id="new-mime-type">
745 <title>New MIME-type, New Service</title>
748 <dt>Description:</dt>
751 <p>On the net there are many nifty CGI programs. But
752 their usage is usually boring, so a lot of webmasters
753 don't use them. Even Apache's Action handler feature for
754 MIME-types is only appropriate when the CGI programs
755 don't need special URLs (actually <code>PATH_INFO</code>
and <code>QUERY_STRING</code>) as their input. First,
757 let us configure a new file type with extension
758 <code>.scgi</code> (for secure CGI) which will be processed
759 by the popular <code>cgiwrap</code> program. The problem
760 here is that for instance if we use a Homogeneous URL Layout
761 (see above) a file inside the user homedirs might have a URL
762 like <code>/u/user/foo/bar.scgi</code>, but
763 <code>cgiwrap</code> needs URLs in the form
764 <code>/~user/foo/bar.scgi/</code>. The following rule
765 solves the problem:</p>
768 RewriteRule ^/[uge]/<strong>([^/]+)</strong>/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~<strong>$1</strong>/$2.scgi$3 [NS,<strong>T=application/x-httpd-cgi</strong>]
772 <p>Or assume we have some more nifty programs:
773 <code>wwwlog</code> (which displays the
774 <code>access.log</code> for a URL subtree) and
775 <code>wwwidx</code> (which runs Glimpse on a URL
776 subtree). We have to provide the URL area to these
777 programs so they know which area they are really working with.
778 But usually this is complicated, because they may still be
779 requested by the alternate URL form, i.e., typically we would
run the <code>wwwidx</code> program from within
781 <code>/u/user/foo/</code> via hyperlink to</p>
/internal/cgi/user/wwwidx?i=/u/user/foo/
787 <p>which is ugly, because we have to hard-code
788 <strong>both</strong> the location of the area
789 <strong>and</strong> the location of the CGI inside the
790 hyperlink. When we have to reorganize, we spend a
791 lot of time changing the various hyperlinks.</p>
797 <p>The solution here is to provide a special new URL format
798 which automatically leads to the proper CGI invocation.
799 We configure the following:</p>
802 RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/wwwidx?i=/$1/$2$3/
803 RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
806 <p>Now the hyperlink to search at
<code>/u/user/foo/</code> reads only</p>

HREF="*"

<p>which internally gets automatically transformed to</p>
816 /internal/cgi/user/wwwidx?i=/u/user/foo/
819 <p>The same approach leads to an invocation for the
820 access log CGI program when the hyperlink
821 <code>:log</code> gets used.</p>
827 <section id="on-the-fly-content">
829 <title>On-the-fly Content-Regeneration</title>
832 <dt>Description:</dt>
835 <p>Here comes a really esoteric feature: Dynamically
836 generated but statically served pages, i.e., pages should be
837 delivered as pure static pages (read from the filesystem
838 and just passed through), but they have to be generated
839 dynamically by the web server if missing. This way you can
840 have CGI-generated pages which are statically served unless an
841 admin (or a <code>cron</code> job) removes the static contents. Then the
842 contents gets refreshed.</p>
848 This is done via the following ruleset:
851 # This example is valid in per-directory context only
852 RewriteCond %{REQUEST_FILENAME} <strong>!-s</strong>
853 RewriteRule ^page\.<strong>html</strong>$ page.<strong>cgi</strong> [T=application/x-httpd-cgi,L]
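In effect:

```apache
#  First request (page.html missing or zero-sized):
#      page.html -> page.cgi, run as a CGI which also writes page.html to disk
#  Later requests (page.html now exists and is non-empty):
#      the !-s condition fails, so page.html is served as a plain static file
```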
856 <p>Here a request for <code>page.html</code> leads to an
857 internal run of a corresponding <code>page.cgi</code> if
858 <code>page.html</code> is missing or has filesize
859 null. The trick here is that <code>page.cgi</code> is a
860 CGI script which (additionally to its <code>STDOUT</code>)
861 writes its output to the file <code>page.html</code>.
862 Once it has completed, the server sends out
863 <code>page.html</code>. When the webmaster wants to force
864 a refresh of the contents, he just removes
865 <code>page.html</code> (typically from <code>cron</code>).</p>
871 <section id="autorefresh">
873 <title>Document With Autorefresh</title>
876 <dt>Description:</dt>
879 <p>Wouldn't it be nice, while creating a complex web page, if
880 the web browser would automatically refresh the page every
881 time we save a new version from within our editor?
888 <p>No! We just combine the MIME multipart feature, the
889 web server NPH feature, and the URL manipulation power of
890 <module>mod_rewrite</module>. First, we establish a new
891 URL feature: Adding just <code>:refresh</code> to any
892 URL causes the 'page' to be refreshed every time it is
893 updated on the filesystem.</p>
896 RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1
899 <p>Now when we reference the URL</p>
902 /u/foo/bar/page.html:refresh
905 <p>this leads to the internal invocation of the URL</p>
908 /internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
911 <p>The only missing part is the NPH-CGI script. Although
912 one would usually say "left as an exercise to the reader"
913 ;-) I will provide this, too.</p>
918 ## nph-refresh -- NPH/CGI script for auto refreshing pages
919 ## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
923 # split the QUERY_STRING variable
924 @pairs = split(/&/, $ENV{'QUERY_STRING'});
925 foreach $pair (@pairs) {
926 ($name, $value) = split(/=/, $pair);
927 $name =~ tr/A-Z/a-z/;
928 $name = 'QS_' . $name;
929 $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
930 eval "\$$name = \"$value\"";
932 $QS_s = 1 if ($QS_s eq '');
933 $QS_n = 3600 if ($QS_n eq '');
935 print "HTTP/1.0 200 OK\n";
936 print "Content-type: text/html\n\n";
937 print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
941 print "HTTP/1.0 200 OK\n";
942 print "Content-type: text/html\n\n";
943 print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
947 sub print_http_headers_multipart_begin {
948 print "HTTP/1.0 200 OK\n";
949 $bound = "ThisRandomString12345";
950 print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
951 &print_http_headers_multipart_next;
954 sub print_http_headers_multipart_next {
955 print "\n--$bound\n";
958 sub print_http_headers_multipart_end {
959 print "\n--$bound--\n";
964 $len = length($buffer);
965 print "Content-type: text/html\n";
966 print "Content-length: $len\n\n";
972 local(*FP, $size, $buffer, $bytes);
973 ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
974 $size = sprintf("%d", $size);
975 open(FP, "&lt;$file");
976 $bytes = sysread(FP, $buffer, $size);
981 $buffer = &readfile($QS_f);
982 &print_http_headers_multipart_begin;
983 &displayhtml($buffer);
986 local($file) = $_[0];
989 ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
993 $mtimeL = &mystat($QS_f);
995 for ($n = 0; $n &lt; $QS_n; $n++) {
997 $mtime = &mystat($QS_f);
998 if ($mtime ne $mtimeL) {
1001 $buffer = &readfile($QS_f);
1002 &print_http_headers_multipart_next;
1003 &displayhtml($buffer);
1005 $mtimeL = &mystat($QS_f);
1012 &print_http_headers_multipart_end;
1023 <section id="mass-virtual-hosting">
1025 <title>Mass Virtual Hosting</title>
1028 <dt>Description:</dt>
1031 <p>The <directive type="section" module="core"
1032 >VirtualHost</directive> feature of Apache is nice
1033 and works great when you just have a few dozen
1034 virtual hosts. But when you are an ISP and have hundreds of
1035 virtual hosts, this feature is suboptimal.</p>
<p>To provide this feature, we map each virtual hostname to
its corresponding document root via a
<directive module="mod_rewrite">RewriteMap</directive>
file which lists the available virtual hosts:</p>
1049 www.vhost1.dom:80 /path/to/docroot/vhost1
1050 www.vhost2.dom:80 /path/to/docroot/vhost2
1052 www.vhostN.dom:80 /path/to/docroot/vhostN
1060 # use the canonical hostname on redirects, etc.
1064 # add the virtual host in front of the CLF-format
1065 CustomLog /path/to/access_log "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
1068 # enable the rewriting engine in the main server
1071 # define two maps: one for fixing the URL and one which defines
1072 # the available virtual hosts with their corresponding
1074 RewriteMap lowercase int:tolower
1075 RewriteMap vhost txt:/path/to/vhost.map
1077 # Now do the actual virtual host mapping
1078 # via a huge and complicated single rule:
1080 # 1. make sure we don't map for common locations
1081 RewriteCond %{REQUEST_URI} !^/commonurl1/.*
1082 RewriteCond %{REQUEST_URI} !^/commonurl2/.*
1084 RewriteCond %{REQUEST_URI} !^/commonurlN/.*
1086 # 2. make sure we have a Host header, because
1087 # currently our approach only supports
1088 # virtual hosting through this header
1089 RewriteCond %{HTTP_HOST} !^$
1091 # 3. lowercase the hostname
1092 RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$
1094 # 4. lookup this hostname in vhost.map and
1095 # remember it only when it is a path
1096 # (and not "NONE" from above)
1097 RewriteCond ${vhost:%1} ^(/.*)$
1099 # 5. finally we can map the URL to its docroot location
1100 # and remember the virtual host for logging purposes
1101 RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}]
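Tracing a hypothetical request through steps 3-5 of the ruleset:

```apache
#  Request:  GET /index.html  with  Host: www.vhost1.dom
#  Step 3:   ${lowercase:www.vhost1.dom|NONE}  ->  www.vhost1.dom   (%1)
#  Step 4:   ${vhost:www.vhost1.dom}           ->  /path/to/docroot/vhost1  (new %1)
#  Step 5:   /index.html  ->  /path/to/docroot/vhost1/index.html
#            and VHOST=www.vhost1.dom is set for the CustomLog format
```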
1109 <section id="host-deny">
1111 <title>Host Deny</title>
1114 <dt>Description:</dt>
1117 <p>How can we forbid a list of externally configured hosts
1118 from using our server?</p>
1124 <p>For Apache >= 1.3b6:</p>
1128 RewriteMap hosts-deny txt:/path/to/hosts.deny
1129 RewriteCond ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
1130 RewriteCond ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
1131 RewriteRule ^/.* - [F]
<p>For Apache &lt;= 1.3b6:</p>
1138 RewriteMap hosts-deny txt:/path/to/hosts.deny
1139 RewriteRule ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
1140 RewriteRule !^NOT-FOUND/.* - [F]
1141 RewriteRule ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
1142 RewriteRule !^NOT-FOUND/.* - [F]
1143 RewriteRule ^NOT-FOUND/(.*)$ /$1
1150 ## ATTENTION! This is a map, not a list, even when we treat it as such.
1151 ## mod_rewrite parses it for key/value pairs, so at least a
1152 ## dummy value "-" must be present for each entry.
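A <code>hosts.deny</code> map might therefore look like this (the hosts and addresses are hypothetical):

```apache
##  hosts.deny -- deny map; the "-" values are dummies
193.102.180.41   -
bsdti1.sdm.de    -
192.76.162.40    -
```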
1164 <section id="proxy-deny">
1166 <title>Proxy Deny</title>
1169 <dt>Description:</dt>
1172 <p>How can we forbid a certain host or even a user of a
1173 special host from using the Apache proxy?</p>
1179 <p>We first have to make sure <module>mod_rewrite</module>
1180 is below(!) <module>mod_proxy</module> in the Configuration
1181 file when compiling the Apache web server. This way it gets
1182 called <em>before</em> <module>mod_proxy</module>. Then we
1183 configure the following for a host-dependent deny...</p>
1186 RewriteCond %{REMOTE_HOST} <strong>^badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.* - [F]
1190 <p>...and this one for a user@host-dependent deny:</p>
1193 RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>^badguy@badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.* - [F]
1201 <section id="special-authentication">
1203 <title>Special Authentication Variant</title>
1206 <dt>Description:</dt>
1209 <p>Sometimes very special authentication is needed, for
1210 instance authentication which checks for a set of
explicitly configured users. Only these should receive
access, without the explicit prompting which would occur
when using Basic Auth via <module>mod_auth_basic</module>.</p>
<p>We use a list of rewrite conditions to exclude all except
our friends:</p>
1223 RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend1@client1.quux-corp\.com$</strong>
1224 RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend2</strong>@client2.quux-corp\.com$
1225 RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend3</strong>@client3.quux-corp\.com$
1226 RewriteRule ^/~quux/only-for-friends/ - [F]
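With many friends, the condition list grows unwieldy. The same check can be sketched with an external map instead, mirroring the approach of the Host Deny section; the map file name and its entries are hypothetical:

```apache
RewriteMap  friends  txt:/path/to/friends.map
RewriteCond ${friends:%{REMOTE_IDENT}@%{REMOTE_HOST}|NOT-FOUND} =NOT-FOUND
RewriteRule ^/~quux/only-for-friends/ - [F]
```

Here each line of <code>friends.map</code> would hold a <code>user@host</code> key plus a dummy "<code>-</code>" value, and anyone not found in the map is forbidden.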
1233 <section id="referer-deflector">
1235 <title>Referer-based Deflector</title>
1238 <dt>Description:</dt>
1241 <p>How can we program a flexible URL Deflector which acts
1242 on the "Referer" HTTP header and can be configured with as
1243 many referring pages as we like?</p>
1249 <p>Use the following really tricky ruleset...</p>
1252 RewriteMap deflector txt:/path/to/deflector.map
1254 RewriteCond %{HTTP_REFERER} !=""
1255 RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
1256 RewriteRule ^.* %{HTTP_REFERER} [R,L]
1258 RewriteCond %{HTTP_REFERER} !=""
1259 RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
1260 RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]
1263 <p>... in conjunction with a corresponding rewrite
1271 http://www.badguys.com/bad/index.html -
1272 http://www.badguys.com/bad/index2.html -
1273 http://www.badguys.com/bad/index3.html http://somewhere.com/
1276 <p>This automatically redirects the request back to the
1277 referring page (when "<code>-</code>" is used as the value
in the map) or to a specific URL (when a URL is specified
1279 in the map as the second argument).</p>