2 @setfilename ../../info/url
3 @settitle URL Programmer's Manual
8 @c @setchapternewpage odd
13 %\global\baselineskip 30pt % for printing in double space
15 @dircategory Emacs lisp libraries
17 * URL: (url). URL loading package.
21 This is the manual for the @code{url} Emacs Lisp library.
23 Copyright @copyright{} 1993-1999, 2002, 2004-2012 Free Software Foundation, Inc.
26 Permission is granted to copy, distribute and/or modify this document
27 under the terms of the GNU Free Documentation License, Version 1.3 or
28 any later version published by the Free Software Foundation; with no
29 Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
30 and with the Back-Cover Texts as in (a) below. A copy of the license
31 is included in the section entitled ``GNU Free Documentation License''.
33 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
34 modify this GNU manual. Buying copies from the FSF supports it in
35 developing GNU and promoting software freedom.''
41 @title URL Programmer's Manual
42 @subtitle First Edition, URL Version 2.0
43 @author William M. Perry @email{wmperry@@gnu.org}
44 @author David Love @email{fx@@gnu.org}
46 @vskip 0pt plus 1filll
60 * Introduction:: About the @code{url} library.
61 * URI Parsing:: Parsing (and unparsing) URIs.
62 * Retrieving URLs:: How to use this package to retrieve a URL.
63 * Supported URL Types:: Descriptions of URL types currently supported.
64 * General Facilities:: URLs can be cached, accessed via a gateway
65 and tracked in a history list.
66 * Customization:: Variables you can alter.
67 * GNU Free Documentation License:: The license for this documentation.
77 @cindex uniform resource identifier
78 @cindex uniform resource locator
80 A @dfn{Uniform Resource Identifier} (URI) is a specially-formatted
81 name, such as an Internet address, that identifies some name or
82 resource. The format of URIs is described in RFC 3986, which updates
83 and replaces the earlier RFCs 2732, 2396, 1808, and 1738. A
84 @dfn{Uniform Resource Locator} (URL) is an older but still-common
85 term, which basically refers to a URI corresponding to a resource that
86 can be accessed (usually over a network) in a specific way.
88 Here are some examples of URIs (taken from RFC 3986):
91 ftp://ftp.is.co.za/rfc/rfc1808.txt
92 http://www.ietf.org/rfc/rfc2396.txt
93 ldap://[2001:db8::7]/c=GB?objectClass?one
94 mailto:John.Doe@@example.com
95 news:comp.infosystems.www.servers.unix
97 telnet://192.0.2.16:80/
98 urn:oasis:names:specification:docbook:dtd:xml:4.1.2
101 This manual describes the @code{url} library, an Emacs Lisp library
102 for parsing URIs and retrieving the resources to which they refer.
103 (The library is so-named for historical reasons; nowadays, the ``URI''
104 terminology is regarded as the more general one, and ``URL'' is
105 technically obsolete despite its widespread vernacular usage.)
110 A URI consists of several @dfn{components}, each having a different
111 meaning. For example, the URI
114 http://www.gnu.org/software/emacs/
118 specifies the scheme component @samp{http}, the hostname component
119 @samp{www.gnu.org}, and the path component @samp{/software/emacs/}.
122 The format of URIs is specified by RFC 3986. The @code{url} library
123 provides the Lisp function @code{url-generic-parse-url}, a (mostly)
124 standards-compliant URI parser, as well as the function
125 @code{url-recreate-url}, which converts a parsed URI back into a URI
128 @defun url-generic-parse-url uri-string
129 This function returns a parsed version of the string @var{uri-string}.
132 @defun url-recreate-url uri-obj
133 @cindex unparsing URLs
134 Given a parsed URI, this function returns the corresponding URI string.
138 The return value of @code{url-generic-parse-url}, and the argument
139 expected by @code{url-recreate-url}, is a @dfn{parsed URI}: a CL
140 structure whose slots hold the various components of the URI.
141 @xref{top,the CL Manual,,cl,GNU Emacs Common Lisp Emulation}, for
142 details about CL structures. Most of the other functions in the
143 @code{url} library act on parsed URIs.
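
As a minimal sketch of the round trip between a URI string and a
parsed URI, the following example parses a string and converts the
result back.  It assumes only that the @code{url-parse} feature is
available, as it is in a standard Emacs installation.

@example
(require 'url-parse)

;; Parse a URI string, then rebuild the string from the parsed form.
(let* ((uri "http://www.gnu.org/software/emacs/")
       (parsed (url-generic-parse-url uri)))
  (string= uri (url-recreate-url parsed)))
;; @result{} t
@end example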
146 * Parsed URIs:: Format of parsed URI structures.
147 * URI Encoding:: Non-@acronym{ASCII} characters in URIs.
151 @section Parsed URI structures
153 Each parsed URI structure contains the following slots:
157 The URI scheme (a string, e.g.@: @code{http}). @xref{Supported URL
158 Types}, for a list of schemes that the @code{url} library knows how to
159 process. This slot can also be @code{nil}, if the URI is not fully
163 The user name (a string), or @code{nil}.
166 The user password (a string), or @code{nil}. The use of this URI
167 component is strongly discouraged; nowadays, passwords are transmitted
168 by other means, not as part of a URI.
171 The host name (a string), or @code{nil}. If present, this is
172 typically a domain name or IP address.
175 The port number (an integer), or @code{nil}. Omitting this component
176 usually means to use the ``standard'' port associated with the URI
180 The combination of the ``path'' and ``query'' components of the URI (a
181 string), or @code{nil}. If the query component is present, it is the
182 substring following the first @samp{?} character, and the path
183 component is the substring before the @samp{?}. The meaning of these
184 components is scheme-dependent; they do not necessarily refer to a
188 The fragment component (a string), or @code{nil}. The fragment
189 component specifies a ``secondary resource'', such as a section of a
193 This is @code{t} if the URI is fully specified, i.e.@: the
194 hierarchical components of the URI (the hostname and/or username
195 and/or password) are preceded by @samp{//}.
205 @findex url-attributes
207 These slots have accessors named @code{url-@var{part}}, where
208 @var{part} is the slot name. For example, the accessor for the
209 @code{host} slot is the function @code{url-host}. The @code{url-port}
210 accessor returns the default port for the URI scheme if the parsed
211 URI's @var{port} slot is @code{nil}.
213 The slots can be set using @code{setf}. For example:
216 (setf (url-port url) 80)
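
The following sketch reads several slots through their accessors and
then changes two of them.  The accessor names @code{url-type},
@code{url-host}, @code{url-port} and @code{url-filename} are the ones
defined in @file{url-parse.el} as far as we know; the replacement host
name and port are arbitrary values chosen for illustration.

@example
(require 'url-parse)

(let ((url (url-generic-parse-url "http://www.gnu.org/software/emacs/")))
  ;; Read components through the accessors.  No port was given,
  ;; so url-port falls back to the default port for "http".
  (list (url-type url) (url-host url) (url-port url) (url-filename url)))
;; @result{} ("http" "www.gnu.org" 80 "/software/emacs/")

(let ((url (url-generic-parse-url "http://www.gnu.org/software/emacs/")))
  ;; Change two slots, then rebuild the URI string.
  (setf (url-host url) "ftp.gnu.org")
  (setf (url-port url) 8080)
  (url-recreate-url url))
;; @result{} "http://ftp.gnu.org:8080/software/emacs/"
@end example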
220 @section URI Encoding
222 @cindex percent encoding
223 The @code{url-generic-parse-url} parser does not obey RFC 3986 in
224 one respect: it allows non-@acronym{ASCII} characters in URI strings.
226 Strictly speaking, RFC 3986 compatible URIs may only consist of
227 @acronym{ASCII} characters; non-@acronym{ASCII} characters are
228 represented by converting them to UTF-8 byte sequences, and performing
229 @dfn{percent encoding} on the bytes. For example, the o-umlaut
230 character is converted to the UTF-8 byte sequence @samp{\xC3\xB6},
231 then percent encoded to @samp{%C3%B6}. (Certain ``reserved''
232 @acronym{ASCII} characters must also be percent encoded when they
233 appear in URI components.)
235 The function @code{url-encode-url} can be used to convert a URI
236 string containing arbitrary characters to one that is properly
237 percent-encoded in accordance with RFC 3986.
239 @defun url-encode-url url-string
240 This function returns a properly URI-encoded version of
241 @var{url-string}. It also performs @dfn{URI normalization},
242 e.g.@: converting the scheme component to lowercase if it was
243 previously uppercase.
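
As a hedged illustration, the call below passes a URI string with an
uppercase scheme and a space in its path.  The input string is made up
for this example.  We would expect the result to use a lowercase
scheme and to contain @samp{%20} in place of the space, but the exact
output may vary between Emacs versions, so none is shown.

@example
(require 'url-util)

;; The scheme is uppercase and the path contains a space; both
;; should be normalized or encoded in the returned string.
(url-encode-url "HTTP://www.gnu.org/some file.html")
@end example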
246 To convert between a string containing arbitrary characters and a
247 percent-encoded all-@acronym{ASCII} string, use the functions
248 @code{url-hexify-string} and @code{url-unhex-string}:
250 @defun url-hexify-string string &optional allowed-chars
251 This function performs percent-encoding on @var{string}, and returns
254 If @var{string} is multibyte, it is first converted to a UTF-8 byte
255 string. Each byte corresponding to an allowed character is left
256 as-is, while all other bytes are converted to a three-character
257 sequence: @samp{%} followed by two upper-case hex digits.
259 @vindex url-unreserved-chars
260 @cindex unreserved characters
261 The allowed characters are specified by @var{allowed-chars}. If this
262 argument is @code{nil}, the allowed characters are those specified as
263 @dfn{unreserved characters} by RFC 3986 (see the variable
264 @code{url-unreserved-chars}). Otherwise, @var{allowed-chars} should
265 be a vector whose @var{n}-th element is non-@code{nil} if character
269 @defun url-unhex-string string &optional allow-newlines
270 This function replaces percent-encoding sequences in @var{string} with
271 their character equivalents, and returns the resulting string.
273 If @var{allow-newlines} is non-@code{nil}, it allows the decoding of
274 carriage returns and line feeds, which are normally forbidden in URIs.
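
The following sketch shows both directions.  Note the assumption in
the last form: @code{url-unhex-string} gives back the raw bytes, so a
non-@acronym{ASCII} character is recovered by decoding the result as
UTF-8 explicitly.

@example
(require 'url-util)

(url-hexify-string "a b&c")
;; @result{} "a%20b%26c"

(url-unhex-string "a%20b%26c")
;; @result{} "a b&c"

;; The o-umlaut character (U+00F6) is encoded via its UTF-8 bytes.
(url-hexify-string "\u00f6")
;; @result{} "%C3%B6"

;; Decoding yields those bytes again; convert them back to a
;; character with decode-coding-string.
(string= (decode-coding-string (url-unhex-string "%C3%B6") 'utf-8)
         "\u00f6")
;; @result{} t
@end example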
277 @node Retrieving URLs
278 @chapter Retrieving URLs
280 The @code{url} library defines the following three functions for
281 retrieving the data specified by a URL. The actual retrieval protocol
282 depends on the URL's URI scheme, and is performed by lower-level
283 scheme-specific functions. (Those lower-level functions are not
284 documented here, and generally should not be called directly.)
286 In each of these functions, the @var{url} argument can be either a
287 string or a parsed URL structure. If it is a string, that string is
288 passed through @code{url-encode-url} before using it, to ensure that
289 it is properly URI-encoded (@pxref{URI Encoding}).
291 @defun url-retrieve-synchronously url
292 This function synchronously retrieves the data specified by @var{url},
293 and returns a buffer containing the data. The return value is
294 @code{nil} if there is no data associated with the URL (as is the case
295 for @code{dired}, @code{info}, and @code{mailto} URLs).
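
A minimal sketch of synchronous retrieval follows.  The helper name
@code{my-url-fetch-body} is hypothetical and not part of the library.
The sketch assumes that the headers and the body in the returned
buffer are separated by the first blank line, which is the usual
layout for HTTP responses.

@example
(require 'url)

(defun my-url-fetch-body (url)
  "Return the body retrieved from URL as a string, or nil.
Hypothetical helper, shown for illustration only."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (with-current-buffer buffer
        (unwind-protect
            (progn
              (goto-char (point-min))
              ;; Skip the headers, which end at the first blank line.
              (when (re-search-forward "\r?\n\r?\n" nil t)
                (buffer-substring-no-properties (point) (point-max))))
          ;; Always discard the retrieval buffer when done.
          (kill-buffer buffer))))))
@end example

Calling @code{(my-url-fetch-body "http://www.gnu.org/")} requires
network access, and returns @code{nil} if no buffer or no blank line
is found.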
298 @defun url-retrieve url callback &optional cbargs silent no-cookies
299 This function retrieves @var{url} asynchronously, calling the function
300 @var{callback} when the object has been completely retrieved. The
301 return value is the buffer into which the data will be inserted, or
302 @code{nil} if the process has already completed.
304 The callback function is called this way:
307 (apply @var{callback} @var{status} @var{cbargs})
311 where @var{status} is a plist representing what happened during the
312 retrieval, with most recent events first, or an empty list if no
313 events have occurred. Each pair in the plist is one of:
316 @item (:redirect @var{redirected-to})
317 This means that the request was redirected to the URL
320 @item (:error (@var{error-symbol} . @var{data}))
321 This means that an error occurred. If so desired, the error can be
322 signaled with @code{(signal @var{error-symbol} @var{data})}.
325 When the callback function is called, the current buffer is the one
326 containing the retrieved data (if any). The buffer also contains any
327 MIME headers associated with the data retrieval.
329 If the optional argument @var{silent} is non-@code{nil}, progress
330 messages are suppressed. If the optional argument @var{no-cookies} is
331 non-@code{nil}, cookies are not stored or sent.
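
The following sketch shows one way to write a callback that inspects
the @var{status} plist.  The names @code{my-url-report} and
@code{my-url-fetch-async} are hypothetical, and the report format is
an arbitrary choice.

@example
(require 'url)

(defun my-url-report (status label)
  "Describe a finished retrieval as a string.
STATUS is the plist passed to the callback by `url-retrieve';
LABEL is an extra argument supplied through CBARGS."
  (let ((err (plist-get status :error)))
    (if err
        (format "%s: error %S" label err)
      ;; The current buffer holds the headers and the data.
      (format "%s: %d chars" label (buffer-size)))))

(defun my-url-fetch-async (url label)
  "Retrieve URL asynchronously and report the outcome under LABEL."
  (url-retrieve url
                (lambda (status label)
                  (message "%s" (my-url-report status label)))
                (list label)   ; CBARGS: extra callback arguments
                t))            ; SILENT: no progress messages
@end example

Because @var{cbargs} is a list, each of its elements becomes one
additional argument to the callback after @var{status}.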
334 @defun url-queue-retrieve url callback &optional cbargs silent no-cookies
335 This function acts like @code{url-retrieve}, but with limits on the
336 number of concurrently-running network processes. The option
337 @code{url-queue-parallel-processes} controls the number of concurrent
338 processes, and the option @code{url-queue-timeout} sets a timeout in
341 To use this function, you must @code{(require 'url-queue)}.
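
As a sketch, the code below queues a list of URLs while allowing at
most two concurrent network processes.  The option values and the
helper name @code{my-url-queue-all} are assumptions made for this
example only.

@example
(require 'url-queue)

;; Illustrative values, not recommendations.
(setq url-queue-parallel-processes 2
      url-queue-timeout 10)

(defun my-url-queue-all (urls)
  "Queue the retrieval of every URL in the list URLS."
  (dolist (url urls)
    (url-queue-retrieve
     url
     (lambda (status url)
       (if (plist-get status :error)
           (message "Failed: %s" url)
         (message "Fetched %s (%d chars)" url (buffer-size))))
     (list url)   ; pass the URL itself to the callback
     t)))         ; suppress progress messages
@end example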
344 @vindex url-queue-parallel-processes
345 @defopt url-queue-parallel-processes
346 The value of this option is an integer specifying the maximum number
347 of concurrent @code{url-queue-retrieve} network processes. If the
348 number of @code{url-queue-retrieve} calls is larger than this number,
349 later ones are queued until earlier ones are finished.
352 @vindex url-queue-timeout
353 @defopt url-queue-timeout
354 The value of this option is a number specifying the maximum lifetime
355 of a @code{url-queue-retrieve} network process, once it is started.
356 If a process is not finished by then, it is killed and removed from
360 @node Supported URL Types
361 @chapter Supported URL Types
363 This chapter describes functions and variables affecting URL retrieval
364 for specific schemes.
367 * http/https:: Hypertext Transfer Protocol.
368 * file/ftp:: Local files and FTP archives.
369 * info:: Emacs "Info" pages.
370 * mailto:: Sending email.
371 * news/nntp/snews:: Usenet news.
372 * rlogin/telnet/tn3270:: Remote host connectivity.
373 * irc:: Internet Relay Chat.
374 * data:: Embedded data URLs.
375 * nfs:: Networked File System.
376 * ldap:: Lightweight Directory Access Protocol.
377 * man:: Unix man pages.
381 @section @code{http} and @code{https}
383 The @code{http} scheme refers to the Hypertext Transfer Protocol. The
384 @code{url} library supports HTTP version 1.1, specified in RFC 2616.
385 Its default port is 80.
387 The @code{https} scheme is a secure version of @code{http}, with
388 transmission via SSL. It is defined in RFC 2818, and its default port
389 is 443. When using @code{https}, the @code{url} library performs SSL
390 encryption via the @code{ssl} library, by forcing the @code{ssl}
391 gateway method to be used. @xref{Gateways in general}.
393 @defopt url-honor-refresh-requests
394 If this option is non-@code{nil} (the default), the @code{url} library
395 honors the HTTP @samp{Refresh} header, which is used by servers to
396 direct clients to reload documents from the same URL or a different
397 one. If the value is @code{nil}, the @samp{Refresh} header is
398 ignored; any other value means to ask the user on each request.
403 * HTTP language/coding::
405 * Dealing with HTTP documents::
411 @defopt url-cookie-file
412 The file in which cookies are stored, defaulting to @file{cookies} in
413 the directory specified by @code{url-configuration-directory}.
416 @defopt url-cookie-confirmation
417 Specifies whether confirmation is required to accept cookies.
420 @defopt url-cookie-multiple-line
421 Specifies whether to put all cookies for the server on one line in the
422 HTTP request to satisfy broken servers like
423 @url{http://www.hotmail.com}.
426 @defopt url-cookie-trusted-urls
427 A list of regular expressions matching URLs from which to accept
431 @defopt url-cookie-untrusted-urls
432 A list of regular expressions matching URLs from which to reject
436 @defopt url-cookie-save-interval
437 The number of seconds between automatic saves of cookies to disk.
442 @node HTTP language/coding
443 @subsection Language and Encoding Preferences
445 HTTP allows clients to express preferences for the language and
446 encoding of documents, which servers may honor. For each of these
447 variables, the value is a string; it can specify a single choice, or
448 it can be a comma-separated list.
450 Normally, this list is ordered by descending preference. However, each
451 element can be followed by @samp{;q=@var{priority}} to specify its
452 preference level, a decimal number from 0 to 1; e.g., for
453 @code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
454 en;q=0.7"}}. An element that has no @samp{;q} specification has
457 @defopt url-mime-charset-string
458 @cindex character sets
459 @cindex coding systems
460 This variable specifies a preference for character sets when documents
461 can be served in more than one encoding.
463 HTTP allows specifying a series of MIME charsets which indicate your
464 preferred character set encodings, e.g., Latin-9 or Big5, and these
465 can be weighted. The default series is generated automatically from
466 the associated MIME types of all defined coding systems, sorted by the
467 coding system priority specified in Emacs. @xref{Recognize Coding, ,
468 Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
471 @defopt url-mime-language-string
472 @cindex language preferences
473 A string specifying the preferred language when servers can serve
474 files in several languages. Use RFC 1766 abbreviations, e.g.,
475 @samp{en} for English, @samp{de} for German.
477 The string can be @code{"*"} to get the first available language (as
478 opposed to the default).
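
Since these options are ordinary variables, a program can bind one of
them around a single request instead of changing it globally.  The
sketch below does this for the language preference; the helper name
@code{my-url-fetch-in-german} is hypothetical.

@example
(require 'url)

(defun my-url-fetch-in-german (url)
  "Retrieve URL, preferring German, then British English, then English."
  ;; The binding is in effect only for the duration of this request.
  (let ((url-mime-language-string "de, en-gb;q=0.8, en;q=0.7"))
    (url-retrieve-synchronously url)))
@end example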
481 @node HTTP URL Options
482 @subsection HTTP URL Options
484 HTTP supports an @samp{OPTIONS} method describing things supported by
487 @defun url-http-options url
488 Returns a property list describing options available for URL. The
489 property list members are:
493 A list of symbols specifying what HTTP methods the resource
498 A list of numbers specifying what DAV protocol/schema versions are
503 A list of supported DASL search types (string form).
506 A list of the units available for use in partial document fetches.
510 The @dfn{Platform for Privacy Preferences} description for the resource.
511 Currently this is just the raw header contents.
516 @node Dealing with HTTP documents
517 @subsection Dealing with HTTP documents
519 HTTP URLs are retrieved into a buffer containing the HTTP headers
520 followed by the body. Since the headers are quasi-MIME, they may be
521 processed using the MIME library. @xref{Top,, Emacs MIME,
522 emacs-mime, The Emacs MIME Manual}.
525 @section file and ftp
528 @cindex File Transfer Protocol
529 @cindex compressed files
532 The @code{ftp} and @code{file} schemes are defined in RFC 1738. The
533 @code{url} library treats @samp{ftp:} and @samp{file:} as synonymous.
534 Such URLs have the form
537 ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
538 file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
542 If the URL specifies a local file, it is retrieved by reading the file
543 contents in the usual way. If it specifies a remote file, it is
544 retrieved using the Ange-FTP package. @xref{Remote Files,,, emacs,
545 The GNU Emacs Manual}.
547 When retrieving a compressed file, it is automatically uncompressed
548 if it has the file suffix @file{.z}, @file{.gz}, @file{.Z},
549 @file{.bz2}, or @file{.xz}. (The list of supported suffixes is
550 hard-coded, and cannot be altered by customizing
551 @code{jka-compr-compression-info-list}.)
553 @defopt url-directory-index-file
554 This option specifies the filename to look for when a @code{file} or
555 @code{ftp} URL specifies a directory. The default is
556 @file{index.html}. If this file exists and is readable, it is viewed.
557 Otherwise, Emacs visits the directory using Dired.
564 @findex Info-goto-node
566 The @code{info} scheme is non-standard. Such URLs have the form
569 info:@var{file}#@var{node}
573 and are retrieved by invoking @code{Info-goto-node} with argument
574 @samp{(@var{file})@var{node}}. If @samp{#@var{node}} is omitted, the
575 @samp{Top} node is opened.
582 A @code{mailto} URL specifies an email message to be sent to a given
583 email address. For example, @samp{mailto:foo@@bar.com} specifies
584 sending a message to @samp{foo@@bar.com}. The ``retrieval method''
585 for such URLs is to open a mail composition buffer in which the
586 appropriate content (e.g.@: the recipient address) has been filled in.
588 As defined in RFC 2368, a @code{mailto} URL has the form
591 @samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
595 where an arbitrary number of @var{header}s can be added. If the
596 @var{header} is @samp{body}, then @var{contents} is put in the message
597 body; otherwise, a @var{header} header field is created with
598 @var{contents} as its contents. Note that the @code{url} library does
599 not perform any checking of @var{header} or @var{contents}, so you
600 should check them before sending the message.
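
When constructing a @code{mailto} URL in Lisp, the header contents
should be percent-encoded so that characters such as spaces and
@samp{&} do not break the URL syntax.  The sketch below shows one way
to do this; the helper name @code{my-make-mailto} is hypothetical and
handles only the @samp{subject} and @samp{body} headers.

@example
(require 'url-util)

(defun my-make-mailto (mailbox subject body)
  "Return a mailto URL for MAILBOX with SUBJECT and BODY."
  (format "mailto:%s?subject=%s&body=%s"
          mailbox
          (url-hexify-string subject)   ; percent-encode each value
          (url-hexify-string body)))

;; (my-make-mailto "foo@@bar.com" "Hello there" "Line one")
;; @result{} "mailto:foo@@bar.com?subject=Hello%20there&body=Line%20one"
@end example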
602 @defopt url-mail-command
603 @vindex mail-user-agent
604 The value of this variable is the function called whenever the
605 @code{url} library needs to send mail. This should normally be left at its default, which is the
606 standard mail-composition command @code{compose-mail}. @xref{Sending
607 Mail,,, emacs, The GNU Emacs Manual}.
610 If the document containing the @code{mailto} URL itself possessed a
611 known URL, Emacs automatically inserts an @samp{X-Url-From} header
612 field into the mail buffer, specifying that URL.
614 @node news/nntp/snews
615 @section @code{news}, @code{nntp} and @code{snews}
622 The @code{news}, @code{nntp}, and @code{snews} schemes, defined in RFC
623 1738, are used for reading Usenet newsgroups. For compatibility with
624 non-standard-compliant news clients, the @code{url} library allows
625 host and port fields to be included in @code{news} URLs, even though
626 they are properly only allowed for @code{nntp} and @code{snews}.
628 @code{news} and @code{nntp} URLs have the following form:
631 @item news:@var{newsgroup}
632 Retrieves a list of messages in @var{newsgroup};
633 @item news:@var{message-id}
634 Retrieves the message with the given @var{message-id};
636 Retrieves a list of all available newsgroups;
637 @item nntp://@var{host}:@var{port}/@var{newsgroup}
638 @itemx nntp://@var{host}:@var{port}/@var{message-id}
639 @itemx nntp://@var{host}:@var{port}/*
640 Similar to the @samp{news} versions.
643 The default port for @code{nntp} (and @code{news}) is 119. The
644 difference between an @code{nntp} URL and a @code{news} URL is that an
645 @code{nntp} URL may specify an article by its number. The
646 @samp{snews} scheme is the same as @samp{nntp}, except that it is
647 tunneled through SSL and has default port 563.
649 These URLs are retrieved via the Gnus package.
651 @cindex environment variable
653 @defopt url-news-server
654 This variable specifies the default news server from which to fetch
655 news, if no server was specified in the URL. The default value,
656 @code{nil}, means to use the server specified by the standard
657 environment variable @samp{NNTPSERVER}, or @samp{news} if that
658 environment variable is unset.
661 @node rlogin/telnet/tn3270
662 @section rlogin, telnet and tn3270
666 @cindex terminal emulation
667 @findex terminal-emulator
669 These URL schemes are defined in RFC 1738, and are used for logging in
670 via a terminal emulator. They have the form
673 telnet://@var{user}:@var{password}@@@var{host}:@var{port}
677 but the @var{password} component is ignored.
679 To handle rlogin, telnet and tn3270 URLs, an @code{rlogin},
680 @code{telnet} or @code{tn3270} (the program names and arguments are
681 hardcoded) session is run in a @code{terminal-emulator} buffer.
682 Well-known ports are used if the URL does not specify a port.
687 @cindex Internet Relay Chat
692 The @code{irc} scheme is defined in the Internet Draft at
693 @url{http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt} (which
694 was never approved as an RFC). Such URLs have the form
697 irc://@var{host}:@var{port}/@var{target},@var{needpass}
701 and are retrieved by opening an @acronym{IRC} session using the
702 function specified by @code{url-irc-function}.
704 @defopt url-irc-function
705 The value of this option is a function, which is called to open an IRC
706 connection for @code{irc} URLs. This function must take five
707 arguments, @var{host}, @var{port}, @var{channel}, @var{user} and
708 @var{password}. The @var{channel} argument specifies the channel to
709 join immediately, and may be @code{nil}.
711 The default is @code{url-irc-rcirc}, which uses the Rcirc package.
712 Other options are @code{url-irc-erc} (which uses ERC) and
713 @code{url-irc-zenirc} (which uses ZenIRC).
720 The @code{data} scheme, defined in RFC 2397, contains MIME data in
721 the URL itself. Such URLs have the form
724 data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
728 @var{media-type} is a MIME @samp{Content-Type} string, possibly
729 including parameters. It defaults to
730 @samp{text/plain;charset=US-ASCII}. The @samp{text/plain} part can be
731 omitted while still supplying the charset parameter. If @samp{;base64} is
732 present, the @var{data} are base64-encoded.
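
As an illustration of the format, the following sketch builds a
base64-encoded @code{data} URL from a string.  The helper name
@code{my-make-data-url} is hypothetical, and the choice of
@samp{text/plain} with a UTF-8 charset is an assumption made for the
example.

@example
(defun my-make-data-url (text)
  "Return a base64-encoded data URL holding TEXT as UTF-8 plain text."
  (concat "data:text/plain;charset=utf-8;base64,"
          ;; Encode to bytes first; the second argument of
          ;; base64-encode-string suppresses line breaks.
          (base64-encode-string (encode-coding-string text 'utf-8) t)))

(my-make-data-url "Hello")
;; @result{} "data:text/plain;charset=utf-8;base64,SGVsbG8="
@end example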
737 @cindex Network File System
740 The @code{nfs} scheme, defined in RFC 2224, is similar to @code{ftp}
741 except that it points to a file on a remote host that is handled by an
742 NFS automounter on the local host. Such URLs have the form
745 nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
748 @defvar url-nfs-automounter-directory-spec
750 A string saying how to invoke the NFS automounter. Certain @samp{%}
751 sequences are recognized:
755 The hostname of the NFS server;
757 The port number of the NFS server;
759 The username to use to authenticate;
761 The password to use to authenticate;
763 The filename on the remote server;
768 Each can be used any number of times.
773 @cindex Lightweight Directory Access Protocol
775 The LDAP scheme is defined in RFC 2255.
779 @cindex @command{man}
780 @cindex Unix man pages
783 The @code{man} scheme is a non-standard one. Such URLs have the form
786 @samp{man:@var{page-spec}}
790 and are retrieved by passing @var{page-spec} to the Lisp function
793 @node General Facilities
794 @chapter General Facilities
799 * Gateways in general::
804 @section Disk Caching
806 @cindex Persistent Cache
809 The disk cache stores retrieved documents locally, whence they can be
810 retrieved more quickly. When requesting a URL that is in the cache,
811 the library checks to see if the page has changed since it was last
812 retrieved from the remote machine. If not, the local copy is used,
813 saving the transmission over the network.
814 @cindex Cleaning the cache
815 @cindex Clearing the cache
816 @cindex Cache cleaning
817 Currently the cache isn't cleared automatically.
818 @c Running the @code{clean-cache} shell script
819 @c first is recommended, to allow for future cleaning of the cache. This
820 @c shell script will remove all files that have not been accessed since it
821 @c was last run. To keep the cache pared down, it is recommended that this
822 @c script be run from @i{at} or @i{cron} (see the manual pages for
823 @c crontab(5) or at(1) for more information)
825 @defopt url-automatic-caching
826 Setting this variable non-@code{nil} causes documents to be cached
830 @defopt url-cache-directory
831 This variable specifies the
832 directory in which the cache files are stored. It defaults to the
833 @file{cache} subdirectory of @code{url-configuration-directory}.
836 @defopt url-cache-creation-function
837 The cache relies on a scheme for mapping URLs to files in the cache.
838 This variable names a function which sets the type of cache to use.
839 It takes a URL as argument and returns the absolute file name of the
840 corresponding cache file. The two supplied possibilities are
841 @code{url-cache-create-filename-using-md5} and
842 @code{url-cache-create-filename-human-readable}.
845 @defun url-cache-create-filename-using-md5 url
846 Creates a cache file name from @var{url} using MD5 hashing.
847 This creates entries with very few cache collisions and is fast.
850 (url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
851 @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
855 @defun url-cache-create-filename-human-readable url
856 Creates a cache file name from @var{url} that is more obviously
857 connected to @var{url} than the one made by
858 @code{url-cache-create-filename-using-md5}, but more likely to conflict with other files.
860 (url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
861 @result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
865 @defun url-cache-expired
866 This function returns non-@code{nil} if a cache entry has expired (or is absent).
867 The arguments are a URL and an optional expiration delay in seconds
868 (default @code{url-cache-expire-time}).
871 @defopt url-cache-expire-time
872 This variable is the default number of seconds to use for the
873 expire-time argument of the function @code{url-cache-expired}.
876 @defun url-fetch-from-cache
877 This function takes a URL as its argument and returns a buffer
878 containing the data cached for that URL.
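
The two functions above can be combined as in the following sketch,
which uses the cached copy unless it is missing or older than a given
age.  The helper name @code{my-url-cached-or-fetch} and the one-hour
default are assumptions made for this example.  Whether a fresh
retrieval is then written to the cache depends on
@code{url-automatic-caching}.

@example
(require 'url)
(require 'url-cache)

(defun my-url-cached-or-fetch (url &optional max-age)
  "Return a buffer with the data for URL, preferring the cache.
MAX-AGE is the maximum age of a cache entry in seconds."
  (if (url-cache-expired url (or max-age 3600))
      ;; Entry is absent or too old: go to the network.
      (url-retrieve-synchronously url)
    (url-fetch-from-cache url)))
@end example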
881 @c Fixme: never actually used currently?
882 @c @defopt url-standalone-mode
883 @c @cindex Relying on cache
884 @c @cindex Cache only mode
885 @c @cindex Standalone mode
886 @c If this variable is non-@code{nil}, the library relies solely on the
887 @c cache for fetching documents and avoids checking if they have changed
888 @c on remote servers.
891 @c With a large cache of documents on the local disk, it can be very handy
892 @c when traveling, or any other time the network connection is not active
893 @c (a laptop with a dial-on-demand PPP connection, etc). Emacs/W3 can rely
894 @c solely on its cache, and avoid checking to see if the page has changed
895 @c on the remote server. In the case of a dial-on-demand PPP connection,
896 @c this will keep the phone line free as long as possible, only bringing up
897 @c the PPP connection when asking for a page that is not located in the
898 @c cache. This is very useful for demonstrations as well.
901 @section Proxies and Gatewaying
903 @c fixme: check/document url-ns stuff
904 @cindex proxy servers
906 @cindex environment variables
908 Proxy servers are commonly used to provide gateways through firewalls
909 or as caches serving some more-or-less local network. Each protocol
910 (HTTP, FTP, etc.)@: can have a different gateway server. By convention,
911 proxying is configured for many different programs through
912 environment variables of the form @code{@var{protocol}_proxy}, where
913 @var{protocol} is one of the supported network protocols (@code{http},
914 @code{ftp} etc.). The library recognizes such variables in either
915 upper or lower case. Their values are of one of the forms:
917 @item @code{@var{host}:@var{port}}
919 @item Simply a host name.
923 The @code{NO_PROXY} environment variable specifies URLs that should be
924 excluded from proxying (on servers that should be contacted directly).
925 This should be a comma-separated list of hostnames, domain names, or a
926 mixture of both. Asterisks can be used as wildcards, but other
927 clients may not support that. Domain names may be indicated by a
928 leading dot. For example:
930 NO_PROXY="*.aventail.com,home.com,.seanet.com"
932 @noindent says to contact all machines in the @samp{aventail.com} and
933 @samp{seanet.com} domains directly, as well as the machine named
934 @samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
935 and @code{no_proxy} are also tried, in that order.
937 Proxies may also be specified directly in Lisp.
939 @defopt url-proxy-services
940 This variable is an alist of URL schemes and proxy servers that
941 gateway them. The items are of the form @w{@code{(@var{scheme}
942 . @var{host}:@var{portnumber})}}, meaning that the URL @var{scheme} is
943 gatewayed through @var{portnumber} on the specified @var{host}. An
944 exception is the pseudo scheme @code{"no_proxy"}, which is paired with
945 a regexp matching host names not to be proxied. This variable is
946 initialized from the environment as above.
949 (setq url-proxy-services
950 '(("http" . "proxy.aventail.com:80")
951 ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
955 @node Gateways in general
956 @section Gateways in General
960 The library provides a general gateway layer through which all
961 networking passes. It can both control access to the network and
962 provide access through gateways in firewalls. This may make direct
963 connections in some cases and pass through some sort of gateway in
964 others.@footnote{Proxies (which only operate over HTTP) are
965 implemented using this.} The library's basic function responsible for
966 making connections is @code{url-open-stream}.
968 @defun url-open-stream name buffer host service
969 @cindex opening a stream
970 @cindex stream, opening
971 Open a stream to @var{host}, possibly via a gateway. The other
972 arguments are as for @code{open-network-stream}. This will not make a
973 connection if @code{url-gateway-unplugged} is non-@code{nil}.
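
As a sketch, the helper below opens a direct connection by binding
@code{url-gateway-method} to @code{native} for the duration of the
call.  The helper name @code{my-open-direct-stream}, the process name
and the buffer name are all hypothetical.

@example
(require 'url-gw)

(defun my-open-direct-stream (host port)
  "Open a network stream to HOST on PORT without using a gateway."
  ;; Bind the gateway method only for this one connection.
  (let ((url-gateway-method 'native))
    (url-open-stream "my-stream"
                     (generate-new-buffer " *my-stream*")
                     host port)))
@end example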
976 @defvar url-gateway-local-host-regexp
977 This is a regular expression that matches local hosts that do not
978 require the use of a gateway. If @code{nil}, all connections are made
@defvar url-gateway-method
This variable controls which gateway method is used. It may be useful
to bind it temporarily in some applications. Its value is a symbol;
the possible values are:

@table @code
@item telnet
@cindex @command{telnet}
Use this method if you must first telnet and log into a gateway host,
and then run telnet from that host to connect to outside machines.

@item rlogin
@cindex @command{rlogin}
This method is identical to @code{telnet}, but uses @command{rlogin}
to log into the remote machine without having to send the username and
password over the wire every time.

@item socks
Use if the firewall has a @sc{socks} gateway running on it. The
@sc{socks} v5 protocol is defined in RFC 1928.

@c This probably shouldn't be documented
@c Fixme: why not? -- fx

@item native
This method uses Emacs's built-in networking directly. This is the
default. It can be used only if there is no firewall blocking access.
@end table
@end defvar
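Binding the variable temporarily might look like the following sketch,
which forces a direct connection for a single retrieval regardless of
the global setting. The URL is a placeholder, and the use of
@code{url-retrieve-synchronously} here is only one possible way to
trigger a connection.

@example
(require 'url)
(require 'url-gw)

;; The binding is in effect only while the body runs.
(let ((url-gateway-method 'native))
  (url-retrieve-synchronously "http://example.org/"))
@end example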
The following variables control the gateway methods.

@defopt url-gateway-telnet-host
The gateway host to telnet to. Once logged in there, you then telnet
out to the hosts you want to connect to.
@end defopt

@defopt url-gateway-telnet-parameters
This should be a list of parameters to pass to the @command{telnet} program.
@end defopt

@defopt url-gateway-telnet-password-prompt
This is a regular expression that matches the password prompt when
logging in.
@end defopt

@defopt url-gateway-telnet-login-prompt
This is a regular expression that matches the username prompt when
logging in.
@end defopt

@defopt url-gateway-telnet-user-name
The username to log in with.
@end defopt

@defopt url-gateway-telnet-password
The password to send when logging in.
@end defopt

@defopt url-gateway-prompt-pattern
This is a regular expression that matches the shell prompt.
@end defopt
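A telnet gateway setup could be sketched as follows. Every host name,
user name and regexp shown is an illustrative value invented for this
example, not a default of the library.

@example
;; Illustrative values only; substitute those of your own gateway.
(setq url-gateway-method 'telnet
      url-gateway-telnet-host "gateway.example.com"
      url-gateway-telnet-user-name "jdoe"
      url-gateway-telnet-login-prompt "^login: *"
      url-gateway-telnet-password-prompt "^Password: *"
      url-gateway-prompt-pattern "^[^#$%>;]*[#$%>;] *")
@end example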
@defopt url-gateway-rlogin-host
Host to @samp{rlogin} to before telnetting out.
@end defopt

@defopt url-gateway-rlogin-parameters
Parameters to pass to @samp{rsh}.
@end defopt

@defopt url-gateway-rlogin-user-name
User name to use when logging in to the gateway.
@end defopt

@defopt url-gateway-prompt-pattern
This is a regular expression that matches the shell prompt.
@end defopt
@defopt socks-server
This specifies the default server. It takes the form
@w{@code{("Default server" @var{server} @var{port} @var{version})}}
where @var{version} can be either 4 or 5.
@end defopt

@defvar socks-password
If this is @code{nil} then you will be asked for the password,
otherwise it will be used as the password for authenticating you to
the @sc{socks} server.
@end defvar

@defvar socks-username
This is the username to use when authenticating yourself to the
@sc{socks} server. By default this is your login name.
@end defvar

@defvar socks-timeout
This controls how long, in seconds, to wait for responses from the
@sc{socks} server; it is 5 by default.
@end defvar
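For instance, routing connections through a @sc{socks} v5 server might
be configured as in this sketch. The server name and port are
placeholders.

@example
;; "socks.example.com" and port 1080 are placeholder values.
(setq socks-server '("Default server" "socks.example.com" 1080 5)
      url-gateway-method 'socks)
@end example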
@c fixme: these have been effectively commented-out in the code
@c @defopt socks-server-aliases
@c This a list of server aliases. It is a list of aliases of the form
@c @var{(alias hostname port version)}.

@c @defopt socks-network-aliases
@c This a list of network aliases. Each entry in the list takes the form
@c @var{(alias (network))} where @var{alias} is a string that names the
@c @var{network}. The networks can contain a pair (not a dotted pair) of
@c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
@c address and a netmask, a domain name or a unique hostname or @sc{ip}

@c @defopt socks-redirection-rules
@c This a list of redirection rules. Each rule take the form
@c @var{(Destination network Connection type)} where @var{Destination
@c network} is a network alias from @code{socks-network-aliases} and
@c @var{Connection type} can be @code{nil} in which case a direct
@c connection is used, or it can be an alias from
@c @code{socks-server-aliases} in which case that server is used as a
@defopt socks-nslookup-program
@cindex @command{nslookup}
This is the @samp{nslookup} program. It is @code{"nslookup"} by default.
@end defopt
@menu
* Suppressing network connections::
@c * Broken hostname resolution::
@end menu
@node Suppressing network connections
@subsection Suppressing Network Connections

@cindex network connections, suppressing
@cindex suppressing network connections
In some circumstances it is desirable to suppress making network
connections. A typical case is when rendering HTML in a mail user
agent, when external URLs should not be activated, particularly to
avoid ``bugs'' which ``call home'' by fetching single-pixel images and
the like. To arrange this, bind the following variable for the duration
of such processing.
@defvar url-gateway-unplugged
If this variable is non-@code{nil}, new network connections are never
opened by the URL library.
@end defvar
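A minimal sketch of such a binding follows. The helper
@code{my-render-html-offline} and its argument are hypothetical names
introduced for this example; they are not part of the library.

@example
(require 'url-gw)

(defun my-render-html-offline (render-function)
  "Call RENDER-FUNCTION with URL network access suppressed.
This is a hypothetical helper, not part of the URL library."
  ;; The binding lasts only for the duration of the call.
  (let ((url-gateway-unplugged t))
    (funcall render-function)))
@end example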
@c @node Broken hostname resolution
@c @subsection Broken Hostname Resolution

@c @cindex hostname resolver
@c @cindex resolver, hostname
@c Some C libraries do not include the hostname resolver routines in
@c their static libraries. If Emacs was linked statically, and was not
@c linked with the resolver libraries, it will not be able to get to any
@c machines off the local network. This is characterized by being able
@c to reach someplace with a raw ip number, but not its hostname
@c (@url{http://129.79.254.191/} works, but
@c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
@c SunOS4 and Ultrix, but is now probably now rare. If Emacs can't be
@c rebuilt linked against the resolver library, it can use the external
@c @command{nslookup} program instead.

@c @defopt url-gateway-broken-resolution
@c @cindex @code{nslookup} program
@c @cindex program, @code{nslookup}
@c If non-@code{nil}, this variable says to use the program specified by
@c @code{url-gateway-nslookup-program} program to do hostname resolution.

@c @defopt url-gateway-nslookup-program
@c The name of the program to do hostname lookup if Emacs can't do it
@c directly. This program should expect a single argument on the command
@c line---the hostname to resolve---and should produce output similar to
@c the standard Unix @command{nslookup} program:
@c Name: www.cs.indiana.edu
@c Address: 129.79.254.191
@findex url-do-setup
The library can maintain a global history list tracking URLs accessed.
URL completion can be done from it. The history mechanism is set up
automatically via @code{url-do-setup} when it is configured to be on.
Note that the size of the history list is currently not limited.

@vindex url-history-hash-table
The history ``list'' is actually a hash table,
@code{url-history-hash-table}. It contains access times keyed by URL
strings. The times are in the format returned by @code{current-time}.
@defun url-history-update-url url time
This function updates the history table with an entry for @var{url}
accessed at the given @var{time}.
@end defun
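The following sketch records a visit and reads the stored time back
from the table. The URL is a placeholder, and the example assumes the
history code is loaded from the @file{url-history} feature.

@example
(require 'url-history)

;; Record a visit to a placeholder URL at the current time.
(url-history-update-url "http://example.org/" (current-time))

;; Look the access time up again; keys are URL strings.
(gethash "http://example.org/" url-history-hash-table)
@end example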
@defopt url-history-track
If non-@code{nil}, the library will keep track of all the URLs
accessed. If it is @code{t}, the list is saved to disk at the end of
each Emacs session. The default is @code{nil}.
@end defopt

@defopt url-history-file
The file storing the history list between sessions. It defaults to
@file{history} in @code{url-configuration-directory}.
@end defopt
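A possible init-file fragment that turns history tracking on is
sketched below. It assumes the settings are evaluated before the
library is first used, and the file name @file{url-history} is an
arbitrary choice made for this example.

@example
;; Keep a history and save it at the end of the session.
(setq url-history-track t
      url-history-file
      (expand-file-name "url-history" user-emacs-directory))
@end example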
@defopt url-history-save-interval
@findex url-history-setup-save-timer
The number of seconds between automatic saves of the history list.
Default is one hour. Note that if you change this variable directly,
rather than using Custom, after @code{url-do-setup} has been run, you
need to run the function @code{url-history-setup-save-timer}.
@end defopt
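Changing the interval from Lisp could then look like this sketch, in
which the value of ten minutes is only an example.

@example
(require 'url-history)

;; Save every ten minutes instead of every hour, then restart the
;; timer so that the new interval takes effect.
(setq url-history-save-interval 600)
(url-history-setup-save-timer)
@end example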
@defun url-history-parse-history &optional fname
Parses the history file @var{fname} (default @code{url-history-file})
and sets up the history list.
@end defun

@defun url-history-save-history &optional fname
Saves the current history to file @var{fname} (default
@code{url-history-file}).
@end defun

@defun url-completion-function string predicate function
You can use this function to do completion of URLs from the history.
@end defun
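One way to use it, sketched here under the assumption that the function
is passed as the collection argument of @code{completing-read}, is:

@example
(require 'url-history)

;; Prompt for a URL, completing over the entries in the history.
(completing-read "URL: " #'url-completion-function)
@end example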
@chapter Customization

@cindex environment variables
The following environment variables affect the @code{url} library's
operation at startup.

@vindex url-temporary-directory
If this is defined, @code{url-temporary-directory} is initialized from
it.

The following user options affect the general operation of
the @code{url} library.
@defopt url-configuration-directory
@cindex configuration files
The value of this variable specifies the name of the directory where
the @code{url} library stores its various configuration files, cache
files, etc.

The default value specifies a subdirectory named @file{url/} in the
standard Emacs user data directory specified by the variable
@code{user-emacs-directory} (normally @file{~/.emacs.d}). However,
the old default was @file{~/.url}, and this directory is used instead
if it exists.
@end defopt
Specifies the types of debug messages which are logged to
the @code{*URL-DEBUG*} buffer.
@code{t} means log all messages.
A number means log all messages and show them with @code{message}.
It may also be a list of the types of messages to be logged.
@defopt url-personal-mail-address
@end defopt
@defopt url-privacy-level
@end defopt
@defopt url-uncompressor-alist
@end defopt
@defopt url-passwd-entry-func
@end defopt
@defopt url-standalone-mode
@end defopt
@defopt url-bad-port-list
@end defopt
@defopt url-max-password-attempts
@end defopt
@defopt url-temporary-directory
@end defopt
@defopt url-show-status
@end defopt
@defopt url-confirmation-func
The function to use for asking yes or no questions. This is normally
either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
function taking a single argument (the prompt) and returning @code{t}
only if an affirmative answer is given.
@end defopt
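For example, a user who prefers single-key answers might set it as
follows.

@example
;; Ask short y/n questions instead of requiring a full "yes" or "no".
(setq url-confirmation-func #'y-or-n-p)
@end example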
@defopt url-gateway-method
@c fixme: describe gatewaying
A symbol specifying the type of gateway support to use for connections
from the local machine. The supported methods are:

@table @code
@item telnet
Run telnet in a subprocess to connect;
@item rlogin
Rlogin to another machine to connect;
@item socks
Connect through a socks server;
@item native
Connect directly.
@end table
@end defopt
@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include doclicense.texi

@node Function Index
@unnumbered Command and Function Index

@node Variable Index
@unnumbered Variable Index

@unnumbered Concept Index