2 @setfilename ../../info/url.info
3 @settitle URL Programmer's Manual
5 @documentencoding UTF-8
10 @c @setchapternewpage odd
15 %\global\baselineskip 30pt % for printing in double space
17 @dircategory Emacs lisp libraries
19 * URL: (url). URL loading package.
23 This is the manual for the @code{url} Emacs Lisp library.
25 Copyright @copyright{} 1993--1999, 2002, 2004--2014 Free Software
29 Permission is granted to copy, distribute and/or modify this document
30 under the terms of the GNU Free Documentation License, Version 1.3 or
31 any later version published by the Free Software Foundation; with no
32 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
33 and with the Back-Cover Texts as in (a) below. A copy of the license
34 is included in the section entitled ``GNU Free Documentation License''.
36 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
37 modify this GNU manual.''
43 @title URL Programmer's Manual
44 @subtitle First Edition, URL Version 2.0
45 @author William M. Perry @email{wmperry@@gnu.org}
46 @author David Love @email{fx@@gnu.org}
48 @vskip 0pt plus 1filll
62 * Introduction:: About the @code{url} library.
63 * URI Parsing:: Parsing (and unparsing) URIs.
64 * Retrieving URLs:: How to use this package to retrieve a URL.
65 * Supported URL Types:: Descriptions of URL types currently supported.
66 * General Facilities:: URLs can be cached, accessed via a gateway
67 and tracked in a history list.
68 * Customization:: Variables you can alter.
69 * GNU Free Documentation License:: The license for this documentation.
79 @cindex uniform resource identifier
80 @cindex uniform resource locator
82 A @dfn{Uniform Resource Identifier} (URI) is a specially formatted
83 name, such as an Internet address, that identifies some name or
84 resource. The format of URIs is described in RFC 3986, which updates
85 and replaces the earlier RFCs 2732, 2396, 1808, and 1738. A
86 @dfn{Uniform Resource Locator} (URL) is an older but still-common
87 term, which refers to a URI corresponding to a resource that
88 can be accessed (usually over a network) in a specific way.
90 Here are some examples of URIs (taken from RFC 3986):
93 ftp://ftp.is.co.za/rfc/rfc1808.txt
94 http://www.ietf.org/rfc/rfc2396.txt
95 ldap://[2001:db8::7]/c=GB?objectClass?one
96 mailto:John.Doe@@example.com
97 news:comp.infosystems.www.servers.unix
99 telnet://192.0.2.16:80/
100 urn:oasis:names:specification:docbook:dtd:xml:4.1.2
103 This manual describes the @code{url} library, an Emacs Lisp library
104 for parsing URIs and retrieving the resources to which they refer.
105 (The library is so named for historical reasons; nowadays, the ``URI''
106 terminology is regarded as the more general one, and ``URL'' is
107 technically obsolete despite its widespread vernacular usage.)
112 A URI consists of several @dfn{components}, each having a different
113 meaning. For example, the URI
116 http://www.gnu.org/software/emacs/
120 specifies the scheme component @samp{http}, the hostname component
121 @samp{www.gnu.org}, and the path component @samp{/software/emacs/}.
124 The format of URIs is specified by RFC 3986. The @code{url} library
125 provides the Lisp function @code{url-generic-parse-url}, a (mostly)
126 standard-compliant URI parser, as well as the function
127 @code{url-recreate-url}, which converts a parsed URI back into a URI
130 @defun url-generic-parse-url uri-string
131 This function returns a parsed version of the string @var{uri-string}.
134 @defun url-recreate-url uri-obj
135 @cindex unparsing URLs
136 Given a parsed URI, this function returns the corresponding URI string.
140 The return value of @code{url-generic-parse-url}, and the argument
141 expected by @code{url-recreate-url}, is a @dfn{parsed URI}: a CL
142 structure whose slots hold the various components of the URI@.
143 @xref{Top,the CL Manual,,cl,GNU Emacs Common Lisp Emulation}, for
144 details about CL structures. Most of the other functions in the
145 @code{url} library act on parsed URIs.
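
As a minimal sketch, the following parses the example URI shown earlier and then unparses it again. It assumes only that @code{url-parse} is loaded; @code{my-gnu-url} is a hypothetical variable name, and the scheme is read with @code{url-type} because the scheme slot of a parsed URI is named @code{type}.

@lisp
(require 'url-parse)

;; Parse the URI into a CL structure (hypothetical variable name).
(defvar my-gnu-url
  (url-generic-parse-url "http://www.gnu.org/software/emacs/"))

(url-type my-gnu-url)         ; the scheme, "http"
(url-host my-gnu-url)         ; "www.gnu.org"
(url-filename my-gnu-url)     ; "/software/emacs/"

;; Turn the structure back into a string.
(url-recreate-url my-gnu-url) ; "http://www.gnu.org/software/emacs/"
@end lisp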
148 * Parsed URIs:: Format of parsed URI structures.
149 * URI Encoding:: Non-@acronym{ASCII} characters in URIs.
153 @section Parsed URI structures
155 Each parsed URI structure contains the following slots:
159 The URI scheme (a string, e.g., @code{http}). @xref{Supported URL
160 Types}, for a list of schemes that the @code{url} library knows how to
161 process. This slot can also be @code{nil}, if the URI is not fully
165 The user name (a string), or @code{nil}.
168 The user password (a string), or @code{nil}. The use of this URI
169 component is strongly discouraged; nowadays, passwords are transmitted
170 by other means, not as part of a URI.
173 The host name (a string), or @code{nil}. If present, this is
174 typically a domain name or IP address.
177 The port number (an integer), or @code{nil}. Omitting this component
178 usually means to use the ``standard'' port associated with the URI
182 The combination of the ``path'' and ``query'' components of the URI (a
183 string), or @code{nil}. If the query component is present, it is the
184 substring following the first @samp{?} character, and the path
185 component is the substring before the @samp{?}. The meaning of these
186 components is scheme-dependent; they do not necessarily refer to a
190 The fragment component (a string), or @code{nil}. The fragment
191 component specifies a ``secondary resource'', such as a section of a
195 This is @code{t} if the URI is fully specified, i.e., the
196 hierarchical components of the URI (the hostname and/or username
197 and/or password) are preceded by @samp{//}.
207 @findex url-attributes
209 These slots have accessors named @code{url-@var{part}}, where
210 @var{part} is the slot name. For example, the accessor for the
211 @code{host} slot is the function @code{url-host}. The @code{url-port}
212 accessor returns the default port for the URI scheme if the parsed
213 URI's @var{port} slot is @code{nil}.
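
A short sketch of this fallback: when the string contains no port, @code{url-port} should return 80, the default port of the @code{http} scheme, while an explicit port is returned unchanged.

@lisp
(require 'url-parse)

;; No port in the string: the port slot is nil, so url-port
;; falls back on the default port of the http scheme.
(url-port (url-generic-parse-url "http://www.gnu.org/"))       ; 80

;; An explicit port is returned as is.
(url-port (url-generic-parse-url "http://www.gnu.org:8080/"))  ; 8080
@end lisp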
215 The slots can be set using @code{setf}. For example:
218 (setf (url-port url) 80)
222 @section URI Encoding
224 @cindex percent encoding
225 The @code{url-generic-parse-url} parser does not obey RFC 3986 in
226 one respect: it allows non-@acronym{ASCII} characters in URI strings.
228 Strictly speaking, RFC 3986 compatible URIs may only consist of
229 @acronym{ASCII} characters; non-@acronym{ASCII} characters are
230 represented by converting them to UTF-8 byte sequences, and performing
231 @dfn{percent encoding} on the bytes. For example, the o-umlaut
232 character is converted to the UTF-8 byte sequence @samp{\xC3\xB6},
233 then percent encoded to @samp{%C3%B6}. (Certain ``reserved''
234 @acronym{ASCII} characters must also be percent encoded when they
235 appear in URI components.)
237 The function @code{url-encode-url} can be used to convert a URI
238 string containing arbitrary characters to one that is properly
239 percent-encoded in accordance with RFC 3986.
241 @defun url-encode-url url-string
242 This function returns a properly URI-encoded version of
243 @var{url-string}. It also performs @dfn{URI normalization},
244 e.g., converting the scheme component to lowercase if it was
245 previously uppercase.
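
A minimal sketch of both effects, assuming @code{url-util} (which defines @code{url-encode-url}) is loaded; @samp{www.example.com} is a placeholder host:

@lisp
(require 'url-util)

;; The uppercase scheme is normalized to lowercase, and the
;; o-umlaut in the path becomes its percent-encoded UTF-8 bytes.
(url-encode-url "HTTP://www.example.com/föo")
;; => "http://www.example.com/f%C3%B6o"
@end lisp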
248 To convert between a string containing arbitrary characters and a
249 percent-encoded all-@acronym{ASCII} string, use the functions
250 @code{url-hexify-string} and @code{url-unhex-string}:
252 @defun url-hexify-string string &optional allowed-chars
253 This function performs percent-encoding on @var{string}, and returns
256 If @var{string} is multibyte, it is first converted to a UTF-8 byte
257 string. Each byte corresponding to an allowed character is left
258 as-is, while all other bytes are converted to a three-character
259 sequence: @samp{%} followed by two upper-case hex digits.
261 @vindex url-unreserved-chars
262 @cindex unreserved characters
263 The allowed characters are specified by @var{allowed-chars}. If this
264 argument is @code{nil}, the allowed characters are those specified as
265 @dfn{unreserved characters} by RFC 3986 (see the variable
266 @code{url-unreserved-chars}). Otherwise, @var{allowed-chars} should
267 be a vector whose @var{n}-th element is non-@code{nil} if character
271 @defun url-unhex-string string &optional allow-newlines
272 This function replaces percent-encoding sequences in @var{string} with
273 their character equivalents, and returns the resulting string.
275 If @var{allow-newlines} is non-@code{nil}, it allows the decoding of
276 carriage returns and line feeds, which are normally forbidden in URIs.
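
The sketch below round-trips a string through both functions. Note that @code{url-unhex-string} yields the encoded bytes rather than decoded characters, so non-@acronym{ASCII} text has to be decoded explicitly, here as UTF-8 with @code{decode-coding-string}:

@lisp
(require 'url-util)

;; With no ALLOWED-CHARS, only unreserved characters are kept; the
;; space, the slash and both UTF-8 bytes of the o-umlaut are encoded.
(url-hexify-string "a b/ö")             ; "a%20b%2F%C3%B6"

;; Decode the raw bytes returned by url-unhex-string as UTF-8.
(decode-coding-string (url-unhex-string "a%20b%2F%C3%B6") 'utf-8)
                                         ; "a b/ö"
@end lisp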
279 @node Retrieving URLs
280 @chapter Retrieving URLs
282 The @code{url} library defines the following three functions for
283 retrieving the data specified by a URL@. The actual retrieval protocol
284 depends on the URL's URI scheme, and is performed by lower-level
285 scheme-specific functions. (Those lower-level functions are not
286 documented here, and generally should not be called directly.)
288 In each of these functions, the @var{url} argument can be either a
289 string or a parsed URL structure. If it is a string, that string is
290 passed through @code{url-encode-url} before using it, to ensure that
291 it is properly URI-encoded (@pxref{URI Encoding}).
293 @defun url-retrieve-synchronously url
294 This function synchronously retrieves the data specified by @var{url},
295 and returns a buffer containing the data. The return value is
296 @code{nil} if there is no data associated with the URL (as is the case
297 for @code{dired}, @code{info}, and @code{mailto} URLs).
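
As a sketch of typical synchronous use, the hypothetical helper @code{my-url-fetch-body} returns only the data, dropping any headers. It assumes that headers, when present, end at the first empty line (as in HTTP responses), and it kills the retrieval buffer when done:

@lisp
(require 'url)

(defun my-url-fetch-body (url)
  "Return the data retrieved from URL as a string, or nil if none.
A hypothetical helper: headers, if any, are assumed to end at the
first empty line, as they do in HTTP responses."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            ;; Skip past the header block, if there is one.
            (when (re-search-forward "^$" nil t)
              (forward-line 1))
            (buffer-substring (point) (point-max)))
        (kill-buffer buffer)))))

;; (my-url-fetch-body "http://www.gnu.org/")
@end lisp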
300 @defun url-retrieve url callback &optional cbargs silent no-cookies
301 This function retrieves @var{url} asynchronously, calling the function
302 @var{callback} when the object has been completely retrieved. The
303 return value is the buffer into which the data will be inserted, or
304 @code{nil} if the process has already completed.
306 The callback function is called this way:
309 (apply @var{callback} @var{status} @var{cbargs})
313 where @var{status} is a plist representing what happened during the
314 retrieval, with most recent events first, or an empty list if no
315 events have occurred. Each pair in the plist is one of:
318 @item (:redirect @var{redirected-to})
319 This means that the request was redirected to the URL
322 @item (:error (@var{error-symbol} . @var{data}))
323 This means that an error occurred. If so desired, the error can be
324 signaled with @code{(signal @var{error-symbol} @var{data})}.
327 When the callback function is called, the current buffer is the one
328 containing the retrieved data (if any). The buffer also contains any
329 MIME headers associated with the data retrieval.
331 If the optional argument @var{silent} is non-@code{nil}, progress
332 messages are suppressed. If the optional argument @var{no-cookies} is
333 non-@code{nil}, cookies are not stored or sent.
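
The following sketch shows a callback that checks @var{status} for an @code{:error} entry. The name @code{my-url-report} is hypothetical; the URL is passed through @var{cbargs} so that the callback can mention it, and the callback kills the retrieval buffer once it has finished with it:

@lisp
(require 'url)

(defun my-url-report (status url)
  "Report the outcome of retrieving URL, given its STATUS plist.
A hypothetical callback for `url-retrieve'; URL arrives via CBARGS."
  (let ((err (plist-get status :error)))
    (if err
        (message "Fetching %s failed: %S" url err)
      ;; The current buffer holds the retrieved data and its headers.
      (message "Fetched %s (%d characters)" url (buffer-size))))
  ;; Free the retrieval buffer once it is no longer needed.
  (kill-buffer (current-buffer)))

;; Start an asynchronous retrieval (requires network access):
;; (url-retrieve "http://www.gnu.org/" #'my-url-report
;;               '("http://www.gnu.org/"))
@end lisp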
336 @defun url-queue-retrieve url callback &optional cbargs silent no-cookies
337 This function acts like @code{url-retrieve}, but with limits on the
338 number of concurrently-running network processes. The option
339 @code{url-queue-parallel-processes} controls the number of concurrent
340 processes, and the option @code{url-queue-timeout} sets a timeout in
343 To use this function, you must @code{(require 'url-queue)}.
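
A minimal sketch of queued retrieval follows. The option values and the helper name @code{my-url-queue-fetch-all} are illustrative assumptions; each callback receives its URL through @var{cbargs}, and the final argument @code{t} suppresses progress messages:

@lisp
(require 'url-queue)

;; Illustrative limits: two concurrent processes, 10 seconds each.
(setq url-queue-parallel-processes 2
      url-queue-timeout 10)

(defun my-url-queue-fetch-all (urls)
  "Queue a silent retrieval of each of URLS (hypothetical helper)."
  (dolist (u urls)
    (url-queue-retrieve
     u
     (lambda (status url)
       (message "%s: %s" url
                (if (plist-get status :error) "failed" "done"))
       (kill-buffer (current-buffer)))
     (list u)                           ; CBARGS: passed on as URL
     t)))                               ; SILENT

;; (my-url-queue-fetch-all '("http://www.gnu.org/" "http://www.fsf.org/"))
@end lisp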
346 @vindex url-queue-parallel-processes
347 @defopt url-queue-parallel-processes
348 The value of this option is an integer specifying the maximum number
349 of concurrent @code{url-queue-retrieve} network processes. If the
350 number of @code{url-queue-retrieve} calls is larger than this number,
351 later ones are queued until earlier ones are finished.
354 @vindex url-queue-timeout
355 @defopt url-queue-timeout
356 The value of this option is a number specifying the maximum lifetime
357 of a @code{url-queue-retrieve} network process, once it is started.
358 If a process is not finished by then, it is killed and removed from
362 @node Supported URL Types
363 @chapter Supported URL Types
365 This chapter describes functions and variables affecting URL retrieval
366 for specific schemes.
369 * http/https:: Hypertext Transfer Protocol.
370 * file/ftp:: Local files and FTP archives.
371 * info:: Emacs "Info" pages.
372 * mailto:: Sending email.
373 * news/nntp/snews:: Usenet news.
374 * rlogin/telnet/tn3270:: Remote host connectivity.
375 * irc:: Internet Relay Chat.
376 * data:: Embedded data URLs.
377 * nfs:: Network File System.
378 * ldap:: Lightweight Directory Access Protocol.
379 * man:: Unix man pages.
383 @section @code{http} and @code{https}
385 The @code{http} scheme refers to the Hypertext Transfer Protocol. The
386 @code{url} library supports HTTP version 1.1, specified in RFC 2616.
387 Its default port is 80.
389 The @code{https} scheme is a secure version of @code{http}, with
390 transmission via SSL@. It is defined in RFC 2818, and its default port
391 is 443. When using @code{https}, the @code{url} library performs SSL
392 encryption via the @code{ssl} library, by forcing the @code{ssl}
393 gateway method to be used. @xref{Gateways in general}.
395 @defopt url-honor-refresh-requests
396 If this option is @code{t} (the default), the @code{url} library
397 honors the HTTP @samp{Refresh} header, which is used by servers to
398 direct clients to reload documents from the same URL or a different
399 one. If the value is @code{nil}, the @samp{Refresh} header is
400 ignored; any other value means to ask the user on each request.
405 * HTTP language/coding::
407 * Dealing with HTTP documents::
413 @findex url-cookie-delete
414 @defun url-cookie-list
415 This command creates a @file{*url cookies*} buffer listing the current
416 cookies, if there are any. You can remove a cookie using the
417 @kbd{C-k} (@code{url-cookie-delete}) command.
420 @defopt url-cookie-file
421 The file in which cookies are stored, defaulting to @file{cookies} in
422 the directory specified by @code{url-configuration-directory}.
425 @defopt url-cookie-confirmation
426 Specifies whether confirmation is required to accept cookies.
429 @defopt url-cookie-multiple-line
430 Specifies whether to put all cookies for the server on one line in the
431 HTTP request to satisfy broken servers like
432 @url{http://www.hotmail.com}.
435 @defopt url-cookie-trusted-urls
436 A list of regular expressions matching URLs from which to accept
440 @defopt url-cookie-untrusted-urls
441 A list of regular expressions matching URLs from which to reject
445 @defopt url-cookie-save-interval
446 The number of seconds between automatic saves of cookies to disk.
451 @node HTTP language/coding
452 @subsection Language and Encoding Preferences
454 HTTP allows clients to express preferences for the language and
455 encoding of documents which servers may honor. For each of these
456 variables, the value is a string; it can specify a single choice, or
457 it can be a comma-separated list.
459 Normally, this list is ordered by descending preference. However, each
460 element can be followed by @samp{;q=@var{priority}} to specify its
461 preference level, a decimal number from 0 to 1; e.g., for
462 @code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
463 en;q=0.7"}}. An element that has no @samp{;q} specification has
466 @defopt url-mime-charset-string
467 @cindex character sets
468 @cindex coding systems
469 This variable specifies a preference for character sets when documents
470 can be served in more than one encoding.
472 HTTP allows specifying a series of MIME charsets which indicate your
473 preferred character set encodings, e.g., Latin-9 or Big5, and these
474 can be weighted. The default series is generated automatically from
475 the associated MIME types of all defined coding systems, sorted by the
476 coding system priority specified in Emacs. @xref{Recognize Coding, ,
477 Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
480 @defopt url-mime-language-string
481 @cindex language preferences
482 A string specifying the preferred language when servers can serve
483 files in several languages. Use RFC 1766 abbreviations, e.g.,
484 @samp{en} for English, @samp{de} for German.
486 The string can be @code{"*"} to get the first available language (as
487 opposed to the default).
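
For example, both preferences could be set in an init file as follows. The language string is the one shown above; the charset preference (UTF-8 over Latin-9) is only an illustrative choice:

@lisp
(require 'url-vars)

;; Prefer German, then British English, then any English.
(setq url-mime-language-string "de, en-gb;q=0.8, en;q=0.7")

;; Prefer UTF-8, but accept Latin-9 at a lower weight.
(setq url-mime-charset-string "utf-8;q=1, iso-8859-15;q=0.5")
@end lisp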
490 @node HTTP URL Options
491 @subsection HTTP URL Options
493 HTTP supports an @samp{OPTIONS} method describing things supported by
496 @defun url-http-options url
497 Returns a property list describing options available for URL@. The
498 property list members are:
502 A list of symbols specifying what HTTP methods the resource
507 A list of numbers specifying what DAV protocol/schema versions are
512 A list of supported DASL search types supported (string form).
515 A list of the units available for use in partial document fetches.
519 The @dfn{Platform For Privacy Protection} description for the resource.
520 Currently this is just the raw header contents.
525 @node Dealing with HTTP documents
526 @subsection Dealing with HTTP documents
528 HTTP URLs are retrieved into a buffer containing the HTTP headers
529 followed by the body. Since the headers are quasi-MIME, they may be
530 processed using the MIME library. @xref{Top,, Emacs MIME,
531 emacs-mime, The Emacs MIME Manual}.
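
As an illustration, the hypothetical helper @code{my-url-content-type} fetches a URL and reads a single header field. For brevity it uses @code{mail-fetch-field} from the @code{mail-utils} library rather than the MIME library, and it assumes the header block ends at the first empty line:

@lisp
(require 'url)
(require 'mail-utils)

(defun my-url-content-type (url)
  "Return the Content-Type header of URL, or nil (hypothetical helper)."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            (save-restriction
              ;; Restrict the search to the header block.
              (narrow-to-region (point-min)
                                (or (re-search-forward "^$" nil t)
                                    (point-max)))
              (mail-fetch-field "Content-Type")))
        (kill-buffer buffer)))))

;; (my-url-content-type "http://www.gnu.org/")
@end lisp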
534 @section file and ftp
537 @cindex File Transfer Protocol
538 @cindex compressed files
541 The @code{ftp} and @code{file} schemes are defined in RFC 1738. The
542 @code{url} library treats @samp{ftp:} and @samp{file:} as synonymous.
543 Such URLs have the form
546 ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
547 file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
551 If the URL specifies a local file, it is retrieved by reading the file
552 contents in the usual way. If it specifies a remote file, it is
553 retrieved using the Ange-FTP package. @xref{Remote Files,,, emacs,
554 The GNU Emacs Manual}.
556 When retrieving a compressed file, it is automatically uncompressed
557 if it has the file suffix @file{.z}, @file{.gz}, @file{.Z},
558 @file{.bz2}, or @file{.xz}. (The list of supported suffixes is
559 hard-coded, and cannot be altered by customizing
560 @code{jka-compr-compression-info-list}.)
562 @defopt url-directory-index-file
563 This option specifies the filename to look for when a @code{file} or
564 @code{ftp} URL specifies a directory. The default is
565 @file{index.html}. If this file exists and is readable, it is viewed.
566 Otherwise, Emacs visits the directory using Dired.
573 @findex Info-goto-node
575 The @code{info} scheme is non-standard. Such URLs have the form
578 info:@var{file}#@var{node}
582 and are retrieved by invoking @code{Info-goto-node} with argument
583 @samp{(@var{file})@var{node}}. If @samp{#@var{node}} is omitted, the
584 @samp{Top} node is opened.
591 A @code{mailto} URL specifies an email message to be sent to a given
592 email address. For example, @samp{mailto:foo@@bar.com} specifies
593 sending a message to @samp{foo@@bar.com}. The ``retrieval method''
594 for such URLs is to open a mail composition buffer in which the
595 appropriate content (e.g., the recipient address) has been filled in.
597 As defined in RFC 2368, a @code{mailto} URL has the form
600 @samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
604 where an arbitrary number of @var{header}s can be added. If the
605 @var{header} is @samp{body}, then @var{contents} is put in the message
606 body; otherwise, a @var{header} header field is created with
607 @var{contents} as its contents. Note that the @code{url} library does
608 not perform any checking of @var{header} or @var{contents}, so you
609 should check them before sending the message.
611 @defopt url-mail-command
612 @vindex mail-user-agent
613 The value of this variable is the function called whenever the @code{url} library needs
614 to send mail. This should normally be left at its default, which is the
615 standard mail-composition command @code{compose-mail}. @xref{Sending
616 Mail,,, emacs, The GNU Emacs Manual}.
619 If the document containing the @code{mailto} URL itself possessed a
620 known URL, Emacs automatically inserts an @samp{X-Url-From} header
621 field into the mail buffer, specifying that URL.
623 @node news/nntp/snews
624 @section @code{news}, @code{nntp} and @code{snews}
631 The @code{news}, @code{nntp}, and @code{snews} schemes, defined in RFC
632 1738, are used for reading Usenet newsgroups. For compatibility with
633 non-standard-compliant news clients, the @code{url} library allows
634 host and port fields to be included in @code{news} URLs, even though
635 they are properly only allowed for @code{nntp} and @code{snews}.
637 @code{news} and @code{nntp} URLs have the following form:
640 @item news:@var{newsgroup}
641 Retrieves a list of messages in @var{newsgroup};
642 @item news:@var{message-id}
643 Retrieves the message with the given @var{message-id};
645 Retrieves a list of all available newsgroups;
646 @item nntp://@var{host}:@var{port}/@var{newsgroup}
647 @itemx nntp://@var{host}:@var{port}/@var{message-id}
648 @itemx nntp://@var{host}:@var{port}/*
649 Similar to the @samp{news} versions.
652 The default port for @code{nntp} (and @code{news}) is 119. The
653 difference between an @code{nntp} URL and a @code{news} URL is that an
654 @code{nntp} URL may specify an article by its number. The
655 @samp{snews} scheme is the same as @samp{nntp}, except that it is
656 tunneled through SSL and has default port 563.
658 These URLs are retrieved via the Gnus package.
660 @cindex environment variable
662 @defopt url-news-server
663 This variable specifies the default news server from which to fetch
664 news, if no server was specified in the URL@. The default value,
665 @code{nil}, means to use the server specified by the standard
666 environment variable @samp{NNTPSERVER}, or @samp{news} if that
667 environment variable is unset.
670 @node rlogin/telnet/tn3270
671 @section rlogin, telnet and tn3270
675 @cindex terminal emulation
676 @findex terminal-emulator
678 These URL schemes are defined in RFC 1738, and are used for logging in
679 via a terminal emulator. They have the form
682 telnet://@var{user}:@var{password}@@@var{host}:@var{port}
686 but the @var{password} component is ignored.
688 To handle rlogin, telnet and tn3270 URLs, an @code{rlogin},
689 @code{telnet} or @code{tn3270} (the program names and arguments are
690 hardcoded) session is run in a @code{terminal-emulator} buffer.
691 Well-known ports are used if the URL does not specify a port.
696 @cindex Internet Relay Chat
701 The @code{irc} scheme is defined in the Internet Draft at
702 @url{http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt} (which
703 was never approved as an RFC). Such URLs have the form
706 irc://@var{host}:@var{port}/@var{target},@var{needpass}
710 and are retrieved by opening an @acronym{IRC} session using the
711 function specified by @code{url-irc-function}.
713 @defopt url-irc-function
714 The value of this option is a function, which is called to open an IRC
715 connection for @code{irc} URLs. This function must take five
716 arguments, @var{host}, @var{port}, @var{channel}, @var{user} and
717 @var{password}. The @var{channel} argument specifies the channel to
718 join immediately, and may be @code{nil}.
720 The default is @code{url-irc-rcirc}, which uses the Rcirc package.
721 Other options are @code{url-irc-erc} (which uses ERC) and
722 @code{url-irc-zenirc} (which uses ZenIRC).
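
The sketch below illustrates the calling convention with a hypothetical @code{url-irc-function} that only logs its arguments instead of connecting, which can help when checking how @code{irc} URLs are split into host, port and channel:

@lisp
(require 'url-irc)

(defun my-url-irc-log (host port channel user _password)
  "Hypothetical `url-irc-function' that only logs its arguments.
CHANNEL may be nil; the password is deliberately not shown."
  (message "IRC URL: host %s, port %s, channel %s, user %s"
           host port (or channel "(none)") user))

(setq url-irc-function #'my-url-irc-log)
@end lisp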
729 The @code{data} scheme, defined in RFC 2397, contains MIME data in
730 the URL itself. Such URLs have the form
733 data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
737 @var{media-type} is a MIME @samp{Content-Type} string, possibly
738 including parameters. It defaults to
739 @samp{text/plain;charset=US-ASCII}. The @samp{text/plain} part can be
740 omitted while a charset parameter is still supplied, as in @samp{data:;charset=utf-8,@dots{}}. If @samp{;base64} is
741 present, the @var{data} are base64-encoded.
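
To make the syntax concrete, here is a sketch of a hypothetical helper that builds such a URL from a string by encoding the text as UTF-8 and then as base64:

@lisp
(defun my-make-data-url (text)
  "Return a base64 data URL carrying TEXT as UTF-8 plain text.
A hypothetical helper, not part of the url library."
  (concat "data:text/plain;charset=utf-8;base64,"
          (base64-encode-string (encode-coding-string text 'utf-8) t)))

(my-make-data-url "Hello")
;; => "data:text/plain;charset=utf-8;base64,SGVsbG8="
@end lisp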
746 @cindex Network File System
749 The @code{nfs} scheme, defined in RFC 2224, is similar to @code{ftp}
750 except that it points to a file on a remote host that is handled by an
751 NFS automounter on the local host. Such URLs have the form
754 nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
757 @defvar url-nfs-automounter-directory-spec
759 A string saying how to invoke the NFS automounter. Certain @samp{%}
760 sequences are recognized:
764 The hostname of the NFS server;
766 The port number of the NFS server;
768 The username to use to authenticate;
770 The password to use to authenticate;
772 The filename on the remote server;
777 Each can be used any number of times.
782 @cindex Lightweight Directory Access Protocol
784 The LDAP scheme is defined in RFC 2255.
788 @cindex @command{man}
789 @cindex Unix man pages
792 The @code{man} scheme is a non-standard one. Such URLs have the form
795 @samp{man:@var{page-spec}}
799 and are retrieved by passing @var{page-spec} to the Lisp function
802 @node General Facilities
803 @chapter General Facilities
808 * Gateways in general::
813 @section Disk Caching
815 @cindex Persistent Cache
818 The disk cache stores retrieved documents locally, from which they can be
819 retrieved again more quickly. When requesting a URL that is in the cache,
820 the library checks to see if the page has changed since it was last
821 retrieved from the remote machine. If not, the local copy is used,
822 saving the transmission over the network.
823 @cindex Cleaning the cache
824 @cindex Clearing the cache
825 @cindex Cache cleaning
826 Currently the cache isn't cleared automatically.
827 @c Running the @code{clean-cache} shell script
828 @c first is recommended, to allow for future cleaning of the cache. This
829 @c shell script will remove all files that have not been accessed since it
830 @c was last run. To keep the cache pared down, it is recommended that this
831 @c script be run from @i{at} or @i{cron} (see the manual pages for
832 @c crontab(5) or at(1) for more information)
834 @defopt url-automatic-caching
835 Setting this variable non-@code{nil} causes documents to be cached
839 @defopt url-cache-directory
840 This variable specifies the directory in which to store the cache
841 files. It defaults to the sub-directory @file{cache} of
842 @code{url-configuration-directory}.
845 @defopt url-cache-creation-function
846 The cache relies on a scheme for mapping URLs to files in the cache.
847 This variable names a function which sets the type of cache to use.
848 It takes a URL as argument and returns the absolute file name of the
849 corresponding cache file. The two supplied possibilities are
850 @code{url-cache-create-filename-using-md5} and
851 @code{url-cache-create-filename-human-readable}.
854 @defun url-cache-create-filename-using-md5 url
855 Creates a cache file name from @var{url} using MD5 hashing.
856 This creates entries with very few cache collisions and is fast.
859 (url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
860 @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
864 @defun url-cache-create-filename-human-readable url
865 Creates a cache file name from @var{url} that is more obviously connected to
866 @var{url} than the one produced by @code{url-cache-create-filename-using-md5}, but
867 is more likely to conflict with other files.
869 (url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
870 @result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
874 @defun url-cache-expired url &optional expire-time
875 This function returns non-@code{nil} if the cache entry for @var{url}
876 is absent or is older than @var{expire-time} seconds
877 (default @code{url-cache-expire-time}).
880 @defopt url-cache-expire-time
881 This variable is the default number of seconds to use for the
882 expire-time argument of the function @code{url-cache-expired}.
885 @defun url-fetch-from-cache url
886 This function returns a buffer containing the data cached for
887 @var{url}.
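
The two functions above can be combined as in the following sketch. The helper name and the one-hour expiry are illustrative assumptions; the helper returns @code{nil} when there is no sufficiently recent cache entry:

@lisp
(require 'url-cache)

(defun my-url-cached-contents (url)
  "Return the cached data for URL as a string, or nil.
Nil is returned when the cache entry is absent or more than an hour
old.  A hypothetical helper, not part of the library."
  (unless (url-cache-expired url 3600)
    (let ((buffer (url-fetch-from-cache url)))
      (unwind-protect
          (with-current-buffer buffer
            (buffer-string))
        (kill-buffer buffer)))))
@end lisp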
890 @c Fixme: never actually used currently?
891 @c @defopt url-standalone-mode
892 @c @cindex Relying on cache
893 @c @cindex Cache only mode
894 @c @cindex Standalone mode
895 @c If this variable is non-@code{nil}, the library relies solely on the
896 @c cache for fetching documents and avoids checking if they have changed
897 @c on remote servers.
900 @c With a large cache of documents on the local disk, it can be very handy
901 @c when traveling, or any other time the network connection is not active
902 @c (a laptop with a dial-on-demand PPP connection, etc.). Emacs/W3 can rely
903 @c solely on its cache, and avoid checking to see if the page has changed
904 @c on the remote server. In the case of a dial-on-demand PPP connection,
905 @c this will keep the phone line free as long as possible, only bringing up
906 @c the PPP connection when asking for a page that is not located in the
907 @c cache. This is very useful for demonstrations as well.
910 @section Proxies and Gatewaying
912 @c fixme: check/document url-ns stuff
913 @cindex proxy servers
915 @cindex environment variables
917 Proxy servers are commonly used to provide gateways through firewalls
918 or as caches serving some more-or-less local network. Each protocol
919 (HTTP, FTP, etc.)@: can have a different gateway server. Proxying is
920 conventionally configured, in a way shared by many programs, through
921 environment variables of the form @code{@var{protocol}_proxy}, where
922 @var{protocol} is one of the supported network protocols (@code{http},
923 @code{ftp} etc.). The library recognizes such variables in either
924 upper or lower case. Their values are of one of the forms:
926 @item @code{@var{host}:@var{port}}
928 @item Simply a host name.
932 The @code{NO_PROXY} environment variable specifies hosts that should be
933 excluded from proxying, i.e., servers that should be contacted directly.
934 This should be a comma-separated list of hostnames, domain names, or a
935 mixture of both. Asterisks can be used as wildcards, but other
936 clients may not support that. Domain names may be indicated by a
937 leading dot. For example:
939 NO_PROXY="*.aventail.com,home.com,.seanet.com"
941 @noindent says to contact all machines in the @samp{aventail.com} and
942 @samp{seanet.com} domains directly, as well as the machine named
943 @samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
944 and @code{no_proxy} are also tried, in that order.
946 Proxies may also be specified directly in Lisp.
948 @defopt url-proxy-services
949 This variable is an alist of URL schemes and proxy servers that
950 gateway them. The items are of the form @w{@code{(@var{scheme}
951 . @var{host}:@var{portnumber})}}, meaning that URLs with scheme @var{scheme} are
952 gatewayed through @var{portnumber} on the specified @var{host}. An
953 exception is the pseudo scheme @code{"no_proxy"}, which is paired with
954 a regexp matching host names not to be proxied. This variable is
955 initialized from the environment as above.
958 (setq url-proxy-services
959 '(("http" . "proxy.aventail.com:80")
960 ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
964 @node Gateways in general
965 @section Gateways in General
969 The library provides a general gateway layer through which all
970 networking passes. It can both control access to the network and
971 provide access through gateways in firewalls. This may make direct
972 connections in some cases and pass through some sort of gateway in
973 others.@footnote{Proxies (which only operate over HTTP) are
974 implemented using this.} The library's basic function responsible for
975 making connections is @code{url-open-stream}.
977 @defun url-open-stream name buffer host service
978 @cindex opening a stream
979 @cindex stream, opening
980 Open a stream to @var{host}, possibly via a gateway. The other
981 arguments are as for @code{open-network-stream}. This will not make a
982 connection if @code{url-gateway-unplugged} is non-@code{nil}.
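
A sketch of direct use follows; the helper and buffer names are hypothetical. Because the connection may be opened without blocking, the sketch waits while the process status is @code{connect} before sending a minimal HTTP @samp{HEAD} request. The reply then arrives asynchronously in the buffer:

@lisp
(require 'url-gw)

(defun my-url-head-request (host)
  "Send a HEAD request for / to HOST on port 80 via `url-open-stream'.
Return the network process, or nil if no connection was made.
The reply is inserted into the buffer *my-head-reply*."
  (let ((proc (url-open-stream "my-head"
                               (get-buffer-create "*my-head-reply*")
                               host 80)))
    (when proc
      ;; The connection may be non-blocking; wait until it is set up.
      (while (eq (process-status proc) 'connect)
        (accept-process-output nil 0.1))
      (when (memq (process-status proc) '(open run))
        (process-send-string
         proc (format "HEAD / HTTP/1.0\r\nHost: %s\r\n\r\n" host))))
    proc))

;; (my-url-head-request "www.gnu.org")
@end lisp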
@defvar url-gateway-local-host-regexp
This is a regular expression that matches local hosts that do not
require the use of a gateway. If @code{nil}, all connections are made
using the method in @code{url-gateway-method}.
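
For example, a setting like the following exempts @samp{localhost}
and every host in a domain from the gateway. Here @samp{example.com}
is only a placeholder for a real local domain:

@example
(setq url-gateway-local-host-regexp
      "\\`\\(localhost\\|.*\\.example\\.com\\)\\'")

;; Matching hosts bypass the gateway; others do not.
(string-match-p url-gateway-local-host-regexp "www.example.com") ; 0
(string-match-p url-gateway-local-host-regexp "www.gnu.org")     ; nil
@end example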
@defvar url-gateway-method
This variable controls which gateway method is used. It may be useful
to bind it temporarily in some applications. Its value is one of a
fixed set of symbols. Possible values are:

@cindex @command{telnet}
Use this method if you must first telnet and log into a gateway host,
and then run telnet from that host to connect to outside machines.

@cindex @command{rlogin}
This method is identical to @code{telnet}, but uses @command{rlogin}
to log into the remote machine without having to send the username and
password over the wire every time.

Use this method if the firewall has a @sc{socks} gateway running on
it. The @sc{socks} v5 protocol is defined in RFC 1928.

@c This probably shouldn't be documented
@c Fixme: why not? -- fx

This method uses Emacs's built-in networking directly. This is the
default. It can be used only if there is no firewall blocking access.
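
Because @code{url-gateway-method} is an ordinary dynamic variable, it
can be bound around a single operation. A sketch, assuming a
@sc{socks} server has already been configured (the URL is just an
example):

@example
(require 'url)

;; Retrieve one document through SOCKS without changing the
;; global value of url-gateway-method.
(let ((url-gateway-method 'socks))
  (url-retrieve-synchronously "http://www.gnu.org/"))
@end example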
The following variables control the gateway methods.

@defopt url-gateway-telnet-host
The gateway host to telnet to. Once logged in there, you then telnet
out to the hosts you want to connect to.

@defopt url-gateway-telnet-parameters
This should be a list of parameters to pass to the @command{telnet} program.

@defopt url-gateway-telnet-password-prompt
This is a regular expression that matches the password prompt when
logging in to the telnet gateway.

@defopt url-gateway-telnet-login-prompt
This is a regular expression that matches the username prompt when
logging in to the telnet gateway.

@defopt url-gateway-telnet-user-name
The username to log in with.

@defopt url-gateway-telnet-password
The password to send when logging in.

@defopt url-gateway-prompt-pattern
This is a regular expression that matches the shell prompt.
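
A telnet gateway might be set up along the following lines. Every
value shown (the host, the user name, and the three prompt regexps) is
a placeholder to be adapted to the actual gateway, not a library
default.

@example
;; Hypothetical gateway settings; adjust them for your site.
(setq url-gateway-method 'telnet
      url-gateway-telnet-host "gateway.example.com"
      url-gateway-telnet-user-name "jdoe"
      url-gateway-telnet-login-prompt "^\\(login\\|Username\\): *"
      url-gateway-telnet-password-prompt "^Password: *"
      url-gateway-prompt-pattern "[$%>#] *$")
@end example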
@defopt url-gateway-rlogin-host
Host to @command{rlogin} to before telnetting out.

@defopt url-gateway-rlogin-parameters
Parameters to pass to @command{rsh}.

@defopt url-gateway-rlogin-user-name
User name to use when logging in to the gateway.
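
A minimal rlogin setup might set just the gateway host and the account
name. Both values below are placeholders:

@example
;; Hypothetical gateway host and account.
(setq url-gateway-method 'rlogin
      url-gateway-rlogin-host "gateway.example.com"
      url-gateway-rlogin-user-name "jdoe")
@end example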
@defopt socks-server
This specifies the default server. It takes the form
@w{@code{("Default server" @var{server} @var{port} @var{version})}}
where @var{version} can be either 4 or 5.

@defvar socks-password
If this is @code{nil}, you will be asked for the password; otherwise
it is used as the password for authenticating you to the @sc{socks}
server.

@defvar socks-username
This is the username to use when authenticating yourself to the
@sc{socks} server. By default this is your login name.

@defvar socks-timeout
This controls how long, in seconds, to wait for responses from the
@sc{socks} server; it is 5 by default.
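
Putting these together, a @sc{socks} 5 gateway could be configured as
follows. The server name and user name are placeholders; 1080 is the
conventional @sc{socks} port, and the ten-second timeout is only an
example.

@example
(require 'socks)

;; Hypothetical SOCKS 5 server; adjust the name and port.
(setq url-gateway-method 'socks
      socks-server '("Default server" "socks.example.com" 1080 5)
      socks-username "jdoe"
      socks-timeout 10)
@end example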
@c fixme: these have been effectively commented-out in the code
@c @defopt socks-server-aliases
@c This is a list of server aliases. It is a list of aliases of the form
@c @var{(alias hostname port version)}.

@c @defopt socks-network-aliases
@c This is a list of network aliases. Each entry in the list takes the form
@c @var{(alias (network))} where @var{alias} is a string that names the
@c @var{network}. The networks can contain a pair (not a dotted pair) of
@c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
@c address and a netmask, a domain name or a unique hostname or @sc{ip}
@c address.

@c @defopt socks-redirection-rules
@c This is a list of redirection rules. Each rule takes the form
@c @var{(Destination network Connection type)} where @var{Destination
@c network} is a network alias from @code{socks-network-aliases} and
@c @var{Connection type} can be @code{nil} in which case a direct
@c connection is used, or it can be an alias from
@c @code{socks-server-aliases} in which case that server is used for
@c the connection.
@defopt socks-nslookup-program
@cindex @command{nslookup}
This is the name of the @command{nslookup} program. It is
@code{"nslookup"} by default.
* Suppressing network connections::
@c * Broken hostname resolution::
@node Suppressing network connections
@subsection Suppressing Network Connections

@cindex network connections, suppressing
@cindex suppressing network connections
In some circumstances it is desirable to suppress making network
connections. A typical case is when rendering HTML in a mail user
agent, when external URLs should not be activated, particularly to
avoid ``bugs'' which ``call home'' by fetching single-pixel images and
the like. To arrange this, bind the following variable for the
duration of such processing.

@defvar url-gateway-unplugged
If this variable is non-@code{nil}, new network connections are never
opened by the URL library.
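
A minimal sketch of such a binding follows. The wrapper
@code{my-render-without-network} is a hypothetical name, not part of
the library. The call to @code{url-open-stream} inside it illustrates
that no connection is made while the variable is bound (current
versions simply return @code{nil}).

@example
(require 'url-gateway)

;; Hypothetical helper: run RENDER-FN with network access suppressed.
(defun my-render-without-network (render-fn)
  (let ((url-gateway-unplugged t))
    (funcall render-fn)))

;; Inside the binding, no stream is opened.
(my-render-without-network
 (lambda () (url-open-stream "demo" nil "www.gnu.org" 80)))
@end example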
@c @node Broken hostname resolution
@c @subsection Broken Hostname Resolution

@c @cindex hostname resolver
@c @cindex resolver, hostname
@c Some C libraries do not include the hostname resolver routines in
@c their static libraries. If Emacs was linked statically, and was not
@c linked with the resolver libraries, it will not be able to get to any
@c machines off the local network. This is characterized by being able
@c to reach someplace with a raw IP address, but not its hostname
@c (@url{http://129.79.254.191/} works, but
@c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
@c SunOS4 and Ultrix, but is probably rare now. If Emacs can't be
@c rebuilt linked against the resolver library, it can use the external
@c @command{nslookup} program instead.

@c @defopt url-gateway-broken-resolution
@c @cindex @code{nslookup} program
@c @cindex program, @code{nslookup}
@c If non-@code{nil}, this variable says to use the program specified by
@c @code{url-gateway-nslookup-program} to do hostname resolution.

@c @defopt url-gateway-nslookup-program
@c The name of the program to do hostname lookup if Emacs can't do it
@c directly. This program should expect a single argument on the command
@c line---the hostname to resolve---and should produce output similar to
@c the standard Unix @command{nslookup} program:

@c Name: www.cs.indiana.edu
@c Address: 129.79.254.191
@findex url-do-setup
The library can maintain a global history list tracking URLs accessed.
URL completion can be done from it. The history mechanism is set up
automatically via @code{url-do-setup} when it is configured to be on.
Note that the size of the history list is not currently limited.

@vindex url-history-hash-table
The history ``list'' is actually a hash table,
@code{url-history-hash-table}. It contains access times keyed by URL
strings. The times are in the format returned by @code{current-time}.

@defun url-history-update-url url time
This function updates the history table with an entry for @var{url}
accessed at the given @var{time}.
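
For instance, the following records a visit and reads the access time
back from the table. The URL is only an illustration.

@example
(require 'url-history)

;; Record that the URL was visited just now.
(url-history-update-url "http://www.gnu.org/" (current-time))

;; Look up the stored access time (a current-time style value).
(gethash "http://www.gnu.org/" url-history-hash-table)
@end example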
@defopt url-history-track
If non-@code{nil}, the library will keep track of all the URLs
accessed. If it is @code{t}, the list is saved to disk at the end of
each Emacs session. The default is @code{nil}.

@defopt url-history-file
The file storing the history list between sessions. It defaults to
@file{history} in @code{url-configuration-directory}.

@defopt url-history-save-interval
@findex url-history-setup-save-timer
The number of seconds between automatic saves of the history list.
The default is one hour (3600 seconds). Note that if you change this
variable directly, rather than using Custom, after @code{url-do-setup}
has been run, you need to run the function
@code{url-history-setup-save-timer}.
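
As an example, persistent history saved every 30 minutes could be
enabled as below. Since the interval is changed with @code{setq}
rather than through Custom, the save timer is re-armed explicitly. The
value 1800 is just a sample setting.

@example
(require 'url-history)

(setq url-history-track t)            ; track URLs and save them
(setq url-history-save-interval 1800) ; 30 minutes, in seconds
;; Restart the timer so that the new interval takes effect.
(url-history-setup-save-timer)
@end example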
@defun url-history-parse-history &optional fname
Parses the history file @var{fname} (default @code{url-history-file})
and sets up the history list.

@defun url-history-save-history &optional fname
Saves the current history to file @var{fname} (default
@code{url-history-file}).
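
A round trip through a file can be sketched as follows. The file name
is an arbitrary example in the system temporary directory; any
existing copy is deleted first so that saving starts from a clean
state.

@example
(require 'url-history)

(let ((file (expand-file-name "url-history-demo"
                              temporary-file-directory)))
  (when (file-exists-p file)
    (delete-file file))
  (url-history-update-url "http://www.gnu.org/" (current-time))
  (url-history-save-history file)   ; write the table to FILE
  ;; Forget this one entry in memory, then reload it from FILE.
  (remhash "http://www.gnu.org/" url-history-hash-table)
  (url-history-parse-history file)
  (gethash "http://www.gnu.org/" url-history-hash-table))
@end example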
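@defun url-completion-function string predicate function
You can use this function as a completion table to complete URLs from
the history. Its arguments follow the usual convention for programmed
completion, with @var{function} specifying the kind of completion
requested.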
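
For example, a URL can be read with completion over the history as
shown below. Because hash tables with string keys are themselves
valid completion tables, passing @code{url-history-hash-table}
directly should give similar results.

@example
(require 'url-history)

;; Complete against visited URLs via the completion function...
(completing-read "URL: " #'url-completion-function)

;; ...or use the history hash table as the collection.
(completing-read "URL: " url-history-hash-table)
@end example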
@chapter Customization

@cindex environment variables
The following environment variables affect the @code{url} library's
operation at startup.

@vindex url-temporary-directory
If this is defined, @code{url-temporary-directory} is initialized from
its value.

The following user options affect the general operation of
the @code{url} library.
@defopt url-configuration-directory
@cindex configuration files
The value of this variable specifies the name of the directory where
the @code{url} library stores its various configuration files, cache
files, and similar data.

The default value specifies a subdirectory named @file{url/} in the
standard Emacs user data directory specified by the variable
@code{user-emacs-directory} (normally @file{~/.emacs.d}). However,
the old default was @file{~/.url}, and this directory is used instead
if it exists.

This specifies the types of debug messages that are logged to
the @file{*URL-DEBUG*} buffer.
A value of @code{t} means log all messages.
A number means log all messages and also show them with @code{message}.
It may also be a list of the types of messages to be logged.
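
A sketch of turning on debug logging, assuming this option is
@code{url-debug} and that @code{http} is one of the message types the
library uses:

@example
(require 'url)

(setq url-debug t)        ; log all messages to *URL-DEBUG*
;; Alternatively, log only one type of message:
(setq url-debug '(http))
@end example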
@defopt url-personal-mail-address
@defopt url-privacy-level
@defopt url-uncompressor-alist
@defopt url-passwd-entry-func
@defopt url-standalone-mode
@defopt url-bad-port-list
@defopt url-max-password-attempts
@defopt url-temporary-directory
@defopt url-show-status
@defopt url-confirmation-func
The function to use for asking yes-or-no questions. This is normally
either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
function taking a single argument (the prompt) and returning @code{t}
only if an affirmative answer is given.
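
For unattended use, one might substitute a function that declines
every request without prompting. The name
@code{my-url-always-decline} is hypothetical:

@example
(require 'url)

;; Hypothetical: answer "no" to every confirmation request.
(defun my-url-always-decline (_prompt)
  nil)

(setq url-confirmation-func #'my-url-always-decline)
@end example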
@defopt url-gateway-method
@c fixme: describe gatewaying
A symbol specifying the type of gateway support to use for connections
from the local machine. The supported methods are:

Run telnet in a subprocess to connect;
Rlogin to another machine to connect;
Connect through a socks server;
@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include doclicense.texi

@node Function Index
@unnumbered Command and Function Index

@node Variable Index
@unnumbered Variable Index

@unnumbered Concept Index