1 .\" (C) Copyright 1999-2000 David A. Wheeler (dwheeler@dwheeler.com)
3 .\" SPDX-License-Identifier: Linux-man-pages-copyleft
5 .\" Fragments of this document are directly derived from IETF standards.
6 .\" For those fragments which are directly derived from such standards,
7 .\" the following notice applies, which is the standard copyright and
8 .\" rights announcement of The Internet Society:
10 .\" Copyright (C) The Internet Society (1998). All Rights Reserved.
11 .\" This document and translations of it may be copied and furnished to
12 .\" others, and derivative works that comment on or otherwise explain it
13 .\" or assist in its implementation may be prepared, copied, published
14 .\" and distributed, in whole or in part, without restriction of any
15 .\" kind, provided that the above copyright notice and this paragraph are
16 .\" included on all such copies and derivative works. However, this
17 .\" document itself may not be modified in any way, such as by removing
18 .\" the copyright notice or references to the Internet Society or other
19 .\" Internet organizations, except as needed for the purpose of
20 .\" developing Internet standards in which case the procedures for
21 .\" copyrights defined in the Internet Standards process must be
22 .\" followed, or as required to translate it into languages other than English.
24 .\" Modified Fri Jul 25 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com)
25 .\" Modified Fri Aug 21 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com)
26 .\" Modified Tue Mar 14 2000 by David A. Wheeler (dwheeler@dwheeler.com)
28 .TH uri 7 (date) "Linux man-pages (unreleased)"
30 uri, url, urn \- uniform resource identifier (URI), including a URL or URN
32 .SY "\fIURI\fP \fR=\fP"
36 .RB [\~\[dq] # \[dq]\~\c
40 .SY "\fIabsoluteURI\fP \fR=\fP"
43 .RI (\~ hierarchical_part
48 .SY "\fIrelativeURI\fP \fR=\fP"
54 .RB [\~\[dq] ? \[dq]\~\c
58 .SY "\fIscheme\fP \fR=\fP"
63 .RB \[dq] gopher \[dq]
65 .RB \[dq] mailto \[dq]
69 .RB \[dq] telnet \[dq]
79 .RB \[dq] whatis \[dq]
87 .SY "\fIhierarchical_part\fP \fR=\fP"
91 .RB [\~\[dq] ? \[dq]\~\c
95 .SY "\fInet_path\fP \fR=\fP"
96 .RB \[dq] // \[dq]\~\c
98 .RI [\~ absolute_path \~]
101 .SY "\fIabsolute_path\fP \fR=\fP"
102 .RB \[dq] / \[dq]\~\c
106 .SY "\fIrelative_path\fP \fR=\fP"
108 .RI [\~ absolute_path \~]
111 A Uniform Resource Identifier (URI) is a short string of characters
112 identifying an abstract or physical resource (for example, a web page).
113 A Uniform Resource Locator (URL) is a URI
114 that identifies a resource through its primary access
115 mechanism (e.g., its network "location"), rather than
116 by name or some other attribute of that resource.
117 A Uniform Resource Name (URN) is a URI
118 that must remain globally unique and persistent even when
119 the resource ceases to exist or becomes unavailable.
121 URIs are the standard way to name hypertext link destinations
122 for tools such as web browsers.
The string "http://www.kernel.org" is a URL (and thus it
is also a URI).
125 Many people use the term URL loosely as a synonym for URI
126 (though technically URLs are a subset of URIs).
128 URIs can be absolute or relative.
129 An absolute identifier refers to a resource independent of
130 context, while a relative
131 identifier refers to a resource by describing the difference
132 from the current context.
133 Within a relative path reference, the complete path segments "." and
134 ".." have special meanings: "the current hierarchy level" and "the
135 level above this hierarchy level", respectively, just like they do in
137 A path segment which contains a colon
138 character can't be used as the first segment of a relative URI path
139 (e.g., "this:that"), because it would be mistaken for a scheme name;
140 precede such segments with ./ (e.g., "./this:that").
141 Note that descendants of MS-DOS (e.g., Microsoft Windows) replace
142 devicename colons with the vertical bar ("|") in URIs, so "C:" becomes "C|".
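.PP
The resolution rules above can be sketched with Python's standard
.I urllib.parse
module.
This is only an illustration: the base URI and the helper name
.I resolve
are assumptions made for the example, not part of any standard.
.PP
.EX
```python
from urllib.parse import urljoin

# Hypothetical base URI, standing in for "the current context".
BASE = "http://example.com/a/b/c"


def resolve(reference, base=BASE):
    """Resolve a relative reference against a base URI."""
    return urljoin(base, reference)


# ".." climbs one hierarchy level; "." stays at the current level.
up_one = resolve("../d")
same_level = resolve("./e")

# Without the "./" prefix, "this" is parsed as a scheme name,
# so the reference is treated as absolute and returned unchanged.
mistaken = resolve("this:that")
protected = resolve("./this:that")
```
.EE
.PP
Here only the reference with the leading "./" is resolved
relative to the base.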
144 A fragment identifier,
146 refers to a particular named portion (fragment) of a resource;
147 text after a \[aq]#\[aq] identifies the fragment.
148 A URI beginning with \[aq]#\[aq]
149 refers to that fragment in the current resource.
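.PP
As a small sketch of fragment handling, Python's
.I urllib.parse
can separate a fragment from the rest of a URI and resolve
a reference that consists only of a fragment.
The URI and fragment name used here are hypothetical.
.PP
.EX
```python
from urllib.parse import urldefrag, urljoin

# Split a URI into the resource part and the fragment identifier.
resource, fragment = urldefrag("http://example.com/doc.html#install")

# A reference beginning with "#" names a fragment of the current resource.
same_document = urljoin("http://example.com/doc.html", "#install")
```
.EE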
151 There are many different URI schemes, each with specific
152 additional rules and meanings, but they are intentionally made to be
153 as similar as possible.
154 For example, many URL schemes
155 permit the authority to be the following format, called here an
157 (square brackets show what's optional):
159 .IR "ip_server = " [ user " [ : " password " ] @ ] " host " [ : " port ]
161 This format allows you to optionally insert a username,
162 a user plus password, and/or a port number.
165 is the name of the host computer, either its name as determined by DNS
166 or an IP address (numbers separated by periods).
168 <http://fred:fredpassword@example.com:8080/>
169 logs into a web server on host example.com
170 as fred (using fredpassword) using port 8080.
171 Avoid including a password in a URI if possible because of the many
172 security risks of having a password written down.
173 If the URL supplies a username but no password, and the remote
174 server requests a password, the program interpreting the URL
175 should request one from the user.
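.PP
The optional parts of this format can be picked apart with Python's
.IR urllib.parse ;
the following is a minimal sketch, and the helper name
.I authority_parts
is an assumption made for illustration.
Parts that are absent from the URI are reported as
.IR None .
.PP
.EX
```python
from urllib.parse import urlsplit


def authority_parts(uri):
    """Split the authority of a URI into its optional pieces."""
    parts = urlsplit(uri)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


full = authority_parts("http://fred:fredpassword@example.com:8080/")
bare = authority_parts("http://example.com/")
```
.EE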
177 Here are some of the most common schemes in use on UNIX-like systems
178 that are understood by many tools.
179 Note that many tools using URIs also have internal schemes or specialized
180 schemes; see those tools' documentation for information on those schemes.
182 .B "http \- Web (HTTP) server"
184 .RI http:// ip_server / path
186 .RI http:// ip_server / path ? query
188 This is a URL accessing a web (HTTP) server.
189 The default port is 80.
If the path refers to a directory, the web server will choose what
to return; usually, if there is a file named "index.html" or "index.htm",
its content is returned; otherwise, a list of the files in that
directory (with appropriate links) is generated and returned.
194 An example is <http://lwn.net>.
196 A query can be given in the archaic "isindex" format, consisting of a
197 word or phrase and not including an equal sign (=).
198 A query can also be in the longer "GET" format, which has one or more
199 query entries of the form
201 separated by the ampersand character (&).
204 can be repeated more than once, though it's up to the web server
205 and its application programs to determine if there's any meaning to that.
206 There is an unfortunate interaction with HTML/XML/SGML and
207 the GET query format; when such URIs with more than one key
208 are embedded in SGML/XML documents (including HTML), the ampersand
(&) has to be rewritten as &amp;.
210 Note that not all queries use this format; larger forms
211 may be too long to store as a URI, so they use a different
212 interaction mechanism (called POST) which does
213 not include the data in the URI.
214 See the Common Gateway Interface specification at
215 .UR http://www.w3.org\:/CGI
217 for more information.
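.PP
The "GET" query format and its interaction with HTML can be
illustrated with a short Python sketch.
The host name, path, and key names are hypothetical;
.I urlencode
and
.I parse_qs
come from
.IR urllib.parse ,
and
.I escape
from the standard
.I html
module.
.PP
.EX
```python
from html import escape
from urllib.parse import parse_qs, urlencode

# Hypothetical key/value pairs; the key "q" is repeated on purpose.
pairs = [("q", "uri"), ("q", "url"), ("lang", "en")]

query = urlencode(pairs)
uri = "http://example.com/search?" + query

# Repeated keys are collected into lists; their meaning is up to the server.
parsed = parse_qs(query)

# Inside HTML/XML/SGML, each "&" must be written as an entity.
href_value = escape(uri)
```
.EE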
219 .B "ftp \- File Transfer Protocol (FTP)"
221 .RI ftp:// ip_server / path
223 This is a URL accessing a file through the file transfer protocol (FTP).
224 The default port (for control) is 21.
225 If no username is included, the username "anonymous" is supplied, and
226 in that case many clients provide as the password the requestor's
227 Internet email address.
229 <ftp://ftp.is.co.za/rfc/rfc1808.txt>.
231 .B "gopher \- Gopher server"
233 .RI gopher:// ip_server / "gophertype selector"
235 .RI gopher:// ip_server / "gophertype selector" %09 search
237 .RI gopher:// ip_server / "gophertype selector" %09 search %09 gopher+_string
240 The default gopher port is 70.
242 is a single-character field to denote the
243 Gopher type of the resource to
244 which the URL refers.
245 The entire path may also be empty, in
246 which case the delimiting "/" is also optional and the gophertype
250 is the Gopher selector string.
251 In the Gopher protocol,
252 Gopher selector strings are a sequence of octets which may contain
253 any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
254 (US-ASCII character LF), and 0D (US-ASCII character CR).
256 .B "mailto \- Email address"
258 .RI mailto: email-address
260 This is an email address, usually of the form
261 .IR name @ hostname .
264 for more information on the correct format of an email address.
265 Note that any % character must be rewritten as %25.
266 An example is <mailto:dwheeler@dwheeler.com>.
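.PP
A minimal sketch of the percent rule, using Python's
.IR urllib.parse.quote :
the helper name
.I mailto_uri
and the address are hypothetical,
and treating "@" as safe is a choice made for this example.
.PP
.EX
```python
from urllib.parse import quote, unquote


def mailto_uri(address):
    """Build a mailto: URI, escaping "%" and other unsafe characters."""
    return "mailto:" + quote(address, safe="@")


# Hypothetical address whose local part contains a percent sign.
uri = mailto_uri("50%off@example.com")
round_trip = unquote(uri)
```
.EE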
268 .B "news \- Newsgroup or News message"
270 .RI news: newsgroup-name
276 is a period-delimited hierarchical name, such as
277 "comp.infosystems.www.misc".
278 If <newsgroup-name> is "*" (as in <news:*>), it is used to refer
279 to "all available news groups".
280 An example is <news:comp.lang.ada>.
284 corresponds to the Message-ID of
285 .UR http://www.ietf.org\:/rfc\:/rfc1036.txt
288 without the enclosing "<"
289 and ">"; it takes the form
290 .IR unique @ full_domain_name .
291 A message identifier may be distinguished from a news group name by the
292 presence of the "@" character.
294 .B "telnet \- Telnet login"
296 .RI telnet:// ip_server /
298 The Telnet URL scheme is used to designate interactive text services that
299 may be accessed by the Telnet protocol.
300 The final "/" character may be omitted.
301 The default port is 23.
302 An example is <telnet://melvyl.ucop.edu/>.
304 .B "file \- Normal file"
306 .RI file:// ip_server / path_segments
308 .RI file: path_segments
310 This represents a file or directory accessible locally.
313 can be the string "localhost" or the empty
314 string; this is interpreted as "the machine from which the URL is
316 If the path is to a directory, the viewer should display the
317 directory's contents with links to each containee;
318 not all viewers currently do this.
319 KDE supports generated files through the URL <file:/cgi-bin>.
320 If the given file isn't found, browser writers may want to try to expand
321 the filename via filename globbing
327 The second format (e.g., <file:/etc/passwd>)
328 is a correct format for referring to
330 However, older standards did not permit this format,
331 and some programs don't recognize this as a URI.
332 A more portable syntax is to use an empty string as the server name,
334 <file:///etc/passwd>; this form does the same thing
335 and is easily recognized by pattern matchers and older programs as a URI.
336 Note that if you really mean to say "start from the current location", don't
337 specify the scheme at all; use a relative address like <../test.txt>,
338 which has the side-effect of being scheme-independent.
339 An example of this scheme is <file:///etc/passwd>.
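.PP
As an illustration, Python's
.I urllib.parse.urlsplit
treats the short form and the portable form alike:
both yield an empty server name and the same path.
Other parsers may behave differently, as noted above.
.PP
.EX
```python
from urllib.parse import urlsplit

short_form = urlsplit("file:/etc/passwd")
portable_form = urlsplit("file:///etc/passwd")

# Both forms carry an empty server name and the same path.
same_path = short_form.path == portable_form.path
```
.EE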
341 .B "man \- Man page documentation"
343 .RI man: command-name
345 .RI man: command-name ( section )
347 This refers to local online manual (man) reference pages.
348 The command name can optionally be followed by a
349 parenthesis and section number; see
351 for more information on the meaning of the section numbers.
352 This URI scheme is unique to UNIX-like systems (such as Linux)
353 and is not currently registered by the IETF.
354 An example is <man:ls(1)>.
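.PP
Because this scheme is not registered, tools parse it themselves.
The following sketch shows one way to do so;
the function
.I parse_man_uri
is hypothetical and accepts only the two formats shown above.
.PP
.EX
```python
def parse_man_uri(uri):
    """Return (command name, section or None) for a man: URI."""
    scheme, sep, rest = uri.partition(":")
    if sep != ":" or scheme.lower() != "man" or not rest:
        raise ValueError("not a man: URI: " + uri)
    name, paren, tail = rest.partition("(")
    if not paren:
        return name, None          # no section given
    if not tail.endswith(")") or len(tail) < 2:
        raise ValueError("malformed section in: " + uri)
    return name, tail[:-1]         # drop the closing parenthesis


with_section = parse_man_uri("man:ls(1)")
without_section = parse_man_uri("man:ls")
```
.EE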
356 .B "info \- Info page documentation"
358 .RI info: virtual-filename
360 .RI info: virtual-filename # nodename
362 .RI info:( virtual-filename )
364 .RI info:( virtual-filename ) nodename
366 This scheme refers to online info reference pages (generated from
368 a documentation format used by programs such as the GNU tools.
369 This URI scheme is unique to UNIX-like systems (such as Linux)
370 and is not currently registered by the IETF.
371 As of this writing, GNOME and KDE differ in their URI syntax
372 and do not accept the other's syntax.
373 The first two formats are the GNOME format; in nodenames all spaces
374 are written as underscores.
375 The second two formats are the KDE format;
376 spaces in nodenames must be written as spaces, even though this
377 is forbidden by the URI standards.
378 It's hoped that in the future most tools will understand all of these
379 formats and will always accept underscores for spaces in nodenames.
380 In both GNOME and KDE, if the form without the nodename is used the
381 nodename is assumed to be "Top".
382 Examples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>.
383 Examples of the KDE format are <info:(gcc)> and <info:(gcc)G++ and GCC>.
385 .B "whatis \- Documentation search"
389 This scheme searches the database of short (one-line) descriptions of
390 commands and returns a list of descriptions containing that string.
391 Only complete word matches are returned.
394 This URI scheme is unique to UNIX-like systems (such as Linux)
395 and is not currently registered by the IETF.
397 .B "ghelp \- GNOME help documentation"
399 .RI ghelp: name-of-application
401 This loads GNOME help for the given application.
402 Note that not much documentation currently exists in this format.
404 .B "ldap \- Lightweight Directory Access Protocol"
408 .RI ldap:// hostport /
410 .RI ldap:// hostport / dn
412 .RI ldap:// hostport / dn ? attributes
414 .RI ldap:// hostport / dn ? attributes ? scope
416 .RI ldap:// hostport / dn ? attributes ? scope ? filter
418 .RI ldap:// hostport / dn ? attributes ? scope ? filter ? extensions
420 This scheme supports queries to the
421 Lightweight Directory Access Protocol (LDAP), a protocol for querying
422 a set of servers for hierarchically organized information
423 (such as people and computing resources).
425 .UR http://www.ietf.org\:/rfc\:/rfc2255.txt
428 for more information on the LDAP URL scheme.
429 The components of this URL are:
432 the LDAP server to query, written as a hostname optionally followed by
433 a colon and the port number.
434 The default LDAP port is TCP port 389.
If empty, the client determines which LDAP server to use.
438 the LDAP Distinguished Name, which identifies
439 the base object of the LDAP search (see
440 .UR http://www.ietf.org\:/rfc\:/rfc2253.txt
446 a comma-separated list of attributes to be returned;
447 see RFC\ 2251 section 4.1.5.
448 If omitted, all attributes should be returned.
451 specifies the scope of the search, which can be one of
452 "base" (for a base object search), "one" (for a one-level search),
453 or "sub" (for a subtree search).
454 If scope is omitted, "base" is assumed.
457 specifies the search filter (subset of entries
459 If omitted, all entries should be returned.
461 .UR http://www.ietf.org\:/rfc\:/rfc2254.txt
467 a comma-separated list of type=value
468 pairs, where the =value portion may be omitted for options not
470 An extension prefixed with a \[aq]!\[aq] is critical
471 (must be supported to be valid), otherwise it is noncritical (optional).
473 LDAP queries are easiest to explain by example.
474 Here's a query that asks ldap.itd.umich.edu for information about
475 the University of Michigan in the U.S.:
478 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US
481 To just get its postal address attribute, request:
484 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress
487 To ask a host.com at port 6666 for information about the person
488 with common name (cn) "Babs Jensen" at University of Michigan, request:
491 ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
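.PP
The way these examples break down into components can be sketched
in Python.
The function
.I parse_ldap_url
is hypothetical and simplified:
it does not validate the scope or interpret extensions,
and an omitted filter is returned as an empty string.
.PP
.EX
```python
from urllib.parse import unquote, urlsplit


def parse_ldap_url(url):
    """Split an LDAP URL into the components described above."""
    parts = urlsplit(url)
    # Everything after the first "?" holds up to four "?"-separated fields.
    fields = parts.query.split("?") if parts.query else []
    fields += [""] * (4 - len(fields))
    attributes, scope, search_filter, extensions = fields[:4]
    return {
        "host": parts.hostname,
        "port": parts.port or 389,      # default LDAP port
        "dn": unquote(parts.path[1:]),  # drop the leading "/"
        "attributes": attributes.split(",") if attributes else [],
        "scope": scope or "base",       # "base" is assumed when omitted
        "filter": unquote(search_filter),
        "extensions": extensions.split(",") if extensions else [],
    }


person = parse_ldap_url(
    "ldap://host.com:6666/o=University%20of%20Michigan,c=US"
    "??sub?(cn=Babs%20Jensen)"
)
address = parse_ldap_url(
    "ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US"
    "?postalAddress"
)
```
.EE
.PP
In the first URL the attribute list is empty,
which is why two question marks appear in a row before the scope.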
494 .B "wais \- Wide Area Information Servers"
496 .RI wais:// hostport / database
498 .RI wais:// hostport / database ? search
500 .RI wais:// hostport / database / wtype / wpath
502 This scheme designates a WAIS database, search, or document
504 .UR http://www.ietf.org\:/rfc\:/rfc1625.txt
507 for more information on WAIS).
508 Hostport is the hostname, optionally followed by a colon and port number
509 (the default port number is 210).
511 The first form designates a WAIS database for searching.
512 The second form designates a particular search of the WAIS database
514 The third form designates a particular document within a WAIS
515 database to be retrieved.
517 is the WAIS designation of the type of the object and
519 is the WAIS document-id.
523 There are many other URI schemes.
524 Most tools that accept URIs support a set of internal URIs
525 (e.g., Mozilla has the about: scheme for internal information,
526 and the GNOME help browser has the toc: scheme for various starting
528 There are many schemes that have been defined but are not as widely
529 used at the current time
531 The nntp: scheme is deprecated in favor of the news: scheme.
532 URNs are to be supported by the urn: scheme, with a hierarchical name space
533 (e.g., urn:ietf:... would identify IETF documents); at this time
534 URNs are not widely implemented.
535 Not all tools support all schemes.
536 .SS Character encoding
537 URIs use a limited number of characters so that they can be
538 typed in and used in a variety of situations.
540 The following characters are reserved, that is, they may appear in a
541 URI but their use is limited to their reserved purpose
542 (conflicting data must be escaped before forming the URI):
550 Unreserved characters may be included in a URI.
551 Unreserved characters
552 include uppercase and lowercase Latin letters,
553 decimal digits, and the following
554 limited set of punctuation marks and symbols:
558 \- _ . ! \[ti] * ' ( )
562 All other characters must be escaped.
563 An escaped octet is encoded as a character triplet, consisting of the
564 percent character "%" followed by the two hexadecimal digits
565 representing the octet code (you can use uppercase or lowercase letters
566 for the hexadecimal digits).
567 For example, a blank space must be escaped
568 as "%20", a tab character as "%09", and the "&" as "%26".
569 Because the percent "%" character always has the reserved purpose of
570 being the escape indicator, it must be escaped as "%25".
571 It is common practice to escape space characters as the plus symbol (+)
572 in query text; this practice isn't uniformly defined
in the relevant RFCs (which recommend %20 instead), but any tool accepting
URIs with query text should be prepared to handle it.
575 A URI is always shown in its "escaped" form.
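.PP
The escaping rules above can be tried out with Python's
.IR urllib.parse ;
this is a sketch of one library's behavior, not a normative definition.
The input strings are arbitrary examples.
.PP
.EX
```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

escaped = quote("a b&c%")               # space, ampersand, percent
tab = quote(chr(9))                     # the tab character
tilde = unquote("%7e")                  # hex digits may be lowercase
form_style = quote_plus("two words")    # "+" convention for query text
decoded_plus = unquote_plus(form_style)
```
.EE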
577 Unreserved characters can be escaped without changing the semantics
578 of the URI, but this should not be done unless the URI is being used
579 in a context that does not allow the unescaped character to appear.
580 For example, "%7e" is sometimes used instead of "\[ti]" in an HTTP URL
581 path, but the two are equivalent for an HTTP URL.
583 For URIs which must handle characters outside the US ASCII character set,
584 the HTML 4.01 specification (section B.2) and
585 IETF RFC\~3986 (last paragraph of section 2.5)
586 recommend the following approach:
588 translate the character sequences into UTF-8 (IETF RFC\~3629)\[em]see
589 .BR utf\-8 (7)\[em]and
592 use the URI escaping mechanism, that is,
593 use the %HH encoding for unsafe octets.
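.PP
The two steps above can be sketched in Python.
The sample word is an arbitrary choice;
.I chr(233)
is the letter e with an acute accent.
.PP
.EX
```python
from urllib.parse import quote, unquote

word = "caf" + chr(233)               # non-ASCII sample text
utf8_octets = word.encode("utf-8")    # step 1: translate to UTF-8
escaped = quote(utf8_octets)          # step 2: %HH for unsafe octets

# Decoding reverses both steps (UTF-8 is the default for unquote).
round_trip = unquote(escaped)
```
.EE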
595 When written, URIs should be placed inside double quotes
596 (e.g., "http://www.kernel.org"),
597 enclosed in angle brackets (e.g., <http://lwn.net>),
598 or placed on a line by themselves.
599 A warning for those who use double-quotes:
601 move extraneous punctuation (such as the period ending a sentence or the
603 inside a URI, since this will change the value of the URI.
Use angle brackets instead, or
605 switch to a quoting system that never includes extraneous characters
606 inside quotation marks.
607 This latter system, called the 'new' or 'logical' quoting system by
608 "Hart's Rules" and the "Oxford Dictionary for Writers and Editors",
609 is preferred practice in Great Britain and in various European languages.
610 Older documents suggested inserting the prefix "URL:"
611 just before the URI, but this form has never caught on.
613 The URI syntax was designed to be unambiguous.
614 However, as URIs have become commonplace, traditional media
615 (television, radio, newspapers, billboards, etc.) have increasingly
616 used abbreviated URI references consisting of
617 only the authority and path portions of the identified resource
618 (e.g., <www.w3.org/Addressing>).
619 Such references are primarily
620 intended for human interpretation rather than machine, with the
621 assumption that context-based heuristics are sufficient to complete
622 the URI (e.g., hostnames beginning with "www" are likely to have
623 a URI prefix of "http://" and hostnames beginning with "ftp" likely
624 to have a prefix of "ftp://").
625 Many client implementations heuristically resolve these references.
627 change over time, particularly when new schemes are introduced.
628 Since an abbreviated URI has the same syntax as a relative URL path,
629 abbreviated URI references cannot be used where relative URIs are
630 permitted, and can be used only when there is no defined base
631 (such as in dialog boxes).
632 Don't use abbreviated URIs as hypertext links inside a document;
633 use the standard format as described here.
635 .UR http://www.ietf.org\:/rfc\:/rfc2396.txt
638 .UR http://www.w3.org\:/TR\:/REC\-html40
642 Any tool accepting URIs (e.g., a web browser) on a Linux system should
643 be able to handle (directly or indirectly) all of the
644 schemes described here, including the man: and info: schemes.
645 Handling them by invoking some other program is
646 fine and in fact encouraged.
648 Technically the fragment isn't part of the URI.
650 For information on how to embed URIs (including URLs) in a data format,
651 see documentation on that format.
652 HTML uses the format <A HREF="\fIuri\fP">
655 Texinfo files use the format @uref{\fIuri\fP}.
656 Man and mdoc have the recently added UR macro, or just include the
657 URI in the text (viewers should be able to detect :// as part of a URI).
659 The GNOME and KDE desktop environments currently vary in the URIs
660 they accept, in particular in their respective help browsers.
661 To list man pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and
662 to list info pages, GNOME uses <toc:info> while KDE uses <info:(dir)>
663 (the author of this man page prefers the KDE approach here, though a more
664 regular format would be even better).
665 In general, KDE uses <file:/cgi-bin/> as a prefix to a set of generated
667 KDE prefers documentation in HTML, accessed via the
668 <file:/cgi-bin/helpindex>.
669 GNOME prefers the ghelp scheme to store and find documentation.
670 Neither browser handles file: references to directories at the time
671 of this writing, making it difficult to refer to an entire directory with
673 As noted above, these environments differ in how they handle the
674 info: scheme, probably the most important variation.
675 It is expected that GNOME and KDE
676 will converge to common URI formats, and a future
677 version of this man page will describe the converged result.
678 Efforts to aid this convergence are encouraged.
680 A URI does not in itself pose a security threat.
681 There is no general guarantee that a URL, which at one time
682 located a given resource, will continue to do so.
684 guarantee that a URL will not locate a different resource at some
685 later point in time; such a guarantee can be
686 obtained only from the person(s) controlling that namespace and the
687 resource in question.
689 It is sometimes possible to construct a URL such that an attempt to
690 perform a seemingly harmless operation, such as the
691 retrieval of an entity associated with the resource, will in fact
692 cause a possibly damaging remote operation to occur.
694 is typically constructed by specifying a port number other than that
695 reserved for the network protocol in question.
696 The client unwittingly contacts a site that is in fact
697 running a different protocol.
698 The content of the URL contains instructions that, when
699 interpreted according to this other protocol, cause an unexpected
701 An example has been the use of a gopher URL to cause an
unintended or impersonating message to be sent via an SMTP server.
704 Caution should be used when using any URL that specifies a port
705 number other than the default for the protocol, especially when it is
706 a number within the reserved space.
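.PP
A tool could flag such URLs before acting on them.
The following is a minimal sketch, not a complete defense:
the function name
.I port_warning
is hypothetical,
the table holds only the default ports mentioned in this page,
and "reserved space" is assumed here to mean ports below 1024.
.PP
.EX
```python
from urllib.parse import urlsplit

# Default ports as listed in this page.
DEFAULT_PORTS = {
    "http": 80,
    "ftp": 21,
    "gopher": 70,
    "telnet": 23,
    "ldap": 389,
    "wais": 210,
}


def port_warning(url):
    """Return a warning string for a suspicious port, or None."""
    parts = urlsplit(url)
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return None
    if port < 1024:  # assumption: the reserved range
        return "non-default port in the reserved range"
    return "non-default port"


smtp_trick = port_warning("gopher://example.com:25/")
high_port = port_warning("http://example.com:8080/")
plain = port_warning("http://example.com/")
```
.EE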
708 Care should be taken when a URI contains escaped delimiters for a
709 given protocol (for example, CR and LF characters for telnet
710 protocols) that these are not unescaped before transmission.
711 This might violate the protocol, but avoids the potential for such
712 characters to be used to simulate an extra operation or parameter in
that protocol, which might lead to an unexpected and possibly harmful
remote operation being performed.
716 It is clearly unwise to use a URI that contains a password which is
717 intended to be secret.
In particular, the use of a password within
the "userinfo" component of a URI is strongly discouraged except
in those rare cases where the "password" parameter is intended to be public.
722 Documentation may be placed in a variety of locations, so there
723 currently isn't a good URI scheme for general online documentation
724 in arbitrary formats.
725 References of the form
726 <file:///usr/doc/ZZZ> don't work because different distributions and
727 local installation requirements may place the files in different
729 (it may be in /usr/doc, or /usr/local/doc, or /usr/share,
731 Also, the directory ZZZ usually changes when a version changes
732 (though filename globbing could partially overcome this).
733 Finally, using the file: scheme doesn't easily support people
734 who dynamically load documentation from the Internet (instead of
735 loading the files onto a local filesystem).
736 A future URI scheme may be added (e.g., "userdoc:") to permit
737 programs to include cross-references to more detailed documentation
738 without having to know the exact location of that documentation.
739 Alternatively, a future version of the filesystem specification may
740 specify file locations sufficiently so that the file: scheme will
741 be able to locate documentation.
743 Many programs and file formats don't include a way to incorporate
744 or implement links using URIs.
746 Many programs can't handle all of these different URI formats; there
747 should be a standard mechanism to load an arbitrary URI that automatically
detects the user's environment (e.g., text or graphics,
749 desktop environment, local user preferences, and currently executing
750 tools) and invokes the right tool for any URI.
752 .\" David A. Wheeler (dwheeler@dwheeler.com) wrote this man page.
759 .UR http://www.ietf.org\:/rfc\:/rfc2255.txt