@comment node-name, next, previous, up
@chapter External Formats

External formats determine how characters are encoded to and decoded
from sequences of octets when data is exchanged with the outside world.
Examples of such exchanges are:

@itemize
@item
Character streams associated with files, sockets and process
input/output (See @ref{Stream External Formats} and @ref{Running

@item
Foreign strings (See @ref{Foreign Types and Lisp Types})

@item
POSIX interface (See @ref{sb-posix})

@item
Hostname- and protocol-related functions of the BSD-socket interface
(See @ref{Networking})
@end itemize

Technically, external formats in SBCL are named objects that describe
the coding of characters as well as the policies to apply when decoding
or encoding is not possible. Each external format has a canonical name
and zero or more aliases. User code mostly interacts with external
formats by supplying external format designators to functions that use
external formats.

@menu
* The Default External Format::
* External Format Designators::
* Character Coding Conditions::
* Converting between Strings and Octet Vectors::
* Supported External Formats::
@end menu

@node The Default External Format
@section The Default External Format
@cindex The Default External Format

Most functions interacting with external formats use a default external
format if none is explicitly specified. In some cases, the default
external format is used unconditionally.

The default external format is UTF-8. It can be changed via
@var{sb-ext:*default-external-format*}
and
@var{sb-ext:*default-c-string-external-format*}.
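
As a minimal sketch, assuming an SBCL version in which
@code{sb-ext:*default-external-format*} is exported and may be assigned
a keyword designator, the default can be changed temporarily and then
restored. The helper name @code{encode-with-default} is hypothetical and
not part of SBCL.

@lisp
;; Hypothetical helper (not part of SBCL): encode STRING while FORMAT
;; is installed as the default external format, then restore the
;; previous default even if encoding signals an error.
(defun encode-with-default (string format)
  (let ((previous sb-ext:*default-external-format*))
    (setf sb-ext:*default-external-format* format)
    (unwind-protect
         ;; No :external-format argument is passed here, so the
         ;; current default is used.
         (sb-ext:string-to-octets string)
      (setf sb-ext:*default-external-format* previous))))

(encode-with-default "abc" :latin-1)
@end lisp

Assigning the variable and restoring it afterwards is a conservative
choice for this sketch; whether the variable may also be rebound with
@code{let} is not stated above and is not assumed here.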

@node External Format Designators
@section External Format Designators
@cindex External Format Designators

@findex @cl{with-open-file}
In situations where an external format designator is required, such as
the @code{:external-format} argument in calls to @code{open} or
@code{with-open-file}, users may supply the name of an encoding to
denote the external format that applies that encoding to Lisp
characters.

In addition to the basic encoding for an external format, options
controlling various special cases may be passed by using a list as the
external format designator. The first element of the list must be an
encoding name and the remaining elements must form a property list.

More specifically, external format designators can take the following
forms:

@table @asis
@item :default
Designates the current default external format (See @ref{The Default
External Format}).

@item @var{keyword}
Designates the supported external format that has @var{keyword} as one
of its names (See @ref{Supported External Formats}).

@item (@var{keyword} :replacement @var{replacement})
Designates an external format that is like the one designated by
@var{keyword} but does not signal an error if a character or octet
sequence cannot be encoded or decoded. Instead, it inserts
@var{replacement} at the position in question. @var{replacement} has to
be a string designator, that is, a character or string.

For example,
@lisp
(with-open-file (stream pathname :external-format '(:utf-8 :replacement #\?))
  (read-line stream))
@end lisp
will read the first line of @var{pathname}, replacing any octet sequence
that is not valid in the UTF-8 external format with a question mark
character.
@end table
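
The @code{:replacement} form of designator can also be tried out on
octet vectors directly. The following is a minimal sketch, assuming that
@code{sb-ext:octets-to-string} and @code{sb-ext:string-to-octets} accept
the same list designators as @code{open}; the variable name
@code{*sample-octets*} is made up for illustration.

@lisp
;; Octets for "Hi" followed by #xFF, which is never valid in UTF-8.
(defvar *sample-octets*
  (coerce '(72 105 255) '(vector (unsigned-byte 8))))

;; Decoding: the invalid octet is replaced by a question mark instead
;; of causing a decoding error.
(sb-ext:octets-to-string *sample-octets*
                         :external-format '(:utf-8 :replacement #\?))

;; Encoding: the character with code 246 (o with diaeresis) cannot be
;; encoded in ASCII, so the replacement is encoded in its place.
(sb-ext:string-to-octets (string (code-char 246))
                         :external-format '(:ascii :replacement #\?))
@end lisp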

@node Character Coding Conditions
@section Character Coding Conditions
@cindex Character Coding Conditions

Decoding or encoding characters using a given external format is not
always possible:

@itemize
@item
Decoding an octet vector using a given external format can fail if it
contains an octet or sequence of octets that does not have an
interpretation as a character according to the external format.

@item
Conversely, a string may contain characters that a given external format
cannot encode. For example, the ASCII external format cannot encode the
character @code{#\ö}.
@end itemize

Unless the external format governing the coding uses the
@code{:replacement} keyword, SBCL will signal (continuable) errors under
the above circumstances. The types of the conditions signaled are
currently neither exported nor documented, but will be in future SBCL
versions.
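
Since the specific condition types are not exported, portable user code
can only handle the generic @code{error} type. The following is a
minimal sketch under that assumption; the helper name @code{try-encode}
is hypothetical and not part of SBCL.

@lisp
;; Hypothetical helper (not part of SBCL): return the encoded octets,
;; or NIL if STRING cannot be encoded in FORMAT.  The generic ERROR
;; type is handled because the specific condition types are not
;; exported.
(defun try-encode (string format)
  (handler-case
      (sb-ext:string-to-octets string :external-format format)
    (error () nil)))

;; Every character here is encodable in ASCII.
(try-encode "abc" :ascii)

;; The character with code 246 (o with diaeresis) is not.
(try-encode (string (code-char 246)) :ascii)
@end lisp

Handling @code{error} this broadly also hides unrelated failures, so a
real program would probably narrow the handler once the condition types
are exported.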

@node Converting between Strings and Octet Vectors
@section Converting between Strings and Octet Vectors
@cindex Converting between Strings and Octet Vectors

To encode Lisp strings as octet vectors and decode octet vectors as Lisp
strings, the following SBCL-specific functions can be used:

@include fun-sb-ext-string-to-octets.texinfo
@include fun-sb-ext-octets-to-string.texinfo
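
The two functions are inverses of each other for text that the chosen
external format can represent. The following minimal sketch illustrates
a round trip through UTF-8; the helper name @code{utf-8-round-trip} is
hypothetical and not part of SBCL.

@lisp
;; Hypothetical helper (not part of SBCL): encode STRING as UTF-8,
;; decode the resulting octets again, and return the decoded string
;; together with the number of octets used.
(defun utf-8-round-trip (string)
  (let ((octets (sb-ext:string-to-octets string :external-format :utf-8)))
    (values (sb-ext:octets-to-string octets :external-format :utf-8)
            (length octets))))

(utf-8-round-trip "hello")
@end lisp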

@node Supported External Formats
@section Supported External Formats
@cindex Supported External Formats

The following table lists the external formats supported by SBCL in the
form of the respective canonical name followed by the list of aliases:

@include encodings.texi-temp