2 @c This is part of the GNU Emacs Lisp Reference Manual.
3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
4 @c See the file elisp.texi for copying conditions.
5 @setfilename ../info/strings
6 @node Strings and Characters, Lists, Numbers, Top
7 @comment node-name, next, previous, up
8 @chapter Strings and Characters
10 @cindex character arrays
14 A string in Emacs Lisp is an array that contains an ordered sequence
15 of characters. Strings are used as names of symbols, buffers, and
16 files, to send messages to users, to hold text being copied between
17 buffers, and for many other purposes. Because strings are so important,
18 Emacs Lisp has many functions expressly for manipulating them. Emacs
19 Lisp programs use strings more often than individual characters.
21 @xref{Strings of Events}, for special considerations for strings of
22 keyboard character events.
@menu
* Basics: String Basics.      Basic properties of strings and characters.
* Predicates for Strings::    Testing whether an object is a string or char.
* Creating Strings::          Functions to allocate new strings.
* Text Comparison::           Comparing characters or strings.
* String Conversion::         Converting characters or strings and vice versa.
* Formatting Strings::        @code{format}: Emacs's analog of @code{printf}.
* Character Case::            Case conversion functions.
* Case Table::                Customizing case conversion.
@end menu
36 @section String and Character Basics
38 Strings in Emacs Lisp are arrays that contain an ordered sequence of
39 characters. Characters are represented in Emacs Lisp as integers;
40 whether an integer was intended as a character or not is determined only
41 by how it is used. Thus, strings really contain integers.
43 The length of a string (like any array) is fixed and independent of
44 the string contents, and cannot be altered. Strings in Lisp are
45 @emph{not} terminated by a distinguished character code. (By contrast,
46 strings in C are terminated by a character with @sc{ASCII} code 0.)
47 This means that any character, including the null character (@sc{ASCII}
48 code 0), is a valid element of a string.@refill
50 Since strings are considered arrays, you can operate on them with the
51 general array functions. (@xref{Sequences Arrays Vectors}.) For
52 example, you can access or change individual characters in a string
using the functions @code{aref} and @code{aset} (@pxref{Array
Functions}).
56 Each character in a string is stored in a single byte. Therefore,
57 numbers not in the range 0 to 255 are truncated when stored into a
58 string. This means that a string takes up much less memory than a
59 vector of the same length.
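As a brief sketch of the array functions mentioned above, the following
reads and replaces a single character, and shows that an embedded null
character counts toward the length of a string.  The variable name
@code{s} is only illustrative, and the string is copied first on the
assumption that modifying a string constant in place is best avoided.

```elisp
(let ((s (copy-sequence "abc")))   ; work on a fresh copy of the literal
  (aset s 1 ?X)                    ; replace the character at index 1
  (list (aref s 0) s (length s)))
;; => (97 "aXc" 3)

(length "a\000b")                  ; the null character is an ordinary element
;; => 3
```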
61 Sometimes key sequences are represented as strings. When a string is
62 a key sequence, string elements in the range 128 to 255 represent meta
63 characters (which are extremely large integers) rather than keyboard
64 events in the range 128 to 255.
66 Strings cannot hold characters that have the hyper, super or alt
67 modifiers; they can hold @sc{ASCII} control characters, but no other
68 control characters. They do not distinguish case in @sc{ASCII} control
69 characters. @xref{Character Type}, for more information about
representation of meta and other modifiers for keyboard input
characters.
73 Like a buffer, a string can contain text properties for the characters
74 in it, as well as the characters themselves. @xref{Text Properties}.
76 @xref{Text}, for information about functions that display strings or
77 copy them into buffers. @xref{Character Type}, and @ref{String Type},
78 for information about the syntax of characters and strings.
80 @node Predicates for Strings
81 @section The Predicates for Strings
83 For more information about general sequence and array predicates,
84 see @ref{Sequences Arrays Vectors}, and @ref{Arrays}.
@defun stringp object
This function returns @code{t} if @var{object} is a string, @code{nil}
otherwise.
@end defun
91 @defun char-or-string-p object
92 This function returns @code{t} if @var{object} is a string or a
93 character (i.e., an integer), @code{nil} otherwise.
96 @node Creating Strings
97 @section Creating Strings
99 The following functions create strings, either from scratch, or by
100 putting strings together, or by taking them apart.
102 @defun make-string count character
103 This function returns a string made up of @var{count} repetitions of
104 @var{character}. If @var{count} is negative, an error is signaled.
113 Other functions to compare with this one include @code{char-to-string}
114 (@pxref{String Conversion}), @code{make-vector} (@pxref{Vectors}), and
115 @code{make-list} (@pxref{Building Lists}).
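A minimal illustration of @code{make-string}, including the boundary
case of a zero count:

```elisp
(make-string 5 ?x)
;; => "xxxxx"
(make-string 0 ?x)     ; a count of zero yields the empty string
;; => ""
```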
118 @defun substring string start &optional end
119 This function returns a new string which consists of those characters
120 from @var{string} in the range from (and including) the character at the
121 index @var{start} up to (but excluding) the character at the index
122 @var{end}. The first character is at index zero.
@example
(substring "abcdefg" 0 3)
     @result{} "abc"
@end example
132 Here the index for @samp{a} is 0, the index for @samp{b} is 1, and the
133 index for @samp{c} is 2. Thus, three letters, @samp{abc}, are copied
134 from the string @code{"abcdefg"}. The index 3 marks the character
135 position up to which the substring is copied. The character whose index
136 is 3 is actually the fourth character in the string.
138 A negative number counts from the end of the string, so that @minus{}1
139 signifies the index of the last character of the string. For example:
@example
(substring "abcdefg" -3 -1)
     @result{} "ef"
@end example
149 In this example, the index for @samp{e} is @minus{}3, the index for
150 @samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1.
151 Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded.
When @code{nil} is used as an index, it stands for the length of the
string.  Thus,

@example
(substring "abcdefg" -3 nil)
     @result{} "efg"
@end example

Omitting the argument @var{end} is equivalent to specifying @code{nil}.
It follows that @code{(substring @var{string} 0)} returns a copy of all
of @var{string}.

@example
(substring "abcdefg" 0)
     @result{} "abcdefg"
@end example

But we recommend @code{copy-sequence} for this purpose (@pxref{Sequence
Functions}).
178 A @code{wrong-type-argument} error is signaled if either @var{start} or
179 @var{end} is not an integer or @code{nil}. An @code{args-out-of-range}
180 error is signaled if @var{start} indicates a character following
181 @var{end}, or if either integer is out of range for @var{string}.
183 Contrast this function with @code{buffer-substring} (@pxref{Buffer
184 Contents}), which returns a string containing a portion of the text in
185 the current buffer. The beginning of a string is at index 0, but the
186 beginning of a buffer is at index 1.
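The difference in origin can be seen by extracting the same three
characters both ways.  This is only a sketch: it assumes that the
@code{with-temp-buffer} macro is available, which may not be the case in
every version of Emacs.

```elisp
;; Assumes the `with-temp-buffer' macro is available.
(with-temp-buffer
  (insert "abcdefg")
  (list (buffer-substring 1 4)          ; buffer positions start at 1
        (substring "abcdefg" 0 3)))     ; string indices start at 0
;; => ("abc" "abc")
```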
189 @defun concat &rest sequences
190 @cindex copying strings
191 @cindex concatenating strings
192 This function returns a new string consisting of the characters in the
193 arguments passed to it. The arguments may be strings, lists of numbers,
194 or vectors of numbers; they are not themselves changed. If
195 @code{concat} receives no arguments, it returns an empty string.
@example
(concat "abc" "-def")
     @result{} "abc-def"
(concat "abc" (list 120 (+ 256 121)) [122])
     @result{} "abcxyz"
;; @r{@code{nil} is an empty sequence.}
(concat "abc" nil "-def")
     @result{} "abc-def"
(concat "The " "quick brown " "fox.")
     @result{} "The quick brown fox."
@end example
212 The second example above shows how characters stored in strings are
taken modulo 256.  In other words, each character in the string is
stored in one byte.
216 The @code{concat} function always constructs a new string that is
217 not @code{eq} to any existing string.
219 When an argument is an integer (not a sequence of integers), it is
220 converted to a string of digits making up the decimal printed
221 representation of the integer. This special case exists for
222 compatibility with Mocklisp, and we don't recommend you take advantage
223 of it. If you want to convert an integer to digits in this way, use
224 @code{format} (@pxref{Formatting Strings}) or @code{number-to-string}
225 (@pxref{String Conversion}).
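For instance, either of the recommended functions can be used to make
the conversion explicit before concatenating.  The message text here is
purely illustrative.

```elisp
(concat "Total: " (number-to-string 42))
;; => "Total: 42"
(concat "Total: " (format "%d" 42))
;; => "Total: 42"
```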
236 For information about other concatenation functions, see the
237 description of @code{mapconcat} in @ref{Mapping Functions},
@code{vconcat} in @ref{Vectors}, and @code{append} in @ref{Building
Lists}.
242 @node Text Comparison
243 @section Comparison of Characters and Strings
244 @cindex string equality
246 @defun char-equal character1 character2
247 This function returns @code{t} if the arguments represent the same
248 character, @code{nil} otherwise. This function ignores differences
249 in case if @code{case-fold-search} is non-@code{nil}.
@example
(char-to-string (+ 256 ?x))
     @result{} "x"
(char-equal ?x (+ 256 ?x))
     @result{} t
@end example
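The effect of @code{case-fold-search} on @code{char-equal} can be
sketched by binding the variable temporarily with @code{let}:

```elisp
(let ((case-fold-search nil))   ; case is significant
  (char-equal ?x ?X))
;; => nil
(let ((case-fold-search t))     ; differences in case are ignored
  (char-equal ?x ?X))
;; => t
```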
261 @defun string= string1 string2
262 This function returns @code{t} if the characters of the two strings
263 match exactly; case is significant.
@example
(string= "abc" "abc")
     @result{} t
(string= "abc" "ABC")
     @result{} nil
@end example
275 @defun string-equal string1 string2
276 @code{string-equal} is another name for @code{string=}.
279 @cindex lexical comparison
280 @defun string< string1 string2
281 @c (findex string< causes problems for permuted index!!)
282 This function compares two strings a character at a time. First it
283 scans both the strings at once to find the first pair of corresponding
284 characters that do not match. If the lesser character of those two is
285 the character from @var{string1}, then @var{string1} is less, and this
286 function returns @code{t}. If the lesser character is the one from
287 @var{string2}, then @var{string1} is greater, and this function returns
288 @code{nil}. If the two strings match entirely, the value is @code{nil}.
290 Pairs of characters are compared by their @sc{ASCII} codes. Keep in
291 mind that lower case letters have higher numeric values in the
292 @sc{ASCII} character set than their upper case counterparts; numbers and
many punctuation characters have a lower numeric value than upper case
letters.
@example
(string< "abc" "abd")
     @result{} t
(string< "abd" "abc")
     @result{} nil
(string< "123" "abc")
     @result{} t
@end example
307 When the strings have different lengths, and they match up to the
308 length of @var{string1}, then the result is @code{t}. If they match up
309 to the length of @var{string2}, the result is @code{nil}. A string of
310 no characters is less than any other string.
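The rules for strings of different lengths can be checked with a few
short cases:

```elisp
(string< "" "abc")      ; the empty string is less than any other string
;; => t
(string< "ab" "abc")    ; match up to the length of the first string
;; => t
(string< "abc" "ab")    ; match up to the length of the second string
;; => nil
(string< "abc" "")
;; => nil
(string< "" "")         ; strings that match entirely are not "less"
;; => nil
```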
328 @defun string-lessp string1 string2
329 @code{string-lessp} is another name for @code{string<}.
332 See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for
333 a way to compare text in buffers. The function @code{string-match},
334 which matches a regular expression against a string, can be used
335 for a kind of string comparison; see @ref{Regexp Search}.
337 @node String Conversion
338 @comment node-name, next, previous, up
339 @section Conversion of Characters and Strings
340 @cindex conversion of strings
342 This section describes functions for conversions between characters,
343 strings and integers. @code{format} and @code{prin1-to-string}
344 (@pxref{Output Functions}) can also convert Lisp objects into strings.
345 @code{read-from-string} (@pxref{Input Functions}) can ``convert'' a
346 string representation of a Lisp object into an object.
348 @xref{Documentation}, for functions that produce textual descriptions
349 of text characters and general input events
350 (@code{single-key-description} and @code{text-char-description}). These
351 functions are used primarily for making help messages.
353 @defun char-to-string character
354 @cindex character to string
355 This function returns a new string with a length of one character.
356 The value of @var{character}, modulo 256, is used to initialize the
357 element of the string.
359 This function is similar to @code{make-string} with an integer argument
360 of 1. (@xref{Creating Strings}.) This conversion can also be done with
361 @code{format} using the @samp{%c} format specification.
362 (@xref{Formatting Strings}.)
367 (char-to-string (+ 256 ?x))
374 @defun string-to-char string
375 @cindex string to character
376 This function returns the first character in @var{string}. If the
377 string is empty, the function returns 0. The value is also 0 when the
first character of @var{string} is the null character, @sc{ASCII} code
0.

@example
(string-to-char "ABC")
     @result{} 65
(string-to-char "xyz")
     @result{} 120
(string-to-char "\000")
     @result{} 0
@end example

This function may be eliminated in the future if it does not seem useful
enough to retain.
396 @defun number-to-string number
397 @cindex integer to string
398 @cindex integer to decimal
399 This function returns a string consisting of the printed
400 representation of @var{number}, which may be an integer or a floating
point number.  The value starts with a sign if the argument is
negative.

@example
(number-to-string 256)
     @result{} "256"
(number-to-string -23)
     @result{} "-23"
(number-to-string -23.5)
     @result{} "-23.5"
@end example
413 @cindex int-to-string
414 @code{int-to-string} is a semi-obsolete alias for this function.
416 See also the function @code{format} in @ref{Formatting Strings}.
419 @defun string-to-number string
420 @cindex string to number
421 This function returns the numeric value of the characters in
422 @var{string}, read in base ten. It skips spaces and tabs at the
423 beginning of @var{string}, then reads as much of @var{string} as it can
424 interpret as a number. (On some systems it ignores other whitespace at
425 the beginning, not just spaces and tabs.) If the first character after
the ignored whitespace is not a digit or a minus sign, this function
returns 0.

@example
(string-to-number "256")
     @result{} 256
(string-to-number "25 is a perfect square.")
     @result{} 25
(string-to-number "X256")
     @result{} 0
(string-to-number "-4.5")
     @result{} -4.5
@end example
440 @findex string-to-int
441 @code{string-to-int} is an obsolete alias for this function.
444 @node Formatting Strings
445 @comment node-name, next, previous, up
446 @section Formatting Strings
447 @cindex formatting strings
448 @cindex strings, formatting them
450 @dfn{Formatting} means constructing a string by substitution of
451 computed values at various places in a constant string. This string
452 controls how the other values are printed as well as where they appear;
453 it is called a @dfn{format string}.
455 Formatting is often useful for computing messages to be displayed. In
456 fact, the functions @code{message} and @code{error} provide the same
457 formatting feature described here; they differ from @code{format} only
458 in how they use the result of formatting.
460 @defun format string &rest objects
461 This function returns a new string that is made by copying
462 @var{string} and then replacing any format specification
463 in the copy with encodings of the corresponding @var{objects}. The
464 arguments @var{objects} are the computed values to be formatted.
467 @cindex @samp{%} in format
468 @cindex format specification
469 A format specification is a sequence of characters beginning with a
470 @samp{%}. Thus, if there is a @samp{%d} in @var{string}, the
471 @code{format} function replaces it with the printed representation of
472 one of the values to be formatted (one of the arguments @var{objects}).
477 (format "The value of fill-column is %d." fill-column)
478 @result{} "The value of fill-column is 72."
482 If @var{string} contains more than one format specification, the
483 format specifications correspond with successive values from
484 @var{objects}. Thus, the first format specification in @var{string}
485 uses the first such value, the second format specification uses the
486 second such value, and so on. Any extra format specifications (those
487 for which there are no corresponding values) cause unpredictable
488 behavior. Any extra values to be formatted are ignored.
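As a small sketch of this correspondence, each specification below
consumes the next value in order, and a value with no matching
specification is dropped.  The message text is illustrative only.

```elisp
(format "%s has %d entries and %d errors." "The log" 12 3)
;; => "The log has 12 entries and 3 errors."
(format "%d" 1 2)       ; the extra value 2 is ignored
;; => "1"
```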
490 Certain format specifications require values of particular types.
491 However, no error is signaled if the value actually supplied fails to
have the expected type.  Instead, the output is likely to be
meaningless.
Here is a table of valid format specifications:

@table @samp
@item %s
Replace the specification with the printed representation of the object,
made without quoting.  Thus, strings are represented by their contents
alone, with no @samp{"} characters, and symbols appear without @samp{\}
characters.

If there is no corresponding object, the empty string is used.

@item %S
Replace the specification with the printed representation of the object,
made with quoting.  Thus, strings are enclosed in @samp{"} characters,
and @samp{\} characters appear where necessary before special characters.

If there is no corresponding object, the empty string is used.

@item %o
@cindex integer to octal
Replace the specification with the base-eight representation of an
integer.

@item %d
Replace the specification with the base-ten representation of an
integer.

@item %x
@cindex integer to hexadecimal
Replace the specification with the base-sixteen representation of an
integer.

@item %c
Replace the specification with the character which is the value given.

@item %e
Replace the specification with the exponential notation for a floating
point number.

@item %f
Replace the specification with the decimal-point notation for a floating
point number.

@item %g
Replace the specification with notation for a floating point number,
using either exponential notation or decimal-point notation, whichever
is shorter.

@item %%
A single @samp{%} is placed in the string.  This format specification is
unusual in that it does not use a value.  For example, @code{(format "%%
%d" 30)} returns @code{"% 30"}.
@end table

Any other format character results in an @samp{Invalid format
operation} error.
552 Here are several examples:
556 (format "The name of this buffer is %s." (buffer-name))
557 @result{} "The name of this buffer is strings.texi."
559 (format "The buffer object prints as %s." (current-buffer))
560 @result{} "The buffer object prints as #<buffer strings.texi>."
562 (format "The octal value of %d is %o,
563 and the hex value is %x." 18 18 18)
564 @result{} "The octal value of 18 is 22,
565 and the hex value is 12."
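A few more short cases, offered as a sketch, contrast printing without
and with quoting and show the character and literal percent
specifications:

```elisp
(format "%s" "foo")          ; contents only, no quote characters
;; => "foo"
(format "%S" "foo")          ; printed with quoting
;; => "\"foo\""
(format "%c%c%c" ?a ?b ?c)   ; each integer is inserted as a character
;; => "abc"
(format "%d%%" 50)           ; %% yields one literal percent sign
;; => "50%"
```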
569 @cindex numeric prefix
572 All the specification characters allow an optional numeric prefix
573 between the @samp{%} and the character. The optional numeric prefix
574 defines the minimum width for the object. If the printed representation
575 of the object contains fewer characters than this, then it is padded.
576 The padding is on the left if the prefix is positive (or starts with
577 zero) and on the right if the prefix is negative. The padding character
578 is normally a space, but if the numeric prefix starts with a zero, zeros
579 are used for padding.
582 (format "%06d is padded on the left with zeros" 123)
583 @result{} "000123 is padded on the left with zeros"
585 (format "%-6d is padded on the right" 123)
     @result{} "123    is padded on the right"
589 @code{format} never truncates an object's printed representation, no
590 matter what width you specify. Thus, you can use a numeric prefix to
specify a minimum spacing between columns with no risk of losing
information.
594 In the following three examples, @samp{%7s} specifies a minimum width
595 of 7. In the first case, the string inserted in place of @samp{%7s} has
596 only 3 letters, so 4 blank spaces are inserted for padding. In the
597 second case, the string @code{"specification"} is 13 letters wide but is
598 not truncated. In the third case, the padding is on the right.
602 (format "The word `%7s' actually has %d letters in it."
603 "foo" (length "foo"))
     @result{} "The word `    foo' actually has 3 letters in it."
608 (format "The word `%7s' actually has %d letters in it."
609 "specification" (length "specification"))
610 @result{} "The word `specification' actually has 13 letters in it."
614 (format "The word `%-7s' actually has %d letters in it."
615 "foo" (length "foo"))
     @result{} "The word `foo    ' actually has 3 letters in it."
621 @comment node-name, next, previous, up
622 @section Character Case
625 @cindex character case
627 The character case functions change the case of single characters or
628 of the contents of strings. The functions convert only alphabetic
629 characters (the letters @samp{A} through @samp{Z} and @samp{a} through
630 @samp{z}); other characters are not altered. The functions do not
631 modify the strings that are passed to them as arguments.
633 The examples below use the characters @samp{X} and @samp{x} which have
634 @sc{ASCII} codes 88 and 120 respectively.
636 @defun downcase string-or-char
637 This function converts a character or a string to lower case.
639 When the argument to @code{downcase} is a string, the function creates
640 and returns a new string in which each letter in the argument that is
641 upper case is converted to lower case. When the argument to
642 @code{downcase} is a character, @code{downcase} returns the
643 corresponding lower case character. This value is an integer. If the
644 original character is lower case, or is not a letter, then the value
645 equals the original character.
648 (downcase "The cat in the hat")
649 @result{} "the cat in the hat"
656 @defun upcase string-or-char
657 This function converts a character or a string to upper case.
659 When the argument to @code{upcase} is a string, the function creates
660 and returns a new string in which each letter in the argument that is
661 lower case is converted to upper case.
663 When the argument to @code{upcase} is a character, @code{upcase}
664 returns the corresponding upper case character. This value is an integer.
665 If the original character is upper case, or is not a letter, then the
666 value equals the original character.
669 (upcase "The cat in the hat")
670 @result{} "THE CAT IN THE HAT"
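When the argument is a character, the result is an integer.  The
following sketch uses the codes given earlier for @samp{X} and @samp{x},
assumes the standard case table is in effect, and also checks that a
string argument is left unmodified.  The variable name @code{s} is only
illustrative.

```elisp
(downcase ?X)
;; => 120
(upcase ?x)
;; => 88
(upcase ?1)             ; not a letter, so the value is unchanged
;; => 49
(let ((s "Mixed Case"))
  (list (upcase s) s))  ; the argument string itself is not modified
;; => ("MIXED CASE" "Mixed Case")
```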
677 @defun capitalize string-or-char
678 @cindex capitalization
679 This function capitalizes strings or characters. If
680 @var{string-or-char} is a string, the function creates and returns a new
681 string, whose contents are a copy of @var{string-or-char} in which each
682 word has been capitalized. This means that the first character of each
word is converted to upper case, and the rest are converted to lower
case.
686 The definition of a word is any sequence of consecutive characters that
687 are assigned to the word constituent syntax class in the current syntax
table (@pxref{Syntax Class Table}).
690 When the argument to @code{capitalize} is a character, @code{capitalize}
691 has the same result as @code{upcase}.
694 (capitalize "The cat in the hat")
695 @result{} "The Cat In The Hat"
697 (capitalize "THE 77TH-HATTED CAT")
698 @result{} "The 77th-Hatted Cat"
708 @section The Case Table
710 You can customize case conversion by installing a special @dfn{case
711 table}. A case table specifies the mapping between upper case and lower
712 case letters. It affects both the string and character case conversion
713 functions (see the previous section) and those that apply to text in the
714 buffer (@pxref{Case Changes}). You need a case table if you are using a
715 language which has letters other than the standard @sc{ASCII} letters.
717 A case table is a list of this form:
720 (@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences})
724 where each element is either @code{nil} or a string of length 256. The
725 element @var{downcase} says how to map each character to its lower-case
726 equivalent. The element @var{upcase} maps each character to its
727 upper-case equivalent. If lower and upper case characters are in
728 one-to-one correspondence, use @code{nil} for @var{upcase}; then Emacs
729 deduces the upcase table from @var{downcase}.
731 For some languages, upper and lower case letters are not in one-to-one
732 correspondence. There may be two different lower case letters with the
733 same upper case equivalent. In these cases, you need to specify the
734 maps for both directions.
736 The element @var{canonicalize} maps each character to a canonical
737 equivalent; any two characters that are related by case-conversion have
738 the same canonical equivalent character.
The element @var{equivalences} is a map that cyclically permutes each
741 equivalence class (of characters with the same canonical equivalent).
742 (For ordinary @sc{ASCII}, this would map @samp{a} into @samp{A} and
@samp{A} into @samp{a}, and likewise for each set of equivalent
characters.)
746 When you construct a case table, you can provide @code{nil} for
747 @var{canonicalize}; then Emacs fills in this string from @var{upcase}
748 and @var{downcase}. You can also provide @code{nil} for
749 @var{equivalences}; then Emacs fills in this string from
750 @var{canonicalize}. In a case table that is actually in use, those
751 components are non-@code{nil}. Do not try to specify @var{equivalences}
752 without also specifying @var{canonicalize}.
754 Each buffer has a case table. Emacs also has a @dfn{standard case
755 table} which is copied into each buffer when you create the buffer.
756 Changing the standard case table doesn't affect any existing buffers.
758 Here are the functions for working with case tables:
760 @defun case-table-p object
This predicate returns non-@code{nil} if @var{object} is a valid case
table.
765 @defun set-standard-case-table table
766 This function makes @var{table} the standard case table, so that it will
767 apply to any buffers created subsequently.
770 @defun standard-case-table
771 This returns the standard case table.
774 @defun current-case-table
775 This function returns the current buffer's case table.
778 @defun set-case-table table
779 This sets the current buffer's case table to @var{table}.
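The functions above can be exercised without changing anything, as in
this minimal sketch.  It relies only on the functions described in this
section; the exact non-@code{nil} value returned by the predicate is not
assumed.

```elisp
(case-table-p (standard-case-table))   ; expected to be non-nil
(case-table-p "not a case table")      ; expected to be nil
;; Reinstall the current buffer's own table; this changes nothing,
;; but shows how the two functions fit together.
(set-case-table (current-case-table))
```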
782 The following three functions are convenient subroutines for packages
783 that define non-@sc{ASCII} character sets. They modify a string
784 @var{downcase-table} provided as an argument; this should be a string to
785 be used as the @var{downcase} part of a case table. They also modify
786 the standard syntax table. @xref{Syntax Tables}.
788 @defun set-case-syntax-pair uc lc downcase-table
This function specifies a pair of corresponding letters, one upper case
and one lower case.
793 @defun set-case-syntax-delims l r downcase-table
794 This function makes characters @var{l} and @var{r} a matching pair of
795 case-invariant delimiters.
798 @defun set-case-syntax char syntax downcase-table
This function makes @var{char} case-invariant, with syntax
@var{syntax}.
803 @deffn Command describe-buffer-case-table
This command displays a description of the contents of the current
buffer's case table.
810 You can load the library @file{iso-syntax} to set up the standard syntax
811 table and define a case table for the 8-bit ISO Latin 1 character set.