lispref/strings.texi

   1 @c -*-texinfo-*-
   2 @c This is part of the GNU Emacs Lisp Reference Manual.
   3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
   4 @c See the file elisp.texi for copying conditions.
   5 @setfilename ../info/strings
   6 @node Strings and Characters, Lists, Numbers, Top
   7 @comment  node-name,  next,  previous,  up
   8 @chapter Strings and Characters
   9 @cindex strings
  10 @cindex character arrays
  11 @cindex characters
  12 @cindex bytes
  13
  14   A string in Emacs Lisp is an array that contains an ordered sequence
  15 of characters.  Strings are used as names of symbols, buffers, and
  16 files, to send messages to users, to hold text being copied between
  17 buffers, and for many other purposes.  Because strings are so important,
  18 Emacs Lisp has many functions expressly for manipulating them.  Emacs
  19 Lisp programs use strings more often than individual characters.
  20
  21   @xref{Strings of Events}, for special considerations for strings of
  22 keyboard character events.
  23
  24 @menu
  25 * Basics: String Basics.      Basic properties of strings and characters.
  26 * Predicates for Strings::    Testing whether an object is a string or char.
  27 * Creating Strings::          Functions to allocate new strings.
  28 * Text Comparison::           Comparing characters or strings.
   29 * String Conversion::         Converting between characters, strings and integers.
  30 * Formatting Strings::        @code{format}: Emacs's analog of @code{printf}.
  31 * Character Case::            Case conversion functions.
  32 * Case Table::                Customizing case conversion.
  33 @end menu
  34
  35 @node String Basics
  36 @section String and Character Basics
  37
  38   Strings in Emacs Lisp are arrays that contain an ordered sequence of
  39 characters.  Characters are represented in Emacs Lisp as integers;
  40 whether an integer was intended as a character or not is determined only
  41 by how it is used.  Thus, strings really contain integers.
  42
  43   The length of a string (like any array) is fixed and independent of
  44 the string contents, and cannot be altered.  Strings in Lisp are
  45 @emph{not} terminated by a distinguished character code.  (By contrast,
  46 strings in C are terminated by a character with @sc{ASCII} code 0.)
  47 This means that any character, including the null character (@sc{ASCII}
  48 code 0), is a valid element of a string.@refill
  49
  50   Since strings are considered arrays, you can operate on them with the
  51 general array functions.  (@xref{Sequences Arrays Vectors}.)  For
  52 example, you can access or change individual characters in a string
  53 using the functions @code{aref} and @code{aset} (@pxref{Array
  54 Functions}).
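
  As a brief illustration of the points above (the variable name
@code{w} is chosen only for this sketch), the following forms build a
fresh string, change one element with @code{aset}, and show that a null
character counts as an ordinary element:

@example
@group
(setq w (make-string 3 ?a))
     @result{} "aaa"
(aset w 1 ?b)
     @result{} 98
w
     @result{} "aba"
(aref w 1)
     @result{} 98
@end group

@group
;; The null character is an ordinary element.
(length "a\000b")
     @result{} 3
@end group
@end example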
  55
   56   Each character in a string is stored in a single byte.  Therefore,
   57 numbers outside the range 0 to 255 are truncated (taken modulo 256)
   58 when stored into a string.  The one-byte representation also means that
   59 a string takes up much less memory than a vector of the same length.
  60
  61   Sometimes key sequences are represented as strings.  When a string is
  62 a key sequence, string elements in the range 128 to 255 represent meta
  63 characters (which are extremely large integers) rather than keyboard
  64 events in the range 128 to 255.
  65
  66   Strings cannot hold characters that have the hyper, super or alt
  67 modifiers; they can hold @sc{ASCII} control characters, but no other
  68 control characters.  They do not distinguish case in @sc{ASCII} control
  69 characters.  @xref{Character Type}, for more information about
  70 representation of meta and other modifiers for keyboard input
  71 characters.
  72
  73   Like a buffer, a string can contain text properties for the characters
  74 in it, as well as the characters themselves.  @xref{Text Properties}.
  75
  76   @xref{Text}, for information about functions that display strings or
  77 copy them into buffers.  @xref{Character Type}, and @ref{String Type},
  78 for information about the syntax of characters and strings.
  79
  80 @node Predicates for Strings
  81 @section The Predicates for Strings
  82
  83 For more information about general sequence and array predicates,
  84 see @ref{Sequences Arrays Vectors}, and @ref{Arrays}.
  85
  86 @defun stringp object
  87   This function returns @code{t} if @var{object} is a string, @code{nil}
  88 otherwise.
  89 @end defun
  90
  91 @defun char-or-string-p object
  92   This function returns @code{t} if @var{object} is a string or a
  93 character (i.e., an integer), @code{nil} otherwise.
  94 @end defun
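
  For illustration, here is how the two predicates treat a few sample
objects:

@example
@group
(stringp "abc")
     @result{} t
(stringp ?a)
     @result{} nil
(char-or-string-p ?a)
     @result{} t
(char-or-string-p 'a)
     @result{} nil
@end group
@end example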
  95
  96 @node Creating Strings
  97 @section Creating Strings
  98
  99   The following functions create strings, either from scratch, or by
 100 putting strings together, or by taking them apart.
 101
 102 @defun make-string count character
 103   This function returns a string made up of @var{count} repetitions of
 104 @var{character}.  If @var{count} is negative, an error is signaled.
 105
 106 @example
 107 (make-string 5 ?x)
 108      @result{} "xxxxx"
 109 (make-string 0 ?x)
 110      @result{} ""
 111 @end example
 112
 113   Other functions to compare with this one include @code{char-to-string}
 114 (@pxref{String Conversion}), @code{make-vector} (@pxref{Vectors}), and
 115 @code{make-list} (@pxref{Building Lists}).
 116 @end defun
 117
 118 @defun substring string start &optional end
 119   This function returns a new string which consists of those characters
 120 from @var{string} in the range from (and including) the character at the
 121 index @var{start} up to (but excluding) the character at the index
 122 @var{end}.  The first character is at index zero.
 123
 124 @example
 125 @group
 126 (substring "abcdefg" 0 3)
 127      @result{} "abc"
 128 @end group
 129 @end example
 130
 131 @noindent
 132 Here the index for @samp{a} is 0, the index for @samp{b} is 1, and the
 133 index for @samp{c} is 2.  Thus, three letters, @samp{abc}, are copied
 134 from the string @code{"abcdefg"}.  The index 3 marks the character
 135 position up to which the substring is copied.  The character whose index
 136 is 3 is actually the fourth character in the string.
 137
 138 A negative number counts from the end of the string, so that @minus{}1
 139 signifies the index of the last character of the string.  For example:
 140
 141 @example
 142 @group
 143 (substring "abcdefg" -3 -1)
 144      @result{} "ef"
 145 @end group
 146 @end example
 147
 148 @noindent
 149 In this example, the index for @samp{e} is @minus{}3, the index for
 150 @samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1.
 151 Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded.
 152
 153 When @code{nil} is used as an index, it stands for the length of the
 154 string.  Thus,
 155
 156 @example
 157 @group
 158 (substring "abcdefg" -3 nil)
 159      @result{} "efg"
 160 @end group
 161 @end example
 162
 163 Omitting the argument @var{end} is equivalent to specifying @code{nil}.
 164 It follows that @code{(substring @var{string} 0)} returns a copy of all
 165 of @var{string}.
 166
 167 @example
 168 @group
 169 (substring "abcdefg" 0)
 170      @result{} "abcdefg"
 171 @end group
 172 @end example
 173
 174 @noindent
 175 But we recommend @code{copy-sequence} for this purpose (@pxref{Sequence
 176 Functions}).
 177
 178 A @code{wrong-type-argument} error is signaled if either @var{start} or
 179 @var{end} is not an integer or @code{nil}.  An @code{args-out-of-range}
 180 error is signaled if @var{start} indicates a character following
 181 @var{end}, or if either integer is out of range for @var{string}.
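
As a sketch of how a caller might observe these errors, the following
forms use @code{condition-case} to return the name of the error
condition instead of letting the error propagate:

@example
@group
(condition-case err
    (substring "abc" 2 1)
  (error (car err)))
     @result{} args-out-of-range
@end group

@group
(condition-case err
    (substring "abc" "1")
  (error (car err)))
     @result{} wrong-type-argument
@end group
@end example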
 182
 183 Contrast this function with @code{buffer-substring} (@pxref{Buffer
 184 Contents}), which returns a string containing a portion of the text in
 185 the current buffer.  The beginning of a string is at index 0, but the
 186 beginning of a buffer is at index 1.
 187 @end defun
 188
 189 @defun concat &rest sequences
 190 @cindex copying strings
 191 @cindex concatenating strings
 192 This function returns a new string consisting of the characters in the
 193 arguments passed to it.  The arguments may be strings, lists of numbers,
 194 or vectors of numbers; they are not themselves changed.  If
 195 @code{concat} receives no arguments, it returns an empty string.
 196
 197 @example
 198 (concat "abc" "-def")
 199      @result{} "abc-def"
 200 (concat "abc" (list 120 (+ 256 121)) [122])
 201      @result{} "abcxyz"
 202 ;; @r{@code{nil} is an empty sequence.}
 203 (concat "abc" nil "-def")
 204      @result{} "abc-def"
 205 (concat "The " "quick brown " "fox.")
 206      @result{} "The quick brown fox."
 207 (concat)
 208      @result{} ""
 209 @end example
 210
 211 @noindent
 212 The second example above shows how characters stored in strings are
 213 taken modulo 256.  In other words, each character in the string is
 214 stored in one byte.
 215
 216 The @code{concat} function always constructs a new string that is
 217 not @code{eq} to any existing string.
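
For example (the variable name @code{s} is only for illustration), the
result of @code{concat} has the same contents as its argument but, as
stated above, is a distinct object:

@example
@group
(setq s "abc")
     @result{} "abc"
(equal s (concat s))
     @result{} t
(eq s (concat s))
     @result{} nil
@end group
@end example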
 218
 219 When an argument is an integer (not a sequence of integers), it is
 220 converted to a string of digits making up the decimal printed
 221 representation of the integer.  This special case exists for
 222 compatibility with Mocklisp, and we don't recommend you take advantage
 223 of it.  If you want to convert an integer to digits in this way, use
 224 @code{format} (@pxref{Formatting Strings}) or @code{number-to-string}
 225 (@pxref{String Conversion}).
 226
 227 @example
 228 @group
 229 (concat 137)
 230      @result{} "137"
 231 (concat 54 321)
 232      @result{} "54321"
 233 @end group
 234 @end example
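
Here is a sketch of the recommended alternatives, which make the
conversion explicit rather than relying on the Mocklisp special case:

@example
@group
(concat "Total: " (number-to-string 137))
     @result{} "Total: 137"
(format "Total: %d" 137)
     @result{} "Total: 137"
@end group
@end example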
 235
 236 For information about other concatenation functions, see the
 237 description of @code{mapconcat} in @ref{Mapping Functions},
 238 @code{vconcat} in @ref{Vectors}, and @code{append} in @ref{Building
 239 Lists}.
 240 @end defun
 241
 242 @node Text Comparison
 243 @section Comparison of Characters and Strings
 244 @cindex string equality
 245
 246 @defun char-equal character1 character2
 247 This function returns @code{t} if the arguments represent the same
 248 character, @code{nil} otherwise.  This function ignores differences
 249 in case if @code{case-fold-search} is non-@code{nil}.
 250
 251 @example
 252 (char-equal ?x ?x)
 253      @result{} t
 254 (char-to-string (+ 256 ?x))
 255      @result{} "x"
 256 (char-equal ?x  (+ 256 ?x))
 257      @result{} t
 258 @end example
 259 @end defun
 260
 261 @defun string= string1 string2
 262 This function returns @code{t} if the characters of the two strings
 263 match exactly; case is significant.
 264
 265 @example
 266 (string= "abc" "abc")
 267      @result{} t
 268 (string= "abc" "ABC")
 269      @result{} nil
 270 (string= "ab" "ABC")
 271      @result{} nil
 272 @end example
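
Since @code{string=} always distinguishes case, one way to sketch a
comparison that ignores case is to convert both strings with
@code{downcase} first (@pxref{Character Case}):

@example
@group
(string= (downcase "abc") (downcase "ABC"))
     @result{} t
@end group
@end example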
 273 @end defun
 274
 275 @defun string-equal string1 string2
 276 @code{string-equal} is another name for @code{string=}.
 277 @end defun
 278
 279 @cindex lexical comparison
 280 @defun string< string1 string2
 281 @c (findex string< causes problems for permuted index!!)
 282 This function compares two strings a character at a time.  First it
 283 scans both the strings at once to find the first pair of corresponding
 284 characters that do not match.  If the lesser character of those two is
 285 the character from @var{string1}, then @var{string1} is less, and this
 286 function returns @code{t}.  If the lesser character is the one from
 287 @var{string2}, then @var{string1} is greater, and this function returns
 288 @code{nil}.  If the two strings match entirely, the value is @code{nil}.
 289
 290 Pairs of characters are compared by their @sc{ASCII} codes.  Keep in
 291 mind that lower case letters have higher numeric values in the
  292 @sc{ASCII} character set than their upper case counterparts; digits and
  293 many punctuation characters have lower numeric values than upper case
  294 letters.
 295
 296 @example
 297 @group
 298 (string< "abc" "abd")
 299      @result{} t
 300 (string< "abd" "abc")
 301      @result{} nil
 302 (string< "123" "abc")
 303      @result{} t
 304 @end group
 305 @end example
 306
 307 When the strings have different lengths, and they match up to the
 308 length of @var{string1}, then the result is @code{t}.  If they match up
 309 to the length of @var{string2}, the result is @code{nil}.  A string of
 310 no characters is less than any other string.
 311
 312 @example
 313 @group
 314 (string< "" "abc")
 315      @result{} t
 316 (string< "ab" "abc")
 317      @result{} t
 318 (string< "abc" "")
 319      @result{} nil
 320 (string< "abc" "ab")
 321      @result{} nil
 322 (string< "" "")
 323      @result{} nil
 324 @end group
 325 @end example
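
Because @code{string<} takes two arguments and returns non-@code{nil}
when the first is less, it can serve as the predicate for @code{sort}.
The following sketch sorts a freshly made list; note that the upper case
@samp{A} sorts before the lower case letters:

@example
@group
(sort (list "pear" "Apple" "fig") 'string<)
     @result{} ("Apple" "fig" "pear")
@end group
@end example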
 326 @end defun
 327
 328 @defun string-lessp string1 string2
 329 @code{string-lessp} is another name for @code{string<}.
 330 @end defun
 331
 332   See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for
 333 a way to compare text in buffers.  The function @code{string-match},
 334 which matches a regular expression against a string, can be used
 335 for a kind of string comparison; see @ref{Regexp Search}.
 336
 337 @node String Conversion
 338 @comment  node-name,  next,  previous,  up
 339 @section Conversion of Characters and Strings
 340 @cindex conversion of strings
 341
 342   This section describes functions for conversions between characters,
 343 strings and integers.  @code{format} and @code{prin1-to-string}
 344 (@pxref{Output Functions}) can also convert Lisp objects into strings.
 345 @code{read-from-string} (@pxref{Input Functions}) can ``convert'' a
 346 string representation of a Lisp object into an object.
 347
 348   @xref{Documentation}, for functions that produce textual descriptions
 349 of text characters and general input events
 350 (@code{single-key-description} and @code{text-char-description}).  These
 351 functions are used primarily for making help messages.
 352
 353 @defun char-to-string character
 354 @cindex character to string
 355   This function returns a new string with a length of one character.
 356 The value of @var{character}, modulo 256, is used to initialize the
 357 element of the string.
 358
 359 This function is similar to @code{make-string} with an integer argument
 360 of 1.  (@xref{Creating Strings}.)  This conversion can also be done with
 361 @code{format} using the @samp{%c} format specification.
 362 (@xref{Formatting Strings}.)
 363
 364 @example
 365 (char-to-string ?x)
 366      @result{} "x"
 367 (char-to-string (+ 256 ?x))
 368      @result{} "x"
 369 (make-string 1 ?x)
 370      @result{} "x"
 371 @end example
 372 @end defun
 373
 374 @defun string-to-char string
 375 @cindex string to character
 376   This function returns the first character in @var{string}.  If the
 377 string is empty, the function returns 0.  The value is also 0 when the
 378 first character of @var{string} is the null character, @sc{ASCII} code
 379 0.
 380
 381 @example
 382 (string-to-char "ABC")
 383      @result{} 65
 384 (string-to-char "xyz")
 385      @result{} 120
 386 (string-to-char "")
 387      @result{} 0
 388 (string-to-char "\000")
 389      @result{} 0
 390 @end example
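
For comparison, here is a sketch that uses @code{aref} (@pxref{Array
Functions}) to fetch the first character.  Unlike
@code{string-to-char}, @code{aref} signals an error for an empty string
rather than returning 0:

@example
@group
(aref "xyz" 0)
     @result{} 120
(condition-case err
    (aref "" 0)
  (error (car err)))
     @result{} args-out-of-range
@end group
@end example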
 391
 392 This function may be eliminated in the future if it does not seem useful
 393 enough to retain.
 394 @end defun
 395
 396 @defun number-to-string number
 397 @cindex integer to string
 398 @cindex integer to decimal
 399 This function returns a string consisting of the printed
 400 representation of @var{number}, which may be an integer or a floating
 401 point number.  The value starts with a sign if the argument is
 402 negative.
 403
 404 @example
 405 (number-to-string 256)
 406      @result{} "256"
 407 (number-to-string -23)
 408      @result{} "-23"
 409 (number-to-string -23.5)
 410      @result{} "-23.5"
 411 @end example
 412
  413 @findex int-to-string
 414 @code{int-to-string} is a semi-obsolete alias for this function.
 415
 416 See also the function @code{format} in @ref{Formatting Strings}.
 417 @end defun
 418
 419 @defun string-to-number string
 420 @cindex string to number
 421 This function returns the numeric value of the characters in
 422 @var{string}, read in base ten.  It skips spaces and tabs at the
 423 beginning of @var{string}, then reads as much of @var{string} as it can
 424 interpret as a number.  (On some systems it ignores other whitespace at
 425 the beginning, not just spaces and tabs.)  If the first character after
 426 the ignored whitespace is not a digit or a minus sign, this function
 427 returns 0.
 428
 429 @example
 430 (string-to-number "256")
 431      @result{} 256
 432 (string-to-number "25 is a perfect square.")
 433      @result{} 25
 434 (string-to-number "X256")
 435      @result{} 0
 436 (string-to-number "-4.5")
 437      @result{} -4.5
 438 @end example
 439
 440 @findex string-to-int
 441 @code{string-to-int} is an obsolete alias for this function.
 442 @end defun
 443
 444 @node Formatting Strings
 445 @comment  node-name,  next,  previous,  up
 446 @section Formatting Strings
 447 @cindex formatting strings
 448 @cindex strings, formatting them
 449
  450   @dfn{Formatting} means constructing a string by substitution of
  451 computed values at various places in a constant string.  This constant
  452 string controls how the other values are printed as well as where they
  453 appear; it is called a @dfn{format string}.
 454
 455   Formatting is often useful for computing messages to be displayed.  In
 456 fact, the functions @code{message} and @code{error} provide the same
 457 formatting feature described here; they differ from @code{format} only
 458 in how they use the result of formatting.
 459
 460 @defun format string &rest objects
 461   This function returns a new string that is made by copying
 462 @var{string} and then replacing any format specification
 463 in the copy with encodings of the corresponding @var{objects}.  The
 464 arguments @var{objects} are the computed values to be formatted.
 465 @end defun
 466
 467 @cindex @samp{%} in format
 468 @cindex format specification
 469   A format specification is a sequence of characters beginning with a
 470 @samp{%}.  Thus, if there is a @samp{%d} in @var{string}, the
 471 @code{format} function replaces it with the printed representation of
 472 one of the values to be formatted (one of the arguments @var{objects}).
 473 For example:
 474
 475 @example
 476 @group
 477 (format "The value of fill-column is %d." fill-column)
 478      @result{} "The value of fill-column is 72."
 479 @end group
 480 @end example
 481
 482   If @var{string} contains more than one format specification, the
 483 format specifications correspond with successive values from
 484 @var{objects}.  Thus, the first format specification in @var{string}
 485 uses the first such value, the second format specification uses the
 486 second such value, and so on.  Any extra format specifications (those
 487 for which there are no corresponding values) cause unpredictable
 488 behavior.  Any extra values to be formatted are ignored.
 489
 490   Certain format specifications require values of particular types.
 491 However, no error is signaled if the value actually supplied fails to
 492 have the expected type.  Instead, the output is likely to be
 493 meaningless.
 494
 495   Here is a table of valid format specifications:
 496
 497 @table @samp
 498 @item %s
 499 Replace the specification with the printed representation of the object,
 500 made without quoting.  Thus, strings are represented by their contents
 501 alone, with no @samp{"} characters, and symbols appear without @samp{\}
 502 characters.
 503
 504 If there is no corresponding object, the empty string is used.
 505
 506 @item %S
 507 Replace the specification with the printed representation of the object,
 508 made with quoting.  Thus, strings are enclosed in @samp{"} characters,
 509 and @samp{\} characters appear where necessary before special characters.
 510
 511 If there is no corresponding object, the empty string is used.
 512
 513 @item %o
 514 @cindex integer to octal
 515 Replace the specification with the base-eight representation of an
 516 integer.
 517
 518 @item %d
 519 Replace the specification with the base-ten representation of an
 520 integer.
 521
 522 @item %x
 523 @cindex integer to hexadecimal
 524 Replace the specification with the base-sixteen representation of an
 525 integer.
 526
 527 @item %c
 528 Replace the specification with the character which is the value given.
 529
 530 @item %e
 531 Replace the specification with the exponential notation for a floating
 532 point number.
 533
 534 @item %f
 535 Replace the specification with the decimal-point notation for a floating
 536 point number.
 537
 538 @item %g
 539 Replace the specification with notation for a floating point number,
  540 using either exponential notation or decimal-point notation, whichever
 541 is shorter.
 542
 543 @item %%
 544 A single @samp{%} is placed in the string.  This format specification is
 545 unusual in that it does not use a value.  For example, @code{(format "%%
 546 %d" 30)} returns @code{"% 30"}.
 547 @end table
 548
 549   Any other format character results in an @samp{Invalid format
 550 operation} error.
 551
 552   Here are several examples:
 553
 554 @example
 555 @group
 556 (format "The name of this buffer is %s." (buffer-name))
 557      @result{} "The name of this buffer is strings.texi."
 558
 559 (format "The buffer object prints as %s." (current-buffer))
 560      @result{} "The buffer object prints as #<buffer strings.texi>."
 561
 562 (format "The octal value of %d is %o,
 563          and the hex value is %x." 18 18 18)
 564      @result{} "The octal value of 18 is 22,
 565          and the hex value is 12."
 566 @end group
 567 @end example
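
  As a further sketch, these forms contrast @samp{%s} with @samp{%S} and
show @samp{%c} and @samp{%%}:

@example
@group
(format "%s" "a b")
     @result{} "a b"
(format "%S" "a b")
     @result{} "\"a b\""
@end group

@group
(format "%c%c" ?o ?k)
     @result{} "ok"
(format "%d%%" 50)
     @result{} "50%"
@end group
@end example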
 568
 569 @cindex numeric prefix
 570 @cindex field width
 571 @cindex padding
 572   All the specification characters allow an optional numeric prefix
 573 between the @samp{%} and the character.  The optional numeric prefix
 574 defines the minimum width for the object.  If the printed representation
 575 of the object contains fewer characters than this, then it is padded.
 576 The padding is on the left if the prefix is positive (or starts with
 577 zero) and on the right if the prefix is negative.  The padding character
 578 is normally a space, but if the numeric prefix starts with a zero, zeros
 579 are used for padding.
 580
 581 @example
 582 (format "%06d is padded on the left with zeros" 123)
 583      @result{} "000123 is padded on the left with zeros"
 584
 585 (format "%-6d is padded on the right" 123)
 586      @result{} "123    is padded on the right"
 587 @end example
 588
 589   @code{format} never truncates an object's printed representation, no
 590 matter what width you specify.  Thus, you can use a numeric prefix to
 591 specify a minimum spacing between columns with no risk of losing
 592 information.
 593
  594   In the first two of the following three examples, @samp{%7s} specifies
  595 a minimum width of 7.  In the first case, the string inserted in place of
  596 @samp{%7s} has only 3 letters, so 4 blank spaces are inserted for padding.
  597 In the second case, the string @code{"specification"} is 13 letters wide
  598 but is not truncated.  In the third case, @samp{%-7s} pads on the right.
 599
 600 @smallexample
 601 @group
 602 (format "The word `%7s' actually has %d letters in it."
 603         "foo" (length "foo"))
 604      @result{} "The word `    foo' actually has 3 letters in it."
 605 @end group
 606
 607 @group
 608 (format "The word `%7s' actually has %d letters in it."
 609         "specification" (length "specification"))
 610      @result{} "The word `specification' actually has 13 letters in it."
 611 @end group
 612
 613 @group
 614 (format "The word `%-7s' actually has %d letters in it."
 615         "foo" (length "foo"))
 616      @result{} "The word `foo    ' actually has 3 letters in it."
 617 @end group
 618 @end smallexample
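
  To illustrate the use of a minimum width for columns, here is a small
sketch (the item names and counts are made up) in which a left-justified
@samp{%-8s} field is followed by a right-justified @samp{%5d} field, so
that successive results line up:

@example
@group
(format "%-8s%5d" "apples" 12)
     @result{} "apples     12"
(format "%-8s%5d" "figs" 7)
     @result{} "figs        7"
@end group
@end example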
 619
 620 @node Character Case
 621 @comment node-name, next, previous, up
 622 @section Character Case
 623 @cindex upper case
 624 @cindex lower case
 625 @cindex character case
 626
 627   The character case functions change the case of single characters or
 628 of the contents of strings.  The functions convert only alphabetic
 629 characters (the letters @samp{A} through @samp{Z} and @samp{a} through
 630 @samp{z}); other characters are not altered.  The functions do not
 631 modify the strings that are passed to them as arguments.
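
  For instance, in the following sketch (the variable name @code{s} is
only for illustration), @code{upcase} returns a new string and the
original is unchanged:

@example
@group
(setq s "Hello")
     @result{} "Hello"
(upcase s)
     @result{} "HELLO"
s
     @result{} "Hello"
@end group
@end example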
 632
  633   The examples below use the characters @samp{X} and @samp{x}, which have
  634 @sc{ASCII} codes 88 and 120, respectively.
 635
 636 @defun downcase string-or-char
 637 This function converts a character or a string to lower case.
 638
 639 When the argument to @code{downcase} is a string, the function creates
 640 and returns a new string in which each letter in the argument that is
 641 upper case is converted to lower case.  When the argument to
 642 @code{downcase} is a character, @code{downcase} returns the
 643 corresponding lower case character.  This value is an integer.  If the
 644 original character is lower case, or is not a letter, then the value
 645 equals the original character.
 646
 647 @example
 648 (downcase "The cat in the hat")
 649      @result{} "the cat in the hat"
 650
 651 (downcase ?X)
 652      @result{} 120
 653 @end example
 654 @end defun
 655
 656 @defun upcase string-or-char
 657 This function converts a character or a string to upper case.
 658
 659 When the argument to @code{upcase} is a string, the function creates
 660 and returns a new string in which each letter in the argument that is
 661 lower case is converted to upper case.
 662
 663 When the argument to @code{upcase} is a character, @code{upcase}
 664 returns the corresponding upper case character.  This value is an integer.
 665 If the original character is upper case, or is not a letter, then the
 666 value equals the original character.
 667
 668 @example
 669 (upcase "The cat in the hat")
 670      @result{} "THE CAT IN THE HAT"
 671
 672 (upcase ?x)
 673      @result{} 88
 674 @end example
 675 @end defun
 676
 677 @defun capitalize string-or-char
 678 @cindex capitalization
 679 This function capitalizes strings or characters.  If
 680 @var{string-or-char} is a string, the function creates and returns a new
 681 string, whose contents are a copy of @var{string-or-char} in which each
 682 word has been capitalized.  This means that the first character of each
 683 word is converted to upper case, and the rest are converted to lower
 684 case.
 685
 686 The definition of a word is any sequence of consecutive characters that
 687 are assigned to the word constituent syntax class in the current syntax
  688 table (@pxref{Syntax Class Table}).
 689
 690 When the argument to @code{capitalize} is a character, @code{capitalize}
 691 has the same result as @code{upcase}.
 692
 693 @example
 694 (capitalize "The cat in the hat")
 695      @result{} "The Cat In The Hat"
 696
 697 (capitalize "THE 77TH-HATTED CAT")
 698      @result{} "The 77th-Hatted Cat"
 699
 700 @group
 701 (capitalize ?x)
 702      @result{} 88
 703 @end group
 704 @end example
 705 @end defun
 706
 707 @node Case Table
 708 @section The Case Table
 709
 710   You can customize case conversion by installing a special @dfn{case
 711 table}.  A case table specifies the mapping between upper case and lower
 712 case letters.  It affects both the string and character case conversion
 713 functions (see the previous section) and those that apply to text in the
 714 buffer (@pxref{Case Changes}).  You need a case table if you are using a
 715 language which has letters other than the standard @sc{ASCII} letters.
 716
 717   A case table is a list of this form:
 718
 719 @example
 720 (@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences})
 721 @end example
 722
 723 @noindent
 724 where each element is either @code{nil} or a string of length 256.  The
 725 element @var{downcase} says how to map each character to its lower-case
 726 equivalent.  The element @var{upcase} maps each character to its
 727 upper-case equivalent.  If lower and upper case characters are in
 728 one-to-one correspondence, use @code{nil} for @var{upcase}; then Emacs
 729 deduces the upcase table from @var{downcase}.
 730
 731   For some languages, upper and lower case letters are not in one-to-one
 732 correspondence.  There may be two different lower case letters with the
 733 same upper case equivalent.  In these cases, you need to specify the
 734 maps for both directions.
 735
 736   The element @var{canonicalize} maps each character to a canonical
 737 equivalent; any two characters that are related by case-conversion have
 738 the same canonical equivalent character.
 739
  740   The element @var{equivalences} is a map that cyclically permutes each
 741 equivalence class (of characters with the same canonical equivalent).
 742 (For ordinary @sc{ASCII}, this would map @samp{a} into @samp{A} and
 743 @samp{A} into @samp{a}, and likewise for each set of equivalent
 744 characters.)
 745
 746   When you construct a case table, you can provide @code{nil} for
 747 @var{canonicalize}; then Emacs fills in this string from @var{upcase}
 748 and @var{downcase}.  You can also provide @code{nil} for
 749 @var{equivalences}; then Emacs fills in this string from
 750 @var{canonicalize}.  In a case table that is actually in use, those
 751 components are non-@code{nil}.  Do not try to specify @var{equivalences}
 752 without also specifying @var{canonicalize}.
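
  As a rough sketch of the list form described above (it has not been
checked against any particular Emacs release, and the variable names
@code{down} and @code{i} are hypothetical), the following builds a
@var{downcase} string that maps each character to itself except for
@samp{A} through @samp{Z}, and leaves the other three elements
@code{nil} for Emacs to fill in:

@example
@group
(let ((down (make-string 256 0))
      (i 0))
  ;; Start with the identity mapping.
  (while (< i 256)
    (aset down i i)
    (setq i (1+ i)))
  ;; Map the upper case ASCII letters to lower case.
  (setq i ?A)
  (while (<= i ?Z)
    (aset down i (+ i (- ?a ?A)))
    (setq i (1+ i)))
  ;; The value is a list in case-table form.
  (list down nil nil nil))
@end group
@end example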
 753
 754   Each buffer has a case table.  Emacs also has a @dfn{standard case
 755 table} which is copied into each buffer when you create the buffer.
 756 Changing the standard case table doesn't affect any existing buffers.
 757
 758   Here are the functions for working with case tables:
 759
 760 @defun case-table-p object
 761 This predicate returns non-@code{nil} if @var{object} is a valid case
 762 table.
 763 @end defun
 764
 765 @defun set-standard-case-table table
 766 This function makes @var{table} the standard case table, so that it will
 767 apply to any buffers created subsequently.
 768 @end defun
 769
 770 @defun standard-case-table
 771 This returns the standard case table.
 772 @end defun
 773
 774 @defun current-case-table
 775 This function returns the current buffer's case table.
 776 @end defun
 777
 778 @defun set-case-table table
 779 This sets the current buffer's case table to @var{table}.
 780 @end defun
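
  For example, a program might install a case table temporarily and
then restore the previous one.  The following is only a sketch; the
function name @code{my-call-with-case-table} is hypothetical, and it
assumes that @var{function} does not switch buffers:

@example
@group
(defun my-call-with-case-table (table function)
  "Call FUNCTION with TABLE as the current buffer's case table.
The previous case table is restored afterward."
  (let ((old (current-case-table)))
    (unwind-protect
        (progn
          (set-case-table table)
          (funcall function))
      ;; Restore the old table even if FUNCTION signals an error.
      (set-case-table old))))
@end group
@end example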
 781
 782   The following three functions are convenient subroutines for packages
 783 that define non-@sc{ASCII} character sets.  They modify a string
 784 @var{downcase-table} provided as an argument; this should be a string to
 785 be used as the @var{downcase} part of a case table.  They also modify
 786 the standard syntax table.  @xref{Syntax Tables}.
 787
 788 @defun set-case-syntax-pair uc lc downcase-table
 789 This function specifies a pair of corresponding letters, one upper case
 790 and one lower case.
 791 @end defun
 792
 793 @defun set-case-syntax-delims l r downcase-table
 794 This function makes characters @var{l} and @var{r} a matching pair of
 795 case-invariant delimiters.
 796 @end defun
 797
 798 @defun set-case-syntax char syntax downcase-table
 799 This function makes @var{char} case-invariant, with syntax
 800 @var{syntax}.
 801 @end defun
 802
 803 @deffn Command describe-buffer-case-table
 804 This command displays a description of the contents of the current
 805 buffer's case table.
 806 @end deffn
 807
 808 @cindex ISO Latin 1
 809 @pindex iso-syntax
 810 You can load the library @file{iso-syntax} to set up the standard syntax
 811 table and define a case table for the 8-bit ISO Latin 1 character set.