1 @node Character Handling, String and Array Utilities, Memory Allocation, Top
2 @c %MENU% Character testing and conversion functions
3 @chapter Character Handling
5 Programs that work with characters and strings often need to classify a
6 character---is it alphabetic, is it a digit, is it whitespace, and so
7 on---and perform case conversion operations on characters. The
functions in the header file @file{ctype.h} are provided for this
purpose.
12 Since the choice of locale and character set can alter the
13 classifications of particular character codes, all of these functions
14 are affected by the current locale. (More precisely, they are affected
15 by the locale currently selected for character classification---the
16 @code{LC_CTYPE} category; see @ref{Locale Categories}.)
The @w{ISO C} standard specifies two different sets of functions. One
set works on @code{char} type characters, the other on
@code{wchar_t} wide characters (@pxref{Extended Char Intro}).
@menu
* Classification of Characters::       Testing whether characters are
                                        letters, digits, punctuation, etc.
* Case Conversion::                    Case mapping, and the like.
* Classification of Wide Characters::  Character class determination for
                                        wide characters.
* Using Wide Char Classes::            Notes on using the wide character
                                        classes.
* Wide Character Case Conversion::     Mapping of wide characters.
@end menu
34 @node Classification of Characters, Case Conversion, , Character Handling
35 @section Classification of Characters
36 @cindex character testing
37 @cindex classification of characters
38 @cindex predicates on characters
39 @cindex character predicates
41 This section explains the library functions for classifying characters.
42 For example, @code{isalpha} is the function to test for an alphabetic
43 character. It takes one argument, the character to test, and returns a
44 nonzero integer if the character is alphabetic, and zero otherwise. You
45 would use it like this:
@smallexample
if (isalpha (c))
  printf ("The character `%c' is alphabetic.\n", c);
@end smallexample
52 Each of the functions in this section tests for membership in a
53 particular class of characters; each has a name starting with @samp{is}.
54 Each of them takes one argument, which is a character to test, and
55 returns an @code{int} which is treated as a boolean value. The
56 character argument is passed as an @code{int}, and it may be the
57 constant value @code{EOF} instead of a real character.
59 The attributes of any given character can vary between locales.
60 @xref{Locales}, for more information on locales.@refill
62 These functions are declared in the header file @file{ctype.h}.
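As a minimal sketch of how these predicates might be combined, the following tallies three character classes in a string. The names @code{struct char_counts} and @code{count_classes} are hypothetical and not part of the library. The sketch assumes the default @code{"C"} locale, and it converts each @code{char} to @code{unsigned char} first so that a negative value other than @code{EOF} is never passed to the predicates.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper type: tallies of three character classes.  */
struct char_counts
{
  size_t letters;
  size_t digits;
  size_t spaces;
};

/* Hypothetical helper: classify every character of the string S.  */
struct char_counts
count_classes (const char *s)
{
  struct char_counts counts = { 0, 0, 0 };

  assert (s != NULL);
  for (; *s != '\0'; ++s)
    {
      /* Convert to unsigned char so a negative char is never passed.  */
      unsigned char c = (unsigned char) *s;

      if (isalpha (c))
        ++counts.letters;
      else if (isdigit (c))
        ++counts.digits;
      else if (isspace (c))
        ++counts.spaces;
    }
  return counts;
}
```

Characters that fall into none of the three classes, such as punctuation, are simply skipped.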
65 @cindex lower-case character
68 @deftypefun int islower (int @var{c})
Returns true if @var{c} is a lower-case letter. The letter need not be
from the Latin alphabet; any representable alphabet is valid.
73 @cindex upper-case character
76 @deftypefun int isupper (int @var{c})
Returns true if @var{c} is an upper-case letter. The letter need not be
from the Latin alphabet; any representable alphabet is valid.
81 @cindex alphabetic character
84 @deftypefun int isalpha (int @var{c})
85 Returns true if @var{c} is an alphabetic character (a letter). If
86 @code{islower} or @code{isupper} is true of a character, then
87 @code{isalpha} is also true.
89 In some locales, there may be additional characters for which
90 @code{isalpha} is true---letters which are neither upper case nor lower
91 case. But in the standard @code{"C"} locale, there are no such
92 additional characters.
95 @cindex digit character
96 @cindex decimal digit character
99 @deftypefun int isdigit (int @var{c})
100 Returns true if @var{c} is a decimal digit (@samp{0} through @samp{9}).
103 @cindex alphanumeric character
106 @deftypefun int isalnum (int @var{c})
107 Returns true if @var{c} is an alphanumeric character (a letter or
108 number); in other words, if either @code{isalpha} or @code{isdigit} is
109 true of a character, then @code{isalnum} is also true.
112 @cindex hexadecimal digit character
115 @deftypefun int isxdigit (int @var{c})
116 Returns true if @var{c} is a hexadecimal digit.
117 Hexadecimal digits include the normal decimal digits @samp{0} through
118 @samp{9} and the letters @samp{A} through @samp{F} and
119 @samp{a} through @samp{f}.
122 @cindex punctuation character
125 @deftypefun int ispunct (int @var{c})
126 Returns true if @var{c} is a punctuation character.
This means any printing character that is not alphanumeric or a space
character.
131 @cindex whitespace character
134 @deftypefun int isspace (int @var{c})
135 Returns true if @var{c} is a @dfn{whitespace} character. In the standard
136 @code{"C"} locale, @code{isspace} returns true for only the standard
137 whitespace characters:
160 @cindex blank character
163 @deftypefun int isblank (int @var{c})
164 Returns true if @var{c} is a blank character; that is, a space or a tab.
165 This function is a GNU extension.
168 @cindex graphic character
171 @deftypefun int isgraph (int @var{c})
172 Returns true if @var{c} is a graphic character; that is, a character
that has a glyph associated with it. The whitespace characters are not
considered graphic.
177 @cindex printing character
180 @deftypefun int isprint (int @var{c})
181 Returns true if @var{c} is a printing character. Printing characters
182 include all the graphic characters, plus the space (@samp{ }) character.
185 @cindex control character
188 @deftypefun int iscntrl (int @var{c})
189 Returns true if @var{c} is a control character (that is, a character that
190 is not a printing character).
193 @cindex ASCII character
196 @deftypefun int isascii (int @var{c})
197 Returns true if @var{c} is a 7-bit @code{unsigned char} value that fits
198 into the US/UK ASCII character set. This function is a BSD extension
199 and is also an SVID extension.
202 @node Case Conversion, Classification of Wide Characters, Classification of Characters, Character Handling
203 @section Case Conversion
204 @cindex character case conversion
205 @cindex case conversion of characters
206 @cindex converting case of characters
208 This section explains the library functions for performing conversions
209 such as case mappings on characters. For example, @code{toupper}
210 converts any character to upper case if possible. If the character
211 can't be converted, @code{toupper} returns it unchanged.
213 These functions take one argument of type @code{int}, which is the
214 character to convert, and return the converted character as an
215 @code{int}. If the conversion is not applicable to the argument given,
216 the argument is returned unchanged.
218 @strong{Compatibility Note:} In pre-@w{ISO C} dialects, instead of
219 returning the argument unchanged, these functions may fail when the
220 argument is not suitable for the conversion. Thus for portability, you
may need to write @code{islower(c) ? toupper(c) : c} rather than just
@code{toupper(c)}.
224 These functions are declared in the header file @file{ctype.h}.
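The following is a minimal sketch of applying @code{toupper} to a whole string. The name @code{upcase_string} is hypothetical. Each @code{char} is converted to @code{unsigned char} before the call, since the argument is expected to be a character value or @code{EOF}.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: convert the string S to upper case in place
   and return S.  Characters with no upper-case form are unchanged.  */
char *
upcase_string (char *s)
{
  assert (s != NULL);
  for (char *p = s; *p != '\0'; ++p)
    *p = (char) toupper ((unsigned char) *p);
  return s;
}
```

Because @code{toupper} returns unsuitable arguments unchanged, no separate @code{islower} test is needed here.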
229 @deftypefun int tolower (int @var{c})
230 If @var{c} is an upper-case letter, @code{tolower} returns the corresponding
231 lower-case letter. If @var{c} is not an upper-case letter,
232 @var{c} is returned unchanged.
237 @deftypefun int toupper (int @var{c})
238 If @var{c} is a lower-case letter, @code{toupper} returns the corresponding
239 upper-case letter. Otherwise @var{c} is returned unchanged.
244 @deftypefun int toascii (int @var{c})
245 This function converts @var{c} to a 7-bit @code{unsigned char} value
246 that fits into the US/UK ASCII character set, by clearing the high-order
247 bits. This function is a BSD extension and is also an SVID extension.
252 @deftypefun int _tolower (int @var{c})
253 This is identical to @code{tolower}, and is provided for compatibility
254 with the SVID. @xref{SVID}.@refill
259 @deftypefun int _toupper (int @var{c})
This is identical to @code{toupper}, and is provided for compatibility
with the SVID. @xref{SVID}.@refill
265 @node Classification of Wide Characters, Using Wide Char Classes, Case Conversion, Character Handling
266 @section Character class determination for wide characters
@w{Amendment 1} to @w{ISO C90} defines functions to classify wide
characters. Although the original @w{ISO C90} standard already defined
the type @code{wchar_t}, no functions operating on it were defined.
The design of the classification functions for wide characters
is more general. It allows the set of available classifications to be
extended beyond the set which is always available. The POSIX
standard specifies how the extension can be done, and this is
already implemented in the GNU C library implementation of the
@code{localedef} program.
The character class functions are normally implemented using bitsets.
That is, for the character in question the appropriate bitset is read
from a table, and a test is performed to see whether a certain bit is
set in this bitset. Which bit is tested is determined by the class.
284 For the wide character classification functions this is made visible.
285 There is a type representing the classification, a function to retrieve
286 this value for a specific class, and a function to test using the
287 classification value whether a given character is in this class. On top
288 of this the normal character classification functions as used for
289 @code{char} objects can be defined.
293 @deftp {Data type} wctype_t
The @code{wctype_t} type can hold a value which represents a character
class. The only defined way to generate such a value is by using the
@code{wctype} function.
299 This type is defined in @file{wctype.h}.
304 @deftypefun wctype_t wctype (const char *@var{property})
The @code{wctype} function returns a value representing a class of wide
characters which is identified by the string @var{property}. Besides
some standard properties, each locale can define its own. If no
property with the given name is known in the locale currently selected
for the @code{LC_CTYPE} category, the function returns zero.
312 The properties known in every locale are:
@multitable @columnfractions .25 .25 .25 .25
@item
@code{"alnum"} @tab @code{"alpha"} @tab @code{"cntrl"} @tab @code{"digit"}
@item
@code{"graph"} @tab @code{"lower"} @tab @code{"print"} @tab @code{"punct"}
@item
@code{"space"} @tab @code{"upper"} @tab @code{"xdigit"}
@end multitable
324 This function is declared in @file{wctype.h}.
To test whether a character belongs to one of the non-standard classes,
the @w{ISO C} standard defines a completely new function.
332 @deftypefun int iswctype (wint_t @var{wc}, wctype_t @var{desc})
This function returns a nonzero value if @var{wc} is in the character
class specified by @var{desc}. The @var{desc} value must have been
returned by a previous successful call to @code{wctype}.
338 This function is declared in @file{wctype.h}.
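The two-step lookup described above can be sketched as follows. The name @code{wide_char_in_class} is hypothetical. The sketch treats a zero return from @code{wctype} as ``class unknown in the current locale'' and never passes such a value on to @code{iswctype}.

```c
#include <assert.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical helper: return 1 if WC belongs to the class named
   CLASS_NAME, and 0 if it does not or if the class name is unknown
   in the current locale.  */
int
wide_char_in_class (wint_t wc, const char *class_name)
{
  wctype_t desc;

  assert (class_name != NULL);
  desc = wctype (class_name);
  if (desc == (wctype_t) 0)
    return 0;                   /* No such class in this locale.  */
  return iswctype (wc, desc) != 0;
}
```

A caller that needs to distinguish an unknown class from a character that is not a member would have to test the result of @code{wctype} itself.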
To make it easier to use the commonly used classification functions,
they are defined in the C library. There is no need to use
@code{wctype} if the property string is one of the known character
classes. In some situations it is desirable to construct the property
string, and then it becomes important that @code{wctype} can also handle the
standard classes.
348 @cindex alphanumeric character
351 @deftypefun int iswalnum (wint_t @var{wc})
352 This function returns a nonzero value if @var{wc} is an alphanumeric
353 character (a letter or number); in other words, if either @code{iswalpha}
or @code{iswdigit} is true of a character, then @code{iswalnum} is also
true.
358 This function can be implemented using
361 iswctype (wc, wctype ("alnum"))
365 It is declared in @file{wctype.h}.
368 @cindex alphabetic character
371 @deftypefun int iswalpha (wint_t @var{wc})
372 Returns true if @var{wc} is an alphabetic character (a letter). If
373 @code{iswlower} or @code{iswupper} is true of a character, then
374 @code{iswalpha} is also true.
376 In some locales, there may be additional characters for which
377 @code{iswalpha} is true---letters which are neither upper case nor lower
378 case. But in the standard @code{"C"} locale, there are no such
379 additional characters.
382 This function can be implemented using
385 iswctype (wc, wctype ("alpha"))
389 It is declared in @file{wctype.h}.
392 @cindex control character
395 @deftypefun int iswcntrl (wint_t @var{wc})
396 Returns true if @var{wc} is a control character (that is, a character that
397 is not a printing character).
400 This function can be implemented using
403 iswctype (wc, wctype ("cntrl"))
407 It is declared in @file{wctype.h}.
410 @cindex digit character
413 @deftypefun int iswdigit (wint_t @var{wc})
414 Returns true if @var{wc} is a digit (e.g., @samp{0} through @samp{9}).
415 Please note that this function does not only return a nonzero value for
416 @emph{decimal} digits, but for all kinds of digits. A consequence is
that code like the following will @strong{not} work unconditionally for
wide characters:
while (iswdigit (*wc))
430 This function can be implemented using
433 iswctype (wc, wctype ("digit"))
437 It is declared in @file{wctype.h}.
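Given the caveat above, one cautious way to convert a run of digits to a number is to test the decimal range explicitly instead of relying on the classification function. The following is only a sketch under that assumption. The name @code{parse_decimal} is hypothetical, and the code assumes that the wide characters @code{L'0'} through @code{L'9'} have consecutive values.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: convert the leading decimal digits of WC to an
   int.  Parsing stops at the first character outside L'0'..L'9', or
   before the result would overflow.  */
int
parse_decimal (const wchar_t *wc)
{
  int n = 0;

  assert (wc != NULL);
  while (*wc >= L'0' && *wc <= L'9')
    {
      int digit = (int) (*wc - L'0');

      /* Stop before n * 10 + digit would exceed INT_MAX.  */
      if (n > (INT_MAX - digit) / 10)
        break;
      n = n * 10 + digit;
      ++wc;
    }
  return n;
}
```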
440 @cindex graphic character
443 @deftypefun int iswgraph (wint_t @var{wc})
444 Returns true if @var{wc} is a graphic character; that is, a character
that has a glyph associated with it. The whitespace characters are not
considered graphic.
449 This function can be implemented using
452 iswctype (wc, wctype ("graph"))
456 It is declared in @file{wctype.h}.
459 @cindex lower-case character
462 @deftypefun int iswlower (wint_t @var{wc})
Returns true if @var{wc} is a lower-case letter. The letter need not be
from the Latin alphabet; any representable alphabet is valid.
467 This function can be implemented using
470 iswctype (wc, wctype ("lower"))
474 It is declared in @file{wctype.h}.
477 @cindex printing character
480 @deftypefun int iswprint (wint_t @var{wc})
481 Returns true if @var{wc} is a printing character. Printing characters
482 include all the graphic characters, plus the space (@samp{ }) character.
485 This function can be implemented using
488 iswctype (wc, wctype ("print"))
492 It is declared in @file{wctype.h}.
495 @cindex punctuation character
498 @deftypefun int iswpunct (wint_t @var{wc})
499 Returns true if @var{wc} is a punctuation character.
This means any printing character that is not alphanumeric or a space
character.
504 This function can be implemented using
507 iswctype (wc, wctype ("punct"))
511 It is declared in @file{wctype.h}.
514 @cindex whitespace character
517 @deftypefun int iswspace (wint_t @var{wc})
518 Returns true if @var{wc} is a @dfn{whitespace} character. In the standard
519 @code{"C"} locale, @code{iswspace} returns true for only the standard
520 whitespace characters:
543 This function can be implemented using
546 iswctype (wc, wctype ("space"))
550 It is declared in @file{wctype.h}.
553 @cindex upper-case character
556 @deftypefun int iswupper (wint_t @var{wc})
Returns true if @var{wc} is an upper-case letter. The letter need not be
from the Latin alphabet; any representable alphabet is valid.
561 This function can be implemented using
564 iswctype (wc, wctype ("upper"))
568 It is declared in @file{wctype.h}.
571 @cindex hexadecimal digit character
574 @deftypefun int iswxdigit (wint_t @var{wc})
575 Returns true if @var{wc} is a hexadecimal digit.
576 Hexadecimal digits include the normal decimal digits @samp{0} through
577 @samp{9} and the letters @samp{A} through @samp{F} and
578 @samp{a} through @samp{f}.
581 This function can be implemented using
584 iswctype (wc, wctype ("xdigit"))
588 It is declared in @file{wctype.h}.
The GNU C library also provides a function which is not defined in the
@w{ISO C} standard but which is available as a version for single byte
characters as well.
595 @cindex blank character
598 @deftypefun int iswblank (wint_t @var{wc})
599 Returns true if @var{wc} is a blank character; that is, a space or a tab.
This function is a GNU extension. It is declared in @file{wctype.h}.
603 @node Using Wide Char Classes, Wide Character Case Conversion, Classification of Wide Characters, Character Handling
604 @section Notes on using the wide character classes
The first note is probably not astonishing, but it is still occasionally
a cause of problems. The @code{isw@var{XXX}} functions can be implemented
using macros and, in fact, the GNU C library does this. They are still
609 available as real functions but when the @file{wctype.h} header is
610 included the macros will be used. This is nothing new compared to the
611 @code{char} type versions of these functions.
The second note covers something which is new. It is best
illustrated by a (real-world) example. The first piece of code is an
excerpt from the original code. It is truncated a bit but the intention
should be clear.
620 is_in_class (int c, const char *class)
622 if (strcmp (class, "alnum") == 0)
624 if (strcmp (class, "alpha") == 0)
626 if (strcmp (class, "cntrl") == 0)
Now, with the @code{wctype} and @code{iswctype} functions, one could
avoid the @code{if} cascades. But rewriting the code as follows is wrong:
@smallexample
int
is_in_class (int c, const char *class)
@{
  wctype_t desc = wctype (class);
  return desc ? iswctype ((wint_t) c, desc) : 0;
@}
@end smallexample
The problem is that it is not guaranteed that the wide character
representation of a single-byte character can be found by casting.
In fact, this usually fails miserably. The correct solution for this
problem is to write the code as follows:
@smallexample
int
is_in_class (int c, const char *class)
@{
  wctype_t desc = wctype (class);
  return desc ? iswctype (btowc (c), desc) : 0;
@}
@end smallexample
659 @xref{Converting a Character}, for more information on @code{btowc}.
Please note that this change probably does not improve the performance
of the program much, since the @code{wctype} function still has to make
the string comparisons. But it gets really interesting if the
@code{is_in_class} function is called more than once using the
same class name. In this case the variable @var{desc} could be computed
once and reused for all the calls. Therefore the above form of the
function is probably not the final one.
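One way the reuse of the descriptor might look is sketched below. The name @code{count_in_class} is hypothetical. The class is looked up once with @code{wctype}, and each byte of the string is then converted with @code{btowc} before being tested. The sketch assumes a single-byte string, so bytes for which @code{btowc} returns @code{WEOF} are simply not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the single-byte characters of S that
   belong to the class named CLASS_NAME.  The descriptor is looked up
   once and reused for every character.  */
size_t
count_in_class (const char *s, const char *class_name)
{
  wctype_t desc;
  size_t count = 0;

  assert (s != NULL && class_name != NULL);
  desc = wctype (class_name);
  if (desc == (wctype_t) 0)
    return 0;                   /* Unknown class: nothing can match.  */

  for (; *s != '\0'; ++s)
    {
      /* Convert the byte properly instead of casting it.  */
      wint_t wc = btowc ((unsigned char) *s);

      if (wc != WEOF && iswctype (wc, desc))
        ++count;
    }
  return count;
}
```

Here the string comparison inside @code{wctype} happens once per call rather than once per character.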
669 @node Wide Character Case Conversion, , Using Wide Char Classes, Character Handling
670 @section Mapping of wide characters.
As with the classification functions, the @w{ISO C} standard also
generalizes the mapping functions. Instead of only allowing the two
standard mappings, the locale can contain others. Again, the
@code{localedef} program already supports generating such locale data
files.
680 @deftp {Data Type} wctrans_t
This data type is defined as a scalar type which can hold a value
representing the locale-dependent character mapping. There is no way to
construct such a value apart from using the return value of the
@code{wctrans} function.
688 This type is defined in @file{wctype.h}.
693 @deftypefun wctrans_t wctrans (const char *@var{property})
694 The @code{wctrans} function has to be used to find out whether a named
695 mapping is defined in the current locale selected for the
696 @code{LC_CTYPE} category. If the returned value is non-zero it can
697 afterwards be used in calls to @code{towctrans}. If the return value is
698 zero no such mapping is known in the current locale.
Besides locale-specific mappings, there are two mappings which are
guaranteed to be available in every locale:
@multitable @columnfractions .5 .5
@item
@code{"tolower"} @tab @code{"toupper"}
@end multitable
710 This function is declared in @file{wctype.h}.
715 @deftypefun wint_t towctrans (wint_t @var{wc}, wctrans_t @var{desc})
The @code{towctrans} function maps the input character @var{wc}
according to the rules of the mapping for which @var{desc} is a
descriptor, and returns the value so found. The @var{desc} value must be
obtained by a successful call to @code{wctrans}.
723 This function is declared in @file{wctype.h}.
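As an illustration, a named mapping might be applied to a whole wide string along the following lines. The name @code{map_wide_string} is hypothetical. The sketch looks the mapping up once with @code{wctrans} and refuses to proceed if the current locale does not know it.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: apply the mapping named MAPPING to every
   character of WS in place.  Returns 0 on success, or -1 if the
   mapping is not known in the current locale.  */
int
map_wide_string (wchar_t *ws, const char *mapping)
{
  wctrans_t desc;

  assert (ws != NULL && mapping != NULL);
  desc = wctrans (mapping);
  if (desc == (wctrans_t) 0)
    return -1;
  for (; *ws != L'\0'; ++ws)
    *ws = (wchar_t) towctrans ((wint_t) *ws, desc);
  return 0;
}
```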
The @w{ISO C} standard also defines convenient shortcuts for the
generally available mappings, so that it is not necessary to call
@code{wctrans} for them.
732 @deftypefun wint_t towlower (wint_t @var{wc})
733 If @var{wc} is an upper-case letter, @code{towlower} returns the corresponding
734 lower-case letter. If @var{wc} is not an upper-case letter,
735 @var{wc} is returned unchanged.
738 @code{towlower} can be implemented using
741 towctrans (wc, wctrans ("tolower"))
746 This function is declared in @file{wctype.h}.
751 @deftypefun wint_t towupper (wint_t @var{wc})
752 If @var{wc} is a lower-case letter, @code{towupper} returns the corresponding
753 upper-case letter. Otherwise @var{wc} is returned unchanged.
756 @code{towupper} can be implemented using
759 towctrans (wc, wctrans ("toupper"))
764 This function is declared in @file{wctype.h}.
The same warnings given in the last section for the use of the wide
character classification functions apply here. It is not possible to
simply cast a @code{char} type value to a @code{wint_t} and use it as an
argument for @code{towctrans} calls.
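A sketch of how a single-byte character might be passed through a wide character mapping without casting follows. The name @code{upcase_byte} is hypothetical. The byte is converted with @code{btowc}, mapped with @code{towctrans}, and converted back with @code{wctob}. Whenever any of these steps fails, the sketch falls back to returning the argument unchanged, which is an assumption about the desired behavior rather than anything the library prescribes.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: upper-case the single-byte character C by way
   of the wide character mapping functions.  C is returned unchanged
   if it cannot be converted in either direction.  */
int
upcase_byte (int c)
{
  wctrans_t to_upper = wctrans ("toupper");
  wint_t wc;
  int result;

  assert (c == EOF || (c >= 0 && c <= UCHAR_MAX));
  wc = btowc (c);               /* Proper conversion, not a cast.  */
  if (wc == WEOF || to_upper == (wctrans_t) 0)
    return c;
  result = wctob (towctrans (wc, to_upper));
  return result == EOF ? c : result;
}
```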