manual/locale.texi

   1 @node Locales, Message Translation, Character Set Handling, Top
   2 @c %MENU% The country and language can affect the behavior of library functions
   3 @chapter Locales and Internationalization
   4
   5 Different countries and cultures have varying conventions for how to
   6 communicate.  These conventions range from very simple ones, such as the
   7 format for representing dates and times, to very complex ones, such as
   8 the language spoken.
   9
  10 @cindex internationalization
  11 @cindex locales
  12 @dfn{Internationalization} of software means programming it to be able
  13 to adapt to the user's favorite conventions.  In @w{ISO C},
  14 internationalization works by means of @dfn{locales}.  Each locale
  15 specifies a collection of conventions, one convention for each purpose.
  16 The user chooses a set of conventions by specifying a locale (via
  17 environment variables).
  18
  19 All programs inherit the chosen locale as part of their environment.
  20 Provided the programs are written to obey the choice of locale, they
  21 will follow the conventions preferred by the user.
  22
  23 @menu
  24 * Effects of Locale::           Actions affected by the choice of
  25                                  locale.
  26 * Choosing Locale::             How the user specifies a locale.
  27 * Locale Categories::           Different purposes for which you can
  28                                  select a locale.
  29 * Setting the Locale::          How a program specifies the locale
  30                                  with library functions.
  31 * Standard Locales::            Locale names available on all systems.
  32 * Locale Information::          How to access the information for the locale.
  33 * Formatting Numbers::          A dedicated function to format numbers.
  34 @end menu
  35
  36 @node Effects of Locale, Choosing Locale,  , Locales
  37 @section What Effects a Locale Has
  38
  39 Each locale specifies conventions for several purposes, including the
  40 following:
  41
  42 @itemize @bullet
  43 @item
  44 What multibyte character sequences are valid, and how they are
  45 interpreted (@pxref{Character Set Handling}).
  46
  47 @item
  48 Classification of which characters in the local character set are
  49 considered alphabetic, and upper- and lower-case conversion conventions
  50 (@pxref{Character Handling}).
  51
  52 @item
  53 The collating sequence for the local language and character set
  54 (@pxref{Collation Functions}).
  55
  56 @item
  57 Formatting of numbers and currency amounts (@pxref{General Numeric}).
  58
  59 @item
  60 Formatting of dates and times (@pxref{Formatting Date and Time}).
  61
  62 @item
  63 What language to use for output, including error messages
  64 (@pxref{Message Translation}).
  65
  66 @item
  67 What language to use for user answers to yes-or-no questions.
  68
  69 @item
  70 What language to use for more complex user input.
  71 (The C library doesn't yet help you implement this.)
  72 @end itemize
  73
  74 Some aspects of adapting to the specified locale are handled
  75 automatically by the library subroutines.  For example, all your program
  76 needs to do in order to use the collating sequence of the chosen locale
  77 is to use @code{strcoll} or @code{strxfrm} to compare strings.
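
As a minimal sketch of this, the following comparison function lets
@code{qsort} order an array of strings by the collating sequence of the
current locale.  The names @code{compare_by_locale} and
@code{sort_by_locale} are made up for this illustration; they are not
part of the library.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two elements of an array of strings using the collating
   sequence of the current LC_COLLATE locale.  */
static int
compare_by_locale (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Sort the COUNT strings in NAMES in place.  */
void
sort_by_locale (const char **names, size_t count)
{
  qsort (names, count, sizeof names[0], compare_by_locale);
}
```

The program itself contains no knowledge of any language; the order
changes only when the locale selected for @code{LC_COLLATE} changes.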
  78
  79 Other aspects of locales are beyond the comprehension of the library.
  80 For example, the library can't automatically translate your program's
  81 output messages into other languages.  The only way you can support
  82 output in the user's favorite language is to program this more or less
  83 by hand.  The C library provides functions to handle translations for
  84 multiple languages easily.
  85
  86 This chapter discusses the mechanism by which you can modify the current
  87 locale.  The effects of the current locale on specific library functions
  88 are discussed in more detail in the descriptions of those functions.
  89
  90 @node Choosing Locale, Locale Categories, Effects of Locale, Locales
  91 @section Choosing a Locale
  92
  93 The simplest way for the user to choose a locale is to set the
  94 environment variable @code{LANG}.  This specifies a single locale to use
  95 for all purposes.  For example, a user could specify a hypothetical
  96 locale named @samp{espana-castellano} to use the standard conventions of
  97 most of Spain.
  98
  99 The set of locales supported depends on the operating system you are
 100 using, and so do their names.  We can't make any promises about what
 101 locales will exist, except for one standard locale called @samp{C} or
 102 @samp{POSIX}.  Later we will describe how to construct locales.
 103 @comment (@pxref{Building Locale Files}).
 104
 105 @cindex combining locales
 106 A user also has the option of specifying different locales for different
 107 purposes---in effect, choosing a mixture of multiple locales.
 108
 109 For example, the user might specify the locale @samp{espana-castellano}
 110 for most purposes, but specify the locale @samp{usa-english} for
 111 currency formatting.  This might make sense if the user is a
 112 Spanish-speaking American, working in Spanish, but representing monetary
 113 amounts in US dollars.
 114
 115 Note that both locales @samp{espana-castellano} and @samp{usa-english},
 116 like all locales, would include conventions for all of the purposes to
 117 which locales apply.  However, the user can choose to use each locale
 118 for a particular subset of those purposes.
 119
 120 @node Locale Categories, Setting the Locale, Choosing Locale, Locales
 121 @section Categories of Activities that Locales Affect
 122 @cindex categories for locales
 123 @cindex locale categories
 124
 125 The purposes that locales serve are grouped into @dfn{categories}, so
 126 that a user or a program can choose the locale for each category
 127 independently.  Here is a table of categories; each name is both an
 128 environment variable that a user can set, and a macro name that you can
 129 use as an argument to @code{setlocale}.
 130
 131 @vtable @code
 132 @comment locale.h
 133 @comment ISO
 134 @item LC_COLLATE
 135 This category applies to collation of strings (functions @code{strcoll}
 136 and @code{strxfrm}); see @ref{Collation Functions}.
 137
 138 @comment locale.h
 139 @comment ISO
 140 @item LC_CTYPE
 141 This category applies to classification and conversion of characters,
 142 and to multibyte and wide characters;
 143 see @ref{Character Handling}, and @ref{Character Set Handling}.
 144
 145 @comment locale.h
 146 @comment ISO
 147 @item LC_MONETARY
 148 This category applies to formatting monetary values; see @ref{General Numeric}.
 149
 150 @comment locale.h
 151 @comment ISO
 152 @item LC_NUMERIC
 153 This category applies to formatting numeric values that are not
 154 monetary; see @ref{General Numeric}.
 155
 156 @comment locale.h
 157 @comment ISO
 158 @item LC_TIME
 159 This category applies to formatting date and time values; see
 160 @ref{Formatting Date and Time}.
 161
 162 @comment locale.h
 163 @comment XOPEN
 164 @item LC_MESSAGES
 165 This category applies to selecting the language used in the user
 166 interface for message translation (@pxref{The Uniforum approach};
 167 @pxref{Message catalogs a la X/Open}).
 168
 169 @comment locale.h
 170 @comment ISO
 171 @item LC_ALL
 172 This is not a category; it is only a macro that you can use
 173 with @code{setlocale} to set a single locale for all purposes.  Setting
 174 the environment variable of the same name overrides all selections by the other
 175 @code{LC_*} variables or @code{LANG}.
 176
 177 @comment locale.h
 178 @comment ISO
 179 @item LANG
 180 If this environment variable is defined, its value specifies the locale
 181 to use for all purposes except as overridden by the variables above.
 182 @end vtable
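
The precedence among these variables can be sketched as a plain lookup.
This is only an illustration of the order described above, not the
library's implementation; @code{setlocale} performs the lookup itself
when given an empty locale name.  The function names are hypothetical,
treating an empty value like an unset variable follows POSIX, and the
final fallback to @samp{C} is an assumption made for this example.

```c
#include <stdlib.h>

/* Return the value of the environment variable NAME, or NULL if it is
   unset or empty.  */
static const char *
nonempty_env (const char *name)
{
  const char *value = getenv (name);
  return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Return the locale name the environment selects for the category
   whose variable is CATEGORY_VAR (for example "LC_TIME").  LC_ALL wins
   over the category variable, which wins over LANG.  */
const char *
locale_from_environment (const char *category_var)
{
  const char *value;

  if ((value = nonempty_env ("LC_ALL")) != NULL)
    return value;
  if ((value = nonempty_env (category_var)) != NULL)
    return value;
  if ((value = nonempty_env ("LANG")) != NULL)
    return value;
  return "C";   /* Assumed default for this sketch.  */
}
```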
 183
 184 @vindex LANGUAGE
 185 When the message translation functions were developed, it was felt that
 186 the functionality provided by the variables above is not sufficient.  For
 187 example, it should be possible to specify more than one locale name.
 188 Consider a Swedish user who speaks German better than English, running a
 189 program whose messages are written in English by default.  It should be
 190 possible to specify that the first choice for the language is Swedish, the
 191 second choice is German, and that English is used only if both fail.  This
 192 is possible with the variable @code{LANGUAGE}.  For further description of
 193 this GNU extension see @ref{Using gettextized software}.
 194
 195 @node Setting the Locale, Standard Locales, Locale Categories, Locales
 196 @section How Programs Set the Locale
 197
 198 A C program inherits its locale environment variables when it starts up.
 199 This happens automatically.  However, these variables do not
 200 automatically control the locale used by the library functions, because
 201 @w{ISO C} says that all programs start by default in the standard @samp{C}
 202 locale.  To use the locales specified by the environment, you must call
 203 @code{setlocale}.  Call it as follows:
 204
 205 @smallexample
 206 setlocale (LC_ALL, "");
 207 @end smallexample
 208
 209 @noindent
 210 to select a locale based on the user's choice of the appropriate
 211 environment variables.
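
A program would typically make this call once, early in @code{main}.
The sketch below wraps it in a hypothetical helper,
@code{adopt_user_locale}, that also copes with an environment naming a
locale that is not installed; the warning text is likewise just an
example.

```c
#include <locale.h>
#include <stdio.h>

/* Adopt the locale selected by the environment.  Return the name
   reported by setlocale; if the environment names a locale that is
   not available, return the name of the unchanged current locale.  */
const char *
adopt_user_locale (void)
{
  const char *name = setlocale (LC_ALL, "");

  if (name == NULL)
    {
      /* The program is still in the locale it had before the call.  */
      fputs ("warning: cannot set locale from environment\n", stderr);
      name = setlocale (LC_ALL, NULL);
    }
  return name;
}
```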
 212
 213 @cindex changing the locale
 214 @cindex locale, changing
 215 You can also use @code{setlocale} to specify a particular locale, for
 216 general use or for a specific category.
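
One possible use of a per-category call, shown here only as a sketch:
a program that follows the user's locale for messages and dates but
keeps number formatting in the @samp{C} locale, for instance because it
reads and writes data files with @code{printf} and @code{strtod}.  The
function name is hypothetical.

```c
#include <locale.h>

/* Follow the environment for everything except number formatting,
   which stays in the "C" locale.  Return 0 if the environment's
   locale was adopted, -1 if it was not available.  */
int
use_environment_with_c_numbers (void)
{
  int status = setlocale (LC_ALL, "") != NULL ? 0 : -1;

  /* The "C" locale always exists, so this call is not expected to
     fail.  */
  setlocale (LC_NUMERIC, "C");
  return status;
}
```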
 217
 218 @pindex locale.h
 219 The symbols in this section are defined in the header file @file{locale.h}.
 220
 221 @comment locale.h
 222 @comment ISO
 223 @deftypefun {char *} setlocale (int @var{category}, const char *@var{locale})
 224 The function @code{setlocale} sets the current locale for
 225 category @var{category} to @var{locale}.
 226
 227 If @var{category} is @code{LC_ALL}, this specifies the locale for all
 228 purposes.  The other possible values of @var{category} specify an
 229 individual purpose (@pxref{Locale Categories}).
 230
 231 You can also use this function to find out the current locale by passing
 232 a null pointer as the @var{locale} argument.  In this case,
 233 @code{setlocale} returns a string that is the name of the locale
 234 currently selected for category @var{category}.
 235
 236 The string returned by @code{setlocale} can be overwritten by subsequent
 237 calls, so you should make a copy of the string (@pxref{Copying and
 238 Concatenation}) if you want to save it past any further calls to
 239 @code{setlocale}.  (The standard library is guaranteed never to call
 240 @code{setlocale} itself.)
 241
 242 You should not modify the string returned by @code{setlocale}.
 243 It might be the same string that was passed as an argument in a
 244 previous call to @code{setlocale}.
 245
 246 When you read the current locale for category @code{LC_ALL}, the value
 247 encodes the entire combination of selected locales for all categories.
 248 In this case, the value is not just a single locale name.  In fact, we
 249 don't make any promises about what it looks like.  But if you specify
 250 the same ``locale name'' with @code{LC_ALL} in a subsequent call to
 251 @code{setlocale}, it restores the same combination of locale selections.
 252
 253 To be able to use the string encoding the currently selected locale at
 254 a later time, you must make a copy of the string.  The return value is
 255 not guaranteed to remain valid indefinitely.
 256
 257 When the @var{locale} argument is not a null pointer, the string returned
 258 by @code{setlocale} reflects the newly modified locale.
 259
 260 If you specify an empty string for @var{locale}, this means to read the
 261 appropriate environment variable and use its value to select the locale
 262 for @var{category}.
 263
 264 If a nonempty string is given for @var{locale}, the locale with this name
 265 is used, if this is possible.
 266
 267 If you specify an invalid locale name, @code{setlocale} returns a null
 268 pointer and leaves the current locale unchanged.
 269 @end deftypefun
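
The two behaviors described last, querying with a null pointer and
rejecting an unknown name, can be sketched with a pair of small
helpers.  Both names (@code{current_locale_name} and
@code{try_locale}) are invented for this example.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed copy of the current locale name for CATEGORY,
   or NULL on failure.  The caller frees the result.  */
char *
current_locale_name (int category)
{
  const char *name = setlocale (category, NULL);
  char *copy;

  if (name == NULL)
    return NULL;
  copy = malloc (strlen (name) + 1);
  if (copy != NULL)
    strcpy (copy, name);
  return copy;
}

/* Try to switch CATEGORY to the locale NAME.  Return 1 on success and
   0 if setlocale rejected the name (the locale is then unchanged).  */
int
try_locale (int category, const char *name)
{
  return setlocale (category, name) != NULL;
}
```

Because @code{current_locale_name} returns a private copy, the result
stays usable across later calls to @code{setlocale}.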
 270
 271 Here is an example showing how you might use @code{setlocale} to
 272 temporarily switch to a new locale.
 273
 274 @smallexample
 275 #include <stddef.h>
 276 #include <locale.h>
 277 #include <stdlib.h>
 278 #include <string.h>
 279
 280 void
 281 with_other_locale (const char *new_locale,
 282                    void (*subroutine) (int),
 283                    int argument)
 284 @{
 285   char *old_locale, *saved_locale;
 286
 287   /* @r{Get the name of the current locale.}  */
 288   old_locale = setlocale (LC_ALL, NULL);
 289
 290   /* @r{Copy the name so it won't be clobbered by @code{setlocale}.} */
 291   saved_locale = strdup (old_locale);
 292   if (saved_locale == NULL)
 293     abort ();  /* @r{Out of memory.} */
 294
 295   /* @r{Now change the locale and do some stuff with it.} */
 296   setlocale (LC_ALL, new_locale);
 297   (*subroutine) (argument);
 298
 299   /* @r{Restore the original locale.} */
 300   setlocale (LC_ALL, saved_locale);
 301   free (saved_locale);
 302 @}
 303 @end smallexample
 304
 305 @strong{Portability Note:} Some @w{ISO C} systems may define additional
 306 locale categories and future versions of the library will do so.  For
 307 portability, assume that any symbol beginning with @samp{LC_} might be
 308 defined in @file{locale.h}.
 309
 310 @node Standard Locales, Locale Information, Setting the Locale, Locales
 311 @section Standard Locales
 312
 313 The only locale names you can count on finding on all operating systems
 314 are these three standard ones:
 315
 316 @table @code
 317 @item "C"
 318 This is the standard C locale.  The attributes and behavior it provides
 319 are specified in the @w{ISO C} standard.  When your program starts up, it
 320 initially uses this locale by default.
 321
 322 @item "POSIX"
 323 This is the standard POSIX locale.  Currently, it is an alias for the
 324 standard C locale.
 325
 326 @item ""
 327 The empty name says to select a locale based on environment variables.
 328 @xref{Locale Categories}.
 329 @end table
 330
 331 Defining and installing named locales is normally a responsibility of
 332 the system administrator at your site (or the person who installed the
 333 GNU C library).  It is also possible for the user to create private
 334 locales.  All this will be discussed later, together with the tool for
 335 doing so.
 336 @comment (@pxref{Building Locale Files}).
 337
 338 If your program needs to use something other than the @samp{C} locale,
 339 it will be more portable if you use whatever locale the user specifies
 340 with the environment, rather than trying to specify some non-standard
 341 locale explicitly by name.  Remember, different machines might have
 342 different sets of locales installed.
 343
 344 @node Locale Information, Formatting Numbers, Standard Locales, Locales
 345 @section Accessing the Locale Information
 346
 347 There are several ways to access the locale information.  The simplest
 348 way is to let the C library itself do the work.  Several of the
 349 functions in this library implicitly access the locale data and use
 350 whatever information is available in the currently selected locale.  This is
 351 how the locale model is meant to work normally.
 352
 353 As an example take the @code{strftime} function, which is meant to nicely
 354 format date and time information (@pxref{Formatting Date and Time}).
 355 Part of the standard information contained in the @code{LC_TIME}
 356 category is, e.g., the names of the weekdays and months.  Instead of requiring the
 357 programmer to take care of providing the translations, the
 358 @code{strftime} function does this all by itself.  When @code{%A} is used
 359 in the format string, it is replaced by the appropriate weekday
 360 name of the locale currently selected for @code{LC_TIME}.  This is the
 361 easy part, and wherever possible functions do things automatically as in
 362 this case.
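
A small sketch of the @code{strftime} case just described.  The helper
name @code{weekday_name} is hypothetical; only the @code{tm_wday}
member is filled in because that is all @code{%A} needs.

```c
#include <stddef.h>
#include <time.h>

/* Store the full weekday name for WDAY (0 = Sunday ... 6 = Saturday)
   in BUF, as spelled in the current LC_TIME locale.  Return the
   length of the name, or 0 if it does not fit.  */
size_t
weekday_name (int wday, char *buf, size_t size)
{
  struct tm tm = { 0 };

  tm.tm_wday = wday;
  return strftime (buf, size, "%A", &tm);
}
```

In the @samp{C} locale this yields the English names; after a
successful @code{setlocale} call for another locale, the same code
yields that locale's names.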
 363
 364 But quite often there are situations where there is simply no function
 365 to perform the task, or it is simply not possible to do the work
 366 automatically.  For these cases it is necessary to access the
 367 information in the locale directly.  To do this the C library provides
 368 two functions: @code{localeconv} and @code{nl_langinfo}.  The former is
 369 part of @w{ISO C} and therefore portable, but has a brain-damaged
 370 interface.  The second is part of the Unix interface and is portable
 371 insofar as the system follows the Unix standards.
 372
 373 @menu
 374 * The Lame Way to Locale Data::   ISO C's @code{localeconv}.
 375 * The Elegant and Fast Way::      X/Open's @code{nl_langinfo}.
 376 @end menu
 377
 378 @node The Lame Way to Locale Data, The Elegant and Fast Way, ,Locale Information
 379 @subsection @code{localeconv}: It is portable but @dots{}
 380
 381 Together with the @code{setlocale} function the @w{ISO C} people
 382 invented the @code{localeconv} function.  It is a masterpiece of misdesign.
 383 It is expensive to use, it is not extensible, and it is not generally
 384 usable, as it provides access only to the @code{LC_MONETARY} and
 385 @code{LC_NUMERIC} related information.  If it is applicable to a
 386 certain situation it should nevertheless be used, since it is very
 387 portable.  In general it is better to use the function @code{strfmon},
 388 which formats monetary amounts correctly according to the
 389 selected locale by implicitly using this information.
 390 @pindex locale.h
 391 @cindex monetary value formatting
 392 @cindex numeric value formatting
 393
 394 @comment locale.h
 395 @comment ISO
 396 @deftypefun {struct lconv *} localeconv (void)
 397 The @code{localeconv} function returns a pointer to a structure whose
 398 components contain information about how numeric and monetary values
 399 should be formatted in the current locale.
 400
 401 You should not modify the structure or its contents.  The structure might
 402 be overwritten by subsequent calls to @code{localeconv}, or by calls to
 403 @code{setlocale}, but no other function in the library overwrites this
 404 value.
 405 @end deftypefun
 406
 407 @comment locale.h
 408 @comment ISO
 409 @deftp {Data Type} {struct lconv}
 410 This is the data type of the value returned by @code{localeconv}.  Its
 411 elements are described in the following subsections.
 412 @end deftp
 413
 414 If a member of the structure @code{struct lconv} has type @code{char},
 415 and the value is @code{CHAR_MAX}, it means that the current locale has
 416 no value for that parameter.
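
A minimal sketch of checking for @code{CHAR_MAX} before using such a
member.  The function name and the idea of passing a caller-chosen
fallback are assumptions made for this example.

```c
#include <limits.h>
#include <locale.h>

/* Return the number of fractional digits the current locale wants for
   local monetary amounts, or FALLBACK if the locale does not say.  */
int
local_frac_digits (int fallback)
{
  const struct lconv *lc = localeconv ();

  return lc->frac_digits == CHAR_MAX ? fallback : lc->frac_digits;
}
```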
 417
 418 @menu
 419 * General Numeric::             Parameters for formatting numbers and
 420                                  currency amounts.
 421 * Currency Symbol::             How to print the symbol that identifies an
 422                                  amount of money (e.g. @samp{$}).
 423 * Sign of Money Amount::        How to print the (positive or negative) sign
 424                                  for a monetary amount, if one exists.
 425 @end menu
 426
 427 @node General Numeric, Currency Symbol, , The Lame Way to Locale Data
 428 @subsubsection Generic Numeric Formatting Parameters
 429
 430 These are the standard members of @code{struct lconv}; there may be
 431 others.
 432
 433 @table @code
 434 @item char *decimal_point
 435 @itemx char *mon_decimal_point
 436 These are the decimal-point separators used in formatting non-monetary
 437 and monetary quantities, respectively.  In the @samp{C} locale, the
 438 value of @code{decimal_point} is @code{"."}, and the value of
 439 @code{mon_decimal_point} is @code{""}.
 440 @cindex decimal-point separator
 441
 442 @item char *thousands_sep
 443 @itemx char *mon_thousands_sep
 444 These are the separators used to delimit groups of digits to the left of
 445 the decimal point in formatting non-monetary and monetary quantities,
 446 respectively.  In the @samp{C} locale, both members have a value of
 447 @code{""} (the empty string).
 448
 449 @item char *grouping
 450 @itemx char *mon_grouping
 451 These are strings that specify how to group the digits to the left of
 452 the decimal point.  @code{grouping} applies to non-monetary quantities
 453 and @code{mon_grouping} applies to monetary quantities.  Use either
 454 @code{thousands_sep} or @code{mon_thousands_sep} to separate the digit
 455 groups.
 456 @cindex grouping of digits
 457
 458 Each element of these strings is to be interpreted as an integer value of
 459 type @code{char}.  Successive elements (from left to right) give the sizes
 460 of successive groups (from right to left, starting at the decimal point).
 461 The string normally ends with an element of @code{0} (its terminating null character), which means that the previous element is used over and over for all the remaining groups.
 462
 463 If the last element is @code{CHAR_MAX}, it means that there is no more
 464 grouping; put another way, any remaining digits form one large
 465 group without separators.
 466
 467 For example, if @code{grouping} is @code{"\04\03\02"}, the correct grouping
 468 for the number @code{123456787654321} is @samp{12}, @samp{34},
 469 @samp{56}, @samp{78}, @samp{765}, @samp{4321}.  This uses a group of 4
 470 digits at the end, preceded by a group of 3 digits, preceded by groups
 471 of 2 digits (as many as needed).  With a separator of @samp{,}, the
 472 number would be printed as @samp{12,34,56,78,765,4321}.
 473
 474 A value of @code{"\03"} indicates repeated groups of three digits, as
 475 normally used in the U.S.
 476
 477 In the standard @samp{C} locale, both @code{grouping} and
 478 @code{mon_grouping} have a value of @code{""}.  This value specifies no
 479 grouping at all.
 480
 481 @item char int_frac_digits
 482 @itemx char frac_digits
 483 These are small integers indicating how many fractional digits (to the
 484 right of the decimal point) should be displayed in a monetary value in
 485 international and local formats, respectively.  (Most often, both
 486 members have the same value.)
 487
 488 In the standard @samp{C} locale, both of these members have the value
 489 @code{CHAR_MAX}, meaning ``unspecified''.  The ISO standard doesn't say
 490 what to do when you find this value; we recommend printing no
 491 fractional digits.  (This locale also specifies the empty string for
 492 @code{mon_decimal_point}, so printing any fractional digits would be
 493 confusing!)
 494 @end table
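
The grouping rule can be sketched as follows.  This example interprets
@code{grouping} the way @w{ISO C} specifies for @code{struct lconv}:
each @code{char} is a group size, the terminating null character
repeats the previous size, and @code{CHAR_MAX} stops grouping.  The
function @code{group_digits} and its limit of 64 groups are inventions
of this example, not library features.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Copy the digit string DIGITS to OUT, inserting SEP between groups
   as described by GROUPING.  Return 0 on success, -1 if OUT (of
   OUTSIZE bytes) is too small.  */
int
group_digits (const char *digits, const char *grouping,
              const char *sep, char *out, size_t outsize)
{
  size_t seplen = strlen (sep);
  size_t remaining = strlen (digits);
  size_t total = remaining;
  size_t cuts[64];      /* Group sizes, rightmost group first.  */
  size_t ncuts = 0;
  size_t size = 0;      /* Current group size; 0 means no grouping.  */
  const char *d = digits;
  char *p = out;

  /* Split groups off the right end until the rule says to stop.  */
  while (remaining > 0 && ncuts < 64)
    {
      if (*grouping == CHAR_MAX)
        size = 0;                       /* No further grouping.  */
      else if (*grouping != '\0')
        size = (unsigned char) *grouping++;
      /* At the terminating null, SIZE keeps its previous value.  */
      if (size == 0 || size >= remaining)
        break;
      cuts[ncuts++] = size;
      remaining -= size;
    }

  if (total + ncuts * seplen + 1 > outsize)
    return -1;

  /* Emit the leftmost group, then the others from left to right.  */
  memcpy (p, d, remaining);
  p += remaining;
  d += remaining;
  while (ncuts > 0)
    {
      size_t n = cuts[--ncuts];
      memcpy (p, sep, seplen);
      p += seplen;
      memcpy (p, d, n);
      p += n;
      d += n;
    }
  *p = '\0';
  return 0;
}
```

A caller would normally pass @code{localeconv ()->grouping} and
@code{localeconv ()->thousands_sep}, or the @code{mon_} variants for
monetary amounts.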
 495
 496 @node Currency Symbol, Sign of Money Amount, General Numeric, The Lame Way to Locale Data
 497 @subsubsection Printing the Currency Symbol
 498 @cindex currency symbols
 499
 500 These members of the @code{struct lconv} structure specify how to print
 501 the symbol to identify a monetary value---the international analog of
 502 @samp{$} for US dollars.
 503
 504 Each country has two standard currency symbols.  The @dfn{local currency
 505 symbol} is used commonly within the country, while the
 506 @dfn{international currency symbol} is used internationally to refer to
 507 that country's currency when it is necessary to indicate the country
 508 unambiguously.
 509
 510 For example, many countries use the dollar as their monetary unit, and
 511 when dealing with international currencies it's important to specify
 512 that one is dealing with (say) Canadian dollars instead of U.S. dollars
 513 or Australian dollars.  But when the context is known to be Canada,
 514 there is no need to make this explicit---dollar amounts are implicitly
 515 assumed to be in Canadian dollars.
 516
 517 @table @code
 518 @item char *currency_symbol
 519 The local currency symbol for the selected locale.
 520
 521 In the standard @samp{C} locale, this member has a value of @code{""}
 522 (the empty string), meaning ``unspecified''.  The ISO standard doesn't
 523 say what to do when you find this value; we recommend you simply print
 524 the empty string as you would print any other string found in the
 525 appropriate member.
 526
 527 @item char *int_curr_symbol
 528 The international currency symbol for the selected locale.
 529
 530 The value of @code{int_curr_symbol} should normally consist of a
 531 three-letter abbreviation determined by the international standard
 532 @cite{ISO 4217 Codes for the Representation of Currency and Funds},
 533 followed by a one-character separator (often a space).
 534
 535 In the standard @samp{C} locale, this member has a value of @code{""}
 536 (the empty string), meaning ``unspecified''.  We recommend you simply
 537 print the empty string as you would print any other string found in the
 538 appropriate member.
 539
 540 @item char p_cs_precedes
 541 @itemx char n_cs_precedes
 542 These members are @code{1} if the @code{currency_symbol} string should
 543 precede the value of a monetary amount, or @code{0} if the string should
 544 follow the value.  The @code{p_cs_precedes} member applies to positive
 545 amounts (or zero), and the @code{n_cs_precedes} member applies to
 546 negative amounts.
 547
 548 In the standard @samp{C} locale, both of these members have a value of
 549 @code{CHAR_MAX}, meaning ``unspecified''.  The ISO standard doesn't say
 550 what to do when you find this value, but we recommend printing the
 551 currency symbol before the amount.  That's right for most countries.
 552 In other words, treat all nonzero values alike in these members.
 553
 554 The POSIX standard says that these two members apply to the
 555 @code{int_curr_symbol} as well as the @code{currency_symbol}.  The ISO
 556 C standard seems to imply that they should apply only to the
 557 @code{currency_symbol}---so the @code{int_curr_symbol} should always
 558 precede the amount.
 559
 560 We can only guess which of these (if either) matches the usual
 561 conventions for printing international currency symbols.  Our guess is
 562 that they should always precede the amount.  If we find out a reliable
 563 answer, we will put it here.
 564
 565 @item char p_sep_by_space
 566 @itemx char n_sep_by_space
 567 These members are @code{1} if a space should appear between the
 568 @code{currency_symbol} string and the amount, or @code{0} if no space
 569 should appear.  The @code{p_sep_by_space} member applies to positive
 570 amounts (or zero), and the @code{n_sep_by_space} member applies to
 571 negative amounts.
 572
 573 In the standard @samp{C} locale, both of these members have a value of
 574 @code{CHAR_MAX}, meaning ``unspecified''.  The ISO standard doesn't say
 575 what you should do when you find this value; we suggest you treat it as
 576 one (print a space).  In other words, treat all nonzero values alike in
 577 these members.
 578
 579 These members apply only to @code{currency_symbol}.  When you use
 580 @code{int_curr_symbol}, you never print an additional space, because
 581 @code{int_curr_symbol} itself contains the appropriate separator.
 582
 583 The POSIX standard says that these two members apply to the
 584 @code{int_curr_symbol} as well as the @code{currency_symbol}.  But an
 585 example in the @w{ISO C} standard clearly implies that they should apply
 586 only to the @code{currency_symbol}---that the @code{int_curr_symbol}
 587 contains any appropriate separator, so you should never print an
 588 additional space.
 589
 590 Based on what we know now, we recommend you ignore these members when
 591 printing international currency symbols, and print no extra space.
 592 @end table
 593
 594 @node Sign of Money Amount, , Currency Symbol, The Lame Way to Locale Data
 595 @subsubsection Printing the Sign of an Amount of Money
 596
 597 These members of the @code{struct lconv} structure specify how to print
 598 the sign (if any) in a monetary value.
 599
 600 @table @code
 601 @item char *positive_sign
 602 @itemx char *negative_sign
 603 These are strings used to indicate positive (or zero) and negative
 604 (respectively) monetary quantities.
 605
 606 In the standard @samp{C} locale, both of these members have a value of
 607 @code{""} (the empty string), meaning ``unspecified''.
 608
 609 The ISO standard doesn't say what to do when you find this value; we
 610 recommend printing @code{positive_sign} as you find it, even if it is
 611 empty.  For a negative value, print @code{negative_sign} as you find it
 612 unless both it and @code{positive_sign} are empty, in which case print
 613 @samp{-} instead.  (Failing to indicate the sign at all seems rather
 614 unreasonable.)
 615
 616 @item char p_sign_posn
 617 @itemx char n_sign_posn
 618 These members have values that are small integers indicating how to
 619 position the sign for nonnegative and negative monetary quantities,
 620 respectively.  (The string used by the sign is what was specified with
 621 @code{positive_sign} or @code{negative_sign}.)  The possible values are
 622 as follows:
 623
 624 @table @code
 625 @item 0
 626 The currency symbol and quantity should be surrounded by parentheses.
 627
 628 @item 1
 629 Print the sign string before the quantity and currency symbol.
 630
 631 @item 2
 632 Print the sign string after the quantity and currency symbol.
 633
 634 @item 3
 635 Print the sign string right before the currency symbol.
 636
 637 @item 4
 638 Print the sign string right after the currency symbol.
 639
 640 @item CHAR_MAX
 641 ``Unspecified''.  Both members have this value in the standard
 642 @samp{C} locale.
 643 @end table
 644
 645 The ISO standard doesn't say what you should do when the value is
 646 @code{CHAR_MAX}.  We recommend you print the sign after the currency
 647 symbol.
 648 @end table
 649
 650 It is not clear whether you should let these members apply to the
 651 international currency format or not.  POSIX says you should, but
 652 intuition plus the examples in the @w{ISO C} standard suggest you should
 653 not.  We hope that someone who knows the conventions for formatting
 654 monetary quantities well will tell us what we should recommend.
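
To show how the currency and sign members fit together, here is a
deliberately simplified sketch for the local currency symbol.  It
applies the recommendations given above for @code{CHAR_MAX} values and
for empty sign strings.  The function name is hypothetical, and
omitting the space when the currency symbol is empty is an assumption
of this example.  Where @code{strfmon} is available, it is likely the
better choice.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>

/* Format the already formatted digit string AMOUNT with the local
   currency symbol and sign, following LC.  NEGATIVE is nonzero for a
   negative amount.  Return the result of snprintf.  */
int
format_local_money (const struct lconv *lc, const char *amount,
                    int negative, char *out, size_t outsize)
{
  const char *sym = lc->currency_symbol;
  const char *sign = negative ? lc->negative_sign : lc->positive_sign;
  char precedes = negative ? lc->n_cs_precedes : lc->p_cs_precedes;
  char space = negative ? lc->n_sep_by_space : lc->p_sep_by_space;
  char posn = negative ? lc->n_sign_posn : lc->p_sign_posn;
  const char *lead = "", *pre = "", *post = "", *trail = "";
  const char *gap;

  /* With both sign strings empty, still mark negative amounts.  */
  if (negative && *lc->negative_sign == '\0' && *lc->positive_sign == '\0')
    sign = "-";

  /* Nonzero values, including CHAR_MAX, count as "print a space";
     skipping it for an empty symbol is this sketch's own choice.  */
  gap = (space != 0 && *sym != '\0') ? " " : "";

  switch (posn)
    {
    case 0: lead = "("; trail = ")"; break;  /* Parentheses.  */
    case 1: lead = sign; break;              /* Before everything.  */
    case 2: trail = sign; break;             /* After everything.  */
    case 3: pre = sign; break;               /* Right before symbol.  */
    default: post = sign; break;             /* 4 or CHAR_MAX.  */
    }

  if (precedes != 0)    /* CHAR_MAX is treated as "symbol first".  */
    return snprintf (out, outsize, "%s%s%s%s%s%s%s",
                     lead, pre, sym, post, gap, amount, trail);
  return snprintf (out, outsize, "%s%s%s%s%s%s%s",
                   lead, amount, gap, pre, sym, post, trail);
}
```

Passing the structure explicitly, rather than calling
@code{localeconv} inside the function, keeps the sketch easy to try
with hand-filled values.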
 655
 656 @node The Elegant and Fast Way, , The Lame Way to Locale Data, Locale Information
 657 @subsection Pinpoint Access to Locale Data
 658
 659 When writing the X/Open Portability Guide the authors realized that the
 660 @code{localeconv} function is not enough to provide reasonable access to
 661 the locale information.  The information which was meant to be available
 662 in the locale (as later specified in the POSIX.1 standard) requires more
 663 flexible ways to access it.  Therefore the @code{nl_langinfo} function
 664 was introduced.
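
As a rough sketch of the kind of access @code{nl_langinfo} offers, the
helpers below fetch one weekday name and use the locale's preferred
date-and-time format with @code{strftime}.  The item names
(@code{DAY_1}, @code{D_T_FMT}) come from @file{langinfo.h}; the helper
names are made up for this example.

```c
#include <langinfo.h>
#include <stddef.h>
#include <time.h>

/* Return the full name of weekday WDAY (0 = Sunday ... 6 = Saturday)
   in the current LC_TIME locale, or NULL if WDAY is out of range.  */
const char *
langinfo_weekday (int wday)
{
  static const nl_item days[] =
    { DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7 };

  if (wday < 0 || wday > 6)
    return NULL;
  return nl_langinfo (days[wday]);
}

/* Format TM in the locale's preferred date-and-time layout.  Return
   the length of the result, or 0 if it does not fit in BUF.  */
size_t
format_date_time (const struct tm *tm, char *buf, size_t size)
{
  return strftime (buf, size, nl_langinfo (D_T_FMT), tm);
}
```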
 665
 666 @comment langinfo.h
 667 @comment XOPEN
 668 @deftypefun {char *} nl_langinfo (nl_item @var{item})
 669 The @code{nl_langinfo} function can be used to access individual
 670 elements of the locale categories.  Unlike the @code{localeconv}
 671 function, which always returns all the information, @code{nl_langinfo}
 672 lets the caller select what information is necessary.  This is very
 673 fast and it is no problem to call this function multiple times.
 674
 675 A second advantage is that, in addition to the numeric and monetary
 676 formatting information, the information of the
 677 @code{LC_TIME} and @code{LC_MESSAGES} categories is available.
 678
 679 The type @code{nl_item} is defined in @file{nl_types.h}.
 680 The argument @var{item} is a numeric value which must be one of the
 681 values defined in the header @file{langinfo.h}.  The X/Open standard
 682 defines the following values:
 683
 684 @vtable @code
 685 @item ABDAY_1
 686 @itemx ABDAY_2
 687 @itemx ABDAY_3
 688 @itemx ABDAY_4
 689 @itemx ABDAY_5
 690 @itemx ABDAY_6
 691 @itemx ABDAY_7
 692 @code{nl_langinfo} returns the abbreviated weekday name.  @code{ABDAY_1}
 693 corresponds to Sunday.
 694 @item DAY_1
 695 @itemx DAY_2
 696 @itemx DAY_3
 697 @itemx DAY_4
 698 @itemx DAY_5
 699 @itemx DAY_6
 700 @itemx DAY_7
 701 Similar to @code{ABDAY_1} etc., but here the return value is the
 702 unabbreviated weekday name.
 703 @item ABMON_1
 704 @itemx ABMON_2
 705 @itemx ABMON_3
 706 @itemx ABMON_4
 707 @itemx ABMON_5
 708 @itemx ABMON_6
 709 @itemx ABMON_7
 710 @itemx ABMON_8
 711 @itemx ABMON_9
 712 @itemx ABMON_10
 713 @itemx ABMON_11
 714 @itemx ABMON_12
The return value is the abbreviated month name.  @code{ABMON_1}
corresponds to January.
 717 @item MON_1
 718 @itemx MON_2
 719 @itemx MON_3
 720 @itemx MON_4
 721 @itemx MON_5
 722 @itemx MON_6
 723 @itemx MON_7
 724 @itemx MON_8
 725 @itemx MON_9
 726 @itemx MON_10
 727 @itemx MON_11
 728 @itemx MON_12
Similar to @code{ABMON_1} etc., but here the month names are not
abbreviated.  Here too, the first value @code{MON_1} corresponds to
January.
 731 @item AM_STR
 732 @itemx PM_STR
The return values are strings which can be used in a time representation
that uses the American 12-hour clock with am/pm.

Note that in locales which do not use this time representation these
strings might be empty, in which case the am/pm format cannot be used
at all.
 739 @item D_T_FMT
 740 The return value can be used as a format string for @code{strftime} to
 741 represent time and date in a locale specific way.
 742 @item D_FMT
 743 The return value can be used as a format string for @code{strftime} to
 744 represent a date in a locale specific way.
 745 @item T_FMT
 746 The return value can be used as a format string for @code{strftime} to
 747 represent time in a locale specific way.
 748 @item T_FMT_AMPM
 749 The return value can be used as a format string for @code{strftime} to
 750 represent time using the American-style am/pm format.
 751
 752 Please note that if the am/pm format does not make any sense for the
 753 selected locale the returned value might be the same as the one for
 754 @code{T_FMT}.
 755 @item ERA
The return value represents the eras used in the current locale.

Most locales do not define this value.  An example of a locale which
does define it is the Japanese one.  There the traditional date
representation is based on eras defined by the reigns of the
emperors.

Normally it should not be necessary to use this value directly.
Specifying the @code{E} modifier in its formats makes the
@code{strftime} functions use this information.  The format of the
returned string is not specified, and therefore one should not assume
that the representation found on one system applies to others.
 769 @item ERA_YEAR
The return value gives the year in the relevant era of the locale.
As for @code{ERA} it should not be necessary to use this value directly.
 772 @item ERA_D_T_FMT
 773 This return value can be used as a format string for @code{strftime} to
 774 represent time and date using the era representation in a locale
 775 specific way.
 776 @item ERA_D_FMT
 777 This return value can be used as a format string for @code{strftime} to
 778 represent a date using the era representation in a locale specific way.
 779 @item ERA_T_FMT
 780 This return value can be used as a format string for @code{strftime} to
 781 represent time using the era representation in a locale specific way.
 782 @item ALT_DIGITS
The return value is a representation of up to @math{100} values used to
represent the values @math{0} to @math{99}.  As for @code{ERA} this
value is not intended to be used directly, but instead indirectly
through the @code{strftime} function.  When the modifier @code{O} is
used in a format which would otherwise use numerals to represent hours,
minutes, seconds, weekdays, months, or weeks, the appropriate value for
the locale is used instead.
 790 @item INT_CURR_SYMBOL
 791 This value is the same as returned by @code{localeconv} in the
 792 @code{int_curr_symbol} element of the @code{struct lconv}.
 793 @item CURRENCY_SYMBOL
 794 @itemx CRNCYSTR
 795 This value is the same as returned by @code{localeconv} in the
 796 @code{currency_symbol} element of the @code{struct lconv}.
 797
 798 @code{CRNCYSTR} is a deprecated alias, still required by Unix98.
 799 @item MON_DECIMAL_POINT
 800 This value is the same as returned by @code{localeconv} in the
 801 @code{mon_decimal_point} element of the @code{struct lconv}.
 802 @item MON_THOUSANDS_SEP
 803 This value is the same as returned by @code{localeconv} in the
 804 @code{mon_thousands_sep} element of the @code{struct lconv}.
 805 @item MON_GROUPING
 806 This value is the same as returned by @code{localeconv} in the
 807 @code{mon_grouping} element of the @code{struct lconv}.
 808 @item POSITIVE_SIGN
 809 This value is the same as returned by @code{localeconv} in the
 810 @code{positive_sign} element of the @code{struct lconv}.
 811 @item NEGATIVE_SIGN
 812 This value is the same as returned by @code{localeconv} in the
 813 @code{negative_sign} element of the @code{struct lconv}.
 814 @item INT_FRAC_DIGITS
 815 This value is the same as returned by @code{localeconv} in the
 816 @code{int_frac_digits} element of the @code{struct lconv}.
 817 @item FRAC_DIGITS
 818 This value is the same as returned by @code{localeconv} in the
 819 @code{frac_digits} element of the @code{struct lconv}.
 820 @item P_CS_PRECEDES
 821 This value is the same as returned by @code{localeconv} in the
 822 @code{p_cs_precedes} element of the @code{struct lconv}.
 823 @item P_SEP_BY_SPACE
 824 This value is the same as returned by @code{localeconv} in the
 825 @code{p_sep_by_space} element of the @code{struct lconv}.
 826 @item N_CS_PRECEDES
 827 This value is the same as returned by @code{localeconv} in the
 828 @code{n_cs_precedes} element of the @code{struct lconv}.
 829 @item N_SEP_BY_SPACE
 830 This value is the same as returned by @code{localeconv} in the
 831 @code{n_sep_by_space} element of the @code{struct lconv}.
 832 @item P_SIGN_POSN
 833 This value is the same as returned by @code{localeconv} in the
 834 @code{p_sign_posn} element of the @code{struct lconv}.
 835 @item N_SIGN_POSN
 836 This value is the same as returned by @code{localeconv} in the
 837 @code{n_sign_posn} element of the @code{struct lconv}.
 838 @item DECIMAL_POINT
 839 @itemx RADIXCHAR
 840 This value is the same as returned by @code{localeconv} in the
 841 @code{decimal_point} element of the @code{struct lconv}.
 842
 843 The name @code{RADIXCHAR} is a deprecated alias still used in Unix98.
 844 @item THOUSANDS_SEP
 845 @itemx THOUSEP
 846 This value is the same as returned by @code{localeconv} in the
 847 @code{thousands_sep} element of the @code{struct lconv}.
 848
 849 The name @code{THOUSEP} is a deprecated alias still used in Unix98.
 850 @item GROUPING
 851 This value is the same as returned by @code{localeconv} in the
 852 @code{grouping} element of the @code{struct lconv}.
 853 @item YESEXPR
The return value is a regular expression which can be compiled with the
@code{regcomp} function to recognize a positive response to a yes/no
question.
@item NOEXPR
The return value is a regular expression which can be compiled with the
@code{regcomp} function to recognize a negative response to a yes/no
question.
 861 @item YESSTR
 862 The return value is a locale specific translation of the positive response
 863 to a yes/no question.
 864
Using this value is deprecated since it is a very special case of
message translation, which is better handled using the message
translation functions (@pxref{Message Translation}).
 868 @item NOSTR
 869 The return value is a locale specific translation of the negative response
 870 to a yes/no question.  What is said for @code{YESSTR} is also true here.
 871 @end vtable
 872
The file @file{langinfo.h} defines many more symbols, but none of them
is official.  Using them is not portable, and the format of the
return values might change.  Therefore we strongly recommend not using
them.
 877
Note that the return value for any valid argument can be used
in all situations (with the possible exception of the am/pm time format
related values).  If the user has not selected any locale for the
appropriate category, @code{nl_langinfo} returns the information from the
@code{"C"} locale.  It is therefore possible to use this function as
shown in the example below.
 884
If the argument @var{item} is not valid, a pointer to an empty string is
returned.
 887 @end deftypefun
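
As a small illustration of selecting individual items, the following
sketch maps a weekday number to its name in the current locale.  The
helper name @code{weekday_name} is hypothetical and not part of any
library, and the sketch assumes the @code{tm_wday} convention in which
@math{0} stands for Sunday.

```c
#include <langinfo.h>
#include <stddef.h>

/* Return the full weekday name for WDAY (0 = Sunday, as in tm_wday)
   in the current LC_TIME locale, or NULL if WDAY is out of range.
   The items are listed explicitly instead of being computed as
   DAY_1 + WDAY, so the code does not assume consecutive values.  */
const char *
weekday_name (int wday)
{
  static const nl_item days[] =
    { DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7 };

  if (wday < 0 || wday >= (int) (sizeof days / sizeof days[0]))
    return NULL;
  return nl_langinfo (days[wday]);
}
```

The returned string belongs to the library and should be copied if it
has to survive a later change of locale.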
 888
A typical use of @code{nl_langinfo} is a function which has to
print a given date and time in a locale-specific way.  At first one
might think that, since @code{strftime} internally uses the locale
information, writing something like the following is enough:
 893
 894 @smallexample
 895 size_t
 896 i18n_time_n_data (char *s, size_t len, const struct tm *tp)
 897 @{
 898   return strftime (s, len, "%X %D", tp);
 899 @}
 900 @end smallexample
 901
 902 The format contains no weekday or month names and therefore is
 903 internationally usable.  Wrong!  The output produced is something like
 904 @code{"hh:mm:ss MM/DD/YY"}.  This format is only recognizable in the
 905 USA.  Other countries use different formats.  Therefore the function
 906 should be rewritten like this:
 907
 908 @smallexample
 909 size_t
 910 i18n_time_n_data (char *s, size_t len, const struct tm *tp)
 911 @{
 912   return strftime (s, len, nl_langinfo (D_T_FMT), tp);
 913 @}
 914 @end smallexample
 915
 916 Now the date and time format which is explicitly selected for the locale
 917 in place when the program runs is used.  If the user selects the locale
 918 correctly there should never be a misunderstanding over the time and
 919 date format.
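
The @code{YESEXPR} item can be used in a similarly direct way.  The
following is a minimal sketch, assuming the returned expression is an
extended regular expression as POSIX specifies.  The helper name
@code{is_affirmative} is hypothetical.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if ANSWER matches the YESEXPR of the current locale,
   0 if it does not, and -1 if the expression cannot be compiled.  */
int
is_affirmative (const char *answer)
{
  regex_t re;
  int rc;

  if (regcomp (&re, nl_langinfo (YESEXPR), REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, answer, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

A caller would typically test @code{NOEXPR} in the same way and ask
again if neither expression matches.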
 920
 921 @node Formatting Numbers, , Locale Information, Locales
 922 @section A dedicated function to format numbers
 923
We have seen that the structure returned by @code{localeconv} as well as
the values given to @code{nl_langinfo} allow one to retrieve the various
pieces of locale-specific information to format numbers and monetary
amounts.  We have also seen that the underlying rules are quite
complex.

Therefore the X/Open standards introduce a function which uses this
information from the locale and so makes it easier for the user to
format numbers according to these rules.
 933
 934 @deftypefun ssize_t strfmon (char *@var{s}, size_t @var{maxsize}, const char *@var{format}, @dots{})
The @code{strfmon} function is similar to the @code{strftime} function
in that it takes a buffer, its size, a format string, and values to
write into the buffer as text in the form given by the format string.
Like @code{strftime}, the function also returns the number of bytes
written into the buffer.

There are two differences: @code{strfmon} can take more than one
argument, and, of course, the format specification is different.  Like
@code{strftime}, the format string consists of normal text, which is
output as is, and format specifiers, which are introduced by @samp{%}.
Following the @samp{%}, the function allows, similar to @code{printf}, a
sequence of flags and other specifications before the format character:
 947
 948 @itemize @bullet
 949 @item
 950 Immediately following the @samp{%} there can be one or more of the
 951 following flags:
 952 @table @asis
 953 @item @samp{=@var{f}}
The single byte character @var{f} is used for this field as the numeric
fill character.  By default this character is a space character.
Filling with this character is only performed if a left precision
is specified.  It is not used just to fill to the given field width.
 958 @item @samp{^}
 959 The number is printed without grouping the digits using the rules of the
 960 current locale.  By default grouping is enabled.
 961 @item @samp{+}, @samp{(}
At most one of these flags can be used.  They select which format is
used to represent the sign of a currency amount.  By default, and if
@samp{+} is given, the locale equivalent of @math{+}/@math{-} is used.  If
@samp{(} is given, negative amounts are enclosed in parentheses.  The
exact format is determined by the values of the @code{LC_MONETARY}
category of the locale selected at program runtime.
 968 @item @samp{!}
 969 The output will not contain the currency symbol.
 970 @item @samp{-}
The output will be formatted left-justified instead of right-justified
if it does not fill the entire field width.
 973 @end table
 974 @end itemize
 975
The next part of the specification is an optional field width.  The
width is given by digits following the flags.  If no width is specified
it is assumed to be @math{0}.  The width value is applied after it has
been determined how much space the printed result needs.  If the result
needs at least as many characters as the width specifies, nothing
happens.  Otherwise the output is extended to the specified width by
filling with spaces.  Which side is padded depends on whether the
@samp{-} flag was given.  If it was given, the spaces are added at the
right, making the output left-justified; otherwise they are added at
the left, making it right-justified.
 986
So far the format looks familiar, being similar to @code{printf} and
@code{strftime} formats.  However, the next two optional fields
introduce something new.  The first one is a @samp{#} character
followed by a decimal digit string.  The value of the digit string
specifies the number of digit positions to the left of the radix
character (the left precision).  This does @emph{not} include the
grouping characters needed if the @samp{^} flag is not given.  If the
digits of the number do not fill this width, the field is padded at the
left side with the fill character, which can be selected using the
@samp{=} flag and which by default is a space.  For example, if the left
precision is 6, the number is @math{123}, the fill character is
@samp{*}, and grouping is disabled, the result will be @samp{***123}.
 999
The next field is introduced by a @samp{.} (period) and consists of
another decimal digit string.  Its value gives the number of digits
printed after the radix character (the right precision).  The default
is taken from the current locale (@code{frac_digits},
@code{int_frac_digits}, @pxref{General Numeric}).  If the exact
representation needs more digits than given by the right precision,
the displayed value is rounded.  If the number of fractional digits
is selected to be zero, no radix character is printed.
1008
As a GNU extension, the @code{strfmon} implementation in the GNU C
Library allows an optional @samp{L} as the next field, acting as a
format modifier.  If this modifier is given, the argument is expected to
be a @code{long double} instead of a @code{double} value.

Finally, the last component of the format must be a format specifier.
There are three specifiers defined:
1016
1017 @table @asis
1018 @item @samp{i}
1019 The argument is formatted according to the locale's rules to format an
1020 international currency value.
1021 @item @samp{n}
The argument is formatted according to the locale's rules to format a
national currency value.
1024 @item @samp{%}
1025 Creates a @samp{%} in the output.  There must be no flag, width
1026 specifier or modifier given, only @samp{%%} is allowed.
1027 @end table
1028
1029 As it is done for @code{printf}, the function reads the format string
1030 from left to right and uses the values passed to the function following
1031 the format string.  The values are expected to be either of type
1032 @code{double} or @code{long double}, depending on the presence of the
1033 modifier @samp{L}.  The result is stored in the buffer pointed to by
1034 @var{s}.  At most @var{maxsize} characters are stored.
1035
The return value of the function is the number of characters stored in
@var{s}, not including the terminating null byte.  If the number of
characters stored would exceed @var{maxsize}, the function returns
@math{-1} and the content of the buffer @var{s} is unspecified.  In this
case @code{errno} is set to @code{E2BIG}.
1041 @end deftypefun
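
The error handling described above could be wrapped as in the following
sketch.  The helper name @code{format_amount} is hypothetical, and
clearing the buffer on failure is a choice made for this sketch, not
something @code{strfmon} does itself.

```c
#include <monetary.h>
#include <stddef.h>
#include <sys/types.h>

/* Format AMOUNT using the national currency format of the current
   locale.  Return 0 on success, or -1 if strfmon fails, for example
   because BUF is too small (errno is then E2BIG).  */
int
format_amount (char *buf, size_t size, double amount)
{
  ssize_t n = strfmon (buf, size, "%n", amount);

  if (n == -1)
    {
      /* The buffer content is unspecified after a failure; store an
         empty string so that callers never print garbage.  */
      if (size > 0)
        buf[0] = '\0';
      return -1;
    }
  return 0;
}
```

A caller that cannot bound the output size in advance could retry with
a larger buffer when the helper reports a failure.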
1042
1043 A few examples should make it clear how to use this function.  It is
1044 assumed that all the following pieces of code are executed in a program
1045 which uses the locale valid for the USA (@code{en_US}).  The simplest
1046 form of the format is this:
1047
1048 @smallexample
1049 strfmon (buf, 100, "@@%n@@%n@@%n@@", 123.45, -567.89, 12345.678);
1050 @end smallexample
1051
1052 @noindent
1053 The output produced is
1054 @smallexample
1055 "@@$123.45@@-$567.89@@$12,345.68@@"
1056 @end smallexample
1057
We can notice several things here.  First, the widths of the output
numbers differ.  We have not specified a width in the format string, so
this is no wonder.  Second, the third number is printed using thousands
separators.  The thousands separator for the @code{en_US} locale is a
comma.  The number is also rounded: @math{.678} is rounded to
@math{.68} since the format does not specify a precision and the
default value in the locale is @math{2}.  Finally, note that the
national currency symbol is printed since @samp{%n} was used instead of
@samp{%i}.  The next example shows how we can align the output.
1067
1068 @smallexample
1069 strfmon (buf, 100, "@@%=*11n@@%=*11n@@%=*11n@@", 123.45, -567.89, 12345.678);
1070 @end smallexample
1071
1072 @noindent
1073 The output this time is:
1074
1075 @smallexample
1076 "@@    $123.45@@   -$567.89@@ $12,345.68@@"
1077 @end smallexample
1078
Two things stand out.  First, all fields have the same width (eleven
characters) since this is the width given in the format and since no
number required more characters to be printed.  The second important
point is that the fill character is not used.  This is correct since the
white space was not used to pad to a left precision given by a
@samp{#} modifier, but instead to fill to the given field width.  The
difference becomes obvious if we now add a left precision.
1086
1087 @smallexample
1088 strfmon (buf, 100, "@@%=*11#5n@@%=*11#5n@@%=*11#5n@@",
1089          123.45, -567.89, 12345.678);
1090 @end smallexample
1091
1092 @noindent
1093 The output is
1094
1095 @smallexample
"@@ $***123.45@@-$***567.89@@ $12,345.68@@"
1097 @end smallexample
1098
Here we can see that all the currency symbols are now aligned, and that
the space between the currency symbol and the number is filled with the
selected fill character.  Note that although the left precision is
selected to be @math{5} and @math{123.45} has three digits left of the
radix character, the space is filled with three asterisks.  This is
correct since, as explained above, the left precision does not include
the positions used for thousands separators.  One last example should
explain the remaining functionality.
1107
1108 @smallexample
1109 strfmon (buf, 100, "@@%=0(16#5.3i@@%=0(16#5.3i@@%=0(16#5.3i@@",
1110          123.45, -567.89, 12345.678);
1111 @end smallexample
1112
1113 @noindent
1114 This rather complex format string produces the following output:
1115
1116 @smallexample
"@@ USD 000123.450 @@(USD 000567.890)@@ USD 12,345.678 @@"
1118 @end smallexample
1119
The most noticeable change is the alternative way of representing
negative numbers.  In financial circles this is often done using
parentheses, and this is what the @samp{(} flag selected.  The fill
character is now @samp{0}.  Note that this @samp{0} character is not
regarded as a numeric zero, and therefore the first and second numbers
are not printed using a thousands separator.  Since we used the format
specifier @samp{i} instead of @samp{n}, the international form of the
currency symbol is used.  This is a four letter string, in this case
@code{"USD "}.  The last point is that since the right precision is
selected to be three, the first and second numbers are printed with an
extra zero at the end and the third number is printed without rounding.
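
For callers that only need right-aligned columns, the field width alone
is enough.  The following is a minimal sketch; the helper name
@code{format_column} and the width of 14 are arbitrary choices made for
illustration.

```c
#include <monetary.h>
#include <stddef.h>
#include <sys/types.h>

/* Format AMOUNT right-justified in a field of at least 14 characters,
   using the national currency format of the current locale.  Return
   the value of strfmon: the number of bytes stored, or -1 on error.  */
ssize_t
format_column (char *buf, size_t size, double amount)
{
  /* No '-' flag is given, so padding spaces are added on the left.  */
  return strfmon (buf, size, "%14n", amount);
}
```

Results that need more than 14 characters are not truncated; they
simply make the field wider, so the width should be chosen generously.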