manual/locale.texi

   1 @node Locales, Message Translation, Character Set Handling, Top
   2 @c %MENU% The country and language can affect the behavior of library functions
   3 @chapter Locales and Internationalization
   4
   5 Different countries and cultures have varying conventions for how to
   6 communicate.  These conventions range from very simple ones, such as the
   7 format for representing dates and times, to very complex ones, such as
   8 the language spoken.
   9
  10 @cindex internationalization
  11 @cindex locales
  12 @dfn{Internationalization} of software means programming it to be able
  13 to adapt to the user's favorite conventions.  In @w{ISO C},
  14 internationalization works by means of @dfn{locales}.  Each locale
  15 specifies a collection of conventions, one convention for each purpose.
  16 The user chooses a set of conventions by specifying a locale (via
  17 environment variables).
  18
  19 All programs inherit the chosen locale as part of their environment.
  20 Provided the programs are written to obey the choice of locale, they
  21 will follow the conventions preferred by the user.
  22
  23 @menu
  24 * Effects of Locale::           Actions affected by the choice of
  25                                  locale.
  26 * Choosing Locale::             How the user specifies a locale.
  27 * Locale Categories::           Different purposes for which you can
  28                                  select a locale.
  29 * Setting the Locale::          How a program specifies the locale
  30                                  with library functions.
  31 * Standard Locales::            Locale names available on all systems.
  32 * Locale Information::          How to access the information for the locale.
  33 * Formatting Numbers::          A dedicated function to format numbers.
  34 @end menu
  35
  36 @node Effects of Locale, Choosing Locale,  , Locales
  37 @section What Effects a Locale Has
  38
  39 Each locale specifies conventions for several purposes, including the
  40 following:
  41
  42 @itemize @bullet
  43 @item
  44 What multibyte character sequences are valid, and how they are
  45 interpreted (@pxref{Character Set Handling}).
  46
  47 @item
  48 Classification of which characters in the local character set are
  49 considered alphabetic, and upper- and lower-case conversion conventions
  50 (@pxref{Character Handling}).
  51
  52 @item
  53 The collating sequence for the local language and character set
  54 (@pxref{Collation Functions}).
  55
  56 @item
  57 Formatting of numbers and currency amounts (@pxref{General Numeric}).
  58
  59 @item
  60 Formatting of dates and times (@pxref{Formatting Calendar Time}).
  61
  62 @item
  63 What language to use for output, including error messages
  64 (@pxref{Message Translation}).
  65
  66 @item
  67 What language to use for user answers to yes-or-no questions.
  68
  69 @item
  70 What language to use for more complex user input.
  71 (The C library doesn't yet help you implement this.)
  72 @end itemize
  73
  74 Some aspects of adapting to the specified locale are handled
  75 automatically by the library subroutines.  For example, all your program
  76 needs to do in order to use the collating sequence of the chosen locale
  77 is to use @code{strcoll} or @code{strxfrm} to compare strings.
  78
  79 Other aspects of locales are beyond the comprehension of the library.
  80 For example, the library can't automatically translate your program's
  81 output messages into other languages.  The only way you can support
  82 output in the user's favorite language is to program this more or less
  83 by hand.  The C library does, however, provide functions that make it
  84 easy to handle translations for multiple languages.
  85
  86 This chapter discusses the mechanism by which you can modify the current
  87 locale.  The effects of the current locale on specific library functions
  88 are discussed in more detail in the descriptions of those functions.
  89
  90 @node Choosing Locale, Locale Categories, Effects of Locale, Locales
  91 @section Choosing a Locale
  92
  93 The simplest way for the user to choose a locale is to set the
  94 environment variable @code{LANG}.  This specifies a single locale to use
  95 for all purposes.  For example, a user could specify a hypothetical
  96 locale named @samp{espana-castellano} to use the standard conventions of
  97 most of Spain.
  98
  99 The set of locales supported depends on the operating system you are
 100 using, and so do their names.  We can't make any promises about what
 101 locales will exist, except for one standard locale called @samp{C} or
 102 @samp{POSIX}.  Later we will describe how to construct locales.
 103 @comment (@pxref{Building Locale Files}).
 104
 105 @cindex combining locales
 106 A user also has the option of specifying different locales for different
 107 purposes---in effect, choosing a mixture of multiple locales.
 108
 109 For example, the user might specify the locale @samp{espana-castellano}
 110 for most purposes, but specify the locale @samp{usa-english} for
 111 currency formatting.  This might make sense if the user is a
 112 Spanish-speaking American, working in Spanish, but representing monetary
 113 amounts in US dollars.
 114
 115 Note that both locales @samp{espana-castellano} and @samp{usa-english},
 116 like all locales, would include conventions for all of the purposes to
 117 which locales apply.  However, the user can choose to use each locale
 118 for a particular subset of those purposes.
 119
 120 @node Locale Categories, Setting the Locale, Choosing Locale, Locales
 121 @section Categories of Activities that Locales Affect
 122 @cindex categories for locales
 123 @cindex locale categories
 124
 125 The purposes that locales serve are grouped into @dfn{categories}, so
 126 that a user or a program can choose the locale for each category
 127 independently.  Here is a table of categories; each name is both an
 128 environment variable that a user can set, and a macro name that you can
 129 use as an argument to @code{setlocale}.
 130
 131 @vtable @code
 132 @comment locale.h
 133 @comment ISO
 134 @item LC_COLLATE
 135 This category applies to collation of strings (functions @code{strcoll}
 136 and @code{strxfrm}); see @ref{Collation Functions}.
 137
 138 @comment locale.h
 139 @comment ISO
 140 @item LC_CTYPE
 141 This category applies to classification and conversion of characters,
 142 and to multibyte and wide characters;
 143 see @ref{Character Handling}, and @ref{Character Set Handling}.
 144
 145 @comment locale.h
 146 @comment ISO
 147 @item LC_MONETARY
 148 This category applies to formatting monetary values; see @ref{General Numeric}.
 149
 150 @comment locale.h
 151 @comment ISO
 152 @item LC_NUMERIC
 153 This category applies to formatting numeric values that are not
 154 monetary; see @ref{General Numeric}.
 155
 156 @comment locale.h
 157 @comment ISO
 158 @item LC_TIME
 159 This category applies to formatting date and time values; see
 160 @ref{Formatting Calendar Time}.
 161
 162 @comment locale.h
 163 @comment XOPEN
 164 @item LC_MESSAGES
 165 This category applies to selecting the language used in the user
 166 interface for message translation (@pxref{The Uniforum approach};
 167 @pxref{Message catalogs a la X/Open}).
 168
 169 @comment locale.h
 170 @comment ISO
 171 @item LC_ALL
 172 As a macro, use this with @code{setlocale} to set a single locale for
 173 all purposes.  As an environment variable, @code{LC_ALL} overrides
 174 every selection made by the other @code{LC_*} variables and by
 175 @code{LANG}.
 176
 177 @comment locale.h
 178 @comment ISO
 179 @item LANG
 180 If this environment variable is defined, its value specifies the locale
 181 to use for all purposes except as overridden by the variables above.
 182 @end vtable
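
Because each category can be set independently, a program can follow the
user's environment in general and still pin one category to a fixed
locale.  The following is only a sketch; the helper name
@code{use_env_locale_with_c_numbers} is made up for illustration.

@verbatim
```c
#include <locale.h>

/* Hypothetical helper: follow the user's environment for every
   category, then force LC_NUMERIC back to "C" so that numbers the
   program writes (for example to data files) always use "." as the
   decimal point.  Returns 0 on success, -1 if LC_NUMERIC could not
   be set.  */
int
use_env_locale_with_c_numbers (void)
{
  /* If this call fails, the program simply stays in the "C" locale.  */
  setlocale (LC_ALL, "");

  /* Override a single category after the general selection.  */
  return setlocale (LC_NUMERIC, "C") != NULL ? 0 : -1;
}
```
@end verbatim

The order matters: a later call with @code{LC_ALL} would replace the
@code{LC_NUMERIC} choice again.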
 183
 184 @vindex LANGUAGE
 185 When the message translation functions were developed, the variables
 186 above turned out not to be sufficient.  For example, it should be
 187 possible to specify more than one locale name.  Consider a Swedish
 188 user who speaks German better than English, and a program whose
 189 messages are in English by default.  The user should be able to
 190 request Swedish as the first choice and German as the second, with
 191 English used only if neither translation is available.  This is
 192 possible with the variable @code{LANGUAGE}.  For further description of
 193 this GNU extension see @ref{Using gettextized software}.
 194
 195 @node Setting the Locale, Standard Locales, Locale Categories, Locales
 196 @section How Programs Set the Locale
 197
 198 A C program inherits its locale environment variables when it starts up.
 199 This happens automatically.  However, these variables do not
 200 automatically control the locale used by the library functions, because
 201 @w{ISO C} says that all programs start by default in the standard @samp{C}
 202 locale.  To use the locales specified by the environment, you must call
 203 @code{setlocale}.  Call it as follows:
 204
 205 @smallexample
 206 setlocale (LC_ALL, "");
 207 @end smallexample
 208
 209 @noindent
 210 to select a locale based on the user choice of the appropriate
 211 environment variables.
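
A program would typically make this call once, early in @code{main}.
The sketch below wraps it in a hypothetical helper, @code{init_locale},
and assumes that falling back to the @samp{C} locale with a warning is
acceptable when the environment names a locale that is not installed.

@verbatim
```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: adopt the locale named by the environment.
   Returns the name of the locale now in effect.  If the environment
   names a locale that cannot be selected, setlocale returns NULL and
   the program keeps the locale it already had (initially "C").  */
const char *
init_locale (void)
{
  const char *name = setlocale (LC_ALL, "");

  if (name == NULL)
    {
      fputs ("warning: cannot set locale from environment\n", stderr);
      /* Query the locale that is still in effect.  */
      name = setlocale (LC_ALL, NULL);
    }
  return name;
}
```
@end verbatim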
 212
 213 @cindex changing the locale
 214 @cindex locale, changing
 215 You can also use @code{setlocale} to specify a particular locale, for
 216 general use or for a specific category.
 217
 218 @pindex locale.h
 219 The symbols in this section are defined in the header file @file{locale.h}.
 220
 221 @comment locale.h
 222 @comment ISO
 223 @deftypefun {char *} setlocale (int @var{category}, const char *@var{locale})
 224 The function @code{setlocale} sets the current locale for category
 225 @var{category} to @var{locale}.  A list of all the locales the system
 226 provides can be created by running
 227
 228 @smallexample
 229   locale -a
 230 @end smallexample
 231
 232 If @var{category} is @code{LC_ALL}, this specifies the locale for all
 233 purposes.  The other possible values of @var{category} specify a
 234 single purpose (@pxref{Locale Categories}).
 235
 236 You can also use this function to find out the current locale by passing
 237 a null pointer as the @var{locale} argument.  In this case,
 238 @code{setlocale} returns a string that is the name of the locale
 239 currently selected for category @var{category}.
 240
 241 The string returned by @code{setlocale} can be overwritten by subsequent
 242 calls, so you should make a copy of the string (@pxref{Copying and
 243 Concatenation}) if you want to save it past any further calls to
 244 @code{setlocale}.  (The standard library is guaranteed never to call
 245 @code{setlocale} itself.)
 246
 247 You should not modify the string returned by @code{setlocale}.  It might
 248 be the same string that was passed as an argument in a previous call to
 249 @code{setlocale}.  If you later pass the returned string back as the
 250 @var{locale} argument, use the same @var{category} as in the call that
 251 returned it.
 252
 253 When you read the current locale for category @code{LC_ALL}, the value
 254 encodes the entire combination of selected locales for all categories.
 255 In this case, the value is not just a single locale name.  In fact, we
 256 don't make any promises about what it looks like.  But if you specify
 257 the same ``locale name'' with @code{LC_ALL} in a subsequent call to
 258 @code{setlocale}, it restores the same combination of locale selections.
 259
 260 To be sure you can use the returned string encoding the currently selected
 261 locale at a later time, you must make a copy of the string.  It is not
 262 guaranteed that the returned pointer remains valid over time.
 263
 264 When the @var{locale} argument is not a null pointer, the string returned
 265 by @code{setlocale} reflects the newly-modified locale.
 266
 267 If you specify an empty string for @var{locale}, this means to read the
 268 appropriate environment variable and use its value to select the locale
 269 for @var{category}.
 270
 271 If a nonempty string is given for @var{locale}, then the locale of that
 272 name is used if possible.
 273
 274 If you specify an invalid locale name, @code{setlocale} returns a null
 275 pointer and leaves the current locale unchanged.
 276 @end deftypefun
 277
 278 Here is an example showing how you might use @code{setlocale} to
 279 temporarily switch to a new locale.
 280
 281 @smallexample
 282 #include <stddef.h>
 283 #include <locale.h>
 284 #include <stdlib.h>
 285 #include <string.h>
 286
 287 void
 288 with_other_locale (const char *new_locale,
 289                    void (*subroutine) (int),
 290                    int argument)
 291 @{
 292   char *old_locale, *saved_locale;
 293
 294   /* @r{Get the name of the current locale.}  */
 295   old_locale = setlocale (LC_ALL, NULL);
 296
 297   /* @r{Copy the name so it won't be clobbered by @code{setlocale}.} */
 298   saved_locale = strdup (old_locale);
 299   if (saved_locale == NULL)
 300     abort ();  /* @r{Out of memory.}  */
 301
 302   /* @r{Now change the locale and do some stuff with it.} */
 303   setlocale (LC_ALL, new_locale);
 304   (*subroutine) (argument);
 305
 306   /* @r{Restore the original locale.} */
 307   setlocale (LC_ALL, saved_locale);
 308   free (saved_locale);
 309 @}
 310 @end smallexample
 311
 312 @strong{Portability Note:} Some @w{ISO C} systems may define additional
 313 locale categories, and future versions of the library will do so.  For
 314 portability, assume that any symbol beginning with @samp{LC_} might be
 315 defined in @file{locale.h}.
 316
 317 @node Standard Locales, Locale Information, Setting the Locale, Locales
 318 @section Standard Locales
 319
 320 The only locale names you can count on finding on all operating systems
 321 are these three standard ones:
 322
 323 @table @code
 324 @item "C"
 325 This is the standard C locale.  The attributes and behavior it provides
 326 are specified in the @w{ISO C} standard.  When your program starts up, it
 327 initially uses this locale by default.
 328
 329 @item "POSIX"
 330 This is the standard POSIX locale.  Currently, it is an alias for the
 331 standard C locale.
 332
 333 @item ""
 334 The empty name says to select a locale based on environment variables.
 335 @xref{Locale Categories}.
 336 @end table
 337
 338 Defining and installing named locales is normally a responsibility of
 339 the system administrator at your site (or the person who installed the
 340 GNU C library).  It is also possible for the user to create private
 341 locales.  All this will be discussed later when describing the tool to
 342 do so.
 343 @comment (@pxref{Building Locale Files}).
 344
 345 If your program needs to use something other than the @samp{C} locale,
 346 it will be more portable if you use whatever locale the user specifies
 347 with the environment, rather than trying to specify some non-standard
 348 locale explicitly by name.  Remember, different machines might have
 349 different sets of locales installed.
 350
 351 @node Locale Information, Formatting Numbers, Standard Locales, Locales
 352 @section Accessing Locale Information
 353
 354 There are several ways to access locale information.  The simplest
 355 way is to let the C library itself do the work.  Several of the
 356 functions in this library implicitly access the locale data, and use
 357 what information is provided by the currently selected locale.  This is
 358 how the locale model is meant to work normally.
 359
 360 As an example take the @code{strftime} function, which is meant to nicely
 361 format date and time information (@pxref{Formatting Calendar Time}).
 362 Part of the standard information contained in the @code{LC_TIME}
 363 category is the names of the weekdays and months.  Instead of requiring
 364 the programmer to take care of providing the translations, the
 365 @code{strftime} function does this all by itself.  @code{%A}
 366 in the format string is replaced by the appropriate weekday
 367 name of the locale currently selected by @code{LC_TIME}.  This is an
 368 easy example, and wherever possible functions do things automatically
 369 in this way.
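
As a small illustration of this implicit access, the following sketch
formats a weekday name with @code{%A}.  The helper name
@code{weekday_name} is hypothetical; the program never looks at the
locale data itself.

@verbatim
```c
#include <time.h>

/* Hypothetical helper: store in BUF the full weekday name for TM,
   taken from whatever locale LC_TIME currently selects.  Returns the
   number of bytes stored (not counting the terminating null), or 0
   if SIZE is too small.  */
size_t
weekday_name (const struct tm *tm, char *buf, size_t size)
{
  return strftime (buf, size, "%A", tm);
}
```
@end verbatim

In the @samp{C} locale the result is an English name such as
@samp{Saturday}; after a successful @code{setlocale} call the same code
yields the name used by the selected locale.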
 370
 371 But there are quite often situations when there is simply no function
 372 to perform the task, or it is simply not possible to do the work
 373 automatically.  For these cases it is necessary to access the
 374 information in the locale directly.  To do this the C library provides
 375 two functions: @code{localeconv} and @code{nl_langinfo}.  The former is
 376 part of @w{ISO C} and therefore portable, but has a brain-damaged
 377 interface.  The latter is part of the Unix interface and is portable
 378 insofar as the system follows the Unix standards.
 379
 380 @menu
 381 * The Lame Way to Locale Data::   ISO C's @code{localeconv}.
 382 * The Elegant and Fast Way::      X/Open's @code{nl_langinfo}.
 383 @end menu
 384
 385 @node The Lame Way to Locale Data, The Elegant and Fast Way, ,Locale Information
 386 @subsection @code{localeconv}: It is portable but @dots{}
 387
 388 Together with the @code{setlocale} function the @w{ISO C} people
 389 invented the @code{localeconv} function.  It is a masterpiece of poor
 390 design.  It is expensive to use, not extensible, and not generally
 391 usable as it provides access to only @code{LC_MONETARY} and
 392 @code{LC_NUMERIC} related information.  Nevertheless, if it is
 393 applicable to a given situation it should be used since it is very
 394 portable.  The function @code{strfmon} formats monetary amounts
 395 according to the selected locale using this information.
 396 @pindex locale.h
 397 @cindex monetary value formatting
 398 @cindex numeric value formatting
 399
 400 @comment locale.h
 401 @comment ISO
 402 @deftypefun {struct lconv *} localeconv (void)
 403 The @code{localeconv} function returns a pointer to a structure whose
 404 components contain information about how numeric and monetary values
 405 should be formatted in the current locale.
 406
 407 You should not modify the structure or its contents.  The structure might
 408 be overwritten by subsequent calls to @code{localeconv}, or by calls to
 409 @code{setlocale}, but no other function in the library overwrites this
 410 value.
 411 @end deftypefun
 412
 413 @comment locale.h
 414 @comment ISO
 415 @deftp {Data Type} {struct lconv}
 416 @code{localeconv}'s return value is of this data type.  Its elements are
 417 described in the following subsections.
 418 @end deftp
 419
 420 If a member of the structure @code{struct lconv} has type @code{char},
 421 and the value is @code{CHAR_MAX}, it means that the current locale has
 422 no value for that parameter.
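
A caller therefore has to test @code{char} members against
@code{CHAR_MAX} before using them.  The sketch below does this for
@code{frac_digits}; the helper name @code{money_frac_digits} and the
idea of a caller-supplied fallback are assumptions made for
illustration.

@verbatim
```c
#include <limits.h>
#include <locale.h>

/* Hypothetical helper: number of fractional digits to print for a
   monetary value in local format, or FALLBACK when the current locale
   leaves frac_digits unspecified (CHAR_MAX).  */
int
money_frac_digits (int fallback)
{
  /* The structure must not be modified, hence the const pointer.  */
  const struct lconv *lc = localeconv ();

  return lc->frac_digits == CHAR_MAX ? fallback : lc->frac_digits;
}
```
@end verbatim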
 423
 424 @menu
 425 * General Numeric::             Parameters for formatting numbers and
 426                                  currency amounts.
 427 * Currency Symbol::             How to print the symbol that identifies an
 428                                  amount of money (e.g. @samp{$}).
 429 * Sign of Money Amount::        How to print the (positive or negative) sign
 430                                  for a monetary amount, if one exists.
 431 @end menu
 432
 433 @node General Numeric, Currency Symbol, , The Lame Way to Locale Data
 434 @subsubsection Generic Numeric Formatting Parameters
 435
 436 These are the standard members of @code{struct lconv}; there may be
 437 others.
 438
 439 @table @code
 440 @item char *decimal_point
 441 @itemx char *mon_decimal_point
 442 These are the decimal-point separators used in formatting non-monetary
 443 and monetary quantities, respectively.  In the @samp{C} locale, the
 444 value of @code{decimal_point} is @code{"."}, and the value of
 445 @code{mon_decimal_point} is @code{""}.
 446 @cindex decimal-point separator
 447
 448 @item char *thousands_sep
 449 @itemx char *mon_thousands_sep
 450 These are the separators used to delimit groups of digits to the left of
 451 the decimal point in formatting non-monetary and monetary quantities,
 452 respectively.  In the @samp{C} locale, both members have a value of
 453 @code{""} (the empty string).
 454
 455 @item char *grouping
 456 @itemx char *mon_grouping
 457 These are strings that specify how to group the digits to the left of
 458 the decimal point.  @code{grouping} applies to non-monetary quantities
 459 and @code{mon_grouping} applies to monetary quantities.  Use either
 460 @code{thousands_sep} or @code{mon_thousands_sep} to separate the digit
 461 groups.
 462 @cindex grouping of digits
 463
 464 Each member of these strings is to be interpreted as an integer value of
 465 type @code{char}.  Successive numbers (from left to right) give the
 466 sizes of successive groups (from right to left, starting at the decimal
 467 point).  The last member is either @code{0}, in which case the previous
 468 member is used over and over again for all the remaining groups, or
 469 @code{CHAR_MAX}, in which case there is no more grouping---or, put
 470 another way, any remaining digits form one large group without
 471 separators.
 472
 473 For example, if @code{grouping} is @code{"\04\03\02"}, the correct
 474 grouping for the number @code{123456787654321} is @samp{12}, @samp{34},
 475 @samp{56}, @samp{78}, @samp{765}, @samp{4321}.  This uses a group of 4
 476 digits at the end, preceded by a group of 3 digits, preceded by groups
 477 of 2 digits (as many as needed).  With a separator of @samp{,}, the
 478 number would be printed as @samp{12,34,56,78,765,4321}.
 479
 480 A value of @code{"\03"} indicates repeated groups of three digits, as
 481 normally used in the U.S.
 482
 483 In the standard @samp{C} locale, both @code{grouping} and
 484 @code{mon_grouping} have a value of @code{""}.  This value specifies no
 485 grouping at all.
 486
 487 @item char int_frac_digits
 488 @itemx char frac_digits
 489 These are small integers indicating how many fractional digits (to the
 490 right of the decimal point) should be displayed in a monetary value in
 491 international and local formats, respectively.  (Most often, both
 492 members have the same value.)
 493
 494 In the standard @samp{C} locale, both of these members have the value
 495 @code{CHAR_MAX}, meaning ``unspecified''.  The ISO standard doesn't say
 496 what to do when you find this value; we recommend printing no
 497 fractional digits.  (This locale also specifies the empty string for
 498 @code{mon_decimal_point}, so printing any fractional digits would be
 499 confusing!)
 500 @end table
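
The interpretation of @code{grouping} described above can be sketched as
follows.  This is an illustration only, not how the library formats
numbers internally; the names @code{group_digits} and
@code{next_group_size} are hypothetical, and the input is assumed to be
a plain string of digits without sign or fraction.

@verbatim
```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the size of the next group and advance *G.  A result of 0
   means "no further grouping".  */
static size_t
next_group_size (const char **g, size_t previous)
{
  if (**g == CHAR_MAX)
    return 0;                 /* Remaining digits form one group.  */
  if (**g == '\0')
    return previous;          /* Repeat the last size (0 if none).  */
  return (size_t) (unsigned char) *(*g)++;
}

/* Hypothetical helper: copy DIGITS into OUT, inserting SEP between the
   groups described by GROUPING (encoded like the grouping member of
   struct lconv).  Returns 0 on success, -1 if OUT (SIZE bytes) is too
   small.  */
int
group_digits (const char *digits, const char *grouping, const char *sep,
              char *out, size_t size)
{
  size_t left = strlen (digits);   /* Digits not yet copied.  */
  size_t seplen = strlen (sep);
  size_t pos = size;               /* Output is built right to left.  */
  size_t group = 0;

  if (pos == 0)
    return -1;
  out[--pos] = '\0';

  while (left > 0)
    {
      size_t take;

      group = next_group_size (&grouping, group);
      take = (group == 0 || group > left) ? left : group;

      if (pos < take)
        return -1;
      pos -= take;
      left -= take;
      memcpy (out + pos, digits + left, take);

      /* Separators go only between groups, never in front.  */
      if (left > 0)
        {
          if (pos < seplen)
            return -1;
          pos -= seplen;
          memcpy (out + pos, sep, seplen);
        }
    }

  /* Move the finished string to the start of the buffer.  */
  memmove (out, out + pos, size - pos);
  return 0;
}
```
@end verbatim

Called with the grouping string @code{"\04\03\02"} and the separator
@samp{,}, this reproduces the grouping of @code{123456787654321} shown
in the description of @code{grouping} above.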
 501
 502 @node Currency Symbol, Sign of Money Amount, General Numeric, The Lame Way to Locale Data
 503 @subsubsection Printing the Currency Symbol
 504 @cindex currency symbols
 505
 506 These members of the @code{struct lconv} structure specify how to print
 507 the symbol to identify a monetary value---the international analog of
 508 @samp{$} for US dollars.
 509
 510 Each country has two standard currency symbols.  The @dfn{local currency
 511 symbol} is used commonly within the country, while the
 512 @dfn{international currency symbol} is used internationally to refer to
 513 that country's currency when it is necessary to indicate the country
 514 unambiguously.
 515
 516 For example, many countries use the dollar as their monetary unit, and
 517 when dealing with international currencies it's important to specify
 518 that one is dealing with (say) Canadian dollars instead of U.S. dollars
 519 or Australian dollars.  But when the context is known to be Canada,
 520 there is no need to make this explicit---dollar amounts are implicitly
 521 assumed to be in Canadian dollars.
 522
 523 @table @code
 524 @item char *currency_symbol
 525 The local currency symbol for the selected locale.
 526
 527 In the standard @samp{C} locale, this member has a value of @code{""}
 528 (the empty string), meaning ``unspecified''.  The ISO standard doesn't
 529 say what to do when you find this value; we recommend you simply print
 530 the empty string as you would print any other string pointed to by this
 531 variable.
 532
 533 @item char *int_curr_symbol
 534 The international currency symbol for the selected locale.
 535
 536 The value of @code{int_curr_symbol} should normally consist of a
 537 three-letter abbreviation determined by the international standard
 538 @cite{ISO 4217 Codes for the Representation of Currency and Funds},
 539 followed by a one-character separator (often a space).
 540
 541 In the standard @samp{C} locale, this member has a value of @code{""}
 542 (the empty string), meaning ``unspecified''.  We recommend you simply print
 543 the empty string as you would print any other string pointed to by this
 544 variable.
 545
 546 @item char p_cs_precedes
 547 @itemx char n_cs_precedes
 548 @itemx char int_p_cs_precedes
 549 @itemx char int_n_cs_precedes
 550 These members are @code{1} if the @code{currency_symbol} or
 551 @code{int_curr_symbol} strings should precede the value of a monetary
 552 amount, or @code{0} if the strings should follow the value.  The
 553 @code{p_cs_precedes} and @code{int_p_cs_precedes} members apply to
 554 positive amounts (or zero), and the @code{n_cs_precedes} and
 555 @code{int_n_cs_precedes} members apply to negative amounts.
 556
 557 In the standard @samp{C} locale, all of these members have a value of
 558 @code{CHAR_MAX}, meaning ``unspecified''.  The ISO standard doesn't say
 559 what to do when you find this value.  We recommend printing the
 560 currency symbol before the amount, which is right for most countries.
 561 In other words, treat all nonzero values alike in these members.
 562
 563 The members with the @code{int_} prefix apply to the
 564 @code{int_curr_symbol} while the other two apply to
 565 @code{currency_symbol}.
 566
 567 @item char p_sep_by_space
 568 @itemx char n_sep_by_space
 569 @itemx char int_p_sep_by_space
 570 @itemx char int_n_sep_by_space
 571 These members are @code{1} if a space should appear between the
 572 @code{currency_symbol} or @code{int_curr_symbol} strings and the
 573 amount, or @code{0} if no space should appear.  The
 574 @code{p_sep_by_space} and @code{int_p_sep_by_space} members apply to
 575 positive amounts (or zero), and the @code{n_sep_by_space} and
 576 @code{int_n_sep_by_space} members apply to negative amounts.
 577
 578 In the standard @samp{C} locale, all of these members have a value of
 579 @code{CHAR_MAX}, meaning ``unspecified''.  The ISO standard doesn't say
 580 what you should do when you find this value; we suggest you treat it as
 581 1 (print a space).  In other words, treat all nonzero values alike in
 582 these members.
 583
 584 The members with the @code{int_} prefix apply to the
 585 @code{int_curr_symbol} while the other two apply to
 586 @code{currency_symbol}.  There is one peculiarity of
 587 @code{int_curr_symbol}, though.  Since all legal values contain a space
 588 at the end of the string, one either prints this space (if the currency
 589 symbol must appear in front and must be separated) or one has to avoid
 590 printing this character at all (especially when at the end of the
 591 string).
 592 @end table
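
The following sketch shows one way to apply the @code{cs_precedes} and
@code{sep_by_space} members to the local currency symbol.  It follows
the recommendation above of treating all nonzero values alike, so
@code{CHAR_MAX} puts the symbol first, separated by a space.  The helper
name @code{place_currency_symbol} is hypothetical, and the sketch does
not handle the trailing separator of @code{int_curr_symbol}.

@verbatim
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: combine SYMBOL and the already formatted
   AMOUNT in OUT.  CS_PRECEDES and SEP_BY_SPACE are the values of the
   corresponding struct lconv members; every nonzero value, including
   CHAR_MAX, is treated like 1.  Returns 0 on success, -1 if OUT
   (SIZE bytes) is too small.  */
int
place_currency_symbol (const char *symbol, const char *amount,
                       char cs_precedes, char sep_by_space,
                       char *out, size_t size)
{
  const char *space = sep_by_space != 0 ? " " : "";
  const char *first = cs_precedes != 0 ? symbol : amount;
  const char *second = cs_precedes != 0 ? amount : symbol;

  if (strlen (first) + strlen (space) + strlen (second) + 1 > size)
    return -1;

  strcpy (out, first);
  strcat (out, space);
  strcat (out, second);
  return 0;
}
```
@end verbatim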
 593
 594 @node Sign of Money Amount, , Currency Symbol, The Lame Way to Locale Data
 595 @subsubsection Printing the Sign of a Monetary Amount
 596
 597 These members of the @code{struct lconv} structure specify how to print
 598 the sign (if any) of a monetary value.
 599
 600 @table @code
 601 @item char *positive_sign
 602 @itemx char *negative_sign
 603 These are strings used to indicate positive (or zero) and negative
 604 monetary quantities, respectively.
 605
 606 In the standard @samp{C} locale, both of these members have a value of
 607 @code{""} (the empty string), meaning ``unspecified''.
 608
 609 The ISO standard doesn't say what to do when you find this value; we
 610 recommend printing @code{positive_sign} as you find it, even if it is
 611 empty.  For a negative value, print @code{negative_sign} as you find it
 612 unless both it and @code{positive_sign} are empty, in which case print
 613 @samp{-} instead.  (Failing to indicate the sign at all seems rather
 614 unreasonable.)
 615
 616 @item char p_sign_posn
 617 @itemx char n_sign_posn
 618 @itemx char int_p_sign_posn
 619 @itemx char int_n_sign_posn
 620 These members are small integers that indicate how to
 621 position the sign for nonnegative and negative monetary quantities,
 622 respectively.  (The string used by the sign is what was specified with
 623 @code{positive_sign} or @code{negative_sign}.)  The possible values are
 624 as follows:
 625
 626 @table @code
 627 @item 0
 628 The currency symbol and quantity should be surrounded by parentheses.
 629
 630 @item 1
 631 Print the sign string before the quantity and currency symbol.
 632
 633 @item 2
 634 Print the sign string after the quantity and currency symbol.
 635
 636 @item 3
 637 Print the sign string right before the currency symbol.
 638
 639 @item 4
 640 Print the sign string right after the currency symbol.
 641
 642 @item CHAR_MAX
 643 ``Unspecified''.  All of these members have this value in the standard
 644 @samp{C} locale.
 645 @end table
 646
 647 The ISO standard doesn't say what you should do when the value is
 648 @code{CHAR_MAX}.  We recommend you print the sign after the currency
 649 symbol.
 650
 651 The members with the @code{int_} prefix apply to the
 652 @code{int_curr_symbol} while the other two apply to
 653 @code{currency_symbol}.
 654 @end table
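
The sign positions listed above can be sketched as a single
@code{switch}.  The helper name @code{format_signed_money} is
hypothetical; the sketch ignores @code{sep_by_space} to stay short, and
it handles @code{CHAR_MAX} like @code{4}, following the recommendation
to print the sign after the currency symbol.

@verbatim
```c
#include <stdio.h>

/* Hypothetical helper: write SYMBOL, AMOUNT and SIGN to OUT in the
   order selected by SIGN_POSN (a p_sign_posn or n_sign_posn value)
   and CS_PRECEDES.  Returns 0 on success, -1 if OUT (SIZE bytes) is
   too small.  */
int
format_signed_money (const char *sign, char sign_posn,
                     const char *symbol, const char *amount,
                     int cs_precedes, char *out, size_t size)
{
  const char *open = "", *close = "";        /* Parentheses.  */
  const char *before = "", *after = "";      /* Around everything.  */
  const char *pre_sym = "", *post_sym = "";  /* Next to the symbol.  */
  int n;

  switch (sign_posn)
    {
    case 0: open = "("; close = ")"; break;
    case 1: before = sign; break;
    case 2: after = sign; break;
    case 3: pre_sym = sign; break;
    default: post_sym = sign; break;         /* 4 and CHAR_MAX.  */
    }

  if (cs_precedes)
    n = snprintf (out, size, "%s%s%s%s%s%s%s%s", open, before,
                  pre_sym, symbol, post_sym, amount, after, close);
  else
    n = snprintf (out, size, "%s%s%s%s%s%s%s%s", open, before,
                  amount, pre_sym, symbol, post_sym, after, close);

  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```
@end verbatim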
 655
 656 @node The Elegant and Fast Way, , The Lame Way to Locale Data, Locale Information
 657 @subsection Pinpoint Access to Locale Data
 658
 659 When writing the X/Open Portability Guide the authors realized that the
 660 @code{localeconv} function is not enough to provide reasonable access to
 661 locale information.  The information which was meant to be available
 662 in the locale (as later specified in the POSIX.1 standard) requires more
 663 ways to access it.  Therefore the @code{nl_langinfo} function
 664 was introduced.
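
As a first taste of the interface described next, here is a minimal
sketch that looks up a single item.  The helper name
@code{full_weekday_name} is hypothetical; the mapping of @code{DAY_1}
to Sunday follows the item list in this section.

@verbatim
```c
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical helper: full name of weekday WDAY (0 = Sunday through
   6 = Saturday) in the locale selected by LC_TIME, or NULL if WDAY is
   out of range.  */
const char *
full_weekday_name (int wday)
{
  static const nl_item items[] =
    { DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7 };

  if (wday < 0 || wday > 6)
    return NULL;

  /* Only the requested item is fetched, nothing else.  */
  return nl_langinfo (items[wday]);
}
```
@end verbatim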
 665
 666 @comment langinfo.h
 667 @comment XOPEN
 668 @deftypefun {char *} nl_langinfo (nl_item @var{item})
 669 The @code{nl_langinfo} function can be used to access individual
 670 elements of the locale categories.  Unlike the @code{localeconv}
 671 function, which returns all the information, @code{nl_langinfo}
 672 lets the caller select what information it requires.  This is very
 673 fast and it is not a problem to call this function multiple times.
 674
 675 A second advantage is that in addition to the numeric and monetary
 676 formatting information, information from the
 677 @code{LC_TIME} and @code{LC_MESSAGES} categories is available.
 678
 679 The type @code{nl_item} is defined in @file{nl_types.h}.  The argument
 680 @var{item} is a numeric value defined in the header @file{langinfo.h}.
 681 The X/Open standard defines the following values:
 682
 683 @vtable @code
 684 @item ABDAY_1
 685 @itemx ABDAY_2
 686 @itemx ABDAY_3
 687 @itemx ABDAY_4
 688 @itemx ABDAY_5
 689 @itemx ABDAY_6
 690 @itemx ABDAY_7
 691 @code{nl_langinfo} returns the abbreviated weekday name.  @code{ABDAY_1}
 692 corresponds to Sunday.
 693 @item DAY_1
 694 @itemx DAY_2
 695 @itemx DAY_3
 696 @itemx DAY_4
 697 @itemx DAY_5
 698 @itemx DAY_6
 699 @itemx DAY_7
 700 Similar to @code{ABDAY_1} etc., but here the return value is the
 701 unabbreviated weekday name.
 702 @item ABMON_1
 703 @itemx ABMON_2
 704 @itemx ABMON_3
 705 @itemx ABMON_4
 706 @itemx ABMON_5
 707 @itemx ABMON_6
 708 @itemx ABMON_7
 709 @itemx ABMON_8
 710 @itemx ABMON_9
 711 @itemx ABMON_10
 712 @itemx ABMON_11
 713 @itemx ABMON_12
 714 The return value is the abbreviated name of the month.  @code{ABMON_1}
 715 corresponds to January.
 716 @item MON_1
 717 @itemx MON_2
 718 @itemx MON_3
 719 @itemx MON_4
 720 @itemx MON_5
 721 @itemx MON_6
 722 @itemx MON_7
 723 @itemx MON_8
 724 @itemx MON_9
 725 @itemx MON_10
 726 @itemx MON_11
 727 @itemx MON_12
 728 Similar to @code{ABMON_1} etc., but here the month names are not abbreviated.
 729 Here the first value @code{MON_1} also corresponds to January.
 730 @item AM_STR
 731 @itemx PM_STR
 732 The return values are strings which can be used in the representation of time
 733 as an hour from 1 to 12 plus an am/pm specifier.
 734
 735 Note that in locales which do not use this time representation
 736 these strings might be empty, in which case the am/pm format
 737 cannot be used at all.
 738 @item D_T_FMT
 739 The return value can be used as a format string for @code{strftime} to
 740 represent time and date in a locale-specific way.
 741 @item D_FMT
 742 The return value can be used as a format string for @code{strftime} to
 743 represent a date in a locale-specific way.
 744 @item T_FMT
 745 The return value can be used as a format string for @code{strftime} to
 746 represent time in a locale-specific way.
 747 @item T_FMT_AMPM
 748 The return value can be used as a format string for @code{strftime} to
 749 represent time in the am/pm format.
 750
 751 Note that if the am/pm format does not make any sense for the
 752 selected locale, the return value might be the same as the one for
 753 @code{T_FMT}.
 754 @item ERA
 755 The return value represents the era used in the current locale.
 756
 757 Most locales do not define this value.  An example of a locale which
 758 does define this value is the Japanese one.  In Japan, the traditional
 759 representation of dates includes the name of the era corresponding to
 760 the then-emperor's reign.
 761
 762 Normally it should not be necessary to use this value directly.
 763 Specifying the @code{E} modifier in their format strings causes the
 764 @code{strftime} functions to use this information.  The format of the
 765 returned string is not specified, and therefore you should not assume
 766 knowledge of it on different systems.
 767 @item ERA_YEAR
 768 The return value gives the year in the relevant era of the locale.
 769 As for @code{ERA} it should not be necessary to use this value directly.
 770 @item ERA_D_T_FMT
 771 This return value can be used as a format string for @code{strftime} to
 772 represent dates and times in a locale-specific era-based way.
 773 @item ERA_D_FMT
 774 This return value can be used as a format string for @code{strftime} to
 775 represent a date in a locale-specific era-based way.
 776 @item ERA_T_FMT
 777 This return value can be used as a format string for @code{strftime} to
 778 represent time in a locale-specific era-based way.
 779 @item ALT_DIGITS
 780 The return value is a representation of up to @math{100} values used to
 781 represent the values @math{0} to @math{99}.  As for @code{ERA} this
 782 value is not intended to be used directly, but instead indirectly
 783 through the @code{strftime} function.  When the modifier @code{O} is
 784 used in a format which would otherwise use numerals to represent hours,
 785 minutes, seconds, weekdays, months, or weeks, the appropriate value for
 786 the locale is used instead.
 787 @item INT_CURR_SYMBOL
 788 The same as the value returned by @code{localeconv} in the
 789 @code{int_curr_symbol} element of the @code{struct lconv}.
 790 @item CURRENCY_SYMBOL
 791 @itemx CRNCYSTR
 792 The same as the value returned by @code{localeconv} in the
 793 @code{currency_symbol} element of the @code{struct lconv}.
 794
 795 @code{CRNCYSTR} is a deprecated alias still required by Unix98.
 796 @item MON_DECIMAL_POINT
 797 The same as the value returned by @code{localeconv} in the
 798 @code{mon_decimal_point} element of the @code{struct lconv}.
 799 @item MON_THOUSANDS_SEP
 800 The same as the value returned by @code{localeconv} in the
 801 @code{mon_thousands_sep} element of the @code{struct lconv}.
 802 @item MON_GROUPING
 803 The same as the value returned by @code{localeconv} in the
 804 @code{mon_grouping} element of the @code{struct lconv}.
 805 @item POSITIVE_SIGN
 806 The same as the value returned by @code{localeconv} in the
 807 @code{positive_sign} element of the @code{struct lconv}.
 808 @item NEGATIVE_SIGN
 809 The same as the value returned by @code{localeconv} in the
 810 @code{negative_sign} element of the @code{struct lconv}.
 811 @item INT_FRAC_DIGITS
 812 The same as the value returned by @code{localeconv} in the
 813 @code{int_frac_digits} element of the @code{struct lconv}.
 814 @item FRAC_DIGITS
 815 The same as the value returned by @code{localeconv} in the
 816 @code{frac_digits} element of the @code{struct lconv}.
 817 @item P_CS_PRECEDES
 818 The same as the value returned by @code{localeconv} in the
 819 @code{p_cs_precedes} element of the @code{struct lconv}.
 820 @item P_SEP_BY_SPACE
 821 The same as the value returned by @code{localeconv} in the
 822 @code{p_sep_by_space} element of the @code{struct lconv}.
 823 @item N_CS_PRECEDES
 824 The same as the value returned by @code{localeconv} in the
 825 @code{n_cs_precedes} element of the @code{struct lconv}.
 826 @item N_SEP_BY_SPACE
 827 The same as the value returned by @code{localeconv} in the
 828 @code{n_sep_by_space} element of the @code{struct lconv}.
 829 @item P_SIGN_POSN
 830 The same as the value returned by @code{localeconv} in the
 831 @code{p_sign_posn} element of the @code{struct lconv}.
 832 @item N_SIGN_POSN
 833 The same as the value returned by @code{localeconv} in the
 834 @code{n_sign_posn} element of the @code{struct lconv}.
 835 @item DECIMAL_POINT
 836 @itemx RADIXCHAR
 837 The same as the value returned by @code{localeconv} in the
 838 @code{decimal_point} element of the @code{struct lconv}.
 839
 840 The name @code{RADIXCHAR} is a deprecated alias still used in Unix98.
 841 @item THOUSANDS_SEP
 842 @itemx THOUSEP
 843 The same as the value returned by @code{localeconv} in the
 844 @code{thousands_sep} element of the @code{struct lconv}.
 845
 846 The name @code{THOUSEP} is a deprecated alias still used in Unix98.
 847 @item GROUPING
 848 The same as the value returned by @code{localeconv} in the
 849 @code{grouping} element of the @code{struct lconv}.
 850 @item YESEXPR
 851 The return value is a regular expression which can be used with the
 852 @code{regcomp} and @code{regexec} functions to recognize a positive
 853 response to a yes/no question.
 854 @item NOEXPR
 855 The return value is a regular expression which can be used with the
 856 @code{regcomp} and @code{regexec} functions to recognize a negative
 857 response to a yes/no question.
 858 @item YESSTR
 859 The return value is a locale-specific translation of the positive response
 860 to a yes/no question.
 861
 862 Using this value is deprecated since it is a very special case of
 863 message translation, and is better handled by the message
 864 translation functions (@pxref{Message Translation}).
 865 @item NOSTR
 866 The return value is a locale-specific translation of the negative response
 867 to a yes/no question.  What is said for @code{YESSTR} is also true here.
 868 @end vtable
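As an illustration of the @code{YESEXPR} item, the following minimal sketch
checks a user response against the locale's expression.  The helper name
@code{is_affirmative} is hypothetical, and the sketch assumes the expression
is an extended regular expression, as POSIX specifies for this item.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of any library.  Returns 1 if RESPONSE
   matches the current locale's YESEXPR, 0 if it does not, and -1 if the
   expression could not be compiled.  */
int
is_affirmative (const char *response)
{
  regex_t re;

  /* Assumption: YESEXPR is an extended regular expression.  */
  if (regcomp (&re, nl_langinfo (YESEXPR), REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  int matched = regexec (&re, response, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

A negative response can be recognized the same way with @code{NOEXPR}.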
 869
 870 The file @file{langinfo.h} defines many more symbols, but none of them
 871 is official.  Using them is not portable, and the format of the
 872 return values might change.  Therefore we recommend that you not use
 873 them.
 874
 875 Note that the return value for any valid argument can be used
 876 in all situations (with the possible exception of the am/pm time formatting
 877 codes).  If the user has not selected any locale for the
 878 appropriate category, @code{nl_langinfo} returns the information from the
 879 @code{"C"} locale.  It is therefore possible to use this function as
 880 shown in the example below.
 881
 882 If the argument @var{item} is not valid, a pointer to an empty string is
 883 returned.
 884 @end deftypefun
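The following minimal sketch shows one way to look up a single item.  The
helper name @code{weekday_name} is hypothetical.  It copies the result
because POSIX allows the returned string to be overwritten by a later call
to @code{nl_langinfo} or @code{setlocale}.

```c
#include <assert.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any library.  Copies the full name of
   weekday INDEX (0 = Sunday, matching DAY_1) into BUF, which holds LEN
   bytes.  Returns 0 on success, -1 if INDEX is out of range or the name
   does not fit.  */
int
weekday_name (int index, char *buf, size_t len)
{
  static const nl_item days[] =
    { DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7 };

  assert (buf != NULL);
  if (index < 0 || index > 6)
    return -1;

  const char *name = nl_langinfo (days[index]);
  if (strlen (name) >= len)
    return -1;

  /* Copy the string, since a later call may overwrite it.  */
  strcpy (buf, name);
  return 0;
}
```

If the program has not called @code{setlocale}, the names come from the
@code{"C"} locale.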
 885
 886 An example of @code{nl_langinfo} usage is a function which has to
 887 print a given date and time in a locale-specific way.  At first one
 888 might think that, since @code{strftime} internally uses the locale
 889 information, writing something like the following is enough:
 890
 891 @smallexample
 892 size_t
 893 i18n_time_n_data (char *s, size_t len, const struct tm *tp)
 894 @{
 895   return strftime (s, len, "%X %D", tp);
 896 @}
 897 @end smallexample
 898
 899 The format contains no weekday or month names and therefore is
 900 internationally usable.  Wrong!  The output produced is something like
 901 @code{"hh:mm:ss MM/DD/YY"}.  This format is only recognizable in the
 902 USA.  Other countries use different formats.  Therefore the function
 903 should be rewritten like this:
 904
 905 @smallexample
 906 size_t
 907 i18n_time_n_data (char *s, size_t len, const struct tm *tp)
 908 @{
 909   return strftime (s, len, nl_langinfo (D_T_FMT), tp);
 910 @}
 911 @end smallexample
 912
 913 Now it uses the date and time format of the locale
 914 selected when the program runs.  If the user selects the locale
 915 correctly there should never be a misunderstanding over the time and
 916 date format.
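A caller of this function might look like the following minimal sketch.
The helper name @code{format_utc} is hypothetical, and the choice of
@code{gmtime_r} to break down the time is an assumption made only for
illustration.

```c
#include <langinfo.h>
#include <stddef.h>
#include <time.h>

/* The function from the text above, unchanged.  */
size_t
i18n_time_n_data (char *s, size_t len, const struct tm *tp)
{
  return strftime (s, len, nl_langinfo (D_T_FMT), tp);
}

/* Hypothetical helper, not part of any library.  Formats the UTC time T
   into S, which holds LEN bytes.  Returns the number of bytes stored,
   not counting the terminating null byte, or 0 if the time could not be
   converted or the result does not fit.  */
size_t
format_utc (char *s, size_t len, time_t t)
{
  struct tm tm;

  if (gmtime_r (&t, &tm) == NULL)
    return 0;
  return i18n_time_n_data (s, len, &tm);
}
```

The program is expected to call @code{setlocale (LC_ALL, "")} once at
startup.  Without that call the @code{"C"} locale stays in effect and
@code{D_T_FMT} yields that locale's format rather than the user's.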
 917
 918 @node Formatting Numbers, , Locale Information, Locales
 919 @section A dedicated function to format numbers
 920
 921 We have seen that the structure returned by @code{localeconv} as well as
 922 the values given to @code{nl_langinfo} allow you to retrieve the various
 923 pieces of locale-specific information to format numbers and monetary
 924 amounts.  We have also seen that the underlying rules are quite complex.
 925
 926 Therefore the X/Open standards introduce a function which uses such
 927 locale information, making it easier for the user to format
 928 numbers according to these rules.
 929
 930 @deftypefun ssize_t strfmon (char *@var{s}, size_t @var{maxsize}, const char *@var{format}, @dots{})
 931 The @code{strfmon} function is similar to the @code{strftime} function
 932 in that it takes a buffer, its size, a format string,
 933 and values to write into the buffer as text in a form specified
 934 by the format string.  Like @code{strftime}, the function
 935 also returns the number of bytes written into the buffer.
 936
 937 There are two differences: @code{strfmon} can take more than one
 938 argument, and, of course, the format specification is different.  Like
 939 @code{strftime}, the format string consists of normal text, which is
 940 output as is, and format specifiers, which are indicated by a @samp{%}.
 941 Immediately after the @samp{%}, you can optionally specify various flags
 942 and formatting information before the main formatting character, in a
 943 similar way to @code{printf}:
 944
 945 @itemize @bullet
 946 @item
 947 Immediately following the @samp{%} there can be one or more of the
 948 following flags:
 949 @table @asis
 950 @item @samp{=@var{f}}
 951 The single byte character @var{f} is used for this field as the numeric
 952 fill character.  By default this character is a space character.
 953 Filling with this character is only performed if a left precision
 954 is specified.  It is not used to pad the output to the field width.
 955 @item @samp{^}
 956 The number is printed without grouping the digits according to the rules
 957 of the current locale.  By default grouping is enabled.
 958 @item @samp{+}, @samp{(}
 959 At most one of these flags can be used.  They select the format used to
 960 represent the sign of a currency amount.  By default, and if
 961 @samp{+} is given, the locale equivalent of @math{+}/@math{-} is used.  If
 962 @samp{(} is given, negative amounts are enclosed in parentheses.  The
 963 exact format is determined by the values of the @code{LC_MONETARY}
 964 category of the locale selected at program runtime.
 965 @item @samp{!}
 966 The output will not contain the currency symbol.
 967 @item @samp{-}
 968 The output will be formatted left-justified instead of right-justified if
 969 it does not fill the entire field width.
 970 @end table
 971 @end itemize
 972
 973 The next part of a specification is an optional field width.  If no
 974 width is specified @math{0} is taken.  During output, the function first
 975 determines how much space is required.  If it requires at least as many
 976 characters as given by the field width, it is output using as much space
 977 as necessary.  Otherwise, it is extended to use the full width by
 978 filling with the space character.  The presence or absence of the
 979 @samp{-} flag determines the side at which such padding occurs.  If
 980 present, the spaces are added at the right making the output
 981 left-justified, and vice versa.
 982
 983 So far the format looks familiar, being similar to the @code{printf} and
 984 @code{strftime} formats.  However, the next two optional fields
 985 introduce something new.  The first one is a @samp{#} character followed
 986 by a decimal digit string.  The value of the digit string specifies the
 987 number of @emph{digit} positions to the left of the decimal point (or
 988 equivalent).  This does @emph{not} include the grouping character when
 989 the @samp{^} flag is not given.  If the space needed to print the number
 990 does not fill the whole width, the field is padded at the left side with
 991 the fill character, which can be selected using the @samp{=} flag and by
 992 default is a space.  For example, if the left precision is selected as 6,
 993 the number is @math{123}, and the fill character is @samp{*}, the result
 994 will be @samp{***123}.
 995
 996 The second optional field starts with a @samp{.} (period) and consists
 997 of another decimal digit string.  Its value describes the number of
 998 characters printed after the decimal point.  The default is selected
 999 from the current locale (@code{frac_digits}, @code{int_frac_digits},
1000 @pxref{General Numeric}).  If the exact representation needs more digits
1001 than given by this precision, the displayed value is rounded.  If the
1002 number of fractional digits is selected to be zero, no decimal point is
1003 printed.
1004
1005 As a GNU extension, the @code{strfmon} implementation in the GNU libc
1006 allows an optional @samp{L} next as a format modifier.  If this modifier
1007 is given, the argument is expected to be a @code{long double} instead of
1008 a @code{double} value.
1009
1010 Finally, the last component is a format specifier.  There are three
1011 specifiers defined:
1012
1013 @table @asis
1014 @item @samp{i}
1015 Use the locale's rules for formatting an international currency value.
1016 @item @samp{n}
1017 Use the locale's rules for formatting a national currency value.
1018 @item @samp{%}
1019 Place a @samp{%} in the output.  There must be no flag, width
1020 specifier or modifier given, only @samp{%%} is allowed.
1021 @end table
1022
1023 As for @code{printf}, the function reads the format string
1024 from left to right and uses the values passed to the function following
1025 the format string.  The values are expected to be either of type
1026 @code{double} or @code{long double}, depending on the presence of the
1027 modifier @samp{L}.  The result is stored in the buffer pointed to by
1028 @var{s}.  At most @var{maxsize} characters are stored.
1029
1030 The return value of the function is the number of characters stored in
1031 @var{s}, not including the terminating null byte.  If the result,
1032 including the terminating null byte, would exceed @var{maxsize}
1033 characters, the function returns @math{-1} and the content of the
1034 buffer @var{s} is unspecified.  In this case @code{errno} is set to @code{E2BIG}.
1035 @end deftypefun
1036
1037 A few examples should make clear how the function works.  It is
1038 assumed that all the following pieces of code are executed in a program
1039 which uses the USA locale (@code{en_US}).  The simplest
1040 form of the format is this:
1041
1042 @smallexample
1043 strfmon (buf, 100, "@@%n@@%n@@%n@@", 123.45, -567.89, 12345.678);
1044 @end smallexample
1045
1046 @noindent
1047 The output produced is
1048 @smallexample
1049 "@@$123.45@@-$567.89@@$12,345.68@@"
1050 @end smallexample
1051
1052 We can notice several things here.  First, the widths of the output
1053 numbers are different.  We have not specified a width in the format
1054 string, and so this is no wonder.  Second, the third number is printed
1055 using thousands separators.  The thousands separator for the
1056 @code{en_US} locale is a comma.  The number is also rounded.
1057 @math{.678} is rounded to @math{.68} since the format does not specify a
1058 precision and the default value in the locale is @math{2}.  Finally,
1059 note that the national currency symbol is printed since @samp{%n} was
1060 used, not @samp{%i}.  The next example shows how we can align the output.
1061
1062 @smallexample
1063 strfmon (buf, 100, "@@%=*11n@@%=*11n@@%=*11n@@", 123.45, -567.89, 12345.678);
1064 @end smallexample
1065
1066 @noindent
1067 The output this time is:
1068
1069 @smallexample
1070 "@@    $123.45@@   -$567.89@@ $12,345.68@@"
1071 @end smallexample
1072
1073 Two things stand out.  Firstly, all fields have the same width (eleven
1074 characters) since this is the width given in the format and since no
1075 number required more characters to be printed.  The second important
1076 point is that the fill character is not used.  This is correct since the
1077 white space was not used to achieve a precision given by a @samp{#}
1078 modifier, but instead to fill to the given width.  The difference
1079 becomes obvious if we now add a left precision with @samp{#}.
1080
1081 @smallexample
1082 strfmon (buf, 100, "@@%=*11#5n@@%=*11#5n@@%=*11#5n@@",
1083          123.45, -567.89, 12345.678);
1084 @end smallexample
1085
1086 @noindent
1087 The output is
1088
1089 @smallexample
1090 "@@ $***123.45@@-$***567.89@@ $12,345.68@@"
1091 @end smallexample
1092
1093 Here we can see that all the currency symbols are now aligned, and that
1094 the space between the currency sign and the number is filled with the
1095 selected fill character.  Note that although the left precision is
1096 @math{5} and @math{123.45} has three digits left of the decimal point,
1097 the space is filled with three asterisks.  This is correct since, as
1098 explained above, the left precision does not include the positions used
1099 for thousands separators.  One last example should explain the remaining
1100 functionality.
1101
1102 @smallexample
1103 strfmon (buf, 100, "@@%=0(16#5.3i@@%=0(16#5.3i@@%=0(16#5.3i@@",
1104          123.45, -567.89, 12345.678);
1105 @end smallexample
1106
1107 @noindent
1108 This rather complex format string produces the following output:
1109
1110 @smallexample
1111 "@@ USD 000123.450 @@(USD 000567.890)@@ USD 12,345.678 @@"
1112 @end smallexample
1113
1114 The most noticeable change is the alternative way of representing
1115 negative numbers.  In financial circles this is often done using
1116 parentheses, and this is what the @samp{(} flag selected.  The fill
1117 character is now @samp{0}.  Note that this @samp{0} character is not
1118 regarded as a numeric zero, and therefore the first and second numbers
1119 are not printed using a thousands separator.  Since we used the format
1120 specifier @samp{i} instead of @samp{n}, the international form of the
1121 currency symbol is used.  This is a four-character string, in this case
1122 @code{"USD "}.  The last point is that since the precision right of the
1123 decimal point is selected to be three, the first and second numbers are
1124 printed with an extra zero at the end and the third number is printed
1125 without rounding.
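To try these examples, a program first has to select a suitable locale for
the @code{LC_MONETARY} category.  The following minimal sketch shows one
way to do that.  The helper names and the list of locale names tried are
assumptions, since the names of the installed locales differ between
systems.

```c
#include <locale.h>
#include <monetary.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper, not part of any library.  Tries to select a US
   English locale for LC_MONETARY.  The names tried are assumptions about
   what is installed.  Returns 1 if one was selected, 0 otherwise, in
   which case the previous locale stays in effect.  */
int
use_us_monetary_locale (void)
{
  static const char *const names[] = { "en_US.UTF-8", "en_US.utf8", "en_US" };

  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    if (setlocale (LC_MONETARY, names[i]) != NULL)
      return 1;
  return 0;
}

/* Hypothetical helper.  Formats VALUE with the specification from the
   third example: fill character '*', field width 11, left precision 5.  */
ssize_t
format_aligned (char *buf, size_t size, double value)
{
  return strfmon (buf, size, "%=*11#5n", value);
}
```

If no US English locale is installed, the output differs from the examples
above, but the field is still at least eleven characters wide.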