.\" $NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $DragonFly: src/share/man/man7/nls.7,v 1.7 2008/05/02 02:05:06 swildner Exp $
.\"
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
using only English strings or other English-specific conventions.
.Pp
.Dq Localization
refers to the operations by which the user environment is customized to
handle its input and output appropriately for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called a
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Dx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LANG
and
.Ev LC_ALL
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Bd -literal -offset indent
language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.Bl -column -offset indent "SERBO-CROATIAN" "Code" "INDO-EUROPEAN (OTHER)"
.It Em Language Name Ta Em Code Ta Em Language Family
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KURUNDI Ta RN Ta NEGRO-AFRICAN
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It NORWEGIAN Ta NO Ta GERMANIC
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It PORTUGUESE Ta PT Ta ROMANCE
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
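.Pp
The structure of a locale name can be illustrated with a short C sketch.
The helper below is hypothetical: neither
.Vt struct locale_parts
nor
.Fn parse_locale_name
is part of any system interface, and the field sizes are arbitrary.
It only splits a name of the form language[_territory][.codeset][@modifier]
into its four fields and does not check them against any standard.
.Bd -literal
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy len bytes from src into dst, truncating to fit, and terminate. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = 0;
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Optional fields that are absent are left as empty strings.
 */
static void
parse_locale_name(const char *name, struct locale_parts *out)
{
    const char *end = name + strlen(name);
    const char *sep;

    memset(out, 0, sizeof(*out));

    /* Peel the optional fields off from right to left. */
    sep = memchr(name, '@', (size_t)(end - name));
    if (sep != NULL) {
        copy_field(out->modifier, sizeof(out->modifier), sep + 1,
            (size_t)(end - sep - 1));
        end = sep;
    }
    sep = memchr(name, '.', (size_t)(end - name));
    if (sep != NULL) {
        copy_field(out->codeset, sizeof(out->codeset), sep + 1,
            (size_t)(end - sep - 1));
        end = sep;
    }
    sep = memchr(name, '_', (size_t)(end - name));
    if (sep != NULL) {
        copy_field(out->territory, sizeof(out->territory), sep + 1,
            (size_t)(end - sep - 1));
        end = sep;
    }
    /* Whatever remains in front of the first separator is the language. */
    copy_field(out->language, sizeof(out->language), name,
        (size_t)(end - name));
}
```
.Ed
.Pp
For the name da_DK.ISO8859-1 this yields the language da, the territory DK,
the codeset ISO8859-1, and an empty modifier.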
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Bl -enum
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
The
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
environment variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
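.Pp
The priority rules above can be sketched in C as follows.
This is an illustration only, not how the C library implements
.Xr setlocale 3 ;
the helper names
.Fn nonempty_env
and
.Fn effective_locale
are hypothetical, and treating an empty value like an unset variable is an
assumption of this sketch.
.Bd -literal
```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Return the value of an environment variable, or NULL if it is unset.
 * Assumption of this sketch: an empty value counts as unset.
 */
static const char *
nonempty_env(const char *var)
{
    const char *v = getenv(var);

    return (v != NULL && *v != 0) ? v : NULL;
}

/*
 * Hypothetical helper: name of the locale that applies to one category,
 * given the name of that category's variable (for example "LC_TIME").
 */
static const char *
effective_locale(const char *category_var)
{
    const char *v;

    if ((v = nonempty_env("LC_ALL")) != NULL)       /* rule 1 */
        return v;
    if ((v = nonempty_env(category_var)) != NULL)   /* rule 2 */
        return v;
    if ((v = nonempty_env("LANG")) != NULL)         /* rule 3 */
        return v;
    return "C";                                     /* rule 4 */
}
```
.Ed
.Pp
With only
.Ev LANG
set to da_DK.ISO8859-1, a call such as effective_locale("LC_TIME") returns
that value; setting
.Ev LC_ALL
afterwards overrides it for every category.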
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, and Hebrew,
among others.
.It eucJP
The eucJP character set is the industry-standard character set used to support
the Japanese language.
.It UTF-8
A Unicode environment based on the UTF-8 character set is supported for all
supported languages and territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
The
.Xr xterm 1
terminal emulator includes support for UTF-8 character sets.
The
.Xr xfd 1
utility is useful for displaying all the characters in an X font.
.Pp
The
.Dx
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
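.Pp
As a minimal sketch of querying locale information at run time, the
following C fragment selects a locale for the
.Ev LC_NUMERIC
category with
.Xr setlocale 3
and asks
.Xr nl_langinfo 3
for its radix (decimal point) string.
The helper name
.Fn radix_for_locale
is hypothetical and exists only for this example.
.Bd -literal
```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: select locale_name for LC_NUMERIC and return its
 * radix (decimal point) string, or NULL if the locale is unavailable.
 * An empty locale_name selects the locale named by the environment.
 */
static const char *
radix_for_locale(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return nl_langinfo(RADIXCHAR);
}
```
.Ed
.Pp
Calling radix_for_locale("C") returns a period, while passing an empty
string reports the decimal point of whatever locale the user's environment
selects.
The returned string may be overwritten by later calls to
.Xr nl_langinfo 3
or
.Xr setlocale 3 ,
so it should be copied if it is needed for long.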
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
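.Pp
A minimal sketch of the X/Open interface is shown below.
It uses only
.Xr catopen 3 ,
.Xr catgets 3 ,
and
.Xr catclose 3 ;
the set and message numbers and the helper names
.Fn lookup_message
and
.Fn print_greeting
are hypothetical, since a real program takes its numbers from its own
message source file.
.Bd -literal
```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical set and message numbers; a real catalog defines its own. */
#define EXAMPLE_SET      1
#define EXAMPLE_GREETING 1

/* Look up one message, returning fallback if the catalog is not open. */
static const char *
lookup_message(nl_catd catd, int msg_id, const char *fallback)
{
    if (catd == (nl_catd)-1)
        return fallback;
    return catgets(catd, EXAMPLE_SET, msg_id, fallback);
}

/* Open the named catalog, print the greeting, and close the catalog. */
static void
print_greeting(const char *catalog_name)
{
    nl_catd catd = catopen(catalog_name, NL_CAT_LOCALE);

    /* The message must be used before catclose() releases the catalog. */
    puts(lookup_message(catd, EXAMPLE_GREETING, "Hello, world"));
    if (catd != (nl_catd)-1)
        catclose(catd);
}
```
.Ed
.Pp
Passing the built-in English text as the fallback keeps the program usable
when no catalog for the user's locale is installed.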
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Vt wchar_t
and
.Vt wint_t .
.Vt wchar_t
is a type which can contain one wide character and operates like the
.Vt char
type does for one character.
.Vt wint_t
can contain one wide character or
.Dv WEOF
(wide EOF).
.Pp
There are functions that operate on
.Vt wchar_t
and substitute for functions operating on
.Vt char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Vt wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using functions such as
.Xr mbstowcs 3
and
.Xr wcstombs 3 .
.It
Wide characters are used directly for I/O, using functions such as
.Xr fgetwc 3
and
.Xr fputwc 3 .
They are also used by the formatted I/O functions for wide characters,
such as
.Xr wprintf 3
and
.Xr wscanf 3 ,
and by the wide-character conversion specifiers %lc, %C, %ls, and %S of
the conventional formatted I/O functions.
.El
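.Pp
The first approach, converting at the I/O boundary, can be sketched in C
as follows.
The helper name
.Fn upcase_multibyte
and the buffer size are hypothetical; the sketch relies only on
.Xr mbstowcs 3 ,
.Xr towupper 3 ,
and
.Xr wcstombs 3 ,
and it assumes that the caller has already selected a locale, for example
with
.Xr setlocale 3 .
.Bd -literal
```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

#define WBUF_LEN 256    /* arbitrary limit chosen for this sketch */

/*
 * Convert a multibyte string to wide characters, upper-case it one wide
 * character at a time, and convert the result back to multibyte form.
 * Returns 0 on success, or -1 if the input is not valid in the current
 * locale or one of the buffers is too small.
 */
static int
upcase_multibyte(const char *in, char *out, size_t outsize)
{
    wchar_t wbuf[WBUF_LEN];
    size_t i, n;

    /* Multibyte to wide: done once, right after the data is read. */
    n = mbstowcs(wbuf, in, WBUF_LEN);
    if (n == (size_t)-1 || n >= WBUF_LEN)
        return -1;

    /* All processing happens on fixed-width, stateless wide characters. */
    for (i = 0; i < n; i++)
        wbuf[i] = (wchar_t)towupper((wint_t)wbuf[i]);

    /* Wide to multibyte: done once, right before the data is written. */
    n = wcstombs(out, wbuf, outsize);
    if (n == (size_t)-1 || n >= outsize)
        return -1;
    return 0;
}
```
.Ed
.Pp
Working on the wide form means that the loop never has to know how many
bytes a character occupies in the locale's encoding.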
.Sh SEE ALSO
.Xr gettext 3 Pq Pa pkgsrc/devel/gettext
.Sh BUGS
This man page is incomplete.