:mod:`gettext` --- Multilingual internationalization services
=============================================================

.. module:: gettext
   :synopsis: Multilingual internationalization services.
.. moduleauthor:: Barry A. Warsaw <barry@zope.com>
.. sectionauthor:: Barry A. Warsaw <barry@zope.com>

The :mod:`gettext` module provides internationalization (I18N) and localization
(L10N) services for your Python modules and applications. It supports both the
GNU ``gettext`` message catalog API and a higher-level, class-based API that may
be more appropriate for Python files. The interface described below allows you
to write your module and application messages in one natural language, and
provide a catalog of translated messages for running under different natural
languages.

Some hints on localizing your Python modules and applications are also given.
22 GNU :program:`gettext` API
23 --------------------------
25 The :mod:`gettext` module defines the following API, which is very similar to
26 the GNU :program:`gettext` API. If you use this API you will affect the
27 translation of your entire application globally. Often this is what you want if
28 your application is monolingual, with the choice of language dependent on the
29 locale of your user. If you are localizing a Python module, or if your
30 application needs to switch languages on the fly, you probably want to use the
31 class-based API instead.
.. function:: bindtextdomain(domain[, localedir])

   Bind the *domain* to the locale directory *localedir*. More concretely,
   :mod:`gettext` will look for binary :file:`.mo` files for the given domain using
   the path (on Unix): :file:`localedir/language/LC_MESSAGES/domain.mo`, where
   *language* is searched for in the environment variables :envvar:`LANGUAGE`,
   :envvar:`LC_ALL`, :envvar:`LC_MESSAGES`, and :envvar:`LANG`, in that order.

   If *localedir* is omitted or ``None``, then the current binding for *domain* is
   returned. [#]_
.. function:: bind_textdomain_codeset(domain[, codeset])

   Bind the *domain* to *codeset*, changing the encoding of strings returned by the
   :func:`gettext` family of functions. If *codeset* is omitted, then the current
   binding is returned.
55 .. function:: textdomain([domain])
57 Change or query the current global domain. If *domain* is ``None``, then the
58 current global domain is returned, otherwise the global domain is set to
59 *domain*, which is returned.
62 .. function:: gettext(message)
64 Return the localized translation of *message*, based on the current global
65 domain, language, and locale directory. This function is usually aliased as
66 :func:`_` in the local namespace (see examples below).
69 .. function:: lgettext(message)
71 Equivalent to :func:`gettext`, but the translation is returned in the preferred
72 system encoding, if no other encoding was explicitly set with
73 :func:`bind_textdomain_codeset`.
78 .. function:: dgettext(domain, message)
80 Like :func:`gettext`, but look the message up in the specified *domain*.
83 .. function:: ldgettext(domain, message)
85 Equivalent to :func:`dgettext`, but the translation is returned in the preferred
86 system encoding, if no other encoding was explicitly set with
87 :func:`bind_textdomain_codeset`.
92 .. function:: ngettext(singular, plural, n)
94 Like :func:`gettext`, but consider plural forms. If a translation is found,
95 apply the plural formula to *n*, and return the resulting message (some
96 languages have more than two plural forms). If no translation is found, return
97 *singular* if *n* is 1; return *plural* otherwise.
   The plural formula is taken from the catalog header. It is a C or Python
   expression that has a free variable *n*; the expression evaluates to the index
   of the plural in the catalog. See the GNU gettext documentation for the precise
   syntax to be used in :file:`.po` files and the formulas for a variety of
   languages.
105 .. versionadded:: 2.3
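
The plural selection described above can be sketched as follows. This is only an illustration under stated assumptions: ``plural_index`` is a hypothetical hand-written Python rendering of the common catalog formula ``plural=(n != 1)``, and ``forms`` is an invented list of translated strings ordered by plural index. The last lines use :class:`NullTranslations`, which has no catalog, to show the documented fallback to *singular* or *plural*.

```python
import gettext


def plural_index(n):
    # Hypothetical Python rendering of the header formula "plural=(n != 1)".
    return int(n != 1)


# Invented translated forms, ordered by plural index.
forms = ['%d Datei', '%d Dateien']


def pick(n):
    # Choose the form whose index the formula yields for n.
    return forms[plural_index(n)] % n


# With no catalog, ngettext() falls back to the arguments it was given:
# singular when n is 1, plural otherwise.
null = gettext.NullTranslations()
fallback_one = null.ngettext('%d file', '%d files', 1)
fallback_many = null.ngettext('%d file', '%d files', 3)
```

Languages with more than two plural forms simply use a longer list and a formula that can return indexes above 1.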
108 .. function:: lngettext(singular, plural, n)
110 Equivalent to :func:`ngettext`, but the translation is returned in the preferred
111 system encoding, if no other encoding was explicitly set with
112 :func:`bind_textdomain_codeset`.
114 .. versionadded:: 2.4
117 .. function:: dngettext(domain, singular, plural, n)
119 Like :func:`ngettext`, but look the message up in the specified *domain*.
121 .. versionadded:: 2.3
124 .. function:: ldngettext(domain, singular, plural, n)
126 Equivalent to :func:`dngettext`, but the translation is returned in the
127 preferred system encoding, if no other encoding was explicitly set with
128 :func:`bind_textdomain_codeset`.
130 .. versionadded:: 2.4
132 Note that GNU :program:`gettext` also defines a :func:`dcgettext` method, but
133 this was deemed not useful and so it is currently unimplemented.
Here's an example of typical usage for this API::

   import gettext
   gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
   gettext.textdomain('myapplication')
   _ = gettext.gettext
   # ...
   print _('This is a translatable string.')
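
The query forms of :func:`bindtextdomain` and :func:`textdomain` can be exercised in a small sketch. It assumes a temporary, empty directory standing in for a real locale directory, so no catalog is found and :func:`gettext` returns the message unchanged.

```python
import gettext
import tempfile

# Empty temporary directory standing in for a real locale directory.
localedir = tempfile.mkdtemp()

# Setting a binding returns the directory that is now bound.
bound = gettext.bindtextdomain('myapplication', localedir)
current = gettext.textdomain('myapplication')

# Calling without the optional argument queries the current state.
queried_dir = gettext.bindtextdomain('myapplication')
queried_domain = gettext.textdomain()

# No .mo file exists under localedir, so the message comes back as-is.
_ = gettext.gettext
message = _('This is a translatable string.')
```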

Class-based API
---------------

The class-based API of the :mod:`gettext` module gives you more flexibility and
greater convenience than the GNU :program:`gettext` API. It is the recommended
way of localizing your Python applications and modules. :mod:`gettext` defines
a "translations" class which implements the parsing of GNU :file:`.mo` format
files, and has methods for returning either standard 8-bit strings or Unicode
strings. Instances of this "translations" class can also install themselves in
the built-in namespace as the function :func:`_`.
157 .. function:: find(domain[, localedir[, languages[, all]]])
   This function implements the standard :file:`.mo` file search algorithm. It
   takes a *domain*, identical to what :func:`textdomain` takes. Optional
   *localedir* is as in :func:`bindtextdomain`. Optional *languages* is a list of
   strings, where each string is a language code.

   If *localedir* is not given, then the default system locale directory is used.
   [#]_ If *languages* is not given, then the following environment variables are
   searched: :envvar:`LANGUAGE`, :envvar:`LC_ALL`, :envvar:`LC_MESSAGES`, and
   :envvar:`LANG`. The first one returning a non-empty value is used for the
   *languages* variable. The environment variables should contain a
   colon-separated list of languages, which will be split on the colon to produce
   the expected list of language code strings.
172 :func:`find` then expands and normalizes the languages, and then iterates
173 through them, searching for an existing file built of these components:
175 :file:`localedir/language/LC_MESSAGES/domain.mo`
177 The first such file name that exists is returned by :func:`find`. If no such
178 file is found, then ``None`` is returned. If *all* is given, it returns a list
179 of all file names, in the order in which they appear in the languages list or
180 the environment variables.
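
The search performed by :func:`find` can be tried out without a real catalog. The sketch below assumes a hypothetical domain named ``demo`` and a temporary locale directory. The :file:`.mo` file is an empty placeholder, which is enough here because the lookup depends only on the file being present at the expected path.

```python
import gettext
import os
import tempfile

# Build localedir/fr/LC_MESSAGES/demo.mo inside a temporary directory.
localedir = tempfile.mkdtemp()
catalog_dir = os.path.join(localedir, 'fr', 'LC_MESSAGES')
os.makedirs(catalog_dir)
expected = os.path.join(catalog_dir, 'demo.mo')
open(expected, 'wb').close()  # empty placeholder file

# First existing file for the requested languages.
first = gettext.find('demo', localedir, languages=['fr'])

# With all=True a list of every match is returned instead.
every = gettext.find('demo', localedir, languages=['de', 'fr'], all=True)

# No catalog exists for German, so the result is None.
missing = gettext.find('demo', localedir, languages=['de'])
```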
183 .. function:: translation(domain[, localedir[, languages[, class_[, fallback[, codeset]]]]])
   Return a :class:`Translations` instance based on the *domain*, *localedir*, and
   *languages*, which are first passed to :func:`find` to get a list of the
   associated :file:`.mo` file paths. Instances with identical :file:`.mo` file
   names are cached. The actual class instantiated is *class_* if provided,
   otherwise :class:`GNUTranslations`. The class's constructor must take a single
   file object argument. If provided, *codeset* will change the charset used to
   encode translated strings.

   If multiple files are found, later files are used as fallbacks for earlier ones.
   To allow setting the fallback, :func:`copy.copy` is used to clone each
   translation object from the cache; the actual instance data is still shared with
   the cache.
198 If no :file:`.mo` file is found, this function raises :exc:`IOError` if
199 *fallback* is false (which is the default), and returns a
200 :class:`NullTranslations` instance if *fallback* is true.
202 .. versionchanged:: 2.4
203 Added the *codeset* parameter.
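
A brief sketch of the *fallback* behaviour of :func:`translation`, assuming an empty temporary directory so that no :file:`.mo` file can be found:

```python
import gettext
import tempfile

# Empty directory: no catalog for any language can be found here.
empty_localedir = tempfile.mkdtemp()

# Without fallback, a missing catalog is reported as IOError.
try:
    gettext.translation('myapplication', empty_localedir, languages=['fr'])
    raised = False
except IOError:
    raised = True

# With fallback=True a NullTranslations instance is returned instead,
# which hands messages back untranslated.
t = gettext.translation('myapplication', empty_localedir,
                        languages=['fr'], fallback=True)
is_null = isinstance(t, gettext.NullTranslations)
message = t.gettext('untranslated')
```

Passing ``fallback=True`` is a convenient choice when an application should keep running in its source language even if its catalogs are not installed.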
206 .. function:: install(domain[, localedir[, unicode [, codeset[, names]]]])
208 This installs the function :func:`_` in Python's builtins namespace, based on
209 *domain*, *localedir*, and *codeset* which are passed to the function
210 :func:`translation`. The *unicode* flag is passed to the resulting translation
211 object's :meth:`~NullTranslations.install` method.
213 For the *names* parameter, please see the description of the translation
214 object's :meth:`~NullTranslations.install` method.
216 As seen below, you usually mark the strings in your application that are
217 candidates for translation, by wrapping them in a call to the :func:`_`
218 function, like this::
220 print _('This string will be translated.')
   For convenience, you want the :func:`_` function to be installed in Python's
   builtins namespace, so it is easily accessible in all modules of your
   application.
226 .. versionchanged:: 2.4
227 Added the *codeset* parameter.
229 .. versionchanged:: 2.5
230 Added the *names* parameter.
233 The :class:`NullTranslations` class
234 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
236 Translation classes are what actually implement the translation of original
237 source file message strings to translated message strings. The base class used
238 by all translation classes is :class:`NullTranslations`; this provides the basic
239 interface you can use to write your own specialized translation classes. Here
240 are the methods of :class:`NullTranslations`:
243 .. class:: NullTranslations([fp])
   Takes an optional file object *fp*, which is ignored by the base class.
   Initializes "protected" instance variables *_info* and *_charset* which are set
   by derived classes, as well as *_fallback*, which is set through
   :meth:`add_fallback`. It then calls ``self._parse(fp)`` if *fp* is not
   ``None``.
252 .. method:: _parse(fp)
254 No-op'd in the base class, this method takes file object *fp*, and reads
255 the data from the file, initializing its message catalog. If you have an
256 unsupported message catalog file format, you should override this method
257 to parse your format.
260 .. method:: add_fallback(fallback)
262 Add *fallback* as the fallback object for the current translation
263 object. A translation object should consult the fallback if it cannot provide a
264 translation for a given message.
   .. method:: gettext(message)

      If a fallback has been set, forward :meth:`gettext` to the
      fallback. Otherwise, return *message* unchanged, since the base class has
      no catalog. Overridden in derived classes.


   .. method:: lgettext(message)

      If a fallback has been set, forward :meth:`lgettext` to the
      fallback. Otherwise, return *message* unchanged. Overridden in derived
      classes.

      .. versionadded:: 2.4


   .. method:: ugettext(message)

      If a fallback has been set, forward :meth:`ugettext` to the
      fallback. Otherwise, return *message* as a Unicode
      string. Overridden in derived classes.


   .. method:: ngettext(singular, plural, n)

      If a fallback has been set, forward :meth:`ngettext` to the
      fallback. Otherwise, return *singular* if *n* is 1 and *plural*
      otherwise. Overridden in derived classes.

      .. versionadded:: 2.3


   .. method:: lngettext(singular, plural, n)

      If a fallback has been set, forward :meth:`lngettext` to the
      fallback. Otherwise, return *singular* if *n* is 1 and *plural*
      otherwise. Overridden in derived classes.

      .. versionadded:: 2.4


   .. method:: ungettext(singular, plural, n)

      If a fallback has been set, forward :meth:`ungettext` to the fallback.
      Otherwise, return *singular* if *n* is 1 and *plural* otherwise, as a
      Unicode string. Overridden in derived classes.

      .. versionadded:: 2.3


   .. method:: info()

      Return the "protected" :attr:`_info` variable.
322 .. method:: charset()
324 Return the "protected" :attr:`_charset` variable.
327 .. method:: output_charset()
329 Return the "protected" :attr:`_output_charset` variable, which defines the
330 encoding used to return translated messages.
332 .. versionadded:: 2.4
335 .. method:: set_output_charset(charset)
337 Change the "protected" :attr:`_output_charset` variable, which defines the
338 encoding used to return translated messages.
340 .. versionadded:: 2.4
343 .. method:: install([unicode [, names]])
345 If the *unicode* flag is false, this method installs :meth:`self.gettext`
346 into the built-in namespace, binding it to ``_``. If *unicode* is true,
347 it binds :meth:`self.ugettext` instead. By default, *unicode* is false.
      If the *names* parameter is given, it must be a sequence containing the
      names of functions you want to install in the builtins namespace in
      addition to :func:`_`. Supported names are ``'gettext'`` (bound to
      :meth:`self.gettext` or :meth:`self.ugettext` according to the *unicode*
      flag), ``'ngettext'`` (bound to :meth:`self.ngettext` or
      :meth:`self.ungettext` according to the *unicode* flag), ``'lgettext'``
      and ``'lngettext'``.
      Note that this is only one way, albeit the most convenient way, to make
      the :func:`_` function available to your application. Because it affects
      the entire application globally, and specifically the built-in namespace,
      localized modules should never install :func:`_`. Instead, they should use
      this code to make :func:`_` available to their module::

         import gettext
         t = gettext.translation('mymodule', ...)
         _ = t.gettext

367 This puts :func:`_` only in the module's global namespace and so only
368 affects calls within this module.
370 .. versionchanged:: 2.5
371 Added the *names* parameter.
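
The :meth:`_parse` and :meth:`add_fallback` hooks described above can be combined in a small subclass. The following is a minimal sketch, not part of the module: ``KeyValueTranslations`` and its one-pair-per-line ``msgid=msgstr`` file format are invented for illustration, and the catalogs are held in in-memory file objects.

```python
import gettext
import io


class KeyValueTranslations(gettext.NullTranslations):
    """Hypothetical catalog format: one 'msgid=msgstr' pair per line."""

    def _parse(self, fp):
        # Called by the base class constructor when fp is not None.
        self._catalog = {}
        for line in fp:
            line = line.strip()
            if '=' in line:
                msgid, msgstr = line.split('=', 1)
                self._catalog[msgid] = msgstr

    def gettext(self, message):
        # Own catalog first, then the fallback, then the message itself.
        if message in self._catalog:
            return self._catalog[message]
        if self._fallback:
            return self._fallback.gettext(message)
        return message


french = KeyValueTranslations(io.StringIO(u'hello=bonjour\n'))
extra = KeyValueTranslations(io.StringIO(u'goodbye=au revoir\n'))
french.add_fallback(extra)

hit = french.gettext('hello')            # found in the first catalog
via_fallback = french.gettext('goodbye')  # found in the fallback
untouched = french.gettext('thanks')      # found nowhere
```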
374 The :class:`GNUTranslations` class
375 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
377 The :mod:`gettext` module provides one additional class derived from
378 :class:`NullTranslations`: :class:`GNUTranslations`. This class overrides
379 :meth:`_parse` to enable reading GNU :program:`gettext` format :file:`.mo` files
380 in both big-endian and little-endian format. It also coerces both message ids
381 and message strings to Unicode.
:class:`GNUTranslations` parses optional meta-data out of the translation
catalog. It is convention with GNU :program:`gettext` to include meta-data as
the translation for the empty string. This meta-data is in :rfc:`822`\ -style
``key: value`` pairs, and should contain the ``Project-Id-Version`` key. If the
key ``Content-Type`` is found, then the ``charset`` property is used to
initialize the "protected" :attr:`_charset` instance variable, defaulting to
``None`` if not found. If the charset encoding is specified, then all message
ids and message strings read from the catalog are converted to Unicode using
this encoding. The :meth:`ugettext` method always returns a Unicode string,
while the :meth:`gettext` method returns an encoded 8-bit string. For the
message id arguments of both methods, either Unicode strings or 8-bit strings
containing only US-ASCII characters are acceptable. Note that the Unicode
versions of the methods (i.e. :meth:`ugettext` and :meth:`ungettext`) are the
recommended interface to use for internationalized Python programs.
398 The entire set of key/value pairs are placed into a dictionary and set as the
399 "protected" :attr:`_info` instance variable.
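
The meta-data handling can be observed by feeding :class:`GNUTranslations` a tiny catalog built in memory. This is a sketch under stated assumptions: ``build_mo`` is a hypothetical helper that writes the little-endian GNU :file:`.mo` layout by hand (real catalogs are normally produced by a tool such as :program:`msgfmt.py`), and the snippet assumes a Python 3 interpreter, where ``gettext()`` returns text. Under the 2.x API documented here, :meth:`ugettext` plays that role.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: serialize a dict as a little-endian .mo file."""
    ids = b''
    strs = b''
    spans = []
    for key in sorted(messages):
        msgid = key.encode('utf-8')
        msgstr = messages[key].encode('utf-8')
        spans.append((len(msgid), len(ids), len(msgstr), len(strs)))
        ids += msgid + b'\0'      # strings are NUL-terminated
        strs += msgstr + b'\0'
    count = len(spans)
    id_table = 7 * 4              # the header is seven 32-bit words
    str_table = id_table + 8 * count
    id_data = str_table + 8 * count
    str_data = id_data + len(ids)
    words = []
    for span in spans:            # (length, offset) of each message id
        words += [span[0], id_data + span[1]]
    for span in spans:            # (length, offset) of each translation
        words += [span[2], str_data + span[3]]
    header = struct.pack('<7I', 0x950412de, 0, count,
                         id_table, str_table, 0, 0)
    return header + struct.pack('<%dI' % len(words), *words) + ids + strs


catalog = build_mo({
    # The empty message id carries the meta-data.
    '': 'Project-Id-Version: demo 1.0\n'
        'Content-Type: text/plain; charset=UTF-8\n',
    'coffee': u'caf\xe9',
})
t = gettext.GNUTranslations(io.BytesIO(catalog))
info = t.info()
charset = t.charset()
coffee = t.gettext('coffee')
```

Note that the keys of the dictionary returned by ``info()`` are stored in lower case.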

If the :file:`.mo` file's magic number is invalid, or if other problems occur
while reading the file, instantiating a :class:`GNUTranslations` class can raise
:exc:`IOError`.

405 The following methods are overridden from the base class implementation:
408 .. method:: GNUTranslations.gettext(message)
410 Look up the *message* id in the catalog and return the corresponding message
411 string, as an 8-bit string encoded with the catalog's charset encoding, if
412 known. If there is no entry in the catalog for the *message* id, and a fallback
413 has been set, the look up is forwarded to the fallback's :meth:`gettext` method.
414 Otherwise, the *message* id is returned.
417 .. method:: GNUTranslations.lgettext(message)
419 Equivalent to :meth:`gettext`, but the translation is returned in the preferred
420 system encoding, if no other encoding was explicitly set with
421 :meth:`set_output_charset`.
423 .. versionadded:: 2.4
426 .. method:: GNUTranslations.ugettext(message)
428 Look up the *message* id in the catalog and return the corresponding message
429 string, as a Unicode string. If there is no entry in the catalog for the
430 *message* id, and a fallback has been set, the look up is forwarded to the
431 fallback's :meth:`ugettext` method. Otherwise, the *message* id is returned.
434 .. method:: GNUTranslations.ngettext(singular, plural, n)
436 Do a plural-forms lookup of a message id. *singular* is used as the message id
437 for purposes of lookup in the catalog, while *n* is used to determine which
438 plural form to use. The returned message string is an 8-bit string encoded with
439 the catalog's charset encoding, if known.
441 If the message id is not found in the catalog, and a fallback is specified, the
442 request is forwarded to the fallback's :meth:`ngettext` method. Otherwise, when
443 *n* is 1 *singular* is returned, and *plural* is returned in all other cases.
445 .. versionadded:: 2.3
448 .. method:: GNUTranslations.lngettext(singular, plural, n)
   Equivalent to :meth:`ngettext`, but the translation is returned in the preferred
   system encoding, if no other encoding was explicitly set with
   :meth:`set_output_charset`.
454 .. versionadded:: 2.4
457 .. method:: GNUTranslations.ungettext(singular, plural, n)
459 Do a plural-forms lookup of a message id. *singular* is used as the message id
460 for purposes of lookup in the catalog, while *n* is used to determine which
461 plural form to use. The returned message string is a Unicode string.
   If the message id is not found in the catalog, and a fallback is specified, the
   request is forwarded to the fallback's :meth:`ungettext` method. Otherwise,
   when *n* is 1 *singular* is returned, and *plural* is returned in all other
   cases.

   Here is an example::

      # somefile is an open file object for a GNU .mo catalog
      n = len(os.listdir('.'))
      cat = GNUTranslations(somefile)
      message = cat.ungettext(
          'There is %(num)d file in this directory',
          'There are %(num)d files in this directory',
          n) % {'num': n}

477 .. versionadded:: 2.3
480 Solaris message catalog support
481 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Solaris operating system defines its own binary :file:`.mo` file format, but
since no documentation can be found on this format, it is not supported at this
time.
488 The Catalog constructor
489 ^^^^^^^^^^^^^^^^^^^^^^^
491 .. index:: single: GNOME
GNOME uses a version of the :mod:`gettext` module by James Henstridge, but this
version has a slightly different API. Its documented usage was::

   import gettext
   cat = gettext.Catalog(domain, localedir)
   _ = cat.gettext
   print _('hello world')

501 For compatibility with this older module, the function :func:`Catalog` is an
502 alias for the :func:`translation` function described above.
504 One difference between this module and Henstridge's: his catalog objects
505 supported access through a mapping API, but this appears to be unused and so is
506 not currently supported.
509 Internationalizing your programs and modules
510 --------------------------------------------
512 Internationalization (I18N) refers to the operation by which a program is made
513 aware of multiple languages. Localization (L10N) refers to the adaptation of
514 your program, once internationalized, to the local language and cultural habits.
515 In order to provide multilingual messages for your Python programs, you need to
516 take the following steps:
518 #. prepare your program or module by specially marking translatable strings
#. run a suite of tools over your marked files to generate raw message catalogs
522 #. create language specific translations of the message catalogs
524 #. use the :mod:`gettext` module so that message strings are properly translated

In order to prepare your code for I18N, you need to look at all the strings in
your files. Any string that needs to be translated should be marked by wrapping
it in ``_('...')`` --- that is, a call to the function :func:`_`. For example::

   filename = 'mylog.txt'
   message = _('writing a log message')
   fp = open(filename, 'w')
   fp.write(message)
   fp.close()

536 In this example, the string ``'writing a log message'`` is marked as a candidate
537 for translation, while the strings ``'mylog.txt'`` and ``'w'`` are not.
539 The Python distribution comes with two tools which help you generate the message
540 catalogs once you've prepared your source code. These may or may not be
541 available from a binary distribution, but they can be found in a source
542 distribution, in the :file:`Tools/i18n` directory.
544 The :program:`pygettext` [#]_ program scans all your Python source code looking
545 for the strings you previously marked as translatable. It is similar to the GNU
546 :program:`gettext` program except that it understands all the intricacies of
547 Python source code, but knows nothing about C or C++ source code. You don't
548 need GNU ``gettext`` unless you're also going to be translating C code (such as
549 C extension modules).
:program:`pygettext` generates textual Uniforum-style human readable message
catalog :file:`.pot` files, essentially structured human readable files which
contain every marked string in the source code, along with a placeholder for the
translation strings. :program:`pygettext` is a command line script that supports
a command line interface similar to that of :program:`xgettext`; for details on
its use, run::

   pygettext.py --help

560 Copies of these :file:`.pot` files are then handed over to the individual human
561 translators who write language-specific versions for every supported natural
562 language. They send you back the filled in language-specific versions as a
563 :file:`.po` file. Using the :program:`msgfmt.py` [#]_ program (in the
564 :file:`Tools/i18n` directory), you take the :file:`.po` files from your
565 translators and generate the machine-readable :file:`.mo` binary catalog files.
566 The :file:`.mo` files are what the :mod:`gettext` module uses for the actual
567 translation processing during run-time.
569 How you use the :mod:`gettext` module in your code depends on whether you are
570 internationalizing a single module or your entire application. The next two
571 sections will discuss each case.
574 Localizing your module
575 ^^^^^^^^^^^^^^^^^^^^^^
577 If you are localizing your module, you must take care not to make global
578 changes, e.g. to the built-in namespace. You should not use the GNU ``gettext``
579 API but instead the class-based API.
Let's say your module is called "spam" and the module's various natural language
translation :file:`.mo` files reside in :file:`/usr/share/locale` in GNU
:program:`gettext` format. Here's what you would put at the top of your
module::

   import gettext
   t = gettext.translation('spam', '/usr/share/locale')
   _ = t.lgettext

If your translators were providing you with Unicode strings in their :file:`.po`
files, you'd instead do::

   import gettext
   t = gettext.translation('spam', '/usr/share/locale')
   _ = t.ugettext

598 Localizing your application
599 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
601 If you are localizing your application, you can install the :func:`_` function
602 globally into the built-in namespace, usually in the main driver file of your
603 application. This will let all your application-specific files just use
604 ``_('...')`` without having to explicitly install it in each file.
In the simple case then, you need only add the following bit of code to the main
driver file of your application::

   import gettext
   gettext.install('myapplication')

If you need to set the locale directory or the *unicode* flag, you can pass
these into the :func:`install` function::

   import gettext
   gettext.install('myapplication', '/usr/share/locale', unicode=1)

619 Changing languages on the fly
620 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If your program needs to support many languages at the same time, you may want
to create multiple translation instances and then switch between them
explicitly, like so::

   import gettext

   lang1 = gettext.translation('myapplication', languages=['en'])
   lang2 = gettext.translation('myapplication', languages=['fr'])
   lang3 = gettext.translation('myapplication', languages=['de'])

   # start by using language1
   lang1.install()

   # ... time goes by, user selects language 2
   lang2.install()

   # ... more time goes by, user selects language 3
   lang3.install()

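
Switching does not have to go through the built-in namespace. As a sketch of an alternative, the translation objects can be kept in a dictionary keyed by language code and selected on demand. ``select_language`` is a hypothetical helper, and the example assumes an empty temporary locale directory with ``fallback=True`` so that it runs without any installed catalogs.

```python
import gettext
import tempfile

# Empty stand-in; a real application would point at its catalog directory.
localedir = tempfile.mkdtemp()

# One translation object per supported language.
translations = {}
for code in ('en', 'fr', 'de'):
    translations[code] = gettext.translation(
        'myapplication', localedir, languages=[code], fallback=True)


def select_language(code):
    """Hypothetical helper: return the lookup function for a language."""
    return translations[code].gettext


# Rebind _ in this module only; nothing is installed globally.
_ = select_language('fr')
greeting = _('hello world')
```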
642 Deferred translations
643 ^^^^^^^^^^^^^^^^^^^^^
In most coding situations, strings are translated where they are coded.
Occasionally however, you need to mark strings for translation, but defer actual
translation until later. A classic example is::

   animals = ['mollusk',
              'albatross',
              'rat',
              'penguin',
              'python', ]
   # ...
   for a in animals:
       print a

Here, you want to mark the strings in the ``animals`` list as being
translatable, but you don't actually want to translate them until they are
printed.

Here is one way you can handle this situation::

   def _(message): return message

   animals = [_('mollusk'),
              _('albatross'),
              _('rat'),
              _('penguin'),
              _('python'), ]

   del _

   # ...
   for a in animals:
       print _(a)

This works because the dummy definition of :func:`_` simply returns the string
unchanged. And this dummy definition will temporarily override any definition
of :func:`_` in the built-in namespace (until the :keyword:`del` command). Take
care, though, if you have a previous definition of :func:`_` in the local
namespace.

684 Note that the second use of :func:`_` will not identify "a" as being
685 translatable to the :program:`pygettext` program, since it is not a string.
Another way to handle this is with the following example::

   def N_(message): return message

   animals = [N_('mollusk'),
              N_('albatross'),
              N_('rat'),
              N_('penguin'),
              N_('python'), ]

   # ...
   for a in animals:
       print _(a)

701 In this case, you are marking translatable strings with the function :func:`N_`,
702 [#]_ which won't conflict with any definition of :func:`_`. However, you will
703 need to teach your message extraction program to look for translatable strings
704 marked with :func:`N_`. :program:`pygettext` and :program:`xpot` both support
705 this through the use of command line switches.
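
The effect of the :func:`N_` marker can be checked in isolation. In this sketch the dictionary ``french`` is an invented stand-in for a real catalog lookup, so that the example runs without any :file:`.mo` file. Real code would bind :func:`_` to a translation object's lookup method instead.

```python
def N_(message):
    # Marker only: returns the string unchanged at definition time.
    return message


animals = [N_('mollusk'), N_('python')]

# Invented stand-in for a catalog; only one entry is "translated".
french = {'mollusk': 'mollusque'}


def _(message):
    return french.get(message, message)


# Translation happens here, at the point of use, not at definition.
printed = [_(a) for a in animals]
```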
708 :func:`gettext` vs. :func:`lgettext`
709 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In Python 2.4 the :func:`lgettext` family of functions was introduced. The
intention of these functions is to provide an alternative which is more
compliant with the current implementation of GNU gettext. Unlike
:func:`gettext`, which returns strings encoded with the same codeset used in the
translation file, :func:`lgettext` will return strings encoded with the
preferred system encoding, as returned by :func:`locale.getpreferredencoding`.
Also notice that Python 2.4 introduces new functions to explicitly choose the
codeset used in translated strings. If a codeset is explicitly set, even
:func:`lgettext` will return translated strings in the requested codeset, as
would be expected in the GNU gettext implementation.
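
The difference comes down to which codec is applied to the translated text. The sketch below does not call :func:`lgettext` itself; it only illustrates, for one invented translated string and an assumed UTF-8 catalog, how the resulting bytes depend on the chosen codeset.

```python
import locale

# Invented translated message, as Unicode text.
translated = u'caf\xe9'

# gettext()-style result: bytes in the catalog's own codeset
# (assumed here to be UTF-8).
catalog_codeset = 'utf-8'
as_catalog = translated.encode(catalog_codeset)

# lgettext()-style result: bytes in the preferred system encoding,
# which varies from machine to machine.
preferred = locale.getpreferredencoding()
as_preferred = translated.encode(preferred, 'replace')

# Result after an explicit codeset choice, here ISO-8859-1.
explicit = translated.encode('iso-8859-1')
```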

Acknowledgements
----------------

The following people contributed code, feedback, design suggestions, previous
implementations, and valuable experience to the creation of this module:

* Juan David Ibáñez Palomar
745 .. rubric:: Footnotes
747 .. [#] The default locale directory is system dependent; for example, on RedHat Linux
748 it is :file:`/usr/share/locale`, but on Solaris it is :file:`/usr/lib/locale`.
749 The :mod:`gettext` module does not try to support these system dependent
750 defaults; instead its default is :file:`sys.prefix/share/locale`. For this
751 reason, it is always best to call :func:`bindtextdomain` with an explicit
752 absolute path at the start of your application.
754 .. [#] See the footnote for :func:`bindtextdomain` above.
.. [#] François Pinard has written a program called :program:`xpot` which does a
   similar job. It is available as part of his :program:`po-utils` package at
   http://po-utils.progiciels-bpi.ca/.
760 .. [#] :program:`msgfmt.py` is binary compatible with GNU :program:`msgfmt` except that
761 it provides a simpler, all-Python implementation. With this and
762 :program:`pygettext.py`, you generally won't need to install the GNU
763 :program:`gettext` package to internationalize your Python applications.
765 .. [#] The choice of :func:`N_` here is totally arbitrary; it could have just as easily
766 been :func:`MarkThisStringForTranslation`.