<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 6 June 2010 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 15 Other Programming Languages</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_14.html">previous</A>, <A HREF="gettext_16.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>

<H1><A NAME="SEC245" HREF="gettext_toc.html#TOC245">15 Other Programming Languages</A></H1>
<P>
While the presentation of <CODE>gettext</CODE> focuses mostly on C and
implicitly applies to C++ as well, its scope is far broader than that:
Many programming languages, scripting languages and other textual data
like GUI resources or package descriptions can make use of the gettext
approach.
</P>
<H2><A NAME="SEC246" HREF="gettext_toc.html#TOC246">15.1 The Language Implementor's View</A></H2>
<P>
<A NAME="IDX1170"></A>
<A NAME="IDX1171"></A>
</P>
<P>
All programming and scripting languages that have the notion of strings
are eligible for <CODE>gettext</CODE> support. Supporting <CODE>gettext</CODE>
means the following:
</P>
<OL>
<LI>
You should add to the language a syntax for translatable strings. In
principle, a function call of <CODE>gettext</CODE> would do, but a shorthand
syntax helps keep internationalized programs legible. For
example, in C we use the syntax <CODE>_("string")</CODE>, and in GNU awk we use
the shorthand <CODE>_"string"</CODE>.

<LI>
You should arrange that evaluation of such a translatable string at
runtime calls the <CODE>gettext</CODE> function, or performs equivalent
processing.

<LI>
Similarly, you should make the functions <CODE>ngettext</CODE>,
<CODE>dcgettext</CODE>, <CODE>dcngettext</CODE> available from within the language.
These functions are less often used, but are nevertheless necessary for
particular purposes: <CODE>ngettext</CODE> for correct plural handling, and
<CODE>dcgettext</CODE> and <CODE>dcngettext</CODE> for obeying locale-related
environment variables other than <CODE>LC_MESSAGES</CODE>, such as <CODE>LC_TIME</CODE> or
<CODE>LC_MONETARY</CODE>. For these latter functions, you need to make the
<CODE>LC_*</CODE> constants, available in the C header <CODE>&#60;locale.h&#62;</CODE>,
referenceable from within the language, usually either as enumeration
values or as strings.

<LI>
You should allow the programmer to designate a message domain, either by
making the <CODE>textdomain</CODE> function available from within the
language, or by introducing a magic variable called <CODE>TEXTDOMAIN</CODE>.
Similarly, you should allow the programmer to designate where to search
for message catalogs, by providing access to the <CODE>bindtextdomain</CODE>
function.

<LI>
You should either perform a <CODE>setlocale (LC_ALL, "")</CODE> call during
the startup of your language runtime, or allow the programmer to do so.
Remember that gettext will act as a no-op if the <CODE>LC_MESSAGES</CODE> and
<CODE>LC_CTYPE</CODE> locale categories are not both set.

<LI>
A programmer should have a way to extract translatable strings from a
program into a PO file. The GNU <CODE>xgettext</CODE> program is being
extended to support many different programming languages. Please
contact the GNU <CODE>gettext</CODE> maintainers to help them do this. If
the string extractor is best integrated into your language's parser, GNU
<CODE>xgettext</CODE> can function as a front end to your string extractor.

<LI>
The language's library should have a string formatting facility where
the arguments of a format string are denoted by a positional number or a
name. This is needed because for some languages and some messages with
more than one substitutable argument, the translation will need to
output the substituted arguments in a different order. See section <A HREF="gettext_4.html#SEC22">4.6 Special Comments preceding Keywords</A>.

<LI>
If the language has more than one implementation, and not all of the
implementations use <CODE>gettext</CODE>, but the programs should be portable
across implementations, you should provide a no-i18n emulation that
makes the other implementations accept programs written for yours,
without actually translating the strings.

<LI>
To help the programmer in the task of marking translatable strings,
which is sometimes performed using the Emacs PO mode (see section <A HREF="gettext_4.html#SEC21">4.5 Marking Translatable Strings</A>),
you are welcome to
contact the GNU <CODE>gettext</CODE> maintainers, so they can add support for
your language to <TT>&lsquo;po-mode.el&rsquo;</TT>.
</OL>
<P>
On the implementation side, three approaches are possible, with
different effects on portability and copyright:
</P>
<UL>
<LI>
You may integrate the GNU <CODE>gettext</CODE>'s <TT>&lsquo;intl/&rsquo;</TT> directory in
your package, as described in section <A HREF="gettext_13.html#SEC211">13 The Maintainer's View</A>. This allows you to
have internationalization on all kinds of platforms. Note that when you
then distribute your package, it legally falls under the GNU General
Public License, and the GNU project will be glad about your contribution
to the Free Software pool.

<LI>
You may link against GNU <CODE>gettext</CODE> functions if they are found in
the C library. For example, an autoconf test for <CODE>gettext()</CODE> and
<CODE>ngettext()</CODE> will detect this situation. For the moment, this test
will succeed on GNU systems and not on other platforms. No severe
copyright restrictions apply.

<LI>
You may emulate or reimplement the GNU <CODE>gettext</CODE> functionality.
This has the advantage of full portability and no copyright
restrictions, but also the drawback that you have to reimplement the GNU
<CODE>gettext</CODE> features (such as the <CODE>LANGUAGE</CODE> environment
variable, the locale aliases database, the automatic charset conversion,
and plural handling).
</UL>
<H2><A NAME="SEC247" HREF="gettext_toc.html#TOC247">15.2 The Programmer's View</A></H2>
<P>
For the programmer, the general procedure is the same as for the C
language. The Emacs PO mode marking supports other languages, and the GNU
<CODE>xgettext</CODE> string extractor recognizes other languages based on the
file extension or a command-line option. In some languages,
<CODE>setlocale</CODE> is not needed because it is already performed by the
underlying language runtime.
</P>

<H2><A NAME="SEC248" HREF="gettext_toc.html#TOC248">15.3 The Translator's View</A></H2>
<P>
The translator works exactly as in the C language case. The only
difference is that when translating format strings, she has to be aware
of the language's particular syntax for positional arguments in format
strings.
</P>
<H3><A NAME="SEC249" HREF="gettext_toc.html#TOC249">15.3.1 C Format Strings</A></H3>
<P>
C format strings are described in POSIX (IEEE P1003.1 2001), section
XSH 3 fprintf(),
<A HREF="http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html">http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html</A>.
See also the fprintf() manual page,
<A HREF="http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php">http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php</A>,
<A HREF="http://informatik.fh-wuerzburg.de/student/i510/man/printf.html">http://informatik.fh-wuerzburg.de/student/i510/man/printf.html</A>.
</P>
<P>
Although format strings with positions that reorder arguments, such as
</P>
<PRE>
"Only %2$d bytes free on '%1$s'."
</PRE>
<P>
which is semantically equivalent to
</P>
<PRE>
"'%s' has only %d bytes free."
</PRE>
<P>
are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
on this reordering ability: On the few platforms where <CODE>printf()</CODE>,
<CODE>fprintf()</CODE> etc. don't support this feature natively, <TT>&lsquo;libintl.a&rsquo;</TT>
or <TT>&lsquo;libintl.so&rsquo;</TT> provides replacement functions, and GNU <CODE>&#60;libintl.h&#62;</CODE>
activates these replacement functions automatically.
</P>
<P>
<A NAME="IDX1172"></A>
<A NAME="IDX1173"></A>
As a special feature for Farsi (Persian) and maybe Arabic, translators can
insert an <SAMP>&lsquo;I&rsquo;</SAMP> flag into numeric format directives. For example, the
translation of <CODE>"%d"</CODE> can be <CODE>"%Id"</CODE>. The effect of this flag,
on systems with GNU <CODE>libc</CODE>, is that in the output, the ASCII digits are
replaced with the <SAMP>&lsquo;outdigits&rsquo;</SAMP> defined in the <CODE>LC_CTYPE</CODE> locale
category. On other systems, the <CODE>gettext</CODE> function removes this flag,
so that it has no effect.
</P>
<P>
Note that the programmer should <EM>not</EM> put this flag into the
untranslated string. (Putting the <SAMP>&lsquo;I&rsquo;</SAMP> format directive flag into an
<VAR>msgid</VAR> string would lead to undefined behaviour on platforms without
glibc when NLS is disabled.)
</P>
<H3><A NAME="SEC250" HREF="gettext_toc.html#TOC250">15.3.2 Objective C Format Strings</A></H3>
<P>
Objective C format strings are like C format strings. They support an
additional format directive: "%@", which when executed consumes an argument
of type <CODE>Object *</CODE>.
</P>

<H3><A NAME="SEC251" HREF="gettext_toc.html#TOC251">15.3.3 Shell Format Strings</A></H3>
<P>
Shell format strings, as supported by GNU gettext and the <SAMP>&lsquo;envsubst&rsquo;</SAMP>
program, are strings with references to shell variables in the form
<CODE>$<VAR>variable</VAR></CODE> or <CODE>${<VAR>variable</VAR>}</CODE>. References of the form
<CODE>${<VAR>variable</VAR>-<VAR>default</VAR>}</CODE>,
<CODE>${<VAR>variable</VAR>:-<VAR>default</VAR>}</CODE>,
<CODE>${<VAR>variable</VAR>=<VAR>default</VAR>}</CODE>,
<CODE>${<VAR>variable</VAR>:=<VAR>default</VAR>}</CODE>,
<CODE>${<VAR>variable</VAR>+<VAR>replacement</VAR>}</CODE>,
<CODE>${<VAR>variable</VAR>:+<VAR>replacement</VAR>}</CODE>,
<CODE>${<VAR>variable</VAR>?<VAR>ignored</VAR>}</CODE>,
<CODE>${<VAR>variable</VAR>:?<VAR>ignored</VAR>}</CODE>,
that would be valid inside shell scripts, are not supported. The
<VAR>variable</VAR> names must consist solely of alphanumeric or underscore
ASCII characters, not start with a digit and be nonempty; otherwise such
a variable reference is ignored.
</P>
<H3><A NAME="SEC252" HREF="gettext_toc.html#TOC252">15.3.4 Python Format Strings</A></H3>
<P>
Python format strings are described in
Python Library reference /
2. Built-in Types, Exceptions and Functions /
2.2. Built-in Types /
2.2.6. Sequence Types /
2.2.6.2. String Formatting Operations.
<A HREF="http://www.python.org/doc/2.2.1/lib/typesseq-strings.html">http://www.python.org/doc/2.2.1/lib/typesseq-strings.html</A>.
</P>

<H3><A NAME="SEC253" HREF="gettext_toc.html#TOC253">15.3.5 Lisp Format Strings</A></H3>
<P>
Lisp format strings are described in the Common Lisp HyperSpec,
chapter 22.3 Formatted Output,
<A HREF="http://www.lisp.org/HyperSpec/Body/sec_22-3.html">http://www.lisp.org/HyperSpec/Body/sec_22-3.html</A>.
</P>

<H3><A NAME="SEC254" HREF="gettext_toc.html#TOC254">15.3.6 Emacs Lisp Format Strings</A></H3>
<P>
Emacs Lisp format strings are documented in the Emacs Lisp reference,
section Formatting Strings,
<A HREF="http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75">http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75</A>.
Note that as of version 21, XEmacs supports numbered argument specifications
in format strings while FSF Emacs doesn't.
</P>

<H3><A NAME="SEC255" HREF="gettext_toc.html#TOC255">15.3.7 librep Format Strings</A></H3>
<P>
librep format strings are documented in the librep manual, section
Formatted Output,
<A HREF="http://librep.sourceforge.net/librep-manual.html#Formatted%20Output">http://librep.sourceforge.net/librep-manual.html#Formatted%20Output</A>,
<A HREF="http://www.gwinnup.org/research/docs/librep.html#SEC122">http://www.gwinnup.org/research/docs/librep.html#SEC122</A>.
</P>

<H3><A NAME="SEC256" HREF="gettext_toc.html#TOC256">15.3.8 Scheme Format Strings</A></H3>
<P>
Scheme format strings are documented in the SLIB manual, section
Format Specification.
</P>

<H3><A NAME="SEC257" HREF="gettext_toc.html#TOC257">15.3.9 Smalltalk Format Strings</A></H3>
<P>
Smalltalk format strings are described in the GNU Smalltalk documentation,
class <CODE>CharArray</CODE>, methods <SAMP>&lsquo;bindWith:&rsquo;</SAMP> and
<SAMP>&lsquo;bindWithArguments:&rsquo;</SAMP>.
<A HREF="http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238">http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238</A>.
In summary, a directive starts with <SAMP>&lsquo;%&rsquo;</SAMP> and is followed by <SAMP>&lsquo;%&rsquo;</SAMP>
or a nonzero digit (<SAMP>&lsquo;1&rsquo;</SAMP> to <SAMP>&lsquo;9&rsquo;</SAMP>).
</P>
<H3><A NAME="SEC258" HREF="gettext_toc.html#TOC258">15.3.10 Java Format Strings</A></H3>
<P>
Java format strings are described in the JDK documentation for class
<CODE>java.text.MessageFormat</CODE>,
<A HREF="http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html">http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html</A>.
See also the ICU documentation
<A HREF="http://oss.software.ibm.com/icu/apiref/classMessageFormat.html">http://oss.software.ibm.com/icu/apiref/classMessageFormat.html</A>.
</P>

<H3><A NAME="SEC259" HREF="gettext_toc.html#TOC259">15.3.11 C# Format Strings</A></H3>
<P>
C# format strings are described in the .NET documentation for class
<CODE>System.String</CODE> and in
<A HREF="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp">http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp</A>.
</P>

<H3><A NAME="SEC260" HREF="gettext_toc.html#TOC260">15.3.12 awk Format Strings</A></H3>
<P>
awk format strings are described in the gawk documentation, section
Printf,
<A HREF="http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf">http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf</A>.
</P>

<H3><A NAME="SEC261" HREF="gettext_toc.html#TOC261">15.3.13 Object Pascal Format Strings</A></H3>
<P>
Object Pascal format strings are described in the documentation of the
Free Pascal runtime library, section Format,
<A HREF="http://www.freepascal.org/docs-html/rtl/sysutils/format.html">http://www.freepascal.org/docs-html/rtl/sysutils/format.html</A>.
</P>

<H3><A NAME="SEC262" HREF="gettext_toc.html#TOC262">15.3.14 YCP Format Strings</A></H3>
<P>
YCP sformat strings are described in the libycp documentation
<A HREF="file:/usr/share/doc/packages/libycp/YCP-builtins.html">file:/usr/share/doc/packages/libycp/YCP-builtins.html</A>.
In summary, a directive starts with <SAMP>&lsquo;%&rsquo;</SAMP> and is followed by <SAMP>&lsquo;%&rsquo;</SAMP>
or a nonzero digit (<SAMP>&lsquo;1&rsquo;</SAMP> to <SAMP>&lsquo;9&rsquo;</SAMP>).
</P>

<H3><A NAME="SEC263" HREF="gettext_toc.html#TOC263">15.3.15 Tcl Format Strings</A></H3>
<P>
Tcl format strings are described in the <TT>&lsquo;format.n&rsquo;</TT> manual page,
<A HREF="http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm">http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm</A>.
</P>
<H3><A NAME="SEC264" HREF="gettext_toc.html#TOC264">15.3.16 Perl Format Strings</A></H3>
<P>
There are two kinds of format strings in Perl: those acceptable to the
Perl built-in function <CODE>printf</CODE>, labelled as <SAMP>&lsquo;perl-format&rsquo;</SAMP>,
and those acceptable to the <CODE>libintl-perl</CODE> function <CODE>__x</CODE>,
labelled as <SAMP>&lsquo;perl-brace-format&rsquo;</SAMP>.
</P>
<P>
Perl <CODE>printf</CODE> format strings are described in the <CODE>sprintf</CODE>
section of <SAMP>&lsquo;man perlfunc&rsquo;</SAMP>.
</P>
<P>
Perl brace format strings are described in the
<TT>&lsquo;Locale::TextDomain(3pm)&rsquo;</TT> manual page of the CPAN package
libintl-perl. In brief, Perl brace format uses placeholders put between
braces (<SAMP>&lsquo;{&rsquo;</SAMP> and <SAMP>&lsquo;}&rsquo;</SAMP>). Each placeholder must have the syntax
of a simple identifier.
</P>

<H3><A NAME="SEC265" HREF="gettext_toc.html#TOC265">15.3.17 PHP Format Strings</A></H3>
<P>
PHP format strings are described in the documentation of the PHP function
<CODE>sprintf</CODE>, in <TT>&lsquo;phpdoc/manual/function.sprintf.html&rsquo;</TT> or
<A HREF="http://www.php.net/manual/en/function.sprintf.php">http://www.php.net/manual/en/function.sprintf.php</A>.
</P>
<H3><A NAME="SEC266" HREF="gettext_toc.html#TOC266">15.3.18 GCC internal Format Strings</A></H3>
<P>
These format strings are used inside the GCC sources. In such a format
string, a directive starts with <SAMP>&lsquo;%&rsquo;</SAMP>, is optionally followed by a
size specifier <SAMP>&lsquo;l&rsquo;</SAMP>, an optional flag <SAMP>&lsquo;+&rsquo;</SAMP>, another optional flag
<SAMP>&lsquo;#&rsquo;</SAMP>, and is finished by a specifier: <SAMP>&lsquo;%&rsquo;</SAMP> denotes a literal
percent sign, <SAMP>&lsquo;c&rsquo;</SAMP> denotes a character, <SAMP>&lsquo;s&rsquo;</SAMP> denotes a string,
<SAMP>&lsquo;i&rsquo;</SAMP> and <SAMP>&lsquo;d&rsquo;</SAMP> denote an integer, <SAMP>&lsquo;o&rsquo;</SAMP>, <SAMP>&lsquo;u&rsquo;</SAMP>, <SAMP>&lsquo;x&rsquo;</SAMP>
denote an unsigned integer, <SAMP>&lsquo;.*s&rsquo;</SAMP> denotes a string preceded by a
width specification, <SAMP>&lsquo;H&rsquo;</SAMP> denotes a <SAMP>&lsquo;location_t *&rsquo;</SAMP> pointer,
<SAMP>&lsquo;D&rsquo;</SAMP> denotes a general declaration, <SAMP>&lsquo;F&rsquo;</SAMP> denotes a function
declaration, <SAMP>&lsquo;T&rsquo;</SAMP> denotes a type, <SAMP>&lsquo;A&rsquo;</SAMP> denotes a function argument,
<SAMP>&lsquo;C&rsquo;</SAMP> denotes a tree code, <SAMP>&lsquo;E&rsquo;</SAMP> denotes an expression, <SAMP>&lsquo;L&rsquo;</SAMP>
denotes a programming language, <SAMP>&lsquo;O&rsquo;</SAMP> denotes a binary operator,
<SAMP>&lsquo;P&rsquo;</SAMP> denotes a function parameter, <SAMP>&lsquo;Q&rsquo;</SAMP> denotes an assignment
operator, <SAMP>&lsquo;V&rsquo;</SAMP> denotes a const/volatile qualifier.
</P>

<H3><A NAME="SEC267" HREF="gettext_toc.html#TOC267">15.3.19 GFC internal Format Strings</A></H3>
<P>
These format strings are used inside the GNU Fortran Compiler sources,
that is, the Fortran frontend in the GCC sources. In such a format
string, a directive starts with <SAMP>&lsquo;%&rsquo;</SAMP> and is finished by a
specifier: <SAMP>&lsquo;%&rsquo;</SAMP> denotes a literal percent sign, <SAMP>&lsquo;C&rsquo;</SAMP> denotes the
current source location, <SAMP>&lsquo;L&rsquo;</SAMP> denotes a source location, <SAMP>&lsquo;c&rsquo;</SAMP>
denotes a character, <SAMP>&lsquo;s&rsquo;</SAMP> denotes a string, <SAMP>&lsquo;i&rsquo;</SAMP> and <SAMP>&lsquo;d&rsquo;</SAMP>
denote an integer, <SAMP>&lsquo;u&rsquo;</SAMP> denotes an unsigned integer. <SAMP>&lsquo;i&rsquo;</SAMP>,
<SAMP>&lsquo;d&rsquo;</SAMP>, and <SAMP>&lsquo;u&rsquo;</SAMP> may be preceded by a size specifier <SAMP>&lsquo;l&rsquo;</SAMP>.
</P>
<H3><A NAME="SEC268" HREF="gettext_toc.html#TOC268">15.3.20 Qt Format Strings</A></H3>
<P>
Qt format strings are described in the documentation of the QString class
<A HREF="file:/usr/lib/qt-4.3.0/doc/html/qstring.html">file:/usr/lib/qt-4.3.0/doc/html/qstring.html</A>.
In summary, a directive consists of a <SAMP>&lsquo;%&rsquo;</SAMP> followed by a digit. The same
directive cannot occur more than once in a format string.
</P>
<H3><A NAME="SEC269" HREF="gettext_toc.html#TOC269">15.3.21 Qt Plural Format Strings</A></H3>
<P>
Qt plural format strings are described in the documentation of the QObject::tr method
<A HREF="file:/usr/lib/qt-4.3.0/doc/html/qobject.html">file:/usr/lib/qt-4.3.0/doc/html/qobject.html</A>.
In summary, the only allowed directive is <SAMP>&lsquo;%n&rsquo;</SAMP>.
</P>
<H3><A NAME="SEC270" HREF="gettext_toc.html#TOC270">15.3.22 KDE Format Strings</A></H3>
<P>
KDE 4 format strings are defined as follows:
A directive consists of a <SAMP>&lsquo;%&rsquo;</SAMP> followed by a non-zero decimal number.
If a <SAMP>&lsquo;%n&rsquo;</SAMP> occurs in a format string, all of <SAMP>&lsquo;%1&rsquo;</SAMP>, ..., <SAMP>&lsquo;%(n-1)&rsquo;</SAMP>
must occur as well, except possibly one of them.
</P>
<H3><A NAME="SEC271" HREF="gettext_toc.html#TOC271">15.3.23 Boost Format Strings</A></H3>
<P>
Boost format strings are described in the documentation of the
<CODE>boost::format</CODE> class, at
<A HREF="http://www.boost.org/libs/format/doc/format.html">http://www.boost.org/libs/format/doc/format.html</A>.
In summary, a directive has either the same syntax as in a C format string,
such as <SAMP>&lsquo;%1$+5d&rsquo;</SAMP>, or may be surrounded by vertical bars, such as
<SAMP>&lsquo;%|1$+5d|&rsquo;</SAMP> or <SAMP>&lsquo;%|1$+5|&rsquo;</SAMP>, or consists of just an argument number
between percent signs, such as <SAMP>&lsquo;%1%&rsquo;</SAMP>.
</P>
<H2><A NAME="SEC272" HREF="gettext_toc.html#TOC272">15.4 The Maintainer's View</A></H2>
<P>
For the maintainer, the general procedure differs from the C language
case in two ways.
</P>
<UL>
<LI>
For those languages that don't use GNU gettext, the <TT>&lsquo;intl/&rsquo;</TT> directory
is not needed and can be omitted. This means that the maintainer calls the
<CODE>gettextize</CODE> program without the <SAMP>&lsquo;--intl&rsquo;</SAMP> option, and that he
invokes the <CODE>AM_GNU_GETTEXT</CODE> autoconf macro via
<SAMP>&lsquo;AM_GNU_GETTEXT([external])&rsquo;</SAMP>.

<LI>
If only a single programming language is used, the <CODE>XGETTEXT_OPTIONS</CODE>
variable in <TT>&lsquo;po/Makevars&rsquo;</TT> (see section <A HREF="gettext_13.html#SEC218">13.4.3 <TT>&lsquo;Makevars&rsquo;</TT> in <TT>&lsquo;po/&rsquo;</TT></A>) should be adjusted to
match the <CODE>xgettext</CODE> options for that particular programming language.
If the package uses more than one programming language with <CODE>gettext</CODE>
support, it becomes necessary to change the POT file construction rule
in <TT>&lsquo;po/Makefile.in.in&rsquo;</TT>. It is recommended to make one <CODE>xgettext</CODE>
invocation per programming language, each with the options appropriate for
that language, and to combine the resulting files using <CODE>msgcat</CODE>.
</UL>
<H2><A NAME="SEC273" HREF="gettext_toc.html#TOC273">15.5 Individual Programming Languages</A></H2>

<H3><A NAME="SEC274" HREF="gettext_toc.html#TOC274">15.5.1 C, C++, Objective C</A></H3>
<P>
<A NAME="IDX1174"></A>
</P>
<DL COMPACT>

<DT>RPMs
<DD>
gcc, gpp, gobjc, glibc, gettext

<DT>File extension
<DD>
For C: <CODE>c</CODE>, <CODE>h</CODE>.
<BR>For C++: <CODE>C</CODE>, <CODE>c++</CODE>, <CODE>cc</CODE>, <CODE>cxx</CODE>, <CODE>cpp</CODE>, <CODE>hpp</CODE>.
<BR>For Objective C: <CODE>m</CODE>.

<DT>String syntax
<DD>
<CODE>"abc"</CODE>

<DT>gettext shorthand
<DD>
<CODE>_("abc")</CODE>

<DT>gettext/ngettext functions
<DD>
<CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>, <CODE>ngettext</CODE>,
<CODE>dngettext</CODE>, <CODE>dcngettext</CODE>

<DT>textdomain
<DD>
<CODE>textdomain</CODE> function

<DT>bindtextdomain
<DD>
<CODE>bindtextdomain</CODE> function

<DT>setlocale
<DD>
Programmer must call <CODE>setlocale (LC_ALL, "")</CODE>

<DT>Prerequisite
<DD>
<CODE>#include &#60;libintl.h&#62;</CODE>
<BR><CODE>#include &#60;locale.h&#62;</CODE>
<BR><CODE>#define _(string) gettext (string)</CODE>

<DT>Use or emulate GNU gettext
<DD>
use

<DT>Extractor
<DD>
<CODE>xgettext -k_</CODE>

<DT>Formatting with positions
<DD>
<CODE>fprintf "%2$d %1$d"</CODE>
<BR>In C++: <CODE>autosprintf "%2$d %1$d"</CODE>
(see section ‘Introduction’ in <CITE>GNU autosprintf</CITE>)

<DT>Portability
<DD>
autoconf (gettext.m4) and #if ENABLE_NLS

<DT>po-mode marking
<DD>
yes
</DL>

<P>
The following examples are available in the <TT>&lsquo;examples&rsquo;</TT> directory:
<CODE>hello-c</CODE>, <CODE>hello-c-gnome</CODE>, <CODE>hello-c++</CODE>, <CODE>hello-c++-qt</CODE>,
<CODE>hello-c++-kde</CODE>, <CODE>hello-c++-gnome</CODE>, <CODE>hello-c++-wxwidgets</CODE>,
<CODE>hello-objc</CODE>, <CODE>hello-objc-gnustep</CODE>, <CODE>hello-objc-gnome</CODE>.
</P>
<H3><A NAME="SEC275" HREF="gettext_toc.html#TOC275">15.5.2 sh - Shell Script</A></H3>
<P>
<A NAME="IDX1175"></A>
</P>
<DL COMPACT>

<DT>RPMs
<DD>
bash, gettext

<DT>File extension
<DD>
<CODE>sh</CODE>

<DT>String syntax
<DD>
<CODE>"abc"</CODE>, <CODE>'abc'</CODE>, <CODE>abc</CODE>

<DT>gettext shorthand
<DD>
<CODE>"`gettext \"abc\"`"</CODE>

<DT>gettext/ngettext functions
<DD>
<A NAME="IDX1176"></A>
<A NAME="IDX1177"></A>
<CODE>gettext</CODE>, <CODE>ngettext</CODE> programs
<BR><CODE>eval_gettext</CODE>, <CODE>eval_ngettext</CODE> shell functions

<DT>textdomain
<DD>
<A NAME="IDX1178"></A>
environment variable <CODE>TEXTDOMAIN</CODE>

<DT>bindtextdomain
<DD>
<A NAME="IDX1179"></A>
environment variable <CODE>TEXTDOMAINDIR</CODE>

<DT>setlocale
<DD>
automatic

<DT>Prerequisite
<DD>
<CODE>. gettext.sh</CODE>

<DT>Use or emulate GNU gettext
<DD>
use

<DT>Extractor
<DD>
<CODE>xgettext</CODE>

<DT>Formatting with positions
<DD>
---

<DT>Portability
<DD>
fully portable

<DT>po-mode marking
<DD>
---
</DL>

<P>
An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-sh</CODE>.
</P>
<H4><A NAME="SEC276" HREF="gettext_toc.html#TOC276">15.5.2.1 Preparing Shell Scripts for Internationalization</A></H4>
<P>
<A NAME="IDX1180"></A>
</P>
<P>
Preparing a shell script for internationalization is conceptually similar
to the steps described in section <A HREF="gettext_4.html#SEC16">4 Preparing Program Sources</A>. The concrete steps for shell
scripts are as follows.
</P>
<OL>
<LI>
Insert the line

<PRE>
. gettext.sh
</PRE>

near the top of the script. <CODE>gettext.sh</CODE> is a shell function library
that provides the functions
<CODE>eval_gettext</CODE> (see section <A HREF="gettext_15.html#SEC281">15.5.2.6 Invoking the <CODE>eval_gettext</CODE> function</A>) and
<CODE>eval_ngettext</CODE> (see section <A HREF="gettext_15.html#SEC282">15.5.2.7 Invoking the <CODE>eval_ngettext</CODE> function</A>).
You have to ensure that <CODE>gettext.sh</CODE> can be found in the <CODE>PATH</CODE>.

<LI>
Set and export the <CODE>TEXTDOMAIN</CODE> and <CODE>TEXTDOMAINDIR</CODE> environment
variables. Usually <CODE>TEXTDOMAIN</CODE> is the package or program name, and
<CODE>TEXTDOMAINDIR</CODE> is the absolute pathname corresponding to
<CODE>$prefix/share/locale</CODE>, where <CODE>$prefix</CODE> is the installation location.

<PRE>
TEXTDOMAIN=@PACKAGE@
export TEXTDOMAIN
TEXTDOMAINDIR=@LOCALEDIR@
export TEXTDOMAINDIR
</PRE>

<LI>
Prepare the strings for translation, as described in section <A HREF="gettext_4.html#SEC19">4.3 Preparing Translatable Strings</A>.

<LI>
Simplify translatable strings so that they don't contain command substitution
(<CODE>"`...`"</CODE> or <CODE>"$(...)"</CODE>), variable access with defaulting (like
<CODE>${<VAR>variable</VAR>-<VAR>default</VAR>}</CODE>), access to positional arguments
(like <CODE>$0</CODE>, <CODE>$1</CODE>, ...) or highly volatile shell variables (like
<CODE>$?</CODE>). This can always be done through simple local code restructuring.
For example,

<PRE>
echo "Usage: $0 [OPTION] FILE..."
</PRE>

becomes

<PRE>
program_name=$0
echo "Usage: $program_name [OPTION] FILE..."
</PRE>

Similarly,

<PRE>
echo "Remaining files: `ls | wc -l`"
</PRE>

becomes

<PRE>
filecount="`ls | wc -l`"
echo "Remaining files: $filecount"
</PRE>

<LI>
For each translatable string, change the output command <SAMP>&lsquo;echo&rsquo;</SAMP> or
<SAMP>&lsquo;$echo&rsquo;</SAMP> to <SAMP>&lsquo;gettext&rsquo;</SAMP> (if the string contains no references to
shell variables) or to <SAMP>&lsquo;eval_gettext&rsquo;</SAMP> (if it refers to shell variables),
followed by a no-argument <SAMP>&lsquo;echo&rsquo;</SAMP> command (to account for the terminating
newline). Similarly, for cases with plural handling, replace a conditional
<SAMP>&lsquo;echo&rsquo;</SAMP> command with an invocation of <SAMP>&lsquo;ngettext&rsquo;</SAMP> or
<SAMP>&lsquo;eval_ngettext&rsquo;</SAMP>, followed by a no-argument <SAMP>&lsquo;echo&rsquo;</SAMP> command.

When doing this, you also need to add an extra backslash before the dollar
sign in references to shell variables, so that the <SAMP>&lsquo;eval_gettext&rsquo;</SAMP>
function receives the translatable string before the variable values are
substituted into it. For example,

<PRE>
echo "Remaining files: $filecount"
</PRE>

becomes

<PRE>
eval_gettext "Remaining files: \$filecount"; echo
</PRE>

If the output command is not <SAMP>&lsquo;echo&rsquo;</SAMP>, you can make it use <SAMP>&lsquo;echo&rsquo;</SAMP>
nevertheless, through the use of backquotes. However, note that inside
backquotes, backslashes must be doubled to be effective (because the
backquoting eats one level of backslashes). For example, assuming that
<SAMP>&lsquo;error&rsquo;</SAMP> is a shell function that signals an error,

<PRE>
error "file not found: $filename"
</PRE>

is first transformed into

<PRE>
error "`echo \"file not found: \$filename\"`"
</PRE>

which then becomes

<PRE>
error "`eval_gettext \"file not found: \\\$filename\"`"
</PRE>

</OL>
<H4><A NAME="SEC277" HREF="gettext_toc.html#TOC277">15.5.2.2 Contents of <CODE>gettext.sh</CODE></A></H4>
<P>
<CODE>gettext.sh</CODE>, contained in the run-time package of GNU gettext, provides
the following:
</P>
<UL>
<LI>$echo

The variable <CODE>echo</CODE> is set to a command that outputs its first argument
and a newline, without interpreting backslashes in the argument string.

<LI>eval_gettext

See section <A HREF="gettext_15.html#SEC281">15.5.2.6 Invoking the <CODE>eval_gettext</CODE> function</A>.

<LI>eval_ngettext

See section <A HREF="gettext_15.html#SEC282">15.5.2.7 Invoking the <CODE>eval_ngettext</CODE> function</A>.
</UL>
<H4><A NAME="SEC278" HREF="gettext_toc.html#TOC278">15.5.2.3 Invoking the <CODE>gettext</CODE> program</A></H4>

<A NAME="IDX1181"></A>
<A NAME="IDX1182"></A>

<PRE>
gettext [<VAR>option</VAR>] [[<VAR>textdomain</VAR>] <VAR>msgid</VAR>]
gettext [<VAR>option</VAR>] -s [<VAR>msgid</VAR>]...
</PRE>

<P>
<A NAME="IDX1183"></A>
The <CODE>gettext</CODE> program displays the native language translation of a
textual message.
</P>
<P>
<STRONG>Arguments</STRONG>
</P>
<DL COMPACT>

<DT><SAMP>&lsquo;-d <VAR>textdomain</VAR>&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--domain=<VAR>textdomain</VAR>&rsquo;</SAMP>
<DD>
<A NAME="IDX1184"></A>
<A NAME="IDX1185"></A>
Retrieve translated messages from <VAR>textdomain</VAR>. Usually a <VAR>textdomain</VAR>
corresponds to a package, a program, or a module of a program.

<DT><SAMP>&lsquo;-e&rsquo;</SAMP>
<DD>
<A NAME="IDX1186"></A>
Enable expansion of some escape sequences. This option is for compatibility
with the <SAMP>&lsquo;echo&rsquo;</SAMP> program or shell built-in. The escape sequences
<SAMP>&lsquo;\a&rsquo;</SAMP>, <SAMP>&lsquo;\b&rsquo;</SAMP>, <SAMP>&lsquo;\c&rsquo;</SAMP>, <SAMP>&lsquo;\f&rsquo;</SAMP>, <SAMP>&lsquo;\n&rsquo;</SAMP>, <SAMP>&lsquo;\r&rsquo;</SAMP>, <SAMP>&lsquo;\t&rsquo;</SAMP>,
<SAMP>&lsquo;\v&rsquo;</SAMP>, <SAMP>&lsquo;\\&rsquo;</SAMP>, and <SAMP>&lsquo;\&rsquo;</SAMP> followed by one to three octal digits, are
interpreted as the System V <SAMP>&lsquo;echo&rsquo;</SAMP> program interpreted them.

<DT><SAMP>&lsquo;-E&rsquo;</SAMP>
<DD>
<A NAME="IDX1187"></A>
This option is only for compatibility with the <SAMP>&lsquo;echo&rsquo;</SAMP> program or shell
built-in. It has no effect.

<DT><SAMP>&lsquo;-h&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--help&rsquo;</SAMP>
<DD>
<A NAME="IDX1188"></A>
<A NAME="IDX1189"></A>
Display this help and exit.

<DT><SAMP>&lsquo;-n&rsquo;</SAMP>
<DD>
<A NAME="IDX1190"></A>
Suppress trailing newline. By default, <CODE>gettext</CODE> adds a newline to
the output.

<DT><SAMP>&lsquo;-V&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--version&rsquo;</SAMP>
<DD>
<A NAME="IDX1191"></A>
938 <A NAME="IDX1192"></A>
939 Output version information and exit.
941 <DT><SAMP>&lsquo;[<VAR>textdomain</VAR>] <VAR>msgid</VAR>&rsquo;</SAMP>
942 <DD>
Retrieve the translated message corresponding to <VAR>msgid</VAR> from <VAR>textdomain</VAR>.
945 </DL>
948 If the <VAR>textdomain</VAR> parameter is not given, the domain is determined from
949 the environment variable <CODE>TEXTDOMAIN</CODE>. If the message catalog is not
950 found in the regular directory, another location can be specified with the
951 environment variable <CODE>TEXTDOMAINDIR</CODE>.
953 </P>
When used with the <CODE>-s</CODE> option, the program behaves like the <SAMP>&lsquo;echo&rsquo;</SAMP>
command, except that it does not simply copy its arguments to stdout: those
messages that are found in the selected catalog are output in translated form.
959 </P>
961 Note: <CODE>xgettext</CODE> supports only the one-argument form of the
962 <CODE>gettext</CODE> invocation, where no options are present and the
963 <VAR>textdomain</VAR> is implicit, from the environment.
965 </P>
968 <H4><A NAME="SEC279" HREF="gettext_toc.html#TOC279">15.5.2.4 Invoking the <CODE>ngettext</CODE> program</A></H4>
971 <A NAME="IDX1193"></A>
972 <A NAME="IDX1194"></A>
974 <PRE>
975 ngettext [<VAR>option</VAR>] [<VAR>textdomain</VAR>] <VAR>msgid</VAR> <VAR>msgid-plural</VAR> <VAR>count</VAR>
976 </PRE>
979 <A NAME="IDX1195"></A>
980 The <CODE>ngettext</CODE> program displays the native language translation of a
981 textual message whose grammatical form depends on a number.
983 </P>
985 <STRONG>Arguments</STRONG>
987 </P>
988 <DL COMPACT>
990 <DT><SAMP>&lsquo;-d <VAR>textdomain</VAR>&rsquo;</SAMP>
991 <DD>
992 <DT><SAMP>&lsquo;--domain=<VAR>textdomain</VAR>&rsquo;</SAMP>
993 <DD>
994 <A NAME="IDX1196"></A>
995 <A NAME="IDX1197"></A>
996 Retrieve translated messages from <VAR>textdomain</VAR>. Usually a <VAR>textdomain</VAR>
997 corresponds to a package, a program, or a module of a program.
999 <DT><SAMP>&lsquo;-e&rsquo;</SAMP>
1000 <DD>
1001 <A NAME="IDX1198"></A>
1002 Enable expansion of some escape sequences. This option is for compatibility
1003 with the <SAMP>&lsquo;gettext&rsquo;</SAMP> program. The escape sequences
1004 <SAMP>&lsquo;\a&rsquo;</SAMP>, <SAMP>&lsquo;\b&rsquo;</SAMP>, <SAMP>&lsquo;\c&rsquo;</SAMP>, <SAMP>&lsquo;\f&rsquo;</SAMP>, <SAMP>&lsquo;\n&rsquo;</SAMP>, <SAMP>&lsquo;\r&rsquo;</SAMP>, <SAMP>&lsquo;\t&rsquo;</SAMP>,
1005 <SAMP>&lsquo;\v&rsquo;</SAMP>, <SAMP>&lsquo;\\&rsquo;</SAMP>, and <SAMP>&lsquo;\&rsquo;</SAMP> followed by one to three octal digits, are
1006 interpreted like the System V <SAMP>&lsquo;echo&rsquo;</SAMP> program did.
1008 <DT><SAMP>&lsquo;-E&rsquo;</SAMP>
1009 <DD>
1010 <A NAME="IDX1199"></A>
1011 This option is only for compatibility with the <SAMP>&lsquo;gettext&rsquo;</SAMP> program. It has
1012 no effect.
1014 <DT><SAMP>&lsquo;-h&rsquo;</SAMP>
1015 <DD>
1016 <DT><SAMP>&lsquo;--help&rsquo;</SAMP>
1017 <DD>
1018 <A NAME="IDX1200"></A>
1019 <A NAME="IDX1201"></A>
1020 Display this help and exit.
1022 <DT><SAMP>&lsquo;-V&rsquo;</SAMP>
1023 <DD>
1024 <DT><SAMP>&lsquo;--version&rsquo;</SAMP>
1025 <DD>
1026 <A NAME="IDX1202"></A>
1027 <A NAME="IDX1203"></A>
1028 Output version information and exit.
1030 <DT><SAMP>&lsquo;<VAR>textdomain</VAR>&rsquo;</SAMP>
1031 <DD>
Retrieve the translated message from <VAR>textdomain</VAR>.
1034 <DT><SAMP>&lsquo;<VAR>msgid</VAR> <VAR>msgid-plural</VAR>&rsquo;</SAMP>
1035 <DD>
1036 Translate <VAR>msgid</VAR> (English singular) / <VAR>msgid-plural</VAR> (English plural).
1038 <DT><SAMP>&lsquo;<VAR>count</VAR>&rsquo;</SAMP>
1039 <DD>
1040 Choose singular/plural form based on this value.
1042 </DL>
1045 If the <VAR>textdomain</VAR> parameter is not given, the domain is determined from
1046 the environment variable <CODE>TEXTDOMAIN</CODE>. If the message catalog is not
1047 found in the regular directory, another location can be specified with the
1048 environment variable <CODE>TEXTDOMAINDIR</CODE>.
1050 </P>
Note: <CODE>xgettext</CODE> supports only the three-argument form of the
1053 <CODE>ngettext</CODE> invocation, where no options are present and the
1054 <VAR>textdomain</VAR> is implicit, from the environment.
1056 </P>
1059 <H4><A NAME="SEC280" HREF="gettext_toc.html#TOC280">15.5.2.5 Invoking the <CODE>envsubst</CODE> program</A></H4>
1062 <A NAME="IDX1204"></A>
1063 <A NAME="IDX1205"></A>
1065 <PRE>
1066 envsubst [<VAR>option</VAR>] [<VAR>shell-format</VAR>]
1067 </PRE>
1070 <A NAME="IDX1206"></A>
1071 <A NAME="IDX1207"></A>
1072 <A NAME="IDX1208"></A>
1073 The <CODE>envsubst</CODE> program substitutes the values of environment variables.
1075 </P>
1077 <STRONG>Operation mode</STRONG>
1079 </P>
1080 <DL COMPACT>
1082 <DT><SAMP>&lsquo;-v&rsquo;</SAMP>
1083 <DD>
1084 <DT><SAMP>&lsquo;--variables&rsquo;</SAMP>
1085 <DD>
1086 <A NAME="IDX1209"></A>
1087 <A NAME="IDX1210"></A>
1088 Output the variables occurring in <VAR>shell-format</VAR>.
1090 </DL>
1093 <STRONG>Informative output</STRONG>
1095 </P>
1096 <DL COMPACT>
1098 <DT><SAMP>&lsquo;-h&rsquo;</SAMP>
1099 <DD>
1100 <DT><SAMP>&lsquo;--help&rsquo;</SAMP>
1101 <DD>
1102 <A NAME="IDX1211"></A>
1103 <A NAME="IDX1212"></A>
1104 Display this help and exit.
1106 <DT><SAMP>&lsquo;-V&rsquo;</SAMP>
1107 <DD>
1108 <DT><SAMP>&lsquo;--version&rsquo;</SAMP>
1109 <DD>
1110 <A NAME="IDX1213"></A>
1111 <A NAME="IDX1214"></A>
1112 Output version information and exit.
1114 </DL>
1117 In normal operation mode, standard input is copied to standard output,
1118 with references to environment variables of the form <CODE>$VARIABLE</CODE> or
1119 <CODE>${VARIABLE}</CODE> being replaced with the corresponding values. If a
1120 <VAR>shell-format</VAR> is given, only those environment variables that are
1121 referenced in <VAR>shell-format</VAR> are substituted; otherwise all environment
variable references occurring in standard input are substituted.
1124 </P>
1126 These substitutions are a subset of the substitutions that a shell performs
1127 on unquoted and double-quoted strings. Other kinds of substitutions done
1128 by a shell, such as <CODE>${<VAR>variable</VAR>-<VAR>default</VAR>}</CODE> or
1129 <CODE>$(<VAR>command-list</VAR>)</CODE> or <CODE>`<VAR>command-list</VAR>`</CODE>, are not performed
by the <CODE>envsubst</CODE> program, for security reasons.
1132 </P>
1134 When <CODE>--variables</CODE> is used, standard input is ignored, and the output
1135 consists of the environment variables that are referenced in
1136 <VAR>shell-format</VAR>, one per line.
1138 </P>
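<P>
To make the substitution rules above concrete, here is a minimal Python sketch
of them. It is an illustration under stated assumptions, not the
<CODE>envsubst</CODE> implementation: the function names <CODE>envsubst</CODE>
and <CODE>referenced_variables</CODE> are hypothetical, the environment is
passed as a plain dictionary, unset variables become the empty string (as in a
shell), and listing each distinct name only once is a choice of this sketch.
</P>
<PRE>
import re

# $NAME or ${NAME}, where NAME is a shell identifier.  Other forms such as
# ${NAME-default}, $(command) or `command` are deliberately not matched.
VAR_REF = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def referenced_variables(shell_format):
    """Distinct variable names referenced in shell_format, in order of
    first occurrence (roughly what 'envsubst --variables' lists)."""
    names = []
    for m in VAR_REF.finditer(shell_format):
        name = m.group(1) or m.group(2)
        if name not in names:
            names.append(name)
    return names


def envsubst(text, env, shell_format=None):
    """Replace $NAME and ${NAME} in text with values from env.

    Unset variables become the empty string.  If shell_format is given,
    only the names it references are replaced; all other references are
    copied through unchanged."""
    allowed = None
    if shell_format is not None:
        allowed = set(referenced_variables(shell_format))

    def replace(m):
        name = m.group(1) or m.group(2)
        if allowed is not None and name not in allowed:
            return m.group(0)  # not listed in shell_format: keep verbatim
        return env.get(name, "")

    return VAR_REF.sub(replace, text)


env = {"USER": "ann", "HOME": "/home/ann"}
print(envsubst("$USER uses ${HOME}", env))           # ann uses /home/ann
print(envsubst("$USER uses ${HOME}", env, "$USER"))  # ann uses ${HOME}
print(envsubst("today: $(date)", env))               # today: $(date)
print(referenced_variables("$USER ${HOME} $USER"))   # ['USER', 'HOME']
</PRE>
<P>
The second call shows why <VAR>shell-format</VAR> matters: a reference that it
does not mention survives as literal text instead of being expanded.
</P>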
1141 <H4><A NAME="SEC281" HREF="gettext_toc.html#TOC281">15.5.2.6 Invoking the <CODE>eval_gettext</CODE> function</A></H4>
1144 <A NAME="IDX1215"></A>
1146 <PRE>
1147 eval_gettext <VAR>msgid</VAR>
1148 </PRE>
1151 <A NAME="IDX1216"></A>
1152 This function outputs the native language translation of a textual message,
1153 performing dollar-substitution on the result. Note that only shell variables
1154 mentioned in <VAR>msgid</VAR> will be dollar-substituted in the result.
1156 </P>
1159 <H4><A NAME="SEC282" HREF="gettext_toc.html#TOC282">15.5.2.7 Invoking the <CODE>eval_ngettext</CODE> function</A></H4>
1162 <A NAME="IDX1217"></A>
1164 <PRE>
1165 eval_ngettext <VAR>msgid</VAR> <VAR>msgid-plural</VAR> <VAR>count</VAR>
1166 </PRE>
1169 <A NAME="IDX1218"></A>
1170 This function outputs the native language translation of a textual message
1171 whose grammatical form depends on a number, performing dollar-substitution
1172 on the result. Note that only shell variables mentioned in <VAR>msgid</VAR> or
1173 <VAR>msgid-plural</VAR> will be dollar-substituted in the result.
1175 </P>
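<P>
Restricting substitution to the variables mentioned in <VAR>msgid</VAR> (or
<VAR>msgid-plural</VAR>) means a translation cannot introduce expansions of its
own. The following Python sketch models that behavior; it is not the code of
<CODE>gettext.sh</CODE>. Assumptions: shell variables are passed as a
dictionary, the catalog is any object with the <CODE>gettext.NullTranslations</CODE>
interface, <CODE>DictTranslations</CODE> is a hypothetical in-memory catalog
for the demonstration, and substitution is done with <CODE>string.Template</CODE>,
which also turns <CODE>$$</CODE> into <CODE>$</CODE>, a detail in which it may
differ from <CODE>envsubst</CODE>.
</P>
<PRE>
import gettext
from string import Template


def names_in(message):
    """Variable names that message references as $name or ${name}."""
    names = set()
    for m in Template.pattern.finditer(message):
        name = m.group("named") or m.group("braced")
        if name:
            names.add(name)
    return names


def substitute(translation, names, variables):
    # Only the listed names are replaced; any other $name, $(...) or `...`
    # appearing in the translation is left as literal text.
    mapping = {n: str(variables.get(n, "")) for n in names}
    return Template(translation).safe_substitute(mapping)


def eval_gettext(msgid, variables, catalog=None):
    catalog = catalog if catalog is not None else gettext.NullTranslations()
    return substitute(catalog.gettext(msgid), names_in(msgid), variables)


def eval_ngettext(msgid, msgid_plural, count, variables, catalog=None):
    catalog = catalog if catalog is not None else gettext.NullTranslations()
    translation = catalog.ngettext(msgid, msgid_plural, count)
    names = names_in(msgid) | names_in(msgid_plural)
    return substitute(translation, names, variables)


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog, for illustration only."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        return self._messages.get(message, message)


shell_vars = {"filecount": 3, "HOME": "/root"}
fr = DictTranslations({"Remaining files: $filecount":
                       "Fichiers restants : $filecount $HOME $(id)"})
print(eval_gettext("Remaining files: $filecount", shell_vars))
print(eval_gettext("Remaining files: $filecount", shell_vars, fr))
print(eval_ngettext("one file", "$filecount files", 3, shell_vars))
</PRE>
<P>
In the second call the (deliberately hostile) translation mentions
<CODE>$HOME</CODE> and <CODE>$(id)</CODE>; because neither occurs in the
<VAR>msgid</VAR>, both are output literally rather than expanded.
</P>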
1178 <H3><A NAME="SEC283" HREF="gettext_toc.html#TOC283">15.5.3 bash - Bourne-Again Shell Script</A></H3>
1180 <A NAME="IDX1219"></A>
1182 </P>
1184 GNU <CODE>bash</CODE> 2.0 or newer has a special shorthand for translating a
1185 string and substituting variable values in it: <CODE>$"msgid"</CODE>. But
1186 the use of this construct is <STRONG>discouraged</STRONG>, due to the security
1187 holes it opens and due to its portability problems.
1189 </P>
The security holes of <CODE>$"..."</CODE> come from the fact that, after looking up
the translation of the string, <CODE>bash</CODE> processes it like any other
double-quoted string, performing dollar and backquote expansion, as <SAMP>&lsquo;eval&rsquo;</SAMP>
does.
1196 </P>
1198 <OL>
1199 <LI>
In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS,
or JOHAB, some double-byte characters have a second byte whose value is
<CODE>0x60</CODE>. For example, the byte sequence <CODE>\xe0\x60</CODE> is a single
character in these locales. Many versions of <CODE>bash</CODE> (all versions
up to bash-2.05, and newer versions on platforms without the <CODE>mbsrtowcs()</CODE>
function) don't know about character boundaries and see a backquote character
where there is only the second byte of a Chinese character. Thus <CODE>bash</CODE> can start
executing part of the translation as a command list. This situation can occur
even without the translator being aware of it: if the translator provides
translations in the UTF-8 encoding, it is the <CODE>gettext()</CODE> function which
will, during its conversion from the translator's encoding to the user's
locale's encoding, produce the dangerous <CODE>\x60</CODE> bytes.
1214 <LI>
A translator could, voluntarily or inadvertently, use backquotes
1217 <CODE>"`...`"</CODE> or dollar-parentheses <CODE>"$(...)"</CODE> in her translations.
1218 The enclosed strings would be executed as command lists by the shell.
1219 </OL>
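<P>
The first hole can be reproduced without <CODE>bash</CODE>. The Python sketch
below relies only on Python's standard <CODE>gbk</CODE> codec: it decodes the
byte sequence <CODE>\xe0\x60</CODE> mentioned above as one GBK character and
shows that the character's UTF-8 form contains no backquote byte, while its GBK
form, as seen by a tool that scans bytes without regard to character
boundaries, does.
</P>
<PRE>
# The two bytes 0xE0 0x60 form a single character in GBK.
ch = b"\xe0\x60".decode("gbk")

utf8_form = ch.encode("utf-8")  # as stored in a UTF-8 PO file
gbk_form = ch.encode("gbk")     # as produced by gettext() in a GBK locale

print(len(ch))            # 1: one character, not two
print("`" in ch)          # False: the text contains no backquote
print(b"`" in utf8_form)  # False: multibyte UTF-8 uses only bytes 0x80 and above
print(b"`" in gbk_form)   # True: the second byte 0x60 is the ASCII backquote
</PRE>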
1222 The portability problem is that <CODE>bash</CODE> must be built with
1223 internationalization support; this is normally not the case on systems
1224 that don't have the <CODE>gettext()</CODE> function in libc.
1226 </P>
1229 <H3><A NAME="SEC284" HREF="gettext_toc.html#TOC284">15.5.4 Python</A></H3>
1231 <A NAME="IDX1220"></A>
1233 </P>
1234 <DL COMPACT>
1236 <DT>RPMs
1237 <DD>
1238 python
1240 <DT>File extension
1241 <DD>
1242 <CODE>py</CODE>
1244 <DT>String syntax
1245 <DD>
1246 <CODE>'abc'</CODE>, <CODE>u'abc'</CODE>, <CODE>r'abc'</CODE>, <CODE>ur'abc'</CODE>,
1247 <BR><CODE>"abc"</CODE>, <CODE>u"abc"</CODE>, <CODE>r"abc"</CODE>, <CODE>ur"abc"</CODE>,
<BR><CODE>'''abc'''</CODE>, <CODE>u'''abc'''</CODE>, <CODE>r'''abc'''</CODE>, <CODE>ur'''abc'''</CODE>,
1249 <BR><CODE>"""abc"""</CODE>, <CODE>u"""abc"""</CODE>, <CODE>r"""abc"""</CODE>, <CODE>ur"""abc"""</CODE>
1251 <DT>gettext shorthand
1252 <DD>
1253 <CODE>_('abc')</CODE> etc.
1255 <DT>gettext/ngettext functions
1256 <DD>
1257 <CODE>gettext.gettext</CODE>, <CODE>gettext.dgettext</CODE>,
1258 <CODE>gettext.ngettext</CODE>, <CODE>gettext.dngettext</CODE>,
1259 also <CODE>ugettext</CODE>, <CODE>ungettext</CODE>
1261 <DT>textdomain
1262 <DD>
1263 <CODE>gettext.textdomain</CODE> function, or
1264 <CODE>gettext.install(<VAR>domain</VAR>)</CODE> function
1266 <DT>bindtextdomain
1267 <DD>
1268 <CODE>gettext.bindtextdomain</CODE> function, or
1269 <CODE>gettext.install(<VAR>domain</VAR>,<VAR>localedir</VAR>)</CODE> function
1271 <DT>setlocale
1272 <DD>
1273 not used by the gettext emulation
1275 <DT>Prerequisite
1276 <DD>
1277 <CODE>import gettext</CODE>
1279 <DT>Use or emulate GNU gettext
1280 <DD>
1281 emulate
1283 <DT>Extractor
1284 <DD>
1285 <CODE>xgettext</CODE>
1287 <DT>Formatting with positions
1288 <DD>
1289 <CODE>'...%(ident)d...' % { 'ident': value }</CODE>
1291 <DT>Portability
1292 <DD>
1293 fully portable
1295 <DT>po-mode marking
1296 <DD>
1298 </DL>
1301 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-python</CODE>.
1303 </P>
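<P>
A minimal sketch of the functions listed above, assuming Python 3, where
<CODE>gettext</CODE> already returns Unicode strings and <CODE>ugettext</CODE>
and <CODE>ungettext</CODE> no longer exist. The domain name
<CODE>hello-python</CODE> and the directory <CODE>/nonexistent/locale</CODE> are
placeholders; since no catalog is found there, <CODE>fallback=True</CODE> makes
<CODE>gettext.translation</CODE> return a <CODE>NullTranslations</CODE> object,
which hands back the untranslated strings.
</P>
<PRE>
import gettext

# Look up a placeholder domain in a placeholder directory.  With
# fallback=True a missing catalog is not an error: the untranslated
# msgid is returned instead.
translation = gettext.translation("hello-python",
                                  localedir="/nonexistent/locale",
                                  fallback=True)
_ = translation.gettext
ngettext = translation.ngettext

print(_("Hello, world!"))

count = 3
# Named format arguments keep the translations flexible.
print(ngettext("%(count)d file removed.", "%(count)d files removed.",
               count) % {"count": count})
</PRE>
<P>
Calling <CODE>translation.install()</CODE> instead would bind <CODE>_</CODE> in
the built-in namespace, so that all modules of the program can use it.
</P>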
1305 A note about format strings: Python supports format strings with unnamed
1306 arguments, such as <CODE>'...%d...'</CODE>, and format strings with named arguments,
1307 such as <CODE>'...%(ident)d...'</CODE>. The latter are preferable for
1308 internationalized programs, for two reasons:
1310 </P>
1312 <UL>
1313 <LI>
1315 When a format string takes more than one argument, the translator can provide
1316 a translation that uses the arguments in a different order, if the format
1317 string uses named arguments. For example, the translator can reformulate
1319 <PRE>
1320 "'%(volume)s' has only %(freespace)d bytes free."
</PRE>

into

1325 <PRE>
1326 "Only %(freespace)d bytes free on '%(volume)s'."
1327 </PRE>
1329 Additionally, the identifiers also provide some context to the translator.
1331 <LI>
1333 In the context of plural forms, the format string used for the singular form
1334 does not use the numeric argument in many languages. Even in English, one
1335 prefers to write <CODE>"one hour"</CODE> instead of <CODE>"1 hour"</CODE>. Omitting
1336 individual arguments from format strings like this is only possible with
the named argument syntax. (With unnamed arguments, Python, unlike C,
verifies that the format string uses all supplied arguments.)
1339 </UL>
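<P>
Both reasons can be checked directly in Python. This sketch uses made-up values
for <CODE>volume</CODE> and <CODE>freespace</CODE>; it shows that a mapping lets
the two formats above use the arguments in either order, that a mapping may
contain keys the format does not use, and that a tuple of unnamed arguments may
not.
</P>
<PRE>
args = {"volume": "/dev/sda1", "freespace": 1024}

# Reason 1: with named arguments the translation may reorder them.
english = "'%(volume)s' has only %(freespace)d bytes free." % args
reworded = "Only %(freespace)d bytes free on '%(volume)s'." % args
print(english)   # '/dev/sda1' has only 1024 bytes free.
print(reworded)  # Only 1024 bytes free on '/dev/sda1'.

# Reason 2: a singular form may omit the number; unused keys are ignored.
hours = {"hours": 1}
print("one hour" % hours)  # one hour

# With unnamed arguments, every supplied argument must be consumed.
try:
    "one hour" % (1,)
except TypeError as exc:
    print("TypeError:", exc)
</PRE>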
1343 <H3><A NAME="SEC285" HREF="gettext_toc.html#TOC285">15.5.5 GNU clisp - Common Lisp</A></H3>
1345 <A NAME="IDX1221"></A>
1346 <A NAME="IDX1222"></A>
1347 <A NAME="IDX1223"></A>
1349 </P>
1350 <DL COMPACT>
1352 <DT>RPMs
1353 <DD>
1354 clisp 2.28 or newer
1356 <DT>File extension
1357 <DD>
1358 <CODE>lisp</CODE>
1360 <DT>String syntax
1361 <DD>
1362 <CODE>"abc"</CODE>
1364 <DT>gettext shorthand
1365 <DD>
1366 <CODE>(_ "abc")</CODE>, <CODE>(ENGLISH "abc")</CODE>
1368 <DT>gettext/ngettext functions
1369 <DD>
1370 <CODE>i18n:gettext</CODE>, <CODE>i18n:ngettext</CODE>
1372 <DT>textdomain
1373 <DD>
1374 <CODE>i18n:textdomain</CODE>
1376 <DT>bindtextdomain
1377 <DD>
1378 <CODE>i18n:textdomaindir</CODE>
1380 <DT>setlocale
1381 <DD>
1382 automatic
1384 <DT>Prerequisite
1385 <DD>
1388 <DT>Use or emulate GNU gettext
1389 <DD>
1392 <DT>Extractor
1393 <DD>
1394 <CODE>xgettext -k_ -kENGLISH</CODE>
1396 <DT>Formatting with positions
1397 <DD>
1398 <CODE>format "~1@*~D ~0@*~D"</CODE>
1400 <DT>Portability
1401 <DD>
1402 On platforms without gettext, no translation.
1404 <DT>po-mode marking
1405 <DD>
1407 </DL>
1410 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-clisp</CODE>.
1412 </P>
1415 <H3><A NAME="SEC286" HREF="gettext_toc.html#TOC286">15.5.6 GNU clisp C sources</A></H3>
1417 <A NAME="IDX1224"></A>
1419 </P>
1420 <DL COMPACT>
1422 <DT>RPMs
1423 <DD>
1424 clisp
1426 <DT>File extension
1427 <DD>
1428 <CODE>d</CODE>
1430 <DT>String syntax
1431 <DD>
1432 <CODE>"abc"</CODE>
1434 <DT>gettext shorthand
1435 <DD>
1436 <CODE>ENGLISH ? "abc" : ""</CODE>
1437 <BR><CODE>GETTEXT("abc")</CODE>
1438 <BR><CODE>GETTEXTL("abc")</CODE>
1440 <DT>gettext/ngettext functions
1441 <DD>
1442 <CODE>clgettext</CODE>, <CODE>clgettextl</CODE>
1444 <DT>textdomain
1445 <DD>
1448 <DT>bindtextdomain
1449 <DD>
1452 <DT>setlocale
1453 <DD>
1454 automatic
1456 <DT>Prerequisite
1457 <DD>
1458 <CODE>#include "lispbibl.c"</CODE>
1460 <DT>Use or emulate GNU gettext
1461 <DD>
1464 <DT>Extractor
1465 <DD>
1466 <CODE>clisp-xgettext</CODE>
1468 <DT>Formatting with positions
1469 <DD>
1470 <CODE>fprintf "%2$d %1$d"</CODE>
1472 <DT>Portability
1473 <DD>
1474 On platforms without gettext, no translation.
1476 <DT>po-mode marking
1477 <DD>
1479 </DL>
1483 <H3><A NAME="SEC287" HREF="gettext_toc.html#TOC287">15.5.7 Emacs Lisp</A></H3>
1485 <A NAME="IDX1225"></A>
1487 </P>
1488 <DL COMPACT>
1490 <DT>RPMs
1491 <DD>
1492 emacs, xemacs
1494 <DT>File extension
1495 <DD>
1496 <CODE>el</CODE>
1498 <DT>String syntax
1499 <DD>
1500 <CODE>"abc"</CODE>
1502 <DT>gettext shorthand
1503 <DD>
1504 <CODE>(_"abc")</CODE>
1506 <DT>gettext/ngettext functions
1507 <DD>
1508 <CODE>gettext</CODE>, <CODE>dgettext</CODE> (xemacs only)
1510 <DT>textdomain
1511 <DD>
1512 <CODE>domain</CODE> special form (xemacs only)
1514 <DT>bindtextdomain
1515 <DD>
1516 <CODE>bind-text-domain</CODE> function (xemacs only)
1518 <DT>setlocale
1519 <DD>
1520 automatic
1522 <DT>Prerequisite
1523 <DD>
1526 <DT>Use or emulate GNU gettext
1527 <DD>
1530 <DT>Extractor
1531 <DD>
1532 <CODE>xgettext</CODE>
1534 <DT>Formatting with positions
1535 <DD>
1536 <CODE>format "%2$d %1$d"</CODE>
1538 <DT>Portability
1539 <DD>
1540 Only XEmacs. Without <CODE>I18N3</CODE> defined at build time, no translation.
1542 <DT>po-mode marking
1543 <DD>
1545 </DL>
1549 <H3><A NAME="SEC288" HREF="gettext_toc.html#TOC288">15.5.8 librep</A></H3>
1551 <A NAME="IDX1226"></A>
1553 </P>
1554 <DL COMPACT>
1556 <DT>RPMs
1557 <DD>
1558 librep 0.15.3 or newer
1560 <DT>File extension
1561 <DD>
1562 <CODE>jl</CODE>
1564 <DT>String syntax
1565 <DD>
1566 <CODE>"abc"</CODE>
1568 <DT>gettext shorthand
1569 <DD>
1570 <CODE>(_"abc")</CODE>
1572 <DT>gettext/ngettext functions
1573 <DD>
1574 <CODE>gettext</CODE>
1576 <DT>textdomain
1577 <DD>
1578 <CODE>textdomain</CODE> function
1580 <DT>bindtextdomain
1581 <DD>
1582 <CODE>bindtextdomain</CODE> function
1584 <DT>setlocale
1585 <DD>
1588 <DT>Prerequisite
1589 <DD>
1590 <CODE>(require 'rep.i18n.gettext)</CODE>
1592 <DT>Use or emulate GNU gettext
1593 <DD>
1596 <DT>Extractor
1597 <DD>
1598 <CODE>xgettext</CODE>
1600 <DT>Formatting with positions
1601 <DD>
1602 <CODE>format "%2$d %1$d"</CODE>
1604 <DT>Portability
1605 <DD>
1606 On platforms without gettext, no translation.
1608 <DT>po-mode marking
1609 <DD>
1611 </DL>
1614 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-librep</CODE>.
1616 </P>
1619 <H3><A NAME="SEC289" HREF="gettext_toc.html#TOC289">15.5.9 GNU guile - Scheme</A></H3>
1621 <A NAME="IDX1227"></A>
1622 <A NAME="IDX1228"></A>
1624 </P>
1625 <DL COMPACT>
1627 <DT>RPMs
1628 <DD>
1629 guile
1631 <DT>File extension
1632 <DD>
1633 <CODE>scm</CODE>
1635 <DT>String syntax
1636 <DD>
1637 <CODE>"abc"</CODE>
1639 <DT>gettext shorthand
1640 <DD>
1641 <CODE>(_ "abc")</CODE>
1643 <DT>gettext/ngettext functions
1644 <DD>
1645 <CODE>gettext</CODE>, <CODE>ngettext</CODE>
1647 <DT>textdomain
1648 <DD>
1649 <CODE>textdomain</CODE>
1651 <DT>bindtextdomain
1652 <DD>
1653 <CODE>bindtextdomain</CODE>
1655 <DT>setlocale
1656 <DD>
1657 <CODE>(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))</CODE>
1659 <DT>Prerequisite
1660 <DD>
1661 <CODE>(use-modules (ice-9 format))</CODE>
1663 <DT>Use or emulate GNU gettext
1664 <DD>
1667 <DT>Extractor
1668 <DD>
1669 <CODE>xgettext -k_</CODE>
1671 <DT>Formatting with positions
1672 <DD>
1675 <DT>Portability
1676 <DD>
1677 On platforms without gettext, no translation.
1679 <DT>po-mode marking
1680 <DD>
1682 </DL>
1685 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-guile</CODE>.
1687 </P>
1690 <H3><A NAME="SEC290" HREF="gettext_toc.html#TOC290">15.5.10 GNU Smalltalk</A></H3>
1692 <A NAME="IDX1229"></A>
1694 </P>
1695 <DL COMPACT>
1697 <DT>RPMs
1698 <DD>
1699 smalltalk
1701 <DT>File extension
1702 <DD>
1703 <CODE>st</CODE>
1705 <DT>String syntax
1706 <DD>
1707 <CODE>'abc'</CODE>
1709 <DT>gettext shorthand
1710 <DD>
1711 <CODE>NLS ? 'abc'</CODE>
1713 <DT>gettext/ngettext functions
1714 <DD>
1715 <CODE>LcMessagesDomain&#62;&#62;#at:</CODE>, <CODE>LcMessagesDomain&#62;&#62;#at:plural:with:</CODE>
1717 <DT>textdomain
1718 <DD>
1719 <CODE>LcMessages&#62;&#62;#domain:localeDirectory:</CODE> (returns a <CODE>LcMessagesDomain</CODE>
1720 object).<BR>
Example: <CODE>I18N Locale default messages domain: 'gettext' localeDirectory: '/usr/local/share/locale'</CODE>
1723 <DT>bindtextdomain
1724 <DD>
1725 <CODE>LcMessages&#62;&#62;#domain:localeDirectory:</CODE>, see above.
1727 <DT>setlocale
1728 <DD>
1729 Automatic if you use <CODE>I18N Locale default</CODE>.
1731 <DT>Prerequisite
1732 <DD>
1733 <CODE>PackageLoader fileInPackage: 'I18N'!</CODE>
1735 <DT>Use or emulate GNU gettext
1736 <DD>
1737 emulate
1739 <DT>Extractor
1740 <DD>
1741 <CODE>xgettext</CODE>
1743 <DT>Formatting with positions
1744 <DD>
1745 <CODE>'%1 %2' bindWith: 'Hello' with: 'world'</CODE>
1747 <DT>Portability
1748 <DD>
1749 fully portable
1751 <DT>po-mode marking
1752 <DD>
1754 </DL>
1757 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory:
1758 <CODE>hello-smalltalk</CODE>.
1760 </P>
1763 <H3><A NAME="SEC291" HREF="gettext_toc.html#TOC291">15.5.11 Java</A></H3>
1765 <A NAME="IDX1230"></A>
1767 </P>
1768 <DL COMPACT>
1770 <DT>RPMs
1771 <DD>
1772 java, java2
1774 <DT>File extension
1775 <DD>
1776 <CODE>java</CODE>
1778 <DT>String syntax
1779 <DD>
1780 "abc"
1782 <DT>gettext shorthand
1783 <DD>
1784 _("abc")
1786 <DT>gettext/ngettext functions
1787 <DD>
1788 <CODE>GettextResource.gettext</CODE>, <CODE>GettextResource.ngettext</CODE>,
1789 <CODE>GettextResource.pgettext</CODE>, <CODE>GettextResource.npgettext</CODE>
1791 <DT>textdomain
1792 <DD>
---, use <CODE>ResourceBundle.getBundle</CODE> instead
1795 <DT>bindtextdomain
1796 <DD>
1797 ---, use CLASSPATH instead
1799 <DT>setlocale
1800 <DD>
1801 automatic
1803 <DT>Prerequisite
1804 <DD>
1807 <DT>Use or emulate GNU gettext
1808 <DD>
1809 ---, uses a Java specific message catalog format
1811 <DT>Extractor
1812 <DD>
1813 <CODE>xgettext -k_</CODE>
1815 <DT>Formatting with positions
1816 <DD>
1817 <CODE>MessageFormat.format "{1,number} {0,number}"</CODE>
1819 <DT>Portability
1820 <DD>
1821 fully portable
1823 <DT>po-mode marking
1824 <DD>
1826 </DL>
1829 Before marking strings as internationalizable, uses of the string
1830 concatenation operator need to be converted to <CODE>MessageFormat</CODE>
1831 applications. For example, <CODE>"file "+filename+" not found"</CODE> becomes
1832 <CODE>MessageFormat.format("file {0} not found", new Object[] { filename })</CODE>.
Only after this is done can the strings be marked and extracted.
1835 </P>
1837 GNU gettext uses the native Java internationalization mechanism, namely
1838 <CODE>ResourceBundle</CODE>s. There are two formats of <CODE>ResourceBundle</CODE>s:
1839 <CODE>.properties</CODE> files and <CODE>.class</CODE> files. The <CODE>.properties</CODE>
1840 format is a text file which the translators can directly edit, like PO
files, but it doesn't support plural forms. The <CODE>.class</CODE>
format, by contrast, is compiled from <CODE>.java</CODE> source code and can support plural
forms (provided it is accessed through an appropriate API, see below).
1845 </P>
1847 To convert a PO file to a <CODE>.properties</CODE> file, the <CODE>msgcat</CODE>
1848 program can be used with the option <CODE>--properties-output</CODE>. To convert
1849 a <CODE>.properties</CODE> file back to a PO file, the <CODE>msgcat</CODE> program
1850 can be used with the option <CODE>--properties-input</CODE>. All the tools
1851 that manipulate PO files can work with <CODE>.properties</CODE> files as well,
1852 if given the <CODE>--properties-input</CODE> and/or <CODE>--properties-output</CODE>
1853 option.
1855 </P>
1857 To convert a PO file to a ResourceBundle class, the <CODE>msgfmt</CODE> program
1858 can be used with the option <CODE>--java</CODE> or <CODE>--java2</CODE>. To convert a
1859 ResourceBundle back to a PO file, the <CODE>msgunfmt</CODE> program can be used
1860 with the option <CODE>--java</CODE>.
1862 </P>
1864 Two different programmatic APIs can be used to access ResourceBundles.
1865 Note that both APIs work with all kinds of ResourceBundles, whether
1866 GNU gettext generated classes, or other <CODE>.class</CODE> or <CODE>.properties</CODE>
1867 files.
1869 </P>
1871 <OL>
1872 <LI>
1874 The <CODE>java.util.ResourceBundle</CODE> API.
1876 In particular, its <CODE>getString</CODE> function returns a string translation.
1877 Note that a missing translation yields a <CODE>MissingResourceException</CODE>.
1879 This has the advantage of being the standard API. And it does not require
1880 any additional libraries, only the <CODE>msgcat</CODE> generated <CODE>.properties</CODE>
1881 files or the <CODE>msgfmt</CODE> generated <CODE>.class</CODE> files. But it cannot do
1882 plural handling, even if the resource was generated by <CODE>msgfmt</CODE> from
1883 a PO file with plural handling.
1885 <LI>
1887 The <CODE>gnu.gettext.GettextResource</CODE> API.
1889 Reference documentation in Javadoc 1.1 style format is in the
1890 <A HREF="javadoc2/index.html">javadoc2 directory</A>.
1892 Its <CODE>gettext</CODE> function returns a string translation. Note that when
1893 a translation is missing, the <VAR>msgid</VAR> argument is returned unchanged.
This has the advantage of having the <CODE>ngettext</CODE> function for plural
handling and the <CODE>pgettext</CODE> and <CODE>npgettext</CODE> functions for strings
constrained to a particular context.
1899 <A NAME="IDX1231"></A>
1900 To use this API, one needs the <CODE>libintl.jar</CODE> file which is part of
1901 the GNU gettext package and distributed under the LGPL.
1902 </OL>
1905 Four examples, using the second API, are available in the <TT>&lsquo;examples&rsquo;</TT>
1906 directory: <CODE>hello-java</CODE>, <CODE>hello-java-awt</CODE>, <CODE>hello-java-swing</CODE>,
1907 <CODE>hello-java-qtjambi</CODE>.
1909 </P>
1911 Now, to make use of the API and define a shorthand for <SAMP>&lsquo;getString&rsquo;</SAMP>,
1912 there are three idioms that you can choose from:
1914 </P>
1916 <UL>
1917 <LI>
1919 (This one assumes Java 1.5 or newer.)
1920 In a unique class of your project, say <SAMP>&lsquo;Util&rsquo;</SAMP>, define a static variable
1921 holding the <CODE>ResourceBundle</CODE> instance and the shorthand:
1924 <PRE>
private static ResourceBundle myResources =
  ResourceBundle.getBundle("domain-name");
public static String _(String s) {
  return myResources.getString(s);
}
1930 </PRE>
1932 All classes containing internationalized strings then contain
1935 <PRE>
1936 import static Util._;
1937 </PRE>
1939 and the shorthand is used like this:
1942 <PRE>
1943 System.out.println(_("Operation completed."));
1944 </PRE>
1946 <LI>
1948 In a unique class of your project, say <SAMP>&lsquo;Util&rsquo;</SAMP>, define a static variable
1949 holding the <CODE>ResourceBundle</CODE> instance:
1952 <PRE>
1953 public static ResourceBundle myResources =
1954 ResourceBundle.getBundle("domain-name");
1955 </PRE>
1957 All classes containing internationalized strings then contain
1960 <PRE>
1961 private static ResourceBundle res = Util.myResources;
1962 private static String _(String s) { return res.getString(s); }
1963 </PRE>
1965 and the shorthand is used like this:
1968 <PRE>
1969 System.out.println(_("Operation completed."));
1970 </PRE>
1972 <LI>
1974 You add a class with a very short name, say <SAMP>&lsquo;S&rsquo;</SAMP>, containing just the
1975 definition of the resource bundle and of the shorthand:
1978 <PRE>
import java.util.ResourceBundle;

public class S {
  public static ResourceBundle myResources =
    ResourceBundle.getBundle("domain-name");
  public static String _(String s) {
    return myResources.getString(s);
  }
}
1986 </PRE>
1988 and the shorthand is used like this:
1991 <PRE>
1992 System.out.println(S._("Operation completed."));
1993 </PRE>
1995 </UL>
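Which of the three idioms you choose will depend on whether your project
requires portability to Java versions prior to Java 1.5 and, if so, whether
copying two lines of code into every class is more acceptable in your project
than a class with a single-letter name. Note also that since Java 9,
<CODE>_</CODE> is a reserved keyword and can no longer be used as an identifier;
with Java 9 or newer, the shorthand needs a different name, and <CODE>xgettext</CODE>
must be told about it with a corresponding <CODE>-k</CODE> option.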
2003 </P>
2006 <H3><A NAME="SEC292" HREF="gettext_toc.html#TOC292">15.5.12 C#</A></H3>
2008 <A NAME="IDX1232"></A>
2010 </P>
2011 <DL COMPACT>
2013 <DT>RPMs
2014 <DD>
2015 pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer
2017 <DT>File extension
2018 <DD>
2019 <CODE>cs</CODE>
2021 <DT>String syntax
2022 <DD>
2023 <CODE>"abc"</CODE>, <CODE>@"abc"</CODE>
2025 <DT>gettext shorthand
2026 <DD>
2027 _("abc")
2029 <DT>gettext/ngettext functions
2030 <DD>
<CODE>GettextResourceManager.GetString</CODE>,
<CODE>GettextResourceManager.GetPluralString</CODE>,
<CODE>GettextResourceManager.GetParticularString</CODE>,
<CODE>GettextResourceManager.GetParticularPluralString</CODE>
2036 <DT>textdomain
2037 <DD>
2038 <CODE>new GettextResourceManager(domain)</CODE>
2040 <DT>bindtextdomain
2041 <DD>
2042 ---, compiled message catalogs are located in subdirectories of the directory
2043 containing the executable
2045 <DT>setlocale
2046 <DD>
2047 automatic
2049 <DT>Prerequisite
2050 <DD>
2053 <DT>Use or emulate GNU gettext
2054 <DD>
2055 ---, uses a C# specific message catalog format
2057 <DT>Extractor
2058 <DD>
2059 <CODE>xgettext -k_</CODE>
2061 <DT>Formatting with positions
2062 <DD>
2063 <CODE>String.Format "{1} {0}"</CODE>
2065 <DT>Portability
2066 <DD>
2067 fully portable
2069 <DT>po-mode marking
2070 <DD>
2072 </DL>
2075 Before marking strings as internationalizable, uses of the string
2076 concatenation operator need to be converted to <CODE>String.Format</CODE>
2077 invocations. For example, <CODE>"file "+filename+" not found"</CODE> becomes
2078 <CODE>String.Format("file {0} not found", filename)</CODE>.
Only after this is done can the strings be marked and extracted.
2081 </P>
2083 GNU gettext uses the native C#/.NET internationalization mechanism, namely
2084 the classes <CODE>ResourceManager</CODE> and <CODE>ResourceSet</CODE>. Applications
2085 use the <CODE>ResourceManager</CODE> methods to retrieve the native language
2086 translation of strings. An instance of <CODE>ResourceSet</CODE> is the in-memory
2087 representation of a message catalog file. The <CODE>ResourceManager</CODE> loads
2088 and accesses <CODE>ResourceSet</CODE> instances as needed to look up the
2089 translations.
2091 </P>
2093 There are two formats of <CODE>ResourceSet</CODE>s that can be directly loaded by
2094 the C# runtime: <CODE>.resources</CODE> files and <CODE>.dll</CODE> files.
2096 </P>
2098 <UL>
2099 <LI>
The <CODE>.resources</CODE> format is a binary file, usually generated with the
<CODE>resgen</CODE> or <CODE>monoresgen</CODE> utility; it doesn't support plural
forms. <CODE>.resources</CODE> files can also be embedded in .NET <CODE>.exe</CODE> files.
2104 This only affects whether a file system access is performed to load the message
2105 catalog; it doesn't affect the contents of the message catalog.
2107 <LI>
2109 On the other hand, the <CODE>.dll</CODE> format is a binary file that is compiled
2110 from <CODE>.cs</CODE> source code and can support plural forms (provided it is
2111 accessed through the GNU gettext API, see below).
2112 </UL>
2115 Note that these .NET <CODE>.dll</CODE> and <CODE>.exe</CODE> files are not tied to a
2116 particular platform; their file format and GNU gettext for C# can be used
2117 on any platform.
2119 </P>
2121 To convert a PO file to a <CODE>.resources</CODE> file, the <CODE>msgfmt</CODE> program
2122 can be used with the option <SAMP>&lsquo;--csharp-resources&rsquo;</SAMP>. To convert a
2123 <CODE>.resources</CODE> file back to a PO file, the <CODE>msgunfmt</CODE> program can be
2124 used with the option <SAMP>&lsquo;--csharp-resources&rsquo;</SAMP>. You can also, in some cases,
2125 use the <CODE>resgen</CODE> program (from the <CODE>pnet</CODE> package) or the
2126 <CODE>monoresgen</CODE> program (from the <CODE>mono</CODE>/<CODE>mcs</CODE> package). These
2127 programs can also convert a <CODE>.resources</CODE> file back to a PO file. But
2128 beware: as of this writing (January 2004), the <CODE>monoresgen</CODE> converter is
2129 quite buggy and the <CODE>resgen</CODE> converter ignores the encoding of the PO
2130 files.
2132 </P>
2134 To convert a PO file to a <CODE>.dll</CODE> file, the <CODE>msgfmt</CODE> program can be
2135 used with the option <CODE>--csharp</CODE>. The result will be a <CODE>.dll</CODE> file
2136 containing a subclass of <CODE>GettextResourceSet</CODE>, which itself is a subclass
2137 of <CODE>ResourceSet</CODE>. To convert a <CODE>.dll</CODE> file containing a
2138 <CODE>GettextResourceSet</CODE> subclass back to a PO file, the <CODE>msgunfmt</CODE>
2139 program can be used with the option <CODE>--csharp</CODE>.
2141 </P>
2143 The advantages of the <CODE>.dll</CODE> format over the <CODE>.resources</CODE> format
2144 are:
2146 </P>
2148 <OL>
2149 <LI>
2151 Freedom to localize: Users can add their own translations to an application
after it has been built and distributed. In contrast, when the programmer uses
2153 a <CODE>ResourceManager</CODE> constructor provided by the system, the set of
2154 <CODE>.resources</CODE> files for an application must be specified when the
2155 application is built and cannot be extended afterwards.
2157 <LI>
2159 Plural handling: A message catalog in <CODE>.dll</CODE> format supports the plural
handling function <CODE>GetPluralString</CODE>. In contrast, <CODE>.resources</CODE> files can
2161 only contain data and only support lookups that depend on a single string.
2163 <LI>
2165 Context handling: A message catalog in <CODE>.dll</CODE> format supports the
2166 query-with-context functions <CODE>GetParticularString</CODE> and
<CODE>GetParticularPluralString</CODE>. In contrast, <CODE>.resources</CODE> files can
2168 only contain data and only support lookups that depend on a single string.
2170 <LI>
2172 The <CODE>GettextResourceManager</CODE> that loads the message catalogs in
2173 <CODE>.dll</CODE> format also provides for inheritance on a per-message basis.
2174 For example, in Austrian (<CODE>de_AT</CODE>) locale, translations from the German
2175 (<CODE>de</CODE>) message catalog will be used for messages not found in the
2176 Austrian message catalog. This has the consequence that the Austrian
2177 translators need only translate those few messages for which the translation
into Austrian differs from the German one. In contrast, when working with
2179 <CODE>.resources</CODE> files, each message catalog must provide the translations
2180 of all messages by itself.
2182 <LI>
2184 The <CODE>GettextResourceManager</CODE> that loads the message catalogs in
2185 <CODE>.dll</CODE> format also provides for a fallback: The English <VAR>msgid</VAR> is
returned when no translation can be found. In contrast, when working with
2187 <CODE>.resources</CODE> files, a language-neutral <CODE>.resources</CODE> file must
2188 explicitly be provided as a fallback.
2189 </OL>
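<P>
The last two points, per-message inheritance and the fallback to the English <VAR>msgid</VAR>, can be pictured with the following Python sketch. It is not the <CODE>GettextResourceManager</CODE> code: the catalogs, the locale names and the function names are hypothetical, and the lookup chain here simply strips the region suffix from the locale name.
</P>

<PRE>
# Hypothetical catalogs: de_AT lists only the messages whose Austrian
# translation differs from the German one.
CATALOGS = {
    "de": {"January": "Januar", "Tomato": "Tomate", "Cancel": "Abbrechen"},
    "de_AT": {"January": "Jänner", "Tomato": "Paradeiser"},
}

def lookup_chain(locale):
    # "de_AT" gives ["de_AT", "de"]; "de" gives just ["de"].
    chain = [locale]
    if "_" in locale:
        chain.append(locale.split("_", 1)[0])
    return chain

def get_string(msgid, locale):
    # Per-message inheritance: the first catalog in the chain that
    # translates this particular message wins ...
    for name in lookup_chain(locale):
        translation = CATALOGS.get(name, {}).get(msgid)
        if translation is not None:
            return translation
    # ... and the English msgid is the final fallback.
    return msgid
</PRE>

<P>
So <CODE>get_string("Cancel", "de_AT")</CODE> yields the German <CODE>"Abbrechen"</CODE> although the Austrian catalog never mentions it.
</P>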
2192 On the side of the programmatic APIs, the programmer can use either the
standard <CODE>ResourceManager</CODE> API or the GNU <CODE>GettextResourceManager</CODE>
2194 API. The latter is an extension of the former, because
2195 <CODE>GettextResourceManager</CODE> is a subclass of <CODE>ResourceManager</CODE>.
2197 </P>
2199 <OL>
2200 <LI>
2202 The <CODE>System.Resources.ResourceManager</CODE> API.
2204 This API works with resources in <CODE>.resources</CODE> format.
2206 The creation of the <CODE>ResourceManager</CODE> is done through
2208 <PRE>
2209 new ResourceManager(domainname, Assembly.GetExecutingAssembly())
2210 </PRE>
2213 The <CODE>GetString</CODE> function returns a string's translation. Note that this
2214 function returns null when a translation is missing (i.e. not even found in
2215 the fallback resource file).
2217 <LI>
2219 The <CODE>GNU.Gettext.GettextResourceManager</CODE> API.
2221 This API works with resources in <CODE>.dll</CODE> format.
2223 Reference documentation is in the
2224 <A HREF="csharpdoc/index.html">csharpdoc directory</A>.
2226 The creation of the <CODE>ResourceManager</CODE> is done through
2228 <PRE>
2229 new GettextResourceManager(domainname)
2230 </PRE>
2232 The <CODE>GetString</CODE> function returns a string's translation. Note that when
2233 a translation is missing, the <VAR>msgid</VAR> argument is returned unchanged.
2235 The <CODE>GetPluralString</CODE> function returns a string translation with plural
2236 handling, like the <CODE>ngettext</CODE> function in C.
2238 The <CODE>GetParticularString</CODE> function returns a string's translation,
2239 specific to a particular context, like the <CODE>pgettext</CODE> function in C.
2240 Note that when a translation is missing, the <VAR>msgid</VAR> argument is returned
2241 unchanged.
2243 The <CODE>GetParticularPluralString</CODE> function returns a string translation,
2244 specific to a particular context, with plural handling, like the
2245 <CODE>npgettext</CODE> function in C.
2247 <A NAME="IDX1233"></A>
2248 To use this API, one needs the <CODE>GNU.Gettext.dll</CODE> file which is part of
2249 the GNU gettext package and distributed under the LGPL.
2250 </OL>
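<P>
The following Python sketch illustrates what context and plural lookups add over a plain <CODE>GetString</CODE>. The catalog, the function names and the German plural rule <CODE>n != 1</CODE> are assumptions for illustration. The one detail borrowed from GNU gettext is that a <VAR>msgctxt</VAR> and its <VAR>msgid</VAR> are stored as a single key joined by the EOT character <CODE>\x04</CODE>, as in GNU <CODE>.mo</CODE> files; whether the C# <CODE>.dll</CODE> catalogs use the same key layout is not assumed here.
</P>

<PRE>
# Hypothetical catalog.  Context-qualified entries use the key
# "context\x04msgid"; plural entries map the singular msgid to a list
# of translated forms.
CATALOG = {
    "Menu\x04Open": "Öffnen",
    "State\x04Open": "Offen",
    "{0} file": ["{0} Datei", "{0} Dateien"],
}

def plural_index(n):
    # Assumed plural rule of the catalog: "nplurals=2; plural=(n != 1);"
    return int(n != 1)

def get_particular_string(context, msgid):
    # Like pgettext: return the msgid unchanged if there is no translation.
    return CATALOG.get(context + "\x04" + msgid, msgid)

def get_plural_string(msgid, msgid_plural, n):
    # Like ngettext: choose the form given by the plural rule, or fall
    # back to the English singular or plural.
    forms = CATALOG.get(msgid)
    if forms is None:
        return msgid if n == 1 else msgid_plural
    return forms[plural_index(n)]
</PRE>

<P>
As with <CODE>String.Format</CODE> in C#, the count is substituted after the lookup, e.g. <CODE>get_plural_string("{0} file", "{0} files", n).format(n)</CODE>.
</P>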
2253 You can also mix both approaches: use the
2254 <CODE>GNU.Gettext.GettextResourceManager</CODE> constructor, but otherwise use
2255 only the <CODE>ResourceManager</CODE> type and only the <CODE>GetString</CODE> method.
2256 This is appropriate when you want to profit from the tools for PO files,
but don't want to change existing source code that uses
2258 <CODE>ResourceManager</CODE> and don't (yet) need the <CODE>GetPluralString</CODE> method.
2260 </P>
2262 Two examples, using the second API, are available in the <TT>&lsquo;examples&rsquo;</TT>
2263 directory: <CODE>hello-csharp</CODE>, <CODE>hello-csharp-forms</CODE>.
2265 </P>
2267 Now, to make use of the API and define a shorthand for <SAMP>&lsquo;GetString&rsquo;</SAMP>,
2268 there are two idioms that you can choose from:
2270 </P>
2272 <UL>
2273 <LI>
In a single class of your project, say <SAMP>&lsquo;Util&rsquo;</SAMP>, define a static variable
2276 holding the <CODE>ResourceManager</CODE> instance:
2279 <PRE>
2280 public static GettextResourceManager MyResourceManager =
2281 new GettextResourceManager("domain-name");
2282 </PRE>
2284 All classes containing internationalized strings then contain
2287 <PRE>
2288 private static GettextResourceManager Res = Util.MyResourceManager;
2289 private static String _(String s) { return Res.GetString(s); }
2290 </PRE>
2292 and the shorthand is used like this:
2295 <PRE>
2296 Console.WriteLine(_("Operation completed."));
2297 </PRE>
2299 <LI>
2301 You add a class with a very short name, say <SAMP>&lsquo;S&rsquo;</SAMP>, containing just the
2302 definition of the resource manager and of the shorthand:
<PRE>
using System;
using GNU.Gettext;

public class S {
  public static GettextResourceManager MyResourceManager =
    new GettextResourceManager("domain-name");
  public static String _(String s) {
    return MyResourceManager.GetString(s);
  }
}
</PRE>
2315 and the shorthand is used like this:
2318 <PRE>
2319 Console.WriteLine(S._("Operation completed."));
2320 </PRE>
2322 </UL>
Which of the two idioms you choose will depend on whether copying two lines
of code into every class is more acceptable in your project than a class
with a single-letter name.
2329 </P>
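<P>
For comparison only, the same pattern (one shared catalog object plus a short alias) looks like this with Python's standard <CODE>gettext</CODE> module. The domain name <CODE>"domain-name"</CODE> and the temporary, empty locale directory are placeholders. Because no catalog is found there, <CODE>fallback=True</CODE> makes <CODE>gettext.translation</CODE> return a <CODE>NullTranslations</CODE> object, whose <CODE>gettext</CODE> method returns its argument unchanged.
</P>

<PRE>
import gettext
import tempfile

# Placeholder locale directory; it is empty, so no catalog will be found.
LOCALEDIR = tempfile.mkdtemp()

# One shared translations object, like Util.MyResourceManager above.
translations = gettext.translation("domain-name", localedir=LOCALEDIR,
                                   languages=["de"], fallback=True)

# The shorthand, like the static _ method above.
_ = translations.gettext

print(_("Operation completed."))
</PRE>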
2332 <H3><A NAME="SEC293" HREF="gettext_toc.html#TOC293">15.5.13 GNU awk</A></H3>
2334 <A NAME="IDX1234"></A>
2335 <A NAME="IDX1235"></A>
2337 </P>
2338 <DL COMPACT>
2340 <DT>RPMs
2341 <DD>
2342 gawk 3.1 or newer
2344 <DT>File extension
2345 <DD>
2346 <CODE>awk</CODE>
2348 <DT>String syntax
2349 <DD>
2350 <CODE>"abc"</CODE>
2352 <DT>gettext shorthand
2353 <DD>
2354 <CODE>_"abc"</CODE>
2356 <DT>gettext/ngettext functions
2357 <DD>
2358 <CODE>dcgettext</CODE>, missing <CODE>dcngettext</CODE> in gawk-3.1.0
2360 <DT>textdomain
2361 <DD>
2362 <CODE>TEXTDOMAIN</CODE> variable
2364 <DT>bindtextdomain
2365 <DD>
2366 <CODE>bindtextdomain</CODE> function
2368 <DT>setlocale
2369 <DD>
2370 automatic, but missing <CODE>setlocale (LC_MESSAGES, "")</CODE> in gawk-3.1.0
2372 <DT>Prerequisite
2373 <DD>
2376 <DT>Use or emulate GNU gettext
2377 <DD>
2380 <DT>Extractor
2381 <DD>
2382 <CODE>xgettext</CODE>
2384 <DT>Formatting with positions
2385 <DD>
2386 <CODE>printf "%2$d %1$d"</CODE> (GNU awk only)
2388 <DT>Portability
2389 <DD>
2390 On platforms without gettext, no translation. On non-GNU awks, you must
2391 define <CODE>dcgettext</CODE>, <CODE>dcngettext</CODE> and <CODE>bindtextdomain</CODE>
2392 yourself.
2394 <DT>po-mode marking
2395 <DD>
2397 </DL>
2400 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-gawk</CODE>.
2402 </P>
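<P>
What the positional directives in <CODE>printf "%2$d %1$d"</CODE> mean can be sketched in Python: the number before <CODE>$</CODE> selects the argument (counting from 1), so a translated format string can use the arguments in a different order than the original. The helper below is hypothetical and handles only <CODE>%N$d</CODE>, <CODE>%N$s</CODE> and <CODE>%%</CODE>; it is not how gawk implements <CODE>printf</CODE>.
</P>

<PRE>
import re

# A POSIX positional directive such as "%2$d" or "%1$s", or a literal "%%".
DIRECTIVE = re.compile(r"%(?:(\d+)\$([ds])|%)")

def positional_format(fmt, *args):
    def replace(match):
        if match.group(1) is None:
            return "%"                       # "%%" stands for a single "%"
        value = args[int(match.group(1)) - 1]  # argument numbers are 1-based
        return str(int(value)) if match.group(2) == "d" else str(value)
    return DIRECTIVE.sub(replace, fmt)
</PRE>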
2405 <H3><A NAME="SEC294" HREF="gettext_toc.html#TOC294">15.5.14 Pascal - Free Pascal Compiler</A></H3>
2407 <A NAME="IDX1236"></A>
2408 <A NAME="IDX1237"></A>
2409 <A NAME="IDX1238"></A>
2411 </P>
2412 <DL COMPACT>
2414 <DT>RPMs
2415 <DD>
2418 <DT>File extension
2419 <DD>
2420 <CODE>pp</CODE>, <CODE>pas</CODE>
2422 <DT>String syntax
2423 <DD>
2424 <CODE>'abc'</CODE>
2426 <DT>gettext shorthand
2427 <DD>
2428 automatic
2430 <DT>gettext/ngettext functions
2431 <DD>
2432 ---, use <CODE>ResourceString</CODE> data type instead
2434 <DT>textdomain
2435 <DD>
2436 ---, use <CODE>TranslateResourceStrings</CODE> function instead
2438 <DT>bindtextdomain
2439 <DD>
2440 ---, use <CODE>TranslateResourceStrings</CODE> function instead
2442 <DT>setlocale
2443 <DD>
2444 automatic, but uses only LANG, not LC_MESSAGES or LC_ALL
2446 <DT>Prerequisite
2447 <DD>
2448 <CODE>{$mode delphi}</CODE> or <CODE>{$mode objfpc}</CODE><BR><CODE>uses gettext;</CODE>
2450 <DT>Use or emulate GNU gettext
2451 <DD>
2452 emulate partially
2454 <DT>Extractor
2455 <DD>
2456 <CODE>ppc386</CODE> followed by <CODE>xgettext</CODE> or <CODE>rstconv</CODE>
2458 <DT>Formatting with positions
2459 <DD>
2460 <CODE>uses sysutils;</CODE><BR><CODE>format "%1:d %0:d"</CODE>
2462 <DT>Portability
2463 <DD>
2466 <DT>po-mode marking
2467 <DD>
2469 </DL>
2472 The Pascal compiler has special support for the <CODE>ResourceString</CODE> data
2473 type. It generates a <CODE>.rst</CODE> file. This is then converted to a
2474 <CODE>.pot</CODE> file by use of <CODE>xgettext</CODE> or <CODE>rstconv</CODE>. At runtime,
2475 a <CODE>.mo</CODE> file corresponding to translations of this <CODE>.pot</CODE> file
2476 can be loaded using the <CODE>TranslateResourceStrings</CODE> function in the
2477 <CODE>gettext</CODE> unit.
2479 </P>
2481 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-pascal</CODE>.
2483 </P>
2486 <H3><A NAME="SEC295" HREF="gettext_toc.html#TOC295">15.5.15 wxWidgets library</A></H3>
2488 <A NAME="IDX1239"></A>
2490 </P>
2491 <DL COMPACT>
2493 <DT>RPMs
2494 <DD>
2495 wxGTK, gettext
2497 <DT>File extension
2498 <DD>
2499 <CODE>cpp</CODE>
2501 <DT>String syntax
2502 <DD>
2503 <CODE>"abc"</CODE>
2505 <DT>gettext shorthand
2506 <DD>
2507 <CODE>_("abc")</CODE>
2509 <DT>gettext/ngettext functions
2510 <DD>
2511 <CODE>wxLocale::GetString</CODE>, <CODE>wxGetTranslation</CODE>
2513 <DT>textdomain
2514 <DD>
2515 <CODE>wxLocale::AddCatalog</CODE>
2517 <DT>bindtextdomain
2518 <DD>
2519 <CODE>wxLocale::AddCatalogLookupPathPrefix</CODE>
2521 <DT>setlocale
2522 <DD>
2523 <CODE>wxLocale::Init</CODE>, <CODE>wxSetLocale</CODE>
2525 <DT>Prerequisite
2526 <DD>
2527 <CODE>#include &#60;wx/intl.h&#62;</CODE>
2529 <DT>Use or emulate GNU gettext
2530 <DD>
2531 emulate, see <CODE>include/wx/intl.h</CODE> and <CODE>src/common/intl.cpp</CODE>
2533 <DT>Extractor
2534 <DD>
2535 <CODE>xgettext</CODE>
2537 <DT>Formatting with positions
2538 <DD>
<CODE>wxString::Format</CODE> supports positions if and only if the system has the
<CODE>wprintf()</CODE> and <CODE>vswprintf()</CODE> functions and they support positions
according to POSIX.
2543 <DT>Portability
2544 <DD>
2545 fully portable
2547 <DT>po-mode marking
2548 <DD>
2550 </DL>
2554 <H3><A NAME="SEC296" HREF="gettext_toc.html#TOC296">15.5.16 YCP - YaST2 scripting language</A></H3>
2556 <A NAME="IDX1240"></A>
2557 <A NAME="IDX1241"></A>
2559 </P>
2560 <DL COMPACT>
2562 <DT>RPMs
2563 <DD>
2564 libycp, libycp-devel, yast2-core, yast2-core-devel
2566 <DT>File extension
2567 <DD>
2568 <CODE>ycp</CODE>
2570 <DT>String syntax
2571 <DD>
2572 <CODE>"abc"</CODE>
2574 <DT>gettext shorthand
2575 <DD>
2576 <CODE>_("abc")</CODE>
2578 <DT>gettext/ngettext functions
2579 <DD>
2580 <CODE>_()</CODE> with 1 or 3 arguments
2582 <DT>textdomain
2583 <DD>
2584 <CODE>textdomain</CODE> statement
2586 <DT>bindtextdomain
2587 <DD>
2590 <DT>setlocale
2591 <DD>
2594 <DT>Prerequisite
2595 <DD>
2598 <DT>Use or emulate GNU gettext
2599 <DD>
2602 <DT>Extractor
2603 <DD>
2604 <CODE>xgettext</CODE>
2606 <DT>Formatting with positions
2607 <DD>
2608 <CODE>sformat "%2 %1"</CODE>
2610 <DT>Portability
2611 <DD>
2612 fully portable
2614 <DT>po-mode marking
2615 <DD>
2617 </DL>
2620 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-ycp</CODE>.
2622 </P>
2625 <H3><A NAME="SEC297" HREF="gettext_toc.html#TOC297">15.5.17 Tcl - Tk's scripting language</A></H3>
2627 <A NAME="IDX1242"></A>
2628 <A NAME="IDX1243"></A>
2630 </P>
2631 <DL COMPACT>
2633 <DT>RPMs
2634 <DD>
2637 <DT>File extension
2638 <DD>
2639 <CODE>tcl</CODE>
2641 <DT>String syntax
2642 <DD>
2643 <CODE>"abc"</CODE>
2645 <DT>gettext shorthand
2646 <DD>
2647 <CODE>[_ "abc"]</CODE>
2649 <DT>gettext/ngettext functions
2650 <DD>
2651 <CODE>::msgcat::mc</CODE>
2653 <DT>textdomain
2654 <DD>
2657 <DT>bindtextdomain
2658 <DD>
2659 ---, use <CODE>::msgcat::mcload</CODE> instead
2661 <DT>setlocale
2662 <DD>
2663 automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL
2665 <DT>Prerequisite
2666 <DD>
2667 <CODE>package require msgcat</CODE>
2668 <BR><CODE>proc _ {s} {return [::msgcat::mc $s]}</CODE>
2670 <DT>Use or emulate GNU gettext
2671 <DD>
2672 ---, uses a Tcl specific message catalog format
2674 <DT>Extractor
2675 <DD>
2676 <CODE>xgettext -k_</CODE>
2678 <DT>Formatting with positions
2679 <DD>
2680 <CODE>format "%2\$d %1\$d"</CODE>
2682 <DT>Portability
2683 <DD>
2684 fully portable
2686 <DT>po-mode marking
2687 <DD>
2689 </DL>
2692 Two examples are available in the <TT>&lsquo;examples&rsquo;</TT> directory:
2693 <CODE>hello-tcl</CODE>, <CODE>hello-tcl-tk</CODE>.
2695 </P>
2697 Before marking strings as internationalizable, substitutions of variables
2698 into the string need to be converted to <CODE>format</CODE> applications. For
2699 example, <CODE>"file $filename not found"</CODE> becomes
2700 <CODE>[format "file %s not found" $filename]</CODE>.
Only after this is done can the strings be marked and extracted.
2702 After marking, this example becomes
2703 <CODE>[format [_ "file %s not found"] $filename]</CODE> or
2704 <CODE>[msgcat::mc "file %s not found" $filename]</CODE>. Note that the
2705 <CODE>msgcat::mc</CODE> function implicitly calls <CODE>format</CODE> when more than one
2706 argument is given.
2708 </P>
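<P>
The equivalence of the two marked forms can be sketched in Python, with a hypothetical catalog standing in for the Tcl message catalog. The helper <CODE>mc</CODE> below translates its first argument and applies the format to the translation only when further arguments are given, which is the behavior described above for <CODE>msgcat::mc</CODE>. Python's <CODE>%</CODE> operator stands in for Tcl's <CODE>format</CODE>.
</P>

<PRE>
# Hypothetical catalog standing in for the Tcl message catalog.
CATALOG = {"file %s not found": "Datei %s nicht gefunden"}

def mc(src, *args):
    # Translate first (falling back to the source string) ...
    translated = CATALOG.get(src, src)
    # ... and format only when extra arguments were passed.
    return translated % args if args else translated

filename = "data.txt"
# Both calls yield the same result, like the two marked Tcl forms above.
print(mc("file %s not found") % filename)
print(mc("file %s not found", filename))
</PRE>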
2711 <H3><A NAME="SEC298" HREF="gettext_toc.html#TOC298">15.5.18 Perl</A></H3>
2713 <A NAME="IDX1244"></A>
2715 </P>
2716 <DL COMPACT>
2718 <DT>RPMs
2719 <DD>
2720 perl
2722 <DT>File extension
2723 <DD>
2724 <CODE>pl</CODE>, <CODE>PL</CODE>, <CODE>pm</CODE>, <CODE>cgi</CODE>
2726 <DT>String syntax
2727 <DD>
2729 <UL>
2731 <LI><CODE>"abc"</CODE>
2733 <LI><CODE>'abc'</CODE>
2735 <LI><CODE>qq (abc)</CODE>
2737 <LI><CODE>q (abc)</CODE>
2739 <LI><CODE>qr /abc/</CODE>
2741 <LI><CODE>qx (/bin/date)</CODE>
2743 <LI><CODE>/pattern match/</CODE>
2745 <LI><CODE>?pattern match?</CODE>
2747 <LI><CODE>s/substitution/operators/</CODE>
2749 <LI><CODE>$tied_hash{"message"}</CODE>
2751 <LI><CODE>$tied_hash_reference-&#62;{"message"}</CODE>
2753 <LI>etc., issue the command <SAMP>&lsquo;man perlsyn&rsquo;</SAMP> for details
2755 </UL>
2757 <DT>gettext shorthand
2758 <DD>
2759 <CODE>__</CODE> (double underscore)
2761 <DT>gettext/ngettext functions
2762 <DD>
2763 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>, <CODE>ngettext</CODE>,
2764 <CODE>dngettext</CODE>, <CODE>dcngettext</CODE>
2766 <DT>textdomain
2767 <DD>
2768 <CODE>textdomain</CODE> function
2770 <DT>bindtextdomain
2771 <DD>
2772 <CODE>bindtextdomain</CODE> function
2774 <DT>bind_textdomain_codeset
2775 <DD>
2776 <CODE>bind_textdomain_codeset</CODE> function
2778 <DT>setlocale
2779 <DD>
2780 Use <CODE>setlocale (LC_ALL, "");</CODE>
2782 <DT>Prerequisite
2783 <DD>
2784 <CODE>use POSIX;</CODE>
2785 <BR><CODE>use Locale::TextDomain;</CODE> (included in the package libintl-perl
2786 which is available on the Comprehensive Perl Archive Network CPAN,
2787 http://www.cpan.org/).
2789 <DT>Use or emulate GNU gettext
2790 <DD>
2791 platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext
2793 <DT>Extractor
2794 <DD>
2795 <CODE>xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k</CODE>
2797 <DT>Formatting with positions
2798 <DD>
2799 Both kinds of format strings support formatting with positions.
2800 <BR><CODE>printf "%2\$d %1\$d", ...</CODE> (requires Perl 5.8.0 or newer)
2801 <BR><CODE>__expand("[new] replaces [old]", old =&#62; $oldvalue, new =&#62; $newvalue)</CODE>
2803 <DT>Portability
2804 <DD>
2805 The <CODE>libintl-perl</CODE> package is platform independent but is not
2806 part of the Perl core. The programmer is responsible for
2807 providing a dummy implementation of the required functions if the
2808 package is not installed on the target system.
2810 <DT>po-mode marking
2811 <DD>
2814 <DT>Documentation
2815 <DD>
2816 Included in <CODE>libintl-perl</CODE>, available on CPAN
2817 (http://www.cpan.org/).
2819 </DL>
2822 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-perl</CODE>.
2824 </P>
2826 <A NAME="IDX1245"></A>
2828 </P>
2830 The <CODE>xgettext</CODE> parser backend for Perl differs significantly from
2831 the parser backends for other programming languages, just as Perl
2832 itself differs significantly from other programming languages. The
2833 Perl parser backend offers many more string marking facilities than
the other backends, but it also has some Perl-specific limitations, the
worst probably being its imperfection.
2837 </P>
2841 <H4><A NAME="SEC299" HREF="gettext_toc.html#TOC299">15.5.18.1 General Problems Parsing Perl Code</A></H4>
It is often said that only Perl can parse Perl. This is not true.
Perl cannot be <EM>parsed</EM> at all; it can only be <EM>executed</EM>.
Perl has various built-in ambiguities that can only be resolved at runtime.
2848 </P>
2850 The following example may illustrate one common problem:
2852 </P>
2854 <PRE>
2855 print gettext "Hello World!";
2856 </PRE>
2859 Although this example looks like a bullet-proof case of a function
2860 invocation, it is not:
2862 </P>
2864 <PRE>
2865 open gettext, "&#62;testfile" or die;
2866 print gettext "Hello world!"
2867 </PRE>
2870 In this context, the string <CODE>gettext</CODE> looks more like a
2871 file handle. But not necessarily:
2873 </P>
2875 <PRE>
2876 use Locale::Messages qw (:libintl_h);
2877 open gettext "&#62;testfile" or die;
2878 print gettext "Hello world!";
2879 </PRE>
2882 Now, the file is probably syntactically incorrect, provided that the module
2883 <CODE>Locale::Messages</CODE> found first in the Perl include path exports a
2884 function <CODE>gettext</CODE>. But what if the module
2885 <CODE>Locale::Messages</CODE> really looks like this?
2887 </P>
<PRE>
use vars qw (*gettext);

1;
</PRE>
2896 In this case, the string <CODE>gettext</CODE> will be interpreted as a file
2897 handle again, and the above example will create a file <TT>&lsquo;testfile&rsquo;</TT>
2898 and write the string “Hello world!” into it. Even advanced
2899 control flow analysis will not really help:
2901 </P>
<PRE>
if (0.5 &#60; rand) {
   eval "use Sane";
} else {
   eval "use InSane";
}
print gettext "Hello world!";
</PRE>
2913 If the module <CODE>Sane</CODE> exports a function <CODE>gettext</CODE> that does
2914 what we expect, and the module <CODE>InSane</CODE> opens a file for writing
2915 and associates the <EM>handle</EM> <CODE>gettext</CODE> with this output
2916 stream, we are clueless again about what will happen at runtime. It is
2917 completely unpredictable. The truth is that Perl has so many ways to
2918 fill its symbol table at runtime that it is impossible to interpret a
2919 particular piece of code without executing it.
2921 </P>
2923 Of course, <CODE>xgettext</CODE> will not execute your Perl sources while
2924 scanning for translatable strings, but rather use heuristics in order
2925 to guess what you meant.
2927 </P>
2929 Another problem is the ambiguity of the slash and the question mark.
2930 Their interpretation depends on the context:
2932 </P>
2934 <PRE>
2935 # A pattern match.
2936 print "OK\n" if /foobar/;
2938 # A division.
2939 print 1 / 2;
2941 # Another pattern match.
2942 print "OK\n" if ?foobar?;
2944 # Conditional.
2945 print $x ? "foo" : "bar";
2946 </PRE>
2949 The slash may either act as the division operator or introduce a
2950 pattern match, whereas the question mark may act as the ternary
2951 conditional operator or as a pattern match, too. Other programming
2952 languages like <CODE>awk</CODE> present similar problems, but the consequences of a
2953 misinterpretation are particularly nasty with Perl sources. In <CODE>awk</CODE>
2954 for instance, a statement can never exceed one line and the parser
2955 can recover from a parsing error at the next newline and interpret
2956 the rest of the input stream correctly. Perl is different, as a
2957 pattern match is terminated by the next appearance of the delimiter
2958 (the slash or the question mark) in the input stream, regardless of
2959 the semantic context. If a slash is really a division sign but
misinterpreted as a pattern match, the rest of the input file is most
2961 probably parsed incorrectly.
2963 </P>
There are certain cases where the ambiguity cannot be resolved at all:
2967 </P>
2969 <PRE>
2970 $x = wantarray ? 1 : 0;
2971 </PRE>
2974 The Perl built-in function <CODE>wantarray</CODE> does not accept any arguments.
2975 The Perl parser therefore knows that the question mark does not start
2976 a regular expression but is the ternary conditional operator.
2978 </P>
2980 <PRE>
2981 sub wantarrays {}
2982 $x = wantarrays ? 1 : 0;
2983 </PRE>
2986 Now the situation is different. The function <CODE>wantarrays</CODE> takes
2987 a variable number of arguments (like any non-prototyped Perl function).
2988 The question mark is now the delimiter of a pattern match, and hence
2989 the piece of code does not compile.
2991 </P>
2993 <PRE>
2994 sub wantarrays() {}
2995 $x = wantarrays ? 1 : 0;
2996 </PRE>
2999 Now the function is prototyped, Perl knows that it does not accept any
3000 arguments, and the question mark is therefore interpreted as the
ternary operator again. But that unfortunately outsmarts <CODE>xgettext</CODE>.
3003 </P>
3005 The Perl parser in <CODE>xgettext</CODE> cannot know whether a function has
3006 a prototype and what that prototype would look like. It therefore makes
3007 an educated guess. If a function is known to be a Perl built-in and
3008 this function does not accept any arguments, a following question mark
3009 or slash is treated as an operator, otherwise as the delimiter of a
3010 following regular expression. The Perl built-ins that do not accept
3011 arguments are <CODE>wantarray</CODE>, <CODE>fork</CODE>, <CODE>time</CODE>, <CODE>times</CODE>,
3012 <CODE>getlogin</CODE>, <CODE>getppid</CODE>, <CODE>getpwent</CODE>, <CODE>getgrent</CODE>,
3013 <CODE>gethostent</CODE>, <CODE>getnetent</CODE>, <CODE>getprotoent</CODE>, <CODE>getservent</CODE>,
3014 <CODE>setpwent</CODE>, <CODE>setgrent</CODE>, <CODE>endpwent</CODE>, <CODE>endgrent</CODE>,
3015 <CODE>endhostent</CODE>, <CODE>endnetent</CODE>, <CODE>endprotoent</CODE>, and
3016 <CODE>endservent</CODE>.
3018 </P>
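<P>
The guessing rule just described fits in a few lines. The following Python sketch illustrates that rule only and is not the actual parser in <CODE>xgettext</CODE>: it takes the bareword immediately before a <CODE>/</CODE> or <CODE>?</CODE> and reports whether the character is read as an operator. The function name is hypothetical.
</P>

<PRE>
# The Perl built-ins listed above that take no arguments.
NO_ARG_BUILTINS = frozenset("""
    wantarray fork time times getlogin getppid getpwent getgrent
    gethostent getnetent getprotoent getservent setpwent setgrent
    endpwent endgrent endhostent endnetent endprotoent endservent
""".split())

def is_operator_after(bareword):
    """Return True if a "/" or "?" following bareword is read as an
    operator (division or ternary), False if it is read as the start
    of a pattern match."""
    # A built-in without arguments cannot be followed by an argument,
    # so the character must be an operator.  Any other function might
    # take one, so the character is taken as a pattern delimiter.
    return bareword in NO_ARG_BUILTINS
</PRE>

<P>
With this rule, <CODE>wantarray ? 1 : 0</CODE> is read correctly, whereas <CODE>wantarrays ? 1 : 0</CODE> is taken to start a pattern match even when <CODE>wantarrays</CODE> has an empty prototype, which is exactly the case in which <CODE>xgettext</CODE> is outsmarted.
</P>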
3020 If you find that <CODE>xgettext</CODE> fails to extract strings from
3021 portions of your sources, you should therefore look out for slashes
3022 and/or question marks preceding these sections. You may have come
3023 across a bug in <CODE>xgettext</CODE>'s Perl parser (and of course you
should report that bug). In the meantime, you should consider reformulating
your code in a manner less challenging to <CODE>xgettext</CODE>.
3027 </P>
3029 In particular, if the parser is too dumb to see that a function
3030 does not accept arguments, use parentheses:
3032 </P>
3034 <PRE>
3035 $x = somefunc() ? 1 : 0;
3036 $y = (somefunc) ? 1 : 0;
3037 </PRE>
3040 In fact the Perl parser itself has similar problems and warns you
3041 about such constructs.
3043 </P>
3046 <H4><A NAME="SEC300" HREF="gettext_toc.html#TOC300">15.5.18.2 Which keywords will xgettext look for?</A></H4>
3048 <A NAME="IDX1246"></A>
3050 </P>
3052 Unless you instruct <CODE>xgettext</CODE> otherwise by invoking it with one
3053 of the options <CODE>--keyword</CODE> or <CODE>-k</CODE>, it will recognize the
3054 following keywords in your Perl sources:
3056 </P>
3058 <UL>
3060 <LI><CODE>gettext</CODE>
3062 <LI><CODE>dgettext</CODE>
3064 <LI><CODE>dcgettext</CODE>
3066 <LI><CODE>ngettext:1,2</CODE>
3068 The first (singular) and the second (plural) argument will be
3069 extracted.
3071 <LI><CODE>dngettext:1,2</CODE>
3073 The first (singular) and the second (plural) argument will be
3074 extracted.
3076 <LI><CODE>dcngettext:1,2</CODE>
3078 The first (singular) and the second (plural) argument will be
3079 extracted.
3081 <LI><CODE>gettext_noop</CODE>
3083 <LI><CODE>%gettext</CODE>
3085 The keys of lookups into the hash <CODE>%gettext</CODE> will be extracted.
3087 <LI><CODE>$gettext</CODE>
3089 The keys of lookups into the hash reference <CODE>$gettext</CODE> will be extracted.
3091 </UL>
3095 <H4><A NAME="SEC301" HREF="gettext_toc.html#TOC301">15.5.18.3 How to Extract Hash Keys</A></H4>
3097 <A NAME="IDX1247"></A>
3099 </P>
3101 Translating messages at runtime is normally performed by looking up the
3102 original string in the translation database and returning the
3103 translated version. The “natural” Perl implementation is a hash
3104 lookup, and, of course, <CODE>xgettext</CODE> supports such practice.
3106 </P>
3108 <PRE>
3109 print __"Hello world!";
3110 print $__{"Hello world!"};
3111 print $__-&#62;{"Hello world!"};
3112 print $$__{"Hello world!"};
3113 </PRE>
3116 The above four lines all do the same thing. The Perl module
3117 <CODE>Locale::TextDomain</CODE> exports by default a hash <CODE>%__</CODE> that
3118 is tied to the function <CODE>__()</CODE>. It also exports a reference
3119 <CODE>$__</CODE> to <CODE>%__</CODE>.
3121 </P>
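<P>
Python has no tied hashes, but the idea behind <CODE>%__</CODE> can be approximated by a mapping whose lookups call a translation function instead of reading stored values. In the following sketch the class name, the catalog and the variable <CODE>messages</CODE> are hypothetical.
</P>

<PRE>
class TranslatingMapping:
    """Read-only mapping: indexing with a msgid returns its translation."""

    def __init__(self, translate):
        self._translate = translate

    def __getitem__(self, msgid):
        # Every "lookup" is really a call, as with a hash tied to __().
        return self._translate(msgid)

# Hypothetical catalog and translation function.
CATALOG = {"Hello world!": "Hallo Welt!"}

def translate(msgid):
    return CATALOG.get(msgid, msgid)

messages = TranslatingMapping(translate)
# Function-call style and hash-lookup style give the same result.
print(translate("Hello world!"))
print(messages["Hello world!"])
</PRE>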
If an argument to the <CODE>xgettext</CODE> option <CODE>--keyword</CODE>
(or <CODE>-k</CODE>) starts with a percent sign, the rest of the keyword is
3125 interpreted as the name of a hash. If it starts with a dollar
3126 sign, the rest of the keyword is interpreted as a reference to a
3127 hash.
3129 </P>
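<P>
The kinds of keywords can be told apart from the keyword argument alone, as in the following Python sketch. It is a hypothetical helper, not the option parser of <CODE>xgettext</CODE>, and it handles only the forms that appear in this section: a leading <CODE>%</CODE> or <CODE>$</CODE>, and function keywords with an optional list of 1-based argument numbers such as <CODE>__n:1,2</CODE> (without such a list, the first argument is extracted).
</P>

<PRE>
def parse_keyword(spec):
    """Split a --keyword argument into (kind, name, argument numbers).

    "%__"      hash: the keys of lookups are extracted
    "$__"      hash reference: the keys of lookups are extracted
    "__n:1,2"  function: arguments 1 and 2 are extracted
    "__"       function: the first argument is extracted
    """
    if spec.startswith("%"):
        return ("hash", spec[1:], [])
    if spec.startswith("$"):
        return ("hash reference", spec[1:], [])
    name, _sep, argnums = spec.partition(":")
    if not argnums:
        return ("function", name, [1])
    return ("function", name, [int(n) for n in argnums.split(",")])
</PRE>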
3131 Note that you can omit the quotation marks (single or double) around
3132 the hash key (almost) whenever Perl itself allows it:
3134 </P>
3136 <PRE>
3137 print $gettext{Error};
3138 </PRE>
3141 The exact rule is: You can omit the surrounding quotes, when the hash
3142 key is a valid C (!) identifier, i.e. when it starts with an
3143 underscore or an ASCII letter and is followed by an arbitrary number
3144 of underscores, ASCII letters or digits. Other Unicode characters
3145 are <EM>not</EM> allowed, regardless of the <CODE>use utf8</CODE> pragma.
3147 </P>
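<P>
The rule translates directly into a regular expression. The following Python sketch (the function name is hypothetical) spells out the ASCII character classes explicitly, so that letters such as <CODE>ö</CODE> are rejected as the rule requires.
</P>

<PRE>
import re

# Underscore or ASCII letter, then any number of underscores,
# ASCII letters or digits.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def key_may_be_unquoted(key):
    # fullmatch: the entire key must be an identifier, not just a prefix.
    return C_IDENTIFIER.fullmatch(key) is not None
</PRE>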
3150 <H4><A NAME="SEC302" HREF="gettext_toc.html#TOC302">15.5.18.4 What are Strings And Quote-like Expressions?</A></H4>
3152 <A NAME="IDX1248"></A>
3154 </P>
3156 Perl offers a plethora of different string constructs. Those that can
3157 be used either as arguments to functions or inside braces for hash
3158 lookups are generally supported by <CODE>xgettext</CODE>.
3160 </P>
3162 <UL>
3163 <LI><STRONG>double-quoted strings</STRONG>
3165 <BR>
3167 <PRE>
3168 print gettext "Hello World!";
3169 </PRE>
3171 <LI><STRONG>single-quoted strings</STRONG>
3173 <BR>
3175 <PRE>
3176 print gettext 'Hello World!';
3177 </PRE>
3179 <LI><STRONG>the operator qq</STRONG>
3181 <BR>
3183 <PRE>
3184 print gettext qq |Hello World!|;
3185 print gettext qq &#60;E-mail: &#60;guido\@imperia.net&#62;&#62;;
3186 </PRE>
3188 The operator <CODE>qq</CODE> is fully supported. You can use arbitrary
3189 delimiters, including the four bracketing delimiters (round, angle,
3190 square, curly) that nest.
3192 <LI><STRONG>the operator q</STRONG>
3194 <BR>
3196 <PRE>
3197 print gettext q |Hello World!|;
3198 print gettext q &#60;E-mail: &#60;guido@imperia.net&#62;&#62;;
3199 </PRE>
3201 The operator <CODE>q</CODE> is fully supported. You can use arbitrary
3202 delimiters, including the four bracketing delimiters (round, angle,
3203 square, curly) that nest.
3205 <LI><STRONG>the operator qx</STRONG>
3207 <BR>
3209 <PRE>
3210 print gettext qx ;LANGUAGE=C /bin/date;
3211 print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
3212 </PRE>
3214 The operator <CODE>qx</CODE> is fully supported. You can use arbitrary
3215 delimiters, including the four bracketing delimiters (round, angle,
3216 square, curly) that nest.
3218 The example is actually a useless use of <CODE>gettext</CODE>. It will
3219 invoke the <CODE>gettext</CODE> function on the output of the command
3220 specified with the <CODE>qx</CODE> operator. The feature was included
3221 in order to make the interface consistent (the parser will extract
3222 all strings and quote-like expressions).
3224 <LI><STRONG>here documents</STRONG>
3226 <BR>
<PRE>
print gettext &#60;&#60;'EOF';
program not found in $PATH
EOF

print ngettext &#60;&#60;EOF, &#60;&#60;"EOF", $n;  # $n: number of deleted files
one file deleted
EOF
several files deleted
EOF
</PRE>
3240 Here-documents are recognized. If the delimiter is enclosed in single
3241 quotes, the string is not interpolated. If it is enclosed in double
3242 quotes or has no quotes at all, the string is interpolated.
3244 Delimiters that start with a digit are not supported!
3246 </UL>
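<P>
The quoting rule for here-document delimiters can be sketched in Python as follows. The function is hypothetical. It receives the delimiter as written after the here-document operator (for example <CODE>'EOF'</CODE>, <CODE>"EOF"</CODE> or <CODE>EOF</CODE>) and returns the terminating line together with a flag telling whether the body is interpolated, or <CODE>None</CODE> for the unsupported delimiters that start with a digit.
</P>

<PRE>
def parse_heredoc_delimiter(token):
    """Return (terminator, interpolated), or None if unsupported."""
    quote = token[:1]
    if quote in ("'", '"') and len(token) != 1 and token.endswith(quote):
        # Quoted delimiter: only double quotes interpolate.
        name, interpolated = token[1:-1], quote == '"'
    else:
        # Bare delimiter: interpolated, like double quotes.
        name, interpolated = token, True
    if name[:1].isdigit():
        # Delimiters starting with a digit are not supported.
        return None
    return (name, interpolated)
</PRE>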
3250 <H4><A NAME="SEC303" HREF="gettext_toc.html#TOC303">15.5.18.5 Invalid Uses Of String Interpolation</A></H4>
3252 <A NAME="IDX1249"></A>
3254 </P>
3256 Perl is capable of interpolating variables into strings. This offers
3257 some nice features in localized programs but can also lead to
3258 problems.
3260 </P>
3262 A common error is a construct like the following:
3264 </P>
3266 <PRE>
3267 print gettext "This is the program $0!\n";
3268 </PRE>
Perl will interpolate at runtime the value of the variable <CODE>$0</CODE>
into the argument of the <CODE>gettext()</CODE> function. Hence, this
argument is not a string constant but a variable argument (<CODE>$0</CODE>
is a global variable that holds the name of the Perl script being
executed). The interpolation is performed by Perl before the string
argument is passed to <CODE>gettext()</CODE> and will therefore depend on
the name of the script, which can only be determined at runtime.
Consequently, a translation can almost never be found at runtime
(except if, by accident, the interpolated string happens to be in the
message catalog).
3282 </P>
The <CODE>xgettext</CODE> program will therefore terminate parsing with a fatal
error if it encounters a variable inside an extracted string. In
3286 general, this will happen for all kinds of string interpolations that
3287 cannot be safely performed at compile time. If you absolutely know
3288 what you are doing, you can always circumvent this behavior:
3290 </P>
3292 <PRE>
3293 my $know_what_i_am_doing = "This is program $0!\n";
3294 print gettext $know_what_i_am_doing;
3295 </PRE>
3298 Since the parser only recognizes strings and quote-like expressions,
3299 but not variables or other terms, the above construct will be
3300 accepted. You will have to find another way, however, to let your
3301 original string make it into your message catalog.
3303 </P>
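<P>
One common way to get the original string into the catalog is to keep the msgid constant and insert the variable part after the lookup, for example with <CODE>sprintf()</CODE>. The sketch below compares both forms at runtime. It relies on several hypothetical stand-ins for illustration. The in-memory <CODE>%catalog</CODE> and the <CODE>gettext</CODE> sub replace a real module such as <CODE>Locale::Messages</CODE>. <CODE>$program</CODE> stands in for <CODE>$0</CODE>.
</P>

```perl
use strict;
use warnings;

# Hypothetical German catalog (msgid => msgstr); a real program would load
# a .mo file through a module such as Locale::Messages instead.
my %catalog = (
    "This is the program %s!\n" => "Dies ist das Programm %s!\n",
);

# Hypothetical stand-in for gettext(): return the translation, or the
# msgid itself when the catalog has no entry for it.
sub gettext ($) {
    my ($msgid) = @_;
    return exists $catalog{$msgid} ? $catalog{$msgid} : $msgid;
}

my $program = "hello.pl";    # stands in for $0

# Broken: the lookup key "This is the program hello.pl!\n" is built at
# runtime, so the lookup misses and the English text comes back.
my $broken = gettext("This is the program $program!\n");

# Working: the msgid is a constant; the program name is inserted after
# the lookup.
my $working = sprintf(gettext("This is the program %s!\n"), $program);

print $broken, $working;
```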
If invoked with the option <CODE>--extract-all</CODE> or <CODE>-a</CODE>,
<CODE>xgettext</CODE> accepts variable interpolation. Rationale: you will
generally use this option in order to prepare your sources for
internationalization.
3310 </P>
3312 Please see the manual page <SAMP>&lsquo;man perlop&rsquo;</SAMP> for details of strings and
3313 quote-like expressions that are subject to interpolation and those
3314 that are not. Safe interpolations (that will not lead to a fatal
3315 error) are:
3317 </P>
3319 <UL>
3321 <LI>the escape sequences <CODE>\t</CODE> (tab, HT, TAB), <CODE>\n</CODE>
3323 (newline, NL), <CODE>\r</CODE> (return, CR), <CODE>\f</CODE> (form feed, FF),
3324 <CODE>\b</CODE> (backspace, BS), <CODE>\a</CODE> (alarm, bell, BEL), and <CODE>\e</CODE>
3325 (escape, ESC).
3327 <LI>octal chars, like <CODE>\033</CODE>
3329 <BR>
3330 Note that octal escapes in the range of 400-777 are translated into a
3331 UTF-8 representation, regardless of the presence of the <CODE>use utf8</CODE> pragma.
3333 <LI>hex chars, like <CODE>\x1b</CODE>
3335 <LI>wide hex chars, like <CODE>\x{263a}</CODE>
3337 <BR>
3338 Note that this escape is translated into a UTF-8 representation,
3339 regardless of the presence of the <CODE>use utf8</CODE> pragma.
3341 <LI>control chars, like <CODE>\c[</CODE> (CTRL-[)
3343 <LI>named Unicode chars, like <CODE>\N{LATIN CAPITAL LETTER C WITH CEDILLA}</CODE>
3345 <BR>
3346 Note that this escape is translated into a UTF-8 representation,
3347 regardless of the presence of the <CODE>use utf8</CODE> pragma.
3348 </UL>
3351 The following escapes are considered partially safe:
3353 </P>
3355 <UL>
3357 <LI><CODE>\l</CODE> lowercase next char
3359 <LI><CODE>\u</CODE> uppercase next char
3361 <LI><CODE>\L</CODE> lowercase till \E
3363 <LI><CODE>\U</CODE> uppercase till \E
3365 <LI><CODE>\E</CODE> end case modification
3367 <LI><CODE>\Q</CODE> quote non-word characters till \E
3369 </UL>
These escapes are considered safe only if the string consists entirely of
ASCII characters. Translation of characters outside the range
3374 defined by ASCII is locale-dependent and can actually only be performed
3375 at runtime; <CODE>xgettext</CODE> doesn't do these locale-dependent translations
3376 at extraction time.
3378 </P>
3380 Except for the modifier <CODE>\Q</CODE>, these translations, albeit valid,
3381 are generally useless and only obfuscate your sources. If a
3382 translation can be safely performed at compile time you can just as
3383 well write what you mean.
3385 </P>
3388 <H4><A NAME="SEC304" HREF="gettext_toc.html#TOC304">15.5.18.6 Valid Uses Of String Interpolation</A></H4>
3390 <A NAME="IDX1250"></A>
3392 </P>
Perl is often used to generate sources for other programming languages
or arbitrary file formats. Web applications that output HTML code
are a prominent example of such usage.
3398 </P>
3400 You will often come across situations where you want to intersperse
3401 code written in the target (programming) language with translatable
3402 messages, like in the following HTML example:
3404 </P>
3406 <PRE>
print gettext &#60;&#60;EOF;
&#60;h1&#62;My Homepage&#60;/h1&#62;
&#60;script language="JavaScript"&#62;&#60;!--
for (i = 0; i &#60; 100; ++i) {
alert ("Thank you so much for visiting my homepage!");
}
//--&#62;&#60;/script&#62;
EOF
3415 </PRE>
3418 The parser will extract the entire here document, and it will appear
3419 entirely in the resulting PO file, including the JavaScript snippet
embedded in the HTML code. If you overdo constructs like the above,
you run the risk that the translators of your package will look for
a less challenging project. You should consider an alternative
expression here:
3425 </P>
3427 <PRE>
print &#60;&#60;EOF;
&#60;h1&#62;$gettext{"My Homepage"}&#60;/h1&#62;
&#60;script language="JavaScript"&#62;&#60;!--
for (i = 0; i &#60; 100; ++i) {
alert ("$gettext{'Thank you so much for visiting my homepage!'}");
}
//--&#62;&#60;/script&#62;
EOF
3436 </PRE>
3439 Only the translatable portions of the code will be extracted here, and
3440 the resulting PO file will begrudgingly improve in terms of readability.
3442 </P>
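<P>
A tied hash is what makes <CODE>$gettext{...}</CODE> inside a here document perform a catalog lookup at runtime. The following minimal sketch shows the mechanism. It assumes a hypothetical class <CODE>TranslatingHash</CODE> and an in-memory <CODE>%catalog</CODE>. It only illustrates the idea and is not the code of any real Perl gettext module.
</P>

```perl
use strict;
use warnings;

# Hypothetical French catalog handed to the tied hash below.
my %catalog = ("My Homepage" => "Ma page d'accueil");

# Minimal tied-hash class: reading $hash{MSGID} returns the translation,
# or MSGID itself when there is none. This only mimics the idea behind the
# %gettext keyword; it is not the code of any real Perl gettext module.
package TranslatingHash;

sub TIEHASH {
    my ($class, $catalog) = @_;
    return bless { catalog => $catalog }, $class;
}

sub FETCH {
    my ($self, $msgid) = @_;
    my $catalog = $self->{catalog};
    return exists $catalog->{$msgid} ? $catalog->{$msgid} : $msgid;
}

package main;

tie my %gettext, 'TranslatingHash', \%catalog;

# The hash lookup is interpolated like any other hash element, so FETCH
# runs for "My Homepage" when the here document is evaluated.
my $page = <<"EOF";
Title: $gettext{"My Homepage"}
EOF

print $page;
```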
3444 You can interpolate hash lookups in all strings or quote-like
3445 expressions that are subject to interpolation (see the manual page
3446 <SAMP>&lsquo;man perlop&rsquo;</SAMP> for details). Double interpolation is invalid, however:
3448 </P>
3450 <PRE>
3451 # TRANSLATORS: Replace "the earth" with the name of your planet.
3452 print gettext qq{Welcome to $gettext-&#62;{"the earth"}};
3453 </PRE>
The <CODE>qq</CODE>-quoted string is first recognized as an argument of
<CODE>gettext</CODE> and checked for invalid variable interpolation. The
dollar sign of the hash dereference will therefore make the parser stop
with an “invalid interpolation” error.
3461 </P>
3463 It is valid to interpolate hash lookups in regular expressions:
3465 </P>
3467 <PRE>
if ($var =~ /$gettext{"the earth"}/) {
print gettext "Match!\n";
}
s/$gettext{"U. S. A."}/$gettext{"U. S. A."} $gettext{"(dial +0)"}/g;
3472 </PRE>
3476 <H4><A NAME="SEC305" HREF="gettext_toc.html#TOC305">15.5.18.7 When To Use Parentheses</A></H4>
3478 <A NAME="IDX1251"></A>
3480 </P>
3482 In Perl, parentheses around function arguments are mostly optional.
3483 <CODE>xgettext</CODE> will always assume that all
3484 recognized keywords (except for hashes and hash references) are names
3485 of properly prototyped functions, and will (hopefully) only require
3486 parentheses where Perl itself requires them. All constructs in the
3487 following example are therefore ok to use:
3489 </P>
3491 <PRE>
3492 print gettext ("Hello World!\n");
3493 print gettext "Hello World!\n";
3494 print dgettext ($package =&#62; "Hello World!\n");
3495 print dgettext $package, "Hello World!\n";
3497 # The "fat comma" =&#62; turns the left-hand side argument into a
3498 # single-quoted string!
3499 print dgettext smellovision =&#62; "Hello World!\n";
3501 # The following assignment only works with prototyped functions.
3502 # Otherwise, the functions will act as "greedy" list operators and
3503 # eat up all following arguments.
3504 my $anonymous_hash = {
3505 planet =&#62; gettext "earth",
3506 cakes =&#62; ngettext "one cake", "several cakes", $n,
still =&#62; $works,
};
# The same without fat comma:
my $other_hash = {
'planet', gettext "earth",
'cakes', ngettext "one cake", "several cakes", $n,
'still', $works,
};
3516 # Parentheses are only significant for the first argument.
3517 print dngettext 'package', ("one cake", "several cakes", $n), $discarded;
3518 </PRE>
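<P>
The parsing rule behind this can be tried out with plain Perl. In the sketch below, <CODE>gettext</CODE> and <CODE>ngettext</CODE> are hypothetical stand-ins that merely bracket the msgid, not the functions of any real module. <CODE>greedy</CODE> is a hypothetical sub without a prototype, included for comparison. The <CODE>($)</CODE> prototype turns <CODE>gettext</CODE> into a named unary operator, so a call without parentheses takes exactly one argument. <CODE>ngettext</CODE> is called with parentheses here to keep the sketch unambiguous.
</P>

```perl
use strict;
use warnings;

# Hypothetical stand-ins that only mark the msgid, so the parsing effect
# is easy to see. The ($) prototype makes gettext a named unary operator:
# without parentheses it takes exactly one argument.
sub gettext ($) { return "[$_[0]]" }
sub ngettext ($$$) { return $_[2] == 1 ? "[$_[0]]" : "[$_[1]]" }

# No prototype: a list operator that swallows every following argument.
sub greedy { return scalar @_ }

my @unary  = (gettext "a", "b", "c");    # ("[a]", "b", "c")
my @listop = (greedy "a", "b", "c");     # (3)

my $n     = 2;
my $works = "yes";
my $hash  = {
    planet => gettext "earth",
    cakes  => ngettext("one cake", "several cakes", $n),
    still  => $works,
};
```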
3522 <H4><A NAME="SEC306" HREF="gettext_toc.html#TOC306">15.5.18.8 How To Grok with Long Lines</A></H4>
3524 <A NAME="IDX1252"></A>
3526 </P>
3528 The necessity of long messages can often lead to a cumbersome or
3529 unreadable coding style. Perl has several options that may prevent
3530 you from writing unreadable code, and
3531 <CODE>xgettext</CODE> does its best to do likewise. This is where the dot
3532 operator (the string concatenation operator) may come in handy:
3534 </P>
3536 <PRE>
3537 print gettext ("This is a very long"
3538 . " message that is still"
3539 . " readable, because"
3540 . " it is split into"
3541 . " multiple lines.\n");
3542 </PRE>
3545 Perl is smart enough to concatenate these constant string fragments
3546 into one long string at compile time, and so is
3547 <CODE>xgettext</CODE>. You will only find one long message in the resulting
3548 POT file.
3550 </P>
Note that Perl 6 uses the tilde (<SAMP>&lsquo;~&rsquo;</SAMP>) as the string
concatenation operator, and the dot (<SAMP>&lsquo;.&rsquo;</SAMP>) for method calls.
This syntax is not supported by <CODE>xgettext</CODE>.
3557 </P>
3559 If embedded newline characters are not an issue, or even desired, you
3560 may also insert newline characters inside quoted strings wherever you
3561 feel like it:
3563 </P>
3565 <PRE>
3566 print gettext ("&#60;em&#62;In HTML output
3567 embedded newlines are generally no
3568 problem, since adjacent whitespace
3569 is always rendered into a single
3570 space character.&#60;/em&#62;");
3571 </PRE>
You may also consider using here documents:
3576 </P>
3578 <PRE>
3579 print gettext &#60;&#60;EOF;
3580 &#60;em&#62;In HTML output
3581 embedded newlines are generally no
3582 problem, since adjacent whitespace
3583 is always rendered into a single
space character.&#60;/em&#62;
EOF
3586 </PRE>
3589 Please do not forget that the line breaks are real, i.e. they
3590 translate into newline characters that will consequently show up in
3591 the resulting POT file.
3593 </P>
3596 <H4><A NAME="SEC307" HREF="gettext_toc.html#TOC307">15.5.18.9 Bugs, Pitfalls, And Things That Do Not Work</A></H4>
3598 <A NAME="IDX1253"></A>
3600 </P>
3602 The foregoing sections should have proven that
3603 <CODE>xgettext</CODE> is quite smart in extracting translatable strings from
Perl sources. Yet, some more or less exotic constructs that could be
expected to work actually do not.
3607 </P>
3609 One of the more relevant limitations can be found in the
3610 implementation of variable interpolation inside quoted strings. Only
3611 simple hash lookups can be used there:
3613 </P>
3615 <PRE>
3616 print &#60;&#60;EOF;
3617 $gettext{"The dot operator"
3618 . " does not work"
. " here!"}
Likewise, you cannot @{[ gettext ("interpolate function calls") ]}
inside quoted strings or quote-like expressions.
EOF
3623 </PRE>
3626 This is valid Perl code and will actually trigger invocations of the
3627 <CODE>gettext</CODE> function at runtime. Yet, the Perl parser in
3628 <CODE>xgettext</CODE> will fail to recognize the strings. A less obvious
3629 example can be found in the interpolation of regular expressions:
3631 </P>
3633 <PRE>
3634 s/&#60;!--START_OF_WEEK--&#62;/gettext ("Sunday")/e;
3635 </PRE>
The modifier <CODE>e</CODE> causes the replacement part of the substitution
to be evaluated as a Perl expression. Consequently, the function
<CODE>gettext()</CODE> is called at runtime, but again, the parser fails to extract the
string “Sunday”. Use a temporary variable as a simple workaround if
you really need this feature:
3644 </P>
3646 <PRE>
3647 my $sunday = gettext "Sunday";
3648 s/&#60;!--START_OF_WEEK--&#62;/$sunday/;
3649 </PRE>
3652 Hash slices would also be handy but are not recognized:
3654 </P>
3656 <PRE>
3657 my @weekdays = @gettext{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
3658 'Thursday', 'Friday', 'Saturday'};
3659 # Or even:
3660 @weekdays = @gettext{qw (Sunday Monday Tuesday Wednesday Thursday
3661 Friday Saturday) };
3662 </PRE>
3665 This is perfectly valid usage of the tied hash <CODE>%gettext</CODE> but the
3666 strings are not recognized and therefore will not be extracted.
3668 </P>
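<P>
Until hash slices are supported, the same list can be written with one keyword call per string. This minimal sketch assumes a hypothetical stand-in <CODE>gettext</CODE> with a <CODE>($)</CODE> prototype in place of a real module. Going by the parsing rules described in the section on parentheses, each string is then the direct argument of a recognized keyword.
</P>

```perl
use strict;
use warnings;

# Hypothetical stand-in for gettext with a ($) prototype; it only marks
# the msgid so the result is easy to inspect.
sub gettext ($) { return "[$_[0]]" }

# One call per day instead of the hash slice @gettext{...}: every
# msgid is now the direct argument of a keyword call.
my @weekdays = (gettext 'Sunday',    gettext 'Monday',   gettext 'Tuesday',
                gettext 'Wednesday', gettext 'Thursday', gettext 'Friday',
                gettext 'Saturday');

print "$weekdays[0]\n";
```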
3670 Another caveat of the current version is its rudimentary support for
3671 non-ASCII characters in identifiers. You may encounter serious
3672 problems if you use identifiers with characters outside the range of
3673 'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'.
3675 </P>
3677 Maybe some of these missing features will be implemented in future
3678 versions, but since you can always make do without them at minimal effort,
3679 these todos have very low priority.
3681 </P>
Brace format strings that already contain braces as part of the normal
text are a nasty problem; a typical example is the usage string found in
many programs:
3687 </P>
3689 <PRE>
3690 die "usage: $0 {OPTIONS} FILENAME...\n";
3691 </PRE>
3694 If you want to internationalize this code with Perl brace format strings,
3695 you will run into a problem:
3697 </P>
3699 <PRE>
3700 die __x ("usage: {program} {OPTIONS} FILENAME...\n", program =&#62; $0);
3701 </PRE>
3704 Whereas <SAMP>&lsquo;{program}&rsquo;</SAMP> is a placeholder, <SAMP>&lsquo;{OPTIONS}&rsquo;</SAMP>
3705 is not and should probably be translated. Yet, there is no way to teach
3706 the Perl parser in <CODE>xgettext</CODE> to recognize the first one, and leave
3707 the other one alone.
3709 </P>
There are two possible workarounds for this problem. If you are
sure that your program will run under Perl 5.8.0 or newer (these
Perl versions handle positional parameters in <CODE>printf()</CODE>), or
if you are sure that the translator will not have to reorder the arguments
in her translation (for example, if your string has only one brace
placeholder, or if it describes a syntax, like this one), you can
mark the string as <CODE>no-perl-brace-format</CODE> and use <CODE>printf()</CODE>
or <CODE>sprintf()</CODE>:
3719 </P>
3721 <PRE>
3722 # xgettext: no-perl-brace-format
3723 die sprintf ("usage: %s {OPTIONS} FILENAME...\n", $0);
3724 </PRE>
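<P>
When you use <CODE>printf()</CODE> or <CODE>sprintf()</CODE>, positional parameters are what allow a translator to reorder the arguments. This minimal sketch uses a hypothetical msgid and a hypothetical reordered translation, which is kept in English for readability. Perl's <CODE>sprintf()</CODE> understands explicit argument indexes such as <CODE>%2$s</CODE>.
</P>

```perl
use strict;
use warnings;

# Hypothetical msgid and a hypothetical translation that needs the
# arguments in the opposite order; %1$s and %2$s name them explicitly.
my $msgid  = "Cannot open %s: %s\n";
my $msgstr = "%2\$s (while opening %1\$s)\n";

my @args = ("data.txt", "No such file or directory");

my $english    = sprintf($msgid, @args);
my $translated = sprintf($msgstr, @args);

print $english, $translated;
```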
If you want to use the more portable Perl brace format, you will have to
put placeholders in place of the literal braces:
3730 </P>
3732 <PRE>
3733 die __x ("usage: {program} {[}OPTIONS{]} FILENAME...\n",
3734 program =&#62; $0, '[' =&#62; '{', ']' =&#62; '}');
3735 </PRE>
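<P>
The placeholder trick works because only the names passed as arguments are expanded, so <SAMP>&lsquo;OPTIONS&rsquo;</SAMP> stays plain text. The following sketch is a simplified expander written for illustration only. Its name <CODE>expand_braces</CODE> is hypothetical, and it is not the implementation of <CODE>__x</CODE> in any module.
</P>

```perl
use strict;
use warnings;

# Hypothetical expander in the spirit of __x: each {NAME} whose NAME is a
# key of %vars is replaced by its value; any other brace text is kept.
sub expand_braces {
    my ($format, %vars) = @_;
    return $format unless %vars;
    my $names = join '|', map { quotemeta($_) } keys %vars;
    $format =~ s/\{($names)\}/$vars{$1}/g;
    return $format;
}

# '[' and ']' act as placeholders that expand to literal braces.
my $usage = expand_braces("usage: {program} {[}OPTIONS{]} FILENAME...\n",
                          program => 'hello', '[' => '{', ']' => '}');

print $usage;    # prints: usage: hello {OPTIONS} FILENAME...
```

<P>
Because <CODE>s///g</CODE> does not rescan its own replacement text, the braces produced by <SAMP>&lsquo;{[}&rsquo;</SAMP> and <SAMP>&lsquo;{]}&rsquo;</SAMP> are not mistaken for new placeholders.
</P>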
Perl brace format strings know no escaping mechanism. No matter what such an
escaping mechanism looked like, it would either give the programmer a
3740 hard time, make translating Perl brace format strings heavy-going, or
3741 result in a performance penalty at runtime, when the format directives
3742 get executed. Most of the time you will happily get along with
3743 <CODE>printf()</CODE> for this special case.
3745 </P>
3748 <H3><A NAME="SEC308" HREF="gettext_toc.html#TOC308">15.5.19 PHP Hypertext Preprocessor</A></H3>
3750 <A NAME="IDX1254"></A>
3752 </P>
3753 <DL COMPACT>
3755 <DT>RPMs
3756 <DD>
3757 mod_php4, mod_php4-core, phpdoc
3759 <DT>File extension
3760 <DD>
3761 <CODE>php</CODE>, <CODE>php3</CODE>, <CODE>php4</CODE>
3763 <DT>String syntax
3764 <DD>
3765 <CODE>"abc"</CODE>, <CODE>'abc'</CODE>
3767 <DT>gettext shorthand
3768 <DD>
3769 <CODE>_("abc")</CODE>
3771 <DT>gettext/ngettext functions
3772 <DD>
3773 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>; starting with PHP 4.2.0
3774 also <CODE>ngettext</CODE>, <CODE>dngettext</CODE>, <CODE>dcngettext</CODE>
3776 <DT>textdomain
3777 <DD>
3778 <CODE>textdomain</CODE> function
3780 <DT>bindtextdomain
3781 <DD>
3782 <CODE>bindtextdomain</CODE> function
3784 <DT>setlocale
3785 <DD>
3786 Programmer must call <CODE>setlocale (LC_ALL, "")</CODE>
3788 <DT>Prerequisite
3789 <DD>
3792 <DT>Use or emulate GNU gettext
3793 <DD>
3796 <DT>Extractor
3797 <DD>
3798 <CODE>xgettext</CODE>
3800 <DT>Formatting with positions
3801 <DD>
3802 <CODE>printf "%2\$d %1\$d"</CODE>
3804 <DT>Portability
3805 <DD>
3806 On platforms without gettext, the functions are not available.
3808 <DT>po-mode marking
3809 <DD>
3811 </DL>
3814 An example is available in the <TT>&lsquo;examples&rsquo;</TT> directory: <CODE>hello-php</CODE>.
3816 </P>
3819 <H3><A NAME="SEC309" HREF="gettext_toc.html#TOC309">15.5.20 Pike</A></H3>
3821 <A NAME="IDX1255"></A>
3823 </P>
3824 <DL COMPACT>
3826 <DT>RPMs
3827 <DD>
3828 roxen
3830 <DT>File extension
3831 <DD>
3832 <CODE>pike</CODE>
3834 <DT>String syntax
3835 <DD>
3836 <CODE>"abc"</CODE>
3838 <DT>gettext shorthand
3839 <DD>
3842 <DT>gettext/ngettext functions
3843 <DD>
3844 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>
3846 <DT>textdomain
3847 <DD>
3848 <CODE>textdomain</CODE> function
3850 <DT>bindtextdomain
3851 <DD>
3852 <CODE>bindtextdomain</CODE> function
3854 <DT>setlocale
3855 <DD>
3856 <CODE>setlocale</CODE> function
3858 <DT>Prerequisite
3859 <DD>
3860 <CODE>import Locale.Gettext;</CODE>
3862 <DT>Use or emulate GNU gettext
3863 <DD>
3866 <DT>Extractor
3867 <DD>
3870 <DT>Formatting with positions
3871 <DD>
3874 <DT>Portability
3875 <DD>
3876 On platforms without gettext, the functions are not available.
3878 <DT>po-mode marking
3879 <DD>
3881 </DL>
3885 <H3><A NAME="SEC310" HREF="gettext_toc.html#TOC310">15.5.21 GNU Compiler Collection sources</A></H3>
3887 <A NAME="IDX1256"></A>
3889 </P>
3890 <DL COMPACT>
3892 <DT>RPMs
3893 <DD>
3896 <DT>File extension
3897 <DD>
<CODE>c</CODE>, <CODE>h</CODE>
3900 <DT>String syntax
3901 <DD>
3902 <CODE>"abc"</CODE>
3904 <DT>gettext shorthand
3905 <DD>
3906 <CODE>_("abc")</CODE>
3908 <DT>gettext/ngettext functions
3909 <DD>
3910 <CODE>gettext</CODE>, <CODE>dgettext</CODE>, <CODE>dcgettext</CODE>, <CODE>ngettext</CODE>,
3911 <CODE>dngettext</CODE>, <CODE>dcngettext</CODE>
3913 <DT>textdomain
3914 <DD>
3915 <CODE>textdomain</CODE> function
3917 <DT>bindtextdomain
3918 <DD>
3919 <CODE>bindtextdomain</CODE> function
3921 <DT>setlocale
3922 <DD>
3923 Programmer must call <CODE>setlocale (LC_ALL, "")</CODE>
3925 <DT>Prerequisite
3926 <DD>
3927 <CODE>#include "intl.h"</CODE>
3929 <DT>Use or emulate GNU gettext
3930 <DD>
3933 <DT>Extractor
3934 <DD>
3935 <CODE>xgettext -k_</CODE>
3937 <DT>Formatting with positions
3938 <DD>
3941 <DT>Portability
3942 <DD>
3943 Uses autoconf macros
3945 <DT>po-mode marking
3946 <DD>
3948 </DL>
3952 <H2><A NAME="SEC311" HREF="gettext_toc.html#TOC311">15.6 Internationalizable Data</A></H2>
3955 Here is a list of other data formats which can be internationalized
3956 using GNU gettext.
3958 </P>
3962 <H3><A NAME="SEC312" HREF="gettext_toc.html#TOC312">15.6.1 POT - Portable Object Template</A></H3>
3964 <DL COMPACT>
3966 <DT>RPMs
3967 <DD>
3968 gettext
3970 <DT>File extension
3971 <DD>
3972 <CODE>pot</CODE>, <CODE>po</CODE>
3974 <DT>Extractor
3975 <DD>
3976 <CODE>xgettext</CODE>
3977 </DL>
3981 <H3><A NAME="SEC313" HREF="gettext_toc.html#TOC313">15.6.2 Resource String Table</A></H3>
3983 <A NAME="IDX1257"></A>
3985 </P>
3986 <DL COMPACT>
3988 <DT>RPMs
3989 <DD>
3992 <DT>File extension
3993 <DD>
3994 <CODE>rst</CODE>
3996 <DT>Extractor
3997 <DD>
3998 <CODE>xgettext</CODE>, <CODE>rstconv</CODE>
3999 </DL>
4003 <H3><A NAME="SEC314" HREF="gettext_toc.html#TOC314">15.6.3 Glade - GNOME user interface description</A></H3>
4005 <DL COMPACT>
4007 <DT>RPMs
4008 <DD>
4009 glade, libglade, glade2, libglade2, intltool
4011 <DT>File extension
4012 <DD>
4013 <CODE>glade</CODE>, <CODE>glade2</CODE>
4015 <DT>Extractor
4016 <DD>
4017 <CODE>xgettext</CODE>, <CODE>libglade-xgettext</CODE>, <CODE>xml-i18n-extract</CODE>, <CODE>intltool-extract</CODE>
4018 </DL>
4022 </BODY>
4023 </HTML>