<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
from gettext.texi on 6 June 2010 -->
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 3 The Format of PO Files</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3 The Format of PO Files</A></H1>
<P>
<A NAME="IDX55"></A>
<A NAME="IDX56"></A>
</P>
<P>
The GNU <CODE>gettext</CODE> toolset helps programmers and translators
produce, update and use translation files, mainly PO files, which are
textual, editable files. This chapter explains the format of PO files.
</P>
<P>
A PO file is made up of many entries, each entry holding the relation
between an original untranslated string and its corresponding
translation. All entries in a given PO file usually pertain
to a single project, and all translations are expressed in a single
target language. One PO file <EM>entry</EM> has the following schematic
structure:
</P>
<PRE>
<VAR>white-space</VAR>
# <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgid <VAR>previous-untranslated-string</VAR>
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
</PRE>
<P>
The general structure of a PO file should be well understood by
the translator. When using PO mode, very little has to be known
about the format details, as PO mode takes care of them for her.
</P>
<P>
A simple entry can look like this:
</P>
<PRE>
#: lib/error.c:116
msgid "Unknown system error"
msgstr "Error desconegut del sistema"
</PRE>
<P>
<A NAME="IDX57"></A>
<A NAME="IDX58"></A>
<A NAME="IDX59"></A>
Entries begin with some optional white space. Usually, when generated
through GNU <CODE>gettext</CODE> tools, there is exactly one blank line
between entries. Then comments follow, on lines all starting with the
character <CODE>#</CODE>. There are two kinds of comments. <VAR>Translator
comments</VAR> have some white space immediately following the <CODE>#</CODE>;
they are created and maintained exclusively by the translator.
<VAR>Automatic comments</VAR> have some non-white character just after the
<CODE>#</CODE>; they are created and maintained automatically by GNU
<CODE>gettext</CODE> tools. Comment lines
starting with <CODE>#.</CODE> contain comments given by the programmer, directed
at the translator; these comments are called <VAR>extracted comments</VAR>
because the <CODE>xgettext</CODE> program extracts them from the program's
source code. Comment lines starting with <CODE>#:</CODE> contain references to
the program's source code. Comment lines starting with <CODE>#,</CODE> contain
flags; more about these below. Comment lines starting with <CODE>#|</CODE>
contain the previous untranslated string for which the translator gave
a translation.
</P>
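<P>
As a rough illustration of the comment kinds described above, the following
Python sketch classifies a comment line by the character that follows the
<CODE>#</CODE>. The name <CODE>classify_comment</CODE> and the returned labels
are hypothetical and not part of GNU <CODE>gettext</CODE>; treating a bare
<CODE>#</CODE> as an empty translator comment is an assumption of this sketch.
</P>

```python
# Hypothetical helper, not part of GNU gettext: classify one PO comment line.
AUTOMATIC_KINDS = {
    ".": "extracted",   # comments given by the programmer
    ":": "reference",   # references to the program's source code
    ",": "flag",        # flags such as fuzzy or c-format
    "|": "previous",    # previous untranslated string
}


def classify_comment(line):
    """Return a label for a comment line, or None if it is not a comment."""
    if not line.startswith("#"):
        return None
    marker = line[1:2]
    # Assumption: a bare "#" counts as an (empty) translator comment.
    if marker == "" or marker.isspace():
        return "translator"
    # Any other non-white character marks some automatic comment.
    return AUTOMATIC_KINDS.get(marker, "automatic")


for sample in ["# reviewed twice", "#. Shown at startup",
               "#: lib/error.c:116", "#, fuzzy", 'msgid "x"']:
    print(repr(sample), classify_comment(sample))
```

<P>
A table keyed on the second character keeps the sketch close to the prose:
white space selects the translator's own comments, and everything else is
left to the tools.
</P>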
<P>
All comments, of either kind, are optional.
</P>
<P>
<A NAME="IDX60"></A>
<A NAME="IDX61"></A>
After white space and comments, entries show two strings: first the
untranslated string as it appears in the original program sources, and
then the translation of this string. The original
string is introduced by the keyword <CODE>msgid</CODE>, and the translation
by <CODE>msgstr</CODE>. The two strings, untranslated and translated,
are quoted in various ways in the PO file, using <CODE>"</CODE>
delimiters and <CODE>\</CODE> escapes, but the translator does not really
have to pay attention to the precise quoting format, as PO mode fully
takes care of quoting for her.
</P>
<P>
The <CODE>msgid</CODE> strings, as well as automatic comments, are produced
and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not
provide means for the translator to alter these. The most she can
do is delete them, and only by deleting the whole entry.
On the other hand, the <CODE>msgstr</CODE> string, as well as translator
comments, are really meant for the translator, and PO mode gives her
the full control she needs.
</P>
<P>
The comment lines beginning with <CODE>#,</CODE> are special because they are
not completely ignored by the programs as comments generally are. The
comma-separated list of <VAR>flag</VAR>s is used by the <CODE>msgfmt</CODE>
program to give the user better diagnostic messages. Currently
there are two forms of flags defined:
</P>
<DL COMPACT>
<DT><CODE>fuzzy</CODE>
<DD>
<A NAME="IDX62"></A>
This flag can be generated by the <CODE>msgmerge</CODE> program or it can be
inserted by the translator herself. It shows that the <CODE>msgstr</CODE>
string might not be a correct translation (anymore). Only the translator
can judge if the translation requires further modification, or is
acceptable as is. Once satisfied with the translation, she then removes
this <CODE>fuzzy</CODE> attribute. The <CODE>msgmerge</CODE> program inserts this
flag when it has paired a <CODE>msgid</CODE> with a <CODE>msgstr</CODE> only
through a fuzzy search. See section <A HREF="gettext_8.html#SEC64">8.3.6 Fuzzy Entries</A>.
<DT><CODE>c-format</CODE>
<DD>
<A NAME="IDX63"></A>
<DT><CODE>no-c-format</CODE>
<DD>
<A NAME="IDX64"></A>
These flags should not be added by a human. Instead only the
<CODE>xgettext</CODE> program adds them. In an automated PO file processing
system as proposed here, the user's changes would be thrown away again as
soon as the <CODE>xgettext</CODE> program generates a new template file.
The <CODE>c-format</CODE> flag indicates that the untranslated string and the
translation are supposed to be C format strings. The <CODE>no-c-format</CODE>
flag indicates that they are not C format strings, even though the untranslated
string happens to look like a C format string (with <SAMP>&lsquo;%&rsquo;</SAMP> directives).
When the <CODE>c-format</CODE> flag is given for a string, the <CODE>msgfmt</CODE>
program does some more tests to check the validity of the translation.
See section <A HREF="gettext_10.html#SEC157">10.1 Invoking the <CODE>msgfmt</CODE> Program</A>, section <A HREF="gettext_4.html#SEC22">4.6 Special Comments preceding Keywords</A> and section <A HREF="gettext_15.html#SEC249">15.3.1 C Format Strings</A>.
<DT><CODE>objc-format</CODE>
<DD>
<A NAME="IDX65"></A>
<DT><CODE>no-objc-format</CODE>
<DD>
<A NAME="IDX66"></A>
Likewise for Objective C, see section <A HREF="gettext_15.html#SEC250">15.3.2 Objective C Format Strings</A>.
<DT><CODE>sh-format</CODE>
<DD>
<A NAME="IDX67"></A>
<DT><CODE>no-sh-format</CODE>
<DD>
<A NAME="IDX68"></A>
Likewise for Shell, see section <A HREF="gettext_15.html#SEC251">15.3.3 Shell Format Strings</A>.
<DT><CODE>python-format</CODE>
<DD>
<A NAME="IDX69"></A>
<DT><CODE>no-python-format</CODE>
<DD>
<A NAME="IDX70"></A>
Likewise for Python, see section <A HREF="gettext_15.html#SEC252">15.3.4 Python Format Strings</A>.
<DT><CODE>lisp-format</CODE>
<DD>
<A NAME="IDX71"></A>
<DT><CODE>no-lisp-format</CODE>
<DD>
<A NAME="IDX72"></A>
Likewise for Lisp, see section <A HREF="gettext_15.html#SEC253">15.3.5 Lisp Format Strings</A>.
<DT><CODE>elisp-format</CODE>
<DD>
<A NAME="IDX73"></A>
<DT><CODE>no-elisp-format</CODE>
<DD>
<A NAME="IDX74"></A>
Likewise for Emacs Lisp, see section <A HREF="gettext_15.html#SEC254">15.3.6 Emacs Lisp Format Strings</A>.
<DT><CODE>librep-format</CODE>
<DD>
<A NAME="IDX75"></A>
<DT><CODE>no-librep-format</CODE>
<DD>
<A NAME="IDX76"></A>
Likewise for librep, see section <A HREF="gettext_15.html#SEC255">15.3.7 librep Format Strings</A>.
<DT><CODE>scheme-format</CODE>
<DD>
<A NAME="IDX77"></A>
<DT><CODE>no-scheme-format</CODE>
<DD>
<A NAME="IDX78"></A>
Likewise for Scheme, see section <A HREF="gettext_15.html#SEC256">15.3.8 Scheme Format Strings</A>.
<DT><CODE>smalltalk-format</CODE>
<DD>
<A NAME="IDX79"></A>
<DT><CODE>no-smalltalk-format</CODE>
<DD>
<A NAME="IDX80"></A>
Likewise for Smalltalk, see section <A HREF="gettext_15.html#SEC257">15.3.9 Smalltalk Format Strings</A>.
<DT><CODE>java-format</CODE>
<DD>
<A NAME="IDX81"></A>
<DT><CODE>no-java-format</CODE>
<DD>
<A NAME="IDX82"></A>
Likewise for Java, see section <A HREF="gettext_15.html#SEC258">15.3.10 Java Format Strings</A>.
<DT><CODE>csharp-format</CODE>
<DD>
<A NAME="IDX83"></A>
<DT><CODE>no-csharp-format</CODE>
<DD>
<A NAME="IDX84"></A>
Likewise for C#, see section <A HREF="gettext_15.html#SEC259">15.3.11 C# Format Strings</A>.
<DT><CODE>awk-format</CODE>
<DD>
<A NAME="IDX85"></A>
<DT><CODE>no-awk-format</CODE>
<DD>
<A NAME="IDX86"></A>
Likewise for awk, see section <A HREF="gettext_15.html#SEC260">15.3.12 awk Format Strings</A>.
<DT><CODE>object-pascal-format</CODE>
<DD>
<A NAME="IDX87"></A>
<DT><CODE>no-object-pascal-format</CODE>
<DD>
<A NAME="IDX88"></A>
Likewise for Object Pascal, see section <A HREF="gettext_15.html#SEC261">15.3.13 Object Pascal Format Strings</A>.
<DT><CODE>ycp-format</CODE>
<DD>
<A NAME="IDX89"></A>
<DT><CODE>no-ycp-format</CODE>
<DD>
<A NAME="IDX90"></A>
Likewise for YCP, see section <A HREF="gettext_15.html#SEC262">15.3.14 YCP Format Strings</A>.
<DT><CODE>tcl-format</CODE>
<DD>
<A NAME="IDX91"></A>
<DT><CODE>no-tcl-format</CODE>
<DD>
<A NAME="IDX92"></A>
Likewise for Tcl, see section <A HREF="gettext_15.html#SEC263">15.3.15 Tcl Format Strings</A>.
<DT><CODE>perl-format</CODE>
<DD>
<A NAME="IDX93"></A>
<DT><CODE>no-perl-format</CODE>
<DD>
<A NAME="IDX94"></A>
Likewise for Perl, see section <A HREF="gettext_15.html#SEC264">15.3.16 Perl Format Strings</A>.
<DT><CODE>perl-brace-format</CODE>
<DD>
<A NAME="IDX95"></A>
<DT><CODE>no-perl-brace-format</CODE>
<DD>
<A NAME="IDX96"></A>
Likewise for Perl brace, see section <A HREF="gettext_15.html#SEC264">15.3.16 Perl Format Strings</A>.
<DT><CODE>php-format</CODE>
<DD>
<A NAME="IDX97"></A>
<DT><CODE>no-php-format</CODE>
<DD>
<A NAME="IDX98"></A>
Likewise for PHP, see section <A HREF="gettext_15.html#SEC265">15.3.17 PHP Format Strings</A>.
<DT><CODE>gcc-internal-format</CODE>
<DD>
<A NAME="IDX99"></A>
<DT><CODE>no-gcc-internal-format</CODE>
<DD>
<A NAME="IDX100"></A>
Likewise for the GCC sources, see section <A HREF="gettext_15.html#SEC266">15.3.18 GCC internal Format Strings</A>.
<DT><CODE>gfc-internal-format</CODE>
<DD>
<A NAME="IDX101"></A>
<DT><CODE>no-gfc-internal-format</CODE>
<DD>
<A NAME="IDX102"></A>
Likewise for the GNU Fortran Compiler sources, see section <A HREF="gettext_15.html#SEC267">15.3.19 GFC internal Format Strings</A>.
<DT><CODE>qt-format</CODE>
<DD>
<A NAME="IDX103"></A>
<DT><CODE>no-qt-format</CODE>
<DD>
<A NAME="IDX104"></A>
Likewise for Qt, see section <A HREF="gettext_15.html#SEC268">15.3.20 Qt Format Strings</A>.
<DT><CODE>qt-plural-format</CODE>
<DD>
<A NAME="IDX105"></A>
<DT><CODE>no-qt-plural-format</CODE>
<DD>
<A NAME="IDX106"></A>
Likewise for Qt plural forms, see section <A HREF="gettext_15.html#SEC269">15.3.21 Qt Format Strings</A>.
<DT><CODE>kde-format</CODE>
<DD>
<A NAME="IDX107"></A>
<DT><CODE>no-kde-format</CODE>
<DD>
<A NAME="IDX108"></A>
Likewise for KDE, see section <A HREF="gettext_15.html#SEC270">15.3.22 KDE Format Strings</A>.
<DT><CODE>boost-format</CODE>
<DD>
<A NAME="IDX109"></A>
<DT><CODE>no-boost-format</CODE>
<DD>
<A NAME="IDX110"></A>
Likewise for Boost, see section <A HREF="gettext_15.html#SEC271">15.3.23 Boost Format Strings</A>.
</DL>
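<P>
The comma-separated flag list described above can be read with very little
code. The sketch below is only an illustration: <CODE>parse_flags</CODE> is a
hypothetical helper, not a GNU <CODE>gettext</CODE> function, and it assumes
that white space around each flag is insignificant.
</P>

```python
# Hypothetical helper, not part of GNU gettext: read the flags of a "#," line.
def parse_flags(line):
    """Split a '#,' comment line into a list of flag names."""
    if not line.startswith("#,"):
        raise ValueError("not a flag comment: %r" % line)
    # Assumption: white space around each flag carries no meaning.
    return [flag.strip() for flag in line[2:].split(",") if flag.strip()]


flags = parse_flags("#, fuzzy, c-format")
needs_review = "fuzzy" in flags
format_flags = [flag for flag in flags if flag.endswith("-format")]
print(flags, needs_review, format_flags)
```

<P>
A consumer of the list would typically test for <CODE>fuzzy</CODE> first,
since that flag says the translation may no longer be correct.
</P>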
<P>
<A NAME="IDX111"></A>
<A NAME="IDX112"></A>
It is also possible to have entries with a context specifier. They look like
this:
</P>
<PRE>
<VAR>white-space</VAR>
# <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgctxt <VAR>previous-context</VAR>
#| msgid <VAR>previous-untranslated-string</VAR>
msgctxt <VAR>context</VAR>
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
</PRE>
<P>
The context serves to disambiguate messages with the same
<VAR>untranslated-string</VAR>. It is possible to have several entries with
the same <VAR>untranslated-string</VAR> in a PO file, provided that they each
have a different <VAR>context</VAR>. Note that an empty <VAR>context</VAR> string
and an absent <CODE>msgctxt</CODE> line do not mean the same thing.
</P>
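<P>
One way to picture the role of the context is a lookup table keyed on the
pair of context and untranslated string. The following sketch is an
assumption-laden illustration, not how the GNU <CODE>gettext</CODE> tools
store entries: <CODE>add_entry</CODE>, the context names and the French
sample translations are all made up for the example, and an absent
<CODE>msgctxt</CODE> line is modelled as <CODE>None</CODE>.
</P>

```python
# Hypothetical in-memory catalog keyed by (context, untranslated string).
def add_entry(catalog, msgid, msgstr, msgctxt=None):
    """Store one entry; msgctxt=None models an absent msgctxt line."""
    key = (msgctxt, msgid)
    if key in catalog:
        raise ValueError("duplicate entry for %r" % (key,))
    catalog[key] = msgstr


catalog = {}
add_entry(catalog, "Open", "Ouvrir", msgctxt="menu action")
add_entry(catalog, "Open", "Ouvert", msgctxt="door state")
add_entry(catalog, "Open", "Ouvrir")              # no msgctxt line at all
add_entry(catalog, "Open", "Ouvrir", msgctxt="")  # empty context: a distinct key

print(len(catalog))
```

<P>
Because <CODE>None</CODE> and the empty string are different keys, the sketch
keeps an empty context apart from a missing one, in line with the note above.
</P>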
<P>
<A NAME="IDX113"></A>
<A NAME="IDX114"></A>
A different kind of entry is used for translations which involve
plural forms.
</P>
<PRE>
<VAR>white-space</VAR>
# <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgid <VAR>previous-untranslated-string-singular</VAR>
#| msgid_plural <VAR>previous-untranslated-string-plural</VAR>
msgid <VAR>untranslated-string-singular</VAR>
msgid_plural <VAR>untranslated-string-plural</VAR>
msgstr[0] <VAR>translated-string-case-0</VAR>
...
msgstr[N] <VAR>translated-string-case-n</VAR>
</PRE>
<P>
Such an entry can look like this:
</P>
<PRE>
#: src/msgcmp.c:338 src/po-lex.c:699
#, c-format
msgid "found %d fatal error"
msgid_plural "found %d fatal errors"
msgstr[0] "s'ha trobat %d error fatal"
msgstr[1] "s'han trobat %d errors fatals"
</PRE>
<P>
Here too, a <CODE>msgctxt</CODE> context can be specified before <CODE>msgid</CODE>,
as above.
</P>
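<P>
To make the indexed <CODE>msgstr</CODE> strings more concrete, here is a small
sketch of how a program might pick one of the two Catalan strings from the
entry above. It assumes the common two-form rule, where index 0 is used for
exactly one item and index 1 otherwise; the rule itself is not stated in the
entry, and <CODE>plural_index</CODE> and <CODE>report</CODE> are hypothetical
names.
</P>

```python
# The two Catalan translations from the entry above: msgstr[0] and msgstr[1].
MSGSTR = [
    "s'ha trobat %d error fatal",
    "s'han trobat %d errors fatals",
]


def plural_index(n):
    """Assumed two-form rule: index 0 for exactly one, index 1 otherwise."""
    return 0 if n == 1 else 1


def report(n):
    """Pick the plural form for n and substitute n for the %d directive."""
    return MSGSTR[plural_index(n)] % n


for count in (0, 1, 5):
    print(report(count))
```

<P>
Languages with more than two plural forms need a different rule and more
<CODE>msgstr[N]</CODE> strings, which is why the index is computed separately
from the formatting step.
</P>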
<P>
For entries with plural forms, an additional kind of flag can be used:
</P>
<DL COMPACT>
<DT><CODE>range:</CODE>
<DD>
<A NAME="IDX115"></A>
This flag is followed by a range of non-negative numbers, using the syntax
<CODE>range: <VAR>minimum-value</VAR>..<VAR>maximum-value</VAR></CODE>. It designates the
possible values that the numeric parameter of the message can take. In some
languages, translators may produce slightly better translations if they know
that the value can only take on values between 0 and 10, for example.
</DL>
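<P>
The <CODE>range:</CODE> syntax above is simple enough to check with a regular
expression. The sketch below is illustrative only: <CODE>parse_range</CODE> is
a hypothetical helper, and rejecting a minimum greater than the maximum is an
assumption rather than documented behaviour.
</P>

```python
import re

# Matches "range: MIN..MAX" with non-negative decimal bounds.
RANGE_RE = re.compile(r"range:\s*(\d+)\.\.(\d+)")


def parse_range(flag):
    """Return (minimum, maximum) for a range flag, or None for other flags."""
    match = RANGE_RE.fullmatch(flag.strip())
    if match is None:
        return None
    minimum, maximum = int(match.group(1)), int(match.group(2))
    # Assumption: a range whose minimum exceeds its maximum is an error.
    if minimum > maximum:
        raise ValueError("empty range: %r" % flag)
    return minimum, maximum


print(parse_range("range: 0..10"))
print(parse_range("c-format"))
```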
<P>
The <VAR>previous-untranslated-string</VAR> is optionally inserted by the
<CODE>msgmerge</CODE> program, at the same time as it marks a message fuzzy.
It helps the translator to see which changes the developers made
to the <VAR>untranslated-string</VAR>.
</P>
<P>
Sometimes lines, usually white space or comments, follow the
very last entry of a PO file. Such lines are not part of any entry;
they will be dropped when the PO file is processed by the tools, and they
may disturb some PO file editors.
</P>
<P>
The remainder of this section may be safely skipped by those using
a PO file editor, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file. On the other hand, those
wishing to modify PO files by hand should read on carefully.
</P>
<P>
Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects
the C syntax for a character string, including the surrounding quotes
and embedded backslashed escape sequences. When the time comes
to write multi-line strings, one should not use escaped newlines.
Instead, a closing quote should follow the last character on the
line to be continued, and an opening quote should resume the string
at the beginning of the following PO file line. For example:
</P>
<PRE>
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
</PRE>
<P>
In this example, the <CODE>msgid</CODE> keyword is followed by three strings,
which are meant to be concatenated. The empty string is used on the first
line to allow better alignment of the <CODE>H</CODE> from the word
<SAMP>&lsquo;Here&rsquo;</SAMP> over the <CODE>f</CODE> from the word
<SAMP>&lsquo;for&rsquo;</SAMP>. Concatenating the empty string does not change
the resulting overall string, but it is a way for us to comply with
the necessity of <CODE>msgid</CODE> to be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner disposition. The empty string could have
been omitted, but only if the string starting with <SAMP>&lsquo;Here&rsquo;</SAMP> was
moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> It was not really necessary
either to switch between the last two quoted strings immediately after
the newline <SAMP>&lsquo;\n&rsquo;</SAMP>; the switch could have occurred after <EM>any</EM>
other character. We just did it this way because it is neater.
</P>
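<P>
The concatenation of adjacent quoted strings can be sketched in a few lines
of Python. This is only an illustration under stated assumptions:
<CODE>join_po_strings</CODE> is a hypothetical helper, and it relies on
Python's string literal syntax agreeing with the C syntax for the common
escapes, which is good enough for the example but not a full PO parser.
</P>

```python
import ast


def join_po_strings(lines, keyword="msgid"):
    """Concatenate the quoted pieces that follow a keyword in a PO entry."""
    pieces = []
    for line in lines:
        text = line.strip()
        if text.startswith(keyword + " "):
            text = text[len(keyword) + 1:]
        # Assumption: for common escapes such as \n, \t, \" and \\, Python
        # string literal syntax matches the C syntax used in PO files.
        pieces.append(ast.literal_eval(text))
    return "".join(pieces)


entry = [
    'msgid ""',
    r'"Here is an example of how one might continue a very long string\n"',
    r'"for the common case the string represents multi-line output.\n"',
]
print(join_po_strings(entry))
```

<P>
The result contains exactly two newlines, both coming from the
<SAMP>&lsquo;\n&rsquo;</SAMP> escapes inside the quotes; the line breaks of the PO
file itself contribute nothing.
</P>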
<P>
<A NAME="IDX116"></A>
One should carefully distinguish between end of lines marked as
<SAMP>&lsquo;\n&rsquo;</SAMP> <EM>inside</EM> quotes, which are part of the represented
string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
</P>
<P>
<A NAME="IDX117"></A>
Outside strings, blank lines and comments may be used freely.
Comments start at the beginning of a line with <SAMP>&lsquo;#&rsquo;</SAMP> and extend
until the end of the PO file line. Comments written by translators
should have the initial <SAMP>&lsquo;#&rsquo;</SAMP> immediately followed by some white
space. If the <SAMP>&lsquo;#&rsquo;</SAMP> is not immediately followed by white space,
this comment is most likely generated and managed by specialized GNU
tools, and might disappear or be replaced unexpectedly when the PO
file is given to <CODE>msgmerge</CODE>.
</P>
</BODY>
</HTML>