mingw/share/doc/gettext/gettext_3.html

   1 <HTML>
   2 <HEAD>
   3 <!-- This HTML file has been created by texi2html 1.52b
   4      from gettext.texi on 6 June 2010 -->
   5
   6 <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
   7 <TITLE>GNU gettext utilities - 3  The Format of PO Files</TITLE>
   8 </HEAD>
   9 <BODY>
  10 Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
  11 <P><HR><P>
  12
  13
  14 <H1><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3  The Format of PO Files</A></H1>
  15 <P>
  16 <A NAME="IDX55"></A>
  17 <A NAME="IDX56"></A>
  18
  19 </P>
  20 <P>
  21 The GNU <CODE>gettext</CODE> toolset helps programmers and translators
  22 produce, update and use translation files, mainly PO files, which
  23 are textual, editable files.  This chapter explains
  24 the format of PO files.
  25
  26 </P>
  27 <P>
  28 A PO file is made up of many entries, each entry holding the relation
  29 between an original untranslated string and its corresponding
  30 translation.  All entries in a given PO file usually pertain
  31 to a single project, and all translations are expressed in a single
  32 target language.  One PO file <EM>entry</EM> has the following schematic
  33 structure:
  34
  35 </P>
  36
  37 <PRE>
  38 <VAR>white-space</VAR>
  39 #  <VAR>translator-comments</VAR>
  40 #. <VAR>extracted-comments</VAR>
  41 #: <VAR>reference</VAR>...
  42 #, <VAR>flag</VAR>...
  43 #| msgid <VAR>previous-untranslated-string</VAR>
  44 msgid <VAR>untranslated-string</VAR>
  45 msgstr <VAR>translated-string</VAR>
  46 </PRE>
  47
  48 <P>
  49 The general structure of a PO file should be well understood by
  50 the translator.  When using PO mode, very little has to be known
  51 about the format details, as PO mode takes care of them for her.
  52
  53 </P>
  54 <P>
  55 A simple entry can look like this:
  56
  57 </P>
  58
  59 <PRE>
  60 #: lib/error.c:116
  61 msgid "Unknown system error"
  62 msgstr "Error desconegut del sistema"
  63 </PRE>
  64
  65 <P>
  66 <A NAME="IDX57"></A>
  67 <A NAME="IDX58"></A>
  68 <A NAME="IDX59"></A>
  69 Entries begin with some optional white space.  Usually, when generated
  70 through GNU <CODE>gettext</CODE> tools, there is exactly one blank line
  71 between entries.  Then comments follow, on lines all starting with the
  72 character <CODE>#</CODE>.  There are two kinds of comments.  Those which have
  73 some white space immediately following the <CODE>#</CODE> are the <VAR>translator
  74 comments</VAR>, which are created and maintained exclusively by the
  75 translator.  Those which have some non-white character just after the
  76 <CODE>#</CODE> are the <VAR>automatic comments</VAR>, which are created and
  77 maintained automatically by GNU <CODE>gettext</CODE> tools.  Comment lines
  78 starting with <CODE>#.</CODE> contain comments given by the programmer, directed
  79 at the translator; these comments are called <VAR>extracted comments</VAR>
  80 because the <CODE>xgettext</CODE> program extracts them from the program's
  81 source code.  Comment lines starting with <CODE>#:</CODE> contain references to
  82 the program's source code.  Comment lines starting with <CODE>#,</CODE> contain
  83 flags; more about these below.  Comment lines starting with <CODE>#|</CODE>
  84 contain the previous untranslated string for which the translator gave
  85 a translation.
  86
  87 </P>
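<P>
The prefix rules above can be sketched in a few lines of Python.  This is
only an illustration, not how the GNU <CODE>gettext</CODE> tools are
implemented: the function name <CODE>classify_comment</CODE> and the returned
labels are hypothetical, and treating a bare <CODE>#</CODE> line as a
translator comment is an assumption.
</P>

```python
def classify_comment(line):
    """Return the kind of a PO comment line, judged by the character after '#'."""
    if not line.startswith("#"):
        raise ValueError("not a comment line: %r" % line)
    marker = line[1:2]  # empty string when the line is just '#'
    # White space (or nothing) after '#': a translator comment.
    # Treating a bare '#' this way is an assumption of this sketch.
    if marker == "" or marker.isspace():
        return "translator"
    # Any other character marks an automatic comment; the four
    # prefixes described in the text get their own labels.
    kinds = {".": "extracted", ":": "reference", ",": "flag", "|": "previous"}
    return kinds.get(marker, "automatic")


print(classify_comment("#: lib/error.c:116"))
```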
  88 <P>
  89 All comments, of either kind, are optional.
  90
  91 </P>
  92 <P>
  93 <A NAME="IDX60"></A>
  94 <A NAME="IDX61"></A>
  95 After white space and comments, entries show two strings, namely
  96 first the untranslated string as it appears in the original program
  97 sources, and then, the translation of this string.  The original
  98 string is introduced by the keyword <CODE>msgid</CODE>, and the translation,
  99 by <CODE>msgstr</CODE>.  The two strings, untranslated and translated,
 100 are quoted in various ways in the PO file, using <CODE>"</CODE>
 101 delimiters and <CODE>\</CODE> escapes, but the translator does not really
 102 have to pay attention to the precise quoting format, as PO mode fully
 103 takes care of quoting for her.
 104
 105 </P>
 106 <P>
 107 The <CODE>msgid</CODE> strings, as well as automatic comments, are produced
 108 and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not
 109 provide means for the translator to alter these.  The most she can
 110 do is delete them, and only by deleting the whole entry.
 111 On the other hand, the <CODE>msgstr</CODE> string, as well as translator
 112 comments, are really meant for the translator, and PO mode gives her
 113 the full control she needs.
 114
 115 </P>
 116 <P>
 117 The comment lines beginning with <CODE>#,</CODE> are special because they are
 118 not completely ignored by the programs as comments generally are.  The
 119 comma-separated list of <VAR>flag</VAR>s is used by the <CODE>msgfmt</CODE>
 120 program to give the user some better diagnostic messages.  Currently
 121 there are two forms of flags defined:
 122
 123 </P>
 124 <DL COMPACT>
 125
 126 <DT><CODE>fuzzy</CODE>
 127 <DD>
 128 <A NAME="IDX62"></A>
 129 This flag can be generated by the <CODE>msgmerge</CODE> program or it can be
 130 inserted by the translator herself.  It shows that the <CODE>msgstr</CODE>
 131 string might not be a correct translation (anymore).  Only the translator
 132 can judge if the translation requires further modification, or is
 133 acceptable as is.  Once satisfied with the translation, she then removes
 134 this <CODE>fuzzy</CODE> attribute.  The <CODE>msgmerge</CODE> program inserts this
 135 flag when it has combined the <CODE>msgid</CODE> and <CODE>msgstr</CODE> entries only after a fuzzy
 136 search.  See section <A HREF="gettext_8.html#SEC64">8.3.6  Fuzzy Entries</A>.
 137
 138 <DT><CODE>c-format</CODE>
 139 <DD>
 140 <A NAME="IDX63"></A>
 141 <DT><CODE>no-c-format</CODE>
 142 <DD>
 143 <A NAME="IDX64"></A>
 144 These flags should not be added by a human.  Instead only the
 145 <CODE>xgettext</CODE> program adds them.  In an automated PO file processing
 146 system as proposed here, the user's changes would be thrown away again as
 147 soon as the <CODE>xgettext</CODE> program generates a new template file.
 148
 149 The <CODE>c-format</CODE> flag indicates that the untranslated string and the
 150 translation are supposed to be C format strings.  The <CODE>no-c-format</CODE>
 151 flag indicates that they are not C format strings, even though the untranslated
 152 string happens to look like a C format string (with <SAMP>&lsquo;%&rsquo;</SAMP> directives).
 153
 154 When the <CODE>c-format</CODE> flag is given for a string, the <CODE>msgfmt</CODE>
 155 program does some more tests to check the validity of the translation.
 156 See section <A HREF="gettext_10.html#SEC157">10.1  Invoking the <CODE>msgfmt</CODE> Program</A>, section <A HREF="gettext_4.html#SEC22">4.6  Special Comments preceding Keywords</A> and section <A HREF="gettext_15.html#SEC249">15.3.1  C Format Strings</A>.
 157
 158 <DT><CODE>objc-format</CODE>
 159 <DD>
 160 <A NAME="IDX65"></A>
 161 <DT><CODE>no-objc-format</CODE>
 162 <DD>
 163 <A NAME="IDX66"></A>
 164 Likewise for Objective C, see section <A HREF="gettext_15.html#SEC250">15.3.2  Objective C Format Strings</A>.
 165
 166 <DT><CODE>sh-format</CODE>
 167 <DD>
 168 <A NAME="IDX67"></A>
 169 <DT><CODE>no-sh-format</CODE>
 170 <DD>
 171 <A NAME="IDX68"></A>
 172 Likewise for Shell, see section <A HREF="gettext_15.html#SEC251">15.3.3  Shell Format Strings</A>.
 173
 174 <DT><CODE>python-format</CODE>
 175 <DD>
 176 <A NAME="IDX69"></A>
 177 <DT><CODE>no-python-format</CODE>
 178 <DD>
 179 <A NAME="IDX70"></A>
 180 Likewise for Python, see section <A HREF="gettext_15.html#SEC252">15.3.4  Python Format Strings</A>.
 181
 182 <DT><CODE>lisp-format</CODE>
 183 <DD>
 184 <A NAME="IDX71"></A>
 185 <DT><CODE>no-lisp-format</CODE>
 186 <DD>
 187 <A NAME="IDX72"></A>
 188 Likewise for Lisp, see section <A HREF="gettext_15.html#SEC253">15.3.5  Lisp Format Strings</A>.
 189
 190 <DT><CODE>elisp-format</CODE>
 191 <DD>
 192 <A NAME="IDX73"></A>
 193 <DT><CODE>no-elisp-format</CODE>
 194 <DD>
 195 <A NAME="IDX74"></A>
 196 Likewise for Emacs Lisp, see section <A HREF="gettext_15.html#SEC254">15.3.6  Emacs Lisp Format Strings</A>.
 197
 198 <DT><CODE>librep-format</CODE>
 199 <DD>
 200 <A NAME="IDX75"></A>
 201 <DT><CODE>no-librep-format</CODE>
 202 <DD>
 203 <A NAME="IDX76"></A>
 204 Likewise for librep, see section <A HREF="gettext_15.html#SEC255">15.3.7  librep Format Strings</A>.
 205
 206 <DT><CODE>scheme-format</CODE>
 207 <DD>
 208 <A NAME="IDX77"></A>
 209 <DT><CODE>no-scheme-format</CODE>
 210 <DD>
 211 <A NAME="IDX78"></A>
 212 Likewise for Scheme, see section <A HREF="gettext_15.html#SEC256">15.3.8  Scheme Format Strings</A>.
 213
 214 <DT><CODE>smalltalk-format</CODE>
 215 <DD>
 216 <A NAME="IDX79"></A>
 217 <DT><CODE>no-smalltalk-format</CODE>
 218 <DD>
 219 <A NAME="IDX80"></A>
 220 Likewise for Smalltalk, see section <A HREF="gettext_15.html#SEC257">15.3.9  Smalltalk Format Strings</A>.
 221
 222 <DT><CODE>java-format</CODE>
 223 <DD>
 224 <A NAME="IDX81"></A>
 225 <DT><CODE>no-java-format</CODE>
 226 <DD>
 227 <A NAME="IDX82"></A>
 228 Likewise for Java, see section <A HREF="gettext_15.html#SEC258">15.3.10  Java Format Strings</A>.
 229
 230 <DT><CODE>csharp-format</CODE>
 231 <DD>
 232 <A NAME="IDX83"></A>
 233 <DT><CODE>no-csharp-format</CODE>
 234 <DD>
 235 <A NAME="IDX84"></A>
 236 Likewise for C#, see section <A HREF="gettext_15.html#SEC259">15.3.11  C# Format Strings</A>.
 237
 238 <DT><CODE>awk-format</CODE>
 239 <DD>
 240 <A NAME="IDX85"></A>
 241 <DT><CODE>no-awk-format</CODE>
 242 <DD>
 243 <A NAME="IDX86"></A>
 244 Likewise for awk, see section <A HREF="gettext_15.html#SEC260">15.3.12  awk Format Strings</A>.
 245
 246 <DT><CODE>object-pascal-format</CODE>
 247 <DD>
 248 <A NAME="IDX87"></A>
 249 <DT><CODE>no-object-pascal-format</CODE>
 250 <DD>
 251 <A NAME="IDX88"></A>
 252 Likewise for Object Pascal, see section <A HREF="gettext_15.html#SEC261">15.3.13  Object Pascal Format Strings</A>.
 253
 254 <DT><CODE>ycp-format</CODE>
 255 <DD>
 256 <A NAME="IDX89"></A>
 257 <DT><CODE>no-ycp-format</CODE>
 258 <DD>
 259 <A NAME="IDX90"></A>
 260 Likewise for YCP, see section <A HREF="gettext_15.html#SEC262">15.3.14  YCP Format Strings</A>.
 261
 262 <DT><CODE>tcl-format</CODE>
 263 <DD>
 264 <A NAME="IDX91"></A>
 265 <DT><CODE>no-tcl-format</CODE>
 266 <DD>
 267 <A NAME="IDX92"></A>
 268 Likewise for Tcl, see section <A HREF="gettext_15.html#SEC263">15.3.15  Tcl Format Strings</A>.
 269
 270 <DT><CODE>perl-format</CODE>
 271 <DD>
 272 <A NAME="IDX93"></A>
 273 <DT><CODE>no-perl-format</CODE>
 274 <DD>
 275 <A NAME="IDX94"></A>
 276 Likewise for Perl, see section <A HREF="gettext_15.html#SEC264">15.3.16  Perl Format Strings</A>.
 277
 278 <DT><CODE>perl-brace-format</CODE>
 279 <DD>
 280 <A NAME="IDX95"></A>
 281 <DT><CODE>no-perl-brace-format</CODE>
 282 <DD>
 283 <A NAME="IDX96"></A>
 284 Likewise for Perl brace, see section <A HREF="gettext_15.html#SEC264">15.3.16  Perl Format Strings</A>.
 285
 286 <DT><CODE>php-format</CODE>
 287 <DD>
 288 <A NAME="IDX97"></A>
 289 <DT><CODE>no-php-format</CODE>
 290 <DD>
 291 <A NAME="IDX98"></A>
 292 Likewise for PHP, see section <A HREF="gettext_15.html#SEC265">15.3.17  PHP Format Strings</A>.
 293
 294 <DT><CODE>gcc-internal-format</CODE>
 295 <DD>
 296 <A NAME="IDX99"></A>
 297 <DT><CODE>no-gcc-internal-format</CODE>
 298 <DD>
 299 <A NAME="IDX100"></A>
 300 Likewise for the GCC sources, see section <A HREF="gettext_15.html#SEC266">15.3.18  GCC internal Format Strings</A>.
 301
 302 <DT><CODE>gfc-internal-format</CODE>
 303 <DD>
 304 <A NAME="IDX101"></A>
 305 <DT><CODE>no-gfc-internal-format</CODE>
 306 <DD>
 307 <A NAME="IDX102"></A>
 308 Likewise for the GNU Fortran Compiler sources, see section <A HREF="gettext_15.html#SEC267">15.3.19  GFC internal Format Strings</A>.
 309
 310 <DT><CODE>qt-format</CODE>
 311 <DD>
 312 <A NAME="IDX103"></A>
 313 <DT><CODE>no-qt-format</CODE>
 314 <DD>
 315 <A NAME="IDX104"></A>
 316 Likewise for Qt, see section <A HREF="gettext_15.html#SEC268">15.3.20  Qt Format Strings</A>.
 317
 318 <DT><CODE>qt-plural-format</CODE>
 319 <DD>
 320 <A NAME="IDX105"></A>
 321 <DT><CODE>no-qt-plural-format</CODE>
 322 <DD>
 323 <A NAME="IDX106"></A>
 324 Likewise for Qt plural forms, see section <A HREF="gettext_15.html#SEC269">15.3.21  Qt Format Strings</A>.
 325
 326 <DT><CODE>kde-format</CODE>
 327 <DD>
 328 <A NAME="IDX107"></A>
 329 <DT><CODE>no-kde-format</CODE>
 330 <DD>
 331 <A NAME="IDX108"></A>
 332 Likewise for KDE, see section <A HREF="gettext_15.html#SEC270">15.3.22  KDE Format Strings</A>.
 333
 334 <DT><CODE>boost-format</CODE>
 335 <DD>
 336 <A NAME="IDX109"></A>
 337 <DT><CODE>no-boost-format</CODE>
 338 <DD>
 339 <A NAME="IDX110"></A>
 340 Likewise for Boost, see section <A HREF="gettext_15.html#SEC271">15.3.23  Boost Format Strings</A>.
 341
 342 </DL>
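<P>
As a rough illustration of how a <CODE>#,</CODE> line could be read, the
following Python sketch splits the comma-separated list into a set of flags
and checks whether a format flag is in effect.  The helper names
<CODE>parse_flags</CODE> and <CODE>needs_format_check</CODE> are hypothetical,
and the logic is an assumption made for this example rather than a
description of what <CODE>msgfmt</CODE> does internally.
</P>

```python
def parse_flags(line):
    """Split a '#,' comment line into a set of flag names."""
    if not line.startswith("#,"):
        raise ValueError("not a flag line: %r" % line)
    # Flags are separated by commas; surrounding white space is dropped.
    return {flag.strip() for flag in line[2:].split(",") if flag.strip()}


def needs_format_check(flags, language="c"):
    """True when '<language>-format' is set and not negated by 'no-<language>-format'."""
    return (language + "-format") in flags and ("no-" + language + "-format") not in flags


flags = parse_flags("#, fuzzy, c-format")
print(sorted(flags), needs_format_check(flags))
```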
 343
 344 <P>
 345 <A NAME="IDX111"></A>
 346 <A NAME="IDX112"></A>
 347 It is also possible to have entries with a context specifier. They look like
 348 this:
 349
 350 </P>
 351
 352 <PRE>
 353 <VAR>white-space</VAR>
 354 #  <VAR>translator-comments</VAR>
 355 #. <VAR>extracted-comments</VAR>
 356 #: <VAR>reference</VAR>...
 357 #, <VAR>flag</VAR>...
 358 #| msgctxt <VAR>previous-context</VAR>
 359 #| msgid <VAR>previous-untranslated-string</VAR>
 360 msgctxt <VAR>context</VAR>
 361 msgid <VAR>untranslated-string</VAR>
 362 msgstr <VAR>translated-string</VAR>
 363 </PRE>
 364
 365 <P>
 366 The context serves to disambiguate messages with the same
 367 <VAR>untranslated-string</VAR>.  It is possible to have several entries with
 368 the same <VAR>untranslated-string</VAR> in a PO file, provided that they each
 369 have a different <VAR>context</VAR>.  Note that an empty <VAR>context</VAR> string
 370 and an absent <CODE>msgctxt</CODE> line do not mean the same thing.
 371
 372 </P>
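<P>
One way to picture this rule is a lookup table keyed by the pair of context
and untranslated string.  The Python sketch below is an assumption made for
illustration only: the function <CODE>add_entry</CODE>, the contexts and the
German sample translations are hypothetical.  It uses <CODE>None</CODE> for an
absent <CODE>msgctxt</CODE> line, so that an empty context remains a distinct
key.
</P>

```python
def add_entry(catalog, msgid, msgstr, msgctxt=None):
    """Store a translation under the key (context, untranslated string)."""
    key = (msgctxt, msgid)  # None means "no msgctxt line at all"
    if key in catalog:
        raise KeyError("duplicate entry: %r" % (key,))
    catalog[key] = msgstr


catalog = {}
add_entry(catalog, "Open", "Öffnen", msgctxt="menu")    # verb
add_entry(catalog, "Open", "Offen", msgctxt="status")   # adjective
add_entry(catalog, "Open", "Öffnen")                    # no context
add_entry(catalog, "Open", "Offen", msgctxt="")         # empty context, a different key
print(len(catalog))
```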
 373 <P>
 374 <A NAME="IDX113"></A>
 375 <A NAME="IDX114"></A>
 376 A different kind of entry is used for translations which involve
 377 plural forms.
 378
 379 </P>
 380
 381 <PRE>
 382 <VAR>white-space</VAR>
 383 #  <VAR>translator-comments</VAR>
 384 #. <VAR>extracted-comments</VAR>
 385 #: <VAR>reference</VAR>...
 386 #, <VAR>flag</VAR>...
 387 #| msgid <VAR>previous-untranslated-string-singular</VAR>
 388 #| msgid_plural <VAR>previous-untranslated-string-plural</VAR>
 389 msgid <VAR>untranslated-string-singular</VAR>
 390 msgid_plural <VAR>untranslated-string-plural</VAR>
 391 msgstr[0] <VAR>translated-string-case-0</VAR>
 392 ...
 393 msgstr[N] <VAR>translated-string-case-n</VAR>
 394 </PRE>
 395
 396 <P>
 397 Such an entry can look like this:
 398
 399 </P>
 400
 401 <PRE>
 402 #: src/msgcmp.c:338 src/po-lex.c:699
 403 #, c-format
 404 msgid "found %d fatal error"
 405 msgid_plural "found %d fatal errors"
 406 msgstr[0] "s'ha trobat %d error fatal"
 407 msgstr[1] "s'han trobat %d errors fatals"
 408 </PRE>
 409
 410 <P>
 411 Here also, a <CODE>msgctxt</CODE> context can be specified before <CODE>msgid</CODE>,
 412 like above.
 413
 414 </P>
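<P>
To show how the numbered <CODE>msgstr[N]</CODE> strings might be used at run
time, here is a small Python sketch.  It is not part of the PO format itself:
the function <CODE>select_plural</CODE> is hypothetical, and the default rule
(index 0 for exactly one item, index 1 otherwise) is an assumption that fits
the two-form Catalan example above.  The rule that maps a number to an index
is not described in this chapter.
</P>

```python
def select_plural(msgstrs, n, plural_index=lambda n: int(n != 1)):
    """Pick the msgstr[N] form for the number n and fill in its %d directive."""
    index = plural_index(n)
    if not 0 <= index < len(msgstrs):
        raise IndexError("no msgstr[%d] for n=%d" % (index, n))
    return msgstrs[index] % n


forms = [
    "s'ha trobat %d error fatal",      # msgstr[0]
    "s'han trobat %d errors fatals",   # msgstr[1]
]
print(select_plural(forms, 1))
print(select_plural(forms, 3))
```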
 415 <P>
 416 Here, additional kinds of flags can be used:
 417
 418 </P>
 419 <DL COMPACT>
 420
 421 <DT><CODE>range:</CODE>
 422 <DD>
 423 <A NAME="IDX115"></A>
 424 This flag is followed by a range of non-negative numbers, using the syntax
 425 <CODE>range: <VAR>minimum-value</VAR>..<VAR>maximum-value</VAR></CODE>.  It designates the
 426 possible values that the numeric parameter of the message can take.  In some
 427 languages, translators may produce slightly better translations if they know
 428 that the value can only take on values between 0 and 10, for example.
 429 </DL>
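<P>
The <CODE>range:</CODE> syntax is simple enough to read with a regular
expression.  The following Python sketch is only an illustration; the
function name <CODE>parse_range_flag</CODE> is hypothetical, and rejecting a
minimum larger than the maximum is an assumption of this example.
</P>

```python
import re


def parse_range_flag(flag):
    """Return (minimum, maximum) for a 'range: MIN..MAX' flag, or None for other flags."""
    match = re.fullmatch(r"range:\s*(\d+)\.\.(\d+)", flag.strip())
    if match is None:
        return None
    minimum, maximum = int(match.group(1)), int(match.group(2))
    if minimum > maximum:
        raise ValueError("empty range: %r" % flag)
    return minimum, maximum


print(parse_range_flag("range: 0..10"))
```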
 430
 431 <P>
 432 The <VAR>previous-untranslated-string</VAR> is optionally inserted by the
 433 <CODE>msgmerge</CODE> program, at the same time as it marks a message fuzzy.
 434 It helps the translator see which changes the developers made
 435 to the <VAR>untranslated-string</VAR>.
 436
 437 </P>
 438 <P>
 439 Sometimes lines, usually whitespace or comments, follow the
 440 very last entry of a PO file.  Such lines are not part of any entry;
 441 they will be dropped when the PO file is processed by the tools, and may
 442 disturb some PO file editors.
 443
 444 </P>
 445 <P>
 446 The remainder of this section may be safely skipped by those using
 447 a PO file editor, yet it may be interesting for everybody to have a better
 448 idea of the precise format of a PO file.  On the other hand, those
 449 wishing to modify PO files by hand should read on carefully.
 450
 451 </P>
 452 <P>
 453 Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects
 454 the C syntax for a character string, including the surrounding quotes
 455 and embedded backslashed escape sequences.  When the time comes
 456 to write multi-line strings, one should not use escaped newlines.
 457 Instead, a closing quote should follow the last character on the
 458 line to be continued, and an opening quote should resume the string
 459 at the beginning of the following PO file line.  For example:
 460
 461 </P>
 462
 463 <PRE>
 464 msgid ""
 465 "Here is an example of how one might continue a very long string\n"
 466 "for the common case the string represents multi-line output.\n"
 467 </PRE>
 468
 469 <P>
 470 In this example, the empty string is used on the first line, to
 471 allow better alignment of the <CODE>H</CODE> from the word <SAMP>&lsquo;Here&rsquo;</SAMP>
 472 over the <CODE>f</CODE> from the word <SAMP>&lsquo;for&rsquo;</SAMP>.  In this example, the
 473 <CODE>msgid</CODE> keyword is followed by three strings, which are meant
 474 to be concatenated.  Concatenating the empty string does not change
 475 the resulting overall string, but it is a way for us to comply with
 476 the requirement that <CODE>msgid</CODE> be followed by a string on the same
 477 line, while keeping the multi-line presentation left-justified, as
 478 we find this to be a cleaner layout.  The empty string could have
 479 been omitted, but only if the string starting with <SAMP>&lsquo;Here&rsquo;</SAMP> were
 480 moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A>  Nor was it really necessary
 481 to switch between the last two quoted strings immediately after
 482 the newline <SAMP>&lsquo;\n&rsquo;</SAMP>.  The switch could have occurred after <EM>any</EM>
 483 other character; we just did it this way because it is neater.
 484
 485 </P>
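<P>
The concatenation described above can be sketched in Python as follows.
This is a simplified illustration, not the parser used by the GNU
<CODE>gettext</CODE> tools: the helper names <CODE>unquote</CODE> and
<CODE>join_po_string</CODE> are hypothetical, and only a handful of backslash
escapes are handled.
</P>

```python
import re

# Only a few C escapes are covered; this limited table is an assumption of the sketch.
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def unquote(segment):
    """Strip the surrounding quotes from one quoted string and decode its escapes."""
    segment = segment.strip()
    if len(segment) < 2 or segment[0] != '"' or segment[-1] != '"':
        raise ValueError("not a quoted string: %r" % segment)

    def decode(match):
        char = match.group(1)
        if char not in _ESCAPES:
            raise ValueError("unsupported escape: \\%s" % char)
        return _ESCAPES[char]

    return re.sub(r"\\(.)", decode, segment[1:-1])


def join_po_string(segments):
    """Concatenate the quoted strings that follow a keyword such as msgid."""
    return "".join(unquote(segment) for segment in segments)


segments = [
    '""',
    '"Here is an example of how one might continue a very long string\\n"',
    '"for the common case the string represents multi-line output.\\n"',
]
print(join_po_string(segments), end="")
```

<P>
The leading empty string contributes nothing to the result, and the line
breaks of the PO file itself do not appear in it; only the
<SAMP>&lsquo;\n&rsquo;</SAMP> escapes inside the quotes do.
</P>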
 486 <P>
 487 <A NAME="IDX116"></A>
 488 One should carefully distinguish between end of lines marked as
 489 <SAMP>&lsquo;\n&rsquo;</SAMP> <EM>inside</EM> quotes, which are part of the represented
 490 string, and end of lines in the PO file itself, outside string quotes,
 491 which have no effect on the represented string.
 492
 493 </P>
 494 <P>
 495 <A NAME="IDX117"></A>
 496 Outside strings, blank lines and comments may be used freely.
 497 Comments start at the beginning of a line with <SAMP>&lsquo;#&rsquo;</SAMP> and extend
 498 until the end of the PO file line.  Comments written by translators
 499 should have the initial <SAMP>&lsquo;#&rsquo;</SAMP> immediately followed by some white
 500 space.  If the <SAMP>&lsquo;#&rsquo;</SAMP> is not immediately followed by white space,
 501 this comment is most likely generated and managed by specialized GNU
 502 tools, and might disappear or be replaced unexpectedly when the PO
 503 file is given to <CODE>msgmerge</CODE>.
 504
 505 </P>
 506 <P><HR><P>
 507 Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
 508 </BODY>
 509 </HTML>