<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
from gettext.texi on 6 June 2010 -->
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 6 Creating a New PO File</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_5.html">previous</A>, <A HREF="gettext_7.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC37" HREF="gettext_toc.html#TOC37">6 Creating a New PO File</A></H1>
<P>
<A NAME="IDX238"></A>
</P>
<P>
When starting a new translation, the translator creates a file called
<TT>&lsquo;<VAR>LANG</VAR>.po&rsquo;</TT> as a copy of the <TT>&lsquo;<VAR>package</VAR>.pot&rsquo;</TT> template
file, with modifications in the initial comments (at the beginning of the file)
and in the header entry (the first entry, near the beginning of the file).
</P>
<P>
The easiest way to do this is with the <SAMP>&lsquo;msginit&rsquo;</SAMP> program.
For example:
</P>
<PRE>
$ cd <VAR>PACKAGE</VAR>-<VAR>VERSION</VAR>
$ cd po
$ msginit
</PRE>
<P>
The alternative is to make the copy and the modifications by hand.
To do so, the translator copies <TT>&lsquo;<VAR>package</VAR>.pot&rsquo;</TT> to
<TT>&lsquo;<VAR>LANG</VAR>.po&rsquo;</TT>. Then she modifies the initial comments and
the header entry of this file.
</P>
<H2><A NAME="SEC38" HREF="gettext_toc.html#TOC38">6.1 Invoking the <CODE>msginit</CODE> Program</A></H2>
<P>
<A NAME="IDX239"></A>
<A NAME="IDX240"></A>
</P>
<PRE>
msginit [<VAR>option</VAR>]
</PRE>
<P>
<A NAME="IDX241"></A>
<A NAME="IDX242"></A>
The <CODE>msginit</CODE> program creates a new PO file, initializing the meta
information with values from the user's environment.
</P>
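<P>
As a sketch, an invocation that names the template, the target locale and
the output file explicitly, instead of relying on the defaults described
below, might look as follows. The template name <TT>&lsquo;hello.pot&rsquo;</TT> is a
hypothetical example, and <SAMP>&lsquo;de_DE&rsquo;</SAMP> selects German as spoken in Germany.
</P>
<PRE>
$ cd po
$ msginit --input=hello.pot --locale=de_DE --output-file=de.po
</PRE>
<P>
With the short options, the same command is
<SAMP>&lsquo;msginit -i hello.pot -l de_DE -o de.po&rsquo;</SAMP>.
</P>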
<H3><A NAME="SEC39" HREF="gettext_toc.html#TOC39">6.1.1 Input file location</A></H3>
<DL COMPACT>
<DT><SAMP>&lsquo;-i <VAR>inputfile</VAR>&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--input=<VAR>inputfile</VAR>&rsquo;</SAMP>
<DD>
<A NAME="IDX243"></A>
<A NAME="IDX244"></A>
Input POT file.
</DL>
<P>
If no <VAR>inputfile</VAR> is given, the current directory is searched for the
POT file. If it is <SAMP>&lsquo;-&rsquo;</SAMP>, standard input is read.
</P>
<H3><A NAME="SEC40" HREF="gettext_toc.html#TOC40">6.1.2 Output file location</A></H3>
<DL COMPACT>
<DT><SAMP>&lsquo;-o <VAR>file</VAR>&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--output-file=<VAR>file</VAR>&rsquo;</SAMP>
<DD>
<A NAME="IDX245"></A>
<A NAME="IDX246"></A>
Write output to the specified PO file.
</DL>
<P>
If no output file is given, its name is derived from the target locale, as set
by the <SAMP>&lsquo;--locale&rsquo;</SAMP> option or, by default, the user's locale setting.
If it is <SAMP>&lsquo;-&rsquo;</SAMP>, the results are written to standard output.
</P>
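<P>
For example, to look at the beginning of the file that <CODE>msginit</CODE> would
generate without creating it, the output can be written to standard output and
piped into another program. This is a sketch; <TT>&lsquo;hello.pot&rsquo;</TT> is a
hypothetical template name, and <CODE>head</CODE> is just one possible consumer:
</P>
<PRE>
$ msginit --input=hello.pot --locale=de_DE --output-file=- | head -n 25
</PRE>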
<H3><A NAME="SEC41" HREF="gettext_toc.html#TOC41">6.1.3 Input file syntax</A></H3>
<DL COMPACT>
<DT><SAMP>&lsquo;-P&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--properties-input&rsquo;</SAMP>
<DD>
<A NAME="IDX247"></A>
<A NAME="IDX248"></A>
Assume the input file is a Java ResourceBundle in Java <CODE>.properties</CODE>
syntax, not in PO file syntax.
<DT><SAMP>&lsquo;--stringtable-input&rsquo;</SAMP>
<DD>
<A NAME="IDX249"></A>
Assume the input file is a NeXTstep/GNUstep localized resource file in
<CODE>.strings</CODE> syntax, not in PO file syntax.
</DL>
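<P>
A sketch of starting a translation from a Java ResourceBundle instead of a POT
file. <TT>&lsquo;Messages.properties&rsquo;</TT> is a hypothetical name for the
untranslated base bundle. Since only the input syntax is changed here, the
result is written in PO file syntax unless an output syntax option is also given:
</P>
<PRE>
$ msginit --properties-input --input=Messages.properties --locale=de_DE --output-file=de.po
</PRE>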
<H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">6.1.4 Output details</A></H3>
<DL COMPACT>
<DT><SAMP>&lsquo;-l <VAR>ll_CC</VAR>&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--locale=<VAR>ll_CC</VAR>&rsquo;</SAMP>
<DD>
<A NAME="IDX250"></A>
<A NAME="IDX251"></A>
Set the target locale. <VAR>ll</VAR> should be a language code, and <VAR>CC</VAR> should
be a country code. The command <SAMP>&lsquo;locale -a&rsquo;</SAMP> can be used to output a list
of all installed locales. The default is the user's locale setting.
<DT><SAMP>&lsquo;--no-translator&rsquo;</SAMP>
<DD>
<A NAME="IDX252"></A>
Declare that the PO file will not have a human translator and is instead
automatically generated.
<DT><SAMP>&lsquo;--color&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--color=<VAR>when</VAR>&rsquo;</SAMP>
<DD>
<A NAME="IDX253"></A>
Specify whether or when to use colors and other text attributes.
See section <A HREF="gettext_9.html#SEC150">9.11.1 The <CODE>--color</CODE> option</A> for details.
<DT><SAMP>&lsquo;--style=<VAR>style_file</VAR>&rsquo;</SAMP>
<DD>
<A NAME="IDX254"></A>
Specify the CSS style rule file to use for <CODE>--color</CODE>.
See section <A HREF="gettext_9.html#SEC152">9.11.3 The <CODE>--style</CODE> option</A> for details.
<DT><SAMP>&lsquo;-p&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--properties-output&rsquo;</SAMP>
<DD>
<A NAME="IDX255"></A>
<A NAME="IDX256"></A>
Write out a Java ResourceBundle in Java <CODE>.properties</CODE> syntax. Note
that this file format doesn't support plural forms and silently drops
obsolete messages.
<DT><SAMP>&lsquo;--stringtable-output&rsquo;</SAMP>
<DD>
<A NAME="IDX257"></A>
Write out a NeXTstep/GNUstep localized resource file in <CODE>.strings</CODE> syntax.
Note that this file format doesn't support plural forms.
<DT><SAMP>&lsquo;-w <VAR>number</VAR>&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--width=<VAR>number</VAR>&rsquo;</SAMP>
<DD>
<A NAME="IDX258"></A>
<A NAME="IDX259"></A>
Set the output page width. Long strings in the output files will be
split across multiple lines in order to ensure that each line's width
(= number of screen columns) is less than or equal to the given <VAR>number</VAR>.
<DT><SAMP>&lsquo;--no-wrap&rsquo;</SAMP>
<DD>
<A NAME="IDX260"></A>
Do not break long message lines. Message lines whose width exceeds the
output page width will not be split into several lines. Only file reference
lines which are wider than the output page width will be split.
</DL>
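<P>
Two sketches combining these options, with hypothetical file names. The first
splits long strings so that no line is wider than 72 screen columns; the second
creates a catalog that is declared to be automatically generated rather than
maintained by a human translator:
</P>
<PRE>
$ msginit -i hello.pot -l fr_FR --width=72 -o fr.po
$ msginit -i hello.pot -l en_US --no-translator -o en.po
</PRE>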
<H3><A NAME="SEC43" HREF="gettext_toc.html#TOC43">6.1.5 Informative output</A></H3>
<DL COMPACT>
<DT><SAMP>&lsquo;-h&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--help&rsquo;</SAMP>
<DD>
<A NAME="IDX261"></A>
<A NAME="IDX262"></A>
Display a help message and exit.
<DT><SAMP>&lsquo;-V&rsquo;</SAMP>
<DD>
<DT><SAMP>&lsquo;--version&rsquo;</SAMP>
<DD>
<A NAME="IDX263"></A>
<A NAME="IDX264"></A>
Output version information and exit.
</DL>
<H2><A NAME="SEC44" HREF="gettext_toc.html#TOC44">6.2 Filling in the Header Entry</A></H2>
<P>
<A NAME="IDX265"></A>
</P>
<P>
The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
"FIRST AUTHOR &#60;EMAIL@ADDRESS&#62;, YEAR" ought to be replaced with sensible
information. This can be done in any text editor; if you use Emacs and it
has switched to PO mode automatically (because it recognized the file's
suffix), you can leave PO mode by typing <KBD>M-x fundamental-mode</KBD>.
</P>
<P>
The header entry itself can be modified in PO mode: in Emacs,
type <KBD>M-x po-mode RET</KBD> and then <KBD>RET</KBD> again to start editing the
entry. You should fill in the following fields.
</P>
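<P>
Before editing, it can help to look at the initial comments and the header
entry together. The following is a minimal sketch, assuming (as is normally
the case in files written by <CODE>xgettext</CODE> and <CODE>msginit</CODE>) that the
header entry is the first entry and that entries are separated by empty lines.
The function name <CODE>po_header</CODE> is hypothetical:
</P>
<PRE>
# po_header FILE
# Hypothetical helper: print FILE up to and including its first empty
# line, i.e. the initial comments followed by the header entry.
po_header () {
  sed '/^$/q' "$1"
}
</PRE>
<P>
For example, <SAMP>&lsquo;po_header de.po&rsquo;</SAMP> prints the part of
<TT>&lsquo;de.po&rsquo;</TT> that contains the fields described below.
</P>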
<DL COMPACT>
<DT>Project-Id-Version
<DD>
This is the name and version of the package. Fill it in if it has not
already been filled in by <CODE>xgettext</CODE>.
<DT>Report-Msgid-Bugs-To
<DD>
This has already been filled in by <CODE>xgettext</CODE>. It contains an email
address or URL where you can report bugs in the untranslated strings:
<UL>
<LI>Strings which are not entire sentences; see the maintainer guidelines
in section <A HREF="gettext_4.html#SEC19">4.3 Preparing Translatable Strings</A>.
<LI>Strings which use unclear terms or require additional context to be
understood.
<LI>Strings which make invalid assumptions about the notation of dates, times or
money.
<LI>Pluralisation problems.
<LI>Incorrect English spelling.
<LI>Incorrect formatting.
</UL>
<DT>POT-Creation-Date
<DD>
This has already been filled in by <CODE>xgettext</CODE>.
<DT>PO-Revision-Date
<DD>
You don't need to fill this in. It will be filled in by the PO file editor
when you save the file.
<DT>Last-Translator
<DD>
Fill in your name and email address (without double quotes).
<DT>Language-Team
<DD>
Fill in the English name of the language, and the email address or
homepage URL of the language team you are part of.
<P>
Before starting a translation, it is a good idea to get in touch with
your translation team, not only to avoid duplicating work,
but also to coordinate difficult linguistic issues.
<P>
<A NAME="IDX266"></A>
In the Free Translation Project, each translation team has its own mailing
list. The up-to-date list of teams can be found at the Free Translation
Project's homepage, <A HREF="http://translationproject.org/">http://translationproject.org/</A>, in the "Teams"
area.
<DT>Language
<DD>
Fill in the language code of the language. This can be in one of three
forms:
<UL>
<LI>
<SAMP>&lsquo;<VAR>ll</VAR>&rsquo;</SAMP>, an ISO 639 two-letter language code (lowercase).
See section <A HREF="gettext_17.html#SEC318">A Language Codes</A> for the list of codes.
<LI>
<SAMP>&lsquo;<VAR>ll</VAR>_<VAR>CC</VAR>&rsquo;</SAMP>, where <SAMP>&lsquo;<VAR>ll</VAR>&rsquo;</SAMP> is an ISO 639 two-letter
language code (lowercase) and <SAMP>&lsquo;<VAR>CC</VAR>&rsquo;</SAMP> is an ISO 3166 two-letter
country code (uppercase). The country code specification is not redundant:
some languages have dialects in different countries. For example,
<SAMP>&lsquo;de_AT&rsquo;</SAMP> is used for Austria, and <SAMP>&lsquo;pt_BR&rsquo;</SAMP> for Brazil. The country
code serves to distinguish the dialects. See section <A HREF="gettext_17.html#SEC318">A Language Codes</A> and
section <A HREF="gettext_18.html#SEC321">B Country Codes</A> for the lists of codes.
<LI>
<SAMP>&lsquo;<VAR>ll</VAR>_<VAR>CC</VAR>@<VAR>variant</VAR>&rsquo;</SAMP>, where <SAMP>&lsquo;<VAR>ll</VAR>&rsquo;</SAMP> is an
ISO 639 two-letter language code (lowercase), <SAMP>&lsquo;<VAR>CC</VAR>&rsquo;</SAMP> is an
ISO 3166 two-letter country code (uppercase), and <SAMP>&lsquo;<VAR>variant</VAR>&rsquo;</SAMP> is
a variant designator. The variant designator (lowercase) can be a script
designator, such as <SAMP>&lsquo;latin&rsquo;</SAMP> or <SAMP>&lsquo;cyrillic&rsquo;</SAMP>.
</UL>
<P>
The naming convention <SAMP>&lsquo;<VAR>ll</VAR>_<VAR>CC</VAR>&rsquo;</SAMP> is also the way locales are
named on systems based on GNU libc. But there are three important differences:
<UL>
<LI>
In this PO file field, but not in locale names, <SAMP>&lsquo;<VAR>ll</VAR>_<VAR>CC</VAR>&rsquo;</SAMP>
combinations denoting a language's main dialect are abbreviated as
<SAMP>&lsquo;<VAR>ll</VAR>&rsquo;</SAMP>. For example, <SAMP>&lsquo;de&rsquo;</SAMP> is equivalent to <SAMP>&lsquo;de_DE&rsquo;</SAMP>
(German as spoken in Germany), and <SAMP>&lsquo;pt&rsquo;</SAMP> to <SAMP>&lsquo;pt_PT&rsquo;</SAMP> (Portuguese as
spoken in Portugal) in this context.
<LI>
In this PO file field, suffixes like <SAMP>&lsquo;.<VAR>encoding</VAR>&rsquo;</SAMP> are not used.
<LI>
In this PO file field, variant designators that are not relevant to message
translation, such as <SAMP>&lsquo;@euro&rsquo;</SAMP>, are not used.
</UL>
<P>
So, if your locale name is <SAMP>&lsquo;de_DE.UTF-8&rsquo;</SAMP>, the language specification in
PO files is just <SAMP>&lsquo;de&rsquo;</SAMP>.
<DT>Content-Type
<DD>
<A NAME="IDX267"></A>
<A NAME="IDX268"></A>
Replace <SAMP>&lsquo;CHARSET&rsquo;</SAMP> with the character encoding used for your language,
in your locale, or UTF-8. This field is needed for correct operation of the
<CODE>msgmerge</CODE> and <CODE>msgfmt</CODE> programs, as well as for users whose
locale's character encoding differs from yours (see section <A HREF="gettext_11.html#SEC186">11.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A>).
<P>
<A NAME="IDX269"></A>
You can find the character encoding of your locale by running the shell command
<SAMP>&lsquo;locale charmap&rsquo;</SAMP>. If the result is <SAMP>&lsquo;C&rsquo;</SAMP> or <SAMP>&lsquo;ANSI_X3.4-1968&rsquo;</SAMP>,
which is equivalent to <SAMP>&lsquo;ASCII&rsquo;</SAMP> (= <SAMP>&lsquo;US-ASCII&rsquo;</SAMP>), it means that your
locale is not correctly configured. In this case, ask your translation
team which charset to use. <SAMP>&lsquo;ASCII&rsquo;</SAMP> is not usable for any language
except Latin.
<P>
<A NAME="IDX270"></A>
Because the PO files must be portable to operating systems with less advanced
internationalization facilities, the character encodings that can be used
are limited to those supported by both GNU <CODE>libc</CODE> and GNU
<CODE>libiconv</CODE>. These are:
<CODE>ASCII</CODE>, <CODE>ISO-8859-1</CODE>, <CODE>ISO-8859-2</CODE>, <CODE>ISO-8859-3</CODE>,
<CODE>ISO-8859-4</CODE>, <CODE>ISO-8859-5</CODE>, <CODE>ISO-8859-6</CODE>, <CODE>ISO-8859-7</CODE>,
<CODE>ISO-8859-8</CODE>, <CODE>ISO-8859-9</CODE>, <CODE>ISO-8859-13</CODE>, <CODE>ISO-8859-14</CODE>,
<CODE>ISO-8859-15</CODE>,
<CODE>KOI8-R</CODE>, <CODE>KOI8-U</CODE>, <CODE>KOI8-T</CODE>,
<CODE>CP850</CODE>, <CODE>CP866</CODE>, <CODE>CP874</CODE>,
<CODE>CP932</CODE>, <CODE>CP949</CODE>, <CODE>CP950</CODE>, <CODE>CP1250</CODE>, <CODE>CP1251</CODE>,
<CODE>CP1252</CODE>, <CODE>CP1253</CODE>, <CODE>CP1254</CODE>, <CODE>CP1255</CODE>, <CODE>CP1256</CODE>,
<CODE>CP1257</CODE>, <CODE>GB2312</CODE>, <CODE>EUC-JP</CODE>, <CODE>EUC-KR</CODE>, <CODE>EUC-TW</CODE>,
<CODE>BIG5</CODE>, <CODE>BIG5-HKSCS</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>, <CODE>SHIFT_JIS</CODE>,
<CODE>JOHAB</CODE>, <CODE>TIS-620</CODE>, <CODE>VISCII</CODE>, <CODE>GEORGIAN-PS</CODE>, <CODE>UTF-8</CODE>.
<P>
<A NAME="IDX271"></A>
In the GNU system, the following encodings are frequently used for the
corresponding languages.
<A NAME="IDX272"></A>
<UL>
<LI><CODE>ISO-8859-1</CODE> for
Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
English, Estonian, Faroese, Finnish, French, Galician, German,
Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
Walloon,
<LI><CODE>ISO-8859-2</CODE> for
Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
Slovenian,
<LI><CODE>ISO-8859-3</CODE> for Maltese,
<LI><CODE>ISO-8859-5</CODE> for Macedonian, Serbian,
<LI><CODE>ISO-8859-6</CODE> for Arabic,
<LI><CODE>ISO-8859-7</CODE> for Greek,
<LI><CODE>ISO-8859-8</CODE> for Hebrew,
<LI><CODE>ISO-8859-9</CODE> for Turkish,
<LI><CODE>ISO-8859-13</CODE> for Latvian, Lithuanian, Maori,
<LI><CODE>ISO-8859-14</CODE> for Welsh,
<LI><CODE>ISO-8859-15</CODE> for
Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
Italian, Portuguese, Spanish, Swedish, Walloon,
<LI><CODE>KOI8-R</CODE> for Russian,
<LI><CODE>KOI8-U</CODE> for Ukrainian,
<LI><CODE>KOI8-T</CODE> for Tajik,
<LI><CODE>CP1251</CODE> for Bulgarian, Belarusian,
<LI><CODE>GB2312</CODE>, <CODE>GBK</CODE>, <CODE>GB18030</CODE>
for simplified writing of Chinese,
<LI><CODE>BIG5</CODE>, <CODE>BIG5-HKSCS</CODE>
for traditional writing of Chinese,
<LI><CODE>EUC-JP</CODE> for Japanese,
<LI><CODE>EUC-KR</CODE> for Korean,
<LI><CODE>TIS-620</CODE> for Thai,
<LI><CODE>GEORGIAN-PS</CODE> for Georgian,
<LI><CODE>UTF-8</CODE> for any language, including those listed above.
</UL>
<P>
<A NAME="IDX273"></A>
<A NAME="IDX274"></A>
When single quote characters or double quote characters are used in
translations for your language, and your locale's encoding is one of the
ISO-8859-* charsets, it is best to create your PO files in UTF-8
encoding instead of your locale's encoding. This is because in UTF-8
the real quote characters can be represented (single quote characters:
U+2018, U+2019; double quote characters: U+201C, U+201D), whereas none of
the ISO-8859-* charsets has them all. Users in UTF-8 locales will see the
real quote characters, whereas users in ISO-8859-* locales will see the
vertical apostrophe and the vertical double quote instead (because that's
what the character set conversion will transliterate them to).
<P>
<A NAME="IDX275"></A>
To enter such quote characters under X11, you can change your keyboard
mapping using the <CODE>xmodmap</CODE> program. The X11 names of the quote
characters are "leftsinglequotemark", "rightsinglequotemark",
"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
"doublelowquotemark".
<P>
Note that only recent versions of GNU Emacs support the UTF-8 encoding:
Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't
support the UTF-8 encoding.
<P>
The character encoding name can be written in either upper or lower case.
Usually upper case is preferred.
<DT>Content-Transfer-Encoding
<DD>
Set this to <CODE>8bit</CODE>.
<DT>Plural-Forms
<DD>
This field is optional. It is only needed if the PO file has plural forms.
You can find them by searching for the <SAMP>&lsquo;msgid_plural&rsquo;</SAMP> keyword. The
format of the plural forms field is described in section <A HREF="gettext_11.html#SEC188">11.2.6 Additional functions for plural forms</A> and
section <A HREF="gettext_12.html#SEC209">12.6 Translating plural forms</A>.
</DL>
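<P>
The rules given above for the <CODE>Language</CODE> and <CODE>Content-Type</CODE>
fields can be sketched as two small shell functions. This is only an
illustration: the names <CODE>po_language</CODE> and <CODE>po_charset</CODE> are
hypothetical, the table of main dialects covers just three example locales,
and <SAMP>&lsquo;@euro&rsquo;</SAMP> is the only variant treated as irrelevant to translation.
</P>
<PRE>
# po_language LOCALE
# Derive the PO Language field from a locale name of the form
# ll[_CC][.encoding][@variant]; for example, de_DE.UTF-8 gives de.
po_language () {
  name=$1
  variant=
  # Split off the @variant part, if there is one.
  case $name in
    *@*) variant=${name#*@}; name=${name%%@*} ;;
  esac
  # Suffixes like .encoding are not used in this field.
  name=${name%%.*}
  # Drop variants that are not relevant to message translation.
  # Only euro is handled in this sketch.
  if [ "$variant" = euro ]; then
    variant=
  fi
  # Abbreviate a language's main dialect to ll.
  # Example table only; a real list would be much longer.
  case $name in
    de_DE|fr_FR|pt_PT) name=${name%%_*} ;;
  esac
  if [ -n "$variant" ]; then
    printf '%s@%s\n' "$name" "$variant"
  else
    printf '%s\n' "$name"
  fi
}

# po_charset CHARMAP
# Print CHARMAP (e.g. the output of "locale charmap") in upper case if it
# is one of the portable encodings listed above. Return 2 if it denotes
# ASCII, which means the locale is not correctly configured (ask your
# translation team which charset to use), and 1 for any other name.
po_charset () {
  cs=$(printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]')
  case $cs in
    C|ANSI_X3.4-1968|ASCII|US-ASCII)
      return 2 ;;
    ISO-8859-1|ISO-8859-2|ISO-8859-3|ISO-8859-4|ISO-8859-5|\
    ISO-8859-6|ISO-8859-7|ISO-8859-8|ISO-8859-9|ISO-8859-13|\
    ISO-8859-14|ISO-8859-15|KOI8-R|KOI8-U|KOI8-T|CP850|CP866|CP874|\
    CP932|CP949|CP950|CP1250|CP1251|CP1252|CP1253|CP1254|CP1255|\
    CP1256|CP1257|GB2312|EUC-JP|EUC-KR|EUC-TW|BIG5|BIG5-HKSCS|GBK|\
    GB18030|SHIFT_JIS|JOHAB|TIS-620|VISCII|GEORGIAN-PS|UTF-8)
      printf '%s\n' "$cs" ;;
    *)
      return 1 ;;
  esac
}
</PRE>
<P>
For example, <SAMP>&lsquo;po_language de_DE.UTF-8&rsquo;</SAMP> prints <SAMP>&lsquo;de&rsquo;</SAMP>, and
<SAMP>&lsquo;po_charset "$(locale charmap)"&rsquo;</SAMP> prints the name to put after
<SAMP>&lsquo;charset=&rsquo;</SAMP>, or fails with status 2 when the locale is not correctly
configured.
</P>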
<P><HR><P>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_5.html">previous</A>, <A HREF="gettext_7.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
</BODY>
</HTML>