release/src/router/gettext/gettext-tools/doc/gettext.texi

   1 \input texinfo          @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename gettext.info
   4 @settitle GNU @code{gettext} utilities
   5 @finalout
   6 @c Indices:
   7 @c   am = autoconf macro  @amindex
   8 @c   cp = concept         @cindex
   9 @c   ef = emacs function  @efindex
  10 @c   em = emacs mode      @emindex
  11 @c   ev = emacs variable  @evindex
  12 @c   fn = function        @findex
  13 @c   kw = keyword         @kwindex
  14 @c   op = option          @opindex
  15 @c   pg = program         @pindex
  16 @c   vr = variable        @vindex
  17 @c Unused predefined indices:
  18 @c   tp = type            @tindex
  19 @c   ky = keystroke       @kindex
  20 @defcodeindex am
  21 @defcodeindex ef
  22 @defindex em
  23 @defcodeindex ev
  24 @defcodeindex kw
  25 @defcodeindex op
  26 @syncodeindex ef em
  27 @syncodeindex ev em
  28 @syncodeindex fn cp
  29 @syncodeindex kw cp
  30 @c %**end of header
  31
  32 @include version.texi
  33
  34 @dircategory GNU Gettext Utilities
  35 @direntry
  36 * gettext: (gettext).                          GNU gettext utilities.
  37 * autopoint: (gettext)autopoint Invocation.    Copy gettext infrastructure.
  38 * envsubst: (gettext)envsubst Invocation.      Expand environment variables.
  39 * gettextize: (gettext)gettextize Invocation.  Prepare a package for gettext.
  40 * msgattrib: (gettext)msgattrib Invocation.    Select part of a PO file.
  41 * msgcat: (gettext)msgcat Invocation.          Combine several PO files.
  42 * msgcmp: (gettext)msgcmp Invocation.          Compare a PO file and template.
  43 * msgcomm: (gettext)msgcomm Invocation.        Match two PO files.
  44 * msgconv: (gettext)msgconv Invocation.        Convert PO file to encoding.
  45 * msgen: (gettext)msgen Invocation.            Create an English PO file.
  46 * msgexec: (gettext)msgexec Invocation.        Process a PO file.
  47 * msgfilter: (gettext)msgfilter Invocation.    Pipe a PO file through a filter.
  48 * msgfmt: (gettext)msgfmt Invocation.          Make MO files out of PO files.
  49 * msggrep: (gettext)msggrep Invocation.        Select part of a PO file.
  50 * msginit: (gettext)msginit Invocation.        Create a fresh PO file.
  51 * msgmerge: (gettext)msgmerge Invocation.      Update a PO file from template.
  52 * msgunfmt: (gettext)msgunfmt Invocation.      Uncompile MO file into PO file.
  53 * msguniq: (gettext)msguniq Invocation.        Unify duplicates for PO file.
  54 * ngettext: (gettext)ngettext Invocation.      Translate a message with plural.
  55 * xgettext: (gettext)xgettext Invocation.      Extract strings into a PO file.
  56 * ISO639: (gettext)Language Codes.             ISO 639 language codes.
  57 * ISO3166: (gettext)Country Codes.             ISO 3166 country codes.
  58 @end direntry
  59
  60 @ifinfo
  61 This file provides documentation for GNU @code{gettext} utilities.
  62 It also serves as a reference for the free Translation Project.
  63
  64 Copyright (C) 1995-1998, 2001-2005 Free Software Foundation, Inc.
  65
  66 Permission is granted to make and distribute verbatim copies of
  67 this manual provided the copyright notice and this permission notice
  68 are preserved on all copies.
  69
  70 @ignore
  71 Permission is granted to process this file through TeX and print the
  72 results, provided the printed document carries copying permission
  73 notice identical to this one except for the removal of this paragraph
  74 (this paragraph not being relevant to the printed manual).
  75
  76 @end ignore
  77 Permission is granted to copy and distribute modified versions of this
  78 manual under the conditions for verbatim copying, provided that the entire
  79 resulting derived work is distributed under the terms of a permission
  80 notice identical to this one.
  81
  82 Permission is granted to copy and distribute translations of this manual
  83 into another language, under the above conditions for modified versions,
  84 except that this permission notice may be stated in a translation approved
  85 by the Foundation.
  86 @end ifinfo
  87
  88 @titlepage
  89 @title GNU gettext tools, version @value{VERSION}
  90 @subtitle Native Language Support Library and Tools
  91 @subtitle Edition @value{EDITION}, @value{UPDATED}
  92 @author Ulrich Drepper
  93 @author Jim Meyering
  94 @author Fran@,{c}ois Pinard
  95 @author Bruno Haible
  96
  97 @page
  98 @vskip 0pt plus 1filll
  99 Copyright @copyright{} 1995-1998, 2001-2003 Free Software Foundation, Inc.
 100
 101 Permission is granted to make and distribute verbatim copies of
 102 this manual provided the copyright notice and this permission notice
 103 are preserved on all copies.
 104
 105 Permission is granted to copy and distribute modified versions of this
 106 manual under the conditions for verbatim copying, provided that the entire
 107 resulting derived work is distributed under the terms of a permission
 108 notice identical to this one.
 109
 110 Permission is granted to copy and distribute translations of this manual
 111 into another language, under the above conditions for modified versions,
 112 except that this permission notice may be stated in a translation approved
 113 by the Foundation.
 114 @end titlepage
 115
 116 @ifnottex
 117 @c Table of Contents
 118 @contents
 119 @end ifnottex
 120
 121 @ifinfo
 122 @node Top, Introduction, (dir), (dir)
 123 @top GNU @code{gettext} utilities
 124
 125 This manual documents the GNU gettext tools and the GNU libintl library,
 126 version @value{VERSION}.
 127
 128 @menu
 129 * Introduction::                Introduction
 130 * Basics::                      PO Files and PO Mode Basics
 131 * Sources::                     Preparing Program Sources
 132 * Template::                    Making the PO Template File
 133 * Creating::                    Creating a New PO File
 134 * Updating::                    Updating Existing PO Files
 135 * Manipulating::                Manipulating PO Files
 136 * Binaries::                    Producing Binary MO Files
 137 * Users::                       The User's View
 138 * Programmers::                 The Programmer's View
 139 * Translators::                 The Translator's View
 140 * Maintainers::                 The Maintainer's View
 141 * Programming Languages::       Other Programming Languages
 142 * Conclusion::                  Concluding Remarks
 143
 144 * Language Codes::              ISO 639 language codes
 145 * Country Codes::               ISO 3166 country codes
 146
 147 * Program Index::               Index of Programs
 148 * Option Index::                Index of Command-Line Options
 149 * Variable Index::              Index of Environment Variables
 150 * PO Mode Index::               Index of Emacs PO Mode Commands
 151 * Autoconf Macro Index::        Index of Autoconf Macros
 152 * Index::                       General Index
 153
 154 @detailmenu
 155  --- The Detailed Node Listing ---
 156
 157 Introduction
 158
 159 * Why::                         The Purpose of GNU @code{gettext}
 160 * Concepts::                    I18n, L10n, and Such
 161 * Aspects::                     Aspects in Native Language Support
 162 * Files::                       Files Conveying Translations
 163 * Overview::                    Overview of GNU @code{gettext}
 164
 165 PO Files and PO Mode Basics
 166
 167 * Installation::                Completing GNU @code{gettext} Installation
 168 * PO Files::                    The Format of PO Files
 169 * Main PO Commands::            Main Commands
 170 * Entry Positioning::           Entry Positioning
 171 * Normalizing::                 Normalizing Strings in Entries
 172
 173 Preparing Program Sources
 174
 175 * Triggering::                  Triggering @code{gettext} Operations
 176 * Preparing Strings::           Preparing Translatable Strings
 177 * Mark Keywords::               How Marks Appear in Sources
 178 * Marking::                     Marking Translatable Strings
 179 * c-format Flag::               Telling something about the following string
 180 * Special cases::               Special Cases of Translatable Strings
 181 * Names::                       Marking Proper Names for Translation
 182 * Libraries::                   Preparing Library Sources
 183
 184 Making the PO Template File
 185
 186 * xgettext Invocation::         Invoking the @code{xgettext} Program
 187
 188 Creating a New PO File
 189
 190 * msginit Invocation::          Invoking the @code{msginit} Program
 191 * Header Entry::                Filling in the Header Entry
 192
 193 Updating Existing PO Files
 194
 195 * msgmerge Invocation::         Invoking the @code{msgmerge} Program
 196 * Translated Entries::          Translated Entries
 197 * Fuzzy Entries::               Fuzzy Entries
 198 * Untranslated Entries::        Untranslated Entries
 199 * Obsolete Entries::            Obsolete Entries
 200 * Modifying Translations::      Modifying Translations
 201 * Modifying Comments::          Modifying Comments
 202 * Subedit::                     Mode for Editing Translations
 203 * C Sources Context::           C Sources Context
 204 * Auxiliary::                   Consulting Auxiliary PO Files
 205 * Compendium::                  Using Translation Compendia
 206
 207 Using Translation Compendia
 208
 209 * Creating Compendia::          Merging translations for later use
 210 * Using Compendia::             Using older translations if they fit
 211
 212 Manipulating PO Files
 213
 214 * msgcat Invocation::           Invoking the @code{msgcat} Program
 215 * msgconv Invocation::          Invoking the @code{msgconv} Program
 216 * msggrep Invocation::          Invoking the @code{msggrep} Program
 217 * msgfilter Invocation::        Invoking the @code{msgfilter} Program
 218 * msguniq Invocation::          Invoking the @code{msguniq} Program
 219 * msgcomm Invocation::          Invoking the @code{msgcomm} Program
 220 * msgcmp Invocation::           Invoking the @code{msgcmp} Program
 221 * msgattrib Invocation::        Invoking the @code{msgattrib} Program
 222 * msgen Invocation::            Invoking the @code{msgen} Program
 223 * msgexec Invocation::          Invoking the @code{msgexec} Program
 224 * libgettextpo::                Writing your own programs that process PO files
 225
 226 Producing Binary MO Files
 227
 228 * msgfmt Invocation::           Invoking the @code{msgfmt} Program
 229 * msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
 230 * MO Files::                    The Format of GNU MO Files
 231
 232 The User's View
 233
 234 * Matrix::                      The Current @file{ABOUT-NLS} Matrix
 235 * Installers::                  Magic for Installers
 236 * End Users::                   Magic for End Users
 237
 238 The Programmer's View
 239
 240 * catgets::                     About @code{catgets}
 241 * gettext::                     About @code{gettext}
 242 * Comparison::                  Comparing the two interfaces
 243 * Using libintl.a::             Using libintl.a in own programs
 244 * gettext grok::                Being a @code{gettext} grok
 245 * Temp Programmers::            Temporary Notes for the Programmers Chapter
 246
 247 About @code{catgets}
 248
 249 * Interface to catgets::        The interface
 250 * Problems with catgets::       Problems with the @code{catgets} interface?!
 251
 252 About @code{gettext}
 253
 254 * Interface to gettext::        The interface
 255 * Ambiguities::                 Solving ambiguities
 256 * Locating Catalogs::           Locating message catalog files
 257 * Charset conversion::          How to request conversion to Unicode
 258 * Plural forms::                Additional functions for handling plurals
 259 * GUI program problems::        Another technique for solving ambiguities
 260 * Optimized gettext::           Optimization of the *gettext functions
 261
 262 Temporary Notes for the Programmers Chapter
 263
 264 * Temp Implementations::        Temporary - Two Possible Implementations
 265 * Temp catgets::                Temporary - About @code{catgets}
 266 * Temp WSI::                    Temporary - Why a single implementation
 267 * Temp Notes::                  Temporary - Notes
 268
 269 The Translator's View
 270
 271 * Trans Intro 0::               Introduction 0
 272 * Trans Intro 1::               Introduction 1
 273 * Discussions::                 Discussions
 274 * Organization::                Organization
 275 * Information Flow::            Information Flow
 276 * Prioritizing messages::       How to find which messages to translate first
 277
 278 Organization
 279
 280 * Central Coordination::        Central Coordination
 281 * National Teams::              National Teams
 282 * Mailing Lists::               Mailing Lists
 283
 284 National Teams
 285
 286 * Sub-Cultures::                Sub-Cultures
 287 * Organizational Ideas::        Organizational Ideas
 288
 289 The Maintainer's View
 290
 291 * Flat and Non-Flat::           Flat or Non-Flat Directory Structures
 292 * Prerequisites::               Prerequisite Works
 293 * gettextize Invocation::       Invoking the @code{gettextize} Program
 294 * Adjusting Files::             Files You Must Create or Alter
 295 * autoconf macros::             Autoconf macros for use in @file{configure.in}
 296 * CVS Issues::                  Integrating with CVS
 297 * Release Management::          Creating a Distribution Tarball
 298
 299 Files You Must Create or Alter
 300
 301 * po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
 302 * po/LINGUAS::                  @file{LINGUAS} in @file{po/}
 303 * po/Makevars::                 @file{Makevars} in @file{po/}
 304 * configure.in::                @file{configure.in} at top level
 305 * config.guess::                @file{config.guess}, @file{config.sub} at top level
 306 * mkinstalldirs::               @file{mkinstalldirs} at top level
 307 * aclocal::                     @file{aclocal.m4} at top level
 308 * acconfig::                    @file{acconfig.h} at top level
 309 * config.h.in::                 @file{config.h.in} at top level
 310 * Makefile::                    @file{Makefile.in} at top level
 311 * src/Makefile::                @file{Makefile.in} in @file{src/}
 312 * lib/gettext.h::               @file{gettext.h} in @file{lib/}
 313
 314 Autoconf macros for use in @file{configure.in}
 315
 316 * AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
 317 * AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
 318 * AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
 319 * AM_ICONV::                    AM_ICONV in @file{iconv.m4}
 320
 321 Integrating with CVS
 322
 323 * Distributed CVS::             Avoiding version mismatch in distributed development
 324 * Files under CVS::             Files to put under CVS version control
 325 * autopoint Invocation::        Invoking the @code{autopoint} Program
 326
 327 Other Programming Languages
 328
 329 * Language Implementors::       The Language Implementor's View
 330 * Programmers for other Languages::  The Programmer's View
 331 * Translators for other Languages::  The Translator's View
 332 * Maintainers for other Languages::  The Maintainer's View
 333 * List of Programming Languages::  Individual Programming Languages
 334 * List of Data Formats::        Internationalizable Data
 335
 336 The Translator's View
 337
 338 * c-format::                    C Format Strings
 339 * objc-format::                 Objective C Format Strings
 340 * sh-format::                   Shell Format Strings
 341 * python-format::               Python Format Strings
 342 * lisp-format::                 Lisp Format Strings
 343 * elisp-format::                Emacs Lisp Format Strings
 344 * librep-format::               librep Format Strings
 345 * scheme-format::               Scheme Format Strings
 346 * smalltalk-format::            Smalltalk Format Strings
 347 * java-format::                 Java Format Strings
 348 * csharp-format::               C# Format Strings
 349 * awk-format::                  awk Format Strings
 350 * object-pascal-format::        Object Pascal Format Strings
 351 * ycp-format::                  YCP Format Strings
 352 * tcl-format::                  Tcl Format Strings
 353 * perl-format::                 Perl Format Strings
 354 * php-format::                  PHP Format Strings
 355 * gcc-internal-format::         GCC internal Format Strings
 356 * qt-format::                   Qt Format Strings
 357
 358 Individual Programming Languages
 359
 360 * C::                           C, C++, Objective C
 361 * sh::                          sh - Shell Script
 362 * bash::                        bash - Bourne-Again Shell Script
 363 * Python::                      Python
 364 * Common Lisp::                 GNU clisp - Common Lisp
 365 * clisp C::                     GNU clisp C sources
 366 * Emacs Lisp::                  Emacs Lisp
 367 * librep::                      librep
 368 * Scheme::                      GNU guile - Scheme
 369 * Smalltalk::                   GNU Smalltalk
 370 * Java::                        Java
 371 * C#::                          C#
 372 * gawk::                        GNU awk
 373 * Pascal::                      Pascal - Free Pascal Compiler
 374 * wxWindows::                   wxWindows library
 375 * YCP::                         YCP - YaST2 scripting language
 376 * Tcl::                         Tcl - Tk's scripting language
 377 * Perl::                        Perl
 378 * PHP::                         PHP Hypertext Preprocessor
 379 * Pike::                        Pike
 380 * GCC-source::                  GNU Compiler Collection sources
 381
 382 sh - Shell Script
 383
 384 * Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
 385 * gettext.sh::                  Contents of @code{gettext.sh}
 386 * gettext Invocation::          Invoking the @code{gettext} program
 387 * ngettext Invocation::         Invoking the @code{ngettext} program
 388 * envsubst Invocation::         Invoking the @code{envsubst} program
 389 * eval_gettext Invocation::     Invoking the @code{eval_gettext} function
 390 * eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
 391
 392 Perl
 393
 394 * General Problems::            General Problems Parsing Perl Code
 395 * Default Keywords::            Which Keywords Will xgettext Look For?
 396 * Special Keywords::            How to Extract Hash Keys
 397 * Quote-like Expressions::      What are Strings And Quote-like Expressions?
 398 * Interpolation I::             Invalid String Interpolation
 399 * Interpolation II::            Valid String Interpolation
 400 * Parentheses::                 When To Use Parentheses
 401 * Long Lines::                  How To Grok with Long Lines
 402 * Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
 403
 404 Internationalizable Data
 405
 406 * POT::                         POT - Portable Object Template
 407 * RST::                         Resource String Table
 408 * Glade::                       Glade - GNOME user interface description
 409
 410 Concluding Remarks
 411
 412 * History::                     History of GNU @code{gettext}
 413 * References::                  Related Readings
 414
 415 @end detailmenu
 416 @end menu
 417
 418 @end ifinfo
 419
 420 @node Introduction, Basics, Top, Top
 421 @chapter Introduction
 422
 423 @quotation
 424 This manual is still in @emph{DRAFT} state.  Some sections are still
 425 empty, or almost.  We keep merging material from other sources
 426 (essentially e-mail folders) while the proper integration of this
 427 material is delayed.
 428 @end quotation
 429
 430 @cindex sex
 431 @cindex he, she, and they
 432 @cindex she, he, and they
 433 In this manual, we use @emph{he} when speaking of the programmer or
 434 maintainer, @emph{she} when speaking of the translator, and @emph{they}
 435 when speaking of the installers or end users of the translated program.
 436 This is only a convenience for clarifying the documentation.  It is
 437 @emph{absolutely} not meant to imply that some roles are more appropriate
 438 to males or females.  Besides, as you might guess, GNU @code{gettext}
 439 is meant to be useful for people using computers, whatever their sex,
 440 race, religion or nationality!
 441
 442 This chapter explains the goals sought in the creation
 443 of GNU @code{gettext} and the free Translation Project.
 444 Then, it explains a few broad concepts around
 445 Native Language Support, and positions message translation with regard
 446 to other aspects of national and cultural variance, as they apply to
 447 to programs.  It also surveys those files used to convey the
 448 translations.  It explains how the various tools interact in the
 449 initial generation of these files, and later, how the maintenance
 450 cycle should usually operate.
 451
 452 @cindex bug report address
 453 Please send suggestions and corrections to:
 454
 455 @example
 456 @group
 457 @r{Internet address:}
 458     bug-gnu-gettext@@gnu.org
 459 @end group
 460 @end example
 461
 462 @noindent
 463 Please include the manual's edition number and update date in your messages.
 464
 465 @menu
 466 * Why::                         The Purpose of GNU @code{gettext}
 467 * Concepts::                    I18n, L10n, and Such
 468 * Aspects::                     Aspects in Native Language Support
 469 * Files::                       Files Conveying Translations
 470 * Overview::                    Overview of GNU @code{gettext}
 471 @end menu
 472
 473 @node Why, Concepts, Introduction, Introduction
 474 @section The Purpose of GNU @code{gettext}
 475
 476 Usually, programs are written and documented in English, and use
 477 English at execution time to interact with users.  This is true
 478 not only of GNU software, but also of a great deal of commercial
 479 and free software.  Using a common language is quite handy for
 480 communication between developers, maintainers and users from all
 481 countries.  On the other hand, most people are less comfortable with
 482 English than with their own native language, and would prefer to
 483 use their mother tongue for day to day's work, as far as possible.
 484 Many would simply @emph{love} to see their computer screen showing
 485 a lot less of English, and far more of their own language.
 486
 487 @cindex Translation Project
 488 However, to many people, this dream might appear so far fetched that
 489 they may believe it is not even worth spending time thinking about
 490 it.  They have no confidence at all that the dream might ever
 491 become true.  Yet some have not lost hope, and have organized themselves.
 492 The Translation Project is a formalization of this hope into a
 493 workable structure, which has a good chance to get all of us nearer
 494 the achievement of a truly multi-lingual set of programs.
 495
 496 GNU @code{gettext} is an important step for the Translation Project,
 497 as it is an asset on which we may build many other steps.  This package
 498 offers to programmers, translators and even users, a well integrated
 499 set of tools and documentation.  Specifically, the GNU @code{gettext}
 500 utilities are a set of tools that provides a framework within which
 501 other free packages may produce multi-lingual messages.  These tools
 502 include
 503
 504 @itemize @bullet
 505 @item
 506 A set of conventions about how programs should be written to support
 507 message catalogs.
 508
 509 @item
 510 A directory and file naming organization for the message catalogs
 511 themselves.
 512
 513 @item
 514 A runtime library supporting the retrieval of translated messages.
 515
 516 @item
 517 A few stand-alone programs to massage in various ways the sets of
 518 translatable strings, or already translated strings.
 519
 520 @item
 521 A special mode for Emacs@footnote{In this manual, all mentions of Emacs
 522 refers to either GNU Emacs or to XEmacs, which people sometimes call FSF
 523 Emacs and Lucid Emacs, respectively.} which helps preparing these sets
 524 and bringing them up to date.
 525 @end itemize
 526
 527 GNU @code{gettext} is designed to minimize the impact of
 528 internationalization on program sources, keeping this impact as small
 529 and hardly noticeable as possible.  Internationalization has better
 530 chances of succeeding if it is very light weighted, or at least,
 531 appear to be so, when looking at program sources.
 532
 533 The Translation Project also uses the GNU @code{gettext} distribution
 534 as a vehicle for documenting its structure and methods.  This goes
 535 beyond the strict technicalities of documenting the GNU @code{gettext}
 536 proper.  By so doing, translators will find in a single place, as
 537 far as possible, all they need to know for properly doing their
 538 translating work.  Also, this supplemental documentation might also
 539 help programmers, and even curious users, in understanding how GNU
 540 @code{gettext} is related to the remainder of the Translation
 541 Project, and consequently, have a glimpse at the @emph{big picture}.
 542
 543 @node Concepts, Aspects, Why, Introduction
 544 @section I18n, L10n, and Such
 545
 546 @cindex i18n
 547 @cindex l10n
 548 Two long words appear all the time when we discuss support of native
 549 language in programs, and these words have a precise meaning, worth
 550 being explained here, once and for all in this document.  The words are
 551 @emph{internationalization} and @emph{localization}.  Many people,
 552 tired of writing these long words over and over again, took the
 553 habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first
 554 and last letter of each word, and replacing the run of intermediate
 555 letters by a number merely telling how many such letters there are.
 556 But in this manual, in the sake of clarity, we will patiently write
 557 the names in full, each time@dots{}
 558
 559 @cindex internationalization
 560 By @dfn{internationalization}, one refers to the operation by which a
 561 program, or a set of programs turned into a package, is made aware of and
 562 able to support multiple languages.  This is a generalization process,
 563 by which the programs are untied from calling only English strings or
 564 other English specific habits, and connected to generic ways of doing
 565 the same, instead.  Program developers may use various techniques to
 566 internationalize their programs.  Some of these have been standardized.
 567 GNU @code{gettext} offers one of these standards.  @xref{Programmers}.
 568
 569 @cindex localization
 570 By @dfn{localization}, one means the operation by which, in a set
 571 of programs already internationalized, one gives the program all
 572 needed information so that it can adapt itself to handle its input
 573 and output in a fashion which is correct for some native language and
 574 cultural habits.  This is a particularisation process, by which generic
 575 methods already implemented in an internationalized program are used
 576 in specific ways.  The programming environment puts several functions
 577 to the programmers disposal which allow this runtime configuration.
 578 The formal description of specific set of cultural habits for some
 579 country, together with all associated translations targeted to the
 580 same native language, is called the @dfn{locale} for this language
 581 or country.  Users achieve localization of programs by setting proper
 582 values to special environment variables, prior to executing those
 583 programs, identifying which locale should be used.
 584
 585 In fact, locale message support is only one component of the cultural
 586 data that makes up a particular locale.  There are a whole host of
 587 routines and functions provided to aid programmers in developing
 588 internationalized software and which allow them to access the data
 589 stored in a particular locale.  When someone presently refers to a
 590 particular locale, they are obviously referring to the data stored
 591 within that particular locale.  Similarly, if a programmer is referring
 592 to ``accessing the locale routines'', they are referring to the
 593 complete suite of routines that access all of the locale's information.
 594
 595 @cindex NLS
 596 @cindex Native Language Support
 597 @cindex Natural Language Support
 598 One uses the expression @dfn{Native Language Support}, or merely NLS,
 599 for speaking of the overall activity or feature encompassing both
 600 internationalization and localization, allowing for multi-lingual
 601 interactions in a program.  In a nutshell, one could say that
 602 internationalization is the operation by which further localizations
 603 are made possible.
 604
 605 Also, very roughly said, when it comes to multi-lingual messages,
 606 internationalization is usually taken care of by programmers, and
 607 localization is usually taken care of by translators.
 608
 609 @node Aspects, Files, Concepts, Introduction
 610 @section Aspects in Native Language Support
 611
 612 @cindex translation aspects
 613 For a totally multi-lingual distribution, there are many things to
 614 translate beyond output messages.
 615
 616 @itemize @bullet
 617 @item
 618 As of today, GNU @code{gettext} offers a complete toolset for
 619 translating messages output by C programs.  Perl scripts and shell
 620 scripts will also need to be translated.  Even if there are today some hooks
 621 by which this can be done, these hooks are not integrated as well as they
 622 should be.
 623
 624 @item
 625 Some programs, like @code{autoconf} or @code{bison}, are able
 626 to produce other programs (or scripts).  Even if the generating
 627 programs themselves are internationalized, the generated programs they
 628 produce may need internationalization on their own, and this indirect
 629 internationalization could be automated right from the generating
 630 program.  In fact, quite usually, generating and generated programs
 631 could be internationalized independently, as the effort needed is
 632 fairly orthogonal.
 633
 634 @item
 635 A few programs include textual tables which might need translation
 636 themselves, independently of the strings contained in the program
 637 itself.  For example, @w{RFC 1345} gives an English description for each
 638 character which the @code{recode} program is able to reconstruct at execution.
 639 Since these descriptions are extracted from the RFC by mechanical means,
 640 translating them properly would require a prior translation of the RFC
 641 itself.
 642
 643 @item
 644 Almost all programs accept options, which are often worded out so to
 645 be descriptive for the English readers; one might want to consider
 646 offering translated versions for program options as well.
 647
 648 @item
 649 Many programs read, interpret, compile, or are somewhat driven by
 650 input files which are texts containing keywords, identifiers, or
 651 replies which are inherently translatable.  For example, one may want
 652 @code{gcc} to allow diacriticized characters in identifiers or use
 653 translated keywords; @samp{rm -i} might accept something else than
 654 @samp{y} or @samp{n} for replies, etc.  Even if the program will
 655 eventually make most of its output in the foreign languages, one has
 656 to decide whether the input syntax, option values, etc., are to be
 657 localized or not.
 658
 659 @item
 660 The manual accompanying a package, as well as all documentation files
 661 in the distribution, could surely be translated, too.  Translating a
 662 manual, with the intent of later keeping up with updates, is a major
 663 undertaking in itself, generally.
 664
 665 @end itemize
 666
 667 As we already stressed, translation is only one aspect of locales.
 668 Other internationalization aspects are system services and are handled
 669 in GNU @code{libc}.  There
 670 are many attributes that are needed to define a country's cultural
 671 conventions.  These attributes include beside the country's native
 672 language, the formatting of the date and time, the representation of
 673 numbers, the symbols for currency, etc.  These local @dfn{rules} are
 674 termed the country's locale.  The locale represents the knowledge
 675 needed to support the country's native attributes.
 676
 677 @cindex locale facets
 678 There are a few major areas which may vary between countries and
 679 hence, define what a locale must describe.  The following list helps
 680 putting multi-lingual messages into the proper context of other tasks
 681 related to locales.  See the GNU @code{libc} manual for details.
 682
 683 @table @emph
 684
 685 @item Characters and Codesets
 686 @cindex codeset
 687 @cindex encoding
 688 @cindex character encoding
 689 @cindex locale facet, LC_CTYPE
 690
 691 The codeset most commonly used through out the USA and most English
 692 speaking parts of the world is the ASCII codeset.  However, there are
 693 many characters needed by various locales that are not found within
 694 this codeset.  The 8-bit @w{ISO 8859-1} code set has most of the special
 695 characters needed to handle the major European languages.  However, in
 696 many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it
 697 doesn't even handle the major European currency.  Hence each locale
 698 will need to specify which codeset they need to use and will need
 699 to have the appropriate character handling routines to cope with
 700 the codeset.
 701
 702 @item Currency
 703 @cindex currency symbols
 704 @cindex locale facet, LC_MONETARY
 705
 706 The symbols used vary from country to country as does the position
 707 used by the symbol.  Software needs to be able to transparently
 708 display currency figures in the native mode for each locale.
 709
 710 @item Dates
 711 @cindex date format
 712 @cindex locale facet, LC_TIME
 713
 714 The format of date varies between locales.  For example, Christmas day
 715 in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
 716 Other countries might use @w{ISO 8601} dates, etc.
 717
 718 Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},
 719 or otherwise.  Some locales require time to be specified in 24-hour
 720 mode rather than as AM or PM.  Further, the nature and yearly extent
 721 of the Daylight Saving correction vary widely between countries.
 722
 723 @item Numbers
 724 @cindex number format
 725 @cindex locale facet, LC_NUMERIC
 726
 727 Numbers can be represented differently in different locales.
 728 For example, the following numbers are all written correctly for
 729 their respective locales:
 730
 731 @example
 732 12,345.67       English
 733 12.345,67       German
 734  12345,67       French
 735 1,2345.67       Asia
 736 @end example
 737
 738 Some programs could go further and use different unit systems, like
 739 English units or Metric units, or even take into account variants
 740 about how numbers are spelled in full.
 741
 742 @item Messages
 743 @cindex messages
 744 @cindex locale facet, LC_MESSAGES
 745
 746 The most obvious area is the language support within a locale.  This is
 747 where GNU @code{gettext} provides the means for developers and users to
 748 easily change the language that the software uses to communicate to
 749 the user.
 750
 751 @end table
 752
 753 @cindex Linux
 754 Components of locale outside of message handling are standardized in
 755 the ISO C standard and the SUSV2 specification.  GNU @code{libc}
 756 fully implements this, and most other modern systems provide a more
 757 or less reasonable support for at least some of the missing components.
 758
 759 @node Files, Overview, Aspects, Introduction
 760 @section Files Conveying Translations
 761
 762 @cindex files, @file{.po} and @file{.mo}
 763 The letters PO in @file{.po} files means Portable Object, to
 764 distinguish it from @file{.mo} files, where MO stands for Machine
 765 Object.  This paradigm, as well as the PO file format, is inspired
 766 by the NLS standard developed by Uniforum, and first implemented by
 767 Sun in their Solaris system.
 768
 769 PO files are meant to be read and edited by humans, and associate each
 770 original, translatable string of a given package with its translation
 771 in a particular target language.  A single PO file is dedicated to
 772 a single target language.  If a package supports many languages,
 773 there is one such PO file per language supported, and each package
 774 has its own set of PO files.  These PO files are best created by
 775 the @code{xgettext} program, and later updated or refreshed through
 776 the @code{msgmerge} program.  Program @code{xgettext} extracts all
 777 marked messages from a set of C files and initializes a PO file with
 778 empty translations.  Program @code{msgmerge} takes care of adjusting
 779 PO files between releases of the corresponding sources, commenting
 780 obsolete entries, initializing new ones, and updating all source
 781 line references.  Files ending with @file{.pot} are kind of base
 782 translation files found in distributions, in PO file format.
 783
 784 MO files are meant to be read by programs, and are binary in nature.
 785 A few systems already offer tools for creating and handling MO files
 786 as part of the Native Language Support coming with the system, but the
 787 format of these MO files is often different from system to system,
 788 and non-portable.  The tools already provided with these systems don't
 789 support all the features of GNU @code{gettext}.  Therefore GNU
 790 @code{gettext} uses its own format for MO files.  Files ending with
 791 @file{.gmo} are really MO files, when it is known that these files use
 792 the GNU format.
 793
 794 @node Overview,  , Files, Introduction
 795 @section Overview of GNU @code{gettext}
 796
 797 @cindex overview of @code{gettext}
 798 @cindex big picture
 799 @cindex tutorial of @code{gettext} usage
 800 The following diagram summarizes the relation between the files
 801 handled by GNU @code{gettext} and the tools acting on these files.
 802 It is followed by somewhat detailed explanations, which you should
 803 read while keeping an eye on the diagram.  Having a clear understanding
 804 of these interrelations will surely help programmers, translators
 805 and maintainers.
 806
 807 @example
 808 @group
 809 Original C Sources ---> PO mode ---> Marked C Sources ---.
 810                                                          |
 811               .---------<--- GNU gettext Library         |
 812 .--- make <---+                                          |
 813 |             `---------<--------------------+-----------'
 814 |                                            |
 815 |   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
 816 |   |                                            |             ^
 817 |   |                                            `---.         |
 818 |   `---.                                            +---> PO mode ---.
 819 |       +----> msgmerge ------> LANG.po ---->--------'                |
 820 |   .---'                                                             |
 821 |   |                                                                 |
 822 |   `-------------<---------------.                                   |
 823 |                                 +--- New LANG.po <------------------'
 824 |   .--- LANG.gmo <--- msgfmt <---'
 825 |   |
 826 |   `---> install ---> /.../LANG/PACKAGE.mo ---.
 827 |                                              +---> "Hello world!"
 828 `-------> install ---> /.../bin/PROGRAM -------'
 829 @end group
 830 @end example
 831
 832 The indication @samp{PO mode} appears in two places in this picture,
 833 and you may safely read it as merely meaning ``hand editing'', using
 834 any editor of your choice, really.  However, for those of you being
 835 the lucky users of Emacs, PO mode has been specifically created
 836 for providing a cozy environment for editing or modifying PO files.
 837 While editing a PO file, PO mode allows for the easy browsing of
 838 auxiliary and compendium PO files, as well as for following references into
 839 the set of C program sources from which PO files have been derived.
 840 It has a few special features, among which are the interactive marking
 841 of program strings as translatable, and the validation of PO files
 842 with easy repositioning to PO file lines showing errors.
 843
 844 @cindex marking translatable strings
 845 As a programmer, the first step to bringing GNU @code{gettext}
 846 into your package is identifying, right in the C sources, those strings
 847 which are meant to be translatable, and those which are untranslatable.
 848 This tedious job can be done a little more comfortably using emacs PO
 849 mode, but you can use any means familiar to you for modifying your
 850 C sources.  Beside this some other simple, standard changes are needed to
 851 properly initialize the translation library.  @xref{Sources}, for
 852 more information about all this.
 853
 854 For newly written software the strings of course can and should be
 855 marked while writing it.  The @code{gettext} approach makes this
 856 very easy.  Simply put the following lines at the beginning of each file
 857 or in a central header file:
 858
 859 @example
 860 @group
 861 #define _(String) (String)
 862 #define N_(String) String
 863 #define textdomain(Domain)
 864 #define bindtextdomain(Package, Directory)
 865 @end group
 866 @end example
 867
 868 @noindent
 869 Doing this allows you to prepare the sources for internationalization.
 870 Later when you feel ready for the step to use the @code{gettext} library
 871 simply replace these definitions by the following:
 872
 873 @cindex include file @file{libintl.h}
 874 @example
 875 @group
 876 #include <libintl.h>
 877 #define _(String) gettext (String)
 878 #define gettext_noop(String) String
 879 #define N_(String) gettext_noop (String)
 880 @end group
 881 @end example
 882
 883 @cindex link with @file{libintl}
 884 @cindex Linux
 885 @noindent
 886 and link against @file{libintl.a} or @file{libintl.so}.  Note that on
 887 GNU systems, you don't need to link with @code{libintl} because the
 888 @code{gettext} library functions are already contained in GNU libc.
 889 That is all you have to change.
 890
 891 @cindex template PO file
 892 @cindex files, @file{.pot}
 893 Once the C sources have been modified, the @code{xgettext} program
 894 is used to find and extract all translatable strings, and create a
 895 PO template file out of all these.  This @file{@var{package}.pot} file
 896 contains all original program strings.  It has sets of pointers to
 897 exactly where in C sources each string is used.  All translations
 898 are set to empty.  The letter @code{t} in @file{.pot} marks this as
 899 a Template PO file, not yet oriented towards any particular language.
 900 @xref{xgettext Invocation}, for more details about how one calls the
 901 @code{xgettext} program.  If you are @emph{really} lazy, you might
 902 be interested at working a lot more right away, and preparing the
 903 whole distribution setup (@pxref{Maintainers}).  By doing so, you
 904 spare yourself typing the @code{xgettext} command, as @code{make}
 905 should now generate the proper things automatically for you!
 906
 907 The first time through, there is no @file{@var{lang}.po} yet, so the
 908 @code{msgmerge} step may be skipped and replaced by a mere copy of
 909 @file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang}
 910 represents the target language.  See @ref{Creating} for details.
 911
 912 Then comes the initial translation of messages.  Translation in
 913 itself is a whole matter, still exclusively meant for humans,
 914 and whose complexity far overwhelms the level of this manual.
 915 Nevertheless, a few hints are given in some other chapter of this
 916 manual (@pxref{Translators}).  You will also find there indications
 917 about how to contact translating teams, or becoming part of them,
 918 for sharing your translating concerns with others who target the same
 919 native language.
 920
 921 While adding the translated messages into the @file{@var{lang}.po}
 922 PO file, if you do not have Emacs handy, you are on your own
 923 for ensuring that your efforts fully respect the PO file format, and quoting
 924 conventions (@pxref{PO Files}).  This is surely not an impossible task,
 925 as this is the way many people have handled PO files already for Uniforum or
 926 Solaris.  On the other hand, by using PO mode in Emacs, most details
 927 of PO file format are taken care of for you, but you have to acquire
 928 some familiarity with PO mode itself.  Besides main PO mode commands
 929 (@pxref{Main PO Commands}), you should know how to move between entries
 930 (@pxref{Entry Positioning}), and how to handle untranslated entries
 931 (@pxref{Untranslated Entries}).
 932
 933 If some common translations have already been saved into a compendium
 934 PO file, translators may use PO mode for initializing untranslated
 935 entries from the compendium, and also save selected translations into
 936 the compendium, updating it (@pxref{Compendium}).  Compendium files
 937 are meant to be exchanged between members of a given translation team.
 938
 939 Programs, or packages of programs, are dynamic in nature: users write
 940 bug reports and suggestion for improvements, maintainers react by
 941 modifying programs in various ways.  The fact that a package has
 942 already been internationalized should not make maintainers shy
 943 of adding new strings, or modifying strings already translated.
 944 They just do their job the best they can.  For the Translation
 945 Project to work smoothly, it is important that maintainers do not
 946 carry translation concerns on their already loaded shoulders, and that
 947 translators be kept as free as possible of programming concerns.
 948
 949 The only concern maintainers should have is carefully marking new
 950 strings as translatable, when they should be, and do not otherwise
 951 worry about them being translated, as this will come in proper time.
 952 Consequently, when programs and their strings are adjusted in various
 953 ways by maintainers, and for matters usually unrelated to translation,
 954 @code{xgettext} would construct @file{@var{package}.pot} files which are
 955 evolving over time, so the translations carried by @file{@var{lang}.po}
 956 are slowly fading out of date.
 957
 958 @cindex evolution of packages
 959 It is important for translators (and even maintainers) to understand
 960 that package translation is a continuous process in the lifetime of a
 961 package, and not something which is done once and for all at the start.
 962 After an initial burst of translation activity for a given package,
 963 interventions are needed once in a while, because here and there,
 964 translated entries become obsolete, and new untranslated entries
 965 appear, needing translation.
 966
 967 The @code{msgmerge} program has the purpose of refreshing an already
 968 existing @file{@var{lang}.po} file, by comparing it with a newer
 969 @file{@var{package}.pot} template file, extracted by @code{xgettext}
 970 out of recent C sources.  The refreshing operation adjusts all
 971 references to C source locations for strings, since these strings
 972 move as programs are modified.  Also, @code{msgmerge} comments out as
 973 obsolete, in @file{@var{lang}.po}, those already translated entries
 974 which are no longer used in the program sources (@pxref{Obsolete
 975 Entries}).  It finally discovers new strings and inserts them in
 976 the resulting PO file as untranslated entries (@pxref{Untranslated
 977 Entries}).  @xref{msgmerge Invocation}, for more information about what
 978 @code{msgmerge} really does.
 979
 980 Whatever route or means taken, the goal is to obtain an updated
 981 @file{@var{lang}.po} file offering translations for all strings.
 982
 983 The temporal mobility, or fluidity of PO files, is an integral part of
 984 the translation game, and should be well understood, and accepted.
 985 People resisting it will have a hard time participating in the
 986 Translation Project, or will give a hard time to other participants!  In
 987 particular, maintainers should relax and include all available official
 988 PO files in their distributions, even if these have not recently been
 989 updated, without exerting pressure on the translator teams to get the
 990 job done.  The pressure should rather come
 991 from the community of users speaking a particular language, and
 992 maintainers should consider themselves fairly relieved of any concern
 993 about the adequacy of translation files.  On the other hand, translators
 994 should reasonably try updating the PO files they are responsible for,
 995 while the package is undergoing pretest, prior to an official
 996 distribution.
 997
 998 Once the PO file is complete and dependable, the @code{msgfmt} program
 999 is used for turning the PO file into a machine-oriented format, which
1000 may yield efficient retrieval of translations by the programs of the
1001 package, whenever needed at runtime (@pxref{MO Files}).  @xref{msgfmt
1002 Invocation}, for more information about all modes of execution
1003 for the @code{msgfmt} program.
1004
1005 Finally, the modified and marked C sources are compiled and linked
1006 with the GNU @code{gettext} library, usually through the operation of
1007 @code{make}, given a suitable @file{Makefile} exists for the project,
1008 and the resulting executable is installed somewhere users will find it.
1009 The MO files themselves should also be properly installed.  Given the
1010 appropriate environment variables are set (@pxref{End Users}), the
1011 program should localize itself automatically, whenever it executes.
1012
1013 The remainder of this manual has the purpose of explaining in depth the various
1014 steps outlined above.
1015
1016 @node Basics, Sources, Introduction, Top
1017 @chapter PO Files and PO Mode Basics
1018
1019 The GNU @code{gettext} toolset helps programmers and translators
1020 at producing, updating and using translation files, mainly those
1021 PO files which are textual, editable files.  This chapter stresses
1022 the format of PO files, and contains a PO mode starter.  PO mode
1023 description is spread throughout this manual instead of being concentrated
1024 in one place.  Here we present only the basics of PO mode.
1025
1026 @menu
1027 * Installation::                Completing GNU @code{gettext} Installation
1028 * PO Files::                    The Format of PO Files
1029 * Main PO Commands::            Main Commands
1030 * Entry Positioning::           Entry Positioning
1031 * Normalizing::                 Normalizing Strings in Entries
1032 @end menu
1033
1034 @node Installation, PO Files, Basics, Basics
1035 @section Completing GNU @code{gettext} Installation
1036
1037 @cindex installing @code{gettext}
1038 @cindex @code{gettext} installation
1039 Once you have received, unpacked, configured and compiled the GNU
1040 @code{gettext} distribution, the @samp{make install} command puts in
1041 place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and
1042 @code{msgmerge}, as well as their available message catalogs.  To
1043 top off a comfortable installation, you might also want to make the
1044 PO mode available to your Emacs users.
1045
1046 @emindex @file{.emacs} customizations
1047 @emindex installing PO mode
1048 During the installation of the PO mode, you might want to modify your
1049 file @file{.emacs}, once and for all, so it contains a few lines looking
1050 like:
1051
1052 @example
1053 (setq auto-mode-alist
1054       (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
1055 (autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
1056 @end example
1057
1058 Later, whenever you edit some @file{.po}
1059 file, or any file having the string @samp{.po.} within its name,
1060 Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and
1061 automatically activates PO mode commands for the associated buffer.
1062 The string @emph{PO} appears in the mode line for any buffer for
1063 which PO mode is active.  Many PO files may be active at once in a
1064 single Emacs session.
1065
1066 If you are using Emacs version 20 or newer, and have already installed
1067 the appropriate international fonts on your system, you may also tell
1068 Emacs how to determine automatically the coding system of every PO file.
1069 This will often (but not always) cause the necessary fonts to be loaded
1070 and used for displaying the translations on your Emacs screen.  For this
1071 to happen, add the lines:
1072
1073 @example
1074 (modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
1075                             'po-find-file-coding-system)
1076 (autoload 'po-find-file-coding-system "po-mode")
1077 @end example
1078
1079 @noindent
1080 to your @file{.emacs} file.  If, with this, you still see boxes instead
1081 of international characters, try a different font set (via Shift Mouse
1082 button 1).
1083
1084 @node PO Files, Main PO Commands, Installation, Basics
1085 @section The Format of PO Files
1086 @cindex PO files' format
1087 @cindex file format, @file{.po}
1088
1089 A PO file is made up of many entries, each entry holding the relation
1090 between an original untranslated string and its corresponding
1091 translation.  All entries in a given PO file usually pertain
1092 to a single project, and all translations are expressed in a single
1093 target language.  One PO file @dfn{entry} has the following schematic
1094 structure:
1095
1096 @example
1097 @var{white-space}
1098 #  @var{translator-comments}
1099 #. @var{automatic-comments}
1100 #: @var{reference}@dots{}
1101 #, @var{flag}@dots{}
1102 msgid @var{untranslated-string}
1103 msgstr @var{translated-string}
1104 @end example
1105
1106 The general structure of a PO file should be well understood by
1107 the translator.  When using PO mode, very little has to be known
1108 about the format details, as PO mode takes care of them for her.
1109
1110 A simple entry can look like this:
1111
1112 @example
1113 #: lib/error.c:116
1114 msgid "Unknown system error"
1115 msgstr "Error desconegut del sistema"
1116 @end example
1117
1118 Entries begin with some optional white space.  Usually, when generated
1119 through GNU @code{gettext} tools, there is exactly one blank line
1120 between entries.  Then comments follow, on lines all starting with the
1121 character @code{#}.  There are two kinds of comments: those which have
1122 some white space immediately following the @code{#}, which comments are
1123 created and maintained exclusively by the translator, and those which
1124 have some non-white character just after the @code{#}, which comments
1125 are created and maintained automatically by GNU @code{gettext} tools.
1126 All comments, of either kind, are optional.
1127
1128 @kwindex msgid
1129 @kwindex msgstr
1130 After white space and comments, entries show two strings, namely
1131 first the untranslated string as it appears in the original program
1132 sources, and then, the translation of this string.  The original
1133 string is introduced by the keyword @code{msgid}, and the translation,
1134 by @code{msgstr}.  The two strings, untranslated and translated,
1135 are quoted in various ways in the PO file, using @code{"}
1136 delimiters and @code{\} escapes, but the translator does not really
1137 have to pay attention to the precise quoting format, as PO mode fully
1138 takes care of quoting for her.
1139
1140 The @code{msgid} strings, as well as automatic comments, are produced
1141 and managed by other GNU @code{gettext} tools, and PO mode does not
1142 provide means for the translator to alter these.  The most she can
1143 do is merely deleting them, and only by deleting the whole entry.
1144 On the other hand, the @code{msgstr} string, as well as translator
1145 comments, are really meant for the translator, and PO mode gives her
1146 the full control she needs.
1147
1148 The comment lines beginning with @code{#,} are special because they are
1149 not completely ignored by the programs as comments generally are.  The
1150 comma separated list of @var{flag}s is used by the @code{msgfmt}
1151 program to give the user some better diagnostic messages.  Currently
1152 there are two forms of flags defined:
1153
1154 @table @code
1155 @item fuzzy
1156 @kwindex fuzzy@r{ flag}
1157 This flag can be generated by the @code{msgmerge} program or it can be
1158 inserted by the translator herself.  It shows that the @code{msgstr}
1159 string might not be a correct translation (anymore).  Only the translator
1160 can judge if the translation requires further modification, or is
1161 acceptable as is.  Once satisfied with the translation, she then removes
1162 this @code{fuzzy} attribute.  The @code{msgmerge} program inserts this
1163 when it combined the @code{msgid} and @code{msgstr} entries after fuzzy
1164 search only.  @xref{Fuzzy Entries}.
1165
1166 @item c-format
1167 @kwindex c-format@r{ flag}
1168 @itemx no-c-format
1169 @kwindex no-c-format@r{ flag}
1170 These flags should not be added by a human.  Instead only the
1171 @code{xgettext} program adds them.  In an automated PO file processing
1172 system as proposed here the user changes would be thrown away again as
1173 soon as the @code{xgettext} program generates a new template file.
1174
1175 The @code{c-format} flag tells that the untranslated string and the
1176 translation are supposed to be C format strings.  The @code{no-c-format}
1177 flag tells that they are not C format strings, even though the untranslated
1178 string happens to look like a C format string (with @samp{%} directives).
1179
1180 In case the @code{c-format} flag is given for a string the @code{msgfmt}
1181 does some more tests to check to validity of the translation.
1182 @xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}.
1183
1184 @item objc-format
1185 @kwindex objc-format@r{ flag}
1186 @itemx no-objc-format
1187 @kwindex no-objc-format@r{ flag}
1188 Likewise for Objective C, see @ref{objc-format}.
1189
1190 @item sh-format
1191 @kwindex sh-format@r{ flag}
1192 @itemx no-sh-format
1193 @kwindex no-sh-format@r{ flag}
1194 Likewise for Shell, see @ref{sh-format}.
1195
1196 @item python-format
1197 @kwindex python-format@r{ flag}
1198 @itemx no-python-format
1199 @kwindex no-python-format@r{ flag}
1200 Likewise for Python, see @ref{python-format}.
1201
1202 @item lisp-format
1203 @kwindex lisp-format@r{ flag}
1204 @itemx no-lisp-format
1205 @kwindex no-lisp-format@r{ flag}
1206 Likewise for Lisp, see @ref{lisp-format}.
1207
1208 @item elisp-format
1209 @kwindex elisp-format@r{ flag}
1210 @itemx no-elisp-format
1211 @kwindex no-elisp-format@r{ flag}
1212 Likewise for Emacs Lisp, see @ref{elisp-format}.
1213
1214 @item librep-format
1215 @kwindex librep-format@r{ flag}
1216 @itemx no-librep-format
1217 @kwindex no-librep-format@r{ flag}
1218 Likewise for librep, see @ref{librep-format}.
1219
1220 @item scheme-format
1221 @kwindex scheme-format@r{ flag}
1222 @itemx no-scheme-format
1223 @kwindex no-scheme-format@r{ flag}
1224 Likewise for Scheme, see @ref{scheme-format}.
1225
1226 @item smalltalk-format
1227 @kwindex smalltalk-format@r{ flag}
1228 @itemx no-smalltalk-format
1229 @kwindex no-smalltalk-format@r{ flag}
1230 Likewise for Smalltalk, see @ref{smalltalk-format}.
1231
1232 @item java-format
1233 @kwindex java-format@r{ flag}
1234 @itemx no-java-format
1235 @kwindex no-java-format@r{ flag}
1236 Likewise for Java, see @ref{java-format}.
1237
1238 @item csharp-format
1239 @kwindex csharp-format@r{ flag}
1240 @itemx no-csharp-format
1241 @kwindex no-csharp-format@r{ flag}
1242 Likewise for C#, see @ref{csharp-format}.
1243
1244 @item awk-format
1245 @kwindex awk-format@r{ flag}
1246 @itemx no-awk-format
1247 @kwindex no-awk-format@r{ flag}
1248 Likewise for awk, see @ref{awk-format}.
1249
1250 @item object-pascal-format
1251 @kwindex object-pascal-format@r{ flag}
1252 @itemx no-object-pascal-format
1253 @kwindex no-object-pascal-format@r{ flag}
1254 Likewise for Object Pascal, see @ref{object-pascal-format}.
1255
1256 @item ycp-format
1257 @kwindex ycp-format@r{ flag}
1258 @itemx no-ycp-format
1259 @kwindex no-ycp-format@r{ flag}
1260 Likewise for YCP, see @ref{ycp-format}.
1261
1262 @item tcl-format
1263 @kwindex tcl-format@r{ flag}
1264 @itemx no-tcl-format
1265 @kwindex no-tcl-format@r{ flag}
1266 Likewise for Tcl, see @ref{tcl-format}.
1267
1268 @item perl-format
1269 @kwindex perl-format@r{ flag}
1270 @itemx no-perl-format
1271 @kwindex no-perl-format@r{ flag}
1272 Likewise for Perl, see @ref{perl-format}.
1273
1274 @item perl-brace-format
1275 @kwindex perl-brace-format@r{ flag}
1276 @itemx no-perl-brace-format
1277 @kwindex no-perl-brace-format@r{ flag}
1278 Likewise for Perl brace, see @ref{perl-format}.
1279
1280 @item php-format
1281 @kwindex php-format@r{ flag}
1282 @itemx no-php-format
1283 @kwindex no-php-format@r{ flag}
1284 Likewise for PHP, see @ref{php-format}.
1285
1286 @item gcc-internal-format
1287 @kwindex gcc-internal-format@r{ flag}
1288 @itemx no-gcc-internal-format
1289 @kwindex no-gcc-internal-format@r{ flag}
1290 Likewise for the GCC sources, see @ref{gcc-internal-format}.
1291
1292 @item qt-format
1293 @kwindex qt-format@r{ flag}
1294 @itemx no-qt-format
1295 @kwindex no-qt-format@r{ flag}
1296 Likewise for Qt, see @ref{qt-format}.
1297
1298 @end table
1299
1300 @kwindex msgid_plural
1301 @cindex plural forms, in PO files
1302 A different kind of entries is used for translations which involve
1303 plural forms.
1304
1305 @example
1306 @var{white-space}
1307 #  @var{translator-comments}
1308 #. @var{automatic-comments}
1309 #: @var{reference}@dots{}
1310 #, @var{flag}@dots{}
1311 msgid @var{untranslated-string-singular}
1312 msgid_plural @var{untranslated-string-plural}
1313 msgstr[0] @var{translated-string-case-0}
1314 ...
1315 msgstr[N] @var{translated-string-case-n}
1316 @end example
1317
1318 Such an entry can look like this:
1319
1320 @example
1321 #: src/msgcmp.c:338 src/po-lex.c:699
1322 #, c-format
1323 msgid "found %d fatal error"
1324 msgid_plural "found %d fatal errors"
1325 msgstr[0] "s'ha trobat %d error fatal"
1326 msgstr[1] "s'han trobat %d errors fatals"
1327 @end example
1328
1329 @efindex po-normalize@r{, PO Mode command}
1330 It happens that some lines, usually whitespace or comments, follow the
1331 very last entry of a PO file.  Such lines are not part of any entry,
1332 and PO mode is unable to take action on those lines.  By using the
1333 PO mode function @w{@kbd{M-x po-normalize}}, the translator may get
1334 rid of those spurious lines.  @xref{Normalizing}.
1335
1336 The remainder of this section may be safely skipped by those using
1337 PO mode, yet it may be interesting for everybody to have a better
1338 idea of the precise format of a PO file.  On the other hand, those
1339 not having Emacs handy should carefully continue reading on.
1340
1341 Each of @var{untranslated-string} and @var{translated-string} respects
1342 the C syntax for a character string, including the surrounding quotes
1343 and embedded backslashed escape sequences.  When the time comes
1344 to write multi-line strings, one should not use escaped newlines.
1345 Instead, a closing quote should follow the last character on the
1346 line to be continued, and an opening quote should resume the string
1347 at the beginning of the following PO file line.  For example:
1348
1349 @example
1350 msgid ""
1351 "Here is an example of how one might continue a very long string\n"
1352 "for the common case the string represents multi-line output.\n"
1353 @end example
1354
1355 @noindent
1356 In this example, the empty string is used on the first line, to
1357 allow better alignment of the @code{H} from the word @samp{Here}
1358 over the @code{f} from the word @samp{for}.  In this example, the
1359 @code{msgid} keyword is followed by three strings, which are meant
1360 to be concatenated.  Concatenating the empty string does not change
1361 the resulting overall string, but it is a way for us to comply with
1362 the necessity of @code{msgid} to be followed by a string on the same
1363 line, while keeping the multi-line presentation left-justified, as
1364 we find this to be a cleaner disposition.  The empty string could have
1365 been omitted, but only if the string starting with @samp{Here} was
1366 promoted on the first line, right after @code{msgid}.@footnote{This
1367 limitation is not imposed by GNU @code{gettext}, but is for compatibility
1368 with the @code{msgfmt} implementation on Solaris.} It was not really necessary
1369 either to switch between the two last quoted strings immediately after
1370 the newline @samp{\n}, the switch could have occurred after @emph{any}
1371 other character, we just did it this way because it is neater.
1372
1373 @cindex newlines in PO files
1374 One should carefully distinguish between end of lines marked as
1375 @samp{\n} @emph{inside} quotes, which are part of the represented
1376 string, and end of lines in the PO file itself, outside string quotes,
1377 which have no incidence on the represented string.
1378
1379 @cindex comments in PO files
1380 Outside strings, white lines and comments may be used freely.
1381 Comments start at the beginning of a line with @samp{#} and extend
1382 until the end of the PO file line.  Comments written by translators
1383 should have the initial @samp{#} immediately followed by some white
1384 space.  If the @samp{#} is not immediately followed by white space,
1385 this comment is most likely generated and managed by specialized GNU
1386 tools, and might disappear or be replaced unexpectedly when the PO
1387 file is given to @code{msgmerge}.
1388
1389 @node Main PO Commands, Entry Positioning, PO Files, Basics
1390 @section Main PO mode Commands
1391
1392 @cindex PO mode (Emacs) commands
1393 @emindex commands
1394 After setting up Emacs with something similar to the lines in
1395 @ref{Installation}, PO mode is activated for a window when Emacs finds a
1396 PO file in that window.  This puts the window read-only and establishes a
1397 po-mode-map, which is a genuine Emacs mode, in a way that is not derived
1398 from text mode in any way.  Functions found on @code{po-mode-hook},
1399 if any, will be executed.
1400
1401 When PO mode is active in a window, the letters @samp{PO} appear
1402 in the mode line for that window.  The mode line also displays how
1403 many entries of each kind are held in the PO file.  For example,
1404 the string @samp{132t+3f+10u+2o} would tell the translator that the
1405 PO mode contains 132 translated entries (@pxref{Translated Entries},
1406 3 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
1407 (@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
1408 Entries}).  Zero-coefficients items are not shown.  So, in this example, if
1409 the fuzzy entries were unfuzzied, the untranslated entries were translated
1410 and the obsolete entries were deleted, the mode line would merely display
1411 @samp{145t} for the counters.
1412
1413 The main PO commands are those which do not fit into the other categories of
1414 subsequent sections.  These allow for quitting PO mode or for managing windows
1415 in special ways.
1416
1417 @table @kbd
1418 @item _
1419 @efindex _@r{, PO Mode command}
1420 Undo last modification to the PO file (@code{po-undo}).
1421
1422 @item Q
1423 @efindex Q@r{, PO Mode command}
1424 Quit processing and save the PO file (@code{po-quit}).
1425
1426 @item q
1427 @efindex q@r{, PO Mode command}
1428 Quit processing, possibly after confirmation (@code{po-confirm-and-quit}).
1429
1430 @item 0
1431 @efindex 0@r{, PO Mode command}
1432 Temporary leave the PO file window (@code{po-other-window}).
1433
1434 @item ?
1435 @itemx h
1436 @efindex ?@r{, PO Mode command}
1437 @efindex h@r{, PO Mode command}
1438 Show help about PO mode (@code{po-help}).
1439
1440 @item =
1441 @efindex =@r{, PO Mode command}
1442 Give some PO file statistics (@code{po-statistics}).
1443
1444 @item V
1445 @efindex V@r{, PO Mode command}
1446 Batch validate the format of the whole PO file (@code{po-validate}).
1447
1448 @end table
1449
1450 @efindex _@r{, PO Mode command}
1451 @efindex po-undo@r{, PO Mode command}
1452 The command @kbd{_} (@code{po-undo}) interfaces to the Emacs
1453 @emph{undo} facility.  @xref{Undo, , Undoing Changes, emacs, The Emacs
1454 Editor}.  Each time @kbd{U} is typed, modifications which the translator
1455 did to the PO file are undone a little more.  For the purpose of
1456 undoing, each PO mode command is atomic.  This is especially true for
1457 the @kbd{@key{RET}} command: the whole edition made by using a single
1458 use of this command is undone at once, even if the edition itself
1459 implied several actions.  However, while in the editing window, one
1460 can undo the edition work quite parsimoniously.
1461
1462 @efindex Q@r{, PO Mode command}
1463 @efindex q@r{, PO Mode command}
1464 @efindex po-quit@r{, PO Mode command}
1465 @efindex po-confirm-and-quit@r{, PO Mode command}
1466 The commands @kbd{Q} (@code{po-quit}) and @kbd{q}
1467 (@code{po-confirm-and-quit}) are used when the translator is done with the
1468 PO file.  The former is a bit less verbose than the latter.  If the file
1469 has been modified, it is saved to disk first.  In both cases, and prior to
1470 all this, the commands check if any untranslated messages remain in the
1471 PO file and, if so, the translator is asked if she really wants to leave
1472 off working with this PO file.  This is the preferred way of getting rid
1473 of an Emacs PO file buffer.  Merely killing it through the usual command
1474 @w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.
1475
1476 @efindex 0@r{, PO Mode command}
1477 @efindex po-other-window@r{, PO Mode command}
1478 The command @kbd{0} (@code{po-other-window}) is another, softer way,
1479 to leave PO mode, temporarily.  It just moves the cursor to some other
1480 Emacs window, and pops one if necessary.  For example, if the translator
1481 just got PO mode to show some source context in some other, she might
1482 discover some apparent bug in the program source that needs correction.
1483 This command allows the translator to change sex, become a programmer,
1484 and have the cursor right into the window containing the program she
1485 (or rather @emph{he}) wants to modify.  By later getting the cursor back
1486 in the PO file window, or by asking Emacs to edit this file once again,
1487 PO mode is then recovered.
1488
1489 @efindex ?@r{, PO Mode command}
1490 @efindex h@r{, PO Mode command}
1491 @efindex po-help@r{, PO Mode command}
1492 The command @kbd{h} (@code{po-help}) displays a summary of all available PO
1493 mode commands.  The translator should then type any character to resume
1494 normal PO mode operations.  The command @kbd{?} has the same effect
1495 as @kbd{h}.
1496
1497 @efindex =@r{, PO Mode command}
1498 @efindex po-statistics@r{, PO Mode command}
1499 The command @kbd{=} (@code{po-statistics}) computes the total number of
1500 entries in the PO file, the ordinal of the current entry (counted from
1501 1), the number of untranslated entries, the number of obsolete entries,
1502 and displays all these numbers.
1503
1504 @efindex V@r{, PO Mode command}
1505 @efindex po-validate@r{, PO Mode command}
1506 The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in
1507 checking and verbose
1508 mode over the current PO file.  This command first offers to save the
1509 current PO file on disk.  The @code{msgfmt} tool, from GNU @code{gettext},
1510 has the purpose of creating a MO file out of a PO file, and PO mode uses
1511 the features of this program for checking the overall format of a PO file,
1512 as well as all individual entries.
1513
1514 @efindex next-error@r{, stepping through PO file validation results}
1515 The program @code{msgfmt} runs asynchronously with Emacs, so the
1516 translator regains control immediately while her PO file is being studied.
1517 Error output is collected in the Emacs @samp{*compilation*} buffer,
1518 displayed in another window.  The regular Emacs command @kbd{C-x`}
1519 (@code{next-error}), as well as other usual compile commands, allow the
1520 translator to reposition quickly to the offending parts of the PO file.
1521 Once the cursor is on the line in error, the translator may decide on
1522 any PO mode action which would help correcting the error.
1523
1524 @node Entry Positioning, Normalizing, Main PO Commands, Basics
1525 @section Entry Positioning
1526
1527 @emindex current entry of a PO file
1528 The cursor in a PO file window is almost always part of
1529 an entry.  The only exceptions are the special case when the cursor
1530 is after the last entry in the file, or when the PO file is
1531 empty.  The entry where the cursor is found to be is said to be the
1532 current entry.  Many PO mode commands operate on the current entry,
1533 so moving the cursor does more than allowing the translator to browse
1534 the PO file, this also selects on which entry commands operate.
1535
1536 @emindex moving through a PO file
1537 Some PO mode commands alter the position of the cursor in a specialized
1538 way.  A few of those special purpose positioning are described here,
1539 the others are described in following sections (for a complete list try
1540 @kbd{C-h m}):
1541
1542 @table @kbd
1543
1544 @item .
1545 @efindex .@r{, PO Mode command}
1546 Redisplay the current entry (@code{po-current-entry}).
1547
1548 @item n
1549 @efindex n@r{, PO Mode command}
1550 Select the entry after the current one (@code{po-next-entry}).
1551
1552 @item p
1553 @efindex p@r{, PO Mode command}
1554 Select the entry before the current one (@code{po-previous-entry}).
1555
1556 @item <
1557 @efindex <@r{, PO Mode command}
1558 Select the first entry in the PO file (@code{po-first-entry}).
1559
1560 @item >
1561 @efindex >@r{, PO Mode command}
1562 Select the last entry in the PO file (@code{po-last-entry}).
1563
1564 @item m
1565 @efindex m@r{, PO Mode command}
1566 Record the location of the current entry for later use
1567 (@code{po-push-location}).
1568
1569 @item r
1570 @efindex r@r{, PO Mode command}
1571 Return to a previously saved entry location (@code{po-pop-location}).
1572
1573 @item x
1574 @efindex x@r{, PO Mode command}
1575 Exchange the current entry location with the previously saved one
1576 (@code{po-exchange-location}).
1577
1578 @end table
1579
1580 @efindex .@r{, PO Mode command}
1581 @efindex po-current-entry@r{, PO Mode command}
1582 Any Emacs command able to reposition the cursor may be used
1583 to select the current entry in PO mode, including commands which
1584 move by characters, lines, paragraphs, screens or pages, and search
1585 commands.  However, there is a kind of standard way to display the
1586 current entry in PO mode, which usual Emacs commands moving
1587 the cursor do not especially try to enforce.  The command @kbd{.}
1588 (@code{po-current-entry}) has the sole purpose of redisplaying the
1589 current entry properly, after the current entry has been changed by
1590 means external to PO mode, or the Emacs screen otherwise altered.
1591
1592 It is yet to be decided if PO mode helps the translator, or otherwise
1593 irritates her, by forcing a rigid window disposition while she
1594 is doing her work.  We originally had quite precise ideas about
1595 how windows should behave, but on the other hand, anyone used to
1596 Emacs is often happy to keep full control.  Maybe a fixed window
1597 disposition might be offered as a PO mode option that the translator
1598 might activate or deactivate at will, so it could be offered on an
1599 experimental basis.  If nobody feels a real need for using it, or
1600 a compulsion for writing it, we should drop this whole idea.
1601 The incentive for doing it should come from translators rather than
1602 programmers, as opinions from an experienced translator are surely
1603 more worth to me than opinions from programmers @emph{thinking} about
1604 how @emph{others} should do translation.
1605
1606 @efindex n@r{, PO Mode command}
1607 @efindex po-next-entry@r{, PO Mode command}
1608 @efindex p@r{, PO Mode command}
1609 @efindex po-previous-entry@r{, PO Mode command}
1610 The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
1611 (@code{po-previous-entry}) move the cursor the entry following,
1612 or preceding, the current one.  If @kbd{n} is given while the
1613 cursor is on the last entry of the PO file, or if @kbd{p}
1614 is given while the cursor is on the first entry, no move is done.
1615
1616 @efindex <@r{, PO Mode command}
1617 @efindex po-first-entry@r{, PO Mode command}
1618 @efindex >@r{, PO Mode command}
1619 @efindex po-last-entry@r{, PO Mode command}
1620 The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
1621 (@code{po-last-entry}) move the cursor to the first entry, or last
1622 entry, of the PO file.  When the cursor is located past the last
1623 entry in a PO file, most PO mode commands will return an error saying
1624 @samp{After last entry}.  Moreover, the commands @kbd{<} and @kbd{>}
1625 have the special property of being able to work even when the cursor
1626 is not into some PO file entry, and one may use them for nicely
1627 correcting this situation.  But even these commands will fail on a
1628 truly empty PO file.  There are development plans for the PO mode for it
1629 to interactively fill an empty PO file from sources.  @xref{Marking}.
1630
1631 The translator may decide, before working at the translation of
1632 a particular entry, that she needs to browse the remainder of the
1633 PO file, maybe for finding the terminology or phraseology used
1634 in related entries.  She can of course use the standard Emacs idioms
1635 for saving the current cursor location in some register, and use that
1636 register for getting back, or else, use the location ring.
1637
1638 @efindex m@r{, PO Mode command}
1639 @efindex po-push-location@r{, PO Mode command}
1640 @efindex r@r{, PO Mode command}
1641 @efindex po-pop-location@r{, PO Mode command}
1642 PO mode offers another approach, by which cursor locations may be saved
1643 onto a special stack.  The command @kbd{m} (@code{po-push-location})
1644 merely adds the location of current entry to the stack, pushing
1645 the already saved locations under the new one.  The command
1646 @kbd{r} (@code{po-pop-location}) consumes the top stack element and
1647 repositions the cursor to the entry associated with that top element.
1648 This position is then lost, for the next @kbd{r} will move the cursor
1649 to the previously saved location, and so on until no locations remain
1650 on the stack.
1651
1652 If the translator wants the position to be kept on the location stack,
1653 maybe for taking a look at the entry associated with the top
1654 element, then go elsewhere with the intent of getting back later, she
1655 ought to use @kbd{m} immediately after @kbd{r}.
1656
1657 @efindex x@r{, PO Mode command}
1658 @efindex po-exchange-location@r{, PO Mode command}
1659 The command @kbd{x} (@code{po-exchange-location}) simultaneously
1660 repositions the cursor to the entry associated with the top element of
1661 the stack of saved locations, and replaces that top element with the
1662 location of the current entry before the move.  Consequently, repeating
1663 the @kbd{x} command toggles alternatively between two entries.
1664 For achieving this, the translator will position the cursor on the
1665 first entry, use @kbd{m}, then position to the second entry, and
1666 merely use @kbd{x} for making the switch.
1667
1668 @node Normalizing,  , Entry Positioning, Basics
1669 @section Normalizing Strings in Entries
1670 @cindex string normalization in entries
1671
1672 There are many different ways for encoding a particular string into a
1673 PO file entry, because there are so many different ways to split and
1674 quote multi-line strings, and even, to represent special characters
1675 by backslashed escaped sequences.  Some features of PO mode rely on
1676 the ability for PO mode to scan an already existing PO file for a
1677 particular string encoded into the @code{msgid} field of some entry.
1678 Even if PO mode has internally all the built-in machinery for
1679 implementing this recognition easily, doing it fast is technically
1680 difficult.  To facilitate a solution to this efficiency problem,
1681 we decided on a canonical representation for strings.
1682
1683 A conventional representation of strings in a PO file is currently
1684 under discussion, and PO mode experiments with a canonical representation.
1685 Having both @code{xgettext} and PO mode converging towards a uniform
1686 way of representing equivalent strings would be useful, as the internal
1687 normalization needed by PO mode could be automatically satisfied
1688 when using @code{xgettext} from GNU @code{gettext}.  An explicit
1689 PO mode normalization should then be only necessary for PO files
1690 imported from elsewhere, or for when the convention itself evolves.
1691
1692 So, for achieving normalization of at least the strings of a given
1693 PO file needing a canonical representation, the following PO mode
1694 command is available:
1695
1696 @emindex string normalization in entries
1697 @table @kbd
1698 @item M-x po-normalize
1699 @efindex po-normalize@r{, PO Mode command}
1700 Tidy the whole PO file by making entries more uniform.
1701
1702 @end table
1703
1704 The special command @kbd{M-x po-normalize}, which has no associated
1705 keys, revises all entries, ensuring that strings of both original
1706 and translated entries use uniform internal quoting in the PO file.
1707 It also removes any crumb after the last entry.  This command may be
1708 useful for PO files freshly imported from elsewhere, or if we ever
1709 improve on the canonical quoting format we use.  This canonical format
1710 is not only meant for getting cleaner PO files, but also for greatly
1711 speeding up @code{msgid} string lookup for some other PO mode commands.
1712
1713 @kbd{M-x po-normalize} presently makes three passes over the entries.
1714 The first implements heuristics for converting PO files for GNU
1715 @code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
1716 fields were using K&R style C string syntax for multi-line strings.
1717 These heuristics may fail for comments not related to obsolete
1718 entries and ending with a backslash; they also depend on subsequent
1719 passes for finalizing the proper commenting of continued lines for
1720 obsolete entries.  This first pass might disappear once all oldish PO
1721 files would have been adjusted.  The second and third pass normalize
1722 all @code{msgid} and @code{msgstr} strings respectively.  They also
1723 clean out those trailing backslashes used by XView's @code{msgfmt}
1724 for continued lines.
1725
1726 @cindex importing PO files
1727 Having such an explicit normalizing command allows for importing PO
1728 files from other sources, but also eases the evolution of the current
1729 convention, evolution driven mostly by aesthetic concerns, as of now.
1730 It is easy to make suggested adjustments at a later time, as the
1731 normalizing command and eventually, other GNU @code{gettext} tools
1732 should greatly automate conformance.  A description of the canonical
1733 string format is given below, for the particular benefit of those not
1734 having Emacs handy, and who would nevertheless want to handcraft
1735 their PO files in nice ways.
1736
1737 @cindex multi-line strings
1738 Right now, in PO mode, strings are single line or multi-line.  A string
1739 goes multi-line if and only if it has @emph{embedded} newlines, that
1740 is, if it matches @samp{[^\n]\n+[^\n]}.  So, we would have:
1741
1742 @example
1743 msgstr "\n\nHello, world!\n\n\n"
1744 @end example
1745
1746 but, replacing the space by a newline, this becomes:
1747
1748 @example
1749 msgstr ""
1750 "\n"
1751 "\n"
1752 "Hello,\n"
1753 "world!\n"
1754 "\n"
1755 "\n"
1756 @end example
1757
1758 We are deliberately using a caricatural example, here, to make the
1759 point clearer.  Usually, multi-lines are not that bad looking.
1760 It is probable that we will implement the following suggestion.
1761 We might lump together all initial newlines into the empty string,
1762 and also all newlines introducing empty lines (that is, for @w{@var{n}
1763 > 1}, the @var{n}-1'th last newlines would go together on a separate
1764 string), so making the previous example appear:
1765
1766 @example
1767 msgstr "\n\n"
1768 "Hello,\n"
1769 "world!\n"
1770 "\n\n"
1771 @end example
1772
1773 There are a few yet undecided little points about string normalization,
1774 to be documented in this manual, once these questions settle.
1775
1776 @node Sources, Template, Basics, Top
1777 @chapter Preparing Program Sources
1778 @cindex preparing programs for translation
1779
1780 @c FIXME: Rewrite (the whole chapter).
1781
1782 For the programmer, changes to the C source code fall into three
1783 categories.  First, you have to make the localization functions
1784 known to all modules needing message translation.  Second, you should
1785 properly trigger the operation of GNU @code{gettext} when the program
1786 initializes, usually from the @code{main} function.  Last, you should
1787 identify and especially mark all constant strings in your program
1788 needing translation.
1789
1790 Presuming that your set of programs, or package, has been adjusted
1791 so all needed GNU @code{gettext} files are available, and your
1792 @file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
1793 having translated C strings should contain the line:
1794
1795 @cindex include file @file{libintl.h}
1796 @example
1797 #include <libintl.h>
1798 @end example
1799
1800 Similarly, each C module containing @code{printf()}/@code{fprintf()}/...
1801 calls with a format string that could be a translated C string (even if
1802 the C string comes from a different C module) should contain the line:
1803
1804 @example
1805 #include <libintl.h>
1806 @end example
1807
1808 The remaining changes to your C sources are discussed in the further
1809 sections of this chapter.
1810
1811 @menu
1812 * Triggering::                  Triggering @code{gettext} Operations
1813 * Preparing Strings::           Preparing Translatable Strings
1814 * Mark Keywords::               How Marks Appear in Sources
1815 * Marking::                     Marking Translatable Strings
1816 * c-format Flag::               Telling something about the following string
1817 * Special cases::               Special Cases of Translatable Strings
1818 * Names::                       Marking Proper Names for Translation
1819 * Libraries::                   Preparing Library Sources
1820 @end menu
1821
1822 @node Triggering, Preparing Strings, Sources, Sources
1823 @section Triggering @code{gettext} Operations
1824
1825 @cindex initialization
1826 The initialization of locale data should be done with more or less
1827 the same code in every program, as demonstrated below:
1828
1829 @example
1830 @group
1831 int
1832 main (int argc, char *argv[])
1833 @{
1834   @dots{}
1835   setlocale (LC_ALL, "");
1836   bindtextdomain (PACKAGE, LOCALEDIR);
1837   textdomain (PACKAGE);
1838   @dots{}
1839 @}
1840 @end group
1841 @end example
1842
1843 @var{PACKAGE} and @var{LOCALEDIR} should be provided either by
1844 @file{config.h} or by the Makefile.  For now consult the @code{gettext}
1845 or @code{hello} sources for more information.
1846
1847 @cindex locale facet, LC_ALL
1848 @cindex locale facet, LC_CTYPE
1849 The use of @code{LC_ALL} might not be appropriate for you.
1850 @code{LC_ALL} includes all locale categories and especially
1851 @code{LC_CTYPE}.  This later category is responsible for determining
1852 character classes with the @code{isalnum} etc. functions from
1853 @file{ctype.h} which could especially for programs, which process some
1854 kind of input language, be wrong.  For example this would mean that a
1855 source code using the @,{c} (c-cedilla character) is runnable in
1856 France but not in the U.S.
1857
1858 Some systems also have problems with parsing numbers using the
1859 @code{scanf} functions if an other but the @code{LC_ALL} locale is used.
1860 The standards say that additional formats but the one known in the
1861 @code{"C"} locale might be recognized.  But some systems seem to reject
1862 numbers in the @code{"C"} locale format.  In some situation, it might
1863 also be a problem with the notation itself which makes it impossible to
1864 recognize whether the number is in the @code{"C"} locale or the local
1865 format.  This can happen if thousands separator characters are used.
1866 Some locales define this character according to the national
1867 conventions to @code{'.'} which is the same character used in the
1868 @code{"C"} locale to denote the decimal point.
1869
1870 So it is sometimes necessary to replace the @code{LC_ALL} line in the
1871 code above by a sequence of @code{setlocale} lines
1872
1873 @example
1874 @group
1875 @{
1876   @dots{}
1877   setlocale (LC_CTYPE, "");
1878   setlocale (LC_MESSAGES, "");
1879   @dots{}
1880 @}
1881 @end group
1882 @end example
1883
1884 @cindex locale facet, LC_CTYPE
1885 @cindex locale facet, LC_COLLATE
1886 @cindex locale facet, LC_MONETARY
1887 @cindex locale facet, LC_NUMERIC
1888 @cindex locale facet, LC_TIME
1889 @cindex locale facet, LC_MESSAGES
1890 @cindex locale facet, LC_RESPONSES
1891 @noindent
1892 On all POSIX conformant systems the locale categories @code{LC_CTYPE},
1893 @code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY},
1894 @code{LC_NUMERIC}, and @code{LC_TIME} are available.  On some systems
1895 which are only ISO C compliant, @code{LC_MESSAGES} is missing, but
1896 a substitute for it is defined in GNU gettext's @code{<libintl.h>}.
1897
1898 Note that changing the @code{LC_CTYPE} also affects the functions
1899 declared in the @code{<ctype.h>} standard header.  If this is not
1900 desirable in your application (for example in a compiler's parser),
1901 you can use a set of substitute functions which hardwire the C locale,
1902 such as found in the @code{<c-ctype.h>} and @code{<c-ctype.c>} files
1903 in the gettext source distribution.
1904
1905 It is also possible to switch the locale forth and back between the
1906 environment dependent locale and the C locale, but this approach is
1907 normally avoided because a @code{setlocale} call is expensive,
1908 because it is tedious to determine the places where a locale switch
1909 is needed in a large program's source, and because switching a locale
1910 is not multithread-safe.
1911
1912 @node Preparing Strings, Mark Keywords, Triggering, Sources
1913 @section Preparing Translatable Strings
1914
1915 @cindex marking strings, preparations
1916 Before strings can be marked for translations, they sometimes need to
1917 be adjusted.  Usually preparing a string for translation is done right
1918 before marking it, during the marking phase which is described in the
1919 next sections.  What you have to keep in mind while doing that is the
1920 following.
1921
1922 @itemize @bullet
1923 @item
1924 Decent English style.
1925
1926 @item
1927 Entire sentences.
1928
1929 @item
1930 Split at paragraphs.
1931
1932 @item
1933 Use format strings instead of string concatenation.
1934 @end itemize
1935
1936 @noindent
1937 Let's look at some examples of these guidelines.
1938
1939 @cindex style
1940 Translatable strings should be in good English style.  If slang language
1941 with abbreviations and shortcuts is used, often translators will not
1942 understand the message and will produce very inappropriate translations.
1943
1944 @example
1945 "%s: is parameter\n"
1946 @end example
1947
1948 @noindent
1949 This is nearly untranslatable: Is the displayed item @emph{a} parameter or
1950 @emph{the} parameter?
1951
1952 @example
1953 "No match"
1954 @end example
1955
1956 @noindent
1957 The ambiguity in this message makes it ununderstandable: Is the program
1958 attempting to set something on fire? Does it mean "The given object does
1959 not match the template"? Does it mean "The template does not fit for any
1960 of the objects"?
1961
1962 @cindex ambiguities
1963 In both cases, adding more words to the message will help both the
1964 translator and the English speaking user.
1965
1966 @cindex sentences
1967 Translatable strings should be entire sentences.  It is often not possible
1968 to translate single verbs or adjectives in a substitutable way.
1969
1970 @example
1971 printf ("File %s is %s protected", filename, rw ? "write" : "read");
1972 @end example
1973
1974 @noindent
1975 Most translators will not look at the source and will thus only see the
1976 string @code{"File %s is %s protected"}, which is unintelligible.  Change
1977 this to
1978
1979 @example
1980 printf (rw ? "File %s is write protected" : "File %s is read protected",
1981         filename);
1982 @end example
1983
1984 @noindent
1985 This way the translator will not only understand the message, she will
1986 also be able to find the appropriate grammatical construction.  The French
1987 translator for example translates "write protected" like "protected
1988 against writing".
1989
1990 Entire sentences are also important because in many languages, the
1991 declination of some word in a sentence depends on the gender or the
1992 number (singular/plural) of another part of the sentence.  There are
1993 usually more interdependencies between words than in English.  The
1994 consequence is that asking a translator to translate two half-sentences
1995 and then combining these two half-sentences through dumb string concatenation
1996 will not work, for many languages, even though it would work for English.
1997 That's why translators need to handle entire sentences.
1998
1999 Often sentences don't fit into a single line.  If a sentence is output
2000 using two subsequent @code{printf} statements, like this
2001
2002 @example
2003 printf ("Locale charset \"%s\" is different from\n", lcharset);
2004 printf ("input file charset \"%s\".\n", fcharset);
2005 @end example
2006
2007 @noindent
2008 the translator would have to translate two half sentences, but nothing
2009 in the POT file would tell her that the two half sentences belong together.
2010 It is necessary to merge the two @code{printf} statements so that the
2011 translator can handle the entire sentence at once and decide at which
2012 place to insert a line break in the translation (if at all):
2013
2014 @example
2015 printf ("Locale charset \"%s\" is different from\n\
2016 input file charset \"%s\".\n", lcharset, fcharset);
2017 @end example
2018
2019 You may now ask: how about two or more adjacent sentences? Like in this case:
2020
2021 @example
2022 puts ("Apollo 13 scenario: Stack overflow handling failed.");
2023 puts ("On the next stack overflow we will crash!!!");
2024 @end example
2025
2026 @noindent
2027 Should these two statements merged into a single one? I would recommend to
2028 merge them if the two sentences are related to each other, because then it
2029 makes it easier for the translator to understand and translate both.  On
2030 the other hand, if one of the two messages is a stereotypic one, occurring
2031 in other places as well, you will do a favour to the translator by not
2032 merging the two.  (Identical messages occurring in several places are
2033 combined by xgettext, so the translator has to handle them once only.)
2034
2035 @cindex paragraphs
2036 Translatable strings should be limited to one paragraph; don't let a
2037 single message be longer than ten lines.  The reason is that when the
2038 translatable string changes, the translator is faced with the task of
2039 updating the entire translated string.  Maybe only a single word will
2040 have changed in the English string, but the translator doesn't see that
2041 (with the current translation tools), therefore she has to proofread
2042 the entire message.
2043
2044 @cindex help option
2045 Many GNU programs have a @samp{--help} output that extends over several
2046 screen pages.  It is a courtesy towards the translators to split such a
2047 message into several ones of five to ten lines each.  While doing that,
2048 you can also attempt to split the documented options into groups,
2049 such as the input options, the output options, and the informative
2050 output options.  This will help every user to find the option he is
2051 looking for.
2052
2053 @cindex string concatenation
2054 @cindex concatenation of strings
2055 Hardcoded string concatenation is sometimes used to construct English
2056 strings:
2057
2058 @example
2059 strcpy (s, "Replace ");
2060 strcat (s, object1);
2061 strcat (s, " with ");
2062 strcat (s, object2);
2063 strcat (s, "?");
2064 @end example
2065
2066 @noindent
2067 In order to present to the translator only entire sentences, and also
2068 because in some languages the translator might want to swap the order
2069 of @code{object1} and @code{object2}, it is necessary to change this
2070 to use a format string:
2071
2072 @example
2073 sprintf (s, "Replace %s with %s?", object1, object2);
2074 @end example
2075
2076 @cindex @code{inttypes.h}
2077 A similar case is compile time concatenation of strings.  The ISO C 99
2078 include file @code{<inttypes.h>} contains a macro @code{PRId64} that
2079 can be used as a formatting directive for outputting an @samp{int64_t}
2080 integer through @code{printf}.  It expands to a constant string, usually
2081 "d" or "ld" or "lld" or something like this, depending on the platform.
2082 Assume you have code like
2083
2084 @example
2085 printf ("The amount is %0" PRId64 "\n", number);
2086 @end example
2087
2088 @noindent
2089 The @code{gettext} tools and library have special support for these
2090 @code{<inttypes.h>} macros.  You can therefore simply write
2091
2092 @example
2093 printf (gettext ("The amount is %0" PRId64 "\n"), number);
2094 @end example
2095
2096 @noindent
2097 The PO file will contain the string "The amount is %0<PRId64>\n".
2098 The translators will provide a translation containing "%0<PRId64>"
2099 as well, and at runtime the @code{gettext} function's result will
2100 contain the appropriate constant string, "d" or "ld" or "lld".
2101
2102 This works only for the predefined @code{<inttypes.h>} macros.  If
2103 you have defined your own similar macros, let's say @samp{MYPRId64},
2104 that are not known to @code{xgettext}, the solution for this problem
2105 is to change the code like this:
2106
2107 @example
2108 char buf1[100];
2109 sprintf (buf1, "%0" MYPRId64, number);
2110 printf (gettext ("The amount is %s\n"), buf1);
2111 @end example
2112
2113 This means, you put the platform dependent code in one statement, and the
2114 internationalization code in a different statement.  Note that a buffer length
2115 of 100 is safe, because all available hardware integer types are limited to
2116 128 bits, and to print a 128 bit integer one needs at most 54 characters,
2117 regardless whether in decimal, octal or hexadecimal.
2118
2119 @cindex Java, string concatenation
2120 @cindex C#, string concatenation
2121 All this applies to other programming languages as well.  For example, in
2122 Java and C#, string contenation is very frequently used, because it is a
2123 compiler built-in operator.  Like in C, in Java, you would change
2124
2125 @example
2126 System.out.println("Replace "+object1+" with "+object2+"?");
2127 @end example
2128
2129 @noindent
2130 into a statement involving a format string:
2131
2132 @example
2133 System.out.println(
2134     MessageFormat.format("Replace @{0@} with @{1@}?",
2135                          new Object[] @{ object1, object2 @}));
2136 @end example
2137
2138 @noindent
2139 Similarly, in C#, you would change
2140
2141 @example
2142 Console.WriteLine("Replace "+object1+" with "+object2+"?");
2143 @end example
2144
2145 @noindent
2146 into a statement involving a format string:
2147
2148 @example
2149 Console.WriteLine(
2150     String.Format("Replace @{0@} with @{1@}?", object1, object2));
2151 @end example
2152
2153 @node Mark Keywords, Marking, Preparing Strings, Sources
2154 @section How Marks Appear in Sources
2155 @cindex marking strings that require translation
2156
2157 All strings requiring translation should be marked in the C sources.  Marking
2158 is done in such a way that each translatable string appears to be
2159 the sole argument of some function or preprocessor macro.  There are
2160 only a few such possible functions or macros meant for translation,
2161 and their names are said to be marking keywords.  The marking is
2162 attached to strings themselves, rather than to what we do with them.
2163 This approach has more uses.  A blatant example is an error message
2164 produced by formatting.  The format string needs translation, as
2165 well as some strings inserted through some @samp{%s} specification
2166 in the format, while the result from @code{sprintf} may have so many
2167 different instances that it is impractical to list them all in some
2168 @samp{error_string_out()} routine, say.
2169
2170 This marking operation has two goals.  The first goal of marking
2171 is for triggering the retrieval of the translation, at run time.
2172 The keyword are possibly resolved into a routine able to dynamically
2173 return the proper translation, as far as possible or wanted, for the
2174 argument string.  Most localizable strings are found in executable
2175 positions, that is, attached to variables or given as parameters to
2176 functions.  But this is not universal usage, and some translatable
2177 strings appear in structured initializations.  @xref{Special cases}.
2178
2179 The second goal of the marking operation is to help @code{xgettext}
2180 at properly extracting all translatable strings when it scans a set
2181 of program sources and produces PO file templates.
2182
2183 The canonical keyword for marking translatable strings is
2184 @samp{gettext}, it gave its name to the whole GNU @code{gettext}
2185 package.  For packages making only light use of the @samp{gettext}
2186 keyword, macro or function, it is easily used @emph{as is}.  However,
2187 for packages using the @code{gettext} interface more heavily, it
2188 is usually more convenient to give the main keyword a shorter, less
2189 obtrusive name.  Indeed, the keyword might appear on a lot of strings
2190 all over the package, and programmers usually do not want nor need
2191 their program sources to remind them forcefully, all the time, that they
2192 are internationalized.  Further, a long keyword has the disadvantage
2193 of using more horizontal space, forcing more indentation work on
2194 sources for those trying to keep them within 79 or 80 columns.
2195
2196 @cindex @code{_}, a macro to mark strings for translation
2197 Many packages use @samp{_} (a simple underline) as a keyword,
2198 and write @samp{_("Translatable string")} instead of @samp{gettext
2199 ("Translatable string")}.  Further, the coding rule, from GNU standards,
2200 wanting that there is a space between the keyword and the opening
2201 parenthesis is relaxed, in practice, for this particular usage.
2202 So, the textual overhead per translatable string is reduced to
2203 only three characters: the underline and the two parentheses.
2204 However, even if GNU @code{gettext} uses this convention internally,
2205 it does not offer it officially.  The real, genuine keyword is truly
2206 @samp{gettext} indeed.  It is fairly easy for those wanting to use
2207 @samp{_} instead of @samp{gettext} to declare:
2208
2209 @example
2210 #include <libintl.h>
2211 #define _(String) gettext (String)
2212 @end example
2213
2214 @noindent
2215 instead of merely using @samp{#include <libintl.h>}.
2216
2217 Later on, the maintenance is relatively easy.  If, as a programmer,
2218 you add or modify a string, you will have to ask yourself if the
2219 new or altered string requires translation, and include it within
2220 @samp{_()} if you think it should be translated.  @samp{"%s: %d"} is
2221 an example of string @emph{not} requiring translation!
2222
2223 @node Marking, c-format Flag, Mark Keywords, Sources
2224 @section Marking Translatable Strings
2225 @emindex marking strings for translation
2226
2227 In PO mode, one set of features is meant more for the programmer than
2228 for the translator, and allows him to interactively mark which strings,
2229 in a set of program sources, are translatable, and which are not.
2230 Even if it is a fairly easy job for a programmer to find and mark
2231 such strings by other means, using any editor of his choice, PO mode
2232 makes this work more comfortable.  Further, this gives translators
2233 who feel a little like programmers, or programmers who feel a little
2234 like translators, a tool letting them work at marking translatable
2235 strings in the program sources, while simultaneously producing a set of
2236 translation in some language, for the package being internationalized.
2237
2238 @emindex @code{etags}, using for marking strings
2239 The set of program sources, targetted by the PO mode commands describe
2240 here, should have an Emacs tags table constructed for your project,
2241 prior to using these PO file commands.  This is easy to do.  In any
2242 shell window, change the directory to the root of your project, then
2243 execute a command resembling:
2244
2245 @example
2246 etags src/*.[hc] lib/*.[hc]
2247 @end example
2248
2249 @noindent
2250 presuming here you want to process all @file{.h} and @file{.c} files
2251 from the @file{src/} and @file{lib/} directories.  This command will
2252 explore all said files and create a @file{TAGS} file in your root
2253 directory, somewhat summarizing the contents using a special file
2254 format Emacs can understand.
2255
2256 @emindex @file{TAGS}, and marking translatable strings
2257 For packages following the GNU coding standards, there is
2258 a make goal @code{tags} or @code{TAGS} which constructs the tag files in
2259 all directories and for all files containing source code.
2260
2261 Once your @file{TAGS} file is ready, the following commands assist
2262 the programmer at marking translatable strings in his set of sources.
2263 But these commands are necessarily driven from within a PO file
2264 window, and it is likely that you do not even have such a PO file yet.
2265 This is not a problem at all, as you may safely open a new, empty PO
2266 file, mainly for using these commands.  This empty PO file will slowly
2267 fill in while you mark strings as translatable in your program sources.
2268
2269 @table @kbd
2270 @item ,
2271 @efindex ,@r{, PO Mode command}
2272 Search through program sources for a string which looks like a
2273 candidate for translation (@code{po-tags-search}).
2274
2275 @item M-,
2276 @efindex M-,@r{, PO Mode command}
2277 Mark the last string found with @samp{_()} (@code{po-mark-translatable}).
2278
2279 @item M-.
2280 @efindex M-.@r{, PO Mode command}
2281 Mark the last string found with a keyword taken from a set of possible
2282 keywords.  This command with a prefix allows some management of these
2283 keywords (@code{po-select-mark-and-mark}).
2284
2285 @end table
2286
2287 @efindex po-tags-search@r{, PO Mode command}
2288 The @kbd{,} (@code{po-tags-search}) command searches for the next
2289 occurrence of a string which looks like a possible candidate for
2290 translation, and displays the program source in another Emacs window,
2291 positioned in such a way that the string is near the top of this other
2292 window.  If the string is too big to fit whole in this window, it is
2293 positioned so only its end is shown.  In any case, the cursor
2294 is left in the PO file window.  If the shown string would be better
2295 presented differently in different native languages, you may mark it
2296 using @kbd{M-,} or @kbd{M-.}.  Otherwise, you might rather ignore it
2297 and skip to the next string by merely repeating the @kbd{,} command.
2298
2299 A string is a good candidate for translation if it contains a sequence
2300 of three or more letters.  A string containing at most two letters in
2301 a row will be considered as a candidate if it has more letters than
2302 non-letters.  The command disregards strings containing no letters,
2303 or isolated letters only.  It also disregards strings within comments,
2304 or strings already marked with some keyword PO mode knows (see below).
2305
2306 If you have never told Emacs about some @file{TAGS} file to use, the
2307 command will request that you specify one from the minibuffer, the
2308 first time you use the command.  You may later change your @file{TAGS}
2309 file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
2310 which will ask you to name the precise @file{TAGS} file you want
2311 to use.  @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.
2312
2313 Each time you use the @kbd{,} command, the search resumes from where it was
2314 left by the previous search, and goes through all program sources,
2315 obeying the @file{TAGS} file, until all sources have been processed.
2316 However, by giving a prefix argument to the command @w{(@kbd{C-u
2317 ,})}, you may request that the search be restarted all over again
2318 from the first program source; but in this case, strings that you
2319 recently marked as translatable will be automatically skipped.
2320
2321 Using this @kbd{,} command does not prevent using of other regular
2322 Emacs tags commands.  For example, regular @code{tags-search} or
2323 @code{tags-query-replace} commands may be used without disrupting the
2324 independent @kbd{,} search sequence.  However, as implemented, the
2325 @emph{initial} @kbd{,} command (or the @kbd{,} command is used with a
2326 prefix) might also reinitialize the regular Emacs tags searching to the
2327 first tags file, this reinitialization might be considered spurious.
2328
2329 @efindex po-mark-translatable@r{, PO Mode command}
2330 @efindex po-select-mark-and-mark@r{, PO Mode command}
2331 The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
2332 recently found string with the @samp{_} keyword.  The @kbd{M-.}
2333 (@code{po-select-mark-and-mark}) command will request that you type
2334 one keyword from the minibuffer and use that keyword for marking
2335 the string.  Both commands will automatically create a new PO file
2336 untranslated entry for the string being marked, and make it the
2337 current entry (making it easy for you to immediately proceed to its
2338 translation, if you feel like doing it right away).  It is possible
2339 that the modifications made to the program source by @kbd{M-,} or
2340 @kbd{M-.} render some source line longer than 80 columns, forcing you
2341 to break and re-indent this line differently.  You may use the @kbd{O}
2342 command from PO mode, or any other window changing command from
2343 Emacs, to break out into the program source window, and do any
2344 needed adjustments.  You will have to use some regular Emacs command
2345 to return the cursor to the PO file window, if you want command
2346 @kbd{,} for the next string, say.
2347
2348 The @kbd{M-.} command has a few built-in speedups, so you do not
2349 have to explicitly type all keywords all the time.  The first such
2350 speedup is that you are presented with a @emph{preferred} keyword,
2351 which you may accept by merely typing @kbd{@key{RET}} at the prompt.
2352 The second speedup is that you may type any non-ambiguous prefix of the
2353 keyword you really mean, and the command will complete it automatically
2354 for you.  This also means that PO mode has to @emph{know} all
2355 your possible keywords, and that it will not accept mistyped keywords.
2356
2357 If you reply @kbd{?} to the keyword request, the command gives a
2358 list of all known keywords, from which you may choose.  When the
2359 command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
2360 updating any program source or PO file buffer, and does some simple
2361 keyword management instead.  In this case, the command asks for a
2362 keyword, written in full, which becomes a new allowed keyword for
2363 later @kbd{M-.} commands.  Moreover, this new keyword automatically
2364 becomes the @emph{preferred} keyword for later commands.  By typing
2365 an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
2366 changes the @emph{preferred} keyword and does nothing more.
2367
2368 All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
2369 when scanning for strings, and strings already marked by any of those
2370 known keywords are automatically skipped.  If many PO files are opened
2371 simultaneously, each one has its own independent set of known keywords.
2372 There is no provision in PO mode, currently, for deleting a known
2373 keyword, you have to quit the file (maybe using @kbd{q}) and reopen
2374 it afresh.  When a PO file is newly brought up in an Emacs window, only
2375 @samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
2376 is preferred for the @kbd{M-.} command.  In fact, this is not useful to
2377 prefer @samp{_}, as this one is already built in the @kbd{M-,} command.
2378
2379 @node c-format Flag, Special cases, Marking, Sources
2380 @section Special Comments preceding Keywords
2381
2382 @c FIXME document c-format and no-c-format.
2383
2384 @cindex format strings
2385 In C programs strings are often used within calls of functions from the
2386 @code{printf} family.  The special thing about these format strings is
2387 that they can contain format specifiers introduced with @kbd{%}.  Assume
2388 we have the code
2389
2390 @example
2391 printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
2392 @end example
2393
2394 @noindent
2395 A possible German translation for the above string might be:
2396
2397 @example
2398 "%d Zeichen lang ist die Zeichenkette `%s'"
2399 @end example
2400
2401 A C programmer, even if he cannot speak German, will recognize that
2402 there is something wrong here.  The order of the two format specifiers
2403 is changed but of course the arguments in the @code{printf} don't have.
2404 This will most probably lead to problems because now the length of the
2405 string is regarded as the address.
2406
2407 To prevent errors at runtime caused by translations the @code{msgfmt}
2408 tool can check statically whether the arguments in the original and the
2409 translation string match in type and number.  If this is not the case
2410 and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt}
2411 will give an error and refuse to produce a MO file.  Thus consequent
2412 use of @samp{msgfmt -c} will catch the error, so that it cannot cause
2413 cause problems at runtime.
2414
2415 @noindent
2416 If the word order in the above German translation would be correct one
2417 would have to write
2418
2419 @example
2420 "%2$d Zeichen lang ist die Zeichenkette `%1$s'"
2421 @end example
2422
2423 @noindent
2424 The routines in @code{msgfmt} know about this special notation.
2425
2426 Because not all strings in a program must be format strings it is not
2427 useful for @code{msgfmt} to test all the strings in the @file{.po} file.
2428 This might cause problems because the string might contain what looks
2429 like a format specifier, but the string is not used in @code{printf}.
2430
2431 Therefore the @code{xgettext} adds a special tag to those messages it
2432 thinks might be a format string.  There is no absolute rule for this,
2433 only a heuristic.  In the @file{.po} file the entry is marked using the
2434 @code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}).
2435
2436 @kwindex c-format@r{, and @code{xgettext}}
2437 @kwindex no-c-format@r{, and @code{xgettext}}
2438 The careful reader now might say that this again can cause problems.
2439 The heuristic might guess it wrong.  This is true and therefore
2440 @code{xgettext} knows about a special kind of comment which lets
2441 the programmer take over the decision.  If in the same line as or
2442 the immediately preceding line to the @code{gettext} keyword
2443 the @code{xgettext} program finds a comment containing the words
2444 @code{xgettext:c-format}, it will mark the string in any case with
2445 the @code{c-format} flag.  This kind of comment should be used when
2446 @code{xgettext} does not recognize the string as a format string but
2447 it really is one and it should be tested.  Please note that when the
2448 comment is in the same line as the @code{gettext} keyword, it must be
2449 before the string to be translated.
2450
2451 This situation happens quite often.  The @code{printf} function is often
2452 called with strings which do not contain a format specifier.  Of course
2453 one would normally use @code{fputs} but it does happen.  In this case
2454 @code{xgettext} does not recognize this as a format string but what
2455 happens if the translation introduces a valid format specifier?  The
2456 @code{printf} function will try to access one of the parameters but none
2457 exists because the original code does not pass any parameters.
2458
2459 @code{xgettext} of course could make a wrong decision the other way
2460 round, i.e. a string marked as a format string actually is not a format
2461 string.  In this case the @code{msgfmt} might give too many warnings and
2462 would prevent translating the @file{.po} file.  The method to prevent
2463 this wrong decision is similar to the one used above, only the comment
2464 to use must contain the string @code{xgettext:no-c-format}.
2465
2466 If a string is marked with @code{c-format} and this is not correct the
2467 user can find out who is responsible for the decision.  See
2468 @ref{xgettext Invocation} to see how the @code{--debug} option can be
2469 used for solving this problem.
2470
2471 @node Special cases, Names, c-format Flag, Sources
2472 @section Special Cases of Translatable Strings
2473
2474 @cindex marking string initializers
2475 The attentive reader might now point out that it is not always possible
2476 to mark translatable string with @code{gettext} or something like this.
2477 Consider the following case:
2478
2479 @example
2480 @group
2481 @{
2482   static const char *messages[] = @{
2483     "some very meaningful message",
2484     "and another one"
2485   @};
2486   const char *string;
2487   @dots{}
2488   string
2489     = index > 1 ? "a default message" : messages[index];
2490
2491   fputs (string);
2492   @dots{}
2493 @}
2494 @end group
2495 @end example
2496
2497 While it is no problem to mark the string @code{"a default message"} it
2498 is not possible to mark the string initializers for @code{messages}.
2499 What is to be done?  We have to fulfill two tasks.  First we have to mark the
2500 strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
2501 can find them, and second we have to translate the string at runtime
2502 before printing them.
2503
2504 The first task can be fulfilled by creating a new keyword, which names a
2505 no-op.  For the second we have to mark all access points to a string
2506 from the array.  So one solution can look like this:
2507
2508 @example
2509 @group
2510 #define gettext_noop(String) String
2511
2512 @{
2513   static const char *messages[] = @{
2514     gettext_noop ("some very meaningful message"),
2515     gettext_noop ("and another one")
2516   @};
2517   const char *string;
2518   @dots{}
2519   string
2520     = index > 1 ? gettext ("a default message") : gettext (messages[index]);
2521
2522   fputs (string);
2523   @dots{}
2524 @}
2525 @end group
2526 @end example
2527
2528 Please convince yourself that the string which is written by
2529 @code{fputs} is translated in any case.  How to get @code{xgettext} know
2530 the additional keyword @code{gettext_noop} is explained in @ref{xgettext
2531 Invocation}.
2532
2533 The above is of course not the only solution.  You could also come along
2534 with the following one:
2535
2536 @example
2537 @group
2538 #define gettext_noop(String) String
2539
2540 @{
2541   static const char *messages[] = @{
2542     gettext_noop ("some very meaningful message",
2543     gettext_noop ("and another one")
2544   @};
2545   const char *string;
2546   @dots{}
2547   string
2548     = index > 1 ? gettext_noop ("a default message") : messages[index];
2549
2550   fputs (gettext (string));
2551   @dots{}
2552 @}
2553 @end group
2554 @end example
2555
2556 But this has a drawback.  The programmer has to take care that
2557 he uses @code{gettext_noop} for the string @code{"a default message"}.
2558 A use of @code{gettext} could have in rare cases unpredictable results.
2559
2560 One advantage is that you need not make control flow analysis to make
2561 sure the output is really translated in any case.  But this analysis is
2562 generally not very difficult.  If it should be in any situation you can
2563 use this second method in this situation.
2564
2565 @node Names, Libraries, Special cases, Sources
2566 @section Marking Proper Names for Translation
2567
2568 Should names of persons, cities, locations etc. be marked for translation
2569 or not?  People who only know languages that can be written with Latin
2570 letters (English, Spanish, French, German, etc.) are tempted to say ``no'',
2571 because names usually do not change when transported between these languages.
2572 However, in general when translating from one script to another, names
2573 are translated too, usually phonetically or by transliteration.  For
2574 example, Russian or Greek names are converted to the Latin alphabet when
2575 being translated to English, and English or French names are converted
2576 to the Katakana script when being translated to Japanese.  This is
2577 necessary because the speakers of the target language in general cannot
2578 read the script the name is originally written in.
2579
2580 As a programmer, you should therefore make sure that names are marked
2581 for translation, with a special comment telling the translators that it
2582 is a proper name and how to pronounce it.  Like this:
2583
2584 @example
2585 @group
2586 printf (_("Written by %s.\n"),
2587         /* TRANSLATORS: This is a proper name.  See the gettext
2588            manual, section Names.  Note this is actually a non-ASCII
2589            name: The first name is (with Unicode escapes)
2590            "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2591            Pronounciation is like "fraa-swa pee-nar".  */
2592         _("Francois Pinard"));
2593 @end group
2594 @end example
2595
2596 As a translator, you should use some care when translating names, because
2597 it is frustrating if people see their names mutilated or distorted.  If
2598 your language uses the Latin script, all you need to do is to reproduce
2599 the name as perfectly as you can within the usual character set of your
2600 language.  In this particular case, this means to provide a translation
2601 containing the c-cedilla character.  If your language uses a different
2602 script and the people speaking it don't usually read Latin words, it means
2603 transliteration; but you should still give, in parentheses, the original
2604 writing of the name -- for the sake of the people that do read the Latin
2605 script.  Here is an example, using Greek as the target script:
2606
2607 @example
2608 @group
2609 #. This is a proper name.  See the gettext
2610 #. manual, section Names.  Note this is actually a non-ASCII
2611 #. name: The first name is (with Unicode escapes)
2612 #. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
2613 #. Pronounciation is like "fraa-swa pee-nar".
2614 msgid "Francois Pinard"
2615 msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
2616        " (Francois Pinard)"
2617 @end group
2618 @end example
2619
2620 Because translation of names is such a sensitive domain, it is a good
2621 idea to test your translation before submitting it.
2622
2623 The translation project @url{http://sourceforge.net/projects/translation}
2624 has set up a POT file and translation domain consisting of program author
2625 names, with better facilities for the translator than those presented here.
2626 Namely, there the original name is written directly in Unicode (rather
2627 than with Unicode escapes or HTML entities), and the pronounciation is
2628 denoted using the International Phonetic Alphabet (see
2629 @url{http://www.wikipedia.org/wiki/International_Phonetic_Alphabet}).
2630
2631 However, we don't recommend this approach for all POT files in all packages,
2632 because this would force translators to use PO files in UTF-8 encoding,
2633 which is - in the current state of software (as of 2003) - a major hassle
2634 for translators using GNU Emacs or XEmacs with po-mode.
2635
2636 @node Libraries,  , Names, Sources
2637 @section Preparing Library Sources
2638
2639 When you are preparing a library, not a program, for the use of
2640 @code{gettext}, only a few details are different.  Here we assume that
2641 the library has a translation domain and a POT file of its own.  (If
2642 it uses the translation domain and POT file of the main program, then
2643 the previous sections apply without changes.)
2644
2645 @enumerate
2646 @item
2647 The library code doesn't call @code{setlocale (LC_ALL, "")}.  It's the
2648 responsibility of the main program to set the locale.  The library's
2649 documentation should mention this fact, so that developers of programs
2650 using the library are aware of it.
2651
2652 @item
2653 The library code doesn't call @code{textdomain (PACKAGE)}, because it
2654 would interfere with the text domain set by the main program.
2655
2656 @item
2657 The initialization code for a program was
2658
2659 @smallexample
2660   setlocale (LC_ALL, "");
2661   bindtextdomain (PACKAGE, LOCALEDIR);
2662   textdomain (PACKAGE);
2663 @end smallexample
2664
2665 @noindent
2666 For a library it is reduced to
2667
2668 @smallexample
2669   bindtextdomain (PACKAGE, LOCALEDIR);
2670 @end smallexample
2671
2672 @noindent
2673 If your library's API doesn't already have an initialization function,
2674 you need to create one, containing at least the @code{bindtextdomain}
2675 invocation.  However, you usually don't need to export and document this
2676 initialization function: It is sufficient that all entry points of the
2677 library call the initialization function if it hasn't been called before.
2678 The typical idiom used to achieve this is a static boolean variable that
2679 indicates whether the initialization function has been called. Like this:
2680
2681 @example
2682 @group
2683 static bool libfoo_initialized;
2684
2685 static void
2686 libfoo_initialize (void)
2687 @{
2688   bindtextdomain (PACKAGE, LOCALEDIR);
2689   libfoo_initialized = true;
2690 @}
2691
2692 /* This function is part of the exported API.  */
2693 struct foo *
2694 create_foo (...)
2695 @{
2696   /* Must ensure the initialization is performed.  */
2697   if (!libfoo_initialized)
2698     libfoo_initialize ();
2699   ...
2700 @}
2701
2702 /* This function is part of the exported API.  The argument must be
2703    non-NULL and have been created through create_foo().  */
2704 int
2705 foo_refcount (struct foo *argument)
2706 @{
2707   /* No need to invoke the initialization function here, because
2708      create_foo() must already have been called before.  */
2709   ...
2710 @}
2711 @end group
2712 @end example
2713
2714 @item
2715 The usual declaration of the @samp{_} macro in each source file was
2716
2717 @smallexample
2718 #include <libintl.h>
2719 #define _(String) gettext (String)
2720 @end smallexample
2721
2722 @noindent
2723 for a program.  For a library, which has its own translation domain,
2724 it reads like this:
2725
2726 @smallexample
2727 #include <libintl.h>
2728 #define _(String) dgettext (PACKAGE, String)
2729 @end smallexample
2730
2731 In other words, @code{dgettext} is used instead of @code{gettext}.
2732 Similary, the @code{dngettext} function should be used in place of the
2733 @code{ngettext} function.
2734 @end enumerate
2735
2736 @node Template, Creating, Sources, Top
2737 @chapter Making the PO Template File
2738 @cindex PO template file
2739
2740 After preparing the sources, the programmer creates a PO template file.
2741 This section explains how to use @code{xgettext} for this purpose.
2742
2743 @code{xgettext} creates a file named @file{@var{domainname}.po}.  You
2744 should then rename it to @file{@var{domainname}.pot}.  (Why doesn't
2745 @code{xgettext} create it under the name @file{@var{domainname}.pot}
2746 right away?  The answer is: for historical reasons.  When @code{xgettext}
2747 was specified, the distinction between a PO file and PO file template
2748 was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.)
2749
2750 @c FIXME: Rewrite.
2751
2752 @menu
2753 * xgettext Invocation::         Invoking the @code{xgettext} Program
2754 @end menu
2755
2756 @node xgettext Invocation,  , Template, Template
2757 @section Invoking the @code{xgettext} Program
2758
2759 @include xgettext.texi
2760
2761 @node Creating, Updating, Template, Top
2762 @chapter Creating a New PO File
2763 @cindex creating a new PO file
2764
2765 When starting a new translation, the translator creates a file called
2766 @file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template
2767 file with modifications in the initial comments (at the beginning of the file)
2768 and in the header entry (the first entry, near the beginning of the file).
2769
2770 The easiest way to do so is by use of the @samp{msginit} program.
2771 For example:
2772
2773 @example
2774 $ cd @var{PACKAGE}-@var{VERSION}
2775 $ cd po
2776 $ msginit
2777 @end example
2778
2779 The alternative way is to do the copy and modifications by hand.
2780 To do so, the translator copies @file{@var{package}.pot} to
2781 @file{@var{LANG}.po}.  Then she modifies the initial comments and
2782 the header entry of this file.
2783
2784 @menu
2785 * msginit Invocation::          Invoking the @code{msginit} Program
2786 * Header Entry::                Filling in the Header Entry
2787 @end menu
2788
2789 @node msginit Invocation, Header Entry, Creating, Creating
2790 @section Invoking the @code{msginit} Program
2791
2792 @include msginit.texi
2793
2794 @node Header Entry,  , msginit Invocation, Creating
2795 @section Filling in the Header Entry
2796 @cindex header entry of a PO file
2797
2798 The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
2799 "FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible
2800 information.  This can be done in any text editor; if Emacs is used
2801 and it switched to PO mode automatically (because it has recognized
2802 the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.
2803
2804 Modifying the header entry can already be done using PO mode: in Emacs,
2805 type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the
2806 entry.  You should fill in the following fields.
2807
2808 @table @asis
2809 @item Project-Id-Version
2810 This is the name and version of the package.
2811
2812 @item Report-Msgid-Bugs-To
2813 This has already been filled in by @code{xgettext}.  It contains an email
2814 address or URL where you can report bugs in the untranslated strings:
2815
2816 @itemize -
2817 @item Strings which are not entire sentences, see the maintainer guidelines
2818 in @ref{Preparing Strings}.
2819 @item Strings which use unclear terms or require additional context to be
2820 understood.
2821 @item Strings which make invalid assumptions about notation of date, time or
2822 money.
2823 @item Pluralisation problems.
2824 @item Incorrect English spelling.
2825 @item Incorrect formatting.
2826 @end itemize
2827
2828 @item POT-Creation-Date
2829 This has already been filled in by @code{xgettext}.
2830
2831 @item PO-Revision-Date
2832 You don't need to fill this in.  It will be filled by the Emacs PO mode
2833 when you save the file.
2834
2835 @item Last-Translator
2836 Fill in your name and email address (without double quotes).
2837
2838 @item Language-Team
2839 Fill in the English name of the language, and the email address or
2840 homepage URL of the language team you are part of.
2841
2842 Before starting a translation, it is a good idea to get in touch with
2843 your translation team, not only to make sure you don't do duplicated work,
2844 but also to coordinate difficult linguistic issues.
2845
2846 @cindex list of translation teams, where to find
2847 In the Free Translation Project, each translation team has its own mailing
2848 list.  The up-to-date list of teams can be found at the Free Translation
2849 Project's homepage, @uref{http://www.iro.umontreal.ca/contrib/po/HTML/},
2850 in the "National teams" area.
2851
2852 @item Content-Type
2853 @cindex encoding of PO files
2854 @cindex charset of PO files
2855 Replace @samp{CHARSET} with the character encoding used for your language,
2856 in your locale, or UTF-8.  This field is needed for correct operation of the
2857 @code{msgmerge} and @code{msgfmt} programs, as well as for users whose
2858 locale's character encoding differs from yours (see @ref{Charset conversion}).
2859
2860 @cindex @code{locale} program
2861 You get the character encoding of your locale by running the shell command
2862 @samp{locale charmap}.  If the result is @samp{C} or @samp{ANSI_X3.4-1968},
2863 which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your
2864 locale is not correctly configured.  In this case, ask your translation
2865 team which charset to use.  @samp{ASCII} is not usable for any language
2866 except Latin.
2867
2868 @cindex encoding list
2869 Because the PO files must be portable to operating systems with less advanced
2870 internationalization facilities, the character encodings that can be used
2871 are limited to those supported by both GNU @code{libc} and GNU
2872 @code{libiconv}.  These are:
2873 @code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},
2874 @code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},
2875 @code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14},
2876 @code{ISO-8859-15},
2877 @code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T},
2878 @code{CP850}, @code{CP866}, @code{CP874},
2879 @code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},
2880 @code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},
2881 @code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},
2882 @code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},
2883 @code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}.
2884
2885 @c This data is taken from glibc/localedata/SUPPORTED.
2886 @cindex Linux
2887 In the GNU system, the following encodings are frequently used for the
2888 corresponding languages.
2889
2890 @cindex encoding for your language
2891 @itemize
2892 @item @code{ISO-8859-1} for
2893 Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch,
2894 English, Estonian, Faroese, Finnish, French, Galician, German,
2895 Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx,
2896 Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek,
2897 Walloon,
2898 @item @code{ISO-8859-2} for
2899 Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak,
2900 Slovenian,
2901 @item @code{ISO-8859-3} for Maltese,
2902 @item @code{ISO-8859-5} for Macedonian, Serbian,
2903 @item @code{ISO-8859-6} for Arabic,
2904 @item @code{ISO-8859-7} for Greek,
2905 @item @code{ISO-8859-8} for Hebrew,
2906 @item @code{ISO-8859-9} for Turkish,
2907 @item @code{ISO-8859-13} for Latvian, Lithuanian, Maori,
2908 @item @code{ISO-8859-14} for Welsh,
2909 @item @code{ISO-8859-15} for
2910 Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
2911 Italian, Portuguese, Spanish, Swedish, Walloon,
2912 @item @code{KOI8-R} for Russian,
2913 @item @code{KOI8-U} for Ukrainian,
2914 @item @code{KOI8-T} for Tajik,
2915 @item @code{CP1251} for Bulgarian, Byelorussian,
2916 @item @code{GB2312}, @code{GBK}, @code{GB18030}
2917 for simplified writing of Chinese,
2918 @item @code{BIG5}, @code{BIG5-HKSCS}
2919 for traditional writing of Chinese,
2920 @item @code{EUC-JP} for Japanese,
2921 @item @code{EUC-KR} for Korean,
2922 @item @code{TIS-620} for Thai,
2923 @item @code{GEORGIAN-PS} for Georgian,
2924 @item @code{UTF-8} for any language, including those listed above.
2925 @end itemize
2926
2927 @cindex quote characters, use in PO files
2928 @cindex quotation marks
2929 When single quote characters or double quote characters are used in
2930 translations for your language, and your locale's encoding is one of the
2931 ISO-8859-* charsets, it is best if you create your PO files in UTF-8
2932 encoding, instead of your locale's encoding.  This is because in UTF-8
2933 the real quote characters can be represented (single quote characters:
2934 U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
2935 ISO-8859-* charsets has them all.  Users in UTF-8 locales will see the
2936 real quote characters, whereas users in ISO-8859-* locales will see the
2937 vertical apostrophe and the vertical double quote instead (because that's
2938 what the character set conversion will transliterate them to).
2939
2940 @cindex @code{xmodmap} program, and typing quotation marks
2941 To enter such quote characters under X11, you can change your keyboard
2942 mapping using the @code{xmodmap} program.  The X11 names of the quote
2943 characters are "leftsinglequotemark", "rightsinglequotemark",
2944 "leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
2945 "doublelowquotemark".
2946
2947 Note that only recent versions of GNU Emacs support the UTF-8 encoding:
2948 Emacs 20 with Mule-UCS, and Emacs 21.  As of January 2001, XEmacs doesn't
2949 support the UTF-8 encoding.
2950
2951 The character encoding name can be written in either upper or lower case.
2952 Usually upper case is preferred.
2953
2954 @item Content-Transfer-Encoding
2955 Set this to @code{8bit}.
2956
2957 @item Plural-Forms
2958 This field is optional.  It is only needed if the PO file has plural forms.
2959 You can find them by searching for the @samp{msgid_plural} keyword.  The
2960 format of the plural forms field is described in @ref{Plural forms}.
2961 @end table
2962
2963 @node Updating, Manipulating, Creating, Top
2964 @chapter Updating Existing PO Files
2965
2966 @c FIXME: Rewrite.
2967
2968 @menu
2969 * msgmerge Invocation::         Invoking the @code{msgmerge} Program
2970 * Translated Entries::          Translated Entries
2971 * Fuzzy Entries::               Fuzzy Entries
2972 * Untranslated Entries::        Untranslated Entries
2973 * Obsolete Entries::            Obsolete Entries
2974 * Modifying Translations::      Modifying Translations
2975 * Modifying Comments::          Modifying Comments
2976 * Subedit::                     Mode for Editing Translations
2977 * C Sources Context::           C Sources Context
2978 * Auxiliary::                   Consulting Auxiliary PO Files
2979 * Compendium::                  Using Translation Compendia
2980 @end menu
2981
2982 @node msgmerge Invocation, Translated Entries, Updating, Updating
2983 @section Invoking the @code{msgmerge} Program
2984
2985 @include msgmerge.texi
2986
2987 @node Translated Entries, Fuzzy Entries, msgmerge Invocation, Updating
2988 @section Translated Entries
2989 @cindex translated entries
2990
2991 Each PO file entry for which the @code{msgstr} field has been filled with
2992 a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
2993 is said to be a @dfn{translated} entry.  Only translated entries will
2994 later be compiled by GNU @code{msgfmt} and become usable in programs.
2995 Other entry types will be excluded; translation will not occur for them.
2996
2997 @emindex moving by translated entries
2998 Some commands are more specifically related to translated entry processing.
2999
3000 @table @kbd
3001 @item t
3002 @efindex t@r{, PO Mode command}
3003 Find the next translated entry (@code{po-next-translated-entry}).
3004
3005 @item T
3006 @efindex T@r{, PO Mode command}
3007 Find the previous translated entry (@code{po-previous-translated-entry}).
3008
3009 @end table
3010
3011 @efindex t@r{, PO Mode command}
3012 @efindex po-next-translated-entry@r{, PO Mode command}
3013 @efindex T@r{, PO Mode command}
3014 @efindex po-previous-translated-entry@r{, PO Mode command}
3015 The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T}
3016 (@code{po-previous-translated-entry}) move forwards or backwards, chasing
3017 for an translated entry.  If none is found, the search is extended and
3018 wraps around in the PO file buffer.
3019
3020 @evindex po-auto-fuzzy-on-edit@r{, PO Mode variable}
3021 Translated entries usually result from the translator having edited in
3022 a translation for them, @ref{Modifying Translations}.  However, if the
3023 variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, the entry having
3024 received a new translation first becomes a fuzzy entry, which ought to
3025 be later unfuzzied before becoming an official, genuine translated entry.
3026 @xref{Fuzzy Entries}.
3027
3028 @node Fuzzy Entries, Untranslated Entries, Translated Entries, Updating
3029 @section Fuzzy Entries
3030 @cindex fuzzy entries
3031
3032 @cindex attributes of a PO file entry
3033 @cindex attribute, fuzzy
3034 Each PO file entry may have a set of @dfn{attributes}, which are
3035 qualities given a name and explicitly associated with the translation,
3036 using a special system comment.  One of these attributes
3037 has the name @code{fuzzy}, and entries having this attribute are said
3038 to have a fuzzy translation.  They are called fuzzy entries, for short.
3039
3040 Fuzzy entries, even if they account for translated entries for
3041 most other purposes, usually call for revision by the translator.
3042 Those may be produced by applying the program @code{msgmerge} to
3043 update an older translated PO files according to a new PO template
3044 file, when this tool hypothesises that some new @code{msgid} has
3045 been modified only slightly out of an older one, and chooses to pair
3046 what it thinks to be the old translation for the new modified entry.
3047 The slight alteration in the original string (the @code{msgid} string)
3048 should often be reflected in the translated string, and this requires
3049 the intervention of the translator.  For this reason, @code{msgmerge}
3050 might mark some entries as being fuzzy.
3051
3052 @emindex moving by fuzzy entries
3053 Also, the translator may decide herself to mark an entry as fuzzy
3054 for her own convenience, when she wants to remember that the entry
3055 has to be later revisited.  So, some commands are more specifically
3056 related to fuzzy entry processing.
3057
3058 @table @kbd
3059 @item z
3060 @efindex z@r{, PO Mode command}
3061 @c better append "-entry" all the time. -ke-
3062 Find the next fuzzy entry (@code{po-next-fuzzy-entry}).
3063
3064 @item Z
3065 @efindex Z@r{, PO Mode command}
3066 Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}).
3067
3068 @item @key{TAB}
3069 @efindex TAB@r{, PO Mode command}
3070 Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}).
3071
3072 @end table
3073
3074 @efindex z@r{, PO Mode command}
3075 @efindex po-next-fuzzy-entry@r{, PO Mode command}
3076 @efindex Z@r{, PO Mode command}
3077 @efindex po-previous-fuzzy-entry@r{, PO Mode command}
3078 The commands @kbd{z} (@code{po-next-fuzzy-entry}) and @kbd{Z}
3079 (@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for
3080 a fuzzy entry.  If none is found, the search is extended and wraps
3081 around in the PO file buffer.
3082
3083 @efindex TAB@r{, PO Mode command}
3084 @efindex po-unfuzzy@r{, PO Mode command}
3085 @evindex po-auto-select-on-unfuzzy@r{, PO Mode variable}
3086 The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy
3087 attribute associated with an entry, usually leaving it translated.
3088 Further, if the variable @code{po-auto-select-on-unfuzzy} has not
3089 the @code{nil} value, the @kbd{@key{TAB}} command will automatically chase
3090 for another interesting entry to work on.  The initial value of
3091 @code{po-auto-select-on-unfuzzy} is @code{nil}.
3092
3093 The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}.  However,
3094 if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry
3095 edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to
3096 ensure some kind of double check, later.  In this case, the usual paradigm
3097 is that an entry becomes fuzzy (if not already) whenever the translator
3098 modifies it.  If she is satisfied with the translation, she then uses
3099 @kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
3100 on the same blow.  If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
3101 to chase another entry, leaving the entry fuzzy.
3102
3103 @efindex DEL@r{, PO Mode command}
3104 @efindex po-fade-out-entry@r{, PO Mode command}
3105 The translator may also use the @kbd{@key{DEL}} command
3106 (@code{po-fade-out-entry}) over any translated entry to mark it as being
3107 fuzzy, when she wants to easily leave a trace she wants to later return
3108 working at this entry.
3109
3110 Also, when time comes to quit working on a PO file buffer with the @kbd{q}
3111 command, the translator is asked for confirmation, if fuzzy string
3112 still exists.
3113
3114 @node Untranslated Entries, Obsolete Entries, Fuzzy Entries, Updating
3115 @section Untranslated Entries
3116 @cindex untranslated entries
3117
3118 When @code{xgettext} originally creates a PO file, unless told
3119 otherwise, it initializes the @code{msgid} field with the untranslated
3120 string, and leaves the @code{msgstr} string to be empty.  Such entries,
3121 having an empty translation, are said to be @dfn{untranslated} entries.
3122 Later, when the programmer slightly modifies some string right in
3123 the program, this change is later reflected in the PO file
3124 by the appearance of a new untranslated entry for the modified string.
3125
3126 The usual commands moving from entry to entry consider untranslated
3127 entries on the same level as active entries.  Untranslated entries
3128 are easily recognizable by the fact they end with @w{@samp{msgstr ""}}.
3129
3130 @emindex moving by untranslated entries
3131 The work of the translator might be (quite naively) seen as the process
3132 of seeking for an untranslated entry, editing a translation for
3133 it, and repeating these actions until no untranslated entries remain.
3134 Some commands are more specifically related to untranslated entry
3135 processing.
3136
3137 @table @kbd
3138 @item u
3139 @efindex u@r{, PO Mode command}
3140 Find the next untranslated entry (@code{po-next-untranslated-entry}).
3141
3142 @item U
3143 @efindex U@r{, PO Mode command}
3144 Find the previous untranslated entry (@code{po-previous-untransted-entry}).
3145
3146 @item k
3147 @efindex k@r{, PO Mode command}
3148 Turn the current entry into an untranslated one (@code{po-kill-msgstr}).
3149
3150 @end table
3151
3152 @efindex u@r{, PO Mode command}
3153 @efindex po-next-untranslated-entry@r{, PO Mode command}
3154 @efindex U@r{, PO Mode command}
3155 @efindex po-previous-untransted-entry@r{, PO Mode command}
3156 The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U}
3157 (@code{po-previous-untransted-entry}) move forwards or backwards,
3158 chasing for an untranslated entry.  If none is found, the search is
3159 extended and wraps around in the PO file buffer.
3160
3161 @efindex k@r{, PO Mode command}
3162 @efindex po-kill-msgstr@r{, PO Mode command}
3163 An entry can be turned back into an untranslated entry by
3164 merely emptying its translation, using the command @kbd{k}
3165 (@code{po-kill-msgstr}).  @xref{Modifying Translations}.
3166
3167 Also, when time comes to quit working on a PO file buffer
3168 with the @kbd{q} command, the translator is asked for confirmation,
3169 if some untranslated string still exists.
3170
3171 @node Obsolete Entries, Modifying Translations, Untranslated Entries, Updating
3172 @section Obsolete Entries
3173 @cindex obsolete entries
3174
3175 By @dfn{obsolete} PO file entries, we mean those entries which are
3176 commented out, usually by @code{msgmerge} when it found that the
3177 translation is not needed anymore by the package being localized.
3178
3179 The usual commands moving from entry to entry consider obsolete
3180 entries on the same level as active entries.  Obsolete entries are
3181 easily recognizable by the fact that all their lines start with
3182 @code{#}, even those lines containing @code{msgid} or @code{msgstr}.
3183
3184 Commands exist for emptying the translation or reinitializing it
3185 to the original untranslated string.  Commands interfacing with the
3186 kill ring may force some previously saved text into the translation.
3187 The user may interactively edit the translation.  All these commands
3188 may apply to obsolete entries, carefully leaving the entry obsolete
3189 after the fact.
3190
3191 @emindex moving by obsolete entries
3192 Moreover, some commands are more specifically related to obsolete
3193 entry processing.
3194
3195 @table @kbd
3196 @item o
3197 @efindex o@r{, PO Mode command}
3198 Find the next obsolete entry (@code{po-next-obsolete-entry}).
3199
3200 @item O
3201 @efindex O@r{, PO Mode command}
3202 Find the previous obsolete entry (@code{po-previous-obsolete-entry}).
3203
3204 @item @key{DEL}
3205 @efindex DEL@r{, PO Mode command}
3206 Make an active entry obsolete, or zap out an obsolete entry
3207 (@code{po-fade-out-entry}).
3208
3209 @end table
3210
3211 @efindex o@r{, PO Mode command}
3212 @efindex po-next-obsolete-entry@r{, PO Mode command}
3213 @efindex O@r{, PO Mode command}
3214 @efindex po-previous-obsolete-entry@r{, PO Mode command}
3215 The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O}
3216 (@code{po-previous-obsolete-entry}) move forwards or backwards,
3217 chasing for an obsolete entry.  If none is found, the search is
3218 extended and wraps around in the PO file buffer.
3219
3220 PO mode does not provide ways for un-commenting an obsolete entry
3221 and making it active, because this would reintroduce an original
3222 untranslated string which does not correspond to any marked string
3223 in the program sources.  This goes with the philosophy of never
3224 introducing useless @code{msgid} values.
3225
3226 @efindex DEL@r{, PO Mode command}
3227 @efindex po-fade-out-entry@r{, PO Mode command}
3228 @emindex obsolete active entry
3229 @emindex comment out PO file entry
3230 However, it is possible to comment out an active entry, so making
3231 it obsolete.  GNU @code{gettext} utilities will later react to the
3232 disappearance of a translation by using the untranslated string.
3233 The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry
3234 a little further towards annihilation.  If the entry is active (it is a
3235 translated entry), then it is first made fuzzy.  If it is already fuzzy,
3236 then the entry is merely commented out, with confirmation.  If the entry
3237 is already obsolete, then it is completely deleted from the PO file.
3238 It is easy to recycle the translation so deleted into some other PO file
3239 entry, usually one which is untranslated.  @xref{Modifying Translations}.
3240
3241 Here is a quite interesting problem to solve for later development of
3242 PO mode, for those nights you are not sleepy.  The idea would be that
3243 PO mode might become bright enough, one of these days, to make good
3244 guesses at retrieving the most probable candidate, among all obsolete
3245 entries, for initializing the translation of a newly appeared string.
3246 I think it might be a quite hard problem to do this algorithmically, as
3247 we have to develop good and efficient measures of string similarity.
3248 Right now, PO mode completely lets the decision to the translator,
3249 when the time comes to find the adequate obsolete translation, it
3250 merely tries to provide handy tools for helping her to do so.
3251
3252 @node Modifying Translations, Modifying Comments, Obsolete Entries, Updating
3253 @section Modifying Translations
3254 @cindex editing translations
3255 @emindex editing translations
3256
3257 PO mode prevents direct modification of the PO file, by the usual
3258 means Emacs gives for altering a buffer's contents.  By doing so,
3259 it pretends helping the translator to avoid little clerical errors
3260 about the overall file format, or the proper quoting of strings,
3261 as those errors would be easily made.  Other kinds of errors are
3262 still possible, but some may be caught and diagnosed by the batch
3263 validation process, which the translator may always trigger by the
3264 @kbd{V} command.  For all other errors, the translator has to rely on
3265 her own judgment, and also on the linguistic reports submitted to her
3266 by the users of the translated package, having the same mother tongue.
3267
3268 When the time comes to create a translation, correct an error diagnosed
3269 mechanically or reported by a user, the translators have to resort to
3270 using the following commands for modifying the translations.
3271
3272 @table @kbd
3273 @item @key{RET}
3274 @efindex RET@r{, PO Mode command}
3275 Interactively edit the translation (@code{po-edit-msgstr}).
3276
3277 @item @key{LFD}
3278 @itemx C-j
3279 @efindex LFD@r{, PO Mode command}
3280 @efindex C-j@r{, PO Mode command}
3281 Reinitialize the translation with the original, untranslated string
3282 (@code{po-msgid-to-msgstr}).
3283
3284 @item k
3285 @efindex k@r{, PO Mode command}
3286 Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}).
3287
3288 @item w
3289 @efindex w@r{, PO Mode command}
3290 Save the translation on the kill ring, without deleting it
3291 (@code{po-kill-ring-save-msgstr}).
3292
3293 @item y
3294 @efindex y@r{, PO Mode command}
3295 Replace the translation, taking the new from the kill ring
3296 (@code{po-yank-msgstr}).
3297
3298 @end table
3299
3300 @efindex RET@r{, PO Mode command}
3301 @efindex po-edit-msgstr@r{, PO Mode command}
3302 The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs
3303 window meant to edit in a new translation, or to modify an already existing
3304 translation.  The new window contains a copy of the translation taken from
3305 the current PO file entry, all ready for edition, expunged of all quoting
3306 marks, fully modifiable and with the complete extent of Emacs modifying
3307 commands.  When the translator is done with her modifications, she may use
3308 @w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
3309 results, or @w{@kbd{C-c C-k}} to abort her modifications.  @xref{Subedit},
3310 for more information.
3311
3312 @efindex LFD@r{, PO Mode command}
3313 @efindex C-j@r{, PO Mode command}
3314 @efindex po-msgid-to-msgstr@r{, PO Mode command}
3315 The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or
3316 reinitializes the translation with the original string.  This command is
3317 normally used when the translator wants to redo a fresh translation of
3318 the original string, disregarding any previous work.
3319
3320 @evindex po-auto-edit-with-msgid@r{, PO Mode variable}
3321 It is possible to arrange so, whenever editing an untranslated
3322 entry, the @kbd{@key{LFD}} command be automatically executed.  If you set
3323 @code{po-auto-edit-with-msgid} to @code{t}, the translation gets
3324 initialised with the original string, in case none exists already.
3325 The default value for @code{po-auto-edit-with-msgid} is @code{nil}.
3326
3327 @emindex starting a string translation
3328 In fact, whether it is best to start a translation with an empty
3329 string, or rather with a copy of the original string, is a matter of
3330 taste or habit.  Sometimes, the source language and the
3331 target language are so different that is simply best to start writing
3332 on an empty page.  At other times, the source and target languages
3333 are so close that it would be a waste to retype a number of words
3334 already being written in the original string.  A translator may also
3335 like having the original string right under her eyes, as she will
3336 progressively overwrite the original text with the translation, even
3337 if this requires some extra editing work to get rid of the original.
3338
3339 @emindex cut and paste for translated strings
3340 @efindex k@r{, PO Mode command}
3341 @efindex po-kill-msgstr@r{, PO Mode command}
3342 @efindex w@r{, PO Mode command}
3343 @efindex po-kill-ring-save-msgstr@r{, PO Mode command}
3344 The command @kbd{k} (@code{po-kill-msgstr}) merely empties the
3345 translation string, so turning the entry into an untranslated
3346 one.  But while doing so, its previous contents is put apart in
3347 a special place, known as the kill ring.  The command @kbd{w}
3348 (@code{po-kill-ring-save-msgstr}) has also the effect of taking a
3349 copy of the translation onto the kill ring, but it otherwise leaves
3350 the entry alone, and does @emph{not} remove the translation from the
3351 entry.  Both commands use exactly the Emacs kill ring, which is shared
3352 between buffers, and which is well known already to Emacs lovers.
3353
3354 The translator may use @kbd{k} or @kbd{w} many times in the course
3355 of her work, as the kill ring may hold several saved translations.
3356 From the kill ring, strings may later be reinserted in various
3357 Emacs buffers.  In particular, the kill ring may be used for moving
3358 translation strings between different entries of a single PO file
3359 buffer, or if the translator is handling many such buffers at once,
3360 even between PO files.
3361
3362 To facilitate exchanges with buffers which are not in PO mode, the
3363 translation string put on the kill ring by the @kbd{k} command is fully
3364 unquoted before being saved: external quotes are removed, multi-line
3365 strings are concatenated, and backslash escaped sequences are turned
3366 into their corresponding characters.  In the special case of obsolete
3367 entries, the translation is also uncommented prior to saving.
3368
3369 @efindex y@r{, PO Mode command}
3370 @efindex po-yank-msgstr@r{, PO Mode command}
3371 The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the
3372 translation of the current entry by a string taken from the kill ring.
3373 Following Emacs terminology, we then say that the replacement
3374 string is @dfn{yanked} into the PO file buffer.
3375 @xref{Yanking, , , emacs, The Emacs Editor}.
3376 The first time @kbd{y} is used, the translation receives the value of
3377 the most recent addition to the kill ring.  If @kbd{y} is typed once
3378 again, immediately, without intervening keystrokes, the translation
3379 just inserted is taken away and replaced by the second most recent
3380 addition to the kill ring.  By repeating @kbd{y} many times in a row,
3381 the translator may travel along the kill ring for saved strings,
3382 until she finds the string she really wanted.
3383
3384 When a string is yanked into a PO file entry, it is fully and
3385 automatically requoted for complying with the format PO files should
3386 have.  Further, if the entry is obsolete, PO mode then appropriately
3387 push the inserted string inside comments.  Once again, translators
3388 should not burden themselves with quoting considerations besides, of
3389 course, the necessity of the translated string itself respective to
3390 the program using it.
3391
3392 Note that @kbd{k} or @kbd{w} are not the only commands pushing strings
3393 on the kill ring, as almost any PO mode command replacing translation
3394 strings (or the translator comments) automatically saves the old string
3395 on the kill ring.  The main exceptions to this general rule are the
3396 yanking commands themselves.
3397
3398 @emindex using obsolete translations to make new entries
3399 To better illustrate the operation of killing and yanking, let's
3400 use an actual example, taken from a common situation.  When the
3401 programmer slightly modifies some string right in the program, his
3402 change is later reflected in the PO file by the appearance
3403 of a new untranslated entry for the modified string, and the fact
3404 that the entry translating the original or unmodified string becomes
3405 obsolete.  In many cases, the translator might spare herself some work
3406 by retrieving the unmodified translation from the obsolete entry,
3407 then initializing the untranslated entry @code{msgstr} field with
3408 this retrieved translation.  Once this done, the obsolete entry is
3409 not wanted anymore, and may be safely deleted.
3410
3411 When the translator finds an untranslated entry and suspects that a
3412 slight variant of the translation exists, she immediately uses @kbd{m}
3413 to mark the current entry location, then starts chasing obsolete
3414 entries with @kbd{o}, hoping to find some translation corresponding
3415 to the unmodified string.  Once found, she uses the @kbd{@key{DEL}} command
3416 for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}
3417 the translation, that is, pushes the translation on the kill ring.
3418 Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}
3419 then @emph{yanks} the saved translation right into the @code{msgstr}
3420 field.  The translator is then free to use @kbd{@key{RET}} for fine
3421 tuning the translation contents, and maybe to later use @kbd{u},
3422 then @kbd{m} again, for going on with the next untranslated string.
3423
3424 When some sequence of keys has to be typed over and over again, the
3425 translator may find it useful to become better acquainted with the Emacs
3426 capability of learning these sequences and playing them back under request.
3427 @xref{Keyboard Macros, , , emacs, The Emacs Editor}.
3428
3429 @node Modifying Comments, Subedit, Modifying Translations, Updating
3430 @section Modifying Comments
3431 @cindex editing comments in PO files
3432 @emindex editing comments
3433
3434 Any translation work done seriously will raise many linguistic
3435 difficulties, for which decisions have to be made, and the choices
3436 further documented.  These documents may be saved within the
3437 PO file in form of translator comments, which the translator
3438 is free to create, delete, or modify at will.  These comments may
3439 be useful to herself when she returns to this PO file after a while.
3440
3441 Comments not having whitespace after the initial @samp{#}, for example,
3442 those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator
3443 comments, they are exclusively created by other @code{gettext} tools.
3444 So, the commands below will never alter such system added comments,
3445 they are not meant for the translator to modify.  @xref{PO Files}.
3446
3447 The following commands are somewhat similar to those modifying translations,
3448 so the general indications given for those apply here.  @xref{Modifying
3449 Translations}.
3450
3451 @table @kbd
3452
3453 @item #
3454 @efindex #@r{, PO Mode command}
3455 Interactively edit the translator comments (@code{po-edit-comment}).
3456
3457 @item K
3458 @efindex K@r{, PO Mode command}
3459 Save the translator comments on the kill ring, and delete it
3460 (@code{po-kill-comment}).
3461
3462 @item W
3463 @efindex W@r{, PO Mode command}
3464 Save the translator comments on the kill ring, without deleting it
3465 (@code{po-kill-ring-save-comment}).
3466
3467 @item Y
3468 @efindex Y@r{, PO Mode command}
3469 Replace the translator comments, taking the new from the kill ring
3470 (@code{po-yank-comment}).
3471
3472 @end table
3473
3474 These commands parallel PO mode commands for modifying the translation
3475 strings, and behave much the same way as they do, except that they handle
3476 this part of PO file comments meant for translator usage, rather
3477 than the translation strings.  So, if the descriptions given below are
3478 slightly succinct, it is because the full details have already been given.
3479 @xref{Modifying Translations}.
3480
3481 @efindex #@r{, PO Mode command}
3482 @efindex po-edit-comment@r{, PO Mode command}
3483 The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window
3484 containing a copy of the translator comments on the current PO file entry.
3485 If there are no such comments, PO mode understands that the translator wants
3486 to add a comment to the entry, and she is presented with an empty screen.
3487 Comment marks (@code{#}) and the space following them are automatically
3488 removed before edition, and reinstated after.  For translator comments
3489 pertaining to obsolete entries, the uncommenting and recommenting operations
3490 are done twice.  Once in the editing window, the keys @w{@kbd{C-c C-c}}
3491 allow the translator to tell she is finished with editing the comment.
3492 @xref{Subedit}, for further details.
3493
3494 @evindex po-subedit-mode-hook@r{, PO Mode variable}
3495 Functions found on @code{po-subedit-mode-hook}, if any, are executed after
3496 the string has been inserted in the edit buffer.
3497
3498 @efindex K@r{, PO Mode command}
3499 @efindex po-kill-comment@r{, PO Mode command}
3500 @efindex W@r{, PO Mode command}
3501 @efindex po-kill-ring-save-comment@r{, PO Mode command}
3502 @efindex Y@r{, PO Mode command}
3503 @efindex po-yank-comment@r{, PO Mode command}
3504 The command @kbd{K} (@code{po-kill-comment}) gets rid of all
3505 translator comments, while saving those comments on the kill ring.
3506 The command @kbd{W} (@code{po-kill-ring-save-comment}) takes
3507 a copy of the translator comments on the kill ring, but leaves
3508 them undisturbed in the current entry.  The command @kbd{Y}
3509 (@code{po-yank-comment}) completely replaces the translator comments
3510 by a string taken at the front of the kill ring.  When this command
3511 is immediately repeated, the comments just inserted are withdrawn,
3512 and replaced by other strings taken along the kill ring.
3513
3514 On the kill ring, all strings have the same nature.  There is no
3515 distinction between @emph{translation} strings and @emph{translator
3516 comments} strings.  So, for example, let's presume the translator
3517 has just finished editing a translation, and wants to create a new
3518 translator comment to document why the previous translation was
3519 not good, just to remember what was the problem.  Foreseeing that she
3520 will do that in her documentation, the translator may want to quote
3521 the previous translation in her translator comments.  To do so, she
3522 may initialize the translator comments with the previous translation,
3523 still at the head of the kill ring.  Because editing already pushed the
3524 previous translation on the kill ring, she merely has to type @kbd{M-w}
3525 prior to @kbd{#}, and the previous translation will be right there,
3526 all ready for being introduced by some explanatory text.
3527
3528 On the other hand, presume there are some translator comments already
3529 and that the translator wants to add to those comments, instead
3530 of wholly replacing them.  Then, she should edit the comment right
3531 away with @kbd{#}.  Once inside the editing window, she can use the
3532 regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y}
3533 (@code{yank-pop}) to get the previous translation where she likes.
3534
3535 @node Subedit, C Sources Context, Modifying Comments, Updating
3536 @section Details of Sub Edition
3537 @emindex subedit minor mode
3538
3539 The PO subedit minor mode has a few peculiarities worth being described
3540 in fuller detail.  It installs a few commands over the usual editing set
3541 of Emacs, which are described below.
3542
3543 @table @kbd
3544 @item C-c C-c
3545 @efindex C-c C-c@r{, PO Mode command}
3546 Complete edition (@code{po-subedit-exit}).
3547
3548 @item C-c C-k
3549 @efindex C-c C-k@r{, PO Mode command}
3550 Abort edition (@code{po-subedit-abort}).
3551
3552 @item C-c C-a
3553 @efindex C-c C-a@r{, PO Mode command}
3554 Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}).
3555
3556 @end table
3557
3558 @emindex exiting PO subedit
3559 @efindex C-c C-c@r{, PO Mode command}
3560 @efindex po-subedit-exit@r{, PO Mode command}
3561 The window's contents represents a translation for a given message,
3562 or a translator comment.  The translator may modify this window to
3563 her heart's content.  Once this is done, the command @w{@kbd{C-c C-c}}
3564 (@code{po-subedit-exit}) may be used to return the edited translation into
3565 the PO file, replacing the original translation, even if it moved out of
3566 sight or if buffers were switched.
3567
3568 @efindex C-c C-k@r{, PO Mode command}
3569 @efindex po-subedit-abort@r{, PO Mode command}
3570 If the translator becomes unsatisfied with her translation or comment,
3571 to the extent she prefers keeping what was existent prior to the
3572 @kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}}
3573 (@code{po-subedit-abort}) to merely get rid of edition, while preserving
3574 the original translation or comment.  Another way would be for her to exit
3575 normally with @w{@kbd{C-c C-c}}, then type @code{U} once for undoing the
3576 whole effect of last edition.
3577
3578 @efindex C-c C-a@r{, PO Mode command}
3579 @efindex po-subedit-cycle-auxiliary@r{, PO Mode command}
3580 The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary})
3581 allows for glancing through translations
3582 already achieved in other languages, directly while editing the current
3583 translation.  This may be quite convenient when the translator is fluent
3584 at many languages, but of course, only makes sense when such completed
3585 auxiliary PO files are already available to her (@pxref{Auxiliary}).
3586
3587 Functions found on @code{po-subedit-mode-hook}, if any, are executed after
3588 the string has been inserted in the edit buffer.
3589
3590 While editing her translation, the translator should pay attention to not
3591 inserting unwanted @kbd{@key{RET}} (newline) characters at the end of
3592 the translated string if those are not meant to be there, or to removing
3593 such characters when they are required.  Since these characters are not
3594 visible in the editing buffer, they are easily introduced by mistake.
3595 To help her, @kbd{@key{RET}} automatically puts the character @code{<}
3596 at the end of the string being edited, but this @code{<} is not really
3597 part of the string.  On exiting the editing window with @w{@kbd{C-c C-c}},
3598 PO mode automatically removes such @kbd{<} and all whitespace added after
3599 it.  If the translator adds characters after the terminating @code{<}, it
3600 looses its delimiting property and integrally becomes part of the string.
3601 If she removes the delimiting @code{<}, then the edited string is taken
3602 @emph{as is}, with all trailing newlines, even if invisible.  Also, if
3603 the translated string ought to end itself with a genuine @code{<}, then
3604 the delimiting @code{<} may not be removed; so the string should appear,
3605 in the editing window, as ending with two @code{<} in a row.
3606
3607 @emindex editing multiple entries
3608 When a translation (or a comment) is being edited, the translator may move
3609 the cursor back into the PO file buffer and freely move to other entries,
3610 browsing at will.  If, with an edition pending, the translator wanders in the
3611 PO file buffer, she may decide to start modifying another entry.  Each entry
3612 being edited has its own subedit buffer.  It is possible to simultaneously
3613 edit the translation @emph{and} the comment of a single entry, or to
3614 edit entries in different PO files, all at once.  Typing @kbd{@key{RET}}
3615 on a field already being edited merely resumes that particular edit.  Yet,
3616 the translator should better be comfortable at handling many Emacs windows!
3617
3618 @emindex pending subedits
3619 Pending subedits may be completed or aborted in any order, regardless
3620 of how or when they were started.  When many subedits are pending and the
3621 translator asks for quitting the PO file (with the @kbd{q} command), subedits
3622 are automatically resumed one at a time, so she may decide for each of them.
3623
3624 @node C Sources Context, Auxiliary, Subedit, Updating
3625 @section C Sources Context
3626 @emindex consulting program sources
3627 @emindex looking at the source to aid translation
3628 @emindex use the source, Luke
3629
3630 PO mode is particularly powerful when used with PO files
3631 created through GNU @code{gettext} utilities, as those utilities
3632 insert special comments in the PO files they generate.
3633 Some of these special comments relate the PO file entry to
3634 exactly where the untranslated string appears in the program sources.
3635
3636 When the translator gets to an untranslated entry, she is fairly
3637 often faced with an original string which is not as informative as
3638 it normally should be, being succinct, cryptic, or otherwise ambiguous.
3639 Before choosing how to translate the string, she needs to understand
3640 better what the string really means and how tight the translation has
3641 to be.  Most of the time, when problems arise, the only way left to make
3642 her judgment is looking at the true program sources from where this
3643 string originated, searching for surrounding comments the programmer
3644 might have put in there, and looking around for helping clues of
3645 @emph{any} kind.
3646
3647 Surely, when looking at program sources, the translator will receive
3648 more help if she is a fluent programmer.  However, even if she is
3649 not versed in programming and feels a little lost in C code, the
3650 translator should not be shy at taking a look, once in a while.
3651 It is most probable that she will still be able to find some of the
3652 hints she needs.  She will learn quickly to not feel uncomfortable
3653 in program code, paying more attention to programmer's comments,
3654 variable and function names (if he dared choosing them well), and
3655 overall organization, than to the program code itself.
3656
3657 @emindex find source fragment for a PO file entry
3658 The following commands are meant to help the translator at getting
3659 program source context for a PO file entry.
3660
3661 @table @kbd
3662 @item s
3663 @efindex s@r{, PO Mode command}
3664 Resume the display of a program source context, or cycle through them
3665 (@code{po-cycle-source-reference}).
3666
3667 @item M-s
3668 @efindex M-s@r{, PO Mode command}
3669 Display of a program source context selected by menu
3670 (@code{po-select-source-reference}).
3671
3672 @item S
3673 @efindex S@r{, PO Mode command}
3674 Add a directory to the search path for source files
3675 (@code{po-consider-source-path}).
3676
3677 @item M-S
3678 @efindex M-S@r{, PO Mode command}
3679 Delete a directory from the search path for source files
3680 (@code{po-ignore-source-path}).
3681
3682 @end table
3683
3684 @efindex s@r{, PO Mode command}
3685 @efindex po-cycle-source-reference@r{, PO Mode command}
3686 @efindex M-s@r{, PO Mode command}
3687 @efindex po-select-source-reference@r{, PO Mode command}
3688 The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s}
3689 (@code{po-select-source-reference}) both open another window displaying
3690 some source program file, and already positioned in such a way that
3691 it shows an actual use of the string to be translated.  By doing
3692 so, the command gives source program context for the string.  But if
3693 the entry has no source context references, or if all references
3694 are unresolved along the search path for program sources, then the
3695 command diagnoses this as an error.
3696
3697 Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays
3698 in the PO file window.  If the translator really wants to
3699 get into the program source window, she ought to do it explicitly,
3700 maybe by using command @kbd{O}.
3701
3702 When @kbd{s} is typed for the first time, or for a PO file entry which
3703 is different of the last one used for getting source context, then the
3704 command reacts by giving the first context available for this entry,
3705 if any.  If some context has already been recently displayed for the
3706 current PO file entry, and the translator wandered off to do other
3707 things, typing @kbd{s} again will merely resume, in another window,
3708 the context last displayed.  In particular, if the translator moved
3709 the cursor away from the context in the source file, the command will
3710 bring the cursor back to the context.  By using @kbd{s} many times
3711 in a row, with no other commands intervening, PO mode will cycle to
3712 the next available contexts for this particular entry, getting back
3713 to the first context once the last has been shown.
3714
3715 The command @kbd{M-s} behaves differently.  Instead of cycling through
3716 references, it lets the translator choose a particular reference among
3717 many, and displays that reference.  It is best used with completion,
3718 if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in
3719 response to the question, she will be offered a menu of all possible
3720 references, as a reminder of which are the acceptable answers.
3721 This command is useful only where there are really many contexts
3722 available for a single string to translate.
3723
3724 @efindex S@r{, PO Mode command}
3725 @efindex po-consider-source-path@r{, PO Mode command}
3726 @efindex M-S@r{, PO Mode command}
3727 @efindex po-ignore-source-path@r{, PO Mode command}
3728 Program source files are usually found relative to where the PO
3729 file stands.  As a special provision, when this fails, the file is
3730 also looked for, but relative to the directory immediately above it.
3731 Those two cases take proper care of most PO files.  However, it might
3732 happen that a PO file has been moved, or is edited in a different
3733 place than its normal location.  When this happens, the translator
3734 should tell PO mode in which directory normally sits the genuine PO
3735 file.  Many such directories may be specified, and all together, they
3736 constitute what is called the @dfn{search path} for program sources.
3737 The command @kbd{S} (@code{po-consider-source-path}) is used to interactively
3738 enter a new directory at the front of the search path, and the command
3739 @kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion,
3740 one of the directories she does not want anymore on the search path.
3741
3742 @node Auxiliary, Compendium, C Sources Context, Updating
3743 @section Consulting Auxiliary PO Files
3744 @emindex consulting translations to other languages
3745
3746 PO mode is able to help the knowledgeable translator, being fluent in
3747 many languages, at taking advantage of translations already achieved
3748 in other languages she just happens to know.  It provides these other
3749 language translations as additional context for her own work.  Moreover,
3750 it has features to ease the production of translations for many languages
3751 at once, for translators preferring to work in this way.
3752
3753 @cindex auxiliary PO file
3754 @emindex auxiliary PO file
3755 An @dfn{auxiliary} PO file is an existing PO file meant for the same
3756 package the translator is working on, but targeted to a different mother
3757 tongue language.  Commands exist for declaring and handling auxiliary
3758 PO files, and also for showing contexts for the entry under work.
3759
3760 Here are the auxiliary file commands available in PO mode.
3761
3762 @table @kbd
3763 @item a
3764 @efindex a@r{, PO Mode command}
3765 Seek auxiliary files for another translation for the same entry
3766 (@code{po-cycle-auxiliary}).
3767
3768 @item C-c C-a
3769 @efindex C-c C-a@r{, PO Mode command}
3770 Switch to a particular auxiliary file (@code{po-select-auxiliary}).
3771
3772 @item A
3773 @efindex A@r{, PO Mode command}
3774 Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}).
3775
3776 @item M-A
3777 @efindex M-A@r{, PO Mode command}
3778 Remove this PO file from the list of auxiliary files
3779 (@code{po-ignore-as-auxiliary}).
3780
3781 @end table
3782
3783 @efindex A@r{, PO Mode command}
3784 @efindex po-consider-as-auxiliary@r{, PO Mode command}
3785 @efindex M-A@r{, PO Mode command}
3786 @efindex po-ignore-as-auxiliary@r{, PO Mode command}
3787 Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current
3788 PO file to the list of auxiliary files, while command @kbd{M-A}
3789 (@code{po-ignore-as-auxiliary} just removes it.
3790
3791 @efindex a@r{, PO Mode command}
3792 @efindex po-cycle-auxiliary@r{, PO Mode command}
3793 The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO
3794 files, round-robin, searching for a translated entry in some other language
3795 having an @code{msgid} field identical as the one for the current entry.
3796 The found PO file, if any, takes the place of the current PO file in
3797 the display (its window gets on top).  Before doing so, the current PO
3798 file is also made into an auxiliary file, if not already.  So, @kbd{a}
3799 in this newly displayed PO file will seek another PO file, and so on,
3800 so repeating @kbd{a} will eventually yield back the original PO file.
3801
3802 @efindex C-c C-a@r{, PO Mode command}
3803 @efindex po-select-auxiliary@r{, PO Mode command}
3804 The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator
3805 for her choice of a particular auxiliary file, with completion, and
3806 then switches to that selected PO file.  The command also checks if
3807 the selected file has an @code{msgid} field identical as the one for
3808 the current entry, and if yes, this entry becomes current.  Otherwise,
3809 the cursor of the selected file is left undisturbed.
3810
3811 For all this to work fully, auxiliary PO files will have to be normalized,
3812 in that way that @code{msgid} fields should be written @emph{exactly}
3813 the same way.  It is possible to write @code{msgid} fields in various
3814 ways for representing the same string, different writing would break the
3815 proper behaviour of the auxiliary file commands of PO mode.  This is not
3816 expected to be much a problem in practice, as most existing PO files have
3817 their @code{msgid} entries written by the same GNU @code{gettext} tools.
3818
3819 @efindex normalize@r{, PO Mode command}
3820 However, PO files initially created by PO mode itself, while marking
3821 strings in source files, are normalised differently.  So are PO
3822 files resulting of the the @samp{M-x normalize} command.  Until these
3823 discrepancies between PO mode and other GNU @code{gettext} tools get
3824 fully resolved, the translator should stay aware of normalisation issues.
3825
3826 @node Compendium,  , Auxiliary, Updating
3827 @section Using Translation Compendia
3828 @emindex using translation compendia
3829
3830 @cindex compendium
3831 A @dfn{compendium} is a special PO file containing a set of
3832 translations recurring in many different packages.  The translator can
3833 use gettext tools to build a new compendium, to add entries to her
3834 compendium, and to initialize untranslated entries, or to update
3835 already translated entries, from translations kept in the compendium.
3836
3837 @menu
3838 * Creating Compendia::          Merging translations for later use
3839 * Using Compendia::             Using older translations if they fit
3840 @end menu
3841
3842 @node Creating Compendia, Using Compendia, Compendium, Compendium
3843 @subsection Creating Compendia
3844 @cindex creating compendia
3845 @cindex compendium, creating
3846
3847 Basically every PO file consisting of translated entries only can be
3848 declared as a valid compendium.  Often the translator wants to have
3849 special compendia; let's consider two cases: @cite{concatenating PO
3850 files} and @cite{extracting a message subset from a PO file}.
3851
3852 @subsubsection Concatenate PO Files
3853
3854 @cindex concatenating PO files into a compendium
3855 @cindex accumulating translations
3856 To concatenate several valid PO files into one compendium file you can
3857 use @samp{msgcomm} or @samp{msgcat} (the latter preferred):
3858
3859 @example
3860 msgcat -o compendium.po file1.po file2.po
3861 @end example
3862
3863 By default, @code{msgcat} will accumulate divergent translations
3864 for the same string.  Those occurences will be marked as @code{fuzzy}
3865 and highly visible decorated; calling @code{msgcat} on
3866 @file{file1.po}:
3867
3868 @example
3869 #: src/hello.c:200
3870 #, c-format
3871 msgid "Report bugs to <%s>.\n"
3872 msgstr "Comunicar `bugs' a <%s>.\n"
3873 @end example
3874
3875 @noindent
3876 and @file{file2.po}:
3877
3878 @example
3879 #: src/bye.c:100
3880 #, c-format
3881 msgid "Report bugs to <%s>.\n"
3882 msgstr "Comunicar \"bugs\" a <%s>.\n"
3883 @end example
3884
3885 @noindent
3886 will result in:
3887
3888 @example
3889 #: src/hello.c:200 src/bye.c:100
3890 #, fuzzy, c-format
3891 msgid "Report bugs to <%s>.\n"
3892 msgstr ""
3893 "#-#-#-#-#  file1.po  #-#-#-#-#\n"
3894 "Comunicar `bugs' a <%s>.\n"
3895 "#-#-#-#-#  file2.po  #-#-#-#-#\n"
3896 "Comunicar \"bugs\" a <%s>.\n"
3897 @end example
3898
3899 @noindent
3900 The translator will have to resolve this ``conflict'' manually; she
3901 has to decide whether the first or the second version is appropriate
3902 (or provide a new translation), to delete the ``marker lines'', and
3903 finally to remove the @code{fuzzy} mark.
3904
3905 If the translator knows in advance the first found translation of a
3906 message is always the best translation she can make use to the
3907 @samp{--use-first} switch:
3908
3909 @example
3910 msgcat --use-first -o compendium.po file1.po file2.po
3911 @end example
3912
3913 A good compendium file must not contain @code{fuzzy} or untranslated
3914 entries.  If input files are ``dirty'' you must preprocess the input
3915 files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}.
3916
3917 @subsubsection Extract a Message Subset from a PO File
3918 @cindex extracting parts of a PO file into a compendium
3919
3920 Nobody wants to translate the same messages again and again; thus you
3921 may wish to have a compendium file containing @file{getopt.c} messages.
3922
3923 To extract a message subset (e.g., all @file{getopt.c} messages) from an
3924 existing PO file into one compendium file you can use @samp{msggrep}:
3925
3926 @example
3927 msggrep --location src/getopt.c -o compendium.po file.po
3928 @end example
3929
3930 @node Using Compendia,  , Creating Compendia, Compendium
3931 @subsection Using Compendia
3932
3933 You can use a compendium file to initialize a translation from scratch
3934 or to update an already existing translation.
3935
3936 @subsubsection Initialize a New Translation File
3937 @cindex initialize translations from a compendium
3938
3939 Since a PO file with translations does not exist the translator can
3940 merely use @file{/dev/null} to fake the ``old'' translation file.
3941
3942 @example
3943 msgmerge --compendium compendium.po -o file.po /dev/null file.pot
3944 @end example
3945
3946 @subsubsection Update an Existing Translation File
3947 @cindex update translations from a compendium
3948
3949 Concatenate the compendium file(s) and the existing PO, merge the
3950 result with the POT file and remove the obsolete entries (optional,
3951 here done using @samp{sed}):
3952
3953 @example
3954 msgcat --use-first -o update.po compendium1.po compendium2.po file.po
3955 msgmerge update.po file.pot | sed -e '/^#~/d' > file.po
3956 @end example
3957
3958 @node Manipulating, Binaries, Updating, Top
3959 @chapter Manipulating PO Files
3960 @cindex manipulating PO files
3961
3962 Sometimes it is necessary to manipulate PO files in a way that is better
3963 performed automatically than by hand.  GNU @code{gettext} includes a
3964 complete set of tools for this purpose.
3965
3966 @cindex merging two PO files
3967 When merging two packages into a single package, the resulting POT file
3968 will be the concatenation of the two packages' POT files.  Thus the
3969 maintainer must concatenate the two existing package translations into
3970 a single translation catalog, for each language.  This is best performed
3971 using @samp{msgcat}.  It is then the translators' duty to deal with any
3972 possible conflicts that arose during the merge.
3973
3974 @cindex encoding conversion
3975 When a translator takes over the translation job from another translator,
3976 but she uses a different character encoding in her locale, she will
3977 convert the catalog to her character encoding.  This is best done through
3978 the @samp{msgconv} program.
3979
3980 When a maintainer takes a source file with tagged messages from another
3981 package, he should also take the existing translations for this source
3982 file (and not let the translators do the same job twice).  One way to do
3983 this is through @samp{msggrep}, another is to create a POT file for
3984 that source file and use @samp{msgmerge}.
3985
3986 @cindex dialect
3987 @cindex orthography
3988 When a translator wants to adjust some translation catalog for a special
3989 dialect or orthography --- for example, German as written in Switzerland
3990 versus German as written in Germany --- she needs to apply some text
3991 processing to every message in the catalog.  The tool for doing this is
3992 @samp{msgfilter}.
3993
3994 Another use of @code{msgfilter} is to produce approximately the POT file for
3995 which a given PO file was made.  This can be done through a filter command
3996 like @samp{msgfilter sed -e d | sed -e '/^# /d'}.  Note that the original
3997 POT file may have had different comments and different plural message counts,
3998 that's why it's better to use the original POT file if available.
3999
4000 @cindex checking of translations
4001 When a translator wants to check her translations, for example according
4002 to orthography rules or using a non-interactive spell checker, she can do
4003 so using the @samp{msgexec} program.
4004
4005 @cindex duplicate elimination
4006 When third party tools create PO or POT files, sometimes duplicates cannot
4007 be avoided.  But the GNU @code{gettext} tools give an error when they
4008 encounter duplicate msgids in the same file and in the same domain.
4009 To merge duplicates, the @samp{msguniq} program can be used.
4010
4011 @samp{msgcomm} is a more general tool for keeping or throwing away
4012 duplicates, occurring in different files.
4013
4014 @samp{msgcmp} can be used to check whether a translation catalog is
4015 completely translated.
4016
4017 @cindex attributes, manipulating
4018 @samp{msgattrib} can be used to select and extract only the fuzzy
4019 or untranslated messages of a translation catalog.
4020
4021 @samp{msgen} is useful as a first step for preparing English translation
4022 catalogs.  It copies each message's msgid to its msgstr.
4023
4024 Finally, for those applications where all these various programs are not
4025 sufficient, a library @samp{libgettextpo} is provided that can be used to
4026 write other specialized programs that process PO files.
4027
4028 @menu
4029 * msgcat Invocation::           Invoking the @code{msgcat} Program
4030 * msgconv Invocation::          Invoking the @code{msgconv} Program
4031 * msggrep Invocation::          Invoking the @code{msggrep} Program
4032 * msgfilter Invocation::        Invoking the @code{msgfilter} Program
4033 * msguniq Invocation::          Invoking the @code{msguniq} Program
4034 * msgcomm Invocation::          Invoking the @code{msgcomm} Program
4035 * msgcmp Invocation::           Invoking the @code{msgcmp} Program
4036 * msgattrib Invocation::        Invoking the @code{msgattrib} Program
4037 * msgen Invocation::            Invoking the @code{msgen} Program
4038 * msgexec Invocation::          Invoking the @code{msgexec} Program
4039 * libgettextpo::                Writing your own programs that process PO files
4040 @end menu
4041
4042 @node msgcat Invocation, msgconv Invocation, Manipulating, Manipulating
4043 @section Invoking the @code{msgcat} Program
4044
4045 @include msgcat.texi
4046
4047 @node msgconv Invocation, msggrep Invocation, msgcat Invocation, Manipulating
4048 @section Invoking the @code{msgconv} Program
4049
4050 @include msgconv.texi
4051
4052 @node msggrep Invocation, msgfilter Invocation, msgconv Invocation, Manipulating
4053 @section Invoking the @code{msggrep} Program
4054
4055 @include msggrep.texi
4056
4057 @node msgfilter Invocation, msguniq Invocation, msggrep Invocation, Manipulating
4058 @section Invoking the @code{msgfilter} Program
4059
4060 @include msgfilter.texi
4061
4062 @node msguniq Invocation, msgcomm Invocation, msgfilter Invocation, Manipulating
4063 @section Invoking the @code{msguniq} Program
4064
4065 @include msguniq.texi
4066
4067 @node msgcomm Invocation, msgcmp Invocation, msguniq Invocation, Manipulating
4068 @section Invoking the @code{msgcomm} Program
4069
4070 @include msgcomm.texi
4071
4072 @node msgcmp Invocation, msgattrib Invocation, msgcomm Invocation, Manipulating
4073 @section Invoking the @code{msgcmp} Program
4074
4075 @include msgcmp.texi
4076
4077 @node msgattrib Invocation, msgen Invocation, msgcmp Invocation, Manipulating
4078 @section Invoking the @code{msgattrib} Program
4079
4080 @include msgattrib.texi
4081
4082 @node msgen Invocation, msgexec Invocation, msgattrib Invocation, Manipulating
4083 @section Invoking the @code{msgen} Program
4084
4085 @include msgen.texi
4086
4087 @node msgexec Invocation, libgettextpo, msgen Invocation, Manipulating
4088 @section Invoking the @code{msgexec} Program
4089
4090 @include msgexec.texi
4091
4092 @node libgettextpo,  , msgexec Invocation, Manipulating
4093 @section Writing your own programs that process PO files
4094
4095 For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc.
4096 is not sufficient, a set of C functions is provided in a library, to make it
4097 possible to process PO files in your own programs.  When you use this library,
4098 you don't need to write routines to parse the PO file; instead, you retreive
4099 a pointer in memory to each of messages contained in the PO file.  Functions
4100 for writing PO files are not provided at this time.
4101
4102 The functions are declared in the header file @samp{<gettext-po.h>}, and are
4103 defined in a library called @samp{libgettextpo}.
4104
4105 @deftp {Data Type} po_file_t
4106 This is a pointer type that refers to the contents of a PO file, after it has
4107 been read into memory.
4108 @end deftp
4109
4110 @deftp {Data Type} po_message_iterator_t
4111 This is a pointer type that refers to an iterator that produces a sequence of
4112 messages.
4113 @end deftp
4114
4115 @deftp {Data Type} po_message_t
4116 This is a pointer type that refers to a message of a PO file, including its
4117 translation.
4118 @end deftp
4119
4120 @deftypefun po_file_t po_file_read (const char *@var{filename})
4121 The @code{po_file_read} function reads a PO file into memory.  The file name
4122 is given as argument.  The return value is a handle to the PO file's contents,
4123 valid until @code{po_file_free} is called on it.  In case of error, the return
4124 value is @code{NULL}, and @code{errno} is set.
4125 @end deftypefun
4126
4127 @deftypefun void po_file_free (po_file_t @var{file})
4128 The @code{po_file_free} function frees a PO file's contents from memory,
4129 including all messages that are only implicitly accessible through iterators.
4130 @end deftypefun
4131
4132 @deftypefun {const char * const *} po_file_domains (po_file_t @var{file})
4133 The @code{po_file_domains} function returns the domains for which the given
4134 PO file has messages.  The return value is a @code{NULL} terminated array
4135 which is valid as long as the @var{file} handle is valid.  For PO files which
4136 contain no @samp{domain} directive, the return value contains only one domain,
4137 namely the default domain @code{"messages"}.
4138 @end deftypefun
4139
4140 @deftypefun po_message_iterator_t po_message_iterator (po_file_t @var{file}, const char *@var{domain})
4141 The @code{po_message_iterator} returns an iterator that will produce the
4142 messages of @var{file} that belong to the given @var{domain}.  If @var{domain}
4143 is @code{NULL}, the default domain is used instead.  To list the messages,
4144 use the function @code{po_next_message} repeatedly.
4145 @end deftypefun
4146
4147 @deftypefun void po_message_iterator_free (po_message_iterator_t @var{iterator})
4148 The @code{po_message_iterator_free} function frees an iterator previously
4149 allocated through the @code{po_message_iterator} function.
4150 @end deftypefun
4151
4152 @deftypefun po_message_t po_next_message (po_message_iterator_t @var{iterator})
4153 The @code{po_next_message} function returns the next message from
4154 @var{iterator} and advances the iterator.  It returns @code{NULL} when the
4155 iterator has reached the end of its message list.
4156 @end deftypefun
4157
4158 The following functions returns details of a @code{po_message_t}.  Recall
4159 that the results are valid as long as the @var{file} handle is valid.
4160
4161 @deftypefun {const char *} po_message_msgid (po_message_t @var{message})
4162 The @code{po_message_msgid} function returns the @code{msgid} (untranslated
4163 English string) of a message.  This is guaranteed to be non-@code{NULL}.
4164 @end deftypefun
4165
4166 @deftypefun {const char *} po_message_msgid_plural (po_message_t @var{message})
4167 The @code{po_message_msgid_plural} function returns the @code{msgid_plural}
4168 (untranslated English plural string) of a message with plurals, or @code{NULL}
4169 for a message without plural.
4170 @end deftypefun
4171
4172 @deftypefun {const char *} po_message_msgstr (po_message_t @var{message})
4173 The @code{po_message_msgstr} function returns the @code{msgstr} (translation)
4174 of a message.  For an untranslated message, the return value is an empty
4175 string.
4176 @end deftypefun
4177
4178 @deftypefun {const char *} po_message_msgstr_plural (po_message_t @var{message}, int @var{index})
4179 The @code{po_message_msgstr_plural} function returns the
4180 @code{msgstr[@var{index}]} of a message with plurals, or @code{NULL} when
4181 the @var{index} is out of range or for a message without plural.
4182 @end deftypefun
4183
4184 Here is an example code how these functions can be used.
4185
4186 @example
4187 const char *filename = @dots{};
4188 po_file_t file = po_file_read (filename);
4189
4190 if (file == NULL)
4191   error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename);
4192 @{
4193   const char * const *domains = po_file_domains (file);
4194   const char * const *domainp;
4195
4196   for (domainp = domains; *domainp; domainp++)
4197     @{
4198       const char *domain = *domainp;
4199       po_message_iterator_t iterator = po_message_iterator (file, domain);
4200
4201       for (;;)
4202         @{
4203           po_message_t *message = po_next_message (iterator);
4204
4205           if (message == NULL)
4206             break;
4207           @{
4208             const char *msgid = po_message_msgid (message);
4209             const char *msgstr = po_message_msgstr (message);
4210
4211             @dots{}
4212           @}
4213         @}
4214       po_message_iterator_free (iterator);
4215     @}
4216 @}
4217 po_file_free (file);
4218 @end example
4219
4220 @node Binaries, Users, Manipulating, Top
4221 @chapter Producing Binary MO Files
4222
4223 @c FIXME: Rewrite.
4224
4225 @menu
4226 * msgfmt Invocation::           Invoking the @code{msgfmt} Program
4227 * msgunfmt Invocation::         Invoking the @code{msgunfmt} Program
4228 * MO Files::                    The Format of GNU MO Files
4229 @end menu
4230
4231 @node msgfmt Invocation, msgunfmt Invocation, Binaries, Binaries
4232 @section Invoking the @code{msgfmt} Program
4233
4234 @include msgfmt.texi
4235
4236 @node msgunfmt Invocation, MO Files, msgfmt Invocation, Binaries
4237 @section Invoking the @code{msgunfmt} Program
4238
4239 @include msgunfmt.texi
4240
4241 @node MO Files,  , msgunfmt Invocation, Binaries
4242 @section The Format of GNU MO Files
4243 @cindex MO file's format
4244 @cindex file format, @file{.mo}
4245
4246 The format of the generated MO files is best described by a picture,
4247 which appears below.
4248
4249 @cindex magic signature of MO files
4250 The first two words serve the identification of the file.  The magic
4251 number will always signal GNU MO files.  The number is stored in the
4252 byte order of the generating machine, so the magic number really is
4253 two numbers: @code{0x950412de} and @code{0xde120495}.  The second
4254 word describes the current revision of the file format.  For now the
4255 revision is 0.  This might change in future versions, and ensures
4256 that the readers of MO files can distinguish new formats from old
4257 ones, so that both can be handled correctly.  The version is kept
4258 separate from the magic number, instead of using different magic
4259 numbers for different formats, mainly because @file{/etc/magic} is
4260 not updated often.  It might be better to have magic separated from
4261 internal format version identification.
4262
4263 Follow a number of pointers to later tables in the file, allowing
4264 for the extension of the prefix part of MO files without having to
4265 recompile programs reading them.  This might become useful for later
4266 inserting a few flag bits, indication about the charset used, new
4267 tables, or other things.
4268
4269 Then, at offset @var{O} and offset @var{T} in the picture, two tables
4270 of string descriptors can be found.  In both tables, each string
4271 descriptor uses two 32 bits integers, one for the string length,
4272 another for the offset of the string in the MO file, counting in bytes
4273 from the start of the file.  The first table contains descriptors
4274 for the original strings, and is sorted so the original strings
4275 are in increasing lexicographical order.  The second table contains
4276 descriptors for the translated strings, and is parallel to the first
4277 table: to find the corresponding translation one has to access the
4278 array slot in the second array with the same index.
4279
4280 Having the original strings sorted enables the use of simple binary
4281 search, for when the MO file does not contain an hashing table, or
4282 for when it is not practical to use the hashing table provided in
4283 the MO file.  This also has another advantage, as the empty string
4284 in a PO file GNU @code{gettext} is usually @emph{translated} into
4285 some system information attached to that particular MO file, and the
4286 empty string necessarily becomes the first in both the original and
4287 translated tables, making the system information very easy to find.
4288
4289 @cindex hash table, inside MO files
4290 The size @var{S} of the hash table can be zero.  In this case, the
4291 hash table itself is not contained in the MO file.  Some people might
4292 prefer this because a precomputed hashing table takes disk space, and
4293 does not win @emph{that} much speed.  The hash table contains indices
4294 to the sorted array of strings in the MO file.  Conflict resolution is
4295 done by double hashing.  The precise hashing algorithm used is fairly
4296 dependent on GNU @code{gettext} code, and is not documented here.
4297
4298 As for the strings themselves, they follow the hash file, and each
4299 is terminated with a @key{NUL}, and this @key{NUL} is not counted in
4300 the length which appears in the string descriptor.  The @code{msgfmt}
4301 program has an option selecting the alignment for MO file strings.
4302 With this option, each string is separately aligned so it starts at
4303 an offset which is a multiple of the alignment value.  On some RISC
4304 machines, a correct alignment will speed things up.
4305
4306 @cindex plural forms, in MO files
4307 Plural forms are stored by letting the plural of the original string
4308 follow the singular of the original string, separated through a
4309 @key{NUL} byte.  The length which appears in the string descriptor
4310 includes both.  However, only the singular of the original string
4311 takes part in the hash table lookup.  The plural variants of the
4312 translation are all stored consecutively, separated through a
4313 @key{NUL} byte.  Here also, the length in the string descriptor
4314 includes all of them.
4315
4316 Nothing prevents a MO file from having embedded @key{NUL}s in strings.
4317 However, the program interface currently used already presumes
4318 that strings are @key{NUL} terminated, so embedded @key{NUL}s are
4319 somewhat useless.  But the MO file format is general enough so other
4320 interfaces would be later possible, if for example, we ever want to
4321 implement wide characters right in MO files, where @key{NUL} bytes may
4322 accidently appear.  (No, we don't want to have wide characters in MO
4323 files.  They would make the file unnecessarily large, and the
4324 @samp{wchar_t} type being platform dependent, MO files would be
4325 platform dependent as well.)
4326
4327 This particular issue has been strongly debated in the GNU
4328 @code{gettext} development forum, and it is expectable that MO file
4329 format will evolve or change over time.  It is even possible that many
4330 formats may later be supported concurrently.  But surely, we have to
4331 start somewhere, and the MO file format described here is a good start.
4332 Nothing is cast in concrete, and the format may later evolve fairly
4333 easily, so we should feel comfortable with the current approach.
4334
4335 @example
4336 @group
4337         byte
4338              +------------------------------------------+
4339           0  | magic number = 0x950412de                |
4340              |                                          |
4341           4  | file format revision = 0                 |
4342              |                                          |
4343           8  | number of strings                        |  == N
4344              |                                          |
4345          12  | offset of table with original strings    |  == O
4346              |                                          |
4347          16  | offset of table with translation strings |  == T
4348              |                                          |
4349          20  | size of hashing table                    |  == S
4350              |                                          |
4351          24  | offset of hashing table                  |  == H
4352              |                                          |
4353              .                                          .
4354              .    (possibly more entries later)         .
4355              .                                          .
4356              |                                          |
4357           O  | length & offset 0th string  ----------------.
4358       O + 8  | length & offset 1st string  ------------------.
4359               ...                                    ...   | |
4360 O + ((N-1)*8)| length & offset (N-1)th string           |  | |
4361              |                                          |  | |
4362           T  | length & offset 0th translation  ---------------.
4363       T + 8  | length & offset 1st translation  -----------------.
4364               ...                                    ...   | | | |
4365 T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
4366              |                                          |  | | | |
4367           H  | start hash table                         |  | | | |
4368               ...                                    ...   | | | |
4369   H + S * 4  | end hash table                           |  | | | |
4370              |                                          |  | | | |
4371              | NUL terminated 0th string  <----------------' | | |
4372              |                                          |    | | |
4373              | NUL terminated 1st string  <------------------' | |
4374              |                                          |      | |
4375               ...                                    ...       | |
4376              |                                          |      | |
4377              | NUL terminated 0th translation  <---------------' |
4378              |                                          |        |
4379              | NUL terminated 1st translation  <-----------------'
4380              |                                          |
4381               ...                                    ...
4382              |                                          |
4383              +------------------------------------------+
4384 @end group
4385 @end example
4386
4387 @node Users, Programmers, Binaries, Top
4388 @chapter The User's View
4389
4390 When GNU @code{gettext} will truly have reached its goal, average users
4391 should feel some kind of astonished pleasure, seeing the effect of
4392 that strange kind of magic that just makes their own native language
4393 appear everywhere on their screens.  As for naive users, they would
4394 ideally have no special pleasure about it, merely taking their own
4395 language for @emph{granted}, and becoming rather unhappy otherwise.
4396
4397 So, let's try to describe here how we would like the magic to operate,
4398 as we want the users' view to be the simplest, among all ways one
4399 could look at GNU @code{gettext}.  All other software engineers:
4400 programmers, translators, maintainers, should work together in such a
4401 way that the magic becomes possible.  This is a long and progressive
4402 undertaking, and information is available about the progress of the
4403 Translation Project.
4404
4405 When a package is distributed, there are two kinds of users:
4406 @dfn{installers} who fetch the distribution, unpack it, configure
4407 it, compile it and install it for themselves or others to use; and
4408 @dfn{end users} that call programs of the package, once these have
4409 been installed at their site.  GNU @code{gettext} is offering magic
4410 for both installers and end users.
4411
4412 @menu
4413 * Matrix::                      The Current @file{ABOUT-NLS} Matrix
4414 * Installers::                  Magic for Installers
4415 * End Users::                   Magic for End Users
4416 @end menu
4417
4418 @node Matrix, Installers, Users, Users
4419 @section The Current @file{ABOUT-NLS} Matrix
4420 @cindex Translation Matrix
4421 @cindex available translations
4422 @cindex @file{ABOUT-NLS} file
4423
4424 Languages are not equally supported in all packages using GNU
4425 @code{gettext}.  To know if some package uses GNU @code{gettext}, one
4426 may check the distribution for the @file{ABOUT-NLS} information file, for
4427 some @file{@var{ll}.po} files, often kept together into some @file{po/}
4428 directory, or for an @file{intl/} directory.  Internationalized packages
4429 have usually many @file{@var{ll}.po} files, where @var{ll} represents
4430 the language.  @ref{End Users} for a complete description of the format
4431 for @var{ll}.
4432
4433 More generally, a matrix is available for showing the current state
4434 of the Translation Project, listing which packages are prepared for
4435 multi-lingual messages, and which languages are supported by each.
4436 Because this information changes often, this matrix is not kept within
4437 this GNU @code{gettext} manual.  This information is often found in
4438 file @file{ABOUT-NLS} from various distributions, but is also as old as
4439 the distribution itself.  A recent copy of this @file{ABOUT-NLS} file,
4440 containing up-to-date information, should generally be found on the
4441 Translation Project sites, and also on most GNU archive sites.
4442
4443 @node Installers, End Users, Matrix, Users
4444 @section Magic for Installers
4445 @cindex package build and installation options
4446 @cindex setting up @code{gettext} at build time
4447
4448 By default, packages fully using GNU @code{gettext}, internally,
4449 are installed in such a way that they to allow translation of
4450 messages.  At @emph{configuration} time, those packages should
4451 automatically detect whether the underlying host system already provides
4452 the GNU @code{gettext} functions.  If not,
4453 the GNU @code{gettext} library should be automatically prepared
4454 and used.  Installers may use special options at configuration
4455 time for changing this behavior.  The command @samp{./configure
4456 --with-included-gettext} bypasses system @code{gettext} to
4457 use the included GNU @code{gettext} instead,
4458 while @samp{./configure --disable-nls}
4459 produces programs totally unable to translate messages.
4460
4461 @vindex LINGUAS@r{, environment variable}
4462 Internationalized packages have usually many @file{@var{ll}.po}
4463 files.  Unless
4464 translations are disabled, all those available are installed together
4465 with the package.  However, the environment variable @code{LINGUAS}
4466 may be set, prior to configuration, to limit the installed set.
4467 @code{LINGUAS} should then contain a space separated list of two-letter
4468 codes, stating which languages are allowed.
4469
4470 @node End Users,  , Installers, Users
4471 @section Magic for End Users
4472 @cindex setting up @code{gettext} at run time
4473 @cindex selecting message language
4474 @cindex language selection
4475
4476 @vindex LANG@r{, environment variable}
4477 We consider here those packages using GNU @code{gettext} internally,
4478 and for which the installers did not disable translation at
4479 @emph{configure} time.  Then, users only have to set the @code{LANG}
4480 environment variable to the appropriate @samp{@var{ll}_@var{CC}}
4481 combination prior to using the programs in the package.  @xref{Matrix}.
4482 For example, let's presume a German site.  At the shell prompt, users
4483 merely have to execute @w{@samp{setenv LANG de_DE}} (in @code{csh}) or
4484 @w{@samp{export LANG; LANG=de_DE}} (in @code{sh}).  They could even do
4485 this from their @file{.login} or @file{.profile} file.
4486
4487 @node Programmers, Translators, Users, Top
4488 @chapter The Programmer's View
4489
4490 @c FIXME: Reorganize whole chapter.
4491
4492 One aim of the current message catalog implementation provided by
4493 GNU @code{gettext} was to use the system's message catalog handling, if the
4494 installer wishes to do so.  So we perhaps should first take a look at
4495 the solutions we know about.  The people in the POSIX committee did not
4496 manage to agree on one of the semi-official standards which we'll
4497 describe below.  In fact they couldn't agree on anything, so they decided
4498 only to include an example of an interface.  The major Unix vendors
4499 are split in the usage of the two most important specifications: X/Open's
4500 catgets vs. Uniforum's gettext interface.  We'll describe them both and
4501 later explain our solution of this dilemma.
4502
4503 @menu
4504 * catgets::                     About @code{catgets}
4505 * gettext::                     About @code{gettext}
4506 * Comparison::                  Comparing the two interfaces
4507 * Using libintl.a::             Using libintl.a in own programs
4508 * gettext grok::                Being a @code{gettext} grok
4509 * Temp Programmers::            Temporary Notes for the Programmers Chapter
4510 @end menu
4511
4512 @node catgets, gettext, Programmers, Programmers
4513 @section About @code{catgets}
4514 @cindex @code{catgets}, X/Open specification
4515
4516 The @code{catgets} implementation is defined in the X/Open Portability
4517 Guide, Volume 3, XSI Supplementary Definitions, Chapter 5.  But the
4518 process of creating this standard seemed to be too slow for some of
4519 the Unix vendors so they created their implementations on preliminary
4520 versions of the standard.  Of course this leads again to problems while
4521 writing platform independent programs: even the usage of @code{catgets}
4522 does not guarantee a unique interface.
4523
4524 Another, personal comment on this that only a bunch of committee members
4525 could have made this interface.  They never really tried to program
4526 using this interface.  It is a fast, memory-saving implementation, an
4527 user can happily live with it.  But programmers hate it (at least I and
4528 some others do@dots{})
4529
4530 But we must not forget one point: after all the trouble with transfering
4531 the rights on Unix(tm) they at last came to X/Open, the very same who
4532 published this specification.  This leads me to making the prediction
4533 that this interface will be in future Unix standards (e.g. Spec1170) and
4534 therefore part of all Unix implementation (implementations, which are
4535 @emph{allowed} to wear this name).
4536
4537 @menu
4538 * Interface to catgets::        The interface
4539 * Problems with catgets::       Problems with the @code{catgets} interface?!
4540 @end menu
4541
4542 @node Interface to catgets, Problems with catgets, catgets, catgets
4543 @subsection The Interface
4544 @cindex interface to @code{catgets}
4545
4546 The interface to the @code{catgets} implementation consists of three
4547 functions which correspond to those used in file access: @code{catopen}
4548 to open the catalog for using, @code{catgets} for accessing the message
4549 tables, and @code{catclose} for closing after work is done.  Prototypes
4550 for the functions and the needed definitions are in the
4551 @code{<nl_types.h>} header file.
4552
4553 @cindex @code{catopen}, a @code{catgets} function
4554 @code{catopen} is used like in this:
4555
4556 @example
4557 nl_catd catd = catopen ("catalog_name", 0);
4558 @end example
4559
4560 The function takes as the argument the name of the catalog.  This usual
4561 refers to the name of the program or the package.  The second parameter
4562 is not further specified in the standard.  I don't even know whether it
4563 is implemented consistently among various systems.  So the common advice
4564 is to use @code{0} as the value.  The return value is a handle to the
4565 message catalog, equivalent to handles to file returned by @code{open}.
4566
4567 @cindex @code{catgets}, a @code{catgets} function
4568 This handle is of course used in the @code{catgets} function which can
4569 be used like this:
4570
4571 @example
4572 char *translation = catgets (catd, set_no, msg_id, "original string");
4573 @end example
4574
4575 The first parameter is this catalog descriptor.  The second parameter
4576 specifies the set of messages in this catalog, in which the message
4577 described by @code{msg_id} is obtained.  @code{catgets} therefore uses a
4578 three-stage addressing:
4579
4580 @display
4581 catalog name @result{} set number @result{} message ID @result{} translation
4582 @end display
4583
4584 @c Anybody else loving Haskell??? :-) -- Uli
4585
4586 The fourth argument is not used to address the translation.  It is given
4587 as a default value in case when one of the addressing stages fail.  One
4588 important thing to remember is that although the return type of catgets
4589 is @code{char *} the resulting string @emph{must not} be changed.  It
4590 should better be @code{const char *}, but the standard is published in
4591 1988, one year before ANSI C.
4592
4593 @noindent
4594 @cindex @code{catclose}, a @code{catgets} function
4595 The last of these functions is used and behaves as expected:
4596
4597 @example
4598 catclose (catd);
4599 @end example
4600
4601 After this no @code{catgets} call using the descriptor is legal anymore.
4602
4603 @node Problems with catgets,  , Interface to catgets, catgets
4604 @subsection Problems with the @code{catgets} Interface?!
4605 @cindex problems with @code{catgets} interface
4606
4607 Now that this description seemed to be really easy --- where are the
4608 problems we speak of?  In fact the interface could be used in a
4609 reasonable way, but constructing the message catalogs is a pain.  The
4610 reason for this lies in the third argument of @code{catgets}: the unique
4611 message ID.  This has to be a numeric value for all messages in a single
4612 set.  Perhaps you could imagine the problems keeping such a list while
4613 changing the source code.  Add a new message here, remove one there.  Of
4614 course there have been developed a lot of tools helping to organize this
4615 chaos but one as the other fails in one aspect or the other.  We don't
4616 want to say that the other approach has no problems but they are far
4617 more easy to manage.
4618
4619 @node gettext, Comparison, catgets, Programmers
4620 @section About @code{gettext}
4621 @cindex @code{gettext}, a programmer's view
4622
4623 The definition of the @code{gettext} interface comes from a Uniforum
4624 proposal.  It was submitted there by Sun, who had implemented the
4625 @code{gettext} function in SunOS 4, around 1990.  Nowadays, the
4626 @code{gettext} interface is specified by the OpenI18N standard.
4627
4628 The main point about this solution is that it does not follow the
4629 method of normal file handling (open-use-close) and that it does not
4630 burden the programmer with so many tasks, especially the unique key handling.
4631 Of course here also a unique key is needed, but this key is the message
4632 itself (how long or short it is).  See @ref{Comparison} for a more
4633 detailed comparison of the two methods.
4634
4635 The following section contains a rather detailed description of the
4636 interface.  We make it that detailed because this is the interface
4637 we chose for the GNU @code{gettext} Library.  Programmers interested
4638 in using this library will be interested in this description.
4639
4640 @menu
4641 * Interface to gettext::        The interface
4642 * Ambiguities::                 Solving ambiguities
4643 * Locating Catalogs::           Locating message catalog files
4644 * Charset conversion::          How to request conversion to Unicode
4645 * Plural forms::                Additional functions for handling plurals
4646 * GUI program problems::        Another technique for solving ambiguities
4647 * Optimized gettext::           Optimization of the *gettext functions
4648 @end menu
4649
4650 @node Interface to gettext, Ambiguities, gettext, gettext
4651 @subsection The Interface
4652 @cindex @code{gettext} interface
4653
4654 The minimal functionality an interface must have is a) to select a
4655 domain the strings are coming from (a single domain for all programs is
4656 not reasonable because its construction and maintenance is difficult,
4657 perhaps impossible) and b) to access a string in a selected domain.
4658
4659 This is principally the description of the @code{gettext} interface.  It
4660 has a global domain which unqualified usages reference.  Of course this
4661 domain is selectable by the user.
4662
4663 @example
4664 char *textdomain (const char *domain_name);
4665 @end example
4666
4667 This provides the possibility to change or query the current status of
4668 the current global domain of the @code{LC_MESSAGE} category.  The
4669 argument is a null-terminated string, whose characters must be legal in
4670 the use in filenames.  If the @var{domain_name} argument is @code{NULL},
4671 the function returns the current value.  If no value has been set
4672 before, the name of the default domain is returned: @emph{messages}.
4673 Please note that although the return value of @code{textdomain} is of
4674 type @code{char *} no changing is allowed.  It is also important to know
4675 that no checks of the availability are made.  If the name is not
4676 available you will see this by the fact that no translations are provided.
4677
4678 @noindent
4679 To use a domain set by @code{textdomain} the function
4680
4681 @example
4682 char *gettext (const char *msgid);
4683 @end example
4684
4685 @noindent
4686 is to be used.  This is the simplest reasonable form one can imagine.
4687 The translation of the string @var{msgid} is returned if it is available
4688 in the current domain.  If it is not available, the argument itself is
4689 returned.  If the argument is @code{NULL} the result is undefined.
4690
4691 One thing which should come into mind is that no explicit dependency to
4692 the used domain is given.  The current value of the domain for the
4693 @code{LC_MESSAGES} locale is used.  If this changes between two
4694 executions of the same @code{gettext} call in the program, both calls
4695 reference a different message catalog.
4696
4697 For the easiest case, which is normally used in internationalized
4698 packages, once at the beginning of execution a call to @code{textdomain}
4699 is issued, setting the domain to a unique name, normally the package
4700 name.  In the following code all strings which have to be translated are
4701 filtered through the gettext function.  That's all, the package speaks
4702 your language.
4703
4704 @node Ambiguities, Locating Catalogs, Interface to gettext, gettext
4705 @subsection Solving Ambiguities
4706 @cindex several domains
4707 @cindex domain ambiguities
4708 @cindex large package
4709
4710 While this single name domain works well for most applications there
4711 might be the need to get translations from more than one domain.  Of
4712 course one could switch between different domains with calls to
4713 @code{textdomain}, but this is really not convenient nor is it fast.  A
4714 possible situation could be one case subject to discussion during this
4715 writing:  all
4716 error messages of functions in the set of common used functions should
4717 go into a separate domain @code{error}.  By this mean we would only need
4718 to translate them once.
4719 Another case are messages from a library, as these @emph{have} to be
4720 independent of the current domain set by the application.
4721
4722 @noindent
4723 For this reasons there are two more functions to retrieve strings:
4724
4725 @example
4726 char *dgettext (const char *domain_name, const char *msgid);
4727 char *dcgettext (const char *domain_name, const char *msgid,
4728                  int category);
4729 @end example
4730
4731 Both take an additional argument at the first place, which corresponds
4732 to the argument of @code{textdomain}.  The third argument of
4733 @code{dcgettext} allows to use another locale but @code{LC_MESSAGES}.
4734 But I really don't know where this can be useful.  If the
4735 @var{domain_name} is @code{NULL} or @var{category} has an value beside
4736 the known ones, the result is undefined.  It should also be noted that
4737 this function is not part of the second known implementation of this
4738 function family, the one found in Solaris.
4739
4740 A second ambiguity can arise by the fact, that perhaps more than one
4741 domain has the same name.  This can be solved by specifying where the
4742 needed message catalog files can be found.
4743
4744 @example
4745 char *bindtextdomain (const char *domain_name,
4746                       const char *dir_name);
4747 @end example
4748
4749 Calling this function binds the given domain to a file in the specified
4750 directory (how this file is determined follows below).  Especially a
4751 file in the systems default place is not favored against the specified
4752 file anymore (as it would be by solely using @code{textdomain}).  A
4753 @code{NULL} pointer for the @var{dir_name} parameter returns the binding
4754 associated with @var{domain_name}.  If @var{domain_name} itself is
4755 @code{NULL} nothing happens and a @code{NULL} pointer is returned.  Here
4756 again as for all the other functions is true that none of the return
4757 value must be changed!
4758
4759 It is important to remember that relative path names for the
4760 @var{dir_name} parameter can be trouble.  Since the path is always
4761 computed relative to the current directory different results will be
4762 achieved when the program executes a @code{chdir} command.  Relative
4763 paths should always be avoided to avoid dependencies and
4764 unreliabilities.
4765
4766 @node Locating Catalogs, Charset conversion, Ambiguities, gettext
4767 @subsection Locating Message Catalog Files
4768 @cindex message catalog files location
4769
4770 Because many different languages for many different packages have to be
4771 stored we need some way to add these information to file message catalog
4772 files.  The way usually used in Unix environments is have this encoding
4773 in the file name.  This is also done here.  The directory name given in
4774 @code{bindtextdomain}s second argument (or the default directory),
4775 followed by the value and name of the locale and the domain name are
4776 concatenated:
4777
4778 @example
4779 @var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
4780 @end example
4781
4782 The default value for @var{dir_name} is system specific.  For the GNU
4783 library, and for packages adhering to its conventions, it's:
4784 @example
4785 /usr/local/share/locale
4786 @end example
4787
4788 @noindent
4789 @var{locale} is the value of the locale whose name is this
4790 @code{LC_@var{category}}.  For @code{gettext} and @code{dgettext} this
4791 @code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some
4792 system, eg Ultrix, don't have @code{LC_MESSAGES}.  Here we use a more or
4793 less arbitrary value for it, namely 1729, the smallest positive integer
4794 which can be represented in two different ways as the sum of two cubes.}
4795 The value of the locale is determined through
4796 @code{setlocale (LC_@var{category}, NULL)}.
4797 @footnote{When the system does not support @code{setlocale} its behavior
4798 in setting the locale values is simulated by looking at the environment
4799 variables.}
4800 @code{dcgettext} specifies the locale category by the third argument.
4801
4802 @node Charset conversion, Plural forms, Locating Catalogs, gettext
4803 @subsection How to specify the output character set @code{gettext} uses
4804 @cindex charset conversion at runtime
4805 @cindex encoding conversion at runtime
4806
4807 @code{gettext} not only looks up a translation in a message catalog.  It
4808 also converts the translation on the fly to the desired output character
4809 set.  This is useful if the user is working in a different character set
4810 than the translator who created the message catalog, because it avoids
4811 distributing variants of message catalogs which differ only in the
4812 character set.
4813
4814 The output character set is, by default, the value of @code{nl_langinfo
4815 (CODESET)}, which depends on the @code{LC_CTYPE} part of the current
4816 locale.  But programs which store strings in a locale independent way
4817 (e.g. UTF-8) can request that @code{gettext} and related functions
4818 return the translations in that encoding, by use of the
4819 @code{bind_textdomain_codeset} function.
4820
4821 Note that the @var{msgid} argument to @code{gettext} is not subject to
4822 character set conversion.  Also, when @code{gettext} does not find a
4823 translation for @var{msgid}, it returns @var{msgid} unchanged --
4824 independently of the current output character set.  It is therefore
4825 recommended that all @var{msgid}s be US-ASCII strings.
4826
4827 @deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
4828 The @code{bind_textdomain_codeset} function can be used to specify the
4829 output character set for message catalogs for domain @var{domainname}.
4830 The @var{codeset} argument must be a valid codeset name which can be used
4831 for the @code{iconv_open} function, or a null pointer.
4832
4833 If the @var{codeset} parameter is the null pointer,
4834 @code{bind_textdomain_codeset} returns the currently selected codeset
4835 for the domain with the name @var{domainname}.  It returns @code{NULL} if
4836 no codeset has yet been selected.
4837
4838 The @code{bind_textdomain_codeset} function can be used several times.
4839 If used multiple times with the same @var{domainname} argument, the
4840 later call overrides the settings made by the earlier one.
4841
4842 The @code{bind_textdomain_codeset} function returns a pointer to a
4843 string containing the name of the selected codeset.  The string is
4844 allocated internally in the function and must not be changed by the
4845 user.  If the system went out of core during the execution of
4846 @code{bind_textdomain_codeset}, the return value is @code{NULL} and the
4847 global variable @var{errno} is set accordingly.
4848 @end deftypefun
4849
4850 @node Plural forms, GUI program problems, Charset conversion, gettext
4851 @subsection Additional functions for plural forms
4852 @cindex plural forms
4853
4854 The functions of the @code{gettext} family described so far (and all the
4855 @code{catgets} functions as well) have one problem in the real world
4856 which have been neglected completely in all existing approaches.  What
4857 is meant here is the handling of plural forms.
4858
4859 Looking through Unix source code before the time anybody thought about
4860 internationalization (and, sadly, even afterwards) one can often find
4861 code similar to the following:
4862
4863 @smallexample
4864    printf ("%d file%s deleted", n, n == 1 ? "" : "s");
4865 @end smallexample
4866
4867 @noindent
4868 After the first complaints from people internationalizing the code people
4869 either completely avoided formulations like this or used strings like
4870 @code{"file(s)"}.  Both look unnatural and should be avoided.  First
4871 tries to solve the problem correctly looked like this:
4872
4873 @smallexample
4874    if (n == 1)
4875      printf ("%d file deleted", n);
4876    else
4877      printf ("%d files deleted", n);
4878 @end smallexample
4879
4880 But this does not solve the problem.  It helps languages where the
4881 plural form of a noun is not simply constructed by adding an `s' but
4882 that is all.  Once again people fell into the trap of believing the
4883 rules their language is using are universal.  But the handling of plural
4884 forms differs widely between the language families.  For example,
4885 Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:
4886
4887 @quotation
4888 In Polish we use e.g. plik (file) this way:
4889 @example
4890 1 plik
4891 2,3,4 pliki
4892 5-21 pliko'w
4893 22-24 pliki
4894 25-31 pliko'w
4895 @end example
4896 and so on (o' means 8859-2 oacute which should be rather okreska,
4897 similar to aogonek).
4898 @end quotation
4899
4900 There are two things which can differ between languages (and even inside
4901 language families);
4902
4903 @itemize @bullet
4904 @item
4905 The form how plural forms are built differs.  This is a problem with
4906 languages which have many irregularities.  German, for instance, is a
4907 drastic case.  Though English and German are part of the same language
4908 family (Germanic), the almost regular forming of plural noun forms
4909 (appending an `s') is hardly found in German.
4910
4911 @item
4912 The number of plural forms differ.  This is somewhat surprising for
4913 those who only have experiences with Romanic and Germanic languages
4914 since here the number is the same (there are two).
4915
4916 But other language families have only one form or many forms.  More
4917 information on this in an extra section.
4918 @end itemize
4919
4920 The consequence of this is that application writers should not try to
4921 solve the problem in their code.  This would be localization since it is
4922 only usable for certain, hardcoded language environments.  Instead the
4923 extended @code{gettext} interface should be used.
4924
4925 These extra functions are taking instead of the one key string two
4926 strings and a numerical argument.  The idea behind this is that using
4927 the numerical argument and the first string as a key, the implementation
4928 can select using rules specified by the translator the right plural
4929 form.  The two string arguments then will be used to provide a return
4930 value in case no message catalog is found (similar to the normal
4931 @code{gettext} behavior).  In this case the rules for Germanic language
4932 is used and it is assumed that the first string argument is the singular
4933 form, the second the plural form.
4934
4935 This has the consequence that programs without language catalogs can
4936 display the correct strings only if the program itself is written using
4937 a Germanic language.  This is a limitation but since the GNU C library
4938 (as well as the GNU @code{gettext} package) are written as part of the
4939 GNU package and the coding standards for the GNU project require program
4940 being written in English, this solution nevertheless fulfills its
4941 purpose.
4942
4943 @deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
4944 The @code{ngettext} function is similar to the @code{gettext} function
4945 as it finds the message catalogs in the same way.  But it takes two
4946 extra arguments.  The @var{msgid1} parameter must contain the singular
4947 form of the string to be converted.  It is also used as the key for the
4948 search in the catalog.  The @var{msgid2} parameter is the plural form.
4949 The parameter @var{n} is used to determine the plural form.  If no
4950 message catalog is found @var{msgid1} is returned if @code{n == 1},
4951 otherwise @code{msgid2}.
4952
4953 An example for the use of this function is:
4954
4955 @smallexample
4956 printf (ngettext ("%d file removed", "%d files removed", n), n);
4957 @end smallexample
4958
4959 Please note that the numeric value @var{n} has to be passed to the
4960 @code{printf} function as well.  It is not sufficient to pass it only to
4961 @code{ngettext}.
4962 @end deftypefun
4963
4964 @deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
4965 The @code{dngettext} is similar to the @code{dgettext} function in the
4966 way the message catalog is selected.  The difference is that it takes
4967 two extra parameter to provide the correct plural form.  These two
4968 parameters are handled in the same way @code{ngettext} handles them.
4969 @end deftypefun
4970
4971 @deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
4972 The @code{dcngettext} is similar to the @code{dcgettext} function in the
4973 way the message catalog is selected.  The difference is that it takes
4974 two extra parameter to provide the correct plural form.  These two
4975 parameters are handled in the same way @code{ngettext} handles them.
4976 @end deftypefun
4977
4978 Now, how do these functions solve the problem of the plural forms?
4979 Without the input of linguists (which was not available) it was not
4980 possible to determine whether there are only a few different forms in
4981 which plural forms are formed or whether the number can increase with
4982 every new supported language.
4983
4984 Therefore the solution implemented is to allow the translator to specify
4985 the rules of how to select the plural form.  Since the formula varies
4986 with every language this is the only viable solution except for
4987 hardcoding the information in the code (which still would require the
4988 possibility of extensions to not prevent the use of new languages).
4989
4990 @cindex specifying plural form in a PO file
4991 @kwindex nplurals@r{, in a PO file header}
4992 @kwindex plural@r{, in a PO file header}
4993 The information about the plural form selection has to be stored in the
4994 header entry of the PO file (the one with the empty @code{msgid} string).
4995 The plural form information looks like this:
4996
4997 @smallexample
4998 Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
4999 @end smallexample
5000
5001 The @code{nplurals} value must be a decimal number which specifies how
5002 many different plural forms exist for this language.  The string
5003 following @code{plural} is an expression which is using the C language
5004 syntax.  Exceptions are that no negative numbers are allowed, numbers
5005 must be decimal, and the only variable allowed is @code{n}.  This
5006 expression will be evaluated whenever one of the functions
5007 @code{ngettext}, @code{dngettext}, or @code{dcngettext} is called.  The
5008 numeric value passed to these functions is then substituted for all uses
5009 of the variable @code{n} in the expression.  The resulting value then
5010 must be greater or equal to zero and smaller than the value given as the
5011 value of @code{nplurals}.
5012
5013 @noindent
5014 @cindex plural form formulas
5015 The following rules are known at this point.  The language with families
5016 are listed.  But this does not necessarily mean the information can be
5017 generalized for the whole family (as can be easily seen in the table
5018 below).@footnote{Additions are welcome.  Send appropriate information to
5019 @email{bug-glibc-manual@@gnu.org}.}
5020
5021 @table @asis
5022 @item Only one form:
5023 Some languages only require one single form.  There is no distinction
5024 between the singular and plural form.  An appropriate header entry
5025 would look like this:
5026
5027 @smallexample
5028 Plural-Forms: nplurals=1; plural=0;
5029 @end smallexample
5030
5031 @noindent
5032 Languages with this property include:
5033
5034 @table @asis
5035 @item Finno-Ugric family
5036 Hungarian
5037 @item Asian family
5038 Japanese, Korean, Vietnamese
5039 @item Turkic/Altaic family
5040 Turkish
5041 @end table
5042
5043 @item Two forms, singular used for one only
5044 This is the form used in most existing programs since it is what English
5045 is using.  A header entry would look like this:
5046
5047 @smallexample
5048 Plural-Forms: nplurals=2; plural=n != 1;
5049 @end smallexample
5050
5051 (Note: this uses the feature of C expressions that boolean expressions
5052 have to value zero or one.)
5053
5054 @noindent
5055 Languages with this property include:
5056
5057 @table @asis
5058 @item Germanic family
5059 Danish, Dutch, English, Faroese, German, Norwegian, Swedish
5060 @item Finno-Ugric family
5061 Estonian, Finnish
5062 @item Latin/Greek family
5063 Greek
5064 @item Semitic family
5065 Hebrew
5066 @item Romanic family
5067 Italian, Portuguese, Spanish
5068 @item Artificial
5069 Esperanto
5070 @end table
5071
5072 @item Two forms, singular used for zero and one
5073 Exceptional case in the language family.  The header entry would be:
5074
5075 @smallexample
5076 Plural-Forms: nplurals=2; plural=n>1;
5077 @end smallexample
5078
5079 @noindent
5080 Languages with this property include:
5081
5082 @table @asis
5083 @item Romanic family
5084 French, Brazilian Portuguese
5085 @end table
5086
5087 @item Three forms, special case for zero
5088 The header entry would be:
5089
5090 @smallexample
5091 Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
5092 @end smallexample
5093
5094 @noindent
5095 Languages with this property include:
5096
5097 @table @asis
5098 @item Baltic family
5099 Latvian
5100 @end table
5101
5102 @item Three forms, special cases for one and two
5103 The header entry would be:
5104
5105 @smallexample
5106 Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
5107 @end smallexample
5108
5109 @noindent
5110 Languages with this property include:
5111
5112 @table @asis
5113 @item Celtic
5114 Gaeilge (Irish)
5115 @end table
5116
5117 @item Three forms, special case for numbers ending in 1[2-9]
5118 The header entry would look like this:
5119
5120 @smallexample
5121 Plural-Forms: nplurals=3; \
5122     plural=n%10==1 && n%100!=11 ? 0 : \
5123            n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
5124 @end smallexample
5125
5126 @noindent
5127 Languages with this property include:
5128
5129 @table @asis
5130 @item Baltic family
5131 Lithuanian
5132 @end table
5133
5134 @item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
5135 The header entry would look like this:
5136
5137 @smallexample
5138 Plural-Forms: nplurals=3; \
5139     plural=n%10==1 && n%100!=11 ? 0 : \
5140            n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
5141 @end smallexample
5142
5143 @noindent
5144 Languages with this property include:
5145
5146 @table @asis
5147 @item Slavic family
5148 Croatian, Serbian, Russian, Ukrainian
5149 @end table
5150
5151 @item Three forms, special cases for 1 and 2, 3, 4
5152 The header entry would look like this:
5153
5154 @smallexample
5155 Plural-Forms: nplurals=3; \
5156     plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
5157 @end smallexample
5158
5159 @noindent
5160 Languages with this property include:
5161
5162 @table @asis
5163 @item Slavic family
5164 Slovak, Czech
5165 @end table
5166
5167 @item Three forms, special case for one and some numbers ending in 2, 3, or 4
5168 The header entry would look like this:
5169
5170 @smallexample
5171 Plural-Forms: nplurals=3; \
5172     plural=n==1 ? 0 : \
5173            n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
5174 @end smallexample
5175
5176 @noindent
5177 Languages with this property include:
5178
5179 @table @asis
5180 @item Slavic family
5181 Polish
5182 @end table
5183
5184 @item Four forms, special case for one and all numbers ending in 02, 03, or 04
5185 The header entry would look like this:
5186
5187 @smallexample
5188 Plural-Forms: nplurals=4; \
5189     plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
5190 @end smallexample
5191
5192 @noindent
5193 Languages with this property include:
5194
5195 @table @asis
5196 @item Slavic family
5197 Slovenian
5198 @end table
5199 @end table
5200
5201 @node GUI program problems, Optimized gettext, Plural forms, gettext
5202 @subsection How to use @code{gettext} in GUI programs
5203 @cindex GUI programs
5204 @cindex translating menu entries
5205 @cindex menu entries
5206
5207 One place where the @code{gettext} functions, if used normally, have big
5208 problems is within programs with graphical user interfaces (GUIs).  The
5209 problem is that many of the strings which have to be translated are very
5210 short.  They have to appear in pull-down menus which restricts the
5211 length.  But strings which are not containing entire sentences or at
5212 least large fragments of a sentence may appear in more than one
5213 situation in the program but might have different translations.  This is
5214 especially true for the one-word strings which are frequently used in
5215 GUI programs.
5216
5217 As a consequence many people say that the @code{gettext} approach is
5218 wrong and instead @code{catgets} should be used which indeed does not
5219 have this problem.  But there is a very simple and powerful method to
5220 handle these kind of problems with the @code{gettext} functions.
5221
5222 @noindent
5223 As as example consider the following fictional situation.  A GUI program
5224 has a menu bar with the following entries:
5225
5226 @smallexample
5227 +------------+------------+--------------------------------------+
5228 | File       | Printer    |                                      |
5229 +------------+------------+--------------------------------------+
5230 | Open     | | Select   |
5231 | New      | | Open     |
5232 +----------+ | Connect  |
5233              +----------+
5234 @end smallexample
5235
5236 To have the strings @code{File}, @code{Printer}, @code{Open},
5237 @code{New}, @code{Select}, and @code{Connect} translated there has to be
5238 at some point in the code a call to a function of the @code{gettext}
5239 family.  But in two places the string passed into the function would be
5240 @code{Open}.  The translations might not be the same and therefore we
5241 are in the dilemma described above.
5242
5243 One solution to this problem is to artificially enlengthen the strings
5244 to make them unambiguous.  But what would the program do if no
5245 translation is available?  The enlengthened string is not what should be
5246 printed.  So we should use a little bit modified version of the functions.
5247
5248 To enlengthen the strings a uniform method should be used.  E.g., in the
5249 example above the strings could be chosen as
5250
5251 @smallexample
5252 Menu|File
5253 Menu|Printer
5254 Menu|File|Open
5255 Menu|File|New
5256 Menu|Printer|Select
5257 Menu|Printer|Open
5258 Menu|Printer|Connect
5259 @end smallexample
5260
5261 Now all the strings are different and if now instead of @code{gettext}
5262 the following little wrapper function is used, everything works just
5263 fine:
5264
5265 @cindex sgettext
5266 @smallexample
5267   char *
5268   sgettext (const char *msgid)
5269   @{
5270     char *msgval = gettext (msgid);
5271     if (msgval == msgid)
5272       msgval = strrchr (msgid, '|') + 1;
5273     return msgval;
5274   @}
5275 @end smallexample
5276
5277 What this little function does is to recognize the case when no
5278 translation is available.  This can be done very efficiently by a
5279 pointer comparison since the return value is the input value.  If there
5280 is no translation we know that the input string is in the format we used
5281 for the Menu entries and therefore contains a @code{|} character.  We
5282 simply search for the last occurrence of this character and return a
5283 pointer to the character following it.  That's it!
5284
5285 If one now consistently uses the enlengthened string form and replaces
5286 the @code{gettext} calls with calls to @code{sgettext} (this is normally
5287 limited to very few places in the GUI implementation) then it is
5288 possible to produce a program which can be internationalized.
5289
5290 The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
5291 and the @code{ngettext} equivalents) can and should have corresponding
5292 functions as well which look almost identical, except for the parameters
5293 and the call to the underlying function.
5294
5295 Now there is of course the question why such functions do not exist in
5296 the GNU gettext package?  There are two parts of the answer to this question.
5297
5298 @itemize @bullet
5299 @item
5300 They are easy to write and therefore can be provided by the project they
5301 are used in.  This is not an answer by itself and must be seen together
5302 with the second part which is:
5303
5304 @item
5305 There is no way the gettext package can contain a version which can work
5306 everywhere.  The problem is the selection of the character to separate
5307 the prefix from the actual string in the enlenghtened string.  The
5308 examples above used @code{|} which is a quite good choice because it
5309 resembles a notation frequently used in this context and it also is a
5310 character not often used in message strings.
5311
5312 But what if the character is used in message strings?  Or if the chose
5313 character is not available in the character set on the machine one
5314 compiles (e.g., @code{|} is not required to exist for @w{ISO C}; this is
5315 why the @file{iso646.h} file exists in @w{ISO C} programming environments).
5316 @end itemize
5317
5318 There is only one more comment to be said.  The wrapper function above
5319 requires that the translations strings are not enlengthened themselves.
5320 This is only logical.  There is no need to disambiguate the strings
5321 (since they are never used as keys for a search) and one also saves
5322 quite some memory and disk space by doing this.
5323
5324 @node Optimized gettext,  , GUI program problems, gettext
5325 @subsection Optimization of the *gettext functions
5326 @cindex optimization of @code{gettext} functions
5327
5328 At this point of the discussion we should talk about an advantage of the
5329 GNU @code{gettext} implementation.  Some readers might have pointed out
5330 that an internationalized program might have a poor performance if some
5331 string has to be translated in an inner loop.  While this is unavoidable
5332 when the string varies from one run of the loop to the other it is
5333 simply a waste of time when the string is always the same.  Take the
5334 following example:
5335
5336 @example
5337 @group
5338 @{
5339   while (@dots{})
5340     @{
5341       puts (gettext ("Hello world"));
5342     @}
5343 @}
5344 @end group
5345 @end example
5346
5347 @noindent
5348 When the locale selection does not change between two runs the resulting
5349 string is always the same.  One way to use this is:
5350
5351 @example
5352 @group
5353 @{
5354   str = gettext ("Hello world");
5355   while (@dots{})
5356     @{
5357       puts (str);
5358     @}
5359 @}
5360 @end group
5361 @end example
5362
5363 @noindent
5364 But this solution is not usable in all situation (e.g. when the locale
5365 selection changes) nor does it lead to legible code.
5366
5367 For this reason, GNU @code{gettext} caches previous translation results.
5368 When the same translation is requested twice, with no new message
5369 catalogs being loaded in between, @code{gettext} will, the second time,
5370 find the result through a single cache lookup.
5371
5372 @node Comparison, Using libintl.a, gettext, Programmers
5373 @section Comparing the Two Interfaces
5374 @cindex @code{gettext} vs @code{catgets}
5375 @cindex comparison of interfaces
5376
5377 @c FIXME: arguments to catgets vs. gettext
5378 @c Partly done 950718 -- drepper
5379
5380 The following discussion is perhaps a little bit colored.  As said
5381 above we implemented GNU @code{gettext} following the Uniforum
5382 proposal and this surely has its reasons.  But it should show how we
5383 came to this decision.
5384
5385 First we take a look at the developing process.  When we write an
5386 application using NLS provided by @code{gettext} we proceed as always.
5387 Only when we come to a string which might be seen by the users and thus
5388 has to be translated we use @code{gettext("@dots{}")} instead of
5389 @code{"@dots{}"}.  At the beginning of each source file (or in a central
5390 header file) we define
5391
5392 @example
5393 #define gettext(String) (String)
5394 @end example
5395
5396 Even this definition can be avoided when the system supports the
5397 @code{gettext} function in its C library.  When we compile this code the
5398 result is the same as if no NLS code is used.  When  you take a look at
5399 the GNU @code{gettext} code you will see that we use @code{_("@dots{}")}
5400 instead of @code{gettext("@dots{}")}.  This reduces the number of
5401 additional characters per translatable string to @emph{3} (in words:
5402 three).
5403
5404 When now a production version of the program is needed we simply replace
5405 the definition
5406
5407 @example
5408 #define _(String) (String)
5409 @end example
5410
5411 @noindent
5412 by
5413
5414 @cindex include file @file{libintl.h}
5415 @example
5416 #include <libintl.h>
5417 #define _(String) gettext (String)
5418 @end example
5419
5420 @noindent
5421 Additionally we run the program @file{xgettext} on all source code file
5422 which contain translatable strings and that's it: we have a running
5423 program which does not depend on translations to be available, but which
5424 can use any that becomes available.
5425
5426 @cindex @code{N_}, a convenience macro
5427 The same procedure can be done for the @code{gettext_noop} invocations
5428 (@pxref{Special cases}).  One usually defines @code{gettext_noop} as a
5429 no-op macro.  So you should consider the following code for your project:
5430
5431 @example
5432 #define gettext_noop(String) String
5433 #define N_(String) gettext_noop (String)
5434 @end example
5435
5436 @code{N_} is a short form similar to @code{_}.  The @file{Makefile} in
5437 the @file{po/} directory of GNU @code{gettext} knows by default both of the
5438 mentioned short forms so you are invited to follow this proposal for
5439 your own ease.
5440
5441 Now to @code{catgets}.  The main problem is the work for the
5442 programmer.  Every time he comes to a translatable string he has to
5443 define a number (or a symbolic constant) which has also be defined in
5444 the message catalog file.  He also has to take care for duplicate
5445 entries, duplicate message IDs etc.  If he wants to have the same
5446 quality in the message catalog as the GNU @code{gettext} program
5447 provides he also has to put the descriptive comments for the strings and
5448 the location in all source code files in the message catalog.  This is
5449 nearly a Mission: Impossible.
5450
5451 But there are also some points people might call advantages speaking for
5452 @code{catgets}.  If you have a single word in a string and this string
5453 is used in different contexts it is likely that in one or the other
5454 language the word has different translations.  Example:
5455
5456 @example
5457 printf ("%s: %d", gettext ("number"), number_of_errors)
5458
5459 printf ("you should see %d %s", number_count,
5460         number_count == 1 ? gettext ("number") : gettext ("numbers"))
5461 @end example
5462
5463 Here we have to translate two times the string @code{"number"}.  Even
5464 if you do not speak a language beside English it might be possible to
5465 recognize that the two words have a different meaning.  In German the
5466 first appearance has to be translated to @code{"Anzahl"} and the second
5467 to @code{"Zahl"}.
5468
5469 Now you can say that this example is really esoteric.  And you are
5470 right!  This is exactly how we felt about this problem and decide that
5471 it does not weight that much.  The solution for the above problem could
5472 be very easy:
5473
5474 @example
5475 printf ("%s %d", gettext ("number:"), number_of_errors)
5476
5477 printf (number_count == 1 ? gettext ("you should see %d number")
5478                           : gettext ("you should see %d numbers"),
5479         number_count)
5480 @end example
5481
5482 We believe that we can solve all conflicts with this method.  If it is
5483 difficult one can also consider changing one of the conflicting string a
5484 little bit.  But it is not impossible to overcome.
5485
5486 @code{catgets} allows same original entry to have different translations,
5487 but @code{gettext} has another, scalable approach for solving ambiguities
5488 of this kind: @xref{Ambiguities}.
5489
5490 @node Using libintl.a, gettext grok, Comparison, Programmers
5491 @section Using libintl.a in own programs
5492
5493 Starting with version 0.9.4 the library @code{libintl.h} should be
5494 self-contained.  I.e., you can use it in your own programs without
5495 providing additional functions.  The @file{Makefile} will put the header
5496 and the library in directories selected using the @code{$(prefix)}.
5497
5498 @node gettext grok, Temp Programmers, Using libintl.a, Programmers
5499 @section Being a @code{gettext} grok
5500
5501 To fully exploit the functionality of the GNU @code{gettext} library it
5502 is surely helpful to read the source code.  But for those who don't want
5503 to spend that much time in reading the (sometimes complicated) code here
5504 is a list comments:
5505
5506 @itemize @bullet
5507 @item Changing the language at runtime
5508 @cindex language selection at runtime
5509
5510 For interactive programs it might be useful to offer a selection of the
5511 used language at runtime.  To understand how to do this one need to know
5512 how the used language is determined while executing the @code{gettext}
5513 function.  The method which is presented here only works correctly
5514 with the GNU implementation of the @code{gettext} functions.
5515
5516 In the function @code{dcgettext} at every call the current setting of
5517 the highest priority environment variable is determined and used.
5518 Highest priority means here the following list with decreasing
5519 priority:
5520
5521 @enumerate
5522 @vindex LANGUAGE@r{, environment variable}
5523 @item @code{LANGUAGE}
5524 @vindex LC_ALL@r{, environment variable}
5525 @item @code{LC_ALL}
5526 @vindex LC_CTYPE@r{, environment variable}
5527 @vindex LC_NUMERIC@r{, environment variable}
5528 @vindex LC_TIME@r{, environment variable}
5529 @vindex LC_COLLATE@r{, environment variable}
5530 @vindex LC_MONETARY@r{, environment variable}
5531 @vindex LC_MESSAGES@r{, environment variable}
5532 @item @code{LC_xxx}, according to selected locale
5533 @vindex LANG@r{, environment variable}
5534 @item @code{LANG}
5535 @end enumerate
5536
5537 Afterwards the path is constructed using the found value and the
5538 translation file is loaded if available.
5539
5540 What happens now when the value for, say, @code{LANGUAGE} changes?  According
5541 to the process explained above the new value of this variable is found
5542 as soon as the @code{dcgettext} function is called.  But this also means
5543 the (perhaps) different message catalog file is loaded.  In other
5544 words: the used language is changed.
5545
5546 But there is one little hook.  The code for gcc-2.7.0 and up provides
5547 some optimization.  This optimization normally prevents the calling of
5548 the @code{dcgettext} function as long as no new catalog is loaded.  But
5549 if @code{dcgettext} is not called the program also cannot find the
5550 @code{LANGUAGE} variable be changed (@pxref{Optimized gettext}).  A
5551 solution for this is very easy.  Include the following code in the
5552 language switching function.
5553
5554 @example
5555   /* Change language.  */
5556   setenv ("LANGUAGE", "fr", 1);
5557
5558   /* Make change known.  */
5559   @{
5560     extern int  _nl_msg_cat_cntr;
5561     ++_nl_msg_cat_cntr;
5562   @}
5563 @end example
5564
5565 @cindex @code{_nl_msg_cat_cntr}
5566 The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
5567 You don't need to know what this is for.  But it can be used to detect
5568 whether a @code{gettext} implementation is GNU gettext and not non-GNU
5569 system's native gettext implementation.
5570
5571 @end itemize
5572
5573 @node Temp Programmers,  , gettext grok, Programmers
5574 @section Temporary Notes for the Programmers Chapter
5575
5576 @menu
5577 * Temp Implementations::        Temporary - Two Possible Implementations
5578 * Temp catgets::                Temporary - About @code{catgets}
5579 * Temp WSI::                    Temporary - Why a single implementation
5580 * Temp Notes::                  Temporary - Notes
5581 @end menu
5582
5583 @node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers
5584 @subsection Temporary - Two Possible Implementations
5585
5586 There are two competing methods for language independent messages:
5587 the X/Open @code{catgets} method, and the Uniforum @code{gettext}
5588 method.  The @code{catgets} method indexes messages by integers; the
5589 @code{gettext} method indexes them by their English translations.
5590 The @code{catgets} method has been around longer and is supported
5591 by more vendors.  The @code{gettext} method is supported by Sun,
5592 and it has been heard that the COSE multi-vendor initiative is
5593 supporting it.  Neither method is a POSIX standard; the POSIX.1
5594 committee had a lot of disagreement in this area.
5595
5596 Neither one is in the POSIX standard.  There was much disagreement
5597 in the POSIX.1 committee about using the @code{gettext} routines
5598 vs. @code{catgets} (XPG).  In the end the committee couldn't
5599 agree on anything, so no messaging system was included as part
5600 of the standard.  I believe the informative annex of the standard
5601 includes the XPG3 messaging interfaces, ``@dots{}as an example of
5602 a messaging system that has been implemented@dots{}''
5603
5604 They were very careful not to say anywhere that you should use one
5605 set of interfaces over the other.  For more on this topic please
5606 see the Programming for Internationalization FAQ.
5607
5608 @node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers
5609 @subsection Temporary - About @code{catgets}
5610
5611 There have been a few discussions of late on the use of
5612 @code{catgets} as a base.  I think it important to present both
5613 sides of the argument and hence am opting to play devil's advocate
5614 for a little bit.
5615
5616 I'll not deny the fact that @code{catgets} could have been designed
5617 a lot better.  It currently has quite a number of limitations and
5618 these have already been pointed out.
5619
5620 However there is a great deal to be said for consistency and
5621 standardization.  A common recurring problem when writing Unix
5622 software is the myriad portability problems across Unix platforms.
5623 It seems as if every Unix vendor had a look at the operating system
5624 and found parts they could improve upon.  Undoubtedly, these
5625 modifications are probably innovative and solve real problems.
5626 However, software developers have a hard time keeping up with all
5627 these changes across so many platforms.
5628
5629 And this has prompted the Unix vendors to begin to standardize their
5630 systems.  Hence the impetus for Spec1170.  Every major Unix vendor
5631 has committed to supporting this standard and every Unix software
5632 developer waits with glee the day they can write software to this
5633 standard and simply recompile (without having to use autoconf)
5634 across different platforms.
5635
5636 As I understand it, Spec1170 is roughly based upon version 4 of the
5637 X/Open Portability Guidelines (XPG4).  Because @code{catgets} and
5638 friends are defined in XPG4, I'm led to believe that @code{catgets}
5639 is a part of Spec1170 and hence will become a standardized component
5640 of all Unix systems.
5641
5642 @node Temp WSI, Temp Notes, Temp catgets, Temp Programmers
5643 @subsection Temporary - Why a single implementation
5644
5645 Now it seems kind of wasteful to me to have two different systems
5646 installed for accessing message catalogs.  If we do want to remedy
5647 @code{catgets} deficiencies why don't we try to expand @code{catgets}
5648 (in a compatible manner) rather than implement an entirely new system.
5649 Otherwise, we'll end up with two message catalog access systems installed
5650 with an operating system - one set of routines for packages using GNU
5651 @code{gettext} for their internationalization, and another set of routines
5652 (catgets) for all other software.  Bloated?
5653
5654 Supposing another catalog access system is implemented.  Which do
5655 we recommend?  At least for Linux, we need to attract as many
5656 software developers as possible.  Hence we need to make it as easy
5657 for them to port their software as possible.  Which means supporting
5658 @code{catgets}.  We will be implementing the @code{libintl} code
5659 within our @code{libc}, but does this mean we also have to incorporate
5660 another message catalog access scheme within our @code{libc} as well?
5661 And what about people who are going to be using the @code{libintl}
5662 + non-@code{catgets} routines.  When they port their software to
5663 other platforms, they're now going to have to include the front-end
5664 (@code{libintl}) code plus the back-end code (the non-@code{catgets}
5665 access routines) with their software instead of just including the
5666 @code{libintl} code with their software.
5667
5668 Message catalog support is however only the tip of the iceberg.
5669 What about the data for the other locale categories.  They also have
5670 a number of deficiencies.  Are we going to abandon them as well and
5671 develop another duplicate set of routines (should @code{libintl}
5672 expand beyond message catalog support)?
5673
5674 Like many parts of Unix that can be improved upon, we're stuck with balancing
5675 compatibility with the past with useful improvements and innovations for
5676 the future.
5677
5678 @node Temp Notes,  , Temp WSI, Temp Programmers
5679 @subsection Temporary - Notes
5680
5681 X/Open agreed very late on the standard form so that many
5682 implementations differ from the final form.  Both of my system (old
5683 Linux catgets and Ultrix-4) have a strange variation.
5684
5685 OK.  After incorporating the last changes I have to spend some time on
5686 making the GNU/Linux @code{libc} @code{gettext} functions.  So in future
5687 Solaris is not the only system having @code{gettext}.
5688
5689 @node Translators, Maintainers, Programmers, Top
5690 @chapter The Translator's View
5691
5692 @c FIXME: Reorganize whole chapter.
5693
5694 @menu
5695 * Trans Intro 0::               Introduction 0
5696 * Trans Intro 1::               Introduction 1
5697 * Discussions::                 Discussions
5698 * Organization::                Organization
5699 * Information Flow::            Information Flow
5700 * Prioritizing messages::       How to find which messages to translate first
5701 @end menu
5702
5703 @node Trans Intro 0, Trans Intro 1, Translators, Translators
5704 @section Introduction 0
5705
5706 Free software is going international!  The Translation Project is a way
5707 to get maintainers, translators and users all together, so free software
5708 will gradually become able to speak many native languages.
5709
5710 The GNU @code{gettext} tool set contains @emph{everything} maintainers
5711 need for internationalizing their packages for messages.  It also
5712 contains quite useful tools for helping translators at localizing
5713 messages to their native language, once a package has already been
5714 internationalized.
5715
5716 To achieve the Translation Project, we need many interested
5717 people who like their own language and write it well, and who are also
5718 able to synergize with other translators speaking the same language.
5719 If you'd like to volunteer to @emph{work} at translating messages,
5720 please send mail to your translating team.
5721
5722 Each team has its own mailing list, courtesy of Linux
5723 International.  You may reach your translating team at the address
5724 @file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639}
5725 code for your language.  Language codes are @emph{not} the same as
5726 country codes given in @w{ISO 3166}.  The following translating teams
5727 exist:
5728
5729 @quotation
5730 Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
5731 Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish
5732 @code{ga}, German @code{de}, Greek @code{el}, Italian @code{it},
5733 Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish
5734 @code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
5735 Swedish @code{sv} and Turkish @code{tr}.
5736 @end quotation
5737
5738 @noindent
5739 For example, you may reach the Chinese translating team by writing to
5740 @file{zh@@li.org}.  When you become a member of the translating team
5741 for your own language, you may subscribe to its list.  For example,
5742 Swedish people can send a message to @w{@file{sv-request@@li.org}},
5743 having this message body:
5744
5745 @example
5746 subscribe
5747 @end example
5748
5749 Keep in mind that team members should be interested in @emph{working}
5750 at translations, or at solving translational difficulties, rather than
5751 merely lurking around.  If your team does not exist yet and you want to
5752 start one, please write to @w{@file{translation@@iro.umontreal.ca}};
5753 you will then reach the coordinator for all translator teams.
5754
5755 A handful of GNU packages have already been adapted and provided
5756 with message translations for several languages.  Translation
5757 teams have begun to organize, using these packages as a starting
5758 point.  But there are many more packages and many languages for
5759 which we have no volunteer translators.  If you would like to
5760 volunteer to work at translating messages, please send mail to
5761 @file{translation@@iro.umontreal.ca} indicating what language(s)
5762 you can work on.
5763
5764 @node Trans Intro 1, Discussions, Trans Intro 0, Translators
5765 @section Introduction 1
5766
5767 This is now official, GNU is going international!  Here is the
5768 announcement submitted for the January 1995 GNU Bulletin:
5769
5770 @quotation
5771 A handful of GNU packages have already been adapted and provided
5772 with message translations for several languages.  Translation
5773 teams have begun to organize, using these packages as a starting
5774 point.  But there are many more packages and many languages
5775 for which we have no volunteer translators.  If you'd like to
5776 volunteer to work at translating messages, please send mail to
5777 @samp{translation@@iro.umontreal.ca} indicating what language(s)
5778 you can work on.
5779 @end quotation
5780
5781 This document should answer many questions for those who are curious about
5782 the process or would like to contribute.  Please at least skim over it,
5783 hoping to cut down a little of the high volume of e-mail generated by this
5784 collective effort towards internationalization of free software.
5785
5786 Most free programming which is widely shared is done in English, and
5787 currently, English is used as the main communicating language between
5788 national communities collaborating to free software.  This very document
5789 is written in English.  This will not change in the foreseeable future.
5790
5791 However, there is a strong appetite from national communities for
5792 having more software able to write using national language and habits,
5793 and there is an on-going effort to modify free software in such a way
5794 that it becomes able to do so.  The experiments driven so far raised
5795 an enthusiastic response from pretesters, so we believe that
5796 internationalization of free software is dedicated to succeed.
5797
5798 For suggestion clarifications, additions or corrections to this
5799 document, please e-mail to @file{translation@@iro.umontreal.ca}.
5800
5801 @node Discussions, Organization, Trans Intro 1, Translators
5802 @section Discussions
5803
5804 Facing this internationalization effort, a few users expressed their
5805 concerns.  Some of these doubts are presented and discussed, here.
5806
5807 @itemize @bullet
5808 @item Smaller groups
5809
5810 Some languages are not spoken by a very large number of people, so people
5811 speaking them sometimes consider that there may not be all that much
5812 demand such versions of free software packages.  Moreover, many people
5813 being @emph{into computers}, in some countries, generally seem to prefer
5814 English versions of their software.
5815
5816 On the other end, people might enjoy their own language a lot, and be
5817 very motivated at providing to themselves the pleasure of having their
5818 beloved free software speaking their mother tongue.  They do themselves
5819 a personal favor, and do not pay that much attention to the number of
5820 people benefiting of their work.
5821
5822 @item Misinterpretation
5823
5824 Other users are shy to push forward their own language, seeing in this
5825 some kind of misplaced propaganda.  Someone thought there must be some
5826 users of the language over the networks pestering other people with it.
5827
5828 But any spoken language is worth localization, because there are
5829 people behind the language for whom the language is important and
5830 dear to their hearts.
5831
5832 @item Odd translations
5833
5834 The biggest problem is to find the right translations so that
5835 everybody can understand the messages.  Translations are usually a
5836 little odd.  Some people get used to English, to the extent they may
5837 find translations into their own language ``rather pushy, obnoxious
5838 and sometimes even hilarious.''  As a French speaking man, I have
5839 the experience of those instruction manuals for goods, so poorly
5840 translated in French in Korea or Taiwan@dots{}
5841
5842 The fact is that we sometimes have to create a kind of national
5843 computer culture, and this is not easy without the collaboration of
5844 many people liking their mother tongue.  This is why translations are
5845 better achieved by people knowing and loving their own language, and
5846 ready to work together at improving the results they obtain.
5847
5848 @item Dependencies over the GPL or LGPL
5849
5850 Some people wonder if using GNU @code{gettext} necessarily brings their
5851 package under the protective wing of the GNU General Public License or
5852 the GNU Library General Public License, when they do not want to make
5853 their program free, or want other kinds of freedom.  The simplest
5854 answer is ``normally not''.
5855
5856 The @code{gettext-runtime} part of GNU @code{gettext}, i.e. the
5857 contents of @code{libintl}, is covered by the GNU Library General Public
5858 License.  The @code{gettext-tools} part of GNU @code{gettext}, i.e. the
5859 rest of the GNU @code{gettext} package, is covered by the GNU General
5860 Public License.
5861
5862 The mere marking of localizable strings in a package, or conditional
5863 inclusion of a few lines for initialization, is not really including
5864 GPL'ed or LGPL'ed code.  However, since the localization routines in
5865 @code{libintl} are under the LGPL, the LGPL needs to be considered.
5866 It gives the right to distribute the complete unmodified source of
5867 @code{libintl} even with non-free programs.  It also gives the right
5868 to use @code{libintl} as a shared library, even for non-free programs.
5869 But it gives the right to use @code{libintl} as a static library or
5870 to incorporate @code{libintl} into another library only to free
5871 software.
5872
5873 @end itemize
5874
5875 @node Organization, Information Flow, Discussions, Translators
5876 @section Organization
5877
5878 On a larger scale, the true solution would be to organize some kind of
5879 fairly precise set up in which volunteers could participate.  I gave
5880 some thought to this idea lately, and realize there will be some
5881 touchy points.  I thought of writing to Richard Stallman to launch
5882 such a project, but feel it might be good to shake out the ideas
5883 between ourselves first.  Most probably that Linux International has
5884 some experience in the field already, or would like to orchestrate
5885 the volunteer work, maybe.  Food for thought, in any case!
5886
5887 I guess we have to setup something early, somehow, that will help
5888 many possible contributors of the same language to interlock and avoid
5889 work duplication, and further be put in contact for solving together
5890 problems particular to their tongue (in most languages, there are many
5891 difficulties peculiar to translating technical English).  My Swedish
5892 contributor acknowledged these difficulties, and I'm well aware of
5893 them for French.
5894
5895 This is surely not a technical issue, but we should manage so the
5896 effort of locale contributors be maximally useful, despite the national
5897 team layer interface between contributors and maintainers.
5898
5899 The Translation Project needs some setup for coordinating language
5900 coordinators.  Localizing evolving programs will surely
5901 become a permanent and continuous activity in the free software community,
5902 once well started.
5903 The setup should be minimally completed and tested before GNU
5904 @code{gettext} becomes an official reality.  The e-mail address
5905 @file{translation@@iro.umontreal.ca} has been setup for receiving
5906 offers from volunteers and general e-mail on these topics.  This address
5907 reaches the Translation Project coordinator.
5908
5909 @menu
5910 * Central Coordination::        Central Coordination
5911 * National Teams::              National Teams
5912 * Mailing Lists::               Mailing Lists
5913 @end menu
5914
5915 @node Central Coordination, National Teams, Organization, Organization
5916 @subsection Central Coordination
5917
5918 I also think GNU will need sooner than it thinks, that someone setup
5919 a way to organize and coordinate these groups.  Some kind of group
5920 of groups.  My opinion is that it would be good that GNU delegates
5921 this task to a small group of collaborating volunteers, shortly.
5922 Perhaps in @file{gnu.announce} a list of this national committee's
5923 can be published.
5924
5925 My role as coordinator would simply be to refer to Ulrich any German
5926 speaking volunteer interested to localization of free software packages, and
5927 maybe helping national groups to initially organize, while maintaining
5928 national registries for until national groups are ready to take over.
5929 In fact, the coordinator should ease volunteers to get in contact with
5930 one another for creating national teams, which should then select
5931 one coordinator per language, or country (regionalized language).
5932 If well done, the coordination should be useful without being an
5933 overwhelming task, the time to put delegations in place.
5934
5935 @node National Teams, Mailing Lists, Central Coordination, Organization
5936 @subsection National Teams
5937
5938 I suggest we look for volunteer coordinators/editors for individual
5939 languages.  These people will scan contributions of translation files
5940 for various programs, for their own languages, and will ensure high
5941 and uniform standards of diction.
5942
5943 From my current experience with other people in these days, those who
5944 provide localizations are very enthusiastic about the process, and are
5945 more interested in the localization process than in the program they
5946 localize, and want to do many programs, not just one.  This seems
5947 to confirm that having a coordinator/editor for each language is a
5948 good idea.
5949
5950 We need to choose someone who is good at writing clear and concise
5951 prose in the language in question.  That is hard---we can't check
5952 it ourselves.  So we need to ask a few people to judge each others'
5953 writing and select the one who is best.
5954
5955 I announce my prerelease to a few dozen people, and you would not
5956 believe all the discussions it generated already.  I shudder to think
5957 what will happen when this will be launched, for true, officially,
5958 world wide.  Who am I to arbitrate between two Czekolsovak users
5959 contradicting each other, for example?
5960
5961 I assume that your German is not much better than my French so that
5962 I would not be able to judge about these formulations.  What I would
5963 suggest is that for each language there is a group for people who
5964 maintain the PO files and judge about changes.  I suspect there will
5965 be cultural differences between how such groups of people will behave.
5966 Some will have relaxed ways, reach consensus easily, and have anyone
5967 of the group relate to the maintainers, while others will fight to
5968 death, organize heavy administrations up to national standards, and
5969 use strict channels.
5970
5971 The German team is putting out a good example.  Right now, they are
5972 maybe half a dozen people revising translations of each other and
5973 discussing the linguistic issues.  I do not even have all the names.
5974 Ulrich Drepper is taking care of coordinating the German team.
5975 He subscribed to all my pretest lists, so I do not even have to warn
5976 him specifically of incoming releases.
5977
5978 I'm sure, that is a good idea to get teams for each language working
5979 on translations.  That will make the translations better and more
5980 consistent.
5981
5982 @menu
5983 * Sub-Cultures::                Sub-Cultures
5984 * Organizational Ideas::        Organizational Ideas
5985 @end menu
5986
5987 @node Sub-Cultures, Organizational Ideas, National Teams, National Teams
5988 @subsubsection Sub-Cultures
5989
5990 Taking French for example, there are a few sub-cultures around computers
5991 which developed diverging vocabularies.  Picking volunteers here and
5992 there without addressing this problem in an organized way, soon in the
5993 project, might produce a distasteful mix of internationalized programs,
5994 and possibly trigger endless quarrels among those who really care.
5995
5996 Keeping some kind of unity in the way French localization of
5997 internationalized programs is achieved is a difficult (and delicate) job.
5998 Knowing the latin character of French people (:-), if we take this
5999 the wrong way, we could end up nowhere, or spoil a lot of energies.
6000 Maybe we should begin to address this problem seriously @emph{before}
6001 GNU @code{gettext} become officially published.  And I suspect that this
6002 means soon!
6003
6004 @node Organizational Ideas,  , Sub-Cultures, National Teams
6005 @subsubsection Organizational Ideas
6006
6007 I expect the next big changes after the official release.  Please note
6008 that I use the German translation of the short GPL message.  We need
6009 to set a few good examples before the localization goes out for true
6010 in the free software community.  Here are a few points to discuss:
6011
6012 @itemize @bullet
6013 @item
6014 Each group should have one FTP server (at least one master).
6015
6016 @item
6017 The files on the server should reflect the latest version (of
6018 course!) and it should also contain a RCS directory with the
6019 corresponding archives (I don't have this now).
6020
6021 @item
6022 There should also be a ChangeLog file (this is more useful than the
6023 RCS archive but can be generated automatically from the later by
6024 Emacs).
6025
6026 @item
6027 A @dfn{core group} should judge about questionable changes (for now
6028 this group consists solely by me but I ask some others occasionally;
6029 this also seems to work).
6030
6031 @end itemize
6032
6033 @node Mailing Lists,  , National Teams, Organization
6034 @subsection Mailing Lists
6035
6036 If we get any inquiries about GNU @code{gettext}, send them on to:
6037
6038 @example
6039 @file{translation@@iro.umontreal.ca}
6040 @end example
6041
6042 The @file{*-pretest} lists are quite useful to me, maybe the idea could
6043 be generalized to many GNU, and non-GNU packages.  But each maintainer
6044 his/her way!
6045
6046 Fran@,{c}ois, we have a mechanism in place here at
6047 @file{gnu.ai.mit.edu} to track teams, support mailing lists for
6048 them and log members.  We have a slight preference that you use it.
6049 If this is OK with you, I can get you clued in.
6050
6051 Things are changing!  A few years ago, when Daniel Fekete and I
6052 asked for a mailing list for GNU localization, nested at the FSF, we
6053 were politely invited to organize it anywhere else, and so did we.
6054 For communicating with my pretesters, I later made a handful of
6055 mailing lists located at iro.umontreal.ca and administrated by
6056 @code{majordomo}.  These lists have been @emph{very} dependable
6057 so far@dots{}
6058
6059 I suspect that the German team will organize itself a mailing list
6060 located in Germany, and so forth for other countries.  But before they
6061 organize for true, it could surely be useful to offer mailing lists
6062 located at the FSF to each national team.  So yes, please explain me
6063 how I should proceed to create and handle them.
6064
6065 We should create temporary mailing lists, one per country, to help
6066 people organize.  Temporary, because once regrouped and structured, it
6067 would be fair the volunteers from country bring back @emph{their} list
6068 in there and manage it as they want.  My feeling is that, in the long
6069 run, each team should run its own list, from within their country.
6070 There also should be some central list to which all teams could
6071 subscribe as they see fit, as long as each team is represented in it.
6072
6073 @node Information Flow, Prioritizing messages, Organization, Translators
6074 @section Information Flow
6075
6076 There will surely be some discussion about this messages after the
6077 packages are finally released.  If people now send you some proposals
6078 for better messages, how do you proceed?  Jim, please note that
6079 right now, as I put forward nearly a dozen of localizable programs, I
6080 receive both the translations and the coordination concerns about them.
6081
6082 If I put one of my things to pretest, Ulrich receives the announcement
6083 and passes it on to the German team, who make last minute revisions.
6084 Then he submits the translation files to me @emph{as the maintainer}.
6085 For free packages I do not maintain, I would not even hear about it.
6086 This scheme could be made to work for the whole Translation Project,
6087 I think.  For security reasons, maybe Ulrich (national coordinators,
6088 in fact) should update central registry kept at the Translation Project
6089 (Jim, me, or Len's recruits) once in a while.
6090
6091 In December/January, I was aggressively ready to internationalize
6092 all of GNU, giving myself the duty of one small GNU package per week
6093 or so, taking many weeks or months for bigger packages.  But it does
6094 not work this way.  I first did all the things I'm responsible for.
6095 I've nothing against some missionary work on other maintainers, but
6096 I'm also loosing a lot of energy over it---same debates over again.
6097
6098 And when the first localized packages are released we'll get a lot of
6099 responses about ugly translations :-).  Surely, and we need to have
6100 beforehand a fairly good idea about how to handle the information
6101 flow between the national teams and the package maintainers.
6102
6103 Please start saving somewhere a quick history of each PO file.  I know
6104 for sure that the file format will change, allowing for comments.
6105 It would be nice that each file has a kind of log, and references for
6106 those who want to submit comments or gripes, or otherwise contribute.
6107 I sent a proposal for a fast and flexible format, but it is not
6108 receiving acceptance yet by the GNU deciders.  I'll tell you when I
6109 have more information about this.
6110
6111 @node Prioritizing messages,  , Information Flow, Translators
6112 @section Prioritizing messages: How to determine which messages to translate first
6113
6114 A translator sometimes has only a limited amount of time per week to
6115 spend on a package, and some packages have quite large message catalogs
6116 (over 1000 messages).  Therefore she wishes to translate the messages
6117 first that are the most visible to the user, or that occur most frequently.
6118 This section describes how to determine these "most urgent" messages.
6119 It also applies to determine the "next most urgent" messages after the
6120 message catalog has already been partially translated.
6121
6122 In a first step, she uses the programs like a user would do.  While she
6123 does this, the GNU @code{gettext} library logs into a file the not yet
6124 translated messages for which a translation was requested from the program.
6125
6126 In a second step, she uses the PO mode to translate precisely this set
6127 of messages.
6128
6129 @vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable}
6130 Here a more details.  The GNU @code{libintl} library (but not the
6131 corresponding functions in GNU @code{libc}) supports an environment variable
6132 @code{GETTEXT_LOG_UNTRANSLATED}.  The GNU @code{libintl} library will
6133 log into this file the messages for which @code{gettext()} and related
6134 functions couldn't find the translation.  If the file doesn't exist, it
6135 will be created as needed.  On systems with GNU @code{libc} a shared library
6136 @samp{preloadable_libintl.so} is provided that can be used with the ELF
6137 @samp{LD_PRELOAD} mechanism.
6138
6139 So, in the first step, the translator uses these commands on systems with
6140 GNU @code{libc}:
6141
6142 @smallexample
6143 $ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
6144 $ export LD_PRELOAD
6145 $ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
6146 $ export GETTEXT_LOG_UNTRANSLATED
6147 @end smallexample
6148
6149 @noindent
6150 and these commands on other systems:
6151
6152 @smallexample
6153 $ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
6154 $ export GETTEXT_LOG_UNTRANSLATED
6155 @end smallexample
6156
6157 Then she uses and peruses the programs.  (It is a good and recommended
6158 practice to use the programs for which you provide translations: it
6159 gives you the needed context.)  When done, she removes the environment
6160 variables:
6161
6162 @smallexample
6163 $ unset LD_PRELOAD
6164 $ unset GETTEXT_LOG_UNTRANSLATED
6165 @end smallexample
6166
6167 The second step starts with removing duplicates:
6168
6169 @smallexample
6170 $ msguniq $HOME/gettextlogused > missing.po
6171 @end smallexample
6172
6173 The result is a PO file, but needs some preprocessing before the Emacs PO
6174 mode can be used with it.  First, it is a multi-domain PO file, containing
6175 messages from many translation domains.  Second, it lacks all translator
6176 comments and source references.  Here is how to get a list of the affected
6177 translation domains:
6178
6179 @smallexample
6180 $ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
6181 @end smallexample
6182
6183 Then the translator can handle the domains one by one.  For simplicity,
6184 let's use environment variables to denote the language, domain and source
6185 package.
6186
6187 @smallexample
6188 $ lang=nl             # your language
6189 $ domain=coreutils    # the name of the domain to be handled
6190 $ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from
6191 @end smallexample
6192
6193 She takes the latest copy of @file{$lang.po} from the Translation Project,
6194 or from the package (in most cases, @file{$package/po/$lang.po}), or
6195 creates a fresh one if she's the first translator (see @ref{Creating}).
6196 She then uses the following commands to mark the not urgent messages as
6197 "obsolete".  (This doesn't mean that these messages - translated and
6198 untranslated ones - will go away.  It simply means that Emacs PO mode
6199 will ignore them in the following editing session.)
6200
6201 @smallexample
6202 $ msggrep --domain=$domain missing.po | grep -v '^domain' \
6203   > $domain-missing.po
6204 $ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
6205   > $domain.$lang-urgent.po
6206 @end smallexample
6207
6208 The she translates @file{$domain.$lang-urgent.po} by use of Emacs PO mode.
6209 (FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also
6210 preserve obsolete messages, as they should.)
6211 Finally she restores the not urgent messages (with their earlier
6212 translations, for those which were already translated) through this command:
6213
6214 @smallexample
6215 $ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
6216   > $domain.$lang.po
6217 @end smallexample
6218
6219 Then she can submit @file{$domain.$lang.po} and proceed to the next domain.
6220
6221 @node Maintainers, Programming Languages, Translators, Top
6222 @chapter The Maintainer's View
6223 @cindex package maintainer's view of @code{gettext}
6224
6225 The maintainer of a package has many responsibilities.  One of them
6226 is ensuring that the package will install easily on many platforms,
6227 and that the magic we described earlier (@pxref{Users}) will work
6228 for installers and end users.
6229
6230 Of course, there are many possible ways by which GNU @code{gettext}
6231 might be integrated in a distribution, and this chapter does not cover
6232 them in all generality.  Instead, it details one possible approach which
6233 is especially adequate for many free software distributions following GNU
6234 standards, or even better, Gnits standards, because GNU @code{gettext}
6235 is purposely for helping the internationalization of the whole GNU
6236 project, and as many other good free packages as possible.  So, the
6237 maintainer's view presented here presumes that the package already has
6238 a @file{configure.in} file and uses GNU Autoconf.
6239
6240 Nevertheless, GNU @code{gettext} may surely be useful for free packages
6241 not following GNU standards and conventions, but the maintainers of such
6242 packages might have to show imagination and initiative in organizing
6243 their distributions so @code{gettext} work for them in all situations.
6244 There are surely many, out there.
6245
6246 Even if @code{gettext} methods are now stabilizing, slight adjustments
6247 might be needed between successive @code{gettext} versions, so you
6248 should ideally revise this chapter in subsequent releases, looking
6249 for changes.
6250
6251 @menu
6252 * Flat and Non-Flat::           Flat or Non-Flat Directory Structures
6253 * Prerequisites::               Prerequisite Works
6254 * gettextize Invocation::       Invoking the @code{gettextize} Program
6255 * Adjusting Files::             Files You Must Create or Alter
6256 * autoconf macros::             Autoconf macros for use in @file{configure.in}
6257 * CVS Issues::                  Integrating with CVS
6258 * Release Management::          Creating a Distribution Tarball
6259 @end menu
6260
6261 @node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers
6262 @section Flat or Non-Flat Directory Structures
6263
6264 Some free software packages are distributed as @code{tar} files which unpack
6265 in a single directory, these are said to be @dfn{flat} distributions.
6266 Other free software packages have a one level hierarchy of subdirectories, using
6267 for example a subdirectory named @file{doc/} for the Texinfo manual and
6268 man pages, another called @file{lib/} for holding functions meant to
6269 replace or complement C libraries, and a subdirectory @file{src/} for
6270 holding the proper sources for the package.  These other distributions
6271 are said to be @dfn{non-flat}.
6272
6273 We cannot say much about flat distributions.  A flat
6274 directory structure has the disadvantage of increasing the difficulty
6275 of updating to a new version of GNU @code{gettext}.  Also, if you have
6276 many PO files, this could somewhat pollute your single directory.
6277 Also, GNU @code{gettext}'s libintl sources consist of C sources, shell
6278 scripts, @code{sed} scripts and complicated Makefile rules, which don't
6279 fit well into an existing flat structure.  For these reasons, we
6280 recommend to use non-flat approach in this case as well.
6281
6282 Maybe because GNU @code{gettext} itself has a non-flat structure,
6283 we have more experience with this approach, and this is what will be
6284 described in the remaining of this chapter.  Some maintainers might
6285 use this as an opportunity to unflatten their package structure.
6286
6287 @node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers
6288 @section Prerequisite Works
6289 @cindex converting a package to use @code{gettext}
6290 @cindex migration from earlier versions of @code{gettext}
6291 @cindex upgrading to new versions of @code{gettext}
6292
6293 There are some works which are required for using GNU @code{gettext}
6294 in one of your package.  These works have some kind of generality
6295 that escape the point by point descriptions used in the remainder
6296 of this chapter.  So, we describe them here.
6297
6298 @itemize @bullet
6299 @item
6300 Before attempting to use @code{gettextize} you should install some
6301 other packages first.
6302 Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
6303 @code{gettext} are already installed at your site, and if not, proceed
6304 to do this first.  If you get to install these things, beware that
6305 GNU @code{m4} must be fully installed before GNU Autoconf is even
6306 @emph{configured}.
6307
6308 To further ease the task of a package maintainer the @code{automake}
6309 package was designed and implemented.  GNU @code{gettext} now uses this
6310 tool and the @file{Makefile}s in the @file{intl/} and @file{po/}
6311 therefore know about all the goals necessary for using @code{automake}
6312 and @file{libintl} in one project.
6313
6314 Those four packages are only needed by you, as a maintainer; the
6315 installers of your own package and end users do not really need any of
6316 GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
6317 for successfully installing and running your package, with messages
6318 properly translated.  But this is not completely true if you provide
6319 internationalized shell scripts within your own package: GNU
6320 @code{gettext} shall then be installed at the user site if the end users
6321 want to see the translation of shell script messages.
6322
6323 @item
6324 Your package should use Autoconf and have a @file{configure.in} or
6325 @file{configure.ac} file.
6326 If it does not, you have to learn how.  The Autoconf documentation
6327 is quite well written, it is a good idea that you print it and get
6328 familiar with it.
6329
6330 @item
6331 Your C sources should have already been modified according to
6332 instructions given earlier in this manual.  @xref{Sources}.
6333
6334 @item
6335 Your @file{po/} directory should receive all PO files submitted to you
6336 by the translator teams, each having @file{@var{ll}.po} as a name.
6337 This is not usually easy to get translation
6338 work done before your package gets internationalized and available!
6339 Since the cycle has to start somewhere, the easiest for the maintainer
6340 is to start with absolutely no PO files, and wait until various
6341 translator teams get interested in your package, and submit PO files.
6342
6343 @end itemize
6344
6345 It is worth adding here a few words about how the maintainer should
6346 ideally behave with PO files submissions.  As a maintainer, your role is
6347 to authenticate the origin of the submission as being the representative
6348 of the appropriate translating teams of the Translation Project (forward
6349 the submission to @file{translation@@iro.umontreal.ca} in case of doubt),
6350 to ensure that the PO file format is not severely broken and does not
6351 prevent successful installation, and for the rest, to merely put these
6352 PO files in @file{po/} for distribution.
6353
6354 As a maintainer, you do not have to take on your shoulders the
6355 responsibility of checking if the translations are adequate or
6356 complete, and should avoid diving into linguistic matters.  Translation
6357 teams drive themselves and are fully responsible of their linguistic
6358 choices for the Translation Project.  Keep in mind that translator teams are @emph{not}
6359 driven by maintainers.  You can help by carefully redirecting all
6360 communications and reports from users about linguistic matters to the
6361 appropriate translation team, or explain users how to reach or join
6362 their team.  The simplest might be to send them the @file{ABOUT-NLS} file.
6363
6364 Maintainers should @emph{never ever} apply PO file bug reports
6365 themselves, short-cutting translation teams.  If some translator has
6366 difficulty to get some of her points through her team, it should not be
6367 an option for her to directly negotiate translations with maintainers.
6368 Teams ought to settle their problems themselves, if any.  If you, as
6369 a maintainer, ever think there is a real problem with a team, please
6370 never try to @emph{solve} a team's problem on your own.
6371
6372 @node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers
6373 @section Invoking the @code{gettextize} Program
6374
6375 @include gettextize.texi
6376
6377 @node Adjusting Files, autoconf macros, gettextize Invocation, Maintainers
6378 @section Files You Must Create or Alter
6379 @cindex @code{gettext} files
6380
6381 Besides files which are automatically added through @code{gettextize},
6382 there are many files needing revision for properly interacting with
6383 GNU @code{gettext}.  If you are closely following GNU standards for
6384 Makefile engineering and auto-configuration, the adaptations should
6385 be easier to achieve.  Here is a point by point description of the
6386 changes needed in each.
6387
6388 So, here comes a list of files, each one followed by a description of
6389 all alterations it needs.  Many examples are taken out from the GNU
6390 @code{gettext} @value{VERSION} distribution itself, or from the GNU
6391 @code{hello} distribution (@uref{http://www.franken.de/users/gnu/ke/hello}
6392 or @uref{http://www.gnu.franken.de/ke/hello/})  You may indeed
6393 refer to the source code of the GNU @code{gettext} and GNU @code{hello}
6394 packages, as they are intended to be good examples for using GNU
6395 gettext functionality.
6396
6397 @menu
6398 * po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
6399 * po/LINGUAS::                  @file{LINGUAS} in @file{po/}
6400 * po/Makevars::                 @file{Makevars} in @file{po/}
6401 * configure.in::                @file{configure.in} at top level
6402 * config.guess::                @file{config.guess}, @file{config.sub} at top level
6403 * mkinstalldirs::               @file{mkinstalldirs} at top level
6404 * aclocal::                     @file{aclocal.m4} at top level
6405 * acconfig::                    @file{acconfig.h} at top level
6406 * config.h.in::                 @file{config.h.in} at top level
6407 * Makefile::                    @file{Makefile.in} at top level
6408 * src/Makefile::                @file{Makefile.in} in @file{src/}
6409 * lib/gettext.h::               @file{gettext.h} in @file{lib/}
6410 @end menu
6411
6412 @node po/POTFILES.in, po/LINGUAS, Adjusting Files, Adjusting Files
6413 @subsection @file{POTFILES.in} in @file{po/}
6414 @cindex @file{POTFILES.in} file
6415
6416 The @file{po/} directory should receive a file named
6417 @file{POTFILES.in}.  This file tells which files, among all program
6418 sources, have marked strings needing translation.  Here is an example
6419 of such a file:
6420
6421 @example
6422 @group
6423 # List of source files containing translatable strings.
6424 # Copyright (C) 1995 Free Software Foundation, Inc.
6425
6426 # Common library files
6427 lib/error.c
6428 lib/getopt.c
6429 lib/xmalloc.c
6430
6431 # Package source files
6432 src/gettext.c
6433 src/msgfmt.c
6434 src/xgettext.c
6435 @end group
6436 @end example
6437
6438 @noindent
6439 Hash-marked comments and white lines are ignored.  All other lines
6440 list those source files containing strings marked for translation
6441 (@pxref{Mark Keywords}), in a notation relative to the top level
6442 of your whole distribution, rather than the location of the
6443 @file{POTFILES.in} file itself.
6444
6445 When a C file is automatically generated by a tool, like @code{flex} or
6446 @code{bison}, that doesn't introduce translatable strings by itself,
6447 it is recommended to list in @file{po/POTFILES.in} the real source file
6448 (ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the
6449 case of @code{bison}), not the generated C file.
6450
6451 @node po/LINGUAS, po/Makevars, po/POTFILES.in, Adjusting Files
6452 @subsection @file{LINGUAS} in @file{po/}
6453 @cindex @file{LINGUAS} file
6454
6455 The @file{po/} directory should also receive a file named
6456 @file{LINGUAS}.  This file contains the list of available translations.
6457 It is a whitespace separated list.  Hash-marked comments and white lines
6458 are ignored.  Here is an example file:
6459
6460 @example
6461 @group
6462 # Set of available languages.
6463 de fr
6464 @end group
6465 @end example
6466
6467 @noindent
6468 This example means that German and French PO files are available, so
6469 that these languages are currently supported by your package.  If you
6470 want to further restrict, at installation time, the set of installed
6471 languages, this should not be done by modifying the @file{LINGUAS} file,
6472 but rather by using the @code{LINGUAS} environment variable
6473 (@pxref{Installers}).
6474
6475 It is recommended that you add the "languages" @samp{en@@quot} and
6476 @samp{en@@boldquot} to the @code{LINGUAS} file.  @code{en@@quot} is a
6477 variant of English message catalogs (@code{en}) which uses real quotation
6478 marks instead of the ugly looking asymmetric ASCII substitutes @samp{`}
6479 and @samp{'}.  @code{en@@boldquot} is a variant of @code{en@@quot} that
6480 additionally outputs quoted pieces of text in a bold font, when used in
6481 a terminal emulator which supports the VT100 escape sequences (such as
6482 @code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode).
6483
6484 These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot}
6485 are constructed automatically, not by translators; to support them, you
6486 need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed},
6487 @file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin}
6488 in the @file{po/} directory.  You can copy them from GNU gettext's @file{po/}
6489 directory; they are also installed by running @code{gettextize}.
6490
6491 @node po/Makevars, configure.in, po/LINGUAS, Adjusting Files
6492 @subsection @file{Makevars} in @file{po/}
6493 @cindex @file{Makevars} file
6494
6495 The @file{po/} directory also has a file named @file{Makevars}.
6496 It can be left unmodified if your package has a single message domain
6497 and, accordingly, a single @file{po/} directory.  Only packages which
6498 have multiple @file{po/} directories at different locations need to
6499 adjust the three variables defined in @file{Makevars}.
6500
6501 @file{po/Makevars} gets inserted into the @file{po/Makefile} when the
6502 latter is created.  At the same time, all files called @file{Rules-*} in the
6503 @file{po/} directory get appended to the @file{po/Makefile}.  They present
6504 an opportunity to add rules for special PO files to the Makefile, without
6505 needing to mess with @file{po/Makefile.in.in}.
6506
6507 @cindex quotation marks
6508 @vindex LANGUAGE@r{, environment variable}
6509 GNU gettext comes with a @file{Rules-quot} file, containing rules for
6510 building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}.  The
6511 effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE}
6512 environment variable to @samp{en@@quot} will get messages with proper
6513 looking symmetric Unicode quotation marks instead of abusing the ASCII
6514 grave accent and the ASCII apostrophe for indicating quotations.  To
6515 enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS}
6516 file.  The effect of @file{en@@boldquot.po} is that people who set
6517 @code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation
6518 marks, but also the quoted text will be shown in a bold font on terminals
6519 and consoles.  This catalog is useful only for command-line programs, not
6520 GUI programs.  To enable it, similarly add @code{en@@boldquot} to the
6521 @file{po/LINGUAS} file.
6522
6523 @node configure.in, config.guess, po/Makevars, Adjusting Files
6524 @subsection @file{configure.in} at top level
6525
6526 @file{configure.in} or @file{configure.ac} - this is the source from which
6527 @code{autoconf} generates the @file{configure} script.
6528
6529 @enumerate
6530 @item Declare the package and version.
6531 @cindex package and version declaration in @file{configure.in}
6532
6533 This is done by a set of lines like these:
6534
6535 @example
6536 PACKAGE=gettext
6537 VERSION=@value{VERSION}
6538 AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
6539 AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
6540 AC_SUBST(PACKAGE)
6541 AC_SUBST(VERSION)
6542 @end example
6543
6544 @noindent
6545 or, if you are using GNU @code{automake}, by a line like this:
6546
6547 @example
6548 AM_INIT_AUTOMAKE(gettext, @value{VERSION})
6549 @end example
6550
6551 @noindent
6552 Of course, you replace @samp{gettext} with the name of your package,
6553 and @samp{@value{VERSION}} by its version numbers, exactly as they
6554 should appear in the packaged @code{tar} file name of your distribution
6555 (@file{gettext-@value{VERSION}.tar.gz}, here).
6556
6557 @item Check for internationalization support.
6558
6559 Here is the main @code{m4} macro for triggering internationalization
6560 support.  Just add this line to @file{configure.in}:
6561
6562 @example
6563 AM_GNU_GETTEXT
6564 @end example
6565
6566 @noindent
6567 This call is purposely simple, even if it generates a lot of configure
6568 time checking and actions.
6569
6570 If you have suppressed the @file{intl/} subdirectory by calling
6571 @code{gettextize} without @samp{--intl} option, this call should read
6572
6573 @example
6574 AM_GNU_GETTEXT([external])
6575 @end example
6576
6577 @item Have output files created.
6578
6579 The @code{AC_OUTPUT} directive, at the end of your @file{configure.in}
6580 file, needs to be modified in two ways:
6581
6582 @example
6583 AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in],
6584 [@var{existing additional actions}])
6585 @end example
6586
6587 The modification to the first argument to @code{AC_OUTPUT} asks
6588 for substitution in the @file{intl/} and @file{po/} directories.
6589 Note the @samp{.in} suffix used for @file{po/} only.  This is because
6590 the distributed file is really @file{po/Makefile.in.in}.
6591
6592 If you have suppressed the @file{intl/} subdirectory by calling
6593 @code{gettextize} without @samp{--intl} option, then you don't need to
6594 add @code{intl/Makefile} to the @code{AC_OUTPUT} line.
6595
6596 @end enumerate
6597
6598 @node config.guess, mkinstalldirs, configure.in, Adjusting Files
6599 @subsection @file{config.guess}, @file{config.sub} at top level
6600
6601 If you haven't suppressed the @file{intl/} subdirectory,
6602 you need to add the GNU @file{config.guess} and @file{config.sub} files
6603 to your distribution.  They are needed because the @file{intl/} directory
6604 has platform dependent support for determining the locale's character
6605 encoding and therefore needs to identify the platform.
6606
6607 You can obtain the newest version of @file{config.guess} and
6608 @file{config.sub} from the CVS of the @samp{config} project at
6609 @file{http://savannah.gnu.org/}. The commands to fetch them are
6610 @smallexample
6611 $ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess'
6612 $ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub'
6613 @end smallexample
6614 @noindent
6615 Less recent versions are also contained in the GNU @code{automake} and
6616 GNU @code{libtool} packages.
6617
6618 Normally, @file{config.guess} and @file{config.sub} are put at the
6619 top level of a distribution.  But it is also possible to put them in a
6620 subdirectory, altogether with other configuration support files like
6621 @file{install-sh}, @file{ltconfig}, @file{ltmain.sh},
6622 @file{mkinstalldirs} or @file{missing}.  All you need to do, other than
6623 moving the files, is to add the following line to your
6624 @file{configure.in}.
6625
6626 @example
6627 AC_CONFIG_AUX_DIR([@var{subdir}])
6628 @end example
6629
6630 @node mkinstalldirs, aclocal, config.guess, Adjusting Files
6631 @subsection @file{mkinstalldirs} at top level
6632 @cindex @file{mkinstalldirs} file
6633
6634 If @code{gettextize} has not already done it, you need to add the GNU
6635 @file{mkinstalldirs} script to your distribution.  It is needed because
6636 @samp{mkdir -p} is not portable enough.  You find this script in the
6637 GNU @code{automake} distribution.
6638
6639 Normally, @file{mkinstalldirs} is put at the top level of a distribution.
6640 But it is also possible to put it in a subdirectory, altogether with other
6641 configuration support files like @file{install-sh}, @file{ltconfig},
6642 @file{ltmain.sh} or @file{missing}.  All you need to do, other than
6643 moving the files, is to add the following line to your @file{configure.in}.
6644
6645 @example
6646 AC_CONFIG_AUX_DIR([@var{subdir}])
6647 @end example
6648
6649 @node aclocal, acconfig, mkinstalldirs, Adjusting Files
6650 @subsection @file{aclocal.m4} at top level
6651 @cindex @file{aclocal.m4} file
6652
6653 If you do not have an @file{aclocal.m4} file in your distribution,
6654 the simplest is to concatenate the files @file{codeset.m4},
6655 @file{gettext.m4}, @file{glibc2.m4}, @file{glibc21.m4}, @file{iconv.m4},
6656 @file{intdiv0.m4}, @file{intmax.m4}, @file{inttypes.m4}, @file{inttypes_h.m4},
6657 @file{inttypes-pri.m4}, @file{isc-posix.m4}, @file{lcmessage.m4},
6658 @file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4},
6659 @file{longdouble.m4}, @file{longlong.m4}, @file{printf-posix.m4},
6660 @file{progtest.m4}, @file{signed.m4}, @file{size_max.m4},
6661 @file{stdint_h.m4}, @file{uintmax_t.m4}, @file{ulonglong.m4},
6662 @file{wchar_t.m4}, @file{wint_t.m4}, @file{xsize.m4}
6663 from GNU @code{gettext}'s
6664 @file{m4/} directory into a single file.  If you have suppressed the
6665 @file{intl/} directory, only @file{gettext.m4}, @file{iconv.m4},
6666 @file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4},
6667 @file{progtest.m4} need to be concatenated.
6668
6669 If you already have an @file{aclocal.m4} file, then you will have
6670 to merge the said macro files into your @file{aclocal.m4}.  Note that if
6671 you are upgrading from a previous release of GNU @code{gettext}, you
6672 should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
6673 etc.), as they usually
6674 change a little from one release of GNU @code{gettext} to the next.
6675 Their contents may vary as we get more experience with strange systems
6676 out there.
6677
6678 If you are using GNU @code{automake} 1.5 or newer, it is enough to put
6679 these macro files into a subdirectory named @file{m4/} and add the line
6680
6681 @example
6682 ACLOCAL_AMFLAGS = -I m4
6683 @end example
6684
6685 @noindent
6686 to your top level @file{Makefile.am}.
6687
6688 These macros check for the internationalization support functions
6689 and related informations.  Hopefully, once stabilized, these macros
6690 might be integrated in the standard Autoconf set, because this
6691 piece of @code{m4} code will be the same for all projects using GNU
6692 @code{gettext}.
6693
6694 @node acconfig, config.h.in, aclocal, Adjusting Files
6695 @subsection @file{acconfig.h} at top level
6696 @cindex @file{acconfig.h} file
6697
6698 Earlier GNU @code{gettext} releases required to put definitions for
6699 @code{ENABLE_NLS}, @code{HAVE_GETTEXT} and @code{HAVE_LC_MESSAGES},
6700 @code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an
6701 @file{acconfig.h} file.  This is not needed any more; you can remove
6702 them from your @file{acconfig.h} file unless your package uses them
6703 independently from the @file{intl/} directory.
6704
6705 @node config.h.in, Makefile, acconfig, Adjusting Files
6706 @subsection @file{config.h.in} at top level
6707 @cindex @file{config.h.in} file
6708
6709 The include file template that holds the C macros to be defined by
6710 @code{configure} is usually called @file{config.h.in} and may be
6711 maintained either manually or automatically.
6712
6713 If @code{gettextize} has created an @file{intl/} directory, this file
6714 must be called @file{config.h.in} and must be at the top level.  If,
6715 however, you have suppressed the @file{intl/} directory by calling
6716 @code{gettextize} without @samp{--intl} option, then you can choose the
6717 name of this file and its location freely.
6718
6719 If it is maintained automatically, by use of the @samp{autoheader}
6720 program, you need to do nothing about it.  This is the case in particular
6721 if you are using GNU @code{automake}.
6722
6723 If it is maintained manually, and if @code{gettextize} has created an
6724 @file{intl/} directory, you should switch to using @samp{autoheader}.
6725 The list of C macros to be added for the sake of the @file{intl/}
6726 directory is just too long to be maintained manually; it also changes
6727 between different versions of GNU @code{gettext}.
6728
6729 If it is maintained manually, and if on the other hand you have
6730 suppressed the @file{intl/} directory by calling @code{gettextize}
6731 without @samp{--intl} option, then you can get away by adding the
6732 following lines to @file{config.h.in}:
6733
6734 @example
6735 /* Define to 1 if translation of program messages to the user's
6736    native language is requested. */
6737 #undef ENABLE_NLS
6738 @end example
6739
6740 @node Makefile, src/Makefile, config.h.in, Adjusting Files
6741 @subsection @file{Makefile.in} at top level
6742
6743 Here are a few modifications you need to make to your main, top-level
6744 @file{Makefile.in} file.
6745
6746 @enumerate
6747 @item
6748 Add the following lines near the beginning of your @file{Makefile.in},
6749 so the @samp{dist:} goal will work properly (as explained further down):
6750
6751 @example
6752 PACKAGE = @@PACKAGE@@
6753 VERSION = @@VERSION@@
6754 @end example
6755
6756 @item
6757 Add file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so the file gets
6758 distributed.
6759
6760 @item
6761 Wherever you process subdirectories in your @file{Makefile.in}, be sure
6762 you also process the subdirectories @samp{intl} and @samp{po}.  Special
6763 rules in the @file{Makefiles} take care for the case where no
6764 internationalization is wanted.
6765
6766 If you are using Makefiles, either generated by automake, or hand-written
6767 so they carefully follow the GNU coding standards, the effected goals for
6768 which the new subdirectories must be handled include @samp{installdirs},
6769 @samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}.
6770
6771 Here is an example of a canonical order of processing.  In this
6772 example, we also define @code{SUBDIRS} in @code{Makefile.in} for it
6773 to be further used in the @samp{dist:} goal.
6774
6775 @example
6776 SUBDIRS = doc intl lib src po
6777 @end example
6778
6779 Note that you must arrange for @samp{make} to descend into the
6780 @code{intl} directory before descending into other directories containing
6781 code which make use of the @code{libintl.h} header file.  For this
6782 reason, here we mention @code{intl} before @code{lib} and @code{src}.
6783
6784 @item
6785 A delicate point is the @samp{dist:} goal, as both
6786 @file{intl/Makefile} and @file{po/Makefile} will later assume that the
6787 proper directory has been set up from the main @file{Makefile}.  Here is
6788 an example at what the @samp{dist:} goal might look like:
6789
6790 @example
6791 distdir = $(PACKAGE)-$(VERSION)
6792 dist: Makefile
6793         rm -fr $(distdir)
6794         mkdir $(distdir)
6795         chmod 777 $(distdir)
6796         for file in $(DISTFILES); do \
6797           ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
6798         done
6799         for subdir in $(SUBDIRS); do \
6800           mkdir $(distdir)/$$subdir || exit 1; \
6801           chmod 777 $(distdir)/$$subdir; \
6802           (cd $$subdir && $(MAKE) $@@) || exit 1; \
6803         done
6804         tar chozf $(distdir).tar.gz $(distdir)
6805         rm -fr $(distdir)
6806 @end example
6807
6808 @end enumerate
6809
6810 Note that if you are using GNU @code{automake}, @file{Makefile.in} is
6811 automatically generated from @file{Makefile.am}, and all needed changes
6812 to @file{Makefile.am} are already made by running @samp{gettextize}.
6813
6814 @node src/Makefile, lib/gettext.h, Makefile, Adjusting Files
6815 @subsection @file{Makefile.in} in @file{src/}
6816
6817 Some of the modifications made in the main @file{Makefile.in} will
6818 also be needed in the @file{Makefile.in} from your package sources,
6819 which we assume here to be in the @file{src/} subdirectory.  Here are
6820 all the modifications needed in @file{src/Makefile.in}:
6821
6822 @enumerate
6823 @item
6824 In view of the @samp{dist:} goal, you should have these lines near the
6825 beginning of @file{src/Makefile.in}:
6826
6827 @example
6828 PACKAGE = @@PACKAGE@@
6829 VERSION = @@VERSION@@
6830 @end example
6831
6832 @item
6833 If not done already, you should guarantee that @code{top_srcdir}
6834 gets defined.  This will serve for @code{cpp} include files.  Just add
6835 the line:
6836
6837 @example
6838 top_srcdir = @@top_srcdir@@
6839 @end example
6840
6841 @item
6842 You might also want to define @code{subdir} as @samp{src}, later
6843 allowing for almost uniform @samp{dist:} goals in all your
6844 @file{Makefile.in}.  At list, the @samp{dist:} goal below assume that
6845 you used:
6846
6847 @example
6848 subdir = src
6849 @end example
6850
6851 @item
6852 The @code{main} function of your program will normally call
6853 @code{bindtextdomain} (see @pxref{Triggering}), like this:
6854
6855 @example
6856 bindtextdomain (@var{PACKAGE}, LOCALEDIR);
6857 textdomain (@var{PACKAGE});
6858 @end example
6859
6860 To make LOCALEDIR known to the program, add the following lines to
6861 @file{Makefile.in}:
6862
6863 @example
6864 datadir = @@datadir@@
6865 localedir = $(datadir)/locale
6866 DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
6867 @end example
6868
6869 Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus
6870 @code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.
6871
6872 @item
6873 You should ensure that the final linking will use @code{@@LIBINTL@@} or
6874 @code{@@LTLIBINTL@@} as a library.  @code{@@LIBINTL@@} is for use without
6875 @code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}.  An
6876 easy way to achieve this is to manage that it gets into @code{LIBS}, like
6877 this:
6878
6879 @example
6880 LIBS = @@LIBINTL@@ @@LIBS@@
6881 @end example
6882
6883 In most packages internationalized with GNU @code{gettext}, one will
6884 find a directory @file{lib/} in which a library containing some helper
6885 functions will be build.  (You need at least the few functions which the
6886 GNU @code{gettext} Library itself needs.)  However some of the functions
6887 in the @file{lib/} also give messages to the user which of course should be
6888 translated, too.  Taking care of this, the support library (say
6889 @file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and
6890 @code{@@LIBS@@} in the above example.  So one has to write this:
6891
6892 @example
6893 LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@
6894 @end example
6895
6896 @item
6897 You should also ensure that directory @file{intl/} will be searched for
6898 C preprocessor include files in all circumstances.  So, you have to
6899 manage so both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} will
6900 be given to the C compiler.
6901
6902 @item
6903 Your @samp{dist:} goal has to conform with others.  Here is a
6904 reasonable definition for it:
6905
6906 @example
6907 distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
6908 dist: Makefile $(DISTFILES)
6909         for file in $(DISTFILES); do \
6910           ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \
6911         done
6912 @end example
6913
6914 @end enumerate
6915
6916 Note that if you are using GNU @code{automake}, @file{Makefile.in} is
6917 automatically generated from @file{Makefile.am}, and the first three
6918 changes and the last change are not necessary.  The remaining needed
6919 @file{Makefile.am} modifications are the following:
6920
6921 @enumerate
6922 @item
6923 To make LOCALEDIR known to the program, add the following to
6924 @file{Makefile.am}:
6925
6926 @example
6927 <module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
6928 @end example
6929
6930 @noindent
6931 for each specific module or compilation unit, or
6932
6933 @example
6934 AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
6935 @end example
6936
6937 for all modules and compilation units together.  Furthermore, add this
6938 line to define @samp{localedir}:
6939
6940 @example
6941 localedir = $(datadir)/locale
6942 @end example
6943
6944 @item
6945 To ensure that the final linking will use @code{@@LIBINTL@@} or
6946 @code{@@LTLIBINTL@@} as a library, add the following to
6947 @file{Makefile.am}:
6948
6949 @example
6950 <program>_LDADD = @@LIBINTL@@
6951 @end example
6952
6953 @noindent
6954 for each specific program, or
6955
6956 @example
6957 LDADD = @@LIBINTL@@
6958 @end example
6959
6960 for all programs together.  Remember that when you use @code{libtool}
6961 to link a program, you need to use @@LTLIBINTL@@ instead of @@LIBINTL@@
6962 for that program.
6963
6964 @item
6965 If you have an @file{intl/} directory, whose contents is created by
6966 @code{gettextize}, then to ensure that it will be searched for
6967 C preprocessor include files in all circumstances, add something like
6968 this to @file{Makefile.am}:
6969
6970 @example
6971 AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl
6972 @end example
6973
6974 @end enumerate
6975
6976 @node lib/gettext.h,  , src/Makefile, Adjusting Files
6977 @subsection @file{gettext.h} in @file{lib/}
6978 @cindex @file{gettext.h} file
6979 @cindex turning off NLS support
6980 @cindex disabling NLS
6981
6982 Internationalization of packages, as provided by GNU @code{gettext}, is
6983 optional.  It can be turned off in two situations:
6984
6985 @itemize @bullet
6986 @item
6987 When the installer has specified @samp{./configure --disable-nls}.  This
6988 can be useful when small binaries are more important than features, for
6989 example when building utilities for boot diskettes.  It can also be useful
6990 in order to get some specific C compiler warnings about code quality with
6991 some older versions of GCC (older than 3.0).
6992
6993 @item
6994 When the package does not include the @code{intl/} subdirectory, and the
6995 libintl.h header (with its associated libintl library, if any) is not
6996 already installed on the system, it is preferrable that the package builds
6997 without internationalization support, rather than to give a compilation
6998 error.
6999 @end itemize
7000
7001 A C preprocessor macro can be used to detect these two cases.  Usually,
7002 when @code{libintl.h} was found and not explicitly disabled, the
7003 @code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated
7004 configuration file (usually called @file{config.h}).  In the two negative
7005 situations, however, this macro will not be defined, thus it will evaluate
7006 to 0 in C preprocessor expressions.
7007
7008 @cindex include file @file{libintl.h}
7009 @file{gettext.h} is a convenience header file for conditional use of
7010 @file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro.  If
7011 @code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it
7012 defines no-op substitutes for the libintl.h functions.  We recommend
7013 the use of @code{"gettext.h"} over direct use of @file{<libintl.h>},
7014 so that portability to older systems is guaranteed and installers can
7015 turn off internationalization if they want to.  In the C code, you will
7016 then write
7017
7018 @example
7019 #include "gettext.h"
7020 @end example
7021
7022 @noindent
7023 instead of
7024
7025 @example
7026 #include <libintl.h>
7027 @end example
7028
7029 The location of @code{gettext.h} is usually in a directory containing
7030 auxiliary include files.  In many GNU packages, there is a directory
7031 @file{lib/} containing helper functions; @file{gettext.h} fits there.
7032 In other packages, it can go into the @file{src} directory.
7033
7034 Do not install the @code{gettext.h} file in public locations.  Every
7035 package that needs it should contain a copy of it on its own.
7036
7037 @node autoconf macros, CVS Issues, Adjusting Files, Maintainers
7038 @section Autoconf macros for use in @file{configure.in}
7039 @cindex autoconf macros for @code{gettext}
7040
7041 GNU @code{gettext} installs macros for use in a package's
7042 @file{configure.in} or @file{configure.ac}.
7043 @xref{Top, , Introduction, autoconf, The Autoconf Manual}.
7044 The primary macro is, of course, @code{AM_GNU_GETTEXT}.
7045
7046 @menu
7047 * AM_GNU_GETTEXT::              AM_GNU_GETTEXT in @file{gettext.m4}
7048 * AM_GNU_GETTEXT_VERSION::      AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
7049 * AM_PO_SUBDIRS::               AM_PO_SUBDIRS in @file{po.m4}
7050 * AM_ICONV::                    AM_ICONV in @file{iconv.m4}
7051 @end menu
7052
7053 @node AM_GNU_GETTEXT, AM_GNU_GETTEXT_VERSION, autoconf macros, autoconf macros
7054 @subsection AM_GNU_GETTEXT in @file{gettext.m4}
7055
7056 @amindex AM_GNU_GETTEXT
7057 The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext
7058 function family in either the C library or a separate @code{libintl}
7059 library (shared or static libraries are both supported) or in the package's
7060 @file{intl/} directory.  It also invokes @code{AM_PO_SUBDIRS}, thus preparing
7061 the @file{po/} directories of the package for building.
7062
7063 @code{AM_GNU_GETTEXT} accepts up to three optional arguments.  The general
7064 syntax is
7065
7066 @example
7067 AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}], [@var{intldir}])
7068 @end example
7069
7070 @c We don't document @var{intlsymbol} = @samp{use-libtool} here, because
7071 @c it is of no use for packages other than GNU gettext itself.  (Such packages
7072 @c are not allowed to install the shared libintl.  But if they use libtool,
7073 @c then it is in order to install shared libraries that depend on libintl.)
7074 @var{intlsymbol} can be @samp{external} or @samp{no-libtool}.  The default
7075 (if it is not specified or empty) is @samp{no-libtool}.  @var{intlsymbol}
7076 should be @samp{external} for packages with no @file{intl/} directory,
7077 and @samp{no-libtool} for packages with an @file{intl/} directory.  In
7078 the latter case, a static library @code{$(top_builddir)/intl/libintl.a}
7079 will be created.
7080
7081 If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU
7082 gettext implementations (in libc or libintl) without the @code{ngettext()}
7083 function will be ignored.  If @var{needsymbol} is specified and is
7084 @samp{need-formatstring-macros}, then GNU gettext implementations that don't
7085 support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored.
7086 Only one @var{needsymbol} can be specified.  To specify more than one
7087 requirement, just specify the strongest one among them.  The hierarchy among
7088 the various alternatives is as follows: @samp{need-formatstring-macros}
7089 implies @samp{need-ngettext}.
7090
7091 @var{intldir} is used to find the intl libraries.  If empty, the value
7092 @samp{$(top_builddir)/intl/} is used.
7093
7094 The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is
7095 available and should be used.  If so, it sets the @code{USE_NLS} variable
7096 to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf
7097 generated configuration file (usually called @file{config.h}); it sets
7098 the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options
7099 for use in a Makefile (@code{LIBINTL} for use without libtool,
7100 @code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to
7101 @code{CPPFLAGS} if necessary.  In the negative case, it sets
7102 @code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL}
7103 to empty and doesn't change @code{CPPFLAGS}.
7104
7105 The complexities that @code{AM_GNU_GETTEXT} deals with are the following:
7106
7107 @itemize @bullet
7108 @item
7109 @cindex @code{libintl} library
7110 Some operating systems have @code{gettext} in the C library, for example
7111 glibc.  Some have it in a separate library @code{libintl}.  GNU @code{libintl}
7112 might have been installed as part of the GNU @code{gettext} package.
7113
7114 @item
7115 GNU @code{libintl}, if installed, is not necessarily already in the search
7116 path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
7117 the library search path).
7118
7119 @item
7120 Except for glibc, the operating system's native @code{gettext} cannot
7121 exploit the GNU mo files, doesn't have the necessary locale dependency
7122 features, and cannot convert messages from the catalog's text encoding
7123 to the user's locale encoding.
7124
7125 @item
7126 GNU @code{libintl}, if installed, is not necessarily already in the
7127 run time library search path.  To avoid the need for setting an environment
7128 variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
7129 run time search path options to the @code{LIBINTL} and @code{LTLIBINTL}
7130 variables.  This works on most systems, but not on some operating systems
7131 with limited shared library support, like SCO.
7132
7133 @item
7134 GNU @code{libintl} relies on POSIX/XSI @code{iconv}.  The macro checks for
7135 linker options needed to use iconv and appends them to the @code{LIBINTL}
7136 and @code{LTLIBINTL} variables.
7137 @end itemize
7138
7139 @node AM_GNU_GETTEXT_VERSION, AM_PO_SUBDIRS, AM_GNU_GETTEXT, autoconf macros
7140 @subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4}
7141
7142 @amindex AM_GNU_GETTEXT_VERSION
7143 The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of
7144 the GNU gettext infrastructure that is used by the package.
7145
7146 The use of this macro is optional; only the @code{autopoint} program makes
7147 use of it (@pxref{CVS Issues}).
7148
7149 @node AM_PO_SUBDIRS, AM_ICONV, AM_GNU_GETTEXT_VERSION, autoconf macros
7150 @subsection AM_PO_SUBDIRS in @file{po.m4}
7151
7152 @amindex AM_PO_SUBDIRS
7153 The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the
7154 package for building.  This macro should be used in internationalized
7155 programs written in other programming languages than C, C++, Objective C,
7156 for example @code{sh}, @code{Python}, @code{Lisp}.  See @ref{Programming
7157 Languages} for a list of programming languages that support localization
7158 through PO files.
7159
7160 The @code{AM_PO_SUBDIRS} macro determines whether internationalization
7161 should be used.  If so, it sets the @code{USE_NLS} variable to @samp{yes},
7162 otherwise to @samp{no}.  It also determines the right values for Makefile
7163 variables in each @file{po/} directory.
7164
7165 @node AM_ICONV,  , AM_PO_SUBDIRS, autoconf macros
7166 @subsection AM_ICONV in @file{iconv.m4}
7167
7168 @amindex AM_ICONV
7169 The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI
7170 @code{iconv} function family in either the C library or a separate
7171 @code{libiconv} library.  If found, it sets the @code{am_cv_func_iconv}
7172 variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf
7173 generated configuration file (usually called @file{config.h}); it defines
7174 @code{ICONV_CONST} to @samp{const} or to empty, depending on whether the
7175 second argument of @code{iconv()} is of type @samp{const char **} or
7176 @samp{char **}; it sets the variables @code{LIBICONV} and
7177 @code{LTLIBICONV} to the linker options for use in a Makefile
7178 (@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with
7179 libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if
7180 necessary.  If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to
7181 empty and doesn't change @code{CPPFLAGS}.
7182
7183 The complexities that @code{AM_ICONV} deals with are the following:
7184
7185 @itemize @bullet
7186 @item
7187 @cindex @code{libiconv} library
7188 Some operating systems have @code{iconv} in the C library, for example
7189 glibc.  Some have it in a separate library @code{libiconv}, for example
7190 OSF/1 or FreeBSD.  Regardless of the operating system, GNU @code{libiconv}
7191 might have been installed.  In that case, it should be used instead of the
7192 operating system's native @code{iconv}.
7193
7194 @item
7195 GNU @code{libiconv}, if installed, is not necessarily already in the search
7196 path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for
7197 the library search path).
7198
7199 @item
7200 GNU @code{libiconv} is binary incompatible with some operating system's
7201 native @code{iconv}, for example on FreeBSD.  Use of an @file{iconv.h}
7202 and @file{libiconv.so} that don't fit together would produce program
7203 crashes.
7204
7205 @item
7206 GNU @code{libiconv}, if installed, is not necessarily already in the
7207 run time library search path.  To avoid the need for setting an environment
7208 variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate
7209 run time search path options to the @code{LIBICONV} variable.  This works
7210 on most systems, but not on some operating systems with limited shared
7211 library support, like SCO.
7212 @end itemize
7213
7214 @file{iconv.m4} is distributed with the GNU gettext package because
7215 @file{gettext.m4} relies on it.
7216
7217 @node CVS Issues, Release Management, autoconf macros, Maintainers
7218 @section Integrating with CVS
7219
7220 Many projects use CVS for distributed development, version control and
7221 source backup.  This section gives some advice how to manage the uses
7222 of @code{cvs}, @code{gettextize}, @code{autopoint} and @code{autoconf}.
7223
7224 @menu
7225 * Distributed CVS::             Avoiding version mismatch in distributed development
7226 * Files under CVS::             Files to put under CVS version control
7227 * autopoint Invocation::        Invoking the @code{autopoint} Program
7228 @end menu
7229
7230 @node Distributed CVS, Files under CVS, CVS Issues, CVS Issues
7231 @subsection Avoiding version mismatch in distributed development
7232
7233 In a project development with multiple developers, using CVS, there
7234 should be a single developer who occasionally - when there is desire to
7235 upgrade to a new @code{gettext} version - runs @code{gettextize} and
7236 performs the changes listed in @ref{Adjusting Files}, and then commits
7237 his changes to the CVS.
7238
7239 It is highly recommended that all developers on a project use the same
7240 version of GNU @code{gettext} in the package.  In other words, if a
7241 developer runs @code{gettextize}, he should go the whole way, make the
7242 necessary remaining changes and commit his changes to the CVS.
7243 Otherwise the following damages will likely occur:
7244
7245 @itemize @bullet
7246 @item
7247 Apparent version mismatch between developers.  Since some @code{gettext}
7248 specific portions in @file{configure.in}, @file{configure.ac} and
7249 @code{Makefile.am}, @code{Makefile.in} files depend on the @code{gettext}
7250 version, the use of infrastructure files belonging to different
7251 @code{gettext} versions can easily lead to build errors.
7252
7253 @item
7254 Hidden version mismatch.  Such version mismatch can also lead to
7255 malfunctioning of the package, that may be undiscovered by the developers.
7256 The worst case of hidden version mismatch is that internationalization
7257 of the package doesn't work at all.
7258
7259 @item
7260 Release risks.  All developers implicitly perform constant testing on
7261 a package.  This is important in the days and weeks before a release.
7262 If the guy who makes the release tar files uses a different version
7263 of GNU @code{gettext} than the other developers, the distribution will
7264 be less well tested than if all had been using the same @code{gettext}
7265 version.  For example, it is possible that a platform specific bug goes
7266 undiscovered due to this constellation.
7267 @end itemize
7268
7269 @node Files under CVS, autopoint Invocation, Distributed CVS, CVS Issues
7270 @subsection Files to put under CVS version control
7271
7272 There are basically three ways to deal with generated files in the
7273 context of a CVS repository, such as @file{configure} generated from
7274 @file{configure.in}, @code{@var{parser}.c} generated from
7275 @code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled by
7276 @code{gettextize} or @code{autopoint}.
7277
7278 @enumerate
7279 @item
7280 All generated files are always committed into the repository.
7281
7282 @item
7283 All generated files are committed into the repository occasionally,
7284 for example each time a release is made.
7285
7286 @item
7287 Generated files are never committed into the repository.
7288 @end enumerate
7289
7290 Each of these three approaches has different advantages and drawbacks.
7291
7292 @enumerate
7293 @item
7294 The advantage is that anyone can check out the CVS at any moment and
7295 gets a working build.  The drawbacks are:  1a. It requires some frequent
7296 "cvs commit" actions by the maintainers.  1b. The repository grows in size
7297 quite fast.
7298
7299 @item
7300 The advantage is that anyone can check out the CVS, and the usual
7301 "./configure; make" will work.  The drawbacks are:  2a. The one who
7302 checks out the repository needs tools like GNU @code{automake},
7303 GNU @code{autoconf}, GNU @code{m4} installed in his PATH; sometimes
7304 he even needs particular versions of them.  2b. When a release is made
7305 and a commit is made on the generated files, the other developers get
7306 conflicts on the generated files after doing "cvs update".  Although
7307 these conflicts are easy to resolve, they are annoying.
7308
7309 @item
7310 The advantage is less work for the maintainers.  The drawback is that
7311 anyone who checks out the CVS not only needs tools like GNU @code{automake},
7312 GNU @code{autoconf}, GNU @code{m4} installed in his PATH, but also that
7313 he needs to perform a package specific pre-build step before being able
7314 to "./configure; make".
7315 @end enumerate
7316
7317 For the first and second approach, all files modified or brought in
7318 by the occasional @code{gettextize} invocation and update should be
7319 committed into the CVS.
7320
7321 For the third approach, the maintainer can omit from the CVS repository
7322 all the files that @code{gettextize} mentions as "copy".  Instead, he
7323 adds to the @file{configure.in} or @file{configure.ac} a line of the
7324 form
7325
7326 @example
7327 AM_GNU_GETTEXT_VERSION(@value{VERSION})
7328 @end example
7329
7330 @noindent
7331 and adds to the package's pre-build script an invocation of
7332 @samp{autopoint}.  For everyone who checks out the CVS, this
7333 @code{autopoint} invocation will copy into the right place the
7334 @code{gettext} infrastructure files that have been omitted from the CVS.
7335
7336 The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is
7337 the version of the @code{gettext} infrastructure that the package wants
7338 to use.  It is also the minimum version number of the @samp{autopoint}
7339 program.  So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the
7340 developers can have any version >= 0.11.5 installed; the package will work
7341 with the 0.11.5 infrastructure in all developers' builds.  When the
7342 maintainer then runs gettextize from, say, version 0.12.1 on the package,
7343 the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be changed
7344 into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other developers that
7345 use the CVS will henceforth need to have GNU @code{gettext} 0.12.1 or newer
7346 installed.
7347
7348 @node autopoint Invocation,  , Files under CVS, CVS Issues
7349 @subsection Invoking the @code{autopoint} Program
7350
7351 @include autopoint.texi
7352
7353 @node Release Management,  , CVS Issues, Maintainers
7354 @section Creating a Distribution Tarball
7355
7356 @cindex release
7357 @cindex distribution tarball
7358 In projects that use GNU @code{automake}, the usual commands for creating
7359 a distribution tarball, @samp{make dist} or @samp{make distcheck},
7360 automatically update the PO files as needed.
7361
7362 If GNU @code{automake} is not used, the maintainer needs to perform this
7363 update before making a release:
7364
7365 @example
7366 $ ./configure
7367 $ (cd po; make update-po)
7368 $ make distclean
7369 @end example
7370
7371 @node Programming Languages, Conclusion, Maintainers, Top
7372 @chapter Other Programming Languages
7373
7374 While the presentation of @code{gettext} focuses mostly on C and
7375 implicitly applies to C++ as well, its scope is far broader than that:
7376 Many programming languages, scripting languages and other textual data
7377 like GUI resources or package descriptions can make use of the gettext
7378 approach.
7379
7380 @menu
7381 * Language Implementors::       The Language Implementor's View
7382 * Programmers for other Languages::  The Programmer's View
7383 * Translators for other Languages::  The Translator's View
7384 * Maintainers for other Languages::  The Maintainer's View
7385 * List of Programming Languages::  Individual Programming Languages
7386 * List of Data Formats::        Internationalizable Data
7387 @end menu
7388
7389 @node Language Implementors, Programmers for other Languages, Programming Languages, Programming Languages
7390 @section The Language Implementor's View
7391 @cindex programming languages
7392 @cindex scripting languages
7393
7394 All programming and scripting languages that have the notion of strings
7395 are eligible to supporting @code{gettext}.  Supporting @code{gettext}
7396 means the following:
7397
7398 @enumerate
7399 @item
7400 You should add to the language a syntax for translatable strings.  In
7401 principle, a function call of @code{gettext} would do, but a shorthand
7402 syntax helps keeping the legibility of internationalized programs.  For
7403 example, in C we use the syntax @code{_("string")}, and in GNU awk we use
7404 the shorthand @code{_"string"}.
7405
7406 @item
7407 You should arrange that evaluation of such a translatable string at
7408 runtime calls the @code{gettext} function, or performs equivalent
7409 processing.
7410
7411 @item
7412 Similarly, you should make the functions @code{ngettext},
7413 @code{dcgettext}, @code{dcngettext} available from within the language.
7414 These functions are less often used, but are nevertheless necessary for
7415 particular purposes: @code{ngettext} for correct plural handling, and
7416 @code{dcgettext} and @code{dcngettext} for obeying other locale
7417 environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or
7418 @code{LC_MONETARY}.  For these latter functions, you need to make the
7419 @code{LC_*} constants, available in the C header @code{<locale.h>},
7420 referenceable from within the language, usually either as enumeration
7421 values or as strings.
7422
7423 @item
7424 You should allow the programmer to designate a message domain, either by
7425 making the @code{textdomain} function available from within the
7426 language, or by introducing a magic variable called @code{TEXTDOMAIN}.
7427 Similarly, you should allow the programmer to designate where to search
7428 for message catalogs, by providing access to the @code{bindtextdomain}
7429 function.
7430
7431 @item
7432 You should either perform a @code{setlocale (LC_ALL, "")} call during
7433 the startup of your language runtime, or allow the programmer to do so.
7434 Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and
7435 @code{LC_CTYPE} locale facets are not both set.
7436
7437 @item
7438 A programmer should have a way to extract translatable strings from a
7439 program into a PO file.  The GNU @code{xgettext} program is being
7440 extended to support very different programming languages.  Please
7441 contact the GNU @code{gettext} maintainers to help them doing this.  If
7442 the string extractor is best integrated into your language's parser, GNU
7443 @code{xgettext} can function as a front end to your string extractor.
7444
7445 @item
7446 The language's library should have a string formatting facility where
7447 the arguments of a format string are denoted by a positional number or a
7448 name.  This is needed because for some languages and some messages with
7449 more than one substitutable argument, the translation will need to
7450 output the substituted arguments in different order.  @xref{c-format Flag}.
7451
7452 @item
7453 If the language has more than one implementation, and not all of the
7454 implementations use @code{gettext}, but the programs should be portable
7455 across implementations, you should provide a no-i18n emulation, that
7456 makes the other implementations accept programs written for yours,
7457 without actually translating the strings.
7458
7459 @item
7460 To help the programmer in the task of marking translatable strings,
7461 which is usually performed using the Emacs PO mode, you are welcome to
7462 contact the GNU @code{gettext} maintainers, so they can add support for
7463 your language to @file{po-mode.el}.
7464 @end enumerate
7465
7466 On the implementation side, three approaches are possible, with
7467 different effects on portability and copyright:
7468
7469 @itemize @bullet
7470 @item
7471 You may integrate the GNU @code{gettext}'s @file{intl/} directory in
7472 your package, as described in @ref{Maintainers}.  This allows you to
7473 have internationalization on all kinds of platforms.  Note that when you
7474 then distribute your package, it legally falls under the GNU General
7475 Public License, and the GNU project will be glad about your contribution
7476 to the Free Software pool.
7477
7478 @item
7479 You may link against GNU @code{gettext} functions if they are found in
7480 the C library.  For example, an autoconf test for @code{gettext()} and
7481 @code{ngettext()} will detect this situation.  For the moment, this test
7482 will succeed on GNU systems and not on other platforms.  No severe
7483 copyright restrictions apply.
7484
7485 @item
7486 You may emulate or reimplement the GNU @code{gettext} functionality.
7487 This has the advantage of full portability and no copyright
7488 restrictions, but also the drawback that you have to reimplement the GNU
7489 @code{gettext} features (such as the @code{LANGUAGE} environment
7490 variable, the locale aliases database, the automatic charset conversion,
7491 and plural handling).
7492 @end itemize
7493
7494 @node Programmers for other Languages, Translators for other Languages, Language Implementors, Programming Languages
7495 @section The Programmer's View
7496
7497 For the programmer, the general procedure is the same as for the C
7498 language.  The Emacs PO mode supports other languages, and the GNU
7499 @code{xgettext} string extractor recognizes other languages based on the
7500 file extension or a command-line option.  In some languages,
7501 @code{setlocale} is not needed because it is already performed by the
7502 underlying language runtime.
7503
7504 @node Translators for other Languages, Maintainers for other Languages, Programmers for other Languages, Programming Languages
7505 @section The Translator's View
7506
7507 The translator works exactly as in the C language case.  The only
7508 difference is that when translating format strings, she has to be aware
7509 of the language's particular syntax for positional arguments in format
7510 strings.
7511
7512 @menu
7513 * c-format::                    C Format Strings
7514 * objc-format::                 Objective C Format Strings
7515 * sh-format::                   Shell Format Strings
7516 * python-format::               Python Format Strings
7517 * lisp-format::                 Lisp Format Strings
7518 * elisp-format::                Emacs Lisp Format Strings
7519 * librep-format::               librep Format Strings
7520 * scheme-format::               Scheme Format Strings
7521 * smalltalk-format::            Smalltalk Format Strings
7522 * java-format::                 Java Format Strings
7523 * csharp-format::               C# Format Strings
7524 * awk-format::                  awk Format Strings
7525 * object-pascal-format::        Object Pascal Format Strings
7526 * ycp-format::                  YCP Format Strings
7527 * tcl-format::                  Tcl Format Strings
7528 * perl-format::                 Perl Format Strings
7529 * php-format::                  PHP Format Strings
7530 * gcc-internal-format::         GCC internal Format Strings
7531 * qt-format::                   Qt Format Strings
7532 @end menu
7533
7534 @node c-format, objc-format, Translators for other Languages, Translators for other Languages
7535 @subsection C Format Strings
7536
7537 C format strings are described in POSIX (IEEE P1003.1 2001), section
7538 XSH 3 fprintf(),
7539 @uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}.
7540 See also the fprintf(3) manual page,
7541 @uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php},
7542 @uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}.
7543
7544 Although format strings with positions that reorder arguments, such as
7545
7546 @example
7547 "Only %2$d bytes free on '%1$s'."
7548 @end example
7549
7550 @noindent
7551 which is semantically equivalent to
7552
7553 @example
7554 "'%s' has only %d bytes free."
7555 @end example
7556
7557 @noindent
7558 are a POSIX/XSI feature and not specified by ISO C 99, translators can rely
7559 on this reordering ability: On the few platforms where @code{printf()},
7560 @code{fprintf()} etc. don't support this feature natively, @file{libintl.a}
7561 or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>}
7562 activates these replacement functions automatically.
7563
7564 @cindex outdigits
7565 @cindex Arabic digits
7566 As a special feature for Farsi (Persian) and maybe Arabic, translators can
7567 insert an @samp{I} flag into numeric format directives.  For example, the
7568 translation of @code{"%d"} can be @code{"%Id"}.  The effect of this flag,
7569 on systems with GNU @code{libc}, is that in the output, the ASCII digits are
7570 replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale
7571 facet.  On other systems, the @code{gettext} function removes this flag,
7572 so that it has no effect.
7573
7574 Note that the programmer should @emph{not} put this flag into the
7575 untranslated string.  (Putting the @samp{I} format directive flag into an
7576 @var{msgid} string would lead to undefined behaviour on platforms without
7577 glibc when NLS is disabled.)
7578
7579 @node objc-format, sh-format, c-format, Translators for other Languages
7580 @subsection Objective C Format Strings
7581
7582 Objective C format strings are like C format strings.  They support an
7583 additional format directive: "$@@", which when executed consumes an argument
7584 of type @code{Object *}.
7585
7586 @node sh-format, python-format, objc-format, Translators for other Languages
7587 @subsection Shell Format Strings
7588
7589 Shell format strings, as supported by GNU gettext and the @samp{envsubst}
7590 program, are strings with references to shell variables in the form
7591 @code{$@var{variable}} or @code{$@{@var{variable}@}}.  References of the form
7592 @code{$@{@var{variable}-@var{default}@}},
7593 @code{$@{@var{variable}:-@var{default}@}},
7594 @code{$@{@var{variable}=@var{default}@}},
7595 @code{$@{@var{variable}:=@var{default}@}},
7596 @code{$@{@var{variable}+@var{replacement}@}},
7597 @code{$@{@var{variable}:+@var{replacement}@}},
7598 @code{$@{@var{variable}?@var{ignored}@}},
7599 @code{$@{@var{variable}:?@var{ignored}@}},
7600 that would be valid inside shell scripts, are not supported.  The
7601 @var{variable} names must consist solely of alphanumeric or underscore
7602 ASCII characters, not start with a digit and be nonempty; otherwise such
7603 a variable reference is ignored.
7604
7605 @node python-format, lisp-format, sh-format, Translators for other Languages
7606 @subsection Python Format Strings
7607
7608 Python format strings are described in
7609 @w{Python Library reference} /
7610 @w{2. Built-in Types, Exceptions and Functions} /
7611 @w{2.2. Built-in Types} /
7612 @w{2.2.6. Sequence Types} /
7613 @w{2.2.6.2. String Formatting Operations}.
7614 @uref{http://www.python.org/doc/2.2.1/lib/typesseq-strings.html}.
7615
7616 @node lisp-format, elisp-format, python-format, Translators for other Languages
7617 @subsection Lisp Format Strings
7618
7619 Lisp format strings are described in the Common Lisp HyperSpec,
7620 chapter 22.3 @w{Formatted Output},
7621 @uref{http://www.lisp.org/HyperSpec/Body/sec_22-3.html}.
7622
7623 @node elisp-format, librep-format, lisp-format, Translators for other Languages
7624 @subsection Emacs Lisp Format Strings
7625
7626 Emacs Lisp format strings are documented in the Emacs Lisp reference,
7627 section @w{Formatting Strings},
7628 @uref{http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}.
7629 Note that as of version 21, XEmacs supports numbered argument specifications
7630 in format strings while FSF Emacs doesn't.
7631
7632 @node librep-format, scheme-format, elisp-format, Translators for other Languages
7633 @subsection librep Format Strings
7634
7635 librep format strings are documented in the librep manual, section
7636 @w{Formatted Output},
7637 @url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output},
7638 @url{http://www.gwinnup.org/research/docs/librep.html#SEC122}.
7639
7640 @node scheme-format, smalltalk-format, librep-format, Translators for other Languages
7641 @subsection Scheme Format Strings
7642
7643 Scheme format strings are documented in the SLIB manual, section
7644 @w{Format Specification}.
7645
7646 @node smalltalk-format, java-format, scheme-format, Translators for other Languages
7647 @subsection Smalltalk Format Strings
7648
7649 Smalltalk format strings are described in the GNU Smalltalk documentation,
7650 class @code{CharArray}, methods @samp{bindWith:} and
7651 @samp{bindWithArguments:}.
7652 @uref{http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}.
7653 In summary, a directive starts with @samp{%} and is followed by @samp{%}
7654 or a nonzero digit (@samp{1} to @samp{9}).
7655
7656 @node java-format, csharp-format, smalltalk-format, Translators for other Languages
7657 @subsection Java Format Strings
7658
7659 Java format strings are described in the JDK documentation for class
7660 @code{java.text.MessageFormat},
7661 @uref{http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html}.
7662 See also the ICU documentation
7663 @uref{http://oss.software.ibm.com/icu/apiref/classMessageFormat.html}.
7664
7665 @node csharp-format, awk-format, java-format, Translators for other Languages
7666 @subsection C# Format Strings
7667
7668 C# format strings are described in the .NET documentation for class
7669 @code{System.String} and in
7670 @uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}.
7671
7672 @node awk-format, object-pascal-format, csharp-format, Translators for other Languages
7673 @subsection awk Format Strings
7674
7675 awk format strings are described in the gawk documentation, section
7676 @w{Printf},
7677 @uref{http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}.
7678
7679 @node object-pascal-format, ycp-format, awk-format, Translators for other Languages
7680 @subsection Object Pascal Format Strings
7681
7682 Where is this documented?
7683
7684 @node ycp-format, tcl-format, object-pascal-format, Translators for other Languages
7685 @subsection YCP Format Strings
7686
7687 YCP sformat strings are described in the libycp documentation
7688 @uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}.
7689 In summary, a directive starts with @samp{%} and is followed by @samp{%}
7690 or a nonzero digit (@samp{1} to @samp{9}).
7691
7692 @node tcl-format, perl-format, ycp-format, Translators for other Languages
7693 @subsection Tcl Format Strings
7694
7695 Tcl format strings are described in the @file{format.n} manual page,
7696 @uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}.
7697
7698 @node perl-format, php-format, tcl-format, Translators for other Languages
7699 @subsection Perl Format Strings
7700
7701 There are two kinds format strings in Perl: those acceptable to the
7702 Perl built-in function @code{printf}, labelled as @samp{perl-format},
7703 and those acceptable to the @code{libintl-perl} function @code{__x},
7704 labelled as @samp{perl-brace-format}.
7705
7706 Perl @code{printf} format strings are described in the @code{sprintf}
7707 section of @samp{man perlfunc}.
7708
7709 Perl brace format strings are described in the
7710 @file{Locale::TextDomain(3pm)} manual page of the CPAN package
7711 libintl-perl.  In brief, Perl format uses placeholders put between
7712 braces (@samp{@{} and @samp{@}}).  The placeholder must have the syntax
7713 of simple identifiers.
7714
7715 @node php-format, gcc-internal-format, perl-format, Translators for other Languages
7716 @subsection PHP Format Strings
7717
7718 PHP format strings are described in the documentation of the PHP function
7719 @code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or
7720 @uref{http://www.php.net/manual/en/function.sprintf.php}.
7721
7722 @node gcc-internal-format, qt-format, php-format, Translators for other Languages
7723 @subsection GCC internal Format Strings
7724
7725 These format strings are used inside the GCC sources.  In such a format
7726 string, a directive starts with @samp{%}, is optionally followed by a
7727 size specifier @samp{l}, an optional flag @samp{+}, another optional flag
7728 @samp{#}, and is finished by a specifier: @samp{%} denotes a literal
7729 percent sign, @samp{c} denotes a character, @samp{s} denotes a string,
7730 @samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x}
7731 denote an unsigned integer, @samp{.*s} denotes a string preceded by a
7732 width specification, @samp{H} denotes a @samp{location_t *} pointer,
7733 @samp{D} denotes a general declaration, @samp{F} denotes a function
7734 declaration, @samp{T} denotes a type, @samp{A} denotes a function argument,
7735 @samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L}
7736 denotes a programming language, @samp{O} denotes a binary operator,
7737 @samp{P} denotes a function parameter, @samp{Q} denotes an assignment
7738 operator, @samp{V} denotes a const/volatile qualifier.
7739
7740 @node qt-format,  , gcc-internal-format, Translators for other Languages
7741 @subsection Qt Format Strings
7742
7743 Qt format strings are described in the documentation of the QString class
7744 @uref{file:/usr/lib/qt-3.0.5/doc/html/qstring.html}.
7745 In summary, a directive consists of a @samp{%} followed by a digit. The same
7746 directive cannot occur more than once in a format string.
7747
7748 @node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages
7749 @section The Maintainer's View
7750
7751 For the maintainer, the general procedure differs from the C language
7752 case in two ways.
7753
7754 @itemize @bullet
7755 @item
7756 For those languages that don't use GNU gettext, the @file{intl/} directory
7757 is not needed and can be omitted.  This means that the maintainer calls the
7758 @code{gettextize} program without the @samp{--intl} option, and that he
7759 invokes the @code{AM_GNU_GETTEXT} autoconf macro via
7760 @samp{AM_GNU_GETTEXT([external])}.
7761
7762 @item
7763 If only a single programming language is used, the @code{XGETTEXT_OPTIONS}
7764 variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to
7765 match the @code{xgettext} options for that particular programming language.
7766 If the package uses more than one programming language with @code{gettext}
7767 support, it becomes necessary to change the POT file construction rule
7768 in @file{po/Makefile.in.in}.  It is recommended to make one @code{xgettext}
7769 invocation per programming language, each with the options appropriate for
7770 that language, and to combine the resulting files using @code{msgcat}.
7771 @end itemize
7772
7773 @node List of Programming Languages, List of Data Formats, Maintainers for other Languages, Programming Languages
7774 @section Individual Programming Languages
7775
7776 @c Here is a list of programming languages, as used for Free Software projects
7777 @c on SourceForge/Freshmeat, as of February 2002.  Those supported by gettext
7778 @c are marked with a star.
7779 @c   C                       3580     *
7780 @c   Perl                    1911     *
7781 @c   C++                     1379     *
7782 @c   Java                    1200     *
7783 @c   PHP                     1051     *
7784 @c   Python                   613     *
7785 @c   Unix Shell               357     *
7786 @c   Tcl                      266     *
7787 @c   SQL                      174
7788 @c   JavaScript               118
7789 @c   Assembly                 108
7790 @c   Scheme                    51
7791 @c   Ruby                      47
7792 @c   Lisp                      45     *
7793 @c   Objective C               39     *
7794 @c   PL/SQL                    29
7795 @c   Fortran                   25
7796 @c   Ada                       24
7797 @c   Delphi                    22
7798 @c   Awk                       19     *
7799 @c   Pascal                    19
7800 @c   ML                        19
7801 @c   Eiffel                    17
7802 @c   Emacs-Lisp                14     *
7803 @c   Zope                      14
7804 @c   ASP                       12
7805 @c   Forth                     12
7806 @c   Cold Fusion               10
7807 @c   Haskell                    9
7808 @c   Visual Basic               9
7809 @c   C#                         6     *
7810 @c   Smalltalk                  6     *
7811 @c   Basic                      5
7812 @c   Erlang                     5
7813 @c   Modula                     5
7814 @c   Object Pascal              5     *
7815 @c   Rexx                       5
7816 @c   Dylan                      4
7817 @c   Prolog                     4
7818 @c   APL                        3
7819 @c   PROGRESS                   2
7820 @c   Euler                      1
7821 @c   Euphoria                   1
7822 @c   Pliant                     1
7823 @c   Simula                     1
7824 @c   XBasic                     1
7825 @c   Logo                       0
7826 @c   Other Scripting Engines   49
7827 @c   Other                    116
7828
7829 @menu
7830 * C::                           C, C++, Objective C
7831 * sh::                          sh - Shell Script
7832 * bash::                        bash - Bourne-Again Shell Script
7833 * Python::                      Python
7834 * Common Lisp::                 GNU clisp - Common Lisp
7835 * clisp C::                     GNU clisp C sources
7836 * Emacs Lisp::                  Emacs Lisp
7837 * librep::                      librep
7838 * Scheme::                      GNU guile - Scheme
7839 * Smalltalk::                   GNU Smalltalk
7840 * Java::                        Java
7841 * C#::                          C#
7842 * gawk::                        GNU awk
7843 * Pascal::                      Pascal - Free Pascal Compiler
7844 * wxWindows::                   wxWindows library
7845 * YCP::                         YCP - YaST2 scripting language
7846 * Tcl::                         Tcl - Tk's scripting language
7847 * Perl::                        Perl
7848 * PHP::                         PHP Hypertext Preprocessor
7849 * Pike::                        Pike
7850 * GCC-source::                  GNU Compiler Collection sources
7851 @end menu
7852
7853 @node C, sh, List of Programming Languages, List of Programming Languages
7854 @subsection C, C++, Objective C
7855 @cindex C and C-like languages
7856
7857 @table @asis
7858 @item RPMs
7859 gcc, gpp, gobjc, glibc, gettext
7860
7861 @item File extension
7862 For C: @code{c}, @code{h}.
7863 @*For C++: @code{C}, @code{c++}, @code{cc}, @code{cxx}, @code{cpp}, @code{hpp}.
7864 @*For Objective C: @code{m}.
7865
7866 @item String syntax
7867 @code{"abc"}
7868
7869 @item gettext shorthand
7870 @code{_("abc")}
7871
7872 @item gettext/ngettext functions
7873 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
7874 @code{dngettext}, @code{dcngettext}
7875
7876 @item textdomain
7877 @code{textdomain} function
7878
7879 @item bindtextdomain
7880 @code{bindtextdomain} function
7881
7882 @item setlocale
7883 Programmer must call @code{setlocale (LC_ALL, "")}
7884
7885 @item Prerequisite
7886 @code{#include <libintl.h>}
7887 @*@code{#include <locale.h>}
7888 @*@code{#define _(string) gettext (string)}
7889
7890 @item Use or emulate GNU gettext
7891 Use
7892
7893 @item Extractor
7894 @code{xgettext -k_}
7895
7896 @item Formatting with positions
7897 @code{fprintf "%2$d %1$d"}
7898 @*In C++: @code{autosprintf "%2$d %1$d"}
7899 (@pxref{Top, , Introduction, autosprintf, GNU autosprintf})
7900
7901 @item Portability
7902 autoconf (gettext.m4) and #if ENABLE_NLS
7903
7904 @item po-mode marking
7905 yes
7906 @end table
7907
7908 The following examples are available in the @file{examples} directory:
7909 @code{hello-c}, @code{hello-c-gnome}, @code{hello-c++}, @code{hello-c++-qt},
7910 @code{hello-c++-kde}, @code{hello-c++-gnome}, @code{hello-objc},
7911 @code{hello-objc-gnustep}, @code{hello-objc-gnome}.
7912
7913 @node sh, bash, C, List of Programming Languages
7914 @subsection sh - Shell Script
7915 @cindex shell scripts
7916
7917 @table @asis
7918 @item RPMs
7919 bash, gettext
7920
7921 @item File extension
7922 @code{sh}
7923
7924 @item String syntax
7925 @code{"abc"}, @code{'abc'}, @code{abc}
7926
7927 @item gettext shorthand
7928 @code{"`gettext \"abc\"`"}
7929
7930 @item gettext/ngettext functions
7931 @pindex gettext
7932 @pindex ngettext
7933 @code{gettext}, @code{ngettext} programs
7934 @*@code{eval_gettext}, @code{eval_ngettext} shell functions
7935
7936 @item textdomain
7937 @vindex TEXTDOMAIN@r{, environment variable}
7938 environment variable @code{TEXTDOMAIN}
7939
7940 @item bindtextdomain
7941 @vindex TEXTDOMAINDIR@r{, environment variable}
7942 environment variable @code{TEXTDOMAINDIR}
7943
7944 @item setlocale
7945 automatic
7946
7947 @item Prerequisite
7948 @code{. gettext.sh}
7949
7950 @item Use or emulate GNU gettext
7951 use
7952
7953 @item Extractor
7954 @code{xgettext}
7955
7956 @item Formatting with positions
7957 ---
7958
7959 @item Portability
7960 fully portable
7961
7962 @item po-mode marking
7963 ---
7964 @end table
7965
7966 An example is available in the @file{examples} directory: @code{hello-sh}.
7967
7968 @menu
7969 * Preparing Shell Scripts::     Preparing Shell Scripts for Internationalization
7970 * gettext.sh::                  Contents of @code{gettext.sh}
7971 * gettext Invocation::          Invoking the @code{gettext} program
7972 * ngettext Invocation::         Invoking the @code{ngettext} program
7973 * envsubst Invocation::         Invoking the @code{envsubst} program
7974 * eval_gettext Invocation::     Invoking the @code{eval_gettext} function
7975 * eval_ngettext Invocation::    Invoking the @code{eval_ngettext} function
7976 @end menu
7977
7978 @node Preparing Shell Scripts, gettext.sh, sh, sh
7979 @subsubsection Preparing Shell Scripts for Internationalization
7980 @cindex preparing shell scripts for translation
7981
7982 Preparing a shell script for internationalization is conceptually similar
7983 to the steps described in @ref{Sources}.  The concrete steps for shell
7984 scripts are as follows.
7985
7986 @enumerate
7987 @item
7988 Insert the line
7989
7990 @smallexample
7991 . gettext.sh
7992 @end smallexample
7993
7994 near the top of the script.  @code{gettext.sh} is a shell function library
7995 that provides the functions
7996 @code{eval_gettext} (see @ref{eval_gettext Invocation}) and
7997 @code{eval_ngettext} (see @ref{eval_ngettext Invocation}).
7998 You have to ensure that @code{gettext.sh} can be found in the @code{PATH}.
7999
8000 @item
8001 Set and export the @code{TEXTDOMAIN} and @code{TEXTDOMAINDIR} environment
8002 variables.  Usually @code{TEXTDOMAIN} is the package or program name, and
8003 @code{TEXTDOMAINDIR} is the absolute pathname corresponding to
8004 @code{$prefix/share/locale}, where @code{$prefix} is the installation location.
8005
8006 @smallexample
8007 TEXTDOMAIN=@@PACKAGE@@
8008 export TEXTDOMAIN
8009 TEXTDOMAINDIR=@@LOCALEDIR@@
8010 export TEXTDOMAINDIR
8011 @end smallexample
8012
8013 @item
8014 Prepare the strings for translation, as described in @ref{Preparing Strings}.
8015
8016 @item
8017 Simplify translatable strings so that they don't contain command substitution
8018 (@code{"`...`"} or @code{"$(...)"}), variable access with defaulting (like
8019 @code{$@{@var{variable}-@var{default}@}}), access to positional arguments
8020 (like @code{$0}, @code{$1}, ...) or highly volatile shell variables (like
8021 @code{$?}). This can always be done through simple local code restructuring.
8022 For example,
8023
8024 @smallexample
8025 echo "Usage: $0 [OPTION] FILE..."
8026 @end smallexample
8027
8028 becomes
8029
8030 @smallexample
8031 program_name=$0
8032 echo "Usage: $program_name [OPTION] FILE..."
8033 @end smallexample
8034
8035 Similarly,
8036
8037 @smallexample
8038 echo "Remaining files: `ls | wc -l`"
8039 @end smallexample
8040
8041 becomes
8042
8043 @smallexample
8044 filecount="`ls | wc -l`"
8045 echo "Remaining files: $filecount"
8046 @end smallexample
8047
8048 @item
8049 For each translatable string, change the output command @samp{echo} or
8050 @samp{$echo} to @samp{gettext} (if the string contains no references to
8051 shell variables) or to @samp{eval_gettext} (if it refers to shell variables),
8052 followed by a no-argument @samp{echo} command (to account for the terminating
8053 newline). Similarly, for cases with plural handling, replace a conditional
8054 @samp{echo} command with an invocation of @samp{ngettext} or
8055 @samp{eval_ngettext}, followed by a no-argument @samp{echo} command.
8056
8057 When doing this, you also need to add an extra backslash before the dollar
8058 sign in references to shell variables, so that the @samp{eval_gettext}
8059 function receives the translatable string before the variable values are
8060 substituted into it. For example,
8061
8062 @smallexample
8063 echo "Remaining files: $filecount"
8064 @end smallexample
8065
8066 becomes
8067
8068 @smallexample
8069 eval_gettext "Remaining files: \$filecount"; echo
8070 @end smallexample
8071
8072 If the output command is not @samp{echo}, you can make it use @samp{echo}
8073 nevertheless, through the use of backquotes. However, note that inside
8074 backquotes, backslashes must be doubled to be effective (because the
8075 backquoting eats one level of backslashes). For example, assuming that
8076 @samp{error} is a shell function that signals an error,
8077
8078 @smallexample
8079 error "file not found: $filename"
8080 @end smallexample
8081
8082 is first transformed into
8083
8084 @smallexample
8085 error "`echo \"file not found: \$filename\"`"
8086 @end smallexample
8087
8088 which then becomes
8089
8090 @smallexample
8091 error "`eval_gettext \"file not found: \\\$filename\"`"
8092 @end smallexample
8093 @end enumerate
8094
8095 @node gettext.sh, gettext Invocation, Preparing Shell Scripts, sh
8096 @subsubsection Contents of @code{gettext.sh}
8097
8098 @code{gettext.sh}, contained in the run-time package of GNU gettext, provides
8099 the following:
8100
8101 @itemize @bullet
8102 @item $echo
8103 The variable @code{echo} is set to a command that outputs its first argument
8104 and a newline, without interpreting backslashes in the argument string.
8105
8106 @item eval_gettext
8107 See @ref{eval_gettext Invocation}.
8108
8109 @item eval_ngettext
8110 See @ref{eval_ngettext Invocation}.
8111 @end itemize
8112
8113 @node gettext Invocation, ngettext Invocation, gettext.sh, sh
8114 @subsubsection Invoking the @code{gettext} program
8115
8116 @include rt-gettext.texi
8117
8118 @node ngettext Invocation, envsubst Invocation, gettext Invocation, sh
8119 @subsubsection Invoking the @code{ngettext} program
8120
8121 @include rt-ngettext.texi
8122
8123 @node envsubst Invocation, eval_gettext Invocation, ngettext Invocation, sh
8124 @subsubsection Invoking the @code{envsubst} program
8125
8126 @include rt-envsubst.texi
8127
8128 @node eval_gettext Invocation, eval_ngettext Invocation, envsubst Invocation, sh
8129 @subsubsection Invoking the @code{eval_gettext} function
8130
8131 @cindex @code{eval_gettext} function, usage
8132 @example
8133 eval_gettext @var{msgid}
8134 @end example
8135
8136 @cindex lookup message translation
8137 This function outputs the native language translation of a textual message,
8138 performing dollar-substitution on the result.  Note that only shell variables
8139 mentioned in @var{msgid} will be dollar-substituted in the result.
8140
8141 @node eval_ngettext Invocation,  , eval_gettext Invocation, sh
8142 @subsubsection Invoking the @code{eval_ngettext} function
8143
8144 @cindex @code{eval_ngettext} function, usage
8145 @example
8146 eval_ngettext @var{msgid} @var{msgid-plural} @var{count}
8147 @end example
8148
8149 @cindex lookup plural message translation
8150 This function outputs the native language translation of a textual message
8151 whose grammatical form depends on a number, performing dollar-substitution
8152 on the result.  Note that only shell variables mentioned in @var{msgid} or
8153 @var{msgid-plural} will be dollar-substituted in the result.
8154
8155 @node bash, Python, sh, List of Programming Languages
8156 @subsection bash - Bourne-Again Shell Script
8157 @cindex bash
8158
8159 GNU @code{bash} 2.0 or newer has a special shorthand for translating a
8160 string and substituting variable values in it: @code{$"msgid"}.  But
8161 the use of this construct is @strong{discouraged}, due to the security
8162 holes it opens and due to its portability problems.
8163
8164 The security holes of @code{$"..."} come from the fact that after looking up
8165 the translation of the string, @code{bash} processes it like it processes
8166 any double-quoted string: dollar and backquote processing, like @samp{eval}
8167 does.
8168
8169 @enumerate
8170 @item
8171 In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS,
8172 JOHAB, some double-byte characters have a second byte whose value is
8173 @code{0x60}.  For example, the byte sequence @code{\xe0\x60} is a single
8174 character in these locales.  Many versions of @code{bash} (all versions
8175 up to bash-2.05, and newer versions on platforms without @code{mbsrtowcs()}
8176 function) don't know about character boundaries and see a backquote character
8177 where there is only a particular Chinese character.  Thus it can start
8178 executing part of the translation as a command list.  This situation can occur
8179 even without the translator being aware of it: if the translator provides
8180 translations in the UTF-8 encoding, it is the @code{gettext()} function which
8181 will, during its conversion from the translator's encoding to the user's
8182 locale's encoding, produce the dangerous @code{\x60} bytes.
8183
8184 @item
8185 A translator could - voluntarily or inadvertantly - use backquotes
8186 @code{"`...`"} or dollar-parentheses @code{"$(...)"} in her translations.
8187 The enclosed strings would be executed as command lists by the shell.
8188 @end enumerate
8189
8190 The portability problem is that @code{bash} must be built with
8191 internationalization support; this is normally not the case on systems
8192 that don't have the @code{gettext()} function in libc.
8193
8194 @node Python, Common Lisp, bash, List of Programming Languages
8195 @subsection Python
8196 @cindex Python
8197
8198 @table @asis
8199 @item RPMs
8200 python
8201
8202 @item File extension
8203 @code{py}
8204
8205 @item String syntax
8206 @code{'abc'}, @code{u'abc'}, @code{r'abc'}, @code{ur'abc'},
8207 @*@code{"abc"}, @code{u"abc"}, @code{r"abc"}, @code{ur"abc"},
8208 @*@code{'''abc'''}, @code{u'''abc'''}, @code{r'''abc'''}, @code{ur'''abc'''},
8209 @*@code{"""abc"""}, @code{u"""abc"""}, @code{r"""abc"""}, @code{ur"""abc"""}
8210
8211 @item gettext shorthand
8212 @code{_('abc')} etc.
8213
8214 @item gettext/ngettext functions
8215 @code{gettext.gettext}, @code{gettext.dgettext},
8216 @code{gettext.ngettext}, @code{gettext.dngettext},
8217 also @code{ugettext}, @code{ungettext}
8218
8219 @item textdomain
8220 @code{gettext.textdomain} function, or
8221 @code{gettext.install(@var{domain})} function
8222
8223 @item bindtextdomain
8224 @code{gettext.bindtextdomain} function, or
8225 @code{gettext.install(@var{domain},@var{localedir})} function
8226
8227 @item setlocale
8228 not used by the gettext emulation
8229
8230 @item Prerequisite
8231 @code{import gettext}
8232
8233 @item Use or emulate GNU gettext
8234 emulate
8235
8236 @item Extractor
8237 @code{xgettext}
8238
8239 @item Formatting with positions
8240 @code{'...%(ident)d...' % @{ 'ident': value @}}
8241
8242 @item Portability
8243 fully portable
8244
8245 @item po-mode marking
8246 ---
8247 @end table
8248
8249 An example is available in the @file{examples} directory: @code{hello-python}.
8250
8251 @node Common Lisp, clisp C, Python, List of Programming Languages
8252 @subsection GNU clisp - Common Lisp
8253 @cindex Common Lisp
8254 @cindex Lisp
8255 @cindex clisp
8256
8257 @table @asis
8258 @item RPMs
8259 clisp 2.28 or newer
8260
8261 @item File extension
8262 @code{lisp}
8263
8264 @item String syntax
8265 @code{"abc"}
8266
8267 @item gettext shorthand
8268 @code{(_ "abc")}, @code{(ENGLISH "abc")}
8269
8270 @item gettext/ngettext functions
8271 @code{i18n:gettext}, @code{i18n:ngettext}
8272
8273 @item textdomain
8274 @code{i18n:textdomain}
8275
8276 @item bindtextdomain
8277 @code{i18n:textdomaindir}
8278
8279 @item setlocale
8280 automatic
8281
8282 @item Prerequisite
8283 ---
8284
8285 @item Use or emulate GNU gettext
8286 use
8287
8288 @item Extractor
8289 @code{xgettext -k_ -kENGLISH}
8290
8291 @item Formatting with positions
8292 @code{format "~1@@*~D ~0@@*~D"}
8293
8294 @item Portability
8295 On platforms without gettext, no translation.
8296
8297 @item po-mode marking
8298 ---
8299 @end table
8300
8301 An example is available in the @file{examples} directory: @code{hello-clisp}.
8302
8303 @node clisp C, Emacs Lisp, Common Lisp, List of Programming Languages
8304 @subsection GNU clisp C sources
8305 @cindex clisp C sources
8306
8307 @table @asis
8308 @item RPMs
8309 clisp
8310
8311 @item File extension
8312 @code{d}
8313
8314 @item String syntax
8315 @code{"abc"}
8316
8317 @item gettext shorthand
8318 @code{ENGLISH ? "abc" : ""}
8319 @*@code{GETTEXT("abc")}
8320 @*@code{GETTEXTL("abc")}
8321
8322 @item gettext/ngettext functions
8323 @code{clgettext}, @code{clgettextl}
8324
8325 @item textdomain
8326 ---
8327
8328 @item bindtextdomain
8329 ---
8330
8331 @item setlocale
8332 automatic
8333
8334 @item Prerequisite
8335 @code{#include "lispbibl.c"}
8336
8337 @item Use or emulate GNU gettext
8338 use
8339
8340 @item Extractor
8341 @code{clisp-xgettext}
8342
8343 @item Formatting with positions
8344 @code{fprintf "%2$d %1$d"}
8345
8346 @item Portability
8347 On platforms without gettext, no translation.
8348
8349 @item po-mode marking
8350 ---
8351 @end table
8352
8353 @node Emacs Lisp, librep, clisp C, List of Programming Languages
8354 @subsection Emacs Lisp
8355 @cindex Emacs Lisp
8356
8357 @table @asis
8358 @item RPMs
8359 emacs, xemacs
8360
8361 @item File extension
8362 @code{el}
8363
8364 @item String syntax
8365 @code{"abc"}
8366
8367 @item gettext shorthand
8368 @code{(_"abc")}
8369
8370 @item gettext/ngettext functions
8371 @code{gettext}, @code{dgettext} (xemacs only)
8372
8373 @item textdomain
8374 @code{domain} special form (xemacs only)
8375
8376 @item bindtextdomain
8377 @code{bind-text-domain} function (xemacs only)
8378
8379 @item setlocale
8380 automatic
8381
8382 @item Prerequisite
8383 ---
8384
8385 @item Use or emulate GNU gettext
8386 use
8387
8388 @item Extractor
8389 @code{xgettext}
8390
8391 @item Formatting with positions
8392 @code{format "%2$d %1$d"}
8393
8394 @item Portability
8395 Only XEmacs.  Without @code{I18N3} defined at build time, no translation.
8396
8397 @item po-mode marking
8398 ---
8399 @end table
8400
8401 @node librep, Scheme, Emacs Lisp, List of Programming Languages
8402 @subsection librep
8403 @cindex @code{librep} Lisp
8404
8405 @table @asis
8406 @item RPMs
8407 librep 0.15.3 or newer
8408
8409 @item File extension
8410 @code{jl}
8411
8412 @item String syntax
8413 @code{"abc"}
8414
8415 @item gettext shorthand
8416 @code{(_"abc")}
8417
8418 @item gettext/ngettext functions
8419 @code{gettext}
8420
8421 @item textdomain
8422 @code{textdomain} function
8423
8424 @item bindtextdomain
8425 @code{bindtextdomain} function
8426
8427 @item setlocale
8428 ---
8429
8430 @item Prerequisite
8431 @code{(require 'rep.i18n.gettext)}
8432
8433 @item Use or emulate GNU gettext
8434 use
8435
8436 @item Extractor
8437 @code{xgettext}
8438
8439 @item Formatting with positions
8440 @code{format "%2$d %1$d"}
8441
8442 @item Portability
8443 On platforms without gettext, no translation.
8444
8445 @item po-mode marking
8446 ---
8447 @end table
8448
8449 An example is available in the @file{examples} directory: @code{hello-librep}.
8450
8451 @node Scheme, Smalltalk, librep, List of Programming Languages
8452 @subsection GNU guile - Scheme
8453 @cindex Scheme
8454 @cindex guile
8455
8456 @table @asis
8457 @item RPMs
8458 guile
8459
8460 @item File extension
8461 @code{scm}
8462
8463 @item String syntax
8464 @code{"abc"}
8465
8466 @item gettext shorthand
8467 @code{(_ "abc")}
8468
8469 @item gettext/ngettext functions
8470 @code{gettext}, @code{ngettext}
8471
8472 @item textdomain
8473 @code{textdomain}
8474
8475 @item bindtextdomain
8476 @code{bindtextdomain}
8477
8478 @item setlocale
8479 @code{(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))}
8480
8481 @item Prerequisite
8482 @code{(use-modules (ice-9 format))}
8483
8484 @item Use or emulate GNU gettext
8485 use
8486
8487 @item Extractor
8488 @code{xgettext -k_}
8489
8490 @item Formatting with positions
8491 @c @code{format "~1@@*~D ~0@@*~D~2@@*"}, requires @code{(use-modules (ice-9 format))}
8492 @c not yet supported
8493 ---
8494
8495 @item Portability
8496 On platforms without gettext, no translation.
8497
8498 @item po-mode marking
8499 ---
8500 @end table
8501
8502 An example is available in the @file{examples} directory: @code{hello-guile}.
8503
8504 @node Smalltalk, Java, Scheme, List of Programming Languages
8505 @subsection GNU Smalltalk
8506 @cindex Smalltalk
8507
8508 @table @asis
8509 @item RPMs
8510 smalltalk
8511
8512 @item File extension
8513 @code{st}
8514
8515 @item String syntax
8516 @code{'abc'}
8517
8518 @item gettext shorthand
8519 @code{NLS ? 'abc'}
8520
8521 @item gettext/ngettext functions
8522 @code{LcMessagesDomain>>#at:}, @code{LcMessagesDomain>>#at:plural:with:}
8523
8524 @item textdomain
8525 @code{LcMessages>>#domain:localeDirectory:} (returns a @code{LcMessagesDomain}
8526 object).@*
8527 Example: @code{I18N Locale default messages domain: 'gettext' localeDirectory: /usr/local/share/locale'}
8528
8529 @item bindtextdomain
8530 @code{LcMessages>>#domain:localeDirectory:}, see above.
8531
8532 @item setlocale
8533 Automatic if you use @code{I18N Locale default}.
8534
8535 @item Prerequisite
8536 @code{PackageLoader fileInPackage: 'I18N'!}
8537
8538 @item Use or emulate GNU gettext
8539 emulate
8540
8541 @item Extractor
8542 @code{xgettext}
8543
8544 @item Formatting with positions
8545 @code{'%1 %2' bindWith: 'Hello' with: 'world'}
8546
8547 @item Portability
8548 fully portable
8549
8550 @item po-mode marking
8551 ---
8552 @end table
8553
8554 An example is available in the @file{examples} directory:
8555 @code{hello-smalltalk}.
8556
8557 @node Java, C#, Smalltalk, List of Programming Languages
8558 @subsection Java
8559 @cindex Java
8560
8561 @table @asis
8562 @item RPMs
8563 java, java2
8564
8565 @item File extension
8566 @code{java}
8567
8568 @item String syntax
8569 "abc"
8570
8571 @item gettext shorthand
8572 _("abc")
8573
8574 @item gettext/ngettext functions
8575 @code{GettextResource.gettext}, @code{GettextResource.ngettext}
8576
8577 @item textdomain
8578 ---, use @code{ResourceBundle.getResource} instead
8579
8580 @item bindtextdomain
8581 ---, use CLASSPATH instead
8582
8583 @item setlocale
8584 automatic
8585
8586 @item Prerequisite
8587 ---
8588
8589 @item Use or emulate GNU gettext
8590 ---, uses a Java specific message catalog format
8591
8592 @item Extractor
8593 @code{xgettext -k_}
8594
8595 @item Formatting with positions
8596 @code{MessageFormat.format "@{1,number@} @{0,number@}"}
8597
8598 @item Portability
8599 fully portable
8600
8601 @item po-mode marking
8602 ---
8603 @end table
8604
8605 Before marking strings as internationalizable, uses of the string
8606 concatenation operator need to be converted to @code{MessageFormat}
8607 applications.  For example, @code{"file "+filename+" not found"} becomes
8608 @code{MessageFormat.format("file @{0@} not found", new Object[] @{ filename @})}.
8609 Only after this is done, can the strings be marked and extracted.
8610
8611 GNU gettext uses the native Java internationalization mechanism, namely
8612 @code{ResourceBundle}s.  There are two formats of @code{ResourceBundle}s:
8613 @code{.properties} files and @code{.class} files.  The @code{.properties}
8614 format is a text file which the translators can directly edit, like PO
8615 files, but which doesn't support plural forms.  Whereas the @code{.class}
8616 format is compiled from @code{.java} source code and can support plural
8617 forms (provided it is accessed through an appropriate API, see below).
8618
8619 To convert a PO file to a @code{.properties} file, the @code{msgcat}
8620 program can be used with the option @code{--properties-output}.  To convert
8621 a @code{.properties} file back to a PO file, the @code{msgcat} program
8622 can be used with the option @code{--properties-input}.  All the tools
8623 that manipulate PO files can work with @code{.properties} files as well,
8624 if given the @code{--properties-input} and/or @code{--properties-output}
8625 option.
8626
8627 To convert a PO file to a ResourceBundle class, the @code{msgfmt} program
8628 can be used with the option @code{--java} or @code{--java2}.  To convert a
8629 ResourceBundle back to a PO file, the @code{msgunfmt} program can be used
8630 with the option @code{--java}.
8631
8632 Two different programmatic APIs can be used to access ResourceBundles.
8633 Note that both APIs work with all kinds of ResourceBundles, whether
8634 GNU gettext generated classes, or other @code{.class} or @code{.properties}
8635 files.
8636
8637 @enumerate
8638 @item
8639 The @code{java.util.ResourceBundle} API.
8640
8641 In particular, its @code{getString} function returns a string translation.
8642 Note that a missing translation yields a @code{MissingResourceException}.
8643
8644 This has the advantage of being the standard API.  And it does not require
8645 any additional libraries, only the @code{msgcat} generated @code{.properties}
8646 files or the @code{msgfmt} generated @code{.class} files.  But it cannot do
8647 plural handling, even if the resource was generated by @code{msgfmt} from
8648 a PO file with plural handling.
8649
8650 @item
8651 The @code{gnu.gettext.GettextResource} API.
8652
8653 Reference documentation in Javadoc 1.1 style format
8654 is in the @uref{javadoc1/tree.html,javadoc1 directory} and
8655 in Javadoc 2 style format
8656 in the @uref{javadoc2/index.html,javadoc2 directory}.
8657
8658 Its @code{gettext} function returns a string translation.  Note that when
8659 a translation is missing, the @var{msgid} argument is returned unchanged.
8660
8661 This has the advantage of having the @code{ngettext} function for plural
8662 handling.
8663
8664 @cindex @code{libintl} for Java
8665 To use this API, one needs the @code{libintl.jar} file which is part of
8666 the GNU gettext package and distributed under the LGPL.
8667 @end enumerate
8668
8669 Three examples, using the second API, are available in the @file{examples}
8670 directory: @code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing}.
8671
8672 Now, to make use of the API and define a shorthand for @samp{getString},
8673 there are two idioms that you can choose from:
8674
8675 @itemize @bullet
8676 @item
8677 In a unique class of your project, say @samp{Util}, define a static variable
8678 holding the @code{ResourceBundle} instance:
8679
8680 @smallexample
8681 public static ResourceBundle myResources =
8682   ResourceBundle.getBundle("domain-name");
8683 @end smallexample
8684
8685 All classes containing internationalized strings then contain
8686
8687 @smallexample
8688 private static ResourceBundle res = Util.myResources;
8689 private static String _(String s) @{ return res.getString(s); @}
8690 @end smallexample
8691
8692 @noindent
8693 and the shorthand is used like this:
8694
8695 @smallexample
8696 System.out.println(_("Operation completed."));
8697 @end smallexample
8698
8699 @item
8700 You add a class with a very short name, say @samp{S}, containing just the
8701 definition of the resource bundle and of the shorthand:
8702
8703 @smallexample
8704 public class S @{
8705   public static ResourceBundle myResources =
8706     ResourceBundle.getBundle("domain-name");
8707   public static String _(String s) @{
8708     return myResources.getString(s);
8709   @}
8710 @}
8711 @end smallexample
8712
8713 @noindent
8714 and the shorthand is used like this:
8715
8716 @smallexample
8717 System.out.println(S._("Operation completed."));
8718 @end smallexample
8719 @end itemize
8720
8721 Which of the two idioms you choose, will depend on whether copying two lines
8722 of codes into every class is more acceptable in your project than a class
8723 with a single-letter name.
8724
8725 @node C#, gawk, Java, List of Programming Languages
8726 @subsection C#
8727 @cindex C#
8728
8729 @table @asis
8730 @item RPMs
8731 pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer
8732
8733 @item File extension
8734 @code{cs}
8735
8736 @item String syntax
8737 @code{"abc"}, @code{@@"abc"}
8738
8739 @item gettext shorthand
8740 _("abc")
8741
8742 @item gettext/ngettext functions
8743 @code{GettextResourceManager.GetString},
8744 @code{GettextResourceManager.GetPluralString}
8745
8746 @item textdomain
8747 @code{new GettextResourceManager(domain)}
8748
8749 @item bindtextdomain
8750 ---, compiled message catalogs are located in subdirectories of the directory
8751 containing the executable
8752
8753 @item setlocale
8754 automatic
8755
8756 @item Prerequisite
8757 ---
8758
8759 @item Use or emulate GNU gettext
8760 ---, uses a C# specific message catalog format
8761
8762 @item Extractor
8763 @code{xgettext -k_}
8764
8765 @item Formatting with positions
8766 @code{String.Format "@{1@} @{0@}"}
8767
8768 @item Portability
8769 fully portable
8770
8771 @item po-mode marking
8772 ---
8773 @end table
8774
8775 Before marking strings as internationalizable, uses of the string
8776 concatenation operator need to be converted to @code{String.Format}
8777 invocations.  For example, @code{"file "+filename+" not found"} becomes
8778 @code{String.Format("file @{0@} not found", filename)}.
8779 Only after this is done, can the strings be marked and extracted.
8780
8781 GNU gettext uses the native C#/.NET internationalization mechanism, namely
8782 the classes @code{ResourceManager} and @code{ResourceSet}.  Applications
8783 use the @code{ResourceManager} methods to retrieve the native language
8784 translation of strings.  An instance of @code{ResourceSet} is the in-memory
8785 representation of a message catalog file.  The @code{ResourceManager} loads
8786 and accesses @code{ResourceSet} instances as needed to look up the
8787 translations.
8788
8789 There are two formats of @code{ResourceSet}s that can be directly loaded by
8790 the C# runtime: @code{.resources} files and @code{.dll} files.
8791
8792 @itemize @bullet
8793 @item
8794 The @code{.resources} format is a binary file usually generated through the
8795 @code{resgen} or @code{monoresgen} utility, but which doesn't support plural
8796 forms.  @code{.resources} files can also be embedded in .NET @code{.exe} files.
8797 This only affects whether a file system access is performed to load the message
8798 catalog; it doesn't affect the contents of the message catalog.
8799
8800 @item
8801 On the other hand, the @code{.dll} format is a binary file that is compiled
8802 from @code{.cs} source code and can support plural forms (provided it is
8803 accessed through the GNU gettext API, see below).
8804 @end itemize
8805
8806 Note that these .NET @code{.dll} and @code{.exe} files are not tied to a
8807 particular platform; their file format and GNU gettext for C# can be used
8808 on any platform.
8809
8810 To convert a PO file to a @code{.resources} file, the @code{msgfmt} program
8811 can be used with the option @samp{--csharp-resources}.  To convert a
8812 @code{.resources} file back to a PO file, the @code{msgunfmt} program can be
8813 used with the option @samp{--csharp-resources}.  You can also, in some cases,
8814 use the @code{resgen} program (from the @code{pnet} package) or the
8815 @code{monoresgen} program (from the @code{mono}/@code{mcs} package).  These
8816 programs can also convert a @code{.resources} file back to a PO file.  But
8817 beware: as of this writing (January 2004), the @code{monoresgen} converter is
8818 quite buggy and the @code{resgen} converter ignores the encoding of the PO
8819 files.
8820
8821 To convert a PO file to a @code{.dll} file, the @code{msgfmt} program can be
8822 used with the option @code{--csharp}.  The result will be a @code{.dll} file
8823 containing a subclass of @code{GettextResourceSet}, which itself is a subclass
8824 of @code{ResourceSet}.  To convert a @code{.dll} file containing a
8825 @code{GettextResourceSet} subclass back to a PO file, the @code{msgunfmt}
8826 program can be used with the option @code{--csharp}.
8827
8828 The advantages of the @code{.dll} format over the @code{.resources} format
8829 are:
8830
8831 @enumerate
8832 @item
8833 Freedom to localize: Users can add their own translations to an application
8834 after it has been built and distributed.  Whereas when the programmer uses
8835 a @code{ResourceManager} constructor provided by the system, the set of
8836 @code{.resources} files for an application must be specified when the
8837 application is built and cannot be extended afterwards.
8838 @c If this were the only issue with the @code{.resources} format, one could
8839 @c use the @code{ResourceManager.CreateFileBasedResourceManager} function.
8840
8841 @item
8842 Plural handling: A message catalog in @code{.dll} format supports the plural
8843 handling function @code{GetPluralString}.  Whereas @code{.resources} files can
8844 only contain data and only support lookups that depend on a single string.
8845
8846 @item
8847 The @code{GettextResourceManager} that loads the message catalogs in
8848 @code{.dll} format also provides for inheritance on a per-message basis.
8849 For example, in Austrian (@code{de_AT}) locale, translations from the German
8850 (@code{de}) message catalog will be used for messages not found in the
8851 Austrian message catalog.  This has the consequence that the Austrian
8852 translators need only translate those few messages for which the translation
8853 into Austrian differs from the German one.  Whereas when working with
8854 @code{.resources} files, each message catalog must provide the translations
8855 of all messages by itself.
8856
8857 @item
8858 The @code{GettextResourceManager} that loads the message catalogs in
8859 @code{.dll} format also provides for a fallback: The English @var{msgid} is
8860 returned when no translation can be found.  Whereas when working with
8861 @code{.resources} files, a language-neutral @code{.resources} file must
8862 explicitly be provided as a fallback.
8863 @end enumerate
8864
8865 On the side of the programmatic APIs, the programmer can use either the
8866 standard @code{ResourceManager} API and the GNU @code{GettextResourceManager}
8867 API.  The latter is an extension of the former, because
8868 @code{GettextResourceManager} is a subclass of @code{ResourceManager}.
8869
8870 @enumerate
8871 @item
8872 The @code{System.Resources.ResourceManager} API.
8873
8874 This API works with resources in @code{.resources} format.
8875
8876 The creation of the @code{ResourceManager} is done through
8877 @smallexample
8878   new ResourceManager(domainname, Assembly.GetExecutingAssembly())
8879 @end smallexample
8880 @noindent
8881
8882 The @code{GetString} function returns a string's translation.  Note that this
8883 function returns null when a translation is missing (i.e. not even found in
8884 the fallback resource file).
8885
8886 @item
8887 The @code{GNU.Gettext.GettextResourceManager} API.
8888
8889 This API works with resources in @code{.dll} format.
8890
8891 Reference documentation is in the
8892 @uref{csharpdoc/index.html,csharpdoc directory}.
8893
8894 The creation of the @code{ResourceManager} is done through
8895 @smallexample
8896   new GettextResourceManager(domainname)
8897 @end smallexample
8898
8899 The @code{GetString} function returns a string's translation.  Note that when
8900 a translation is missing, the @var{msgid} argument is returned unchanged.
8901
8902 The @code{GetPluralString} function returns a string translation with plural
8903 handling, like the @code{ngettext} function in C.
8904
8905 @cindex @code{libintl} for C#
8906 To use this API, one needs the @code{GNU.Gettext.dll} file which is part of
8907 the GNU gettext package and distributed under the LGPL.
8908 @end enumerate
8909
8910 You can also mix both approaches: use the
8911 @code{GNU.Gettext.GettextResourceManager} constructor, but otherwise use
8912 only the @code{ResourceManager} type and only the @code{GetString} method.
8913 This is appropriate when you want to profit from the tools for PO files,
8914 but don't want to change an existing source code that uses
8915 @code{ResourceManager} and don't (yet) need the @code{GetPluralString} method.
8916
8917 Two examples, using the second API, are available in the @file{examples}
8918 directory: @code{hello-csharp}, @code{hello-csharp-forms}.
8919
8920 Now, to make use of the API and define a shorthand for @samp{GetString},
8921 there are two idioms that you can choose from:
8922
8923 @itemize @bullet
8924 @item
8925 In a unique class of your project, say @samp{Util}, define a static variable
8926 holding the @code{ResourceManager} instance:
8927
8928 @smallexample
8929 public static GettextResourceManager MyResourceManager =
8930   new GettextResourceManager("domain-name");
8931 @end smallexample
8932
8933 All classes containing internationalized strings then contain
8934
8935 @smallexample
8936 private static GettextResourceManager Res = Util.MyResourceManager;
8937 private static String _(String s) @{ return Res.GetString(s); @}
8938 @end smallexample
8939
8940 @noindent
8941 and the shorthand is used like this:
8942
8943 @smallexample
8944 Console.WriteLine(_("Operation completed."));
8945 @end smallexample
8946
8947 @item
8948 You add a class with a very short name, say @samp{S}, containing just the
8949 definition of the resource manager and of the shorthand:
8950
8951 @smallexample
8952 public class S @{
8953   public static GettextResourceManager MyResourceManager =
8954     new GettextResourceManager("domain-name");
8955   public static String _(String s) @{
8956      return MyResourceManager.GetString(s);
8957   @}
8958 @}
8959 @end smallexample
8960
8961 @noindent
8962 and the shorthand is used like this:
8963
8964 @smallexample
8965 Console.WriteLine(S._("Operation completed."));
8966 @end smallexample
8967 @end itemize
8968
8969 Which of the two idioms you choose, will depend on whether copying two lines
8970 of codes into every class is more acceptable in your project than a class
8971 with a single-letter name.
8972
8973 @node gawk, Pascal, C#, List of Programming Languages
8974 @subsection GNU awk
8975 @cindex awk
8976 @cindex gawk
8977
8978 @table @asis
8979 @item RPMs
8980 gawk 3.1 or newer
8981
8982 @item File extension
8983 @code{awk}
8984
8985 @item String syntax
8986 @code{"abc"}
8987
8988 @item gettext shorthand
8989 @code{_"abc"}
8990
8991 @item gettext/ngettext functions
8992 @code{dcgettext}, missing @code{dcngettext} in gawk-3.1.0
8993
8994 @item textdomain
8995 @code{TEXTDOMAIN} variable
8996
8997 @item bindtextdomain
8998 @code{bindtextdomain} function
8999
9000 @item setlocale
9001 automatic, but missing @code{setlocale (LC_MESSAGES, "")} in gawk-3.1.0
9002
9003 @item Prerequisite
9004 ---
9005
9006 @item Use or emulate GNU gettext
9007 use
9008
9009 @item Extractor
9010 @code{xgettext}
9011
9012 @item Formatting with positions
9013 @code{printf "%2$d %1$d"} (GNU awk only)
9014
9015 @item Portability
9016 On platforms without gettext, no translation.  On non-GNU awks, you must
9017 define @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain}
9018 yourself.
9019
9020 @item po-mode marking
9021 ---
9022 @end table
9023
9024 An example is available in the @file{examples} directory: @code{hello-gawk}.
9025
9026 @node Pascal, wxWindows, gawk, List of Programming Languages
9027 @subsection Pascal - Free Pascal Compiler
9028 @cindex Pascal
9029 @cindex Free Pascal
9030 @cindex Object Pascal
9031
9032 @table @asis
9033 @item RPMs
9034 fpk
9035
9036 @item File extension
9037 @code{pp}, @code{pas}
9038
9039 @item String syntax
9040 @code{'abc'}
9041
9042 @item gettext shorthand
9043 automatic
9044
9045 @item gettext/ngettext functions
9046 ---, use @code{ResourceString} data type instead
9047
9048 @item textdomain
9049 ---, use @code{TranslateResourceStrings} function instead
9050
9051 @item bindtextdomain
9052 ---, use @code{TranslateResourceStrings} function instead
9053
9054 @item setlocale
9055 automatic, but uses only LANG, not LC_MESSAGES or LC_ALL
9056
9057 @item Prerequisite
9058 @code{@{$mode delphi@}} or @code{@{$mode objfpc@}}@*@code{uses gettext;}
9059
9060 @item Use or emulate GNU gettext
9061 emulate partially
9062
9063 @item Extractor
9064 @code{ppc386} followed by @code{xgettext} or @code{rstconv}
9065
9066 @item Formatting with positions
9067 @code{uses sysutils;}@*@code{format "%1:d %0:d"}
9068
9069 @item Portability
9070 ?
9071
9072 @item po-mode marking
9073 ---
9074 @end table
9075
9076 The Pascal compiler has special support for the @code{ResourceString} data
9077 type.  It generates a @code{.rst} file.  This is then converted to a
9078 @code{.pot} file by use of @code{xgettext} or @code{rstconv}.  At runtime,
9079 a @code{.mo} file corresponding to translations of this @code{.pot} file
9080 can be loaded using the @code{TranslateResourceStrings} function in the
9081 @code{gettext} unit.
9082
9083 An example is available in the @file{examples} directory: @code{hello-pascal}.
9084
9085 @node wxWindows, YCP, Pascal, List of Programming Languages
9086 @subsection wxWindows library
9087 @cindex @code{wxWindows} library
9088
9089 @table @asis
9090 @item RPMs
9091 wxGTK, gettext
9092
9093 @item File extension
9094 @code{cpp}
9095
9096 @item String syntax
9097 @code{"abc"}
9098
9099 @item gettext shorthand
9100 @code{_("abc")}
9101
9102 @item gettext/ngettext functions
9103 @code{wxLocale::GetString}, @code{wxGetTranslation}
9104
9105 @item textdomain
9106 @code{wxLocale::AddCatalog}
9107
9108 @item bindtextdomain
9109 @code{wxLocale::AddCatalogLookupPathPrefix}
9110
9111 @item setlocale
9112 @code{wxLocale::Init}, @code{wxSetLocale}
9113
9114 @item Prerequisite
9115 @code{#include <wx/intl.h>}
9116
9117 @item Use or emulate GNU gettext
9118 emulate, see @code{include/wx/intl.h} and @code{src/common/intl.cpp}
9119
9120 @item Extractor
9121 @code{xgettext}
9122
9123 @item Formatting with positions
9124 ---
9125
9126 @item Portability
9127 fully portable
9128
9129 @item po-mode marking
9130 yes
9131 @end table
9132
9133 @node YCP, Tcl, wxWindows, List of Programming Languages
9134 @subsection YCP - YaST2 scripting language
9135 @cindex YCP
9136 @cindex YaST2 scripting language
9137
9138 @table @asis
9139 @item RPMs
9140 libycp, libycp-devel, yast2-core, yast2-core-devel
9141
9142 @item File extension
9143 @code{ycp}
9144
9145 @item String syntax
9146 @code{"abc"}
9147
9148 @item gettext shorthand
9149 @code{_("abc")}
9150
9151 @item gettext/ngettext functions
9152 @code{_()} with 1 or 3 arguments
9153
9154 @item textdomain
9155 @code{textdomain} statement
9156
9157 @item bindtextdomain
9158 ---
9159
9160 @item setlocale
9161 ---
9162
9163 @item Prerequisite
9164 ---
9165
9166 @item Use or emulate GNU gettext
9167 use
9168
9169 @item Extractor
9170 @code{xgettext}
9171
9172 @item Formatting with positions
9173 @code{sformat "%2 %1"}
9174
9175 @item Portability
9176 fully portable
9177
9178 @item po-mode marking
9179 ---
9180 @end table
9181
9182 An example is available in the @file{examples} directory: @code{hello-ycp}.
9183
9184 @node Tcl, Perl, YCP, List of Programming Languages
9185 @subsection Tcl - Tk's scripting language
9186 @cindex Tcl
9187 @cindex Tk's scripting language
9188
9189 @table @asis
9190 @item RPMs
9191 tcl
9192
9193 @item File extension
9194 @code{tcl}
9195
9196 @item String syntax
9197 @code{"abc"}
9198
9199 @item gettext shorthand
9200 @code{[_ "abc"]}
9201
9202 @item gettext/ngettext functions
9203 @code{::msgcat::mc}
9204
9205 @item textdomain
9206 ---
9207
9208 @item bindtextdomain
9209 ---, use @code{::msgcat::mcload} instead
9210
9211 @item setlocale
9212 automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL
9213
9214 @item Prerequisite
9215 @code{package require msgcat}
9216 @*@code{proc _ @{s@} @{return [::msgcat::mc $s]@}}
9217
9218 @item Use or emulate GNU gettext
9219 ---, uses a Tcl specific message catalog format
9220
9221 @item Extractor
9222 @code{xgettext -k_}
9223
9224 @item Formatting with positions
9225 @code{format "%2\$d %1\$d"}
9226
9227 @item Portability
9228 fully portable
9229
9230 @item po-mode marking
9231 ---
9232 @end table
9233
9234 Two examples are available in the @file{examples} directory:
9235 @code{hello-tcl}, @code{hello-tcl-tk}.
9236
9237 Before marking strings as internationalizable, substitutions of variables
9238 into the string need to be converted to @code{format} applications.  For
9239 example, @code{"file $filename not found"} becomes
9240 @code{[format "file %s not found" $filename]}.
9241 Only after this is done, can the strings be marked and extracted.
9242 After marking, this example becomes
9243 @code{[format [_ "file %s not found"] $filename]} or
9244 @code{[msgcat::mc "file %s not found" $filename]}.  Note that the
9245 @code{msgcat::mc} function implicitly calls @code{format} when more than one
9246 argument is given.
9247
9248 @node Perl, PHP, Tcl, List of Programming Languages
9249 @subsection Perl
9250 @cindex Perl
9251
9252 @table @asis
9253 @item RPMs
9254 perl
9255
9256 @item File extension
9257 @code{pl}, @code{PL}, @code{pm}, @code{cgi}
9258
9259 @item String syntax
9260 @itemize @bullet
9261
9262 @item @code{"abc"}
9263
9264 @item @code{'abc'}
9265
9266 @item @code{qq (abc)}
9267
9268 @item @code{q (abc)}
9269
9270 @item @code{qr /abc/}
9271
9272 @item @code{qx (/bin/date)}
9273
9274 @item @code{/pattern match/}
9275
9276 @item @code{?pattern match?}
9277
9278 @item @code{s/substitution/operators/}
9279
9280 @item @code{$tied_hash@{"message"@}}
9281
9282 @item @code{$tied_hash_reference->@{"message"@}}
9283
9284 @item etc., issue the command @samp{man perlsyn} for details
9285
9286 @end itemize
9287
9288 @item gettext shorthand
9289 @code{__} (double underscore)
9290
9291 @item gettext/ngettext functions
9292 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
9293 @code{dngettext}, @code{dcngettext}
9294
9295 @item textdomain
9296 @code{textdomain} function
9297
9298 @item bindtextdomain
9299 @code{bindtextdomain} function
9300
9301 @item bind_textdomain_codeset
9302 @code{bind_textdomain_codeset} function
9303
9304 @item setlocale
9305 Use @code{setlocale (LC_ALL, "");}
9306
9307 @item Prerequisite
9308 @code{use POSIX;}
9309 @*@code{use Locale::TextDomain;} (included in the package libintl-perl
9310 which is available on the Comprehensive Perl Archive Network CPAN,
9311 http://www.cpan.org/).
9312
9313 @item Use or emulate GNU gettext
9314 platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext
9315
9316 @item Extractor
9317 @code{xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k}
9318
9319 @item Formatting with positions
9320 Both kinds of format strings support formatting with positions.
9321 @*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer)
9322 @*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)}
9323
9324 @item Portability
9325 The @code{libintl-perl} package is platform independent but is not
9326 part of the Perl core.  The programmer is responsible for
9327 providing a dummy implementation of the required functions if the
9328 package is not installed on the target system.
9329
9330 @item po-mode marking
9331 ---
9332
9333 @item Documentation
9334 Included in @code{libintl-perl}, available on CPAN
9335 (http://www.cpan.org/).
9336
9337 @end table
9338
9339 An example is available in the @file{examples} directory: @code{hello-perl}.
9340
9341 @cindex marking Perl sources
9342
9343 The @code{xgettext} parser backend for Perl differs significantly from
9344 the parser backends for other programming languages, just as Perl
9345 itself differs significantly from other programming languages.  The
9346 Perl parser backend offers many more string marking facilities than
9347 the other backends but it also has some Perl specific limitations, the
9348 worst probably being its imperfectness.
9349
9350 @menu
9351 * General Problems::            General Problems Parsing Perl Code
9352 * Default Keywords::            Which Keywords Will xgettext Look For?
9353 * Special Keywords::            How to Extract Hash Keys
9354 * Quote-like Expressions::      What are Strings And Quote-like Expressions?
9355 * Interpolation I::             Invalid String Interpolation
9356 * Interpolation II::            Valid String Interpolation
9357 * Parentheses::                 When To Use Parentheses
9358 * Long Lines::                  How To Grok with Long Lines
9359 * Perl Pitfalls::               Bugs, Pitfalls, and Things That Do Not Work
9360 @end menu
9361
9362 @node General Problems, Default Keywords,  , Perl
9363 @subsubsection General Problems Parsing Perl Code
9364
9365 It is often heard that only Perl can parse Perl.  This is not true.
9366 Perl cannot be @emph{parsed} at all, it can only be @emph{executed}.
9367 Perl has various built-in ambiguities that can only be resolved at runtime.
9368
9369 The following example may illustrate one common problem:
9370
9371 @example
9372 print gettext "Hello World!";
9373 @end example
9374
9375 Although this example looks like a bullet-proof case of a function
9376 invocation, it is not:
9377
9378 @example
9379 open gettext, ">testfile" or die;
9380 print gettext "Hello world!"
9381 @end example
9382
9383 In this context, the string @code{gettext} looks more like a
9384 file handle.  But not necessarily:
9385
9386 @example
9387 use Locale::Messages qw (:libintl_h);
9388 open gettext ">testfile" or die;
9389 print gettext "Hello world!";
9390 @end example
9391
9392 Now, the file is probably syntactically incorrect, provided that the module
9393 @code{Locale::Messages} found first in the Perl include path exports a
9394 function @code{gettext}.  But what if the module
9395 @code{Locale::Messages} really looks like this?
9396
9397 @example
9398 use vars qw (*gettext);
9399
9400 1;
9401 @end example
9402
9403 In this case, the string @code{gettext} will be interpreted as a file
9404 handle again, and the above example will create a file @file{testfile}
9405 and write the string ``Hello world!'' into it.  Even advanced
9406 control flow analysis will not really help:
9407
9408 @example
9409 if (0.5 < rand) @{
9410    eval "use Sane";
9411 @} else @{
9412    eval "use InSane";
9413 @}
9414 print gettext "Hello world!";
9415 @end example
9416
9417 If the module @code{Sane} exports a function @code{gettext} that does
9418 what we expect, and the module @code{InSane} opens a file for writing
9419 and associates the @emph{handle} @code{gettext} with this output
9420 stream, we are clueless again about what will happen at runtime.  It is
9421 completely unpredictable.  The truth is that Perl has so many ways to
9422 fill its symbol table at runtime that it is impossible to interpret a
9423 particular piece of code without executing it.
9424
9425 Of course, @code{xgettext} will not execute your Perl sources while
9426 scanning for translatable strings, but rather use heuristics in order
9427 to guess what you meant.
9428
9429 Another problem is the ambiguity of the slash and the question mark.
9430 Their interpretation depends on the context:
9431
9432 @example
9433 # A pattern match.
9434 print "OK\n" if /foobar/;
9435
9436 # A division.
9437 print 1 / 2;
9438
9439 # Another pattern match.
9440 print "OK\n" if ?foobar?;
9441
9442 # Conditional.
9443 print $x ? "foo" : "bar";
9444 @end example
9445
9446 The slash may either act as the division operator or introduce a
9447 pattern match, whereas the question mark may act as the ternary
9448 conditional operator or as a pattern match, too.  Other programming
9449 languages like @code{awk} present similar problems, but the consequences of a
9450 misinterpretation are particularly nasty with Perl sources.  In @code{awk}
9451 for instance, a statement can never exceed one line and the parser
9452 can recover from a parsing error at the next newline and interpret
9453 the rest of the input stream correctly.  Perl is different, as a
9454 pattern match is terminated by the next appearance of the delimiter
9455 (the slash or the question mark) in the input stream, regardless of
9456 the semantic context.  If a slash is really a division sign but
9457 mis-interpreted as a pattern match, the rest of the input file is most
9458 probably parsed incorrectly.
9459
9460 If you find that @code{xgettext} fails to extract strings from
9461 portions of your sources, you should therefore look out for slashes
9462 and/or question marks preceding these sections.  You may have come
9463 across a bug in @code{xgettext}'s Perl parser (and of course you
9464 should report that bug).  In the meantime you should consider to
9465 reformulate your code in a manner less challenging to @code{xgettext}.
9466
9467 @node Default Keywords, Special Keywords, General Problems, Perl
9468 @subsubsection Which keywords will xgettext look for?
9469 @cindex Perl default keywords
9470
9471 Unless you instruct @code{xgettext} otherwise by invoking it with one
9472 of the options @code{--keyword} or @code{-k}, it will recognize the
9473 following keywords in your Perl sources:
9474
9475 @itemize @bullet
9476
9477 @item @code{gettext}
9478
9479 @item @code{dgettext}
9480
9481 @item @code{dcgettext}
9482
9483 @item @code{ngettext:1,2}
9484
9485 The first (singular) and the second (plural) argument will be
9486 extracted.
9487
9488 @item @code{dngettext:1,2}
9489
9490 The first (singular) and the second (plural) argument will be
9491 extracted.
9492
9493 @item @code{dcngettext:1,2}
9494
9495 The first (singular) and the second (plural) argument will be
9496 extracted.
9497
9498 @item @code{gettext_noop}
9499
9500 @item @code{%gettext}
9501
9502 The keys of lookups into the hash @code{%gettext} will be extracted.
9503
9504 @item @code{$gettext}
9505
9506 The keys of lookups into the hash reference @code{$gettext} will be extracted.
9507
9508 @end itemize
9509
9510 @node Special Keywords, Quote-like Expressions, Default Keywords, Perl
9511 @subsubsection How to Extract Hash Keys
9512 @cindex Perl special keywords for hash-lookups
9513
9514 Translating messages at runtime is normally performed by looking up the
9515 original string in the translation database and returning the
9516 translated version.  The ``natural'' Perl implementation is a hash
9517 lookup, and, of course, @code{xgettext} supports such practice.
9518
9519 @example
9520 print __"Hello world!";
9521 print $__@{"Hello world!"@};
9522 print $__->@{"Hello world!"@};
9523 print $$__@{"Hello world!"@};
9524 @end example
9525
9526 The above four lines all do the same thing.  The Perl module
9527 @code{Locale::TextDomain} exports by default a hash @code{%__} that
9528 is tied to the function @code{__()}.  It also exports a reference
9529 @code{$__} to @code{%__}.
9530
9531 If an argument to the @code{xgettext} option @code{--keyword},
9532 resp. @code{-k} starts with a percent sign, the rest of the keyword is
9533 interpreted as the name of a hash.  If it starts with a dollar
9534 sign, the rest of the keyword is interpreted as a reference to a
9535 hash.
9536
9537 Note that you can omit the quotation marks (single or double) around
9538 the hash key (almost) whenever Perl itself allows it:
9539
9540 @example
9541 print $gettext@{Error@};
9542 @end example
9543
9544 The exact rule is: You can omit the surrounding quotes, when the hash
9545 key is a valid C (!) identifier, i. e. when it starts with an
9546 underscore or an ASCII letter and is followed by an arbitrary number
9547 of underscores, ASCII letters or digits.  Other Unicode characters
9548 are @emph{not} allowed, regardless of the @code{use utf8} pragma.
9549
9550 @node Quote-like Expressions, Interpolation I, Special Keywords, Perl
9551 @subsubsection What are Strings And Quote-like Expressions?
9552 @cindex Perl quote-like expressions
9553
9554 Perl offers a plethora of different string constructs.  Those that can
9555 be used either as arguments to functions or inside braces for hash
9556 lookups are generally supported by @code{xgettext}.
9557
9558 @itemize @bullet
9559 @item @strong{double-quoted strings}
9560 @*
9561 @example
9562 print gettext "Hello World!";
9563 @end example
9564
9565 @item @strong{single-quoted strings}
9566 @*
9567 @example
9568 print gettext 'Hello World!';
9569 @end example
9570
9571 @item @strong{the operator qq}
9572 @*
9573 @example
9574 print gettext qq |Hello World!|;
9575 print gettext qq <E-mail: <guido\@@imperia.net>>;
9576 @end example
9577
9578 The operator @code{qq} is fully supported.  You can use arbitrary
9579 delimiters, including the four bracketing delimiters (round, angle,
9580 square, curly) that nest.
9581
9582 @item @strong{the operator q}
9583 @*
9584 @example
9585 print gettext q |Hello World!|;
9586 print gettext q <E-mail: <guido@@imperia.net>>;
9587 @end example
9588
9589 The operator @code{q} is fully supported.  You can use arbitrary
9590 delimiters, including the four bracketing delimiters (round, angle,
9591 square, curly) that nest.
9592
9593 @item @strong{the operator qx}
9594 @*
9595 @example
9596 print gettext qx ;LANGUAGE=C /bin/date;
9597 print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
9598 @end example
9599
9600 The operator @code{qx} is fully supported.  You can use arbitrary
9601 delimiters, including the four bracketing delimiters (round, angle,
9602 square, curly) that nest.
9603
9604 The example is actually a useless use of @code{gettext}.  It will
9605 invoke the @code{gettext} function on the output of the command
9606 specified with the @code{qx} operator.  The feature was included
9607 in order to make the interface consistent (the parser will extract
9608 all strings and quote-like expressions).
9609
9610 @item @strong{here documents}
9611 @*
9612 @example
9613 @group
9614 print gettext <<'EOF';
9615 program not found in $PATH
9616 EOF
9617
9618 print ngettext <<EOF, <<"EOF";
9619 one file deleted
9620 EOF
9621 several files deleted
9622 EOF
9623 @end group
9624 @end example
9625
9626 Here-documents are recognized.  If the delimiter is enclosed in single
9627 quotes, the string is not interpolated.  If it is enclosed in double
9628 quotes or has no quotes at all, the string is interpolated.
9629
9630 Delimiters that start with a digit are not supported!
9631
9632 @end itemize
9633
9634 @node Interpolation I, Interpolation II, Quote-like Expressions, Perl
9635 @subsubsection Invalid Uses Of String Interpolation
9636 @cindex Perl invalid string interpolation
9637
9638 Perl is capable of interpolating variables into strings.  This offers
9639 some nice features in localized programs but can also lead to
9640 problems.
9641
9642 A common error is a construct like the following:
9643
9644 @example
9645 print gettext "This is the program $0!\n";
9646 @end example
9647
9648 Perl will interpolate at runtime the value of the variable @code{$0}
9649 into the argument of the @code{gettext()} function.  Hence, this
9650 argument is not a string constant but a variable argument (@code{$0}
9651 is a global variable that holds the name of the Perl script being
9652 executed).  The interpolation is performed by Perl before the string
9653 argument is passed to @code{gettext()} and will therefore depend on
9654 the name of the script which can only be determined at runtime.
9655 Consequently, it is almost impossible that a translation can be looked
9656 up at runtime (except if, by accident, the interpolated string is found
9657 in the message catalog).
9658
9659 The @code{xgettext} program will therefore terminate parsing with a fatal
9660 error if it encounters a variable inside of an extracted string.  In
9661 general, this will happen for all kinds of string interpolations that
9662 cannot be safely performed at compile time.  If you absolutely know
9663 what you are doing, you can always circumvent this behavior:
9664
9665 @example
9666 my $know_what_i_am_doing = "This is program $0!\n";
9667 print gettext $know_what_i_am_doing;
9668 @end example
9669
9670 Since the parser only recognizes strings and quote-like expressions,
9671 but not variables or other terms, the above construct will be
9672 accepted.  You will have to find another way, however, to let your
9673 original string make it into your message catalog.
9674
9675 If invoked with the option @code{--extract-all}, resp. @code{-a},
9676 variable interpolation will be accepted.  Rationale: You will
9677 generally use this option in order to prepare your sources for
9678 internationalization.
9679
9680 Please see the manual page @samp{man perlop} for details of strings and
9681 quote-like expressions that are subject to interpolation and those
9682 that are not.  Safe interpolations (that will not lead to a fatal
9683 error) are:
9684
9685 @itemize @bullet
9686
9687 @item the escape sequences @code{\t} (tab, HT, TAB), @code{\n}
9688 (newline, NL), @code{\r} (return, CR), @code{\f} (form feed, FF),
9689 @code{\b} (backspace, BS), @code{\a} (alarm, bell, BEL), and @code{\e}
9690 (escape, ESC).
9691
9692 @item octal chars, like @code{\033}
9693 @*
9694 Note that octal escapes in the range of 400-777 are translated into a
9695 UTF-8 representation, regardless of the presence of the @code{use utf8} pragma.
9696
9697 @item hex chars, like @code{\x1b}
9698
9699 @item wide hex chars, like @code{\x@{263a@}}
9700 @*
9701 Note that this escape is translated into a UTF-8 representation,
9702 regardless of the presence of the @code{use utf8} pragma.
9703
9704 @item control chars, like @code{\c[} (CTRL-[)
9705
9706 @item named Unicode chars, like @code{\N@{LATIN CAPITAL LETTER C WITH CEDILLA@}}
9707 @*
9708 Note that this escape is translated into a UTF-8 representation,
9709 regardless of the presence of the @code{use utf8} pragma.
9710 @end itemize
9711
9712 The following escapes are considered partially safe:
9713
9714 @itemize @bullet
9715
9716 @item @code{\l} lowercase next char
9717
9718 @item @code{\u} uppercase next char
9719
9720 @item @code{\L} lowercase till \E
9721
9722 @item @code{\U} uppercase till \E
9723
9724 @item @code{\E} end case modification
9725
9726 @item @code{\Q} quote non-word characters till \E
9727
9728 @end itemize
9729
9730 These escapes are only considered safe if the string consists of
9731 ASCII characters only.  Translation of characters outside the range
9732 defined by ASCII is locale-dependent and can actually only be performed
9733 at runtime; @code{xgettext} doesn't do these locale-dependent translations
9734 at extraction time.
9735
9736 Except for the modifier @code{\Q}, these translations, albeit valid,
9737 are generally useless and only obfuscate your sources.  If a
9738 translation can be safely performed at compile time you can just as
9739 well write what you mean.
9740
9741 @node Interpolation II, Parentheses, Interpolation I, Perl
9742 @subsubsection Valid Uses Of String Interpolation
9743 @cindex Perl valid string interpolation
9744
9745 Perl is often used to generate sources for other programming languages
9746 or arbitrary file formats.  Web applications that output HTML code
9747 make a prominent example for such usage.
9748
9749 You will often come across situations where you want to intersperse
9750 code written in the target (programming) language with translatable
9751 messages, like in the following HTML example:
9752
9753 @example
9754 print gettext <<EOF;
9755 <h1>My Homepage</h1>
9756 <script language="JavaScript"><!--
9757 for (i = 0; i < 100; ++i) @{
9758     alert ("Thank you so much for visiting my homepage!");
9759 @}
9760 //--></script>
9761 EOF
9762 @end example
9763
9764 The parser will extract the entire here document, and it will appear
9765 entirely in the resulting PO file, including the JavaScript snippet
9766 embedded in the HTML code.  If you exaggerate with constructs like
9767 the above, you will run the risk that the translators of your package
9768 will look out for a less challenging project.  You should consider an
9769 alternative expression here:
9770
9771 @example
9772 print <<EOF;
9773 <h1>$gettext@{"My Homepage"@}</h1>
9774 <script language="JavaScript"><!--
9775 for (i = 0; i < 100; ++i) @{
9776     alert ("$gettext@{'Thank you so much for visiting my homepage!'@}");
9777 @}
9778 //--></script>
9779 EOF
9780 @end example
9781
9782 Only the translatable portions of the code will be extracted here, and
9783 the resulting PO file will begrudgingly improve in terms of readability.
9784
9785 You can interpolate hash lookups in all strings or quote-like
9786 expressions that are subject to interpolation (see the manual page
9787 @samp{man perlop} for details).  Double interpolation is invalid, however:
9788
9789 @example
9790 # TRANSLATORS: Replace "the earth" with the name of your planet.
9791 print gettext qq@{Welcome to $gettext->@{"the earth"@}@};
9792 @end example
9793
9794 The @code{qq}-quoted string is recognized as an argument to @code{xgettext} in
9795 the first place, and checked for invalid variable interpolation.  The
9796 dollar sign of hash-dereferencing will therefore terminate the parser
9797 with an ``invalid interpolation'' error.
9798
9799 It is valid to interpolate hash lookups in regular expressions:
9800
9801 @example
9802 if ($var =~ /$gettext@{"the earth"@}/) @{
9803    print gettext "Match!\n";
9804 @}
9805 s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g;
9806 @end example
9807
9808 @node Parentheses, Long Lines, Interpolation II, Perl
9809 @subsubsection When To Use Parentheses
9810 @cindex Perl parentheses
9811
9812 In Perl, parentheses around function arguments are mostly optional.
9813 @code{xgettext} will always assume that all
9814 recognized keywords (except for hashs and hash references) are names
9815 of properly prototyped functions, and will (hopefully) only require
9816 parentheses where Perl itself requires them.  All constructs in the
9817 following example are therefore ok to use:
9818
9819 @example
9820 @group
9821 print gettext ("Hello World!\n");
9822 print gettext "Hello World!\n";
9823 print dgettext ($package => "Hello World!\n");
9824 print dgettext $package, "Hello World!\n";
9825
9826 # The "fat comma" => turns the left-hand side argument into a
9827 # single-quoted string!
9828 print dgettext smellovision => "Hello World!\n";
9829
9830 # The following assignment only works with prototyped functions.
9831 # Otherwise, the functions will act as "greedy" list operators and
9832 # eat up all following arguments.
9833 my $anonymous_hash = @{
9834    planet => gettext "earth",
9835    cakes => ngettext "one cake", "several cakes", $n,
9836    still => $works,
9837 @};
9838 # The same without fat comma:
9839 my $other_hash = @{
9840    'planet', gettext "earth",
9841    'cakes', ngettext "one cake", "several cakes", $n,
9842    'still', $works,
9843 @};
9844
9845 # Parentheses are only significant for the first argument.
9846 print dngettext 'package', ("one cake", "several cakes", $n), $discarded;
9847 @end group
9848 @end example
9849
9850 @node Long Lines, Perl Pitfalls, Parentheses, Perl
9851 @subsubsection How To Grok with Long Lines
9852 @cindex Perl long lines
9853
9854 The necessity of long messages can often lead to a cumbersome or
9855 unreadable coding style.  Perl has several options that may prevent
9856 you from writing unreadable code, and
9857 @code{xgettext} does its best to do likewise.  This is where the dot
9858 operator (the string concatenation operator) may come in handy:
9859
9860 @example
9861 @group
9862 print gettext ("This is a very long"
9863                . " message that is still"
9864                . " readable, because"
9865                . " it is split into"
9866                . " multiple lines.\n");
9867 @end group
9868 @end example
9869
9870 Perl is smart enough to concatenate these constant string fragments
9871 into one long string at compile time, and so is
9872 @code{xgettext}.  You will only find one long message in the resulting
9873 POT file.
9874
9875 Note that the future Perl 6 will probably use the underscore
9876 (@samp{_}) as the string concatenation operator, and the dot
9877 (@samp{.}) for dereferencing.  This new syntax is not yet supported by
9878 @code{xgettext}.
9879
9880 If embedded newline characters are not an issue, or even desired, you
9881 may also insert newline characters inside quoted strings wherever you
9882 feel like it:
9883
9884 @example
9885 @group
9886 print gettext ("<em>In HTML output
9887 embedded newlines are generally no
9888 problem, since adjacent whitespace
9889 is always rendered into a single
9890 space character.</em>");
9891 @end group
9892 @end example
9893
9894 You may also consider to use here documents:
9895
9896 @example
9897 @group
9898 print gettext <<EOF;
9899 <em>In HTML output
9900 embedded newlines are generally no
9901 problem, since adjacent whitespace
9902 is always rendered into a single
9903 space character.</em>
9904 EOF
9905 @end group
9906 @end example
9907
9908 Please do not forget, that the line breaks are real, i. e. they
9909 translate into newline characters that will consequently show up in
9910 the resulting POT file.
9911
9912 @node Perl Pitfalls,  , Long Lines, Perl
9913 @subsubsection Bugs, Pitfalls, And Things That Do Not Work
9914 @cindex Perl pitfalls
9915
9916 The foregoing sections should have proven that
9917 @code{xgettext} is quite smart in extracting translatable strings from
9918 Perl sources.  Yet, some more or less exotic constructs that could be
9919 expected to work, actually do not work.
9920
9921 One of the more relevant limitations can be found in the
9922 implementation of variable interpolation inside quoted strings.  Only
9923 simple hash lookups can be used there:
9924
9925 @example
9926 print <<EOF;
9927 $gettext@{"The dot operator"
9928           . " does not work"
9929           . "here!"@}
9930 Likewise, you cannot @@@{[ gettext ("interpolate function calls") ]@}
9931 inside quoted strings or quote-like expressions.
9932 EOF
9933 @end example
9934
9935 This is valid Perl code and will actually trigger invocations of the
9936 @code{gettext} function at runtime.  Yet, the Perl parser in
9937 @code{xgettext} will fail to recognize the strings.  A less obvious
9938 example can be found in the interpolation of regular expressions:
9939
9940 @example
9941 s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;
9942 @end example
9943
9944 The modifier @code{e} will cause the substitution to be interpreted as
9945 an evaluable statement.  Consequently, at runtime the function
9946 @code{gettext()} is called, but again, the parser fails to extract the
9947 string ``Sunday''.  Use a temporary variable as a simple workaround if
9948 you really happen to need this feature:
9949
9950 @example
9951 my $sunday = gettext "Sunday";
9952 s/<!--START_OF_WEEK-->/$sunday/;
9953 @end example
9954
9955 Hash slices would also be handy but are not recognized:
9956
9957 @example
9958 my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
9959                         'Thursday', 'Friday', 'Saturday'@};
9960 # Or even:
9961 @@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday
9962                          Friday Saturday) @};
9963 @end example
9964
9965 This is perfectly valid usage of the tied hash @code{%gettext} but the
9966 strings are not recognized and therefore will not be extracted.
9967
9968 Another caveat of the current version is its rudimentary support for
9969 non-ASCII characters in identifiers.  You may encounter serious
9970 problems if you use identifiers with characters outside the range of
9971 'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'.
9972
9973 Maybe some of these missing features will be implemented in future
9974 versions, but since you can always make do without them at minimal effort,
9975 these todos have very low priority.
9976
9977 A nasty problem are brace format strings that already contain braces
9978 as part of the normal text, for example the usage strings typically
9979 encountered in programs:
9980
9981 @example
9982 die "usage: $0 @{OPTIONS@} FILENAME...\n";
9983 @end example
9984
9985 If you want to internationalize this code with Perl brace format strings,
9986 you will run into a problem:
9987
9988 @example
9989 die __x ("usage: @{program@} @{OPTIONS@} FILENAME...\n", program => $0);
9990 @end example
9991
9992 Whereas @samp{@{program@}} is a placeholder, @samp{@{OPTIONS@}}
9993 is not and should probably be translated. Yet, there is no way to teach
9994 the Perl parser in @code{xgettext} to recognize the first one, and leave
9995 the other one alone.
9996
9997 There are two possible work-arounds for this problem.  If you are
9998 sure that your program will run under Perl 5.8.0 or newer (these
9999 Perl versions handle positional parameters in @code{printf()}) or
10000 if you are sure that the translator will not have to reorder the arguments
10001 in her translation -- for example if you have only one brace placeholder
10002 in your string, or if it describes a syntax, like in this one --, you can
10003 mark the string as @code{no-perl-brace-format} and use @code{printf()}:
10004
10005 @example
10006 # xgettext: no-perl-brace-format
10007 die sprintf ("usage: %s @{OPTIONS@} FILENAME...\n", $0);
10008 @end example
10009
10010 If you want to use the more portable Perl brace format, you will have to do
10011 put placeholders in place of the literal braces:
10012
10013 @example
10014 die __x ("usage: @{program@} @{[@}OPTIONS@{]@} FILENAME...\n",
10015          program => $0, '[' => '@{', ']' => '@}');
10016 @end example
10017
10018 Perl brace format strings know no escaping mechanism.  No matter how this
10019 escaping mechanism looked like, it would either give the programmer a
10020 hard time, make translating Perl brace format strings heavy-going, or
10021 result in a performance penalty at runtime, when the format directives
10022 get executed.  Most of the time you will happily get along with
10023 @code{printf()} for this special case.
10024
10025 @node PHP, Pike, Perl, List of Programming Languages
10026 @subsection PHP Hypertext Preprocessor
10027 @cindex PHP
10028
10029 @table @asis
10030 @item RPMs
10031 mod_php4, mod_php4-core, phpdoc
10032
10033 @item File extension
10034 @code{php}, @code{php3}, @code{php4}
10035
10036 @item String syntax
10037 @code{"abc"}, @code{'abc'}
10038
10039 @item gettext shorthand
10040 @code{_("abc")}
10041
10042 @item gettext/ngettext functions
10043 @code{gettext}, @code{dgettext}, @code{dcgettext}; starting with PHP 4.2.0
10044 also @code{ngettext}, @code{dngettext}, @code{dcngettext}
10045
10046 @item textdomain
10047 @code{textdomain} function
10048
10049 @item bindtextdomain
10050 @code{bindtextdomain} function
10051
10052 @item setlocale
10053 Programmer must call @code{setlocale (LC_ALL, "")}
10054
10055 @item Prerequisite
10056 ---
10057
10058 @item Use or emulate GNU gettext
10059 use
10060
10061 @item Extractor
10062 @code{xgettext}
10063
10064 @item Formatting with positions
10065 @code{printf "%2\$d %1\$d"}
10066
10067 @item Portability
10068 On platforms without gettext, the functions are not available.
10069
10070 @item po-mode marking
10071 ---
10072 @end table
10073
10074 An example is available in the @file{examples} directory: @code{hello-php}.
10075
10076 @node Pike, GCC-source, PHP, List of Programming Languages
10077 @subsection Pike
10078 @cindex Pike
10079
10080 @table @asis
10081 @item RPMs
10082 roxen
10083
10084 @item File extension
10085 @code{pike}
10086
10087 @item String syntax
10088 @code{"abc"}
10089
10090 @item gettext shorthand
10091 ---
10092
10093 @item gettext/ngettext functions
10094 @code{gettext}, @code{dgettext}, @code{dcgettext}
10095
10096 @item textdomain
10097 @code{textdomain} function
10098
10099 @item bindtextdomain
10100 @code{bindtextdomain} function
10101
10102 @item setlocale
10103 @code{setlocale} function
10104
10105 @item Prerequisite
10106 @code{import Locale.Gettext;}
10107
10108 @item Use or emulate GNU gettext
10109 use
10110
10111 @item Extractor
10112 ---
10113
10114 @item Formatting with positions
10115 ---
10116
10117 @item Portability
10118 On platforms without gettext, the functions are not available.
10119
10120 @item po-mode marking
10121 ---
10122 @end table
10123
10124 @node GCC-source,  , Pike, List of Programming Languages
10125 @subsection GNU Compiler Collection sources
10126 @cindex GCC-source
10127
10128 @table @asis
10129 @item RPMs
10130 gcc
10131
10132 @item File extension
10133 @code{c}, @code{h}.
10134
10135 @item String syntax
10136 @code{"abc"}
10137
10138 @item gettext shorthand
10139 @code{_("abc")}
10140
10141 @item gettext/ngettext functions
10142 @code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
10143 @code{dngettext}, @code{dcngettext}
10144
10145 @item textdomain
10146 @code{textdomain} function
10147
10148 @item bindtextdomain
10149 @code{bindtextdomain} function
10150
10151 @item setlocale
10152 Programmer must call @code{setlocale (LC_ALL, "")}
10153
10154 @item Prerequisite
10155 @code{#include "intl.h"}
10156
10157 @item Use or emulate GNU gettext
10158 Use
10159
10160 @item Extractor
10161 @code{xgettext -k_}
10162
10163 @item Formatting with positions
10164 ---
10165
10166 @item Portability
10167 Uses autoconf macros
10168
10169 @item po-mode marking
10170 yes
10171 @end table
10172
10173 @c This is the template for new languages.
10174 @ignore
10175
10176 @ node
10177 @ subsection
10178
10179 @table @asis
10180 @item RPMs
10181
10182 @item File extension
10183
10184 @item String syntax
10185
10186 @item gettext shorthand
10187
10188 @item gettext/ngettext functions
10189
10190 @item textdomain
10191
10192 @item bindtextdomain
10193
10194 @item setlocale
10195
10196 @item Prerequisite
10197
10198 @item Use or emulate GNU gettext
10199
10200 @item Extractor
10201
10202 @item Formatting with positions
10203
10204 @item Portability
10205
10206 @item po-mode marking
10207 @end table
10208
10209 @end ignore
10210
10211 @node List of Data Formats,  , List of Programming Languages, Programming Languages
10212 @section Internationalizable Data
10213
10214 Here is a list of other data formats which can be internationalized
10215 using GNU gettext.
10216
10217 @menu
10218 * POT::                         POT - Portable Object Template
10219 * RST::                         Resource String Table
10220 * Glade::                       Glade - GNOME user interface description
10221 @end menu
10222
10223 @node POT, RST, List of Data Formats, List of Data Formats
10224 @subsection POT - Portable Object Template
10225
10226 @table @asis
10227 @item RPMs
10228 gettext
10229
10230 @item File extension
10231 @code{pot}, @code{po}
10232
10233 @item Extractor
10234 @code{xgettext}
10235 @end table
10236
10237 @node RST, Glade, POT, List of Data Formats
10238 @subsection Resource String Table
10239 @cindex RST
10240
10241 @table @asis
10242 @item RPMs
10243 fpk
10244
10245 @item File extension
10246 @code{rst}
10247
10248 @item Extractor
10249 @code{xgettext}, @code{rstconv}
10250 @end table
10251
10252 @node Glade,  , RST, List of Data Formats
10253 @subsection Glade - GNOME user interface description
10254
10255 @table @asis
10256 @item RPMs
10257 glade, libglade, glade2, libglade2, intltool
10258
10259 @item File extension
10260 @code{glade}, @code{glade2}
10261
10262 @item Extractor
10263 @code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract}
10264 @end table
10265
10266 @c This is the template for new data formats.
10267 @ignore
10268
10269 @ node
10270 @ subsection
10271
10272 @table @asis
10273 @item RPMs
10274
10275 @item File extension
10276
10277 @item Extractor
10278 @end table
10279
10280 @end ignore
10281
10282 @node Conclusion, Language Codes, Programming Languages, Top
10283 @chapter Concluding Remarks
10284
10285 We would like to conclude this GNU @code{gettext} manual by presenting
10286 an history of the Translation Project so far.  We finally give
10287 a few pointers for those who want to do further research or readings
10288 about Native Language Support matters.
10289
10290 @menu
10291 * History::                     History of GNU @code{gettext}
10292 * References::                  Related Readings
10293 @end menu
10294
10295 @node History, References, Conclusion, Conclusion
10296 @section History of GNU @code{gettext}
10297 @cindex history of GNU @code{gettext}
10298
10299 Internationalization concerns and algorithms have been informally
10300 and casually discussed for years in GNU, sometimes around GNU
10301 @code{libc}, maybe around the incoming @code{Hurd}, or otherwise
10302 (nobody clearly remembers).  And even then, when the work started for
10303 real, this was somewhat independently of these previous discussions.
10304
10305 This all began in July 1994, when Patrick D'Cruze had the idea and
10306 initiative of internationalizing version 3.9.2 of GNU @code{fileutils}.
10307 He then asked Jim Meyering, the maintainer, how to get those changes
10308 folded into an official release.  That first draft was full of
10309 @code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
10310 nicer ways.  Patrick and Jim shared some tries and experimentations
10311 in this area.  Then, feeling that this might eventually have a deeper
10312 impact on GNU, Jim wanted to know what standards were, and contacted
10313 Richard Stallman, who very quickly and verbally described an overall
10314 design for what was meant to become @code{glocale}, at that time.
10315
10316 Jim implemented @code{glocale} and got a lot of exhausting feedback
10317 from Patrick and Richard, of course, but also from Mitchum DSouza
10318 (who wrote a @code{catgets}-like package), Roland McGrath, maybe David
10319 MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
10320 pulling in various directions, not always compatible, to the extent
10321 that after a couple of test releases, @code{glocale} was torn apart.
10322 In particular, Paul Eggert -- always keeping an eye on developments
10323 in Solaris -- advocated the use of the @code{gettext} API over
10324 @code{glocale}'s @code{catgets}-based API.
10325
10326 While Jim took some distance and time and became dad for a second
10327 time, Roland wanted to get GNU @code{libc} internationalized, and
10328 got Ulrich Drepper involved in that project.  Instead of starting
10329 from @code{glocale}, Ulrich rewrote something from scratch, but
10330 more conformant to the set of guidelines who emerged out of the
10331 @code{glocale} effort.  Then, Ulrich got people from the previous
10332 forum to involve themselves into this new project, and the switch
10333 from @code{glocale} to what was first named @code{msgutils}, renamed
10334 @code{nlsutils}, and later @code{gettext}, became officially accepted
10335 by Richard in May 1995 or so.
10336
10337 Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
10338 in April 1995.  The first official release of the package, including
10339 PO mode, occurred in July 1995, and was numbered 0.7.  Other people
10340 contributed to the effort by providing a discussion forum around
10341 Ulrich, writing little pieces of code, or testing.  These are quoted
10342 in the @code{THANKS} file which comes with the GNU @code{gettext}
10343 distribution.
10344
10345 While this was being done, Fran@,{c}ois adapted half a dozen of
10346 GNU packages to @code{glocale} first, then later to @code{gettext},
10347 putting them in pretest, so providing along the way an effective
10348 user environment for fine tuning the evolving tools.  He also took
10349 the responsibility of organizing and coordinating the Translation
10350 Project.  After nearly a year of informal exchanges between people from
10351 many countries, translator teams started to exist in May 1995, through
10352 the creation and support by Patrick D'Cruze of twenty unmoderated
10353 mailing lists for that many native languages, and two moderated
10354 lists: one for reaching all teams at once, the other for reaching
10355 all willing maintainers of internationalized free software packages.
10356
10357 Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
10358 of Greg McGary, as a kind of contribution to Ulrich's package.
10359 He also gave a hand with the GNU @code{gettext} Texinfo manual.
10360
10361 In 1997, Ulrich Drepper released the GNU libc 2.0, which included the
10362 @code{gettext}, @code{textdomain} and @code{bindtextdomain} functions.
10363
10364 In 2000, Ulrich Drepper added plural form handling (the @code{ngettext}
10365 function) to GNU libc.  Later, in 2001, he released GNU libc 2.2.x,
10366 which is the first free C library with full internationalization support.
10367
10368 Ulrich being quite busy in his role of General Maintainer of GNU libc,
10369 he handed over the GNU @code{gettext} maintenance to Bruno Haible in
10370 2000.  Bruno added the plural form handling to the tools as well, added
10371 support for UTF-8 and CJK locales, and wrote a few new tools for
10372 manipulating PO files.
10373
10374 @node References,  , History, Conclusion
10375 @section Related Readings
10376 @cindex related reading
10377 @cindex bibliography
10378
10379 Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting
10380 bibliography on internationalization matters, called
10381 @cite{Internationalization Reference List}, which is available as:
10382 @example
10383 ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
10384 @end example
10385
10386 Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
10387 Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
10388 Internationalisation}.  This FAQ discusses writing programs which
10389 can handle different language conventions, character sets, etc.;
10390 and is applicable to all character set encodings, with particular
10391 emphasis on @w{ISO 8859-1}.  It is regularly published in Usenet
10392 groups @file{comp.unix.questions}, @file{comp.std.internat},
10393 @file{comp.software.international}, @file{comp.lang.c},
10394 @file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
10395 and @file{news.answers}.  The home location of this document is:
10396 @example
10397 ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
10398 @end example
10399
10400 Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
10401 matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
10402 over the responsibility of maintaining it.  It may be found as:
10403 @example
10404 ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
10405      ...locale-tutorial-0.8.txt.gz
10406 @end example
10407 @noindent
10408 This site is mirrored in:
10409 @example
10410 ftp://ftp.ibp.fr/pub/linux/sunsite/
10411 @end example
10412
10413 A French version of the same tutorial should be findable at:
10414 @example
10415 ftp://ftp.ibp.fr/pub/linux/french/docs/
10416 @end example
10417 @noindent
10418 together with French translations of many Linux-related documents.
10419
10420 @node Language Codes, Country Codes, Conclusion, Top
10421 @appendix Language Codes
10422 @cindex language codes
10423 @cindex ISO 639
10424
10425 The @w{ISO 639} standard defines two character codes for many languages.
10426 All abbreviations for languages used in the Translation Project should
10427 come from this standard.
10428
10429 @table @samp
10430 @include iso-639.texi
10431 @end table
10432
10433 @node Country Codes, Program Index, Language Codes, Top
10434 @appendix Country Codes
10435 @cindex country codes
10436 @cindex ISO 3166
10437
10438 The @w{ISO 3166} standard defines two character codes for many countries
10439 and territories.  All abbreviations for countries used in the Translation
10440 Project should come from this standard.
10441
10442 @table @samp
10443 @include iso-3166.texi
10444 @end table
10445
10446 @node Program Index, Option Index, Country Codes, Top
10447 @unnumbered Program Index
10448
10449 @printindex pg
10450
10451 @node Option Index, Variable Index, Program Index, Top
10452 @unnumbered Option Index
10453
10454 @printindex op
10455
10456 @node Variable Index, PO Mode Index, Option Index, Top
10457 @unnumbered Variable Index
10458
10459 @printindex vr
10460
10461 @node PO Mode Index, Autoconf Macro Index, Variable Index, Top
10462 @unnumbered PO Mode Index
10463
10464 @printindex em
10465
10466 @node Autoconf Macro Index, Index, PO Mode Index, Top
10467 @unnumbered Autoconf Macro Index
10468
10469 @printindex am
10470
10471 @node Index,  , Autoconf Macro Index, Top
10472 @unnumbered General Index
10473
10474 @printindex cp
10475
10476 @iftex
10477 @c Table of Contents
10478 @contents
10479 @end iftex
10480
10481 @bye
10482
10483 @c Local variables:
10484 @c texinfo-column-for-description: 32
10485 @c End: