src/timezone/README

This is a PostgreSQL-adapted version of the IANA timezone library from

        https://www.iana.org/time-zones

The latest version of the timezone data and library source code is
available right from that page.  It's best to get the merged file
tzdb-NNNNX.tar.lz, since the other archive formats omit tzdata.zi.
Historical versions, as well as release announcements, can be found
elsewhere on the site.

Since time zone rules change frequently in some parts of the world,
we should endeavor to update the data files before each PostgreSQL
release.  The code need not be updated as often, but we must track
changes that might affect interpretation of the data files.


Time Zone data
==============

We distribute the time zone source data as-is under src/timezone/data/.
Currently, we distribute just the abbreviated single-file format
"tzdata.zi", to reduce the size of our tarballs as well as churn
in our git repo.  Feeding that file to zic produces the same compiled
output as feeding the bulkier individual data files would do.

While data/tzdata.zi can just be duplicated when updating, manual effort
is needed to update the time zone abbreviation lists under tznames/.
These need to be changed whenever new abbreviations are invented or the
UTC offset associated with an existing abbreviation changes.  To detect
if this has happened, after installing new files under data/ do
        make abbrevs.txt
which will produce a file showing all abbreviations that are in current
use according to the data/ files.  Compare this to known_abbrevs.txt,
which is the list that existed last time the tznames/ files were updated.
Update tznames/ as seems appropriate, then replace known_abbrevs.txt
in the same commit.  Usually, if a known abbreviation has changed meaning,
the appropriate fix is to make it refer to a long-form zone name instead
of a fixed GMT offset.
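
As a rough sketch of the comparison step, a small shell helper along these
lines could be used from src/timezone.  The name compare_abbrevs is made up
for illustration and is not part of the tree; it only wraps diff.

```shell
# Hypothetical helper, not part of the PostgreSQL tree: show how a freshly
# generated abbreviation list differs from the previously recorded one.
compare_abbrevs() {
    known=$1      # e.g. known_abbrevs.txt
    current=$2    # e.g. abbrevs.txt, as produced by "make abbrevs.txt"
    status=0
    diff -u "$known" "$current" || status=$?
    case $status in
        0) echo "no abbreviation changes" ;;
        1) echo "abbreviation list changed; review tznames/" ;;
        *) echo "diff failed" >&2; return 2 ;;
    esac
}
```

It might be invoked as
        make abbrevs.txt
        compare_abbrevs known_abbrevs.txt abbrevs.txt
Deciding how tznames/ should change in response is still a manual step.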

The core regression test suite does some simple validation of the zone
data and abbreviations data (notably by checking that the pg_timezone_names
and pg_timezone_abbrevs views don't throw errors).  It's worth running it
as a cross-check on proposed updates.

When there has been a new release of Windows (probably including Service
Packs), findtimezone.c's mapping from Windows zones to IANA zones may
need to be updated.  We have two approaches to doing this:
1. Consult the CLDR project's windowsZones.xml file, and add any zones
   listed there that we don't have.  Use their "territory=001" mapping
   if there's more than one IANA zone listed.
2. Run the script src/tools/win32tzlist.pl on a Windows machine
   running the new release, and add any new timezones that it detects.
   (This is not a full substitute for #1, though, as win32tzlist.pl
   can't tell you which IANA zone to map to.)
In either case, never remove any zone names that have disappeared from
Windows, since we still need to match properly on older versions.
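
For approach 1, the "territory=001" mappings can be pulled out of a local
copy of windowsZones.xml with a one-line sed filter.  This is only a sketch:
the function name is hypothetical, and it assumes each mapping sits on a
single line of the form <mapZone other="..." territory="001" type="..."/>,
with the attributes in that order.

```shell
# Hypothetical helper: print "Windows zone => IANA zone" for each default
# (territory="001") mapping found in a windowsZones.xml file.
# Assumes one <mapZone .../> element per line, with attributes in the
# order other, territory, type.
list_default_mappings() {
    sed -n 's/.*<mapZone other="\([^"]*\)" territory="001" type="\([^"]*\)".*/\1 => \2/p' "$1"
}
```

The output of "list_default_mappings windowsZones.xml" can then be checked
by hand against the table in findtimezone.c; the sketch does not attempt
that comparison itself.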


Time Zone code
==============

The code in this directory is currently synced with tzcode release 2020d.
There are many cosmetic (and not so cosmetic) differences from the
original tzcode library, but diffs in the upstream version should usually
be propagated to our version.  Here are some notes about that.

For the most part we want to use the upstream code as-is, but there are
several considerations preventing an exact match:

* For readability/maintainability we reformat the code to match our own
conventions; this includes pgindent'ing it and getting rid of upstream's
overuse of "register" declarations.  (It used to include conversion of
old-style function declarations to C89 style, but thank goodness they
fixed that.)

* We need the code to follow Postgres' portability conventions; this
includes relying on configure's results rather than hand-hacked
#defines (see private.h in particular).

* Similarly, avoid relying on <stdint.h> features that may not exist on old
systems.  In particular this means using Postgres' definitions of the int32
and int64 typedefs, not int_fast32_t/int_fast64_t.  Likewise we use
PG_INT32_MIN/MAX not INT32_MIN/MAX.  (Once we desupport all PG versions
that don't require C99, it'd be practical to rely on <stdint.h> and remove
this set of diffs; but that day has not yet come.)

* Since Postgres is typically built on a system that has its own copy
of the <time.h> functions, we must avoid conflicting with those.  This
mandates renaming typedef time_t to pg_time_t, and similarly for most
other exposed names.

* zic.c's typedef "lineno" is renamed to "lineno_t", because having
"lineno" in our typedefs list would cause unfortunate pgindent behavior
in some other files where we have variables named that.

* We have exposed the tzload() and tzparse() internal functions, and
slightly modified the API of the former, in part because it now relies
on our own pg_open_tzfile() rather than opening files for itself.

* tzparse() is adjusted to never try to load the TZDEFRULES zone.

* There's a fair amount of code we don't need and have removed,
including all the nonstandard optional APIs.  We have also added
a few functions of our own at the bottom of localtime.c.

* In zic.c, we have added support for a -P (print_abbrevs) switch, which
is used to create the "abbrevs.txt" summary of currently-in-use zone
abbreviations that was described above.


The most convenient way to compare a new tzcode release to our code is
to first run the tzcode source files through a sed filter like this:

    sed -r \
        -e 's/^([ \t]*)\*\*([ \t])/\1 *\2/' \
        -e 's/^([ \t]*)\*\*$/\1 */' \
        -e 's|^\*/| */|' \
        -e 's/\bregister[ \t]//g' \
        -e 's/\bATTRIBUTE_PURE[ \t]//g' \
        -e 's/int_fast32_t/int32/g' \
        -e 's/int_fast64_t/int64/g' \
        -e 's/intmax_t/int64/g' \
        -e 's/INT32_MIN/PG_INT32_MIN/g' \
        -e 's/INT32_MAX/PG_INT32_MAX/g' \
        -e 's/INTMAX_MIN/PG_INT64_MIN/g' \
        -e 's/INTMAX_MAX/PG_INT64_MAX/g' \
        -e 's/struct[ \t]+tm\b/struct pg_tm/g' \
        -e 's/\btime_t\b/pg_time_t/g' \
        -e 's/lineno/lineno_t/g' \

and then run them through pgindent.  (The first three sed patterns deal
with conversion of their block comment style to something pgindent
won't make a hash of; the remainder address other points noted above.)
After that, the files can be diff'd directly against our corresponding
files.  Also, it's typically helpful to diff against the previous tzcode
release (after processing that the same way), and then try to apply the
diff to our files.  This will take care of most of the changes
mechanically.
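
The release-to-release workflow might be scripted roughly as follows.  This
is a sketch under stated assumptions, not an established tool: the function
names and directory names are hypothetical, tz_filter carries only a few of
the sed rules listed above (substitute the full list in practice), and GNU
sed is assumed for -r, \b and \t.

```shell
# Cut-down stand-in for the full sed filter shown above (a few rules only).
tz_filter() {
    sed -r \
        -e 's/\bregister[ \t]//g' \
        -e 's/int_fast32_t/int32/g' \
        -e 's/struct[ \t]+tm\b/struct pg_tm/g' \
        -e 's/\btime_t\b/pg_time_t/g'
}

# Filter every .c/.h file of one unpacked tzcode release into an output
# directory, leaving the unpacked release itself untouched.
filter_release() {
    src=$1
    out=$2
    mkdir -p "$out"
    for f in "$src"/*.[ch]; do
        [ -e "$f" ] || continue
        tz_filter <"$f" >"$out/$(basename "$f")"
    done
}

# Hypothetical usage (directory names are placeholders):
#   filter_release tzcode-old filtered-old
#   filter_release tzcode-new filtered-new
#   ... run pgindent over both output directories ...
#   diff -u filtered-old filtered-new >tzcode.diff
#   patch -p1 --dry-run <tzcode.diff    # from src/timezone, to preview
```

Hunks that fail to apply, typically in code we have modified or removed,
still have to be merged by hand.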