Updating Unicode versions (notes on the 10.0.0-15.0.0 updates):

1. Download ucd.zip for the version to update to, along with UCA's
   allkeys.txt and CollationTest.zip, and confusables.txt from
   security; replace all files in tools-for-build/ and tests/data/
   with the corresponding versions.

   (See tools-for-build/update-unicode-datafiles.sh)

   a. obsolete: replaced by confusables.txt. (Think about
      ConfusablesEdited.txt, which claims to be version 12: maybe
      that's the first version in which confusables.txt existed?
      Cross that bridge when we get to it.)

2. Look at the data files, and adjust more-ucd-consts.lisp-expr.
   At least scripts are likely to have been added.
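
One way to see which scripts were added between two versions is to diff the script names in the old and new Scripts.txt. The following is a minimal standalone Python sketch, not part of the build; the function names `script_names` and `added_scripts` are hypothetical, and it assumes the usual UCD layout of `range ; value # comment`. How each name maps onto an entry in more-ucd-consts.lisp-expr still has to be checked by hand.

```python
def script_names(scripts_txt):
    """Collect the script names that appear in the text of a Scripts.txt file."""
    names = set()
    for line in scripts_txt.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = data.split(";")
        if len(fields) < 2:
            continue  # not a "range ; value" data line
        names.add(fields[1].strip())
    return names


def added_scripts(old_txt, new_txt):
    """Return the script names present in new_txt but not in old_txt, sorted."""
    return sorted(script_names(new_txt) - script_names(old_txt))


if __name__ == "__main__":
    # Illustrative excerpts only; real use would read both Scripts.txt files.
    old = "0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..Z\n"
    new = old + "11F00..11F01  ; Kawi # Mn   [2] KAWI SIGN CANDRABINDU..\n"
    print(added_scripts(old, new))
```

In practice the two arguments would be the contents of Scripts.txt from the previous and the new ucd.zip.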

   (This didn't go wrong this time; there may be fewer hard-coded
   things than there were in the distant past. I was slightly
   surprised that I adjusted more-ucd-consts.lisp-expr correctly first

4. Run tests. Expect failures:

   a. failures in unicode-properties.pure.lisp

      - a failure in :BIDI-CLASS might mean that the properties of
        unallocated characters have changed. Look at
        DerivedBidiClass.txt and compare the ranges documented in the
        header with `unallocated-bidi-class` in ucd.lisp.

      - failures in :GRAPHEME-BREAK-CLASS and :WORD-BREAK-CLASS come
        from the change in the breaking algorithm. The major change
        comes from handling emoji in a more principled way, but this
        requires us to parse and preserve emoji data, so add
        emoji-data.txt from UTS 51. There are also more minor
        changes: some characters added to the explicit list of
        :aletter in WORD-BREAK-CLASS, and the new :wsegspace class.
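
For the :BIDI-CLASS check above, the default ranges can be pulled out of the DerivedBidiClass.txt header mechanically when the header states them as `# @missing:` lines. The following is a minimal standalone Python sketch under that assumption; the name `missing_defaults` is hypothetical. Older data files may describe some of the ranges only in prose comments, in which case the header still has to be read by hand.

```python
import re

# Matches header lines such as "# @missing: 0590..05FF; Right_To_Left".
MISSING_LINE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(\S+)"
)


def missing_defaults(text):
    """Return (first, last, value) for each '# @missing:' line, in file order."""
    defaults = []
    for line in text.splitlines():
        match = MISSING_LINE.match(line)
        if match:
            first = int(match.group(1), 16)
            last = int(match.group(2), 16)
            defaults.append((first, last, match.group(3)))
    return defaults


if __name__ == "__main__":
    # Illustrative excerpt only; real use would read DerivedBidiClass.txt.
    header = (
        "# @missing: 0000..10FFFF; Left_To_Right\n"
        "# @missing: 0590..05FF; Right_To_Left\n"
    )
    for first, last, value in missing_defaults(header):
        print(f"{first:04X}..{last:04X} {value}")
```

The printed ranges can then be compared by eye against `unallocated-bidi-class`; where ranges overlap, a later line refines an earlier, broader one.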

   b. failures in unicode-breaking.pure.lisp

      - failures in :GRAPHEME-BREAKING / :WORD-BREAKING /
        :LINE-BREAKING come from needing to update those breaking
        algorithms for the new classes, and the refinements they
        bring. Grapheme- and word-breaking are in UAX 29;
        line-breaking is in UAX 14.

      - we implement the approximation to line-breaking rule LB25
        ("don't break in numbers"), not the regular-expression form.
        The test files assume the full implementation, so we override
        the expected test answers for a small number of lines in
        LineBreakTest; see tests/data/line-break-exceptions.lisp-expr.
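
To make the override idea concrete, here is a minimal standalone Python sketch of reading one LineBreakTest line and substituting a different expected answer for selected inputs. It is not the mechanism used by the test suite: the function names are hypothetical, and the `overrides` dictionary keyed by code-point tuple is an assumption for illustration that says nothing about the actual layout of line-break-exceptions.lisp-expr.

```python
def parse_break_test_line(line):
    """Split one test line into (code_points, break_positions), or None.

    A data line alternates break marks and hex code points, for example
    "× 0023 × 0023 ÷", where "÷" allows a break and "×" forbids one.
    Position i is the boundary before the i-th code point.
    """
    data = line.split("#", 1)[0].split()
    if not data:
        return None  # blank line or pure comment
    code_points = [int(token, 16) for token in data[1::2]]
    breaks = [i for i, mark in enumerate(data[0::2]) if mark == "÷"]
    return code_points, breaks


def expected_breaks(line, overrides):
    """Return the expected breaks, preferring an override when one exists."""
    parsed = parse_break_test_line(line)
    if parsed is None:
        return None
    code_points, breaks = parsed
    return overrides.get(tuple(code_points), breaks)


if __name__ == "__main__":
    # Illustrative line in the LineBreakTest format.
    line = "× 0023 × 0023 ÷\t#  × [0.3] NUMBER SIGN (AL) × [28.0] NUMBER SIGN (AL) ÷ [0.3]"
    print(parse_break_test_line(line))
    # A made-up override, purely to show the lookup.
    print(expected_breaks(line, {(0x23, 0x23): [1, 2]}))
```

Keying the overrides by input rather than by line number keeps them valid if lines are inserted in a later version of the test file, though an input that recurs with a different expected answer would need a finer key.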

   c. failures in unicode-collation.pure.lisp

      - sometimes new scripts get implicit collation key weights, or
        existing scripts have new blocks added. Supporting this
        involves changing the else branch in COLLATION-KEY in