Allow utf8 in mappings
commit 3b3f86b71e71b4adc7a48486ee9881276a4ba27c
author chrisjbillington <chrisjbillington@gmail.com>
Wed, 25 Mar 2020 16:31:16 +0000 (25 12:31 -0400)
committer chrisjbillington <chrisjbillington@gmail.com>
Wed, 25 Mar 2020 16:33:42 +0000 (25 12:33 -0400)
tree 53841b6ffc094a8cc93a87a195474fe7736639d5
parent e51844cd652860681df25ab2bdd5fb8228c8152b
Allow utf8 in mappings

We were previously processing entries in mapping files (when
`--mappings-are-raw` is not given) with
`.decode('unicode_escape').encode('utf8')` to replace backslash escape
sequences in bytestrings with the utf-8 encoded characters they
represent. However, it turns out that `.decode('unicode_escape')`
assumes latin-1 encoding if it encounters non-ascii bytes:
https://bugs.python.org/issue21331. So this gave incorrect
results if non-ascii utf8 data was present in the mapping.
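
The failure mode can be reproduced with a short Python 3 sketch. The
sample value "café" is an assumption chosen for illustration and is not
taken from any real mapping file:

```python
# UTF-8 bytes for "café", standing in for a mapping entry (sample value).
entry = "caf\u00e9".encode("utf8")        # b'caf\xc3\xa9'

# unicode_escape treats each non-ASCII byte as one Latin-1 character,
# so the two bytes of "é" turn into two separate characters.
decoded = entry.decode("unicode_escape")

# Re-encoding as UTF-8 then bakes the damage in.
mangled = decoded.encode("utf8")

print(ascii(decoded))   # 'caf\xc3\xa9'
print(mangled)          # b'caf\xc3\x83\xc2\xa9'
```

The entry comes back four bytes longer in its non-ascii part than it
went in, which is the incorrect result described above.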

To fix this, we now add an extra layer of
`.decode('utf8').encode('unicode-escape')` in order to convert any
non-ascii characters into their backslash escape sequences. Then the subsequent
`.decode('unicode_escape')` only encounters ascii characters and gives
correct results.
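
A minimal Python 3 sketch of the chain described above. The helper name
`process_escapes` and the sample value are hypothetical; the sketch only
exercises non-ascii input. Because the `unicode-escape` encoder also
escapes literal backslashes, entries that already contain backslash
sequences may behave differently and are worth checking separately:

```python
def process_escapes(entry: bytes) -> bytes:
    """Hypothetical helper mirroring the chain in the commit message."""
    # Extra layer: non-ASCII characters become ASCII backslash escapes.
    ascii_only = entry.decode("utf8").encode("unicode-escape")
    # The original step now only sees ASCII bytes.
    return ascii_only.decode("unicode_escape").encode("utf8")


entry = "caf\u00e9".encode("utf8")                    # b'caf\xc3\xa9'
intermediate = entry.decode("utf8").encode("unicode-escape")
result = process_escapes(entry)

print(intermediate)   # b'caf\\xe9'
print(result)         # b'caf\xc3\xa9'
```
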
README.md
hg-fast-export.py