Modify replacement properties of `encodeStringUtf8`/`decodeStringUtf8`
commit 4aedd00c818111d6843d4e8d6446a1f66c559aa9
author Herbert Valerio Riedel <hvr@gnu.org>
Sun, 3 Dec 2017 21:35:05 +0000 (22:35 +0100)
committer Herbert Valerio Riedel <hvr@gnu.org>
Sun, 3 Dec 2017 21:35:05 +0000 (22:35 +0100)
tree 493ef2ebf7e9f46c7f903f38c021be0b079b53b6
parent b67871c7e215d8c366f8a3cf4d8aac0236d4146a

This changes `decodeStringUtf8` to no longer replace U+FFFE and U+FFFF
with U+FFFD, while `encodeStringUtf8` now replaces surrogate
code-points (i.e. U+D800 through U+DFFF, which cannot be encoded in
well-formed UTF-8) with U+FFFD.

Consequently, `decodeStringUtf8 . encodeStringUtf8` can now properly
round-trip all scalar code-points
(i.e. [U+0000..U+D7FF] ∪ [U+E000..U+10FFFF]).
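
To illustrate the replacement rule on the encoding side, here is a minimal
sketch. It is not the code from this commit: `isScalarValue` and
`replaceSurrogates` are hypothetical helpers that model only the
surrogate-to-U+FFFD substitution on a `String`, without doing any actual
UTF-8 encoding.

```haskell
import Data.Char (ord)

-- Hypothetical helper (not part of Cabal): True for scalar code-points,
-- i.e. [U+0000..U+D7FF] ∪ [U+E000..U+10FFFF].
isScalarValue :: Char -> Bool
isScalarValue c = n < 0xD800 || n > 0xDFFF
  where
    n = ord c

-- Hypothetical helper (not part of Cabal): replace every surrogate
-- code-point with U+FFFD and leave all scalar code-points untouched.
replaceSurrogates :: String -> String
replaceSurrogates = map fix
  where
    fix c
      | isScalarValue c = c
      | otherwise       = '\xFFFD'

main :: IO ()
main = do
  -- A lone surrogate is replaced by U+FFFD.
  print (replaceSurrogates "a\xD800z" == "a\xFFFDz")
  -- U+FFFE and U+FFFF are scalar code-points and pass through unchanged.
  print (replaceSurrogates "\xFFFE\xFFFF" == "\xFFFE\xFFFF")
  -- After replacement, only scalar code-points remain.
  print (all isScalarValue (replaceSurrogates ['\0' .. '\x10FFFF']))
```

Because scalar code-points pass through unchanged in this model, only
surrogates are lossy, which matches the round-trip claim above.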

This should finally address #4644.
Cabal/Distribution/Utils/Generic.hs
Cabal/Distribution/Utils/String.hs