Intelligent parsing of ambiguously encoded meta data.
commit 0d72690a43e66b1ed95d2b66c77c35436ddd5faf
author Shawn O. Pearce <spearce@spearce.org>
Mon, 13 Oct 2008 01:57:51 +0000 (12 18:57 -0700)
committer Shawn O. Pearce <spearce@spearce.org>
Mon, 13 Oct 2008 17:54:00 +0000 (13 10:54 -0700)
tree 1aa30c30530e3fc86cb0cb2a143a0e6819e27206
parent 4a23e50307fdb52675990aa997756b44081900d4

We cannot trust meta data to be encoded in any particular way,
so we try different encodings.

First we try UTF-8, which is the only sane encoding for non-local
data, even when used in regions where eight bit legacy encodings
are common. The chance of mistakenly parsing non-UTF-8 data as
valid UTF-8 varies from extremely low (western encodings) to
low for most other encodings.

If the data does not look like UTF-8, we try the suggested encoding,
as parsed out of the commit "encoding" header, or as otherwise
specified by the caller.

If that fails, we try the encoding of the user's locale. Finally,
if that also fails, we force ISO-8859-1, which cannot fail because
every byte value maps to a character.
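
The fallback order described above can be sketched roughly as follows.
This is an illustrative sketch only, not the RawParseUtils code from
this commit: the class name FallbackDecoder and its methods are
hypothetical, and it assumes a modern JDK with StandardCharsets. Each
candidate charset is tried with a strict decoder that reports errors
instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; names are assumptions made for illustration. */
final class FallbackDecoder {
    private FallbackDecoder() {}

    /** Decode strictly: malformed or unmappable input throws. */
    static String strictDecode(Charset cs, byte[] raw)
            throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    /**
     * Try UTF-8, then the suggested charset (may be null), then the
     * platform default, and finally ISO-8859-1.
     */
    static String decode(byte[] raw, Charset suggested) {
        Charset[] candidates = {
            StandardCharsets.UTF_8, suggested, Charset.defaultCharset()
        };
        for (Charset cs : candidates) {
            if (cs == null)
                continue; // no suggestion was supplied
            try {
                return strictDecode(cs, raw);
            } catch (CharacterCodingException e) {
                // Not valid in this charset; fall through to the next.
            }
        }
        // Every byte value maps to a character, so this never fails.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }
}
```

For example, FallbackDecoder.decode(bytes, StandardCharsets.ISO_8859_1)
still returns the UTF-8 interpretation when the bytes happen to be
valid UTF-8, since UTF-8 is tried before the suggested charset.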

[sp: RawParseUtil code by Shawn; RevCommitParseTest by Robin]

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
org.spearce.jgit.test/tst/org/spearce/jgit/revwalk/RevCommitParseTest.java
org.spearce.jgit/src/org/spearce/jgit/util/RawParseUtils.java