TIKA-40 - Tika needs to support diverse character encodings
commit b7cbb239a8e00305cb311f07a12ab4b62a4666f6
author Jukka Lauri Zitting <jukka@apache.org>
Wed, 10 Oct 2007 12:07:41 +0000 (10 12:07 +0000)
committer Jukka Lauri Zitting <jukka@apache.org>
Wed, 10 Oct 2007 12:07:41 +0000 (10 12:07 +0000)
tree da98d4c3fe8ee571205aadb30ebc582e81fefb73
parent 1e040e6a07f3446cffa32c2f489ea3729bc5dcc4
    - Use ICU4J to parse text content
    - Support Metadata.CONTENT_ENCODING hints in TXTParser
    - Added specific test cases for TXTParser
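The ICU4J and Metadata.CONTENT_ENCODING items above describe a "use the caller's hint if there is one, otherwise detect" policy. The sketch below illustrates that decision flow with the JDK only, under stated assumptions. It is not the code from this commit and not ICU4J. The class and method names (`EncodingHintSketch`, `chooseCharset`, `decode`) are hypothetical. The `declaredEncoding` parameter stands in for the value a caller would store under `Metadata.CONTENT_ENCODING`. In place of ICU4J's statistical detection, it uses a simple substitute: byte order mark sniffing, then strict UTF-8 validation, then an ISO-8859-1 fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only illustration of hint-first charset selection (not ICU4J).
public class EncodingHintSketch {

    /** A supported declared hint wins, then a BOM, then strict UTF-8, then ISO-8859-1. */
    static Charset chooseCharset(byte[] data, String declaredEncoding) {
        // 1. Honour the caller's hint (assumed to come from Metadata.CONTENT_ENCODING).
        if (declaredEncoding != null) {
            try {
                if (Charset.isSupported(declaredEncoding)) {
                    return Charset.forName(declaredEncoding);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: ignore the hint and fall through to detection.
            }
        }
        // 2. Byte order marks. Java's "UTF-16" decoder reads the BOM to pick endianness.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF) || startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16;
        // 3. Accept UTF-8 only if every byte sequence is well formed.
        if (isValid(data, StandardCharsets.UTF_8)) return StandardCharsets.UTF_8;
        // 4. ISO-8859-1 maps every byte value, so it always decodes.
        return StandardCharsets.ISO_8859_1;
    }

    /** Decodes with the chosen charset and drops a leading BOM if the decoder kept it. */
    static String decode(byte[] data, String declaredEncoding) {
        String text = new String(data, chooseCharset(data, declaredEncoding));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    static boolean isValid(byte[] data, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A lone 0xE9 byte is not valid UTF-8, so detection falls back to ISO-8859-1.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(chooseCharset(latin1, null));    // ISO-8859-1
        System.out.println(chooseCharset(latin1, "UTF-8")); // UTF-8 (the hint wins)
    }
}
```

In this sketch a supported hint is trusted outright, even when the bytes contradict it. Current ICU4J documentation describes `CharsetDetector.setDeclaredEncoding` differently: the declared encoding only raises the confidence of a matching detection result, so a wrong hint can still be overridden.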

git-svn-id: https://svn.eu.apache.org/repos/asf/incubator/tika/trunk@583443 13f79535-47bb-0310-9956-ffa450edef68
CHANGES.txt
pom.xml
src/main/java/org/apache/tika/parser/txt/TXTParser.java
src/test/java/org/apache/tika/parser/txt/TXTParserTest.java [new file with mode: 0644]