Do not treat fullwidth Latin and symbols as unbroken script
Until now, all Unicode code points in the Halfwidth and
Fullwidth Forms block (U+FF00..U+FFEF) were treated as
unbroken script. As a result, terms consisting of fullwidth
Latin characters in this range were not lowercased before
indexing, so queries did not find such text.
This patch changes the word breaker to consider only
halfwidth Katakana and Hangul characters as unbroken script,
and to handle all fullwidth Latin characters, numbers and
symbols in this block as broken script.
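
The classification described above can be sketched roughly as
follows. This is an illustration only, not the word breaker's
actual code: the function name and the exact sub-ranges
(U+FF66..U+FF9F for halfwidth Katakana, U+FFA0..U+FFDC for
halfwidth Hangul) are assumptions based on the Unicode block
layout, not taken from the patch.

```python
# Hypothetical sub-ranges of the Halfwidth and Fullwidth Forms
# block (U+FF00..U+FFEF); assumed here, not taken from the patch.
HALFWIDTH_KATAKANA = range(0xFF66, 0xFFA0)  # U+FF66..U+FF9F
HALFWIDTH_HANGUL = range(0xFFA0, 0xFFDD)    # U+FFA0..U+FFDC


def is_unbroken_script(ch: str) -> bool:
    """Return True only for halfwidth Katakana and Hangul.

    Everything else in the block (fullwidth Latin, digits,
    symbols) is treated as broken script.
    """
    cp = ord(ch)
    return cp in HALFWIDTH_KATAKANA or cp in HALFWIDTH_HANGUL


def index_form(term: str) -> str:
    """Lowercase a term unless it contains unbroken script."""
    if any(is_unbroken_script(ch) for ch in term):
        return term
    return term.lower()
```

With this split, a fullwidth Latin term such as U+FF21 U+FF22
takes the ordinary lowercasing path, while halfwidth Katakana
and Hangul terms are left unchanged.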