msys.dll: basic Unicode support

msys.dll currently uses the Win32 *A APIs, i.e. all strings are in the
default Windows ANSI encoding, which unfortunately depends on the installed
Windows version and system settings. This causes a bunch of problems
when dealing with non-ASCII file names, scripts and environment variables
(including the infamous %HOME% directory, which accounts for at least five
bug reports on the msysgit issue tracker and some more on the mailing
list).

Luckily, Windows natively supports Unicode, and converting Windows' native
UTF-16 strings to UTF-8 encoded char* strings that are compatible with
the POSIX APIs exposed by msys.dll is not that difficult.

This patch replaces the Win32 *A APIs used by msys.dll with wrappers that
accept and return UTF-8 encoded char* strings and delegate to the Windows
native *W (UTF-16) APIs instead. API functions that only take strings as
input can mostly be handled by a set of simple macros. The remaining
functions usually need just a few lines of code each.
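
The macro approach for input-only functions can be sketched roughly as
follows. This is a portable illustration, not the code from this patch:
demo_wchar, demo_to_utf16, demo_RemoveW and DEMO_WRAP_PATH_API are
hypothetical names, demo_RemoveW stands in for a real *W API such as
DeleteFileW, and the converter does no validation at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PATH 260
typedef uint16_t demo_wchar;   /* stand-in for WCHAR (one UTF-16 code unit) */

/* Minimal UTF-8 to UTF-16 converter: no validation, illustration only.
 * Returns the number of code units written, or -1 if 'w' is too small. */
static int demo_to_utf16(const char *s, demo_wchar *w, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    assert(cap > 0);
    while (*p) {
        uint32_t c = *p++;
        int extra = c < 0x80 ? 0 : c < 0xE0 ? 1 : c < 0xF0 ? 2 : 3;

        if (extra)
            c &= 0x3Fu >> extra;          /* strip the length marker bits */
        while (extra-- > 0 && *p)
            c = (c << 6) | (*p++ & 0x3Fu);
        if (n + (c > 0xFFFF ? 2 : 1) >= cap)
            return -1;                    /* no room for data plus NUL */
        if (c > 0xFFFF) {                 /* encode as a surrogate pair */
            c -= 0x10000;
            w[n++] = (demo_wchar)(0xD800 | (c >> 10));
            w[n++] = (demo_wchar)(0xDC00 | (c & 0x3FF));
        } else {
            w[n++] = (demo_wchar)c;
        }
    }
    w[n] = 0;
    return (int)n;
}

/* Hypothetical stand-in for a *W API that takes one path argument.
 * It only records the length of the UTF-16 string it received. */
static size_t demo_last_len;
static int demo_RemoveW(const demo_wchar *path)
{
    size_t n = 0;

    while (path[n])
        n++;
    demo_last_len = n;
    return n > 0;
}

/* Generates name##U(), which takes UTF-8 and delegates to name##W(). */
#define DEMO_WRAP_PATH_API(name)                                  \
    static int name##U(const char *utf8_path)                     \
    {                                                             \
        demo_wchar wide[DEMO_MAX_PATH];                           \
        if (demo_to_utf16(utf8_path, wide, DEMO_MAX_PATH) < 0)    \
            return 0;                                             \
        return name##W(wide);                                     \
    }

DEMO_WRAP_PATH_API(demo_Remove)   /* defines demo_RemoveU() */
```

With this in place, a caller passes UTF-8 to demo_RemoveU() and the
stand-in *W function sees UTF-16. A path that does not fit into the
fixed-size buffer makes the wrapper fail instead of truncating.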

The UTF-8 to UTF-16 conversion function (ported from the msysgit Unicode
patch) tolerates invalid UTF-8, so that it does not fail on, e.g.,
legacy-encoded shell scripts.
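
One way such a tolerant conversion could look is sketched below. The
substitution policy is an assumption made for this example (each invalid
byte becomes the UTF-16 code unit of the same value, i.e. it is read as
Latin-1); the ported msysgit function may use a different mapping. The
names demo_decode and demo_utf8_to_utf16_lenient are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes one UTF-8 sequence from p (len bytes available, len > 0).
 * Returns the number of bytes consumed. On invalid input it consumes a
 * single byte and reports that byte's value as the code point. */
static size_t demo_decode(const unsigned char *p, size_t len, uint32_t *cp)
{
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need = 0, i;
    uint32_t c = p[0];

    if (c < 0x80) {
        *cp = c;                          /* plain ASCII */
        return 1;
    }
    if ((c & 0xE0) == 0xC0) { need = 2; c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { need = 3; c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { need = 4; c &= 0x07; }

    if (need == 0 || need > len)
        goto invalid;                     /* bad lead byte or truncated */
    for (i = 1; i < need; i++) {
        if ((p[i] & 0xC0) != 0x80)
            goto invalid;                 /* not a continuation byte */
        c = (c << 6) | (p[i] & 0x3F);
    }
    /* reject overlong forms, surrogates and out-of-range values */
    if (c < min_cp[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        goto invalid;
    *cp = c;
    return need;

invalid:
    *cp = p[0];                           /* assumed policy: keep the byte */
    return 1;
}

/* Converts a NUL-terminated string; never fails on malformed input.
 * Returns the number of code units written, or -1 if 'out' is too small. */
static int demo_utf8_to_utf16_lenient(const char *in, uint16_t *out,
                                      size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t len = strlen(in), n = 0;

    assert(cap > 0);
    while (len) {
        uint32_t cp;
        size_t used = demo_decode(p, len, &cp);
        size_t units = cp > 0xFFFF ? 2 : 1;

        if (n + units >= cap)
            return -1;                    /* no room for data plus NUL */
        if (units == 2) {
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[n++] = (uint16_t)cp;
        }
        p += used;
        len -= used;
    }
    out[n] = 0;
    return (int)n;
}
```

Under this policy, a script saved in a legacy single-byte encoding still
converts byte for byte, while well-formed UTF-8 is decoded normally.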

Console output needs a bit more work than just wrapping APIs. First, we
need to replace WriteFile (which expects GetConsoleOutputCP() encoded
bytes) with WriteConsoleOutputW. Second, the screen position must be
tracked based on printed wchar_t characters instead of UTF-8 bytes.
Finally, strings passed to the [f]write APIs may be split anywhere, even
in the middle of a UTF-8 byte sequence. A small buffer stores incomplete
UTF-8 byte sequences until the remaining bytes arrive and can be printed.
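
The buffering of split sequences could work along these lines. This is a
sketch under assumptions, not the code from this patch: the names
demo_utf8_tail, demo_incomplete_tail and demo_feed are hypothetical, and
instead of printing, demo_feed just hands back the bytes that are ready
to be converted and written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes held back from a previous write (at most 3 are ever needed). */
struct demo_utf8_tail {
    unsigned char buf[4];
    size_t len;
};

/* Returns how many bytes at the end of data[0..len) belong to a UTF-8
 * sequence whose remaining bytes have not arrived yet (0 if none). */
static size_t demo_incomplete_tail(const unsigned char *data, size_t len)
{
    size_t back, need;

    for (back = 1; back <= 3 && back <= len; back++) {
        unsigned char c = data[len - back];

        if ((c & 0xC0) == 0x80)
            continue;                     /* continuation: look for lead */
        if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return 0;                     /* ASCII or invalid lead byte */
        return need > back ? back : 0;
    }
    return 0;
}

/* Joins the held-back bytes with a new chunk in 'out', stashes a trailing
 * incomplete sequence in 't' and returns the number of bytes in 'out'
 * that are ready to be converted and printed. */
static size_t demo_feed(struct demo_utf8_tail *t, const char *chunk,
                        size_t len, char *out, size_t cap)
{
    size_t total = t->len + len, tail;

    assert(total <= cap);
    memcpy(out, t->buf, t->len);
    memcpy(out + t->len, chunk, len);
    tail = demo_incomplete_tail((const unsigned char *)out, total);
    memcpy(t->buf, out + total - tail, tail);
    t->len = tail;
    return total - tail;
}
```

For example, if one write ends with the first two bytes of a three-byte
sequence, those two bytes are held back and emitted together with the
third byte on the next write.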

Still TODO:
- handle console input
- add wrappers for dynamically loaded *A functions (see autoload.cc,
  lines 310ff)

Signed-off-by: Karsten Blees <blees@dcon.de>