s4/scripting/bin: open unicode files with utf8 encoding and write unicode string
In files like `libcli/util/werror_err_table.txt` and `libcli/util/ntstatus_err_table.txt`,
there were unicode quote symbols at line 6:
...(“this documentation”)...
In `libcli/util/wscript_build`, it will run `gen_werror.py` and `gen_ntstatus.py`
to `open` above files, read content from them and write to other files.
When encoding not specified, `open` in both python 2/3 will guess encoding from locale.
When locale is not set, it defaults to POSIX or C, and then python will use
encoding `ANSI_X3.4-1968`.
So, on a system locale is not set, `make` will fail with encoding error
for both python 2 and 3:
File "/home/ubuntu/samba/source4/scripting/bin/gen_werror.py", line 139, in main
errors = parseErrorDescriptions(input_file, True, transformErrorName)
File "/home/ubuntu/samba/source4/scripting/bin/gen_error_common.py", line 52, in parseErrorDescriptions
for line in file_contents:
File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 318: ordinal not in range(128)
In this case, we have to use `io.open` with `encoding='utf8'`.
However, then we got unicode strs and try to write them with other strs
into new file, which means the new file must also open with utf-8 and
all other strs have to be unicode, too.
Instead of prefix `u` to all strs, a more easier/elegant way is to enable
unicode literals for the python scripts, which we normally didn't do in samba.
Since both `gen_werror.py` and `gen_ntstatus.py` are bin scripts and no
other modules import them, it should be ok for this case.
Signed-off-by: Joe Guo <joeg@catalyst.net.nz>
Autobuild-User(master): Douglas Bagnall <dbagnall@samba.org>
Autobuild-Date(master): Fri Feb 8 06:34:47 CET 2019 on sn-devel-144