join,uniq: support multi-byte separators
commit 11b01fc21f1dff2685477c03596a0a4009aec7da
author Paul Eggert <eggert@cs.ucla.edu>
Mon, 30 Oct 2023 07:32:51 +0000 (30 00:32 -0700)
committer Paul Eggert <eggert@cs.ucla.edu>
Mon, 30 Oct 2023 07:58:04 +0000 (30 00:58 -0700)
tree 0d4dc199bed6808d0168bd5e915d1bd48fbda64d
parent 2709bea0f440507ac009e6e7ded453bb792d6842

* NEWS: Mention this.
* bootstrap.conf (gnulib_modules): Remove cu-ctype, as this module
is now more trouble than it’s worth.  All uses removed.
Add skipchars.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
Remove.
* gl/lib/skipchars.c, gl/lib/skipchars.h, gl/modules/skipchars:
* tests/misc/join-utf8.sh:
New files.
* src/join.c: Include skipchars.h and mcel.h instead of cu-ctype.h.
(tab): Now mcel_t, not int.  All uses changed.
(output_separator, output_seplen): New static vars.
(eq_tab, newline_or_blank, comma_or_blank): New functions.
(xfields, prfields, prjoin, add_field_list, main):
Support multi-byte characters.
* src/numfmt.c: Include ctype.h, skipchars.h.
Do not include cu-ctype.h.
(newline_or_blank): New function.
(next_field): Support multi-byte characters.
* src/sort.c: Include ctype.h instead of cu-ctype.h.
(inittables): Open-code field_sep since it no longer exists.
‘sort’ is not multi-byte safe yet, but when it is this code
will need revamping anyway.
* src/uniq.c: Include mcel.h and skipchars.h instead of cu-ctype.h.
(newline_or_blank): New function.
(find_field): Support multi-byte characters.
* tests/local.mk (all_tests): Add tests/misc/join-utf8.sh.
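The heart of the join change is that the -t separator is now a decoded character (the entry for `tab` says it is now `mcel_t`, not `int`) rather than a single byte. The following is a minimal, self-contained sketch of that idea, not the coreutils code. It assumes UTF-8 input, and the `mbchar` type and the helpers `scan_utf8`, `same_char`, `field_end`, and `count_fields` are hypothetical stand-ins for gnulib's `mcel.h` machinery and join's own `eq_tab` and `xfields`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded character, loosely in the spirit of gnulib's mcel_t:
   a code point, a nonzero ERR for an undecodable byte, and a byte length.
   UTF-8 input is assumed throughout.  */
typedef struct { uint32_t ch; unsigned char err, len; } mbchar;

/* Decode one UTF-8 character at P, reading no further than LIM (P < LIM).
   Invalid, overlong, surrogate, or truncated sequences yield a 1-byte
   character whose ERR is the offending byte.  */
static mbchar
scan_utf8 (char const *p, char const *lim)
{
  unsigned char const *s = (unsigned char const *) p;
  unsigned char c = s[0];
  mbchar bad = { 0, c, 1 };
  int len;
  uint32_t ch, min;

  if (c < 0x80)
    return (mbchar) { c, 0, 1 };
  if ((c & 0xE0) == 0xC0)
    { len = 2; ch = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; ch = c & 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0)
    { len = 4; ch = c & 0x07; min = 0x10000; }
  else
    return bad;

  if (lim - p < len)
    return bad;
  for (int i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return bad;
      ch = ch << 6 | (s[i] & 0x3F);
    }
  if (ch < min || 0x10FFFF < ch || (0xD800 <= ch && ch <= 0xDFFF))
    return bad;
  return (mbchar) { ch, 0, len };
}

/* Characters match only if they decode identically, so an invalid byte
   never matches a valid separator.  */
static bool
same_char (mbchar a, mbchar b)
{
  return a.ch == b.ch && a.err == b.err;
}

/* Return the end of the field starting at P: the first SEP found at a
   character boundary, or LIM if there is none.  */
static char const *
field_end (char const *p, char const *lim, mbchar sep)
{
  while (p < lim)
    {
      mbchar c = scan_utf8 (p, lim);
      if (same_char (c, sep))
        break;
      p += c.len;
    }
  return p;
}

/* Count the SEP-separated fields in [P, LIM).  */
static size_t
count_fields (char const *p, char const *lim, mbchar sep)
{
  size_t n = 1;
  for (;;)
    {
      p = field_end (p, lim, sep);
      if (p == lim)
        return n;
      p += sep.len;
      n++;
    }
}
```

For example, with the separator "\xE2\x86\x92" (U+2192 RIGHTWARDS ARROW), the line "a\xE2\x86\x92" "bb\xE2\x86\x92" "c" has three fields, while "x\xE2\x86" "y" (a truncated arrow) has one. Stepping character by character, instead of searching for the separator's bytes, is what keeps a match from landing inside another character in encodings that are not self-synchronizing; for valid UTF-8 the two approaches find the same boundaries.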
14 files changed:
NEWS
bootstrap.conf
gl/lib/cu-ctype.c [deleted file]
gl/lib/cu-ctype.h [deleted file]
gl/lib/skipchars.c [new file with mode: 0644]
gl/lib/skipchars.h [new file with mode: 0644]
gl/modules/cu-ctype [deleted file]
gl/modules/skipchars [new file with mode: 0644]
src/join.c
src/numfmt.c
src/sort.c
src/uniq.c
tests/local.mk
tests/misc/join-utf8.sh [new file with mode: 0755]
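The uniq and numfmt entries describe skipping blank-separated fields with a new `newline_or_blank` predicate and the new skipchars module. A rough sketch of that pattern follows, under stated assumptions: UTF-8 input; a hypothetical `skip_matching` helper standing in for whatever skipchars.h actually provides; a hypothetical `find_field_sketch` modeled on the uniq -f/-s behavior and assuming that -s counts characters rather than bytes; and an illustrative blank set (ASCII space, tab, newline, and U+3000 IDEOGRAPHIC SPACE) in place of the locale-dependent classification real code would use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded character: code point, nonzero ERR for an invalid
   byte (LEN is then 1), and length in bytes.  UTF-8 is assumed.  */
typedef struct { uint32_t ch; unsigned char err, len; } mbchar;

/* Decode one UTF-8 character at P (P < LIM); invalid, overlong,
   surrogate, or truncated input yields a 1-byte character with ERR set.  */
static mbchar
scan_utf8 (char const *p, char const *lim)
{
  unsigned char const *s = (unsigned char const *) p;
  unsigned char c = s[0];
  mbchar bad = { 0, c, 1 };
  int len;
  uint32_t ch, min;

  if (c < 0x80)
    return (mbchar) { c, 0, 1 };
  if ((c & 0xE0) == 0xC0)
    { len = 2; ch = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; ch = c & 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0)
    { len = 4; ch = c & 0x07; min = 0x10000; }
  else
    return bad;
  if (lim - p < len)
    return bad;
  for (int i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return bad;
      ch = ch << 6 | (s[i] & 0x3F);
    }
  if (ch < min || 0x10FFFF < ch || (0xD800 <= ch && ch <= 0xDFFF))
    return bad;
  return (mbchar) { ch, 0, len };
}

/* Illustrative blank set only; invalid bytes are never blank.  */
static bool
is_blank_or_newline (mbchar c)
{
  return (!c.err
          && (c.ch == ' ' || c.ch == '\t' || c.ch == '\n' || c.ch == 0x3000));
}

/* Advance from P while PRED (character) == OK, never past LIM.  */
static char const *
skip_matching (char const *p, char const *lim, bool (*pred) (mbchar), bool ok)
{
  while (p < lim)
    {
      mbchar c = scan_utf8 (p, lim);
      if (pred (c) != ok)
        break;
      p += c.len;
    }
  return p;
}

/* Skip SKIP_FIELDS fields (each a run of blanks, then nonblanks), then
   SKIP_CHARS characters; return where comparison would start.  */
static char const *
find_field_sketch (char const *p, char const *lim,
                   size_t skip_fields, size_t skip_chars)
{
  for (size_t i = 0; i < skip_fields && p < lim; i++)
    {
      p = skip_matching (p, lim, is_blank_or_newline, true);
      p = skip_matching (p, lim, is_blank_or_newline, false);
    }
  for (size_t i = 0; i < skip_chars && p < lim; i++)
    p += scan_utf8 (p, lim).len;
  return p;
}
```

Passing the predicate together with a `true`/`false` flag lets one loop skip either a run of blanks or a run of nonblanks, which is why the field loop calls `skip_matching` twice per field. With the line "a\xE3\x80\x80" "b c", skipping one field stops just before the ideographic space, skipping two fields stops before the ASCII space, and skipping two characters (with no fields) stops at "b".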