grep/pcre2: use PCRE2_UTF even with ASCII patterns
commit dc2c44fbb100fa609174d9069a70e2b54b0591ca
author René Scharfe <l.s.r@web.de>
Sat, 18 Dec 2021 19:50:02 +0000 (20:50 +0100)
committer Junio C Hamano <gitster@pobox.com>
Mon, 20 Dec 2021 20:45:02 +0000 (12:45 -0800)
tree ed54967a4db81bca347a379d243850586a035c80
parent e9d7761bb94f20acc98824275e317fa82436c25d

compile_pcre2_pattern() currently uses the option PCRE2_UTF only for
patterns with non-ASCII characters.  ASCII-only patterns can still
match non-ASCII strings through wildcards, however, and without that
option PCRE2 mishandles UTF-8 input: it matches parts of multi-byte
characters.  Fix that by using PCRE2_UTF even for ASCII-only patterns.

This is a remake of the reverted ae39ba431a (grep/pcre2: fix an edge
case concerning ascii patterns and UTF-8 data, 2021-10-15).  The
changes to the condition and to the test are simpler and more targeted.

Original-patch-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
grep.c
t/t7812-grep-icase-non-ascii.sh