Decode email headers with quoted encoded words
Although not clearly RFC-2047 compliant, there are cases in the wild where
email headers contain encoded words within quotes. E.g.:
From: "=?UTF-8?q?Christian=20K=C3=B6nig?=" <name@example.com>
Python2 and Python3 have different behavior when parsing such a header
using email.header.decode_words(). Python3 will decode the encoded words
between the quotes whereas Python2 will leave the encoded words in their
encoded state. The goal for this change is to make Python2 behave as
Python3.
A regular expression (email.header.ecre) is used in
email.header.decode_header() to detect and parse encoded words. This regex
is subtly different between Python2 and Python3--the Python2 regex requires
whitespace or the end-of-string after encoded words whereas Python3 does
not. Monkey-patching the Python2 email.header.ecre regex to not require
trailing whitespace is sufficient to make Python2 behave the same as
Python3.
A new test is added to t1800-import.sh to verify this behavior.
Signed-off-by: Peter Grayson <jpgrayson@gmail.com>