vcl PDF tokenizer: fix EOF position when \r is not followed by \n
commit 1e0ee8141207a425b56592c136ac5e94fc821173
author Miklos Vajna <vmiklos@collabora.com>
Wed, 12 May 2021 08:51:09 +0000 (12 10:51 +0200)
committer Tomaž Vajngerl <quikee@gmail.com>
Fri, 14 May 2021 03:53:16 +0000 (14 05:53 +0200)
tree 505c3a138bf2345a4e1e625d39b4d9e12e413f81
parent 285a41709835eba970cd3528a23b47eaea42d282
vcl PDF tokenizer: fix EOF position when \r is not followed by \n

Otherwise this would break partial tokenize when we only read a trailer
in the middle of the file: m_aEOFs.back() is one byte larger than
rStream.Tell(), so we read past the end of the trailer, resulting in a
tokenize failure.

What's special about the bugdoc:

- it has 2 xrefs, the first is incomplete, and refers to a second which
is later in the file
- the object length is an indirect object, triggering an xref lookup
- the first EOF is followed by a \r, but that \r is not followed by a \n

This results in reading past the end of the first trailer and then
triggering a lookup failure.
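
To illustrate the off-by-one described above, here is a minimal sketch of
how the end offset of an EOF marker can be computed when a lone \r has to
count as a complete line ending. This is not the code from
pdfdocument.cxx: the helper name eofEndOffset and its parameters are
hypothetical, and it works on an in-memory buffer rather than on a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the LibreOffice implementation.
// `markerEnd` is the offset just past the "%%EOF" text inside `data`.
// Returns the offset just past the line ending that terminates the marker.
std::size_t eofEndOffset(std::string_view data, std::size_t markerEnd)
{
    assert(markerEnd <= data.size());
    std::size_t pos = markerEnd;
    if (pos < data.size() && data[pos] == '\r')
    {
        ++pos; // a lone \r already ends the line
        if (pos < data.size() && data[pos] == '\n')
            ++pos; // \r\n: the \n belongs to the same line ending
    }
    else if (pos < data.size() && data[pos] == '\n')
    {
        ++pos; // plain \n
    }
    return pos;
}
```

With a buffer such as "%%EOF\rxref", the byte after the \r is not counted,
so the recorded EOF offset stays right after the \r instead of landing one
byte too far.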

FWIW, pdfium does the same in
<https://pdfium.googlesource.com/pdfium/+/59d107323f6727bbd5f8a4d0843081790638a1dd/core/fpdfapi/parser/cpdf_syntax_parser.cpp#446>,
so we're now in sync with it.

(cherry picked from commit 6b1d5bafdc722d07d3dc4980764275a6caa707ba)

Conflicts:
vcl/qa/cppunit/filter/ipdf/ipdf.cxx

Change-Id: Ia556a25e333b5e4f1418d92a98d74358862120e2
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/115537
Tested-by: Jenkins CollaboraOffice <jenkinscollaboraoffice@gmail.com>
Reviewed-by: Tomaž Vajngerl <quikee@gmail.com>
vcl/qa/cppunit/filter/ipdf/data/comment-end.pdf [new file with mode: 0644]
vcl/qa/cppunit/filter/ipdf/ipdf.cxx
vcl/source/filter/ipdf/pdfdocument.cxx