.\" Copyright 1997-2007 Glyph & Cog, LLC
.TH pdftotext 1 "27 February 2007"
pdftotext \- Portable Document Format (PDF) to text converter
converts Portable Document Format (PDF) files to plain text.
Pdftotext reads the PDF file,
and writes a text file,
is not specified, pdftotext converts
is \'-', the text is sent to stdout.
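.PP
For example (report.pdf is a hypothetical file name), the first command
below writes report.txt, and the second sends the text to stdout and
pipes it into a pager:
.PP
.RS
.nf
# report.pdf is a hypothetical file name
pdftotext report.pdf
pdftotext report.pdf \- | less
.fi
.RE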
.SH CONFIGURATION FILE
Pdftotext reads a configuration file at startup. It first tries to
find the user's private config file, ~/.xpdfrc. If that doesn't
exist, it looks for a system-wide config file, typically
/usr/local/etc/xpdfrc (but this location can be changed when pdftotext
Many of the following options can be set with configuration file
commands. These are listed in square brackets with the description of
the corresponding command line option.
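.PP
As an illustration, the following xpdfrc fragment selects through the
configuration file the same behavior that \-enc Latin1, \-eol unix, and
\-nopgbrk select on the command line. The values are examples only; see
xpdfrc(5) for the full syntax.
.PP
.RS
.nf
# example values only; adjust for your setup
textEncoding    Latin1
textEOL         unix
textPageBreaks  no
.fi
.RE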
Specifies the first page to convert.
Specifies the last page to convert.
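.IP
For example, to convert only pages 2 through 4 (report.pdf and
pages.txt are hypothetical names):
.IP
.nf
# report.pdf and pages.txt are hypothetical names
pdftotext \-f 2 \-l 4 report.pdf pages.txt
.fi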
Maintain (as best as possible) the original physical layout of the
text. The default is to \'undo' physical layout (columns,
hyphenation, etc.) and output the text in reading order.
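.IP
For example, to keep the side-by-side columns of a table in place
instead of reflowing them into reading order (table.pdf and table.txt
are hypothetical names):
.IP
.nf
# table.pdf and table.txt are hypothetical names
pdftotext \-layout table.pdf table.txt
.fi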
Keep the text in content stream order. This is a hack which often
"undoes" column formatting, etc. Use of raw mode is no longer
Generate a simple HTML file, including the meta information. This
simply wraps the text in <pre> and </pre> and prepends the meta
.BI \-enc " encoding-name"
Sets the encoding to use for text output. The
must be defined with the unicodeMap command (see
The encoding name is case-sensitive. This defaults to "Latin1" (which
is a built-in encoding).
.RB "[config file: " textEncoding ]
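.IP
For example, assuming your installation provides an encoding named
UTF-8 (it may be built in or defined with the unicodeMap command), the
following writes UTF-8 text. The file names are hypothetical:
.IP
.nf
# assumes a "UTF-8" encoding is available; file names are hypothetical
pdftotext \-enc UTF\-8 report.pdf report.txt
.fi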
.BI \-eol " unix | dos | mac"
Sets the end-of-line convention to use for text output.
.RB "[config file: " textEOL ]
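.IP
For example, to end each line with CR LF, as DOS and Windows programs
expect (report.pdf and report.txt are hypothetical names):
.IP
.nf
# report.pdf and report.txt are hypothetical names
pdftotext \-eol dos report.pdf report.txt
.fi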
Don't insert page breaks (form feed characters) between pages.
.RB "[config file: " textPageBreaks ]
Specify the owner password for the PDF file. Providing this will
bypass all security restrictions.
Specify the user password for the PDF file.
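.IP
For example, to open a file protected with the user password "secret"
(both the password and the file names are hypothetical):
.IP
.nf
# "secret", locked.pdf, and locked.txt are hypothetical
pdftotext \-upw secret locked.pdf locked.txt
.fi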
Don't print any messages or errors.
.RB "[config file: " errQuiet ]
.BI \-cfg " config-file"
in place of ~/.xpdfrc or the system-wide config file.
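.IP
For example, to use a project-specific configuration instead of
~/.xpdfrc (the path ./myxpdfrc and the file names are hypothetical):
.IP
.nf
# ./myxpdfrc, report.pdf, and report.txt are hypothetical
pdftotext \-cfg ./myxpdfrc report.pdf report.txt
.fi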
Print copyright and version information.
Print usage information.
Some PDF files contain fonts whose encodings have been mangled beyond
recognition. There is no way (short of OCR) to extract text from
The Xpdf tools use the following exit codes:
Error opening a PDF file.
Error opening an output file.
Error related to PDF permissions.
The pdftotext software and documentation are copyright 1996-2007 Glyph
.B http://www.foolabs.com/xpdf/