1 .\" $File: file.man,v 1.82 2009/11/04 22:30:34 christos Exp $
7 .Nd determine file type
18 .Op Fl m Ar magicfiles
23 .Op Fl m Ar magicfiles
27 This manual page documents version 5.04 of the
32 tests each argument in an attempt to classify it.
33 There are three sets of tests, performed in this order:
34 filesystem tests, magic tests, and language tests.
37 test that succeeds causes the file type to be printed.
39 The type printed will usually contain one of the words
41 (the file contains only
42 printing characters and a few common control
43 characters and is probably safe to read on an
47 (the file contains the result of compiling a program
48 in a form understandable to some
53 meaning anything else (data is usually
56 Exceptions are well-known file formats (core files, tar archives)
57 that are known to contain binary data.
58 When modifying magic files or the program itself, make sure to
59 .Em "preserve these keywords" .
60 Users depend on knowing that all the readable files in a directory
64 Don't do as Berkeley did and change
65 .Sq shell commands text
69 The filesystem tests are based on examining the return from a
72 The program checks to see if the file is empty,
73 or if it's some sort of special file.
74 Any known file types appropriate to the system you are running on
75 (sockets, symbolic links, or named pipes (FIFOs) on those systems that
77 are intuited if they are defined in
78 the system header file
81 The magic tests are used to check for files with data in
82 particular fixed formats.
83 The canonical example of this is a binary executable (compiled program)
85 file, whose format is defined in
90 in the standard include directory.
93 stored in a particular place
94 near the beginning of the file that tells the
95 .Dv UNIX operating system
96 that the file is a binary executable, and which of several types thereof.
99 has been applied by extension to data files.
100 Any file with some invariant identifier at a small fixed
101 offset into the file can usually be described in this way.
102 The information identifying these files is read from the compiled
104 .Pa /mingw/share/misc/magic.mgc ,
105 or the files in the directory
106 .Pa /mingw/share/misc/magic
107 if the compiled file does not exist. In addition, if
111 exists, it will be used in preference to the system magic files.
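The fixed-offset identifier check described above can be sketched in Python.
This is an illustration only, not the way this program is implemented.
The table SIGNATURES and the function match_signature are hypothetical names,
and the three entries are hand-picked, well-known signatures rather than
the contents of any magic file.

```python
# Hypothetical, hand-written table: (offset, expected bytes, description).
# Real magic files hold far more entries and a richer pattern language.
SIGNATURES = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def match_signature(data: bytes) -> str:
    """Return the description of the first matching signature, else 'data'."""
    for offset, magic, description in SIGNATURES:
        # Compare the bytes at the fixed offset with the invariant identifier.
        if data[offset:offset + len(magic)] == magic:
            return description
    return "data"
```

Only the first few hundred bytes of a file are needed for a table like this one,
because every identifier sits at a small fixed offset.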
113 If a file does not match any of the entries in the magic file,
114 it is examined to see if it seems to be a text file.
115 ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
116 (such as those used on Macintosh and IBM PC systems),
117 UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
118 character sets can be distinguished by the different
119 ranges and sequences of bytes that constitute printable text
121 If a file passes any of these tests, its character set is reported.
122 ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
125 because they will be mostly readable on nearly any terminal;
126 UTF-16 and EBCDIC are only
129 they contain text, it is text that will require translation
130 before it can be read.
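A rough sketch of how byte ranges can separate some of these character sets
follows, in Python.
It is a simplification under stated assumptions, not the algorithm this
program uses: UTF-16 is recognised only by a byte-order mark, EBCDIC is
not handled at all, and no check is made that the data is text rather than
binary.
The function name guess_charset is hypothetical.

```python
def guess_charset(data: bytes) -> str:
    """Make a very rough guess at the character set of data."""
    # Assumption: UTF-16 text starts with a byte-order mark.
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "UTF-16"
    # Every byte below 0x80: plain ASCII.
    if all(byte < 0x80 for byte in data):
        return "ASCII"
    # High bytes that form valid multi-byte sequences: UTF-8.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # ISO-8859-x reserves 0x80-0x9f for control codes, so printable
    # text in those encodings avoids that range.
    if not any(0x80 <= byte <= 0x9F for byte in data):
        return "ISO-8859"
    return "non-ISO extended-ASCII"
```

The order of the checks matters: ASCII is valid UTF-8, so the narrower test
has to come first.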
133 will attempt to determine other characteristics of text-type files.
134 If the lines of a file are terminated by CR, CRLF, or NEL, instead
135 of the Unix-standard LF, this will be reported.
136 Files that contain embedded escape sequences or overstriking
137 will also be identified.
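The line terminator check can be sketched the same way.
The Python below is a minimal illustration, not this program's code.
It assumes a single-byte character set in which NEL is the byte 0x85, as in
ISO-8859-x; the function name line_terminators is hypothetical.

```python
def line_terminators(data: bytes) -> set:
    """Return the names of the line terminators that occur in data."""
    found = set()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x0D:  # CR, possibly the first half of a CRLF pair
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                found.add("CRLF")
                i += 1  # skip the LF that belongs to this CRLF
            else:
                found.add("CR")
        elif byte == 0x0A:
            found.add("LF")
        elif byte == 0x85:  # NEL, assuming a single-byte encoding
            found.add("NEL")
        i += 1
    return found
```

A file whose result is exactly the set containing only LF uses the Unix
convention; anything else would be worth reporting.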
141 has determined the character set used in a text-type file,
143 attempt to determine in what language the file is written.
144 The language tests look for particular strings (cf.
146 ) that can appear anywhere in the first few blocks of a file.
147 For example, the keyword
149 indicates that the file is most likely a
151 input file, just as the keyword
153 indicates a C program.
154 These tests are less reliable than the previous
155 two groups, so they are performed last.
156 The language test routines also test for some miscellany
161 Any file that cannot be identified as having been written
162 in any of the character sets listed above is simply said to be
165 .Bl -tag -width indent
167 Do not prepend filenames to output lines (brief mode).
171 output file that contains a pre-parsed version of the magic file or directory.
172 .It Fl c , -checking-printout
173 Cause a checking printout of the parsed form of the magic file.
174 This is usually used in conjunction with the
176 flag to debug a new magic file before installing it.
177 .It Fl e , -exclude Ar testname
178 Exclude the test named in
180 from the list of tests made to determine the file type. Valid test names
182 .Bl -tag -width compress
185 application type (only on EMX).
187 Various types of text files (this test will try to guess the text encoding, irrespective of the setting of the
191 Different text encodings for soft magic tests.
193 Looks for known tokens inside text files.
195 Prints details of Compound Document Files.
197 Checks for, and looks inside, compressed files.
199 Prints ELF file details.
201 Consults magic files.
205 .It Fl F , -separator Ar separator
206 Use the specified string as the separator between the filename and the
207 file result returned. Defaults to
209 .It Fl f , -files-from Ar namefile
210 Read the names of the files to be examined from
213 before the argument list.
216 or at least one filename argument must be present;
217 to test the standard input, use
219 as a filename argument.
220 .It Fl h , -no-dereference
221 option causes symlinks not to be followed
222 (on systems that support symbolic links). This is the default if the
227 Causes the file command to output MIME type strings rather than the more
228 traditional human-readable ones. Thus it may say
229 .Sq text/plain; charset=us-ascii
232 In order for this option to work, file changes the way
233 it handles files recognized by the command itself (such as many of the
234 text file types, directories, etc.), and makes use of an alternative
237 (See the FILES section, below).
238 .It Fl -mime-type , -mime-encoding
241 but print only the specified element(s).
242 .It Fl k , -keep-going
243 Don't stop at the first match; keep going. Subsequent matches will be
247 (If you want a newline, see the
250 .It Fl L , -dereference
251 option causes symlinks to be followed, as the like-named option in
253 (on systems that support symbolic links).
254 This is the default if the environment variable
257 .It Fl m , -magic-file Ar magicfiles
258 Specify an alternate list of files and directories containing magic.
259 This can be a single item, or a colon-separated list.
260 If a compiled magic file is found alongside a file or directory, it will be used instead.
262 Don't pad filenames so that they align in the output.
263 .It Fl n , -no-buffer
264 Force stdout to be flushed after checking each file.
265 This is only useful if checking a list of files.
266 It is intended to be used by programs that want file type output from a pipe.
267 .It Fl p , -preserve-date
268 On systems that support
272 attempt to preserve the access time of files analyzed, to pretend that
276 Don't translate unprintable characters to \eooo.
279 translates unprintable characters to their octal representation.
280 .It Fl s , -special-files
283 only attempts to read and determine the type of argument files which
285 reports are ordinary files.
286 This prevents problems, because reading special files may have peculiar
292 to also read argument files which are block or character special files.
293 This is useful for determining the filesystem types of the data in raw
294 disk partitions, which are block special files.
295 This option also causes
297 to disregard the file size as reported by
299 since on some systems it reports a zero size for raw disk partitions.
301 Print the version of the program and exit.
302 .It Fl z , -uncompress
303 Try to look inside compressed files.
305 Output a null character
307 after the end of the filename. Nice to
309 the output. This does not affect the separator, which is still printed.
311 Print a help message and exit.
314 .Bl -tag -width /mingw/share/misc/magic.mgc -compact
315 .It Pa /mingw/share/misc/magic.mgc
316 Default compiled list of magic.
317 .It Pa /mingw/share/misc/magic
318 Directory containing default magic files.
321 The environment variable
323 can be used to set the default magic file name.
324 If that variable is set, then
326 will not attempt to open
331 to the value of this variable as appropriate.
332 The environment variable
334 controls (on systems that support symbolic links) whether
336 will attempt to follow symlinks or not. If set, then
338 follows symlinks; otherwise it does not. This is also controlled
350 .Sh STANDARDS CONFORMANCE
351 This program is believed to exceed the System V Interface Definition
352 of FILE(CMD), as near as one can determine from the vague language
354 Its behavior is mostly compatible with the System V program of the same name.
355 This version knows more magic, however, so it will produce
356 different (albeit more accurate) output in many cases.
357 .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
359 The one significant difference
360 between this version and System V
361 is that this version treats any white space
362 as a delimiter, so that spaces in pattern strings must be escaped.
364 .Bd -literal -offset indent
365 >10 string language impress\ (imPRESS data)
368 in an existing magic file would have to be changed to
369 .Bd -literal -offset indent
370 >10 string language\e impress (imPRESS data)
373 In addition, in this version, if a pattern string contains a backslash,
376 .Bd -literal -offset indent
377 0 string \ebegindata Andrew Toolkit document
380 in an existing magic file would have to be changed to
381 .Bd -literal -offset indent
382 0 string \e\ebegindata Andrew Toolkit document
385 SunOS releases 3.2 and later from Sun Microsystems include a
387 command derived from the System V one, but with some extensions.
388 My version differs from Sun's only in minor ways.
389 It includes the extension of the
393 .Bd -literal -offset indent
394 >16 long&0x7fffffff >0 not stripped
397 The magic file entries have been collected from various sources,
398 mainly USENET, and contributed by various authors.
399 Christos Zoulas (address below) will collect additional
400 or corrected magic file entries.
401 A consolidation of magic file entries
402 will be distributed periodically.
404 The order of entries in the magic file is significant.
405 Depending on what system you are using, the order in which
406 they are put together may be incorrect.
409 command uses a magic file,
410 keep the old magic file around for comparison purposes
412 .Pa /mingw/share/misc/magic.orig ).
414 .Bd -literal -offset indent
415 $ file file.c file /dev/{wd0a,hda}
416 file.c: C program text
417 file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
418 dynamically linked (uses shared libs), stripped
419 /dev/wd0a: block special (0/0)
420 /dev/hda: block special (3/0)
422 $ file -s /dev/wd0{b,d}
424 /dev/wd0d: x86 boot sector
426 $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
427 /dev/hda: x86 boot sector
428 /dev/hda1: Linux/i386 ext2 filesystem
429 /dev/hda2: x86 boot sector
430 /dev/hda3: x86 boot sector, extended partition table
431 /dev/hda4: Linux/i386 ext2 filesystem
432 /dev/hda5: Linux/i386 swap file
433 /dev/hda6: Linux/i386 swap file
434 /dev/hda7: Linux/i386 swap file
435 /dev/hda8: Linux/i386 swap file
439 $ file -i file.c file /dev/{wd0a,hda}
441 file: application/x-executable
442 /dev/hda: application/x-not-regular-file
443 /dev/wd0a: application/x-not-regular-file
450 .Dv UNIX since at least Research Version 4
451 (man page dated November, 1973).
452 The System V version introduced one significant change:
453 the external list of magic types.
454 This slowed the program down slightly but made it a lot more flexible.
456 This program, based on the System V version,
457 was written by Ian Darwin <ian@darwinsys.com>
458 without looking at anybody else's source code.
460 John Gilmore revised the code extensively, making it better than
462 Geoff Collyer found several inadequacies
463 and provided some magic file entries.
464 Contribution of the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
466 Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
468 Primary development and maintenance from 1990 to the present by
469 Christos Zoulas (christos@astron.com).
471 Altered by Chris Lowth, chris@lowth.com, 2000:
474 option to output MIME type strings, using an alternative
475 magic file and internal logic.
477 Altered by Eric Fischer (enf@pobox.com), July, 2000,
478 to identify character codes and attempt to identify the languages
481 Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME
482 support and merge MIME and non-MIME magic, support directories as well
483 as files of magic, apply many bug fixes and improve the build system.
485 The list of contributors to the
487 directory (magic files)
488 is too long to include here.
489 You know who you are; thank you.
490 Many contributors are listed in the source files.
492 Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
493 Covered by the standard Berkeley Software Distribution copyright; see the file
494 LEGAL.NOTICE in the source distribution.
500 were written by John Gilmore from his public-domain
502 program, and are not covered by the above license.
505 There must be a better way to automate the construction of the Magic
506 file from all the glop in Magdir.
510 uses several algorithms that favor speed over accuracy;
511 thus it can be misled about the contents of
515 The support for text files (primarily for programming languages)
516 is simplistic and inefficient, and requires recompilation to update.
518 The list of keywords in
520 probably belongs in the Magic file.
521 This could be done by using some keyword like
523 for the offset value.
525 Complain about conflicts in the magic file entries.
526 Make a rule that the magic entries sort based on file offset rather
527 than position within the magic file?
529 The program should provide a way to give an estimate
533 We end up removing guesses (e.g.
535 as first 5 chars of file) because
536 they are not as good as other guesses (e.g.
541 Still, if the others don't pan out, it should be possible to use the
544 This manual page, and particularly this section, is too long.
547 returns 0 on success, and non-zero on error.
549 You can obtain the original author's latest version by anonymous FTP
553 .Dv /pub/file/file-X.YZ.tar.gz