xz, unxz, xzcat, lzma, unlzma, lzcat - Compress or decompress .xz and .lzma files
xz [option]... [file]...
unxz is equivalent to xz --decompress.
xzcat is equivalent to xz --decompress --stdout.
lzma is equivalent to xz --format=lzma.
unlzma is equivalent to xz --format=lzma --decompress.
lzcat is equivalent to xz --format=lzma --decompress --stdout.
When writing scripts that need to decompress files, it is recommended to always use the name xz with appropriate arguments (xz -d or xz -dc) instead of the names unxz and xzcat.
xz is a general-purpose data compression tool with command line syntax similar to gzip(1) and bzip2(1). The native file format is the .xz format, but the legacy .lzma format used by LZMA Utils and raw compressed streams with no container format headers are also supported.
xz compresses or decompresses each file according to the selected operation mode. If no files are given or file is -, xz reads from standard input and writes the processed data to standard output. xz refuses to write compressed data to standard output if it is a terminal; it displays an error and skips the file. Similarly, xz refuses to read compressed data from standard input if it is a terminal.
Unless --stdout is specified, files other than - are written to a new file whose name is derived from the source file name:
o When compressing, the suffix of the target file format (.xz or .lzma) is appended to the source filename to get the target filename.
o When decompressing, the .xz or .lzma suffix is removed from the filename to get the target filename. xz also recognizes the suffixes .txz and .tlz, and replaces them with the .tar suffix.
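As a rough illustration, the naming rules above fit in a few lines of Python. The helper target_name and its mode/fmt parameters are hypothetical (not part of xz), and --suffix handling is left out of this sketch.

```python
from typing import Optional


def target_name(src: str, mode: str, fmt: str = "xz") -> Optional[str]:
    """Hypothetical helper: derive the output filename as described above."""
    if mode == "compress":
        # Append the suffix of the target file format.
        return src + (".xz" if fmt == "xz" else ".lzma")
    if mode == "decompress":
        # Check .txz/.tlz before .xz, because "x.txz" also ends with ".xz".
        rules = ((".txz", ".tar"), (".tlz", ".tar"), (".xz", ""), (".lzma", ""))
        for suffix, replacement in rules:
            # Assumption: a bare ".xz" (empty base name) is not treated as a match.
            if src.endswith(suffix) and len(src) > len(suffix):
                return src[: -len(suffix)] + replacement
        return None  # no known suffix: xz would warn and skip the file
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, target_name("src.txz", "decompress") yields "src.tar", while a file without a recognized suffix yields None.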
If the target file already exists, an error is displayed and the file is skipped.
Unless writing to standard output, xz will display a warning and skip the file if any of the following applies:
o File is not a regular file. Symbolic links are not followed, and thus they are not considered to be regular files.
o File has more than one hard link.
o File has setuid, setgid, or sticky bit set.
o The operation mode is set to compress and the file already has a suffix of the target file format (.xz or .txz when compressing to the .xz format, and .lzma or .tlz when compressing to the .lzma format).
o The operation mode is set to decompress and the file doesn't have a suffix of any of the supported file formats (.xz, .txz, .lzma, or .tlz).
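Each condition in this list maps onto a field returned by os.lstat(). A minimal Python sketch follows; skip_reason is a hypothetical name, and the returned strings are made up for illustration (xz's real messages differ).

```python
import os
import stat
from typing import Optional

XZ_SUFFIXES = (".xz", ".txz")
LZMA_SUFFIXES = (".lzma", ".tlz")


def skip_reason(path: str, mode: str, fmt: str = "xz") -> Optional[str]:
    """Hypothetical helper: why a file would be skipped, or None to process it."""
    st = os.lstat(path)  # lstat() does not follow symbolic links
    if not stat.S_ISREG(st.st_mode):
        return "not a regular file"
    if st.st_nlink > 1:
        return "more than one hard link"
    if st.st_mode & (stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX):
        return "setuid, setgid, or sticky bit set"
    if mode == "compress":
        own = XZ_SUFFIXES if fmt == "xz" else LZMA_SUFFIXES
        if path.endswith(own):
            return "already has the target suffix"
    elif mode == "decompress" and not path.endswith(XZ_SUFFIXES + LZMA_SUFFIXES):
        return "no supported suffix"
    return None
```

With --force, xz ignores the first three conditions, so a fuller sketch would take a force flag and skip those checks.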
After successfully compressing or decompressing the file, xz copies the owner, group, permissions, access time, and modification time from the source file to the target file. If copying the group fails, the permissions are modified so that the target file doesn't become accessible to users who didn't have permission to access the source file. xz doesn't support copying other metadata like access control lists or extended attributes yet.
Once the target file has been successfully closed, the source file is removed unless --keep was specified. The source file is never removed if the output is written to standard output.
Sending SIGINFO or SIGUSR1 to the xz process makes it print progress information to standard error. This is of limited use, because when standard error is a terminal, --verbose already displays an automatically updating progress indicator.
Memory usage
The memory usage of xz varies from a few hundred kilobytes to several gigabytes depending on the compression settings. The settings used when compressing a file determine the memory requirements of the decompressor. Typically the decompressor needs 5% to 20% of the amount of memory that the compressor needed when creating the file. For example, decompressing a file created with xz -9 currently requires 65 MiB of memory. Still, it is possible to have .xz files that require several gigabytes of memory to decompress.
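Python's lzma module (a binding to liblzma) exposes the same decoder memory limit, which makes the dependency on the compression settings easy to observe. This sketch assumes the roughly 2 MiB decoder requirement that the preset table in this page lists for preset 1; try_decompress is a hypothetical helper.

```python
import lzma


def try_decompress(blob: bytes, memlimit: int) -> bool:
    """Hypothetical helper: True if blob decompresses within memlimit bytes."""
    try:
        lzma.decompress(blob, memlimit=memlimit)
    except lzma.LZMAError:
        # Raised when the decoder would need more memory than memlimit
        # (and also for corrupt input).
        return False
    return True


data = b"example payload " * 4096
# Preset 1 records a 1 MiB dictionary in the headers; this streaming encoder
# does not know the input size in advance, so it cannot shrink that value.
packed = lzma.compress(data, preset=1)

too_small = try_decompress(packed, 512 * 1024)   # below the decoder's needs
enough = try_decompress(packed, 8 * 1024 * 1024)
```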
Users of older systems in particular may find the possibility of very large memory usage annoying. To prevent uncomfortable surprises, xz has a built-in memory usage limiter, which is disabled by default. While some operating systems provide ways to limit the memory usage of processes, relying on them wasn't deemed flexible enough (e.g. using ulimit(1) to limit virtual memory tends to cripple mmap(2)).
The memory usage limiter can be enabled with the command line option --memlimit=limit. Often it is more convenient to enable the limiter by default by setting the environment variable XZ_DEFAULTS, e.g. XZ_DEFAULTS=--memlimit=150MiB. It is possible to set the limits separately for compression and decompression by using --memlimit-compress=limit and --memlimit-decompress=limit. Using these two options outside XZ_DEFAULTS is rarely useful because a single run of xz cannot do both compression and decompression and --memlimit=limit (or -M limit) is shorter to type on the command line.
If the specified memory usage limit is exceeded when decompressing, xz will display an error and decompressing the file will fail. If the limit is exceeded when compressing, xz will try to scale the settings down so that the limit is no longer exceeded (except when using --format=raw or --no-adjust). This way the operation won't fail unless the limit is very small. The scaling of the settings is done in steps that don't match the compression level presets, e.g. if the limit is only slightly less than the amount required for xz -9, the settings will be scaled down only a little, not all the way down to xz -8.
Concatenation and padding with .xz files
It is possible to concatenate .xz files as is. xz will decompress such files as if they were a single .xz file.
It is possible to insert padding between the concatenated parts or after the last part. The padding must consist of null bytes and the size of the padding must be a multiple of four bytes. This can be useful e.g. if the .xz file is stored on a medium that measures file sizes in blocks.
Concatenation and padding are not allowed with .lzma files or raw streams.
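Python's lzma.decompress() already decodes back-to-back streams, but the padding rule needs a little extra work. A minimal sketch, assuming only .xz streams as input; decompress_concatenated is a hypothetical name, and raising ValueError on bad padding is an arbitrary choice.

```python
import lzma


def decompress_concatenated(data: bytes) -> bytes:
    """Decode concatenated .xz streams, skipping null-byte stream padding."""
    out = []
    pos = 0
    while pos < len(data):
        if data[pos] == 0:
            # Stream padding: a run of null bytes whose length is a multiple of 4.
            end = pos
            while end < len(data) and data[end] == 0:
                end += 1
            if (end - pos) % 4 != 0:
                raise ValueError("padding size is not a multiple of four bytes")
            pos = end
            continue
        dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
        out.append(dec.decompress(data[pos:]))
        if not dec.eof:
            raise ValueError("truncated .xz stream")
        # Everything after this stream's footer is left in unused_data.
        pos = len(data) - len(dec.unused_data)
    return b"".join(out)
```

A null byte can never start a stream (every .xz stream begins with the byte 0xFD), which is why the loop can treat any leading zero as padding.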
Integer suffixes and special values
In most places where an integer argument is expected, an optional suffix is supported to easily indicate large integers. There must be no space between the integer and the suffix.
KiB
Multiply the integer by 1,024 (2^10). Ki, k, kB, K, and KB are accepted as synonyms for KiB.
MiB
Multiply the integer by 1,048,576 (2^20). Mi, m, M, and MB are accepted as synonyms for MiB.
GiB
Multiply the integer by 1,073,741,824 (2^30). Gi, g, G, and GB are accepted as synonyms for GiB.
The special value max can be used to indicate the maximum integer value supported by the option.
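The suffix rules translate directly into a lookup table. This is a sketch, not xz's parser: parse_size is a hypothetical helper, and passing the option's maximum in as max_value is an assumption of the sketch.

```python
_SUFFIXES = {
    1 << 10: ("KiB", "Ki", "k", "kB", "K", "KB"),
    1 << 20: ("MiB", "Mi", "m", "M", "MB"),
    1 << 30: ("GiB", "Gi", "g", "G", "GB"),
}
# Flatten into {"KiB": 1024, "Ki": 1024, ...}.
MULTIPLIERS = {name: mult for mult, names in _SUFFIXES.items() for name in names}


def parse_size(text: str, max_value: int) -> int:
    """Hypothetical helper: parse values such as "150MiB", "4k", or "max"."""
    if text == "max":
        return max_value
    i = 0
    while i < len(text) and text[i].isdigit():
        i += 1
    if i == 0:
        raise ValueError(f"not an integer: {text!r}")
    number, suffix = int(text[:i]), text[i:]
    # A space before the suffix ends up inside `suffix` and is rejected here.
    if suffix and suffix not in MULTIPLIERS:
        raise ValueError(f"unknown suffix: {suffix!r}")
    value = number * MULTIPLIERS.get(suffix, 1)
    if value > max_value:
        raise ValueError("value out of range")
    return value
```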
If multiple operation mode options are given, the last one takes effect.
Compress. This is the default operation mode when no operation mode option is specified and no other operation mode is implied from the command name (for example, unxz implies --decompress).
-d, --decompress, --uncompress
Decompress.
Test the integrity of compressed files. This option is equivalent to --decompress --stdout except that the decompressed data is discarded instead of being written to standard output. No files are created or removed.
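What integrity testing amounts to can be sketched with Python's lzma module: decode the stream and throw the output away. verify_xz is a hypothetical name. Unlike xz, this sketch stops after the first stream and does not cap how much output a single call may produce.

```python
import io
import lzma


def verify_xz(f, chunk_size: int = 64 * 1024) -> bool:
    """Decode one .xz stream from binary file f, discarding the output."""
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    try:
        while not dec.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # input ended before the end of the stream
            dec.decompress(chunk)  # output is thrown away; nothing is written
    except lzma.LZMAError:
        return False  # corrupt data, or a failed integrity check
    return True


# Usage with an in-memory file object:
ok = verify_xz(io.BytesIO(lzma.compress(b"hello world" * 1000)))
```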
Print information about compressed files. No uncompressed output is produced, and no files are created or removed. In list mode, the program cannot read the compressed data from standard input or from other unseekable sources.
The default listing shows basic information about files, one file per line. To get more detailed information, also use the --verbose option. For even more information, use --verbose twice, but note that this may be slow, because getting all the extra information requires many seeks. The width of verbose output exceeds 80 characters, so piping the output to e.g. less -S may be convenient if the terminal isn't wide enough.
The exact output may vary between xz versions and different locales. For machine-readable output, --robot --list should be used.
Don't delete the input files.
This option has several effects:
o If the target file already exists, delete it before compressing or decompressing.
o Compress or decompress even if the input is a symbolic link to a regular file, has more than one hard link, or has the setuid, setgid, or sticky bit set. The setuid, setgid, and sticky bits are not copied to the target file.
o When used with --decompress --stdout and xz cannot recognize the type of the source file, copy the source file as is to standard output. This allows xzcat --force to be used like cat(1) for files that have not been compressed with xz. Note that in the future, xz might support new compressed file formats, which may make xz decompress more types of files instead of copying them as is to standard output. --format=format can be used to restrict xz to decompress only a single file format.
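The cat(1)-like fallback can be sketched as a magic-byte check. This is a simplification under a stated assumption: it only recognizes .xz (whose streams start with the six bytes FD 37 7A 58 5A 00), because .lzma files have no magic bytes; cat_or_decompress is a hypothetical name.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # first six bytes of every .xz stream


def cat_or_decompress(data: bytes) -> bytes:
    """Decompress .xz input; pass anything unrecognized through unchanged."""
    if data.startswith(XZ_MAGIC):
        return lzma.decompress(data, format=lzma.FORMAT_XZ)
    return data  # similar in spirit to xzcat --force on an uncompressed file
```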
-c, --stdout, --to-stdout
Write the compressed or decompressed data to standard output instead of a file. This implies --keep.
Disable creation of sparse files. By default, if decompressing into a regular file, xz tries to make the file sparse if the decompressed data contains long sequences of binary zeros. It also works when writing to standard output as long as standard output is connected to a regular file and certain additional conditions are met to make it safe. Creating sparse files may save disk space and speed up the decompression by reducing the amount of disk I/O.
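One common way to produce a sparse file is to seek over all-zero blocks instead of writing them. The Python sketch below is not xz's implementation: write_sparse and its 4 KiB block size are assumptions, it omits the additional safety conditions mentioned above, and whether holes are actually created depends on the filesystem.

```python
import os


def write_sparse(path: str, data: bytes, block: int = 4096) -> None:
    """Write data to path, seeking over all-zero blocks instead of writing them."""
    zero = bytes(block)
    with open(path, "wb") as f:
        for start in range(0, len(data), block):
            chunk = data[start:start + block]
            if chunk == zero[:len(chunk)]:
                f.seek(len(chunk), os.SEEK_CUR)  # leave a hole
            else:
                f.write(chunk)
        # Seeking past the end does not extend the file, so set the final size.
        f.truncate(len(data))
```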
-S .suf, --suffix=.suf
When compressing, use .suf as the suffix for the target file instead of .xz or .lzma. If not writing to standard output and the source file already has the suffix .suf, a warning is displayed and the file is skipped.
When decompressing, recognize files with the suffix .suf in addition to files with the .xz, .txz, .lzma, or .tlz suffix. If the source file has the suffix .suf, the suffix is removed to get the target filename.
When compressing or decompressing raw streams (--format=raw), the suffix must always be specified unless writing to standard output, because there is no default suffix for raw streams.
Read the filenames to process from file; if file is omitted, filenames are read from standard input. Filenames must be terminated with the newline character. A dash (-) is taken as a regular filename; it doesn't mean standard input. If filenames are also given as command line arguments, they are processed before the filenames read from file.
This is identical to --files[=file] except that each filename must be terminated with the null character.
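Splitting such a list is straightforward once the terminator is known. In this sketch, parse_file_list is a hypothetical helper; it works on bytes because filenames may contain arbitrary bytes (null termination exists precisely so that names can contain newlines), and rejecting an unterminated last name is an assumption rather than documented xz behavior.

```python
from typing import List


def parse_file_list(raw: bytes, null_terminated: bool = False) -> List[bytes]:
    """Hypothetical helper: split --files / --files0 style input into filenames."""
    sep = b"\0" if null_terminated else b"\n"
    if not raw:
        return []
    if not raw.endswith(sep):
        raise ValueError("last filename is not terminated")
    # "-" stays an ordinary filename here; it does not mean standard input.
    return raw[:-1].split(sep)
```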
Basic file format and compression options
-F format, --format=format
Specify the file format to compress or decompress:
auto
This is the default. When compressing, auto is equivalent to xz. When decompressing, the format of the input file is automatically detected. Note that raw streams (created with --format=raw) cannot be auto-detected.
xz
Compress to the .xz file format, or accept only .xz files when decompressing.
lzma, alone
Compress to the legacy .lzma file format, or accept only .lzma files when decompressing. The alternative name alone is provided for backwards compatibility with LZMA Utils.
raw
Compress or uncompress a raw stream (no headers). This is meant for advanced users only. To decode raw streams, you need to use --format=raw and explicitly specify the filter chain, which normally would have been stored in the container headers.
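The three container choices have direct counterparts in Python's lzma module (FORMAT_XZ, FORMAT_ALONE, FORMAT_RAW). The sketch below only shows the correspondence; the payload and filter chain are arbitrary examples.

```python
import lzma

payload = b"format demo " * 100

xz_data = lzma.compress(payload)                              # like --format=xz
lzma_data = lzma.compress(payload, format=lzma.FORMAT_ALONE)  # like --format=lzma

# A raw stream has no headers, so the decoder must be given the same filter
# chain that the encoder used (like --format=raw with an explicit filter chain).
chain = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
raw_data = lzma.compress(payload, format=lzma.FORMAT_RAW, filters=chain)

# FORMAT_AUTO (the default) recognizes .xz and .lzma input, but not raw streams.
from_xz = lzma.decompress(xz_data)
from_lzma = lzma.decompress(lzma_data)
from_raw = lzma.decompress(raw_data, format=lzma.FORMAT_RAW, filters=chain)
```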
-C check, --check=check
Specify the type of the integrity check. The check is calculated from the uncompressed data and stored in the .xz file. This option has an effect only when compressing into the .xz format; the .lzma format doesn't support integrity checks. The integrity check (if any) is verified when the .xz file is decompressed.
Supported check types:
none
Don't calculate an integrity check at all. This is usually a bad idea. This can be useful when integrity of the data is verified by other means anyway.
crc32
Calculate CRC32 using the polynomial from IEEE-802.3.
crc64
Calculate CRC64 using the polynomial from ECMA-182. This is the default, since it is slightly better than CRC32 at detecting damaged files and the speed difference is negligible.
sha256
Calculate SHA-256. This is somewhat slower than CRC32 and CRC64.
Integrity of the .xz headers is always verified with CRC32. It is not possible to change or disable it.
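The check type is recorded in the stream header, so a decoder can report which one a file uses. A Python sketch, assuming the lzma module's CHECK_* constants as stand-ins for the check names above; CHECKS, compress_with_check, and check_used are hypothetical names.

```python
import lzma

# Rough equivalents of --check=none, crc32, crc64, and sha256.
CHECKS = {
    "none": lzma.CHECK_NONE,
    "crc32": lzma.CHECK_CRC32,
    "crc64": lzma.CHECK_CRC64,
    "sha256": lzma.CHECK_SHA256,
}


def compress_with_check(data: bytes, name: str) -> bytes:
    check = CHECKS[name]
    if not lzma.is_check_supported(check):
        raise ValueError(f"{name} is not supported by this liblzma build")
    return lzma.compress(data, format=lzma.FORMAT_XZ, check=check)


def check_used(blob: bytes) -> int:
    """Return the ID of the integrity check stored in an .xz stream."""
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    dec.decompress(blob)  # the check ID is known once the header is parsed
    return dec.check
```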
Select a compression preset level. The default is -6. If multiple preset levels are specified, the last one takes effect. If a custom filter chain was already specified, setting a compression preset level clears the custom filter chain.
The differences between the presets are more significant than with gzip(1) and bzip2(1). The selected compression settings determine the memory requirements of the decompressor, thus using too high a preset level might make it painful to decompress the file on an old system with little RAM. Specifically, it's not a good idea to blindly use -9 for everything as is often done with gzip(1) and bzip2(1).
These are somewhat fast presets. -0 is sometimes faster than gzip -9 while compressing much better. The higher ones often have speed comparable to bzip2(1) with comparable or better compression ratio, although the results depend a lot on the type of data being compressed.
Good to very good compression while keeping decompressor memory usage reasonable even for old systems. -6 is the default, which is usually a good choice e.g. for distributing files that need to be decompressible even on systems with only 16 MiB RAM. (-5e or -6e may be worth considering too. See --extreme.)
These are like -6 but with higher compressor and decompressor memory requirements. These are useful only when compressing files bigger than 8 MiB, 16 MiB, and 32 MiB, respectively.
On the same hardware, the decompression speed is approximately a constant number of bytes of compressed data per second. In other words, the better the compression, the faster the decompression will usually be. This also means that the amount of uncompressed output produced per second can vary a lot.
The following table summarises the features of the presets:
Preset  DictSize  CompCPU  CompMem  DecMem
-0      256 KiB   0        3 MiB    1 MiB
-1      1 MiB     1        9 MiB    2 MiB
-2      2 MiB     2        17 MiB   3 MiB
-3      4 MiB     3        32 MiB   5 MiB
-4      4 MiB     4        48 MiB   5 MiB
-5      8 MiB     5        94 MiB   9 MiB
-6      8 MiB     6        94 MiB   9 MiB
-7      16 MiB    6        186 MiB  17 MiB
-8      32 MiB    6        370 MiB  33 MiB
-9      64 MiB    6        674 MiB  65 MiB
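Since the decompressor's needs are fixed at compression time, the DecMem column can drive the choice of preset. The sketch below copies the table into a dict and picks the highest preset that fits a decoder memory budget; highest_preset_for is a hypothetical helper, and it ignores the advice that -7 ... -9 only pay off for larger files.

```python
from typing import Optional

# Preset -> (DictSize, CompMem, DecMem) in MiB, copied from the table above.
PRESETS = {
    0: (0.25, 3, 1), 1: (1, 9, 2), 2: (2, 17, 3), 3: (4, 32, 5),
    4: (4, 48, 5), 5: (8, 94, 9), 6: (8, 94, 9), 7: (16, 186, 17),
    8: (32, 370, 33), 9: (64, 674, 65),
}


def highest_preset_for(decoder_budget_mib: float) -> Optional[int]:
    """Highest preset whose decompressor fits in the given budget (MiB)."""
    fitting = [p for p, (_, _, dec) in PRESETS.items() if dec <= decoder_budget_mib]
    return max(fitting) if fitting else None
```

For a 16 MiB budget this returns 6, consistent with the note that -6 output stays decompressible on systems with only 16 MiB RAM.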
o DictSize is the LZMA2 dictionary size. It is a waste of memory to use a dictionary bigger than the size of the uncompressed file. This is why it is good to avoid using the presets -7 ... -9 when there's no real need for them. At -6 and lower, the amount of memory wasted is usually low enough to not matter.
o CompCPU is a simplified representation of the LZMA2 settings that affect compression speed. The dictionary size affects speed too, so while CompCPU is the same for levels -6 ... -9, higher levels still tend to be a little slower. To get even slower and thus possibly better compression, see --extreme.
o CompMem contains the compressor memory requirements in the single-threaded mode. It may vary slightly between xz versions. Memory requirements of some of the future multithreaded modes may be dramatically higher than that of the single-threaded mode.
o DecMem contains the decompressor memory requirements. That is, the compression settings determine the memory requirements of the decompressor. The exact decompressor memory usage is slightly more than the LZMA2 dictionary size, but the values in the table have been rounded up to the next full MiB.
Use a slower variant of the selected compression preset level (-0 ... -9) to hopefully get a little bit better compression ratio, but with bad luck this can also make it worse. Decompressor memory usage is not affected, but compressor memory usage increases a little at preset levels -0 ... -3.
Since there are two presets with dictionary sizes 4 MiB and 8 MiB, the presets -3e and -5e use slightly faster settings (lower CompCPU) than -4e and -6e, respectively. That way no two presets are identical.
Preset  DictSize  CompCPU  CompMem  DecMem
-0e     256 KiB   8        4 MiB    1 MiB
-1e     1 MiB     8        13 MiB   2 MiB
-2e     2 MiB     8        25 MiB   3 MiB
-3e     4 MiB     7        48 MiB   5 MiB
-4e     4 MiB     8        48 MiB   5 MiB
-5e     8 MiB     7        94 MiB   9 MiB
-6e     8 MiB     8        94 MiB   9 MiB
-7e     16 MiB    8        186 MiB  17 MiB
-8e     32 MiB    8        370 MiB  33 MiB
-9e     64 MiB    8        674 MiB  65 MiB
For example, there are a total of four presets that use an 8 MiB dictionary, whose order from the fastest to the slowest is -5, -6, -5e, and -6e.
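In Python's lzma module the e modifier corresponds to OR-ing PRESET_EXTREME into the preset number. A short sketch; the sample data is arbitrary, and the extreme result is not guaranteed to be smaller.

```python
import lzma

data = bytes(range(256)) * 64 + b"some text " * 500

normal = lzma.compress(data, preset=6)                          # like xz -6
extreme = lzma.compress(data, preset=6 | lzma.PRESET_EXTREME)  # like xz -6e

# Both decode to the same bytes; only the encoder's effort differs.
```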
--best
These are somewhat misleading aliases for -0 and -9, respectively. These are provided only for backwards compatibility with LZMA Utils. Avoid using these options.
--memlimit-compress=limit
Set a memory usage limit for compression. If this option is specified multiple times, the last one takes effect.
If the compression settings exceed the limit, xz will adjust the settings downwards so that the limit is no longer exceeded and display a notice that automatic adjustment was done. Such adjustments are not made when compressing with --format=raw or if --no-adjust has been specified. In those cases, an error is displayed and xz will exit with exit status 1.
The limit can be specified in multiple ways:
o The limit can be an absolute value in bytes. Using an integer suffix like MiB can be useful. Example: --memlimit-com-
o The limit can be specified as a percentage of total physical memory (RAM). This can be useful especially when setting the XZ_DEFAULTS environment variable in a shell initialization script that is shared between different computers. That way the limit is automatically bigger on systems with more memory. Example: --memlimit-compress=70%
o The limit can be reset back to its default value by setting it to 0. This is currently equivalent to setting the limit to max (no memory usage limit). Once multithreading support has been implemented, there may be a difference between 0 and max for the multithreaded case, so it is recommended to use 0 instead of max until the details have been decided.
See also the section Memory usage.
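Resolving a limit given in these forms to a byte count can be sketched as follows. resolve_memlimit is a hypothetical helper. It assumes the caller supplies the total RAM (on Linux, e.g. from os.sysconf), treats absolute values as plain byte counts without suffix parsing, and rejects percentages outside 1-100, which is an assumption of this sketch.

```python
from typing import Optional


def resolve_memlimit(text: str, total_ram: int) -> Optional[int]:
    """Turn a --memlimit-compress style value into bytes; None means no limit."""
    if text in ("0", "max"):
        return None  # 0 resets to the default, currently the same as max
    if text.endswith("%"):
        percent = int(text[:-1])
        if not 1 <= percent <= 100:
            raise ValueError("percentage must be 1-100")
        return total_ram * percent // 100
    return int(text)  # absolute value, already in bytes
```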
--memlimit-decompress=limit
Set a memory usage limit for decompression. This also affects the --list mode. If the operation is not possible without exceeding the limit, xz will display an error and decompressing the file will fail. See --memlimit-compress=limit for possible ways to specify the limit.
-M limit, --memlimit=limit, --memory=limit
This is equivalent to specifying --memlimit-compress=limit --memlimit-decompress=limit.
Display an error and exit if the compression settings exceed the memory usage limit. The default is to adjust the settings downwards so that the memory usage limit is not exceeded. Automatic adjusting is always disabled when creating raw streams (--format=raw).
-T threads, --threads=threads
Specify the number of worker threads to use. The actual number of threads can be less than threads if using more threads would exceed the memory usage limit.
Multithreaded compression and decompression are not implemented yet, so this option has no effect for now.
As of writing (2010-09-27), it hasn't been decided if threads will be used by default on multicore systems once support for threading has been implemented. Comments are welcome. The complicating factor is that using many threads will increase the memory usage dramatically. Note that if multithreading becomes the default, it will probably be done so that single-threaded and multithreaded modes produce the same output, so compression ratio won't be significantly affected if threading is enabled by default.
Custom compressor filter chains
A custom filter chain allows specifying the compression settings in detail instead of relying on the settings associated with the preset levels. When a custom filter chain is specified, the compression preset level options (-0 ... -9 and --extreme) are silently ignored.
A filter chain is comparable to piping on the command line. When compressing, the uncompressed input goes to the first filter, whose output goes to the next filter (if any). The output of the last filter gets written to the compressed file. The maximum number of filters in the chain is four, but typically a filter chain has only one or two filters.
Many filters have limitations on where they can be in the filter chain: some filters can work only as the last filter in the chain, some only as a non-last filter, and some work in any position in the chain. Depending on the filter, this limitation is either inherent to the filter design or exists to prevent security issues.
A custom filter chain is specified by using one or more filter options in the order they are wanted in the filter chain. That is, the order of filter options is significant! When decoding raw streams (--format=raw), the filter chain is specified in the same order as it was specified when compressing.
Filters take filter-specific options as a comma-separated list. Extra commas in options are ignored. Every option has a default value, so you need to specify only those you want to change.
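Python's lzma module takes a filter chain as an ordered list of dicts, which makes the "order matters" rule concrete. In this sketch the sample data and option values are arbitrary, and chain_is_valid is a hypothetical helper that simply lets liblzma reject an invalid chain.

```python
import lzma

# Delta first, LZMA2 last: like a pipeline, data flows through the list in order.
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6, "pb": 2},
]

samples = bytes((i * 3) & 0xFF for i in range(4096))
packed = lzma.compress(samples, format=lzma.FORMAT_RAW, filters=chain)
# A raw stream stores no filter information, so the identical chain is needed.
unpacked = lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=chain)


def chain_is_valid(filters) -> bool:
    """Ask liblzma whether a filter chain is acceptable for encoding."""
    try:
        lzma.LZMACompressor(format=lzma.FORMAT_RAW, filters=filters)
    except (lzma.LZMAError, ValueError):
        return False
    return True
```

Reversing the list puts LZMA2 in a non-last position, which liblzma refuses.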
Add an LZMA1 or LZMA2 filter to the filter chain. These filters can be used only as the last filter in the chain.
LZMA1 is a legacy filter, which is supported almost solely due to the legacy .lzma file format, which supports only LZMA1. LZMA2 is an updated version of LZMA1 that fixes some practical issues of LZMA1. The .xz format uses LZMA2 and doesn't support LZMA1 at all. Compression speed and ratios of LZMA1 and LZMA2 are practically the same.
LZMA1 and LZMA2 share the same set of options:
Reset all LZMA1 or LZMA2 options to preset. Preset consists of an integer, which may be followed by single-letter preset modifiers. The integer can be from 0 to 9, matching the command line options -0 ... -9. The only supported modifier is currently e, which matches --extreme. The default preset is 6, from which the default values for the rest of the LZMA1 or LZMA2 options are taken.
Dictionary (history buffer) size indicates how many bytes of the recently processed uncompressed data are kept in memory. The algorithm tries to find repeating byte sequences (matches) in the uncompressed data, and replace them with references to the data currently in the dictionary. The bigger the dictionary, the higher the chance to find a match. Thus, increasing dictionary size usually improves compression ratio, but a dictionary bigger than the uncompressed file is a waste of memory.
Typical dictionary size is from 64 KiB to 64 MiB. The minimum is 4 KiB. The maximum for compression is currently 1.5 GiB (1536 MiB). The decompressor already supports dictionaries up to one byte less than 4 GiB, which is the maximum for the LZMA1 and LZMA2 stream formats.
Dictionary size and match finder (mf) together determine the memory usage of the LZMA1 or LZMA2 encoder. Decompressing requires the same (or a bigger) dictionary size as was used when compressing, thus the memory usage of the decoder is determined by the dictionary size used when compressing. The .xz headers store the dictionary size either as 2^n or 2^n + 2^(n-1), so these sizes are somewhat preferred for compression. Other sizes will get rounded up when stored in the .xz headers.
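The rounding rule for the stored dictionary size can be worked through directly. This sketch implements the rule as stated above (round up to the next 2^n or 2^n + 2^(n-1), with a 4 KiB minimum and a maximum of one byte less than 4 GiB); it is not liblzma's actual bit-twiddling code, and stored_dict_size is a hypothetical name.

```python
DICT_MIN = 4 * 1024   # documented minimum dictionary size
DICT_MAX = 2**32 - 1  # one byte less than 4 GiB


def stored_dict_size(size: int) -> int:
    """Round size up to the next value of the form 2^n or 2^n + 2^(n-1)."""
    if size > DICT_MAX:
        raise ValueError("dictionary size too large")
    size = max(size, DICT_MIN)
    n = 12  # 2^12 = 4 KiB
    while True:
        # Candidates in increasing order: 2^n, then 2^n + 2^(n-1).
        for candidate in (1 << n, (1 << n) + (1 << (n - 1))):
            if candidate >= size:
                return min(candidate, DICT_MAX)  # 2^32 itself is not storable
        n += 1
```

For example, a 10 MiB dictionary is recorded as 12 MiB (2^23 + 2^22), while the 1.5 GiB compression maximum is already of the form 2^30 + 2^29 and is stored exactly.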
lc=lc
Specify the number of literal context bits. The minimum is 0 and the maximum is 4; the default is 3. In addition, the sum of lc and lp must not exceed 4.
All bytes that cannot be encoded as matches are encoded as literals. That is, literals are simply 8-bit bytes that are encoded one at a time.
The literal coding makes an assumption that the highest lc bits of the previous uncompressed byte correlate with the next byte. E.g. in typical English text, an upper-case letter is often followed by a lower-case letter, and a lower-case letter is usually followed by another lower-case letter. In the US-ASCII character set, the highest three bits are 010 for upper-case letters and 011 for lower-case letters. When lc is at least 3, the literal coding can take advantage of this property in the uncompressed data.
The default value (3) is usually good. If you want maximum compression, test lc=4. Sometimes it helps a little, and sometimes it makes compression worse. If it makes it worse, test e.g. lc=2 too.
lp=lp
Specify the number of literal position bits. The minimum is 0 and the maximum is 4; the default is 0.
Lp affects what kind of alignment in the uncompressed data is assumed when encoding literals. See pb below for more information about alignment.
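The lc and lp bits together select which set of probabilities the literal coder uses for the next byte. The sketch below shows only that context selection (the highest lc bits of the previous byte combined with the lowest lp bits of the position), not the full coder; literal_context is a hypothetical name.

```python
def literal_context(prev_byte: int, pos: int, lc: int = 3, lp: int = 0) -> int:
    """Context index for the next literal, from lc and lp as described above."""
    if not (0 <= lc <= 4 and 0 <= lp <= 4 and lc + lp <= 4):
        raise ValueError("need 0 <= lc, lp <= 4 and lc + lp <= 4")
    pos_part = pos & ((1 << lp) - 1)   # lowest lp bits of the position
    byte_part = prev_byte >> (8 - lc)  # highest lc bits of the previous byte
    return (pos_part << lc) + byte_part
```

With the default lc=3, every upper-case ASCII letter maps to context 0b010 and every lower-case letter to 0b011, which is the property described for English text.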
pb=pb
Specify the number of position bits. The minimum is 0 and the maximum is 4; the default is 2.
Pb affects what kind of alignment in the uncompressed data is assumed in general. The default means four-byte alignment (2^pb=2^2=4), which is often a good choice when there's no better guess.
When the alignment is known, setting pb accordingly may reduce the file size a little. E.g. with text files having one-byte alignment (US-ASCII, ISO-8859-*, UTF-8), setting pb=0 can improve compression slightly. For UTF-16 text, pb=1 is a good choice. If the alignment is an odd number like 3 bytes, pb=0 might be the best choice.
Even though the assumed alignment can be adjusted with pb and lp, LZMA1 and LZMA2 still slightly favor 16-byte alignment. It might be worth taking into account when designing file formats that are likely to be often compressed with LZMA1 or LZMA2.
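The guidance above amounts to "pb = log2(alignment), or 0 when the alignment is not a power of two". A heuristic sketch follows; pb_for_alignment is hypothetical, and returning 0 for even non-power-of-two sizes such as 6 is an assumption not covered by the text.

```python
import lzma


def pb_for_alignment(alignment: int) -> int:
    """Suggest pb for data with the given alignment in bytes (heuristic)."""
    if alignment >= 1 and alignment & (alignment - 1) == 0:  # power of two
        return min(alignment.bit_length() - 1, 4)  # pb is limited to 0-4
    return 0  # e.g. 3-byte records: assume no useful alignment


# UTF-16 text has two-byte alignment, so pb=1.
utf16 = ("UTF-16 text, repeated. " * 200).encode("utf-16-le")
filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "pb": pb_for_alignment(2)}]
packed = lzma.compress(utf16, format=lzma.FORMAT_XZ, filters=filters)
```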
mf=mf
Match finder has a major effect on encoder speed, memory usage, and compression ratio. Usually Hash Chain match finders are faster than Binary Tree match finders. The default depends on the preset: 0 uses hc3, 1-3 use hc4, and the rest use bt4.
The following match finders are supported. The memory usage formulas below are rough approximations, which are closest to the reality when dict is a power of two.
hc3
Hash Chain with 2- and 3-byte hashing
Minimum value for nice: 3
Memory usage:
dict * 7.5 (if dict <= 16 MiB);
dict * 5.5 + 64 MiB (if dict > 16 MiB)
hc4
Hash Chain with 2-, 3-, and 4-byte hashing
Minimum value for nice: 4
Memory usage:
dict * 7.5 (if dict <= 32 MiB);
dict * 6.5 (if dict > 32 MiB)
bt2
Binary Tree with 2-byte hashing
Minimum value for nice: 2
Memory usage: dict * 9.5
bt3
Binary Tree with 2- and 3-byte hashing
Minimum value for nice: 3
Memory usage:
dict * 11.5 (if dict <= 16 MiB);
dict * 9.5 + 64 MiB (if dict > 16 MiB)
bt4
Binary Tree with 2-, 3-, and 4-byte hashing
Minimum value for nice: 4
Memory usage:
dict * 11.5 (if dict <= 32 MiB);
dict * 10.5 (if dict > 32 MiB)
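The formulas above can be collected into a small estimator. encoder_memory_mib is a hypothetical helper; like the formulas it encodes, it is only a rough approximation.

```python
def encoder_memory_mib(mf: str, dict_mib: float) -> float:
    """Rough LZMA encoder memory (MiB) for a match finder and dictionary size."""
    d = dict_mib
    if mf == "hc3":
        return d * 7.5 if d <= 16 else d * 5.5 + 64
    if mf == "hc4":
        return d * 7.5 if d <= 32 else d * 6.5
    if mf == "bt2":
        return d * 9.5
    if mf == "bt3":
        return d * 11.5 if d <= 16 else d * 9.5 + 64
    if mf == "bt4":
        return d * 11.5 if d <= 32 else d * 10.5
    raise ValueError(f"unknown match finder: {mf!r}")
```

For -6 (bt4 with an 8 MiB dictionary) this gives 92 MiB, close to the 94 MiB CompMem figure in the preset table; for -9 (bt4, 64 MiB) it gives 672 MiB against the table's 674 MiB.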
Compression mode specifies the method to analyze the data produced by the match finder. Supported modes are fast and normal. The default is fast for presets 0-3 and normal for presets 4-9.
Usually fast is used with Hash Chain match finders and normal with Binary Tree match finders. This is also what the presets do.
Specify what is considered to be a nice length for a match. Once a match of at least nice bytes is found, the algorithm stops looking for possibly better matches.
Nice can be 2-273 bytes. Higher values tend to give better compression ratio at the expense of speed. The default depends on the preset.
Specify the maximum search depth in the match finder. The default is the special value of 0, which makes the compressor determine a reasonable depth from mf and nice.
Reasonable depth for Hash Chains is 4-100 and 16-1000 for Binary Trees. Using very high values for depth can make the encoder extremely slow with some files. Avoid setting the depth over 1000 unless you are prepared to interrupt the compression in case it is taking far too long.
When decoding raw streams (--format=raw), LZMA2 needs only the dictionary size. LZMA1 also needs lc, lp, and pb.
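Python's lzma filter dicts accept the same option set (dict_size, lc, lp, pb, mf, mode, nice_len, depth), so the decoding requirements above can be tried directly. The option values below are arbitrary examples. For LZMA1 the sketch uses an LZMADecompressor object and checks only the output, without relying on end-of-stream detection.

```python
import lzma

data = b"The quick brown fox jumps over the lazy dog. " * 300

# Custom LZMA1 settings: literal/position bits plus match finder tuning.
lzma1 = {
    "id": lzma.FILTER_LZMA1, "dict_size": 1 << 20,
    "lc": 3, "lp": 0, "pb": 0,  # one-byte aligned text
    "mf": lzma.MF_HC4, "mode": lzma.MODE_FAST, "nice_len": 64, "depth": 0,
}
raw1 = lzma.compress(data, format=lzma.FORMAT_RAW, filters=[lzma1])

# Raw LZMA1 decoding needs the same lc, lp, and pb (and a big enough dictionary).
dec1 = lzma.LZMADecompressor(
    format=lzma.FORMAT_RAW,
    filters=[{"id": lzma.FILTER_LZMA1, "dict_size": 1 << 20,
              "lc": 3, "lp": 0, "pb": 0}],
)
out1 = dec1.decompress(raw1)

# Raw LZMA2 decoding needs only the dictionary size.
raw2 = lzma.compress(data, format=lzma.FORMAT_RAW,
                     filters=[{"id": lzma.FILTER_LZMA2, "preset": 9,
                               "dict_size": 1 << 20}])
out2 = lzma.decompress(raw2, format=lzma.FORMAT_RAW,
                       filters=[{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 20}])
```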
Add a branch/call/jump (BCJ) filter to the filter chain. These filters can be used only as a non-last filter in the filter chain.
A BCJ filter converts relative addresses in the machine code to their absolute counterparts. This doesn't change the size of the data, but it increases redundancy, which can help LZMA2 to produce a 0-15% smaller .xz file. The BCJ filters are always reversible, so using a BCJ filter for the wrong type of data doesn't cause any data loss, although it may make the compression ratio worse.
It is fine to apply a BCJ filter on a whole executable; there's no need to apply it only on the executable section. Applying a BCJ filter on an archive that contains both executable and non-executable files may or may not give good results, so it generally isn't good to blindly apply a BCJ filter when compressing binary packages for distribution.
These BCJ filters are very fast and use an insignificant amount of memory. If a BCJ filter improves compression ratio of a file, it can improve decompression speed at the same time. This is because, on the same hardware, the decompression speed of LZMA2 is roughly a fixed number of bytes of compressed data per second.
These BCJ filters have known problems related to the compression ratio:
o Some types of files containing executable code (e.g. object files, static libraries, and Linux kernel modules) have the addresses in the instructions filled with filler values. These BCJ filters will still do the address conversion, which will make the compression worse with these files.
o Applying a BCJ filter on an archive containing multiple similar executables can make the compression ratio worse than not using a BCJ filter. This is because the BCJ filter doesn't detect the boundaries of the executable files, and doesn't reset the address conversion counter for each executable.
Both of the above problems will be fixed in the future in a new filter. The old BCJ filters will still be useful in embedded systems, because the decoder of the new filter will be bigger.
Different instruction sets have different alignment:
Filter     Alignment  Notes
x86        1          32-bit or 64-bit x86
PowerPC    4          Big endian only
ARM        4          Little endian only
ARM-Thumb  2          Little endian only
IA-64      16         Big or little endian
SPARC      4          Big or little endian
Since the BCJ-filtered data is usually compressed with LZMA2, the compression ratio may be improved slightly if the LZMA2 options are set to match the alignment of the selected BCJ filter. For example, with the IA-64 filter, it's good to set pb=4 with LZMA2 (2^4=16). The x86 filter is an exception; it's usually good to stick to LZMA2's default four-byte alignment when compressing x86 executables.
767 All BCJ filters support the same options:
770 Specify the start offset that is used when converting
771 between relative and absolute addresses. The offset must
772 be a multiple of the alignment of the filter (see the ta-
773 ble above). The default is zero. In practice, the
774 default is good; specifying a custom offset is almost never useful.
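The alignment table, the pb advice, and the start=offset rule can be combined into a small sh(1) helper. This is a hypothetical sketch, not part of XZ Utils: bcj_align, bcj_pb, and bcj_start_ok are made-up names, the filter names mirror the xz options --x86, --powerpc, --ia64, --arm, --armthumb, and --sparc, and the pb suggestion simply applies 2^pb = alignment with the x86 exception described above.

```sh
#!/bin/sh
# Hypothetical helpers based on the BCJ alignment table; not part of XZ Utils.

# bcj_align FILTER: print the filter's alignment in bytes.
bcj_align() {
    case $1 in
        x86) echo 1 ;;
        armthumb) echo 2 ;;
        powerpc|arm|sparc) echo 4 ;;
        ia64) echo 16 ;;
        *) return 1 ;;
    esac
}

# bcj_pb FILTER: suggest an LZMA2 pb so that 2^pb equals the alignment.
# x86 is the documented exception: keep LZMA2's default pb=2.
bcj_pb() {
    align=$(bcj_align "$1") || return 1
    if [ "$1" = x86 ]; then
        echo 2
        return 0
    fi
    pb=0
    while [ "$align" -gt 1 ]; do
        align=$((align / 2))
        pb=$((pb + 1))
    done
    echo "$pb"
}

# bcj_start_ok FILTER OFFSET: succeed if OFFSET is a multiple of the alignment.
bcj_start_ok() {
    align=$(bcj_align "$1") || return 1
    [ $(($2 % align)) -eq 0 ]
}

# Print (but do not run) a command line for an IA-64 binary.
echo "xz --ia64 --lzma2=pb=$(bcj_pb ia64) file"
```

The last line only prints xz --ia64 --lzma2=pb=4 file; nothing is compressed.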
778 Add the Delta filter (--delta[=options]) to the filter chain. The Delta filter can
779 only be used as a non-last filter in the filter chain.
781 Currently only simple byte-wise delta calculation is supported.
782 It can be useful when compressing e.g. uncompressed bitmap
783 images or uncompressed PCM audio. However, special purpose
784 algorithms may give significantly better results than Delta +
785 LZMA2. This is true especially with audio, which compresses
786 faster and better e.g. with flac(1).
791 dist=distance: Specify the distance of the delta calculation in bytes.
792 distance must be 1-256. The default is 1.
794 For example, with dist=2 and eight-byte input A1 B1 A2 B3
795 A3 B5 A4 B7, the output will be A1 B1 01 02 01 02 01 02.
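The delta calculation in the example above can be reproduced in sh(1). This is only an illustrative sketch of byte-wise delta, not xz's implementation: delta_encode and delta_decode are hypothetical names, bytes are passed as two-digit hexadecimal words, and the first dist bytes pass through unchanged, matching the example.

```sh
#!/bin/sh
# Hypothetical illustration of byte-wise delta filtering; not xz's code.

# delta_encode DIST BYTE...: out[i] = in[i] - in[i - DIST] (mod 256).
delta_encode() {
    dist=$1
    shift
    i=0
    out=
    for b in "$@"; do
        v=$((0x$b))
        if [ "$i" -lt "$dist" ]; then
            d=$v                          # no earlier byte: pass through
        else
            eval "prev=\$h_$((i - dist))"
            d=$(( (v - prev) & 255 ))
        fi
        eval "h_$i=$v"                    # remember this input byte
        out="$out $(printf '%02X' "$d")"
        i=$((i + 1))
    done
    echo "${out# }"
}

# delta_decode DIST BYTE...: in[i] = out[i] + in[i - DIST] (mod 256).
delta_decode() {
    dist=$1
    shift
    i=0
    out=
    for b in "$@"; do
        v=$((0x$b))
        if [ "$i" -ge "$dist" ]; then
            eval "prev=\$r_$((i - dist))"
            v=$(( (v + prev) & 255 ))
        fi
        eval "r_$i=$v"                    # remember the reconstructed byte
        out="$out $(printf '%02X' "$v")"
        i=$((i + 1))
    done
    echo "${out# }"
}

delta_encode 2 A1 B1 A2 B3 A3 B5 A4 B7
```

The last line prints A1 B1 01 02 01 02 01 02, the output given above; delta_decode reverses it by adding instead of subtracting.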
799 -q, --quiet: Suppress warnings and notices. Specify this twice to suppress
800 errors too. This option has no effect on the exit status. That
801 is, even if a warning was suppressed, the exit status to indi-
802 cate a warning is still used.
805 -v, --verbose: Be verbose. If standard error is connected to a terminal, xz
806 will display a progress indicator. Specifying --verbose twice
807 will give even more verbose output.
809 The progress indicator shows the following information:
811 o Completion percentage is shown if the size of the input file
812 is known. That is, the percentage cannot be shown in pipes.
814 o Amount of compressed data produced (compressing) or consumed (decompressing).
817 o Amount of uncompressed data consumed (compressing) or pro-
818 duced (decompressing).
820 o Compression ratio, which is calculated by dividing the amount
821 of compressed data processed so far by the amount of uncom-
822 pressed data processed so far.
824 o Compression or decompression speed. This is measured as the
825 amount of uncompressed data consumed (compression) or pro-
826 duced (decompression) per second. It is shown after a few
827 seconds have passed since xz started processing the file.
829 o Elapsed time in the format M:SS or H:MM:SS.
831 o Estimated remaining time is shown only when the size of the
832 input file is known and a couple of seconds have already
833 passed since xz started processing the file. The time is
834 shown in a less precise format which never has any colons, for example, 2 min 30 s.
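The compression ratio and the elapsed time format described above can be computed as follows. This is a hypothetical sketch, not code from xz: xz_ratio and fmt_elapsed are made-up names, the ratio uses three decimals and the --- convention described in ROBOT MODE, and printing --- for a zero uncompressed size is an assumption.

```sh
#!/bin/sh
# Hypothetical helpers mirroring two progress-indicator fields.

# xz_ratio COMPRESSED UNCOMPRESSED: compressed / uncompressed with three
# decimals, or --- if the ratio is over 9.999 (as in --robot --list).
# Printing --- for an uncompressed size of zero is an assumption.
xz_ratio() {
    awk -v c="$1" -v u="$2" 'BEGIN {
        if (u == 0 || c / u > 9.999)
            print "---"
        else
            printf "%.3f\n", c / u
    }'
}

# fmt_elapsed SECONDS: format as M:SS, or as H:MM:SS from one hour upwards.
fmt_elapsed() {
    s=$1
    h=$((s / 3600))
    m=$((s % 3600 / 60))
    sec=$((s % 60))
    if [ "$h" -gt 0 ]; then
        printf '%d:%02d:%02d\n' "$h" "$m" "$sec"
    else
        printf '%d:%02d\n' "$m" "$sec"
    fi
}
```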
837 When standard error is not a terminal, --verbose will make xz
838 print the filename, compressed size, uncompressed size, compres-
839 sion ratio, and possibly also the speed and elapsed time on a
840 single line to standard error after compressing or decompressing
841 the file. The speed and elapsed time are included only when the
842 operation took at least a few seconds. If the operation didn't
843 finish, e.g. due to user interruption, the completion percentage
844 is also printed if the size of the input file is known.
847 -Q, --no-warn: Don't set the exit status to 2 even if a condition
848 worth a warning was detected. This option doesn't affect the verbosity
849 level, thus both --quiet and --no-warn have to be used to not
850 display warnings and to not alter the exit status.
853 --robot: Print messages in a machine-parsable format. This is intended
854 to ease writing frontends that want to use xz instead of
855 liblzma, which may be the case with various scripts. The output
856 with this option enabled is meant to be stable across xz
857 releases. See the section ROBOT MODE for details.
860 --info-memory: Display, in human-readable format, how much physical memory
861 (RAM) xz thinks the system has and the memory usage limits for
862 compression and decompression, and exit successfully.
865 -h, --help: Display a help message describing the most commonly used
866 options, and exit successfully.
869 -H, --long-help: Display a help message describing all features of xz, and exit successfully.
873 -V, --version: Display the version number of xz and liblzma in human readable
874 format. To get machine-parsable output, specify --robot before --version.
877 ROBOT MODE
878 The robot mode is activated with the --robot option. It makes the out-
879 put of xz easier to parse by other programs. Currently --robot is sup-
880 ported only together with --version, --info-memory, and --list. It
881 will be supported for normal compression and decompression in the future.
884 Version
885 xz --robot --version will print the version number of xz and liblzma in
886 the following format:
888 XZ_VERSION=XYYYZZZS
889 LIBLZMA_VERSION=XYYYZZZS
891 X Major version.
893 YYY Minor version. Even numbers are stable. Odd numbers are alpha or beta versions.
896 ZZZ Patch level for stable releases or just a counter for development releases.
899 S Stability. 0 is alpha, 1 is beta, and 2 is stable. S should be
900 always 2 when YYY is even.
902 XYYYZZZS are the same on both lines if xz and liblzma are from the same XZ Utils release.
905 Examples: 4.999.9beta is 49990091 and 5.0.0 is 50000002.
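The XYYYZZZS number is plain decimal packing, so it can be split or built with shell arithmetic. A hypothetical sketch (xz_version_decode and xz_version_encode are made-up names) that agrees with the two examples above:

```sh
#!/bin/sh
# Hypothetical helpers for the XYYYZZZS version number; not part of xz.

# xz_version_decode N: print "X.Y.Z stability" with leading zeros removed.
xz_version_decode() {
    n=$1
    x=$((n / 10000000))        # X: major
    y=$((n / 10000 % 1000))    # YYY: minor
    z=$((n / 10 % 1000))       # ZZZ: patch level or counter
    case $((n % 10)) in        # S: stability
        0) s=alpha ;;
        1) s=beta ;;
        2) s=stable ;;
        *) s=unknown ;;
    esac
    echo "$x.$y.$z $s"
}

# xz_version_encode X Y Z S: pack the fields back into XYYYZZZS.
xz_version_encode() {
    printf '%d%03d%03d%d\n' "$1" "$2" "$3" "$4"
}
```

After eval "$(xz --robot --version)", running xz_version_decode "$XZ_VERSION" would print the installed version in this form.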
907 Memory limit information
908 xz --robot --info-memory prints a single line with three tab-separated columns:
911 1. Total amount of physical memory (RAM) in bytes
913 2. Memory usage limit for compression in bytes. A special value of
914 zero indicates the default setting, which for single-threaded mode
915 is the same as no limit.
917 3. Memory usage limit for decompression in bytes. A special value of
918 zero indicates the default setting, which for single-threaded mode
919 is the same as no limit.
921 In the future, the output of xz --robot --info-memory may have more
922 columns, but never more than a single line.
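A script can split these columns with read and a tab IFS. This is a sketch with a hypothetical function name (describe_info_memory). Reading a fourth variable keeps it working if more columns are appended later, as the text above allows.

```sh
#!/bin/sh
# Hypothetical helper: summarize `xz --robot --info-memory` output read
# from standard input.
describe_info_memory() {
    tab=$(printf '\t')
    # "rest" absorbs any columns that future versions may append.
    IFS=$tab read -r total comp decomp rest || return 1
    # Zero means the default setting (no limit in single-threaded mode).
    [ "$comp" -eq 0 ] && comp=default
    [ "$decomp" -eq 0 ] && decomp=default
    echo "ram=$total compress=$comp decompress=$decomp"
}

# Usage (requires xz): xz --robot --info-memory | describe_info_memory
```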
924 List mode
925 xz --robot --list uses tab-separated output. The first column of every
926 line has a string that indicates the type of the information found on the line:
929 name This is always the first line when starting to list a file. The
930 second column on the line is the filename.
932 file This line contains overall information about the .xz file. This
933 line is always printed after the name line.
935 stream This line type is used only when --verbose was specified. There
936 are as many stream lines as there are streams in the .xz file.
938 block This line type is used only when --verbose was specified. There
939 are as many block lines as there are blocks in the .xz file.
940 The block lines are shown after all the stream lines; different
941 line types are not interleaved.
944 summary This line type is used only when --verbose was specified twice.
945 This line is printed after all block lines. Like the file line,
946 the summary line contains overall information about the .xz file.
949 totals This line is always the very last line of the list output. It
950 shows the total counts and sizes.
952 The columns of the file lines:
953 2. Number of streams in the file
954 3. Total number of blocks in the stream(s)
955 4. Compressed size of the file
956 5. Uncompressed size of the file
957 6. Compression ratio, for example 0.123. If ratio is over
958 9.999, three dashes (---) are displayed instead of the ratio.
960 7. Comma-separated list of integrity check names. The follow-
961 ing strings are used for the known check types: None, CRC32,
962 CRC64, and SHA-256. For unknown check types, Unknown-N is
963 used, where N is the Check ID as a decimal number (one or two digits).
965 8. Total size of stream padding in the file
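Because every line starts with its type, awk(1) with a tab field separator can pick out columns by position. A sketch with a hypothetical function name (list_ratios): it remembers the filename from each name line and prints it together with column 6 (the ratio) of the file line that follows.

```sh
#!/bin/sh
# Hypothetical helper: print "filename<TAB>ratio" for each file in
# `xz --robot --list` output read from standard input.
list_ratios() {
    awk -F '\t' '
        $1 == "name" { name = $2 }            # remember the current filename
        $1 == "file" { print name "\t" $6 }   # column 6 is the ratio
    '
}

# Usage (requires xz): xz --robot --list *.xz | list_ratios
```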
967 The columns of the stream lines:
968 2. Stream number (the first stream is 1)
969 3. Number of blocks in the stream
970 4. Compressed start offset
971 5. Uncompressed start offset
972 6. Compressed size (does not include stream padding)
973 7. Uncompressed size
974 8. Compression ratio
975 9. Name of the integrity check
976 10. Size of stream padding
978 The columns of the block lines:
979 2. Number of the stream containing this block
980 3. Block number relative to the beginning of the stream (the first block is 1)
982 4. Block number relative to the beginning of the file
983 5. Compressed start offset relative to the beginning of the file
985 6. Uncompressed start offset relative to the beginning of the file
987 7. Total compressed size of the block (includes headers)
988 8. Uncompressed size
989 9. Compression ratio
990 10. Name of the integrity check
992 If --verbose was specified twice, additional columns are included on
993 the block lines. These are not displayed with a single --verbose,
994 because getting this information requires many seeks and can thus be slow:
996 11. Value of the integrity check in hexadecimal
997 12. Block header size
998 13. Block flags: c indicates that compressed size is present,
999 and u indicates that uncompressed size is present. If the
1000 flag is not set, a dash (-) is shown instead to keep the
1001 string length fixed. New flags may be added to the end of
1002 the string in the future.
1003 14. Size of the actual compressed data in the block (this
1004 excludes the block header, block padding, and check fields)
1005 15. Amount of memory (in bytes) required to decompress this
1006 block with this xz version
1007 16. Filter chain. Note that most of the options used at com-
1008 pression time cannot be known, because only the options that
1009 are needed for decompression are stored in the .xz headers.
1011 The columns of the totals line:
1012 2. Number of streams
1013 3. Number of blocks
1014 4. Compressed size
1015 5. Uncompressed size
1016 6. Average compression ratio
1017 7. Comma-separated list of integrity check names that were
1018 present in the files
1019 8. Stream padding size
1020 9. Number of files. This is here to keep the order of the ear-
1021 lier columns the same as on file lines.
1023 If --verbose was specified twice, additional columns are included on the totals line:
1025 10. Maximum amount of memory (in bytes) required to decompress
1026 the files with this xz version
1027 11. yes or no indicating if all block headers have both com-
1028 pressed size and uncompressed size stored in them
1030 Future versions may add new line types and new columns can be added to
1031 the existing line types, but the existing columns won't be changed.
1033 EXIT STATUS
1034 0 All is good.
1036 1 An error occurred.
1038 2 Something worth a warning occurred, but no actual errors occurred.
1041 Notices (not warnings or errors) printed on standard error don't affect the exit status.
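A wrapper script often needs to tell warnings (status 2) apart from errors (status 1). A minimal sketch; xz_status is a hypothetical name, and any command can stand in for xz when trying it out.

```sh
#!/bin/sh
# Hypothetical wrapper: run a command (normally xz ...) and report its
# exit status using the meanings listed above.
xz_status() {
    status=0
    "$@" || status=$?
    case $status in
        0) echo ok ;;
        1) echo error ;;
        2) echo warning ;;    # never used when -Q (--no-warn) is given
        *) echo "unexpected status $status" ;;
    esac
    return "$status"
}

# Usage (requires xz): xz_status xz -t foo.xz
```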
1044 ENVIRONMENT
1045 xz parses space-separated lists of options from the environment variables
1046 XZ_DEFAULTS and XZ_OPT, in this order, before parsing the options
1047 from the command line. Note that only options are parsed from the
1048 environment variables; all non-options are silently ignored. Parsing
1049 is done with getopt_long(3), which is also used for the command line arguments.
1053 XZ_DEFAULTS User-specific or system-wide default options. Typically this is
1054 set in a shell initialization script to enable xz's memory usage
1055 limiter by default. Excluding shell initialization scripts and
1056 similar special cases, scripts must never set or unset XZ_DEFAULTS.
1059 XZ_OPT This is for passing options to xz when it is not possible to set
1060 the options directly on the xz command line. This is the case
1061 for example, when xz is run by a script or tool such as GNU tar(1):
1063 XZ_OPT=-2v tar caf foo.tar.xz foo
1065 Scripts may use XZ_OPT e.g. to set script-specific default com-
1066 pression options. It is still recommended to allow users to
1067 override XZ_OPT if that is reasonable, e.g. in sh(1) scripts one
1068 may use something like this:
1070 XZ_OPT=${XZ_OPT-"-7e"}
1071 export XZ_OPT
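The ${XZ_OPT-"-7e"} form substitutes the default only when XZ_OPT is unset; an explicitly empty XZ_OPT is kept empty instead of being replaced by -7e. A small sketch of that behavior (show_xz_opt is a hypothetical name; it only prints the value a script would export):

```sh
#!/bin/sh
# Hypothetical demo of the ${XZ_OPT-...} default used above.
show_xz_opt() {
    # "-" (not ":-"): the default applies only if XZ_OPT is unset,
    # so an explicitly empty XZ_OPT is left empty.
    opt=${XZ_OPT-"-7e"}
    printf '%s\n' "$opt"
}
```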
1073 LZMA UTILS COMPATIBILITY
1074 The command line syntax of xz is practically a superset of lzma,
1075 unlzma, and lzcat as found in LZMA Utils 4.32.x. In most cases, it
1076 is possible to replace LZMA Utils with XZ Utils without breaking exist-
1077 ing scripts. There are some incompatibilities though, which may some-
1078 times cause problems.
1080 Compression preset levels
1081 The numbering of the compression level presets is not identical in xz
1082 and LZMA Utils. The most important difference is how dictionary sizes
1083 are mapped to different presets. Dictionary size is roughly equal to
1084 the decompressor memory usage.
1099 The dictionary size differences affect the compressor memory usage too,
1100 but there are some other differences between LZMA Utils and XZ Utils,
1101 which make the difference even bigger:
1103 Level xz LZMA Utils 4.32.x
1115 The default preset level in LZMA Utils is -7 while in XZ Utils it is
1116 -6, so both use an 8 MiB dictionary by default.
1118 Streamed vs. non-streamed .lzma files
1119 The uncompressed size of the file can be stored in the .lzma header.
1120 LZMA Utils does that when compressing regular files. The alternative
1121 is to mark that uncompressed size is unknown and use an end-of-payload
1122 marker to indicate where the decompressor should stop. LZMA Utils uses
1123 this method when uncompressed size isn't known, which is the case, for example, in pipes.
1126 xz supports decompressing .lzma files with or without end-of-payload
1127 marker, but all .lzma files created by xz will use end-of-payload
1128 marker and have uncompressed size marked as unknown in the .lzma
1129 header. This may be a problem in some uncommon situations. For exam-
1130 ple, a .lzma decompressor in an embedded device might work only with
1131 files that have known uncompressed size. If you hit this problem, you
1132 need to use LZMA Utils or LZMA SDK to create .lzma files with known uncompressed size.
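Whether a .lzma file records its uncompressed size can be read from its 13-byte header: one properties byte encoding lc, lp, and pb as (pb * 5 + lp) * 9 + lc, a 32-bit little-endian dictionary size, and a 64-bit little-endian uncompressed size in which all bits set means unknown. That layout comes from the LZMA SDK format rather than from this manual. lzma_header_info is a hypothetical sh(1) helper built on od(1).

```sh
#!/bin/sh
# Hypothetical helper: decode the 13-byte header of a .lzma file.
lzma_header_info() {
    # Print the first 13 bytes as unsigned decimals and split them
    # into the positional parameters $1 ... $13.
    set -- $(od -A n -t u1 -N 13 "$1")
    if [ "$#" -ne 13 ]; then
        echo "lzma_header_info: header too short" >&2
        return 1
    fi
    props=$1
    lc=$((props % 9))
    lp=$((props / 9 % 5))
    pb=$((props / 45))
    # Dictionary size: bytes 2-5, little endian.
    dict=$(( $2 + $3 * 256 + $4 * 65536 + $5 * 16777216 ))
    shift 5
    # Uncompressed size: bytes 6-13; all 0xFF bytes mean "unknown".
    known=no
    for b in "$@"; do
        [ "$b" -eq 255 ] || known=yes
    done
    echo "lc=$lc lp=$lp pb=$pb dict=$dict size_known=$known"
}

# Usage: lzma_header_info foo.lzma
```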
1135 Unsupported .lzma files
1136 The .lzma format allows lc values up to 8, and lp values up to 4. LZMA
1137 Utils can decompress files with any lc and lp, but always creates files
1138 with lc=3 and lp=0. Creating files with other lc and lp is possible
1139 with xz and with LZMA SDK.
1141 The implementation of the LZMA1 filter in liblzma requires that the sum
1142 of lc and lp must not exceed 4. Thus, .lzma files that exceed this
1143 limitation cannot be decompressed with xz.
1145 LZMA Utils creates only .lzma files which have a dictionary size of 2^n
1146 (a power of 2) but accepts files with any dictionary size. liblzma
1147 accepts only .lzma files which have a dictionary size of 2^n or 2^n +
1148 2^(n-1). This is to decrease false positives when detecting .lzma files.
1151 These limitations shouldn't be a problem in practice, since practically
1152 all .lzma files have been compressed with settings that liblzma will accept.
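Both restrictions can be expressed as a quick check. This is a sketch of the rules exactly as stated in this section, not liblzma's actual implementation; lzma_props_ok is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical check of the .lzma limits described above (not liblzma code).

# lzma_props_ok LC LP DICT_SIZE: succeed if xz could decompress a .lzma
# file with these settings according to the rules in this section.
lzma_props_ok() {
    lc=$1 lp=$2 d=$3
    # The LZMA1 implementation in liblzma requires lc + lp <= 4.
    [ $((lc + lp)) -le 4 ] || return 1
    # The dictionary size must be 2^n or 2^n + 2^(n-1). Removing the
    # trailing zero bits leaves 1 for 2^n and 3 (binary 11) otherwise.
    [ "$d" -gt 0 ] || return 1
    while [ $((d % 2)) -eq 0 ]; do
        d=$((d / 2))
    done
    [ "$d" -eq 1 ] || [ "$d" -eq 3 ]
}
```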
1156 When decompressing, LZMA Utils silently ignore everything after the
1157 first .lzma stream. In most situations, this is a bug. This also
1158 means that LZMA Utils don't support decompressing concatenated .lzma files.
1161 If there is data left after the first .lzma stream, xz considers the
1162 file to be corrupt. This may break obscure scripts which have assumed
1163 that trailing garbage is ignored.
1166 Compressed output may vary
1167 The exact compressed output produced from the same uncompressed input
1168 file may vary between XZ Utils versions even if compression options are
1169 identical. This is because the encoder can be improved (faster or bet-
1170 ter compression) without affecting the file format. The output can
1171 vary even between different builds of the same XZ Utils version, if
1172 different build options are used.
1174 The above means that implementing --rsyncable to create rsyncable .xz
1175 files is not going to happen without freezing a part of the encoder
1176 implementation, which can then be used with --rsyncable.
1178 Embedded .xz decompressors
1179 Embedded .xz decompressor implementations like XZ Embedded don't neces-
1180 sarily support files created with integrity check types other than none
1181 and crc32. Since the default is --check=crc64, you must use
1182 --check=none or --check=crc32 when creating files for embedded systems.
1184 Outside embedded systems, all .xz format decompressors support all the
1185 check types, or at least are able to decompress the file without veri-
1186 fying the integrity check if the particular check is not supported.
1188 XZ Embedded supports BCJ filters, but only with the default start offset.
1191 EXAMPLES
1192 Basics
1193 Compress the file foo into foo.xz using the default compression level
1194 (-6), and remove foo if compression is successful:
1196 xz foo
1198 Decompress bar.xz into bar and don't remove bar.xz even if decompression
1199 is successful:
1201 xz -dk bar.xz
1203 Create baz.tar.xz with the preset -4e (-4 --extreme), which is slower
1204 than e.g. the default -6, but needs less memory for compression and
1205 decompression (48 MiB and 5 MiB, respectively):
1207 tar cf - baz | xz -4e > baz.tar.xz
1209 A mix of compressed and uncompressed files can be decompressed to stan-
1210 dard output with a single command:
1212 xz -dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt
1214 Parallel compression of many files
1215 On GNU and *BSD, find(1) and xargs(1) can be used to parallelize com-
1216 pression of many files:
1218 find . -type f \! -name '*.xz' -print0 \
1219 | xargs -0r -P4 -n16 xz -T1
1221 The -P option to xargs(1) sets the number of parallel xz processes.
1222 The best value for the -n option depends on how many files there are to
1223 be compressed. If there are only a couple of files, the value should
1224 probably be 1; with tens of thousands of files, 100 or even more may be
1225 appropriate to reduce the number of xz processes that xargs(1) will
1228 The option -T1 for xz is there to force it to single-threaded mode,
1229 because xargs(1) is used to control the amount of parallelization.
1231 Robot mode
1232 Calculate how many bytes have been saved in total after compressing multiple files:
1235 xz --robot --list *.xz | awk '/^totals/{print $5-$4}'
1237 A script may want to know that it is using new enough xz. The follow-
1238 ing sh(1) script checks that the version number of the xz tool is at
1239 least 5.0.0. This method is compatible with old beta versions, which
1240 didn't support the --robot option:
1242 if ! eval "$(xz --robot --version 2> /dev/null)" ||
1243 [ "$XZ_VERSION" -lt 50000002 ]; then
1244 echo "Your xz is too old."
1245 fi
1246 unset XZ_VERSION LIBLZMA_VERSION
1248 Set a memory usage limit for decompression using XZ_OPT, but if a limit
1249 has already been set, don't increase it:
1251 NEWLIM=$((123 << 20)) # 123 MiB
1252 OLDLIM=$(xz --robot --info-memory | cut -f3)
1253 if [ "$OLDLIM" -eq 0 ] || [ "$OLDLIM" -gt "$NEWLIM" ]; then
1254 XZ_OPT="$XZ_OPT --memlimit-decompress=$NEWLIM"
1255 export XZ_OPT
1256 fi
1258 Custom compressor filter chains
1259 The simplest use for custom filter chains is customizing an LZMA2 preset.
1260 This can be useful, because the presets cover only a subset of
1261 the potentially useful combinations of compression settings.
1263 The CompCPU columns of the tables from the descriptions of the options
1264 -0 ... -9 and --extreme are useful when customizing LZMA2 presets.
1265 Here are the relevant parts collected from those two tables:
1278 If you know that a file requires somewhat big dictionary (e.g. 32 MiB)
1279 to compress well, but you want to compress it quicker than xz -8 would
1280 do, a preset with a low CompCPU value (e.g. 1) can be modified to use a bigger dictionary:
1283 xz --lzma2=preset=1,dict=32MiB foo.tar
1285 With certain files, the above command may be faster than xz -6 while
1286 compressing significantly better. However, it must be emphasized that
1287 only some files benefit from a big dictionary while keeping the CompCPU
1288 value low. The most obvious situation, where a big dictionary can help
1289 a lot, is an archive containing very similar files of at least a few
1290 megabytes each. The dictionary size has to be significantly bigger
1291 than any individual file to allow LZMA2 to take full advantage of the
1292 similarities between consecutive files.
1294 If very high compressor and decompressor memory usage is fine, and the
1295 file being compressed is at least several hundred megabytes, it may be
1296 useful to use an even bigger dictionary than the 64 MiB that xz -9 would use:
1299 xz -vv --lzma2=dict=192MiB big_foo.tar
1301 Using -vv (--verbose --verbose) like in the above example can be useful
1302 to see the memory requirements of the compressor and decompressor.
1303 Remember that using a dictionary bigger than the size of the uncompressed
1304 file is a waste of memory, so the above command isn't useful for small files.
1307 Sometimes the compression time doesn't matter, but the decompressor
1308 memory usage has to be kept low e.g. to make it possible to decompress
1309 the file on an embedded system. The following command uses -6e (-6
1310 --extreme) as a base and sets the dictionary to only 64 KiB. The
1311 resulting file can be decompressed with XZ Embedded (that's why there
1312 is --check=crc32) using about 100 KiB of memory.
1314 xz --check=crc32 --lzma2=preset=6e,dict=64KiB foo
1316 If you want to squeeze out as many bytes as possible, adjusting the
1317 number of literal context bits (lc) and number of position bits (pb)
1318 can sometimes help. Adjusting the number of literal position bits (lp)
1319 might help too, but usually lc and pb are more important. E.g. a
1320 source code archive contains mostly US-ASCII text, so something like
1321 the following might give slightly (like 0.1 %) smaller file than xz -6e
1322 (try also without lc=4):
1324 xz --lzma2=preset=6e,pb=0,lc=4 source_code.tar
1326 Using another filter together with LZMA2 can improve compression with
1327 certain file types. E.g. to compress an x86-32 or x86-64 shared library
1328 using the x86 BCJ filter:
1330 xz --x86 --lzma2 libfoo.so
1332 Note that the order of the filter options is significant. If --x86 is
1333 specified after --lzma2, xz will give an error, because there cannot be
1334 any filter after LZMA2, and also because the x86 BCJ filter cannot be
1335 used as the last filter in the chain.
1337 The Delta filter together with LZMA2 can give good results with bitmap
1338 images. It should usually beat PNG, which has a few more advanced fil-
1339 ters than simple delta but uses Deflate for the actual compression.
1341 The image has to be saved in uncompressed format, e.g. as uncompressed
1342 TIFF. The distance parameter of the Delta filter is set to match the
1343 number of bytes per pixel in the image. E.g. 24-bit RGB bitmap needs
1344 dist=3, and it is also good to pass pb=0 to LZMA2 to accommodate the
1345 three-byte alignment:
1347 xz --delta=dist=3 --lzma2=pb=0 foo.tiff
1349 If multiple images have been put into a single archive (e.g. .tar), the
1350 Delta filter will work on that too as long as all images have the same
1351 number of bytes per pixel.
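Choosing dist from the pixel format is simple arithmetic. The helper below is a hypothetical sketch (delta_image_opts is a made-up name). It sets dist to the number of bytes per pixel. It also extends the pb=0 advice by picking pb as the log2 of the largest power of two that divides dist, capped at 4, the largest pb LZMA2 accepts. That extension is an assumption, not documented xz advice.

```sh
#!/bin/sh
# Hypothetical helper: suggest Delta + LZMA2 options for uncompressed images.
# delta_image_opts BITS_PER_PIXEL
delta_image_opts() {
    bits=$1
    if [ $((bits % 8)) -ne 0 ]; then
        echo "delta_image_opts: bits per pixel must be a multiple of 8" >&2
        return 1
    fi
    dist=$((bits / 8))
    if [ "$dist" -lt 1 ] || [ "$dist" -gt 256 ]; then
        echo "delta_image_opts: dist must be 1-256" >&2
        return 1
    fi
    # Assumption: pb = log2 of the largest power of two dividing dist,
    # capped at 4 (the largest pb value LZMA2 accepts).
    pb=0
    n=$dist
    while [ $((n % 2)) -eq 0 ] && [ "$pb" -lt 4 ]; do
        n=$((n / 2))
        pb=$((pb + 1))
    done
    echo "--delta=dist=$dist --lzma2=pb=$pb"
}

# Usage (requires xz): xz $(delta_image_opts 24) foo.tiff
```

For 24-bit RGB this prints --delta=dist=3 --lzma2=pb=0, the same options as the example above.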
1353 SEE ALSO
1354 xzdec(1), xzdiff(1), xzgrep(1), xzless(1), xzmore(1), gzip(1),
1355 bzip2(1), 7z(1)
1357 XZ Utils: <http://tukaani.org/xz/>
1358 XZ Embedded: <http://tukaani.org/xz/embedded.html>
1359 LZMA SDK: <http://7-zip.org/sdk.html>
1363 Tukaani 2010-10-04 XZ(1)