LZMA SDK 4.32  Copyright (C) 1999-2005 Igor Pavlov

LZMA SDK provides the documentation, samples, header files, libraries,
and tools you need to develop applications that use LZMA compression.

LZMA is the default and general compression method of the 7z format
in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
compression ratio and very fast decompression.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was improved to maximize the compression ratio while keeping
high decompression speed and low memory requirements for
decompression.

LZMA SDK is available under any of the following licenses:

1) GNU Lesser General Public License (GNU LGPL)
2) Common Public License (CPL)
3) Simplified license for unmodified code (read SPECIAL EXCEPTION)
4) Proprietary license

This means that you can select one of these four options and follow the rules of that license.


1,2) GNU LGPL and CPL licenses are quite similar, and both
licenses are classified as
 - "Free software licenses" at http://www.gnu.org/
 - "OSI-approved" at http://www.opensource.org/

3) SPECIAL EXCEPTION

Igor Pavlov, as the author of this code, expressly permits you
to statically or dynamically link your code (or bind by name)
to the files from LZMA SDK without subjecting your linked
code to the terms of the CPL or GNU LGPL.
Any modifications or additions to files from LZMA SDK, however,
are subject to the GNU LGPL or CPL terms.

SPECIAL EXCEPTION allows you to use LZMA SDK in applications with closed code,
while you keep LZMA SDK code unmodified.


SPECIAL EXCEPTION #2: Igor Pavlov, as the author of this code, expressly permits
you to use this code under the same terms and conditions contained in the License
Agreement you have for any previous version of LZMA SDK developed by Igor Pavlov.

SPECIAL EXCEPTION #2 allows owners of proprietary licenses to use latest version
of LZMA SDK as update for previous versions.


SPECIAL EXCEPTION #3: Igor Pavlov, as the author of this code, expressly permits
you to use code of examples (LzmaTest.c, LzmaStateTest.c, LzmaAlone.cpp,
LzmaAlone.cs, LzmaAlone.java) as public domain code.

4) Proprietary license

LZMA SDK can also be available under a proprietary license, which
can include:

1) Right to modify code without subjecting modified code to the
terms of the CPL or GNU LGPL
2) Technical support for code

To request such a proprietary license or any additional consultations,
send an email message from this page:
http://www.7-zip.org/support.html


You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

You should have received a copy of the Common Public License
along with this library.

  - C++ source code of LZMA compressing and decompressing
  - ANSI-C compatible source code for LZMA decompressing
  - C# source code for LZMA compressing and decompressing
  - Java source code for LZMA compressing and decompressing
  - Compiled file->file LZMA compressing/decompressing program for Windows system

The ANSI-C LZMA decompression code was ported from the original C++ sources to C.
It was also simplified and optimized for code size,
but it remains fully compatible with LZMA from 7-Zip.


UNIX/Linux version
------------------
To compile the C++ version of file->file LZMA, go to the directory
C/7zip/Compress/LZMA_Alone
and type "make", or "make clean all" to recompile everything.

On some UNIX/Linux versions you must compile LZMA with static libraries.
To compile with static libraries, change string in makefile

---------------------
C            - C / CPP source code
CS           - C# source code
Java         - Java source code
lzma.txt     - LZMA SDK description (this file)
7zFormat.txt - 7z Format description
7zC.txt      - 7z ANSI-C Decoder description
methods.txt  - Compression method IDs for .7z
LGPL.txt     - GNU Lesser General Public License
CPL.html     - Common Public License
lzma.exe     - Compiled file->file LZMA encoder/decoder for Windows
history.txt  - history of the LZMA SDK

Source code structure
---------------------

Common  - common files for C++ projects
Windows - common files for Windows related code
7zip    - files related to 7-Zip Project
  Common   - common files for 7-Zip
  Compress - files related to compression/decompression
    LZ     - files related to LZ (Lempel-Ziv) compression algorithm
      BinTree    - Binary Tree Match Finder for LZ algorithm
      HashChain  - Hash Chain Match Finder for LZ algorithm
      Patricia   - Patricia Match Finder for LZ algorithm
    RangeCoder   - Range Coder (special code of compression/decompression)
    LZMA         - LZMA compression/decompression in C++
    LZMA_Alone   - file->file LZMA compression/decompression
    LZMA_C       - ANSI-C compatible LZMA decompressor
      LzmaDecode.h      - interface for LZMA decoding in ANSI-C
      LzmaDecode.c      - LZMA decoding in ANSI-C (new fastest version)
      LzmaDecodeSize.c  - LZMA decoding in ANSI-C (old size-optimized version)
      LzmaTest.c        - test application that decodes LZMA encoded file
      LzmaStateDecode.h - interface for LZMA decoding (State version)
      LzmaStateDecode.c - LZMA decoding in ANSI-C (State version)
      LzmaStateTest.c   - test application (State version)
    Branch       - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
  Archive - files related to archiving
    7z_C     - 7z ANSI-C Decoder

Common   - some common files for 7-Zip
Compress - files related to compression/decompression
  LZ         - files related to LZ (Lempel-Ziv) compression algorithm
  LZMA       - LZMA compression/decompression
  LzmaAlone  - file->file LZMA compression/decompression
  RangeCoder - Range Coder (special code of compression/decompression)

Compression - files related to compression/decompression
  LZ         - files related to LZ (Lempel-Ziv) compression algorithm
  LZMA       - LZMA compression/decompression
  RangeCoder - Range Coder (special code of compression/decompression)

The C/C++ source code of LZMA SDK is part of the 7-Zip project.

You can find the ANSI-C LZMA decompression code in the folder
  C/7zip/Compress/LZMA_C
7-Zip doesn't use that ANSI-C LZMA code; it was developed
specially for this SDK. Files from LZMA_C do not need files from
other directories of the SDK to compile.

7-Zip source code can be downloaded from 7-Zip's SourceForge page:

  http://sourceforge.net/projects/sevenzip/

LZMA Decompression features
---------------------------
  - Variable dictionary size (up to 256 MB)
  - Estimated compressing speed: about 500 KB/s on 1 GHz CPU
  - Estimated decompressing speed:
      - 8-12 MB/s on 1 GHz Intel Pentium 3 or AMD Athlon
      - 500-1000 KB/s on 100 MHz ARM, MIPS, PowerPC or other simple RISC
  - Small memory requirements for decompressing (8-32 KB + DictionarySize)
  - Small code size for decompressing: 2-8 KB (depending on
    speed optimizations)

The LZMA decoder uses only integer operations and can be
implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).

Some critical operations that affect the speed of LZMA decompression:
  1) 32*16 bit integer multiply
  2) Mispredicted branches (penalty mostly depends on pipeline length)
  3) 32-bit shift and arithmetic operations

The speed of LZMA decompression mostly depends on CPU speed.
Memory speed matters little. But if your CPU has a small data cache,
the overall weight of memory speed will slightly increase.

Using LZMA encoder/decoder executable
--------------------------------------

Usage:  LZMA <e|d> inputFile outputFile [<switches>...]

  b: Benchmark. There are two tests: compressing and decompressing
     with the LZMA method. The benchmark shows a rating in MIPS (million
     instructions per second). The rating value is calculated from
     the measured speed and is normalized with AMD Athlon XP CPU
     results. The benchmark also checks for possible hardware errors (RAM
     errors in most cases). The benchmark uses these settings:
     (-a1, -d21, -fb32, -mfbt4). You can change only -d. You
     can also change the number of iterations. Example for 30 iterations:

     Default number of iterations is 10.

  -a{N}:  set compression mode 0 = fast, 1 = normal, 2 = max

  -d{N}:  Sets Dictionary size - [0, 28], default: 23 (8MB)
          The maximum value for dictionary size is 256 MB = 2^28 bytes.
          Dictionary size is calculated as DictionarySize = 2^N bytes.
          To decompress a file compressed by the LZMA method with dictionary
          size D = 2^N you need about D bytes of memory (RAM).
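
As a small illustration of the DictionarySize = 2^N rule above, the
following C sketch converts the -d{N} value into a byte count. The function
name lzma_dictionary_size is hypothetical and is not part of the SDK; it
returns 0 for N outside the documented range [0, 28].

```c
#include <stdint.h>

/* Hypothetical helper (not an SDK function): dictionary size in bytes
   for the -d{N} switch, using DictionarySize = 2^N.
   Returns 0 when n is outside the documented range [0, 28]. */
uint32_t lzma_dictionary_size(unsigned n)
{
    if (n > 28)
        return 0;
    return (uint32_t)1 << n;
}
```

For example, the default -d23 gives 8388608 bytes (8 MB), which is also
roughly the amount of RAM a decoder needs for the dictionary.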

  -fb{N}: set number of fast bytes - [5, 273], default: 128
          Usually a bigger number gives a slightly better compression ratio
          and a slower compression process.

  -lc{N}: set number of literal context bits - [0, 8], default: 3
          Sometimes lc=4 gives gain for big files.

  -lp{N}: set number of literal pos bits - [0, 4], default: 0
          The lp switch is intended for periodical data when the period is
          equal to 2^N. For example, for 32-bit (4 bytes)
          periodical data you can use lp=2. Often it's better to set lc0
          if you change the lp switch.

  -pb{N}: set number of pos bits - [0, 4], default: 2
          The pb switch is intended for periodical data
          when the period is equal to 2^N.
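
The effect of the lp and pb switches can be pictured like this: the coder
picks its statistics using the low N bits of the current byte offset, so
data with a period of 2^N bytes meets the same context at the same place in
every period. The following is a minimal sketch of that idea only, not SDK
code; lzma_pos_state is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper (not an SDK function): the low `bits` bits of a
   byte offset, i.e. the slot of a 2^bits-byte period that the offset
   falls into. With bits = 2, offsets 0,4,8,... map to slot 0. */
uint32_t lzma_pos_state(uint64_t offset, unsigned bits)
{
    uint64_t mask = ((uint64_t)1 << bits) - 1;
    return (uint32_t)(offset & mask);
}
```

With bits = 2 (as in -lp2 or the default -pb2), every first byte of a
32-bit word shares slot 0, every second byte shares slot 1, and so on.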

  -mf{MF_ID}: set Match Finder. Default: bt4.
              Compression ratio for all bt* and pat* is almost the same.
              Algorithms from the hc* group don't provide a good compression
              ratio, but they often work quite fast in combination with
              fast mode (-a0). Methods from the bt* group require less memory
              than methods from the pat* group. Usually bt4 works faster than
              any pat*, but for some types of files pat* can work faster.

              Memory requirements depend on dictionary size
              (parameter "d" in table below).

               MF_ID     Memory                   Description

                bt2    d*9.5 +  1MB  Binary Tree with 2 bytes hashing.
                bt3    d*9.5 + 65MB  Binary Tree with 2-3(full) bytes hashing.
                bt4    d*9.5 +  6MB  Binary Tree with 2-3-4 bytes hashing.
                bt4b   d*9.5 + 34MB  Binary Tree with 2-3-4(big) bytes hashing.
                pat2r  d*26  +  1MB  Patricia Tree with 2-bits nodes, removing.
                pat2   d*38  +  1MB  Patricia Tree with 2-bits nodes.
                pat2h  d*38  + 77MB  Patricia Tree with 2-bits nodes, 2-3 bytes hashing.
                pat3h  d*62  + 85MB  Patricia Tree with 3-bits nodes, 2-3 bytes hashing.
                pat4h  d*110 +101MB  Patricia Tree with 4-bits nodes, 2-3 bytes hashing.
                hc3    d*5.5 +  1MB  Hash Chain with 2-3 bytes hashing.
                hc4    d*5.5 +  6MB  Hash Chain with 2-3-4 bytes hashing.
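
To make the table concrete, here is a short C sketch that evaluates the
bt4 row ("d*9.5 + 6MB") for a given -d{N} value. It assumes that MB in the
table means 2^20 bytes; lzma_bt4_encoder_memory is a hypothetical name, not
an SDK function, and the result is only the estimate the table gives.

```c
#include <stdint.h>

/* Hypothetical helper (not an SDK function): estimated encoder memory in
   bytes for the bt4 match finder, from the table row "d*9.5 + 6MB".
   dict_bits is the N of -d{N}; 1 MB is assumed to be 2^20 bytes. */
uint64_t lzma_bt4_encoder_memory(unsigned dict_bits)
{
    uint64_t d = (uint64_t)1 << dict_bits;
    /* d * 9.5 computed in integers as d * 19 / 2 */
    return d * 19 / 2 + ((uint64_t)6 << 20);
}
```

For the default -d23 this gives 85983232 bytes, about 82 MB. The other
rows can be evaluated the same way by changing the two constants.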

  -eos:   write End Of Stream marker. By default LZMA doesn't write
          the eos marker, since the LZMA decoder knows the uncompressed size
          stored in the .lzma file header.

  -si:    Read data from stdin (it will write End Of Stream marker).
  -so:    Write data to stdout


1)  LZMA e file.bin file.lzma -d16 -lc0

compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K)
and 0 literal context bits. -lc0 reduces the memory requirements
for decompression.


2)  LZMA e file.bin file.lzma -lc0 -lp2

compresses file.bin to file.lzma with settings suitable
for 32-bit periodical data (for example, ARM or MIPS code).

3)  LZMA d file.lzma file.bin

decompresses file.lzma to file.bin.


Compression ratio hints
-----------------------

To increase the compression ratio for LZMA compressing it's desirable
to have aligned data (if possible), and also to arrange
the data so that code is grouped in one place and data is
grouped in another place (it's better than such mixing: code, data, code,
data, ...).

You can increase the compression ratio for some data types by using
special filters before compressing. For example, it's possible to
increase the compression ratio by 5-10% for code for these CPU ISAs:
x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.

You can find C/C++ source code of such filters in the folder "7zip/Compress/Branch"

You can check the compression ratio gain of these filters with the following
7-Zip commands (example for ARM code):

  7z a a1.7z a.bin -m0=lzma

With filter for little-endian ARM code:
  7z a a2.7z a.bin -m0=bc_arm -m1=lzma

With filter for big-endian ARM code (using additional Swap4 filter):
  7z a a3.7z a.bin -m0=swap4 -m1=bc_arm -m2=lzma

It works in this manner:
Compressing   = Filter_encoding + LZMA_encoding
Decompressing = LZMA_decoding + Filter_decoding

The compressing and decompressing speed of such filters is very high,
so it will not increase decompressing time too much.
Moreover, it reduces decompression time for LZMA_decoding,
since the compression ratio with filtering is higher.

These filters convert CALL (calling procedure) instructions
from relative offsets to absolute addresses, so such data becomes more
compressible. Source code of these CALL filters is quite simple
(about 20 lines of C++), so you can convert it from the C++ version yourself.

For some ISAs (for example, for MIPS) it's impossible to get gain from such a filter.
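
The relative-to-absolute conversion described above can be sketched in C
for little-endian ARM code. This is a simplified illustration, not the
SDK's own filter (see 7zip/Compress/Branch for that; it may differ in
detail). It assumes that a BL instruction is a 4-byte word whose top byte
is 0xEB, that its low 24 bits hold a word offset relative to the
instruction address plus 8, and that start_pos is a multiple of 4.
The name arm_call_filter is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified CALL filter sketch for little-endian ARM code (assumptions
   are listed in the text above; this is not the SDK implementation).
   encoding != 0: relative -> absolute (before compression).
   encoding == 0: absolute -> relative (after decompression).
   start_pos is the assumed address of data[0], a multiple of 4.
   Returns the number of bytes processed (whole 4-byte words only). */
size_t arm_call_filter(uint8_t *data, size_t size, uint32_t start_pos, int encoding)
{
    size_t i;
    for (i = 0; i + 4 <= size; i += 4) {
        uint32_t value, pc;
        if (data[i + 3] != 0xEB)
            continue;                      /* not a BL instruction */
        value = ((uint32_t)data[i + 2] << 16) |
                ((uint32_t)data[i + 1] << 8) |
                (uint32_t)data[i];
        pc = start_pos + (uint32_t)i + 8;  /* ARM PC is 8 bytes ahead */
        value <<= 2;                       /* word offset -> byte offset */
        value = encoding ? value + pc : value - pc;
        value >>= 2;                       /* back to word units */
        data[i]     = (uint8_t)value;
        data[i + 1] = (uint8_t)(value >> 8);
        data[i + 2] = (uint8_t)(value >> 16);
    }
    return i;
}
```

Calling the function with encoding = 1 and then with encoding = 0 on the
same buffer and start_pos restores the original bytes, which mirrors the
Filter_encoding / Filter_decoding pairing shown above.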

LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties for compressed data
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
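
As an illustration of the layout above, the following C sketch reads the
three header fields from the first 13 bytes of a .lzma file. The struct and
function names (LzmaFileHeader, lzma_parse_header) are hypothetical and are
not part of the SDK. An unknown size (-1) shows up as all bits set in the
64-bit field.

```c
#include <stdint.h>

/* Hypothetical container for the 13-byte header (not an SDK type). */
typedef struct {
    uint8_t  props;             /* offset 0: LZMA properties byte */
    uint32_t dict_size;         /* offset 1: dictionary size in bytes */
    uint64_t uncompressed_size; /* offset 5: UINT64_MAX (-1) = unknown */
} LzmaFileHeader;

/* Decode the little-endian header fields byte by byte, so the result
   does not depend on the byte order of the host CPU. */
void lzma_parse_header(const uint8_t h[13], LzmaFileHeader *out)
{
    int i;
    out->props = h[0];
    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)h[1 + i] << (8 * i);
    out->uncompressed_size = 0;
    for (i = 0; i < 8; i++)
        out->uncompressed_size |= (uint64_t)h[5 + i] << (8 * i);
}
```

A header of 5D 00 00 80 00 followed by eight FF bytes, for instance, decodes
to a dictionary size of 0x800000 (8 MB) and an unknown uncompressed size.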

ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

To compile the ANSI-C LZMA Decoder you can use one of the following file sets:
1) LzmaDecode.h + LzmaDecode.c + LzmaTest.c  (fastest version)
2) LzmaDecode.h + LzmaDecodeSize.c + LzmaTest.c  (old size-optimized version)
3) LzmaStateDecode.h + LzmaStateDecode.c + LzmaStateTest.c  (zlib-like interface)


Memory requirements for LZMA decoding
-------------------------------------

The LZMA decoder doesn't allocate memory itself, so you must
allocate memory and pass it to LZMA.

Stack usage of the LZMA decoding function for local variables is not
larger than 200 bytes.

How To decompress data
----------------------

LZMA Decoder (ANSI-C version) now supports 5 interfaces:
1) Single-call Decompressing
2) Single-call Decompressing with input stream callback
3) Multi-call Decompressing with output buffer
4) Multi-call Decompressing with input callback and output buffer
5) Multi-call State Decompressing (zlib-like interface)

Variant-5 is similar to Variant-4, but Variant-5 doesn't use callback functions.

Decompressing steps
-------------------

1) read LZMA properties (5 bytes):
   unsigned char properties[LZMA_PROPERTIES_SIZE];

2) read uncompressed size (8 bytes, little-endian)

3) Decode properties:

  CLzmaDecoderState state;  /* it's 24-140 bytes structure, if int is 32-bit */

  if (LzmaDecodeProperties(&state.Properties, properties, LZMA_PROPERTIES_SIZE) != LZMA_RESULT_OK)
    return PrintError(rs, "Incorrect stream properties");

4) Allocate memory block for internal Structures:

  state.Probs = (CProb *)malloc(LzmaGetNumProbs(&state.Properties) * sizeof(CProb));
  if (state.Probs == 0)
    return PrintError(rs, kCantAllocateMessage);

  The LZMA decoder uses an array of CProb variables as its internal structure.
  By default, CProb is unsigned short. But you can define _LZMA_PROB32 to make
  it unsigned int. That can increase speed on some 32-bit CPUs, but memory
  usage will be doubled in that case.
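
To see where the "~16 KB for default settings" figure quoted for the LZMA
Internal Structures comes from, the size of the Probs array can be estimated
as below. The two constants are an assumption recalled from LzmaDecode.h of
this SDK generation (1846 base entries plus 768 entries per literal
context); check them against your copy of the header. The name
lzma_probs_bytes is hypothetical.

```c
#include <stddef.h>

/* Assumed values of LZMA_BASE_SIZE and LZMA_LIT_SIZE from LzmaDecode.h;
   verify against the header you actually compile with. */
#define SKETCH_LZMA_BASE_SIZE 1846
#define SKETCH_LZMA_LIT_SIZE  768

/* Hypothetical helper (not an SDK function): bytes needed for state.Probs.
   prob_size stands for sizeof(CProb): 2 by default, 4 with _LZMA_PROB32. */
size_t lzma_probs_bytes(unsigned lc, unsigned lp, size_t prob_size)
{
    size_t num_probs = SKETCH_LZMA_BASE_SIZE +
                       ((size_t)SKETCH_LZMA_LIT_SIZE << (lc + lp));
    return num_probs * prob_size;
}
```

Under these assumptions the default lc=3, lp=0 needs 15980 bytes with
16-bit CProb and 31960 bytes with _LZMA_PROB32, while lc=0 needs only 5228
bytes, which is why -lc0 helps on memory-constrained targets.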


5) Main Decompressing

You must use one of the following interfaces:

5.1 Single-call Decompressing
-----------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDecode.h, LzmaDecode.c
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures (~16 KB for default settings)

  int res = LzmaDecode(&state,
      inStream, compressedSize, &inProcessed,
      outStream, outSize, &outProcessed);


5.2 Single-call Decompressing with input stream callback
--------------------------------------------------------
When to use: File->RAM or Flash->RAM decompressing.
Compile files: LzmaDecode.h, LzmaDecode.c
Compile defines: _LZMA_IN_CB
Memory Requirements:
  - Buffer for input stream: any size (for example, 16 KB)
  - Output buffer: uncompressed size
  - LZMA Internal Structures (~16 KB for default settings)

  typedef struct _CBuffer
  {
    ILzmaInCallback InCallback;
    FILE *File;   /* input file handle, read by MyReadFile below */
    unsigned char Buffer[kInBufferSize];
  } CBuffer;

  int LzmaReadCompressed(void *object, const unsigned char **buffer, SizeT *size)
  {
    CBuffer *bo = (CBuffer *)object;
    *buffer = bo->Buffer;
    *size = MyReadFile(bo->File, bo->Buffer, kInBufferSize);
    return LZMA_RESULT_OK;
  }

  CBuffer g_InBuffer;

  g_InBuffer.File = inFile;
  g_InBuffer.InCallback.Read = LzmaReadCompressed;
  int res = LzmaDecode(&state,
      &g_InBuffer.InCallback,
      outStream, outSize, &outProcessed);


5.3 Multi-call decompressing with output buffer
-----------------------------------------------
When to use: RAM->File decompressing
Compile files: LzmaDecode.h, LzmaDecode.c
Compile defines: _LZMA_OUT_READ
Memory Requirements:
 - Input buffer: compressed size
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures (~16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in stream properties)

  state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize);

  LzmaDecoderInit(&state);
  do
  {
    LzmaDecode(&state,
      inBuffer, inAvail, &inProcessed,
      g_OutBuffer, outAvail, &outProcessed);
    inAvail -= inProcessed;
    inBuffer += inProcessed;
  }
  while you need more bytes

  see LzmaTest.c for more details.


5.4 Multi-call decompressing with input callback and output buffer
------------------------------------------------------------------
When to use: File->File decompressing
Compile files: LzmaDecode.h, LzmaDecode.c
Compile defines: _LZMA_IN_CB, _LZMA_OUT_READ
Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures (~16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in stream properties)

  state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize);

  LzmaDecoderInit(&state);
  do
  {
    LzmaDecode(&state,
      &g_InBuffer.InCallback,   /* input callback, set up as in 5.2 */
      g_OutBuffer, outAvail, &outProcessed);
  }
  while you need more bytes

  see LzmaTest.c for more details.


5.5 Multi-call State Decompressing (zlib-like interface)
--------------------------------------------------------
When to use: file->file decompressing
Compile files: LzmaStateDecode.h, LzmaStateDecode.c

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures (~16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in stream properties)

  state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize);

  LzmaDecoderInit(&state);
  do
  {
    res = LzmaDecode(&state,
      inBuffer, inAvail, &inProcessed,
      g_OutBuffer, outAvail, &outProcessed,
      finishDecoding);
    inAvail -= inProcessed;
    inBuffer += inProcessed;
  }
  while you need more bytes

  see LzmaStateTest.c for more details.


6) Free all allocated blocks


LzmaDecodeSize.c is the size-optimized version of LzmaDecode.c.
But the compiled code of LzmaDecodeSize.c can be larger than
the compiled code of LzmaDecode.c, so it's better to use
LzmaDecode.c in most cases.


The LZMA decoder can return one of the following codes:

#define LZMA_RESULT_OK 0
#define LZMA_RESULT_DATA_ERROR 1

If you use a callback function for input data and you return some
error code, the LZMA Decoder also returns that code.


_LZMA_IN_CB    - Use callback for input data

_LZMA_OUT_READ - Use read function for output data

_LZMA_LOC_OPT  - Enable local speed optimizations inside code.
                 _LZMA_LOC_OPT is only for LzmaDecodeSize.c (size-optimized version).
                 _LZMA_LOC_OPT doesn't affect LzmaDecode.c (speed-optimized version)
                 and LzmaStateDecode.c

_LZMA_PROB32   - It can increase speed on some 32-bit CPUs,
                 but memory usage will be doubled in that case

_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler
                         and long is 32-bit.

_LZMA_SYSTEM_SIZE_T  - Define it if you want to use the system's size_t.
                       You can use it to enable support for 64-bit sizes


C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
The C++ LZMA code uses COM-like interfaces, so if you want to use it,
you may want to study the basics of COM/OLE.

By default, the LZMA Encoder contains all Match Finders.
But for compressing it's enough to have just one of them.
So to reduce the size of the compressing code you can define:
  #define COMPRESS_MF_BT
  #define COMPRESS_MF_BT4
and it will use only the bt4 match finder.


http://www.7-zip.org
http://www.7-zip.org/support.html