<html>
<head>
<title>Salza2 - Create compressed data from Common Lisp</title>
<style type="text/css">
  a, a:visited { text-decoration: none }
  a[href]:hover { text-decoration: underline }
  pre { background: #DDD; padding: 0.25em }
  p.download { color: red }
</style>
</head>

<body>

<h2>Salza2 - Create compressed data from Common Lisp</h2>

<blockquote class='abstract'>
<h3>Abstract</h3>

<p>Salza2 is a Common Lisp library for creating compressed data in the
ZLIB, DEFLATE, or GZIP data formats, described in
<a href="http://ietf.org/rfc/rfc1950.txt">RFC 1950</a>,
<a href="http://ietf.org/rfc/rfc1951.txt">RFC 1951</a>, and
<a href="http://ietf.org/rfc/rfc1952.txt">RFC 1952</a>, respectively.
It does not use any external libraries for compression. It does not
yet support decompression. Salza2 is available under
a <a href="COPYING.txt">BSD-like license</a>. The current version is
2.0.7, released on June 12, 2009.

<p class='download'>Download shortcut:

<p><a href="http://www.xach.com/lisp/salza2.tgz">http://www.xach.com/lisp/salza2.tgz</a>

</blockquote>

<h3>Contents</h3>

<ol>

<li> <a href='#sect-overview-and-limitations'>Overview and Limitations</a>

<li> <a href='#sect-dictionary'>Dictionary</a>

 <ul>
 <li> <a href='#sect-standard-compressors'>Standard Compressors</a>

  <ul>
  <li> <a href='#deflate-compressor'><tt>deflate-compressor</tt></a>
  <li> <a href='#zlib-compressor'><tt>zlib-compressor</tt></a>
  <li> <a href='#gzip-compressor'><tt>gzip-compressor</tt></a>
  <li> <a href='#callback'><tt>callback</tt></a>
  <li> <a href='#compress-octet'><tt>compress-octet</tt></a>
  <li> <a href='#compress-octet-vector'><tt>compress-octet-vector</tt></a>
  <li> <a href='#finish-compression'><tt>finish-compression</tt></a>
  <li> <a href='#reset'><tt>reset</tt></a>
  <li> <a href='#with-compressor'><tt>with-compressor</tt></a>
  </ul>

 <li> <a href='#sect-customizing-compressors'>Customizing Compressors</a>

  <ul>
  <li> <a href='#write-bits'><tt>write-bits</tt></a>
  <li> <a href='#write-octet'><tt>write-octet</tt></a>
  <li> <a href='#start-data-format'><tt>start-data-format</tt></a>
  <li> <a href='#process-input'><tt>process-input</tt></a>
  <li> <a href='#finish-data-format'><tt>finish-data-format</tt></a>
  </ul>

 <li> <a href='#sect-checksums'>Checksums</a>

  <ul>
  <li> <a href='#adler32-checksum'><tt>adler32-checksum</tt></a>
  <li> <a href='#crc32-checksum'><tt>crc32-checksum</tt></a>
  <li> <a href='#update'><tt>update</tt></a>
  <li> <a href='#result'><tt>result</tt></a>
  <li> <a href='#result-octets'><tt>result-octets</tt></a>
  <li> <a href='#reset-checksum'><tt>reset</tt></a>
  </ul>

 <li> <a href='#sect-shortcuts'>Shortcuts</a>

  <ul>
  <li> <a href='#make-stream-output-callback'><tt>make-stream-output-callback</tt></a>
  <li> <a href='#gzip-stream'><tt>gzip-stream</tt></a>
  <li> <a href='#gzip-file'><tt>gzip-file</tt></a>
  <li> <a href='#compress-data'><tt>compress-data</tt></a>
  </ul>
 </ul>

<li> <a href='#sect-references'>References</a>

<li> <a href='#sect-acknowledgements'>Acknowledgements</a>

<li> <a href='#sect-feedback'>Feedback</a>

</ol>

<a name='sect-overview-and-limitations'><h3>Overview and Limitations</h3></a>

<p>Salza2 provides an interface for creating a compressor object. This
object acts as a sink for octets (either individual octets or
vectors of octets), and is a source for octets in a compressed data
format. The compressed octet data is provided to a user-defined
callback that can write it to a stream, copy it to another vector,
etc.

<p>Salza2 has built-in compressors that support the ZLIB, DEFLATE, and
GZIP data formats. The classes and generic function protocol are
available to make it easy to support similar formats via subclassing
and new methods. ZLIB and GZIP are extensions to the DEFLATE format
and are implemented as subclasses
of <a href='#deflate-compressor'><tt>DEFLATE-COMPRESSOR</tt></a>
with a few methods implemented for the protocol.

<p>Salza2 is the successor
to <a href="http://cliki.net/Salza">Salza</a>, but it is not
backwards-compatible. Among other changes, Salza2 drops support for
compressing Lisp character data, since the compression formats are
octet-based and obtaining encoded octets from Lisp characters varies
from implementation to implementation.

<p>Several functions provide a simple interface to specific tasks,
such as gzipping a file or compressing a single vector.

<p>Salza2 does not decode compressed data. There is no support for
dynamically defined Huffman codes. There is currently no interface
for changing the tradeoff between compression speed and compressed
data size.

<a name='sect-dictionary'><h3>Dictionary</h3></a>

<p>The following symbols are exported from the SALZA2 package.


<a name='sect-standard-compressors'><h4>Standard Compressors</h4></a>

<p><a name='deflate-compressor'
><a name='zlib-compressor'><a name='gzip-compressor'>[Classes]</a></a></a><br>
<b>deflate-compressor</b><br>
<b>zlib-compressor</b><br>
<b>gzip-compressor</b>

<blockquote>
Instances of these classes may be created via <tt>make-instance</tt>. The only
supported initarg is <tt>:CALLBACK</tt>.
See <a href='#callback'><tt>CALLBACK</tt></a> for the expected value.
</blockquote>


<p><a name='callback'>[Accessor]</a><br>
<b>callback</b> <i>compressor</i> => <i>callback</i><br>
(<tt>setf</tt> (<b>callback</b> <i>compressor</i>) <i>new-value</i>)
=> <i>new-value</i>

<blockquote>
Gets or sets the callback function of <i>compressor</i>. The callback
should be a function of two arguments, an octet vector and an end
index. It should treat the octets from the start of the vector up to,
but not including, the end index as the next part of the compressed
output data stream of the compressor.
See <a href='#make-stream-output-callback'><tt>MAKE-STREAM-OUTPUT-CALLBACK</tt></a>
for an example callback.

</blockquote>
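
<p>As a minimal sketch of a hand-written callback (not taken from the
Salza2 sources), the following collects all compressed output in an
adjustable vector. The names <tt>make-vector-callback</tt>
and <tt>gzip-octets</tt> are hypothetical, and the code assumes Salza2
is already loaded. Each octet is copied out before the callback
returns, on the assumption that the compressor may reuse its buffer
afterwards.

<pre>
(defun make-vector-callback (output)
  "Return a callback that appends compressed octets to OUTPUT, an
adjustable octet vector with a fill pointer."
  (lambda (buffer end)
    ;; Only the octets below END are valid output.
    (loop for i from 0 below end
          do (vector-push-extend (aref buffer i) output))))

(defun gzip-octets (octets)
  "Compress OCTETS, a simple vector of (unsigned-byte 8), and return
the GZIP output as an adjustable octet vector."
  (let* ((output (make-array 0 :element-type '(unsigned-byte 8)
                               :adjustable t :fill-pointer 0))
         (compressor (make-instance 'salza2:gzip-compressor
                                    :callback (make-vector-callback output))))
    (salza2:compress-octet-vector octets compressor)
    ;; Required: flushes pending data and concludes the data format.
    (salza2:finish-compression compressor)
    output))
</pre>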

<p><a name='compress-octet'>[Function]</a><br>
<b>compress-octet</b> <i>octet</i> <i>compressor</i> => |

<blockquote>
Adds <i>octet</i> to <i>compressor</i> to be compressed.
</blockquote>


<p><a name='compress-octet-vector'>[Function]</a><br>
<b>compress-octet-vector</b> <i>vector</i> <i>compressor</i> <tt>&amp;key</tt>
<i>start</i> <i>end</i> => |

<blockquote>
Adds the octets from <i>vector</i> to <i>compressor</i> to be
compressed, beginning with the octet at <i>start</i> and ending at the
octet at
<i>end</i> - 1. If <i>start</i> is not specified, it defaults to
0. If <i>end</i> is not specified, it defaults to the total length
of <i>vector</i>. Equivalent to (but much more efficient than) the
following:

<pre>
(loop for i from start below end
      do (compress-octet (aref vector i) compressor))
</pre>

</blockquote>
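
<p>The <i>end</i> argument matters most when a fixed buffer is refilled
repeatedly and only partly filled on the last read. The following
sketch feeds a stream to a ZLIB compressor in chunks. The
name <tt>zlib-stream</tt> and the 8192-octet buffer size are
illustrative choices, not part of Salza2; the code assumes Salza2 is
loaded and relies only on
<a href='#with-compressor'><tt>WITH-COMPRESSOR</tt></a>
and <a href='#make-stream-output-callback'><tt>MAKE-STREAM-OUTPUT-CALLBACK</tt></a>
as documented here.

<pre>
(defun zlib-stream (input output)
  "Compress all octets read from INPUT and write ZLIB data to OUTPUT.
Both streams must have element type (unsigned-byte 8)."
  (let ((buffer (make-array 8192 :element-type '(unsigned-byte 8))))
    (salza2:with-compressor (compressor 'salza2:zlib-compressor
                                        :callback (salza2:make-stream-output-callback output))
      (loop
        (let ((end (read-sequence buffer input)))
          (when (zerop end)
            (return))
          ;; BUFFER may be only partly filled, so bound it with :END.
          (salza2:compress-octet-vector buffer compressor :end end))))))
</pre>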


<p><a name='finish-compression'>[Generic function]</a><br>
<b>finish-compression</b> <i>compressor</i> => |

<blockquote>Compresses any pending data, concludes the data format
for <i>compressor</i> with
<a href='#finish-data-format'><tt>FINISH-DATA-FORMAT</tt></a>, and
invokes the user callback for the final octets of the compressed data
format. This function must be called at the end of compression to
ensure the validity of the data format; it is called implicitly
by <a href='#with-compressor'><tt>WITH-COMPRESSOR</tt></a>.

</blockquote>


<p><a name='reset'>[Generic function]</a><br>
<b>reset</b> <i>compressor</i> => |

<blockquote>
The default method
for <a href='#deflate-compressor'><tt>DEFLATE-COMPRESSOR</tt></a>
objects resets the internal state of <i>compressor</i> and
calls <a href='#start-data-format'><tt>START-DATA-FORMAT</tt></a>. This
allows the re-use of a single compressor object for multiple
compression tasks.
</blockquote>
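
<p>A sketch of such re-use, assuming Salza2 is loaded: one compressor
measures the DEFLATE-compressed size of several independent vectors.
The function name <tt>deflate-sizes</tt> is hypothetical. Each task
ends with <a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>
before the compressor is reset for the next one.

<pre>
(defun deflate-sizes (vectors)
  "Return a list with the DEFLATE-compressed size of each octet vector
in VECTORS, using a single compressor for all of them."
  (let* ((size 0)
         (compressor (make-instance 'salza2:deflate-compressor
                                    :callback (lambda (buffer end)
                                                (declare (ignore buffer))
                                                ;; Count output octets only.
                                                (incf size end)))))
    (loop for vector in vectors
          collect (progn
                    (setf size 0)
                    (salza2:compress-octet-vector vector compressor)
                    (salza2:finish-compression compressor)
                    ;; Prepare the compressor for the next, independent task.
                    (salza2:reset compressor)
                    size))))
</pre>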


<p><a name='with-compressor'>[Macro]</a><br>
<b>with-compressor</b> (<i>var</i> <i>class</i>
<tt>&amp;rest</tt> <i>initargs</i>
<tt>&amp;key</tt> <tt>&amp;allow-other-keys</tt>)
<tt>&amp;body</tt> <i>body</i> => |

<blockquote>
Evaluates <i>body</i> with <i>var</i> bound to a new compressor
created as if
by <tt>(apply&nbsp;#'make-instance&nbsp;class&nbsp;initargs)</tt>.
<a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>
is implicitly called on the compressor at the end of evaluation.
</blockquote>
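
<p>For illustration, a small sketch that writes an octet vector to a
GZIP file, assuming Salza2 is loaded. The
name <tt>write-gzip-file</tt> is hypothetical. The class name is
quoted on the assumption that the <i>class</i> argument is evaluated,
as the <tt>make-instance</tt> description above suggests.

<pre>
(defun write-gzip-file (octets pathname)
  "Write the octet vector OCTETS to PATHNAME in GZIP format."
  (with-open-file (stream pathname :direction :output
                                   :if-exists :supersede
                                   :element-type '(unsigned-byte 8))
    (salza2:with-compressor (compressor 'salza2:gzip-compressor
                                        :callback (salza2:make-stream-output-callback stream))
      ;; No explicit FINISH-COMPRESSION is needed here; the macro
      ;; calls it when the body is done.
      (salza2:compress-octet-vector octets compressor)))
  (probe-file pathname))
</pre>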


<a name='sect-customizing-compressors'><h4>Customizing Compressors</h4></a>

<p>Compressor objects follow a protocol that makes it easy to create
specialized data formats. The ZLIB data format is essentially the
same as the DEFLATE format with an additional header and a trailing
checksum; this is implemented by creating a new class and adding a
few new methods to the generic functions below.

<p>For example, consider a new compressed data format FOO that
encapsulates a DEFLATE data stream but adds four signature octets,
F0 0D 00 D1, to the start of the output data stream, and adds a
trailing 32-bit length value, MSB first, after the end. It could be
implemented like this:

<pre>
(defclass foo-compressor (deflate-compressor)
  ((data-length
    :initarg :data-length
    :accessor data-length))
  (:default-initargs
   :data-length 0))

(defmethod <a href='#start-data-format'>start-data-format</a> :before ((compressor foo-compressor))
  (<a href='#write-octet'>write-octet</a> #xF0 compressor)
  (write-octet #x0D compressor)
  (write-octet #x00 compressor)
  (write-octet #xD1 compressor))

(defmethod <a href='#process-input'>process-input</a> :after ((compressor foo-compressor) input start count)
  (declare (ignore input start))
  (incf (data-length compressor) count))

(defmethod <a href='#finish-data-format'>finish-data-format</a> :after ((compressor foo-compressor))
  (let ((length (data-length compressor)))
    (write-octet (ldb (byte 8 24) length) compressor)
    (write-octet (ldb (byte 8 16) length) compressor)
    (write-octet (ldb (byte 8  8) length) compressor)
    (write-octet (ldb (byte 8  0) length) compressor)))

(defmethod <a href='#reset'>reset</a> :after ((compressor foo-compressor))
  (setf (data-length compressor) 0))
</pre>


<p><a name='write-bits'>[Function]</a><br>
<b>write-bits</b> <i>code</i> <i>size</i> <i>compressor</i> => |

<blockquote>
Writes <i>size</i> low bits of the integer <i>code</i> to the output
buffer of <i>compressor</i>. Follows the bit packing layout described
in <a href="http://ietf.org/rfc/rfc1951.txt">RFC 1951</a>. The bits
are not compressed, but become literal parts of the output stream.
</blockquote>


<p><a name='write-octet'>[Function]</a><br>
<b>write-octet</b> <i>octet</i> <i>compressor</i> => |

<blockquote>
Writes <i>octet</i> to the output buffer of <i>compressor</i>. Bits of the
octet are <i>not</i> packed; the octet is added to the output buffer
at the next octet boundary. The octet is not compressed, but becomes a
literal part of the output stream.
</blockquote>


<p><a name='start-data-format'>[Generic function]</a><br>
<b>start-data-format</b> <i>compressor</i> => |

<blockquote>
Outputs any prologue bits or octets needed to produce a valid
compressed data stream for <i>compressor</i>. Called from
initialize-instance and <a href='#reset'><tt>RESET</tt></a> for
subclasses of deflate-compressor. Should not be called directly, but
subclasses may add methods to customize what literal data is added to
the beginning of the output buffer.
</blockquote>


<p><a name='process-input'>[Generic function]</a><br>
<b>process-input</b> <i>compressor</i> <i>input</i>
<i>start</i> <i>count</i> => |

<blockquote>
Called when <i>count</i> octets of the octet vector <i>input</i>,
starting from <i>start</i>, are about to be compressed. This generic
function should not be called directly, but may be specialized.

<p>This is useful for data formats that must maintain information about
the uncompressed contents of a compressed data stream, such as
checksums or total data length.
</blockquote>
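
<p>As a sketch of the checksum case, the following hypothetical
subclass keeps a running CRC32 of the uncompressed input by
specializing <tt>process-input</tt>. The
names <tt>crc-tracking-compressor</tt>, <tt>checksum</tt>,
and <tt>deflate-and-crc32</tt> are illustrative and not part of
Salza2; the checksum classes and functions are the ones described
under <a href='#sect-checksums'>Checksums</a>. The code assumes Salza2
is loaded.

<pre>
(defclass crc-tracking-compressor (salza2:deflate-compressor)
  ((checksum :initform (make-instance 'salza2:crc32-checksum)
             :reader checksum)))

(defmethod salza2:process-input :after ((compressor crc-tracking-compressor)
                                        input start count)
  ;; Fold each run of input octets into the running checksum.
  (salza2:update (checksum compressor) input start count))

(defmethod salza2:reset :after ((compressor crc-tracking-compressor))
  ;; Keep the checksum in step when the compressor is re-used.
  (salza2:reset (checksum compressor)))

(defun deflate-and-crc32 (octets)
  "Compress OCTETS, discarding the output, and return the CRC32 of the
uncompressed input as an integer."
  (let ((compressor (make-instance 'crc-tracking-compressor
                                   :callback (lambda (buffer end)
                                               (declare (ignore buffer end))))))
    (salza2:compress-octet-vector octets compressor)
    ;; Pending input is only guaranteed to be processed after this call.
    (salza2:finish-compression compressor)
    (salza2:result (checksum compressor))))
</pre>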


<p><a name='finish-data-format'>[Generic function]</a><br>
<b>finish-data-format</b> <i>compressor</i> => |

<blockquote>
Called
by <a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>. Outputs
any epilogue bits or octets needed to produce a valid compressed data
stream for <i>compressor</i>. This generic function should not be called
directly, but may be specialized.
</blockquote>


<a name='sect-checksums'><h4>Checksums</h4></a>

<p>Checksums are used in several data formats to check data
integrity. For example, PNG uses a CRC32 checksum for its chunks of
data. Salza2 exports support for two common checksums.

<p><a name='adler32-checksum'><a name='crc32-checksum'>[Standard classes]</a></a><br>
<b>adler32-checksum</b><br>
<b>crc32-checksum</b>

<blockquote>
Instances of these classes may be created directly with
<tt>make-instance</tt>.
</blockquote>

<p><a name='update'>[Generic function]</a><br>
<b>update</b> <i>checksum</i> <i>buffer</i> <i>start</i> <i>count</i>
=> |

<blockquote>
Updates <i>checksum</i> with <i>count</i> octets from the octet
vector <i>buffer</i>, starting at <i>start</i>.
</blockquote>


<p><a name='result'>[Generic function]</a><br>
<b>result</b> <i>checksum</i> => <i>result</i>

<blockquote>
Returns the accumulated value of <i>checksum</i> as an integer.
</blockquote>


<p><a name='result-octets'>[Generic function]</a><br>
<b>result-octets</b> <i>checksum</i> => <i>result-list</i>

<blockquote>
Returns the individual octets of <i>checksum</i> as a list of octets,
in MSB order.
</blockquote>

<p><a name='reset-checksum'>[Generic function]</a><br>
<b>reset</b> <i>checksum</i> => |

<blockquote>
The default method for checksum objects resets the internal state
of <i>checksum</i> so it may be re-used.
</blockquote>
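
<p>A brief sketch of the checksum functions used on their own,
assuming Salza2 is loaded. The name <tt>checksum-octets</tt> is
hypothetical. The input is deliberately fed in two pieces to show that
updates accumulate.

<pre>
(defun checksum-octets (class octets)
  "Checksum the octet vector OCTETS. CLASS is a checksum class name
such as 'salza2:crc32-checksum or 'salza2:adler32-checksum. Returns
the integer result and the list of result octets."
  (let ((checksum (make-instance class))
        (half (floor (length octets) 2)))
    ;; Two partial updates accumulate into one running checksum.
    (salza2:update checksum octets 0 half)
    (salza2:update checksum octets half (- (length octets) half))
    (values (salza2:result checksum)
            (salza2:result-octets checksum))))
</pre>

<p>With the standard CRC-32 definition, the ASCII octets of
"123456789" should give <tt>#xCBF43926</tt>.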


<a name='sect-shortcuts'><h4>Shortcuts</h4></a>

<p>Some shortcuts for common compression tasks are available.

<p><a name='make-stream-output-callback'>[Function]</a><br>
<b>make-stream-output-callback</b> <i>stream</i> => <i>callback</i>

<blockquote>
Creates and returns a callback function that writes all compressed
data to <i>stream</i>. It is defined like this:

<pre>
(defun make-stream-output-callback (stream)
  (lambda (buffer end)
    (write-sequence buffer stream :end end)))
</pre>
</blockquote>

<p><a name='gzip-stream'>[Function]</a><br>
<b>gzip-stream</b> <i>input-stream</i> <i>output-stream</i> => |

<blockquote>
Compresses all data read from <i>input-stream</i> and writes the
compressed data to <i>output-stream</i>.
</blockquote>


<p><a name='gzip-file'>[Function]</a><br>
<b>gzip-file</b> <i>input-file</i> <i>output-file</i> => <i>pathname</i>

<blockquote>
Compresses <i>input-file</i> and writes the compressed data
to <i>output-file</i>.
</blockquote>
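
<p>A usage sketch, assuming Salza2 is loaded. The helper
name <tt>gzip-alongside</tt> and the ".gz" naming scheme are
illustrative only.

<pre>
(defun gzip-alongside (pathname)
  "Write a GZIP-compressed copy of the file PATHNAME next to it, with
\".gz\" appended to its name, and return what GZIP-FILE returns."
  (let ((output (concatenate 'string (namestring pathname) ".gz")))
    (salza2:gzip-file pathname output)))
</pre>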


<p><a name='compress-data'>[Function]</a><br>
<b>compress-data</b> <i>data</i> <i>compressor-designator</i>
<tt>&amp;rest</tt> <i>initargs</i> => <i>compressed-data</i>

<blockquote>
Compresses the octet vector <i>data</i> and returns the compressed
data as an octet vector. <i>compressor-designator</i> should be either
a compressor object, designating itself, or a symbol, designating a
compressor created as if by <tt>(apply #'make-instance
compressor-designator initargs)</tt>.

<p>For example:

<pre>
* <b>(compress-data (sb-ext:string-to-octets "Hello, hello, hello, hello world.")
                 'zlib-compressor)</b>
#(8 153 243 72 205 201 201 215 81 200 192 164 20 202 243 139 114 82 244 0 194 64 11 139)
</pre>
</blockquote>


<a name='sect-references'><h3>References</h3></a>

<ul>

<li> Deutsch and
Gailly, <a href='http://ietf.org/rfc/rfc1950.txt'>ZLIB Compressed Data
Format Specification version 3.3 (RFC 1950)</a>

<li> Deutsch, <a href='http://ietf.org/rfc/rfc1951.txt'>DEFLATE
Compressed Data Format Specification version 1.3 (RFC 1951)</a>

<li> Deutsch, <a href='http://ietf.org/rfc/rfc1952.txt'>GZIP file
format specification version 4.3 (RFC 1952)</a>

<li>
Wikipedia, <a href='http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm'>Rabin-Karp
string search algorithm</a>

</ul>


<a name='sect-acknowledgements'><h3>Acknowledgements</h3></a>

<p>Thanks to Paul Khuong for his help optimizing the modulo-8191
hashing.

<p>Thanks to Austin Haas for providing some test SWF files
demonstrating a data format bug.

<a name='sect-feedback'><h3>Feedback</h3></a>

<p>Please direct any comments, questions, bug reports, or other
feedback to <a href='mailto:xach@xach.com'>Zach Beane</a>.