<html>
<head>
<title>Salza2 - Create compressed data from Common Lisp</title>
<style type="text/css">
  a, a:visited { text-decoration: none }
  a[href]:hover { text-decoration: underline }
  pre { background: #DDD; padding: 0.25em }
  p.download { color: red }
</style>
</head>
<body>

<h2>Salza2 - Create compressed data from Common Lisp</h2>

<blockquote class='abstract'>
<h3>Abstract</h3>

<p>Salza2 is a Common Lisp library for creating compressed data in the
ZLIB, DEFLATE, or GZIP data formats, described in
<a href="http://ietf.org/rfc/rfc1950.txt">RFC 1950</a>,
<a href="http://ietf.org/rfc/rfc1951.txt">RFC 1951</a>, and
<a href="http://ietf.org/rfc/rfc1952.txt">RFC 1952</a>, respectively.
It does not use any external libraries for compression. It does not
yet support decompression. Salza2 is available under
a <a href="COPYING.txt">BSD-like license</a>. The current version is
2.0, released on January 1st, 2008.

<p class='download'>Download shortcut:

<p><a href="http://www.xach.com/lisp/salza2.tgz">http://www.xach.com/lisp/salza2.tgz</a>

</blockquote>
<h3>Contents</h3>

<ol>

<li> <a href='#sect-overview-and-limitations'>Overview and Limitations</a>

<li> <a href='#sect-dictionary'>Dictionary</a>

  <ul>
  <li> <a href='#sect-standard-compressors'>Standard Compressors</a>

    <ul>
    <li> <a href='#deflate-compressor'><tt>deflate-compressor</tt></a>
    <li> <a href='#zlib-compressor'><tt>zlib-compressor</tt></a>
    <li> <a href='#gzip-compressor'><tt>gzip-compressor</tt></a>
    <li> <a href='#callback'><tt>callback</tt></a>
    <li> <a href='#compress-octet'><tt>compress-octet</tt></a>
    <li> <a href='#compress-octet-vector'><tt>compress-octet-vector</tt></a>
    <li> <a href='#finish-compression'><tt>finish-compression</tt></a>
    <li> <a href='#reset'><tt>reset</tt></a>
    <li> <a href='#with-compressor'><tt>with-compressor</tt></a>
    </ul>

  <li> <a href='#sect-customizing-compressors'>Customizing Compressors</a>

    <ul>
    <li> <a href='#write-bits'><tt>write-bits</tt></a>
    <li> <a href='#write-octet'><tt>write-octet</tt></a>
    <li> <a href='#start-data-format'><tt>start-data-format</tt></a>
    <li> <a href='#process-input'><tt>process-input</tt></a>
    <li> <a href='#finish-data-format'><tt>finish-data-format</tt></a>
    </ul>

  <li> <a href='#sect-checksums'>Checksums</a>

    <ul>
    <li> <a href='#adler32-checksum'><tt>adler32-checksum</tt></a>
    <li> <a href='#crc32-checksum'><tt>crc32-checksum</tt></a>
    <li> <a href='#update'><tt>update</tt></a>
    <li> <a href='#result'><tt>result</tt></a>
    <li> <a href='#result-octets'><tt>result-octets</tt></a>
    <li> <a href='#reset-checksum'><tt>reset</tt></a>
    </ul>

  <li> <a href='#sect-shortcuts'>Shortcuts</a>

    <ul>
    <li> <a href='#make-stream-output-callback'><tt>make-stream-output-callback</tt></a>
    <li> <a href='#gzip-stream'><tt>gzip-stream</tt></a>
    <li> <a href='#gzip-file'><tt>gzip-file</tt></a>
    <li> <a href='#compress-data'><tt>compress-data</tt></a>
    </ul>
  </ul>

<li> <a href='#sect-references'>References</a>

<li> <a href='#sect-feedback'>Acknowledgements and Feedback</a>

</ol>
<a name='sect-overview-and-limitations'><h3>Overview and Limitations</h3></a>

<p>Salza2 provides an interface for creating a compressor object. This
object acts as a sink for octets (either individual octets or
vectors of octets), and is a source for octets in a compressed data
format. The compressed octet data is provided to a user-defined
callback that can write it to a stream, copy it to another vector,
etc.

<p>Salza2 has built-in compressors that support the ZLIB, DEFLATE, and
GZIP data formats. The classes and generic function protocol are
available to make it easy to support similar formats via subclassing
and new methods. ZLIB and GZIP are extensions to the DEFLATE format
and are implemented as subclasses
of <a href='#deflate-compressor'><tt>DEFLATE-COMPRESSOR</tt></a>
with a few methods implemented for the protocol.

<p>Salza2 is the successor
to <a href="http://cliki.net/Salza">Salza</a>, but it is not
backwards-compatible. Among other changes, Salza2 drops support for
compressing Lisp character data, since the compression formats are
octet-based and obtaining encoded octets from Lisp characters varies
from implementation to implementation.

<p>There are a number of functions that provide a simple interface to
specific tasks such as gzipping a file or compressing a single
vector.

<p>Salza2 does not decode compressed data. There is no support for
dynamically defined Huffman codes. There is currently no interface
for changing the tradeoff between compression speed and compressed
data size.
<a name='sect-dictionary'><h3>Dictionary</h3></a>

<p>The following symbols are exported from the SALZA2 package.

<a name='sect-standard-compressors'><h4>Standard Compressors</h4></a>

<p><a name='deflate-compressor'
><a name='zlib-compressor'><a name='gzip-compressor'>[Classes]</a></a></a><br>
<b>deflate-compressor</b><br>
<b>zlib-compressor</b><br>
<b>gzip-compressor</b>

<blockquote>
Instances of these classes may be created via make-instance. The only
supported initarg is <tt>:CALLBACK</tt>.
See <a href='#callback'><tt>CALLBACK</tt></a> for the expected value.
</blockquote>

<p><a name='callback'>[Accessor]</a><br>
<b>callback</b> <i>compressor</i> => <i>callback</i><br>
(<tt>setf</tt> (<b>callback</b> <i>compressor</i>) <i>new-value</i>)
=> <i>new-value</i>

<blockquote>
Gets or sets the callback function of <i>compressor</i>. The callback
should be a function of two arguments, an octet vector and an end
index, and it should process all octets from the start of the vector
below the end index as the compressed output data stream of the
compressor. See <a href='#make-stream-output-callback'><tt>MAKE-STREAM-OUTPUT-CALLBACK</tt></a>
for an example callback.

</blockquote>
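<blockquote>
<p>As an illustration, a callback can also accumulate the compressed
octets in memory instead of writing them to a stream. The following is
a minimal sketch, not part of Salza2: <tt>make-collecting-callback</tt>
is a hypothetical helper, and it copies each chunk on the assumption
that the compressor may reuse the vector it passes to the callback.

<pre>
(defun make-collecting-callback ()
  "Return two functions: a two-argument callback that records each
chunk of output, and a zero-argument function that returns every octet
recorded so far as one octet vector. Hypothetical helper."
  (let ((chunks '()))
    (values
     (lambda (buffer end)
       ;; Copy the octets below END; BUFFER itself is not retained.
       (push (subseq buffer 0 end) chunks))
     (lambda ()
       (let* ((total (reduce #'+ chunks :key #'length))
              (result (make-array total :element-type '(unsigned-byte 8)))
              (position 0))
         ;; CHUNKS is newest-first, so reverse it to restore output order.
         (dolist (chunk (reverse chunks) result)
           (replace result chunk :start1 position)
           (incf position (length chunk))))))))
</pre>

<p>It might be used along these lines:

<pre>
(multiple-value-bind (callback collected-octets) (make-collecting-callback)
  (let ((compressor (make-instance 'salza2:deflate-compressor
                                   :callback callback)))
    (salza2:compress-octet 65 compressor)
    (salza2:finish-compression compressor)
    (funcall collected-octets)))
</pre>
</blockquote>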
<p><a name='compress-octet'>[Function]</a><br>
<b>compress-octet</b> <i>octet</i> <i>compressor</i> => |

<blockquote>
Adds <i>octet</i> to <i>compressor</i> to be compressed.
</blockquote>

<p><a name='compress-octet-vector'>[Function]</a><br>
<b>compress-octet-vector</b> <i>vector</i> <i>compressor</i> <tt>&amp;key</tt>
<i>start</i> <i>end</i> => |

<blockquote>
Adds the octets from <i>vector</i> to <i>compressor</i> to be
compressed, beginning with the octet at <i>start</i> and ending at the
octet at
<i>end</i> - 1. If <i>start</i> is not specified, it defaults to
0. If <i>end</i> is not specified, it defaults to the total length
of <i>vector</i>. Equivalent to (but much more efficient than) the
following:

<pre>
(loop for i from start below end
      do (compress-octet (aref vector i) compressor))
</pre>

</blockquote>

<p><a name='finish-compression'>[Generic function]</a><br>
<b>finish-compression</b> <i>compressor</i> => |

<blockquote>Compresses any pending data, concludes the data format
for <i>compressor</i> with
<a href='#finish-data-format'><tt>FINISH-DATA-FORMAT</tt></a>, and
invokes the user callback for the final octets of the compressed data
format. This function must be called at the end of compression to
ensure the validity of the data format; it is called implicitly
by <a href='#with-compressor'><tt>WITH-COMPRESSOR</tt></a>.

</blockquote>

<p><a name='reset'>[Generic function]</a><br>
<b>reset</b> <i>compressor</i> => |

<blockquote>
The default method
for <a href='#deflate-compressor'><tt>DEFLATE-COMPRESSOR</tt></a>
objects resets the internal state of <i>compressor</i> and
calls <a href='#start-data-format'><tt>START-DATA-FORMAT</tt></a>. This
allows the re-use of a single compressor object for multiple
compression tasks.
</blockquote>
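<blockquote>
<p>To illustrate re-use, the sketch below compresses several octet
vectors to separate files with one compressor. It is not part of
Salza2: <tt>gzip-vectors-to-files</tt> is a hypothetical helper, and it
assumes that calling <tt>reset</tt>
after <tt>finish-compression</tt> leaves the compressor ready for a
new, independent data stream, as the description above suggests.

<pre>
(defun gzip-vectors-to-files (vectors pathnames)
  "Write each octet vector in VECTORS, gzip-compressed, to the file at
the same position in PATHNAMES. Hypothetical helper."
  (let ((compressor (make-instance 'salza2:gzip-compressor)))
    (loop for vector in vectors
          for pathname in pathnames
          do (with-open-file (stream pathname
                                     :direction :output
                                     :element-type '(unsigned-byte 8)
                                     :if-exists :supersede)
               ;; Point the compressor at the current output file.
               (setf (salza2:callback compressor)
                     (salza2:make-stream-output-callback stream))
               (salza2:compress-octet-vector vector compressor)
               (salza2:finish-compression compressor)
               ;; Clear internal state before the next vector.
               (salza2:reset compressor)))
    pathnames))
</pre>
</blockquote>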
<p><a name='with-compressor'>[Macro]</a><br>
<b>with-compressor</b> (<i>var</i> <i>class</i>
<tt>&amp;rest</tt> <i>initargs</i>
<tt>&amp;key</tt> <tt>&amp;allow-other-keys</tt>)
<tt>&amp;body</tt> <i>body</i> => |

<blockquote>
Evaluates <i>body</i> with <i>var</i> bound to a new compressor
created as if
by <tt>(apply&nbsp;#'make-instance&nbsp;class&nbsp;initargs)</tt>.
<a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>
is implicitly called on the compressor at the end of evaluation.
</blockquote>
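<blockquote>
<p>For illustration, the following sketch writes an octet vector to a
file in GZIP format. <tt>gzip-octets-to-file</tt> is a hypothetical
helper, not part of Salza2, and the class argument is quoted on the
assumption that it is evaluated, as the <tt>make-instance</tt> form
above implies.

<pre>
(defun gzip-octets-to-file (octets pathname)
  "Write the octet vector OCTETS to PATHNAME in GZIP format and return
PATHNAME. Hypothetical helper."
  (with-open-file (stream pathname
                          :direction :output
                          :element-type '(unsigned-byte 8)
                          :if-exists :supersede)
    (salza2:with-compressor (compressor 'salza2:gzip-compressor
                                        :callback (salza2:make-stream-output-callback stream))
      ;; FINISH-COMPRESSION is called implicitly when this body exits,
      ;; while STREAM is still open.
      (salza2:compress-octet-vector octets compressor)))
  pathname)
</pre>
</blockquote>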
<a name='sect-customizing-compressors'><h4>Customizing Compressors</h4></a>

<p>Compressor objects follow a protocol that makes it easy to create
specialized data formats. The ZLIB data format is essentially the
same as the DEFLATE format with an additional header and a trailing
checksum; this is implemented by creating a new class and adding a
few new methods to the generic functions below.

<p>For example, consider a new compressed data format FOO that
encapsulates a DEFLATE data stream but adds four signature octets,
F0 0D 00 D1, to the start of the output data stream, and adds a
trailing 32-bit length value, MSB first, after the end. It could be
implemented like this:

<pre>
(defclass foo-compressor (deflate-compressor)
  ((data-length
    :initarg :data-length
    :accessor data-length))
  (:default-initargs
   :data-length 0))

(defmethod <a href='#start-data-format'>start-data-format</a> :before ((compressor foo-compressor))
  (<a href='#write-octet'>write-octet</a> #xF0 compressor)
  (write-octet #x0D compressor)
  (write-octet #x00 compressor)
  (write-octet #xD1 compressor))

(defmethod <a href='#process-input'>process-input</a> :after ((compressor foo-compressor) input start count)
  (declare (ignore input start))
  (incf (data-length compressor) count))

(defmethod <a href='#finish-data-format'>finish-data-format</a> :after ((compressor foo-compressor))
  (let ((length (data-length compressor)))
    (write-octet (ldb (byte 8 24) length) compressor)
    (write-octet (ldb (byte 8 16) length) compressor)
    (write-octet (ldb (byte 8  8) length) compressor)
    (write-octet (ldb (byte 8  0) length) compressor)))

(defmethod <a href='#reset'>reset</a> :after ((compressor foo-compressor))
  (setf (data-length compressor) 0))
</pre>
<p><a name='write-bits'>[Function]</a><br>
<b>write-bits</b> <i>code</i> <i>size</i> <i>compressor</i> => |

<blockquote>
Writes <i>size</i> low bits of the integer <i>code</i> to the output
buffer of <i>compressor</i>. Follows the bit packing layout described
in <a href="http://ietf.org/rfc/rfc1951.txt">RFC 1951</a>. The bits
are not compressed, but become literal parts of the output stream.
</blockquote>
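<blockquote>
<p>The RFC 1951 layout fills each octet starting from its
least-significant bit. The sketch below models that layout in plain
Common Lisp for non-Huffman fields; <tt>pack-bits</tt> is a
hypothetical function written for this illustration, and it does not
show how Salza2 itself buffers its output.

<pre>
(defun pack-bits (fields)
  "FIELDS is a list of (CODE SIZE) pairs. Pack the low SIZE bits of
each CODE in order, least-significant bit first, and return the
resulting octets as a list. The last octet is zero-padded."
  (let ((accumulator 0)
        (bit-count 0)
        (octets '()))
    (loop for (code size) in fields
          do ;; Place the new bits above the bits already pending.
             (setf accumulator
                   (logior accumulator
                           (ash (ldb (byte size 0) code) bit-count)))
             (incf bit-count size)
             ;; Emit every complete octet, lowest bits first.
             (loop repeat (floor bit-count 8)
                   do (push (ldb (byte 8 0) accumulator) octets)
                      (setf accumulator (ash accumulator -8))
                      (decf bit-count 8)))
    (when (plusp bit-count)
      (push accumulator octets))
    (nreverse octets)))
</pre>

<p>For example, a DEFLATE block header with BFINAL set to 1 (one bit)
followed by BTYPE set to 1 (two bits) packs into a single octet:

<pre>
* <b>(pack-bits '((1 1) (1 2)))</b>
(3)
* <b>(pack-bits '((#b101 3) (#b11111111 8)))</b>
(253 7)
</pre>
</blockquote>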
<p><a name='write-octet'>[Function]</a><br>
<b>write-octet</b> <i>octet</i> <i>compressor</i> => |

<blockquote>
Writes <i>octet</i> to the output buffer of <i>compressor</i>. Bits of the
octet are <i>not</i> packed; the octet is added to the output buffer
at the next octet boundary. The octet is not compressed, but becomes a
literal part of the output stream.
</blockquote>

<p><a name='start-data-format'>[Generic function]</a><br>
<b>start-data-format</b> <i>compressor</i> => |

<blockquote>
Outputs any prologue bits or octets needed to produce a valid
compressed data stream for <i>compressor</i>. Called from
initialize-instance and <a href='#reset'><tt>RESET</tt></a> for
subclasses of deflate-compressor. Should not be called directly, but
subclasses may add methods to customize what literal data is added to
the beginning of the output buffer.
</blockquote>

<p><a name='process-input'>[Generic function]</a><br>
<b>process-input</b> <i>compressor</i> <i>input</i>
<i>start</i> <i>count</i> => |

<blockquote>
Called when <i>count</i> octets of the octet vector <i>input</i>,
starting from <i>start</i>, are about to be compressed. This generic
function should not be called directly, but may be specialized.

<p>This is useful for data formats that must maintain information about
the uncompressed contents of a compressed data stream, such as
checksums or total data length.
</blockquote>

<p><a name='finish-data-format'>[Generic function]</a><br>
<b>finish-data-format</b> <i>compressor</i> => |

<blockquote>
Called
by <a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>. Outputs
any epilogue bits or octets needed to produce a valid compressed data
stream for <i>compressor</i>. This generic function should not be called
directly, but may be specialized.
</blockquote>
<a name='sect-checksums'><h4>Checksums</h4></a>

<p>Checksums are used in several data formats to check data
integrity. For example, PNG uses a CRC32 checksum for its chunks of
data. Salza2 exports support for two common checksums.

<p><a name='adler32-checksum'><a name='crc32-checksum'>[Standard classes]</a></a><br>
<b>adler32-checksum</b><br>
<b>crc32-checksum</b>

<blockquote>
Instances of these classes may be created directly with
make-instance.
</blockquote>

<p><a name='update'>[Generic function]</a><br>
<b>update</b> <i>checksum</i> <i>buffer</i> <i>start</i> <i>count</i>
=> |

<blockquote>
Updates <i>checksum</i> with <i>count</i> octets from the octet
vector <i>buffer</i>, starting at <i>start</i>.
</blockquote>

<p><a name='result'>[Generic function]</a><br>
<b>result</b> <i>checksum</i> => <i>result</i>

<blockquote>
Returns the accumulated value of <i>checksum</i> as an integer.
</blockquote>

<p><a name='result-octets'>[Generic function]</a><br>
<b>result-octets</b> <i>checksum</i> => <i>result-list</i>

<blockquote>
Returns the individual octets of <i>checksum</i> as a list of octets,
in MSB order.
</blockquote>

<p><a name='reset-checksum'>[Generic function]</a><br>
<b>reset</b> <i>checksum</i> => |

<blockquote>
The default method for checksum objects resets the internal state
of <i>checksum</i> so it may be re-used.
</blockquote>
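<blockquote>
<p>As a sketch of how the checksum functions above fit together, the
following computes a CRC32 value for a whole octet
vector. <tt>crc32-of-octets</tt> is a hypothetical helper, not part of
Salza2.

<pre>
(defun crc32-of-octets (octets)
  "Return the CRC32 checksum of the octet vector OCTETS as an integer.
Hypothetical helper."
  (let ((checksum (make-instance 'salza2:crc32-checksum)))
    ;; UPDATE may be called repeatedly to checksum data in pieces;
    ;; here the whole vector is processed in one call.
    (salza2:update checksum octets 0 (length octets))
    (salza2:result checksum)))
</pre>

<p>Assuming the class implements the standard CRC-32 used by GZIP and
PNG, the ASCII octets of "123456789" should produce the well-known
check value #xCBF43926:

<pre>
(crc32-of-octets
 (make-array 9 :element-type '(unsigned-byte 8)
               :initial-contents '(49 50 51 52 53 54 55 56 57)))
</pre>
</blockquote>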
<a name='sect-shortcuts'><h4>Shortcuts</h4></a>

<p>Some shortcuts for common compression tasks are available.

<p><a name='make-stream-output-callback'>[Function]</a><br>
<b>make-stream-output-callback</b> <i>stream</i> => <i>callback</i>

<blockquote>
Creates and returns a callback function that writes all compressed
data to <i>stream</i>. It is defined like this:

<pre>
(defun make-stream-output-callback (stream)
  (lambda (buffer end)
    (write-sequence buffer stream :end end)))
</pre>
</blockquote>

<p><a name='gzip-stream'>[Function]</a><br>
<b>gzip-stream</b> <i>input-stream</i> <i>output-stream</i> => |

<blockquote>
Compresses all data read from <i>input-stream</i> and writes the
compressed data to <i>output-stream</i>.
</blockquote>

<p><a name='gzip-file'>[Function]</a><br>
<b>gzip-file</b> <i>input-file</i> <i>output-file</i> => <i>pathname</i>

<blockquote>
Compresses <i>input-file</i> and writes the compressed data
to <i>output-file</i>.
</blockquote>

<p><a name='compress-data'>[Function]</a><br>
<b>compress-data</b> <i>data</i> <i>compressor-designator</i>
<tt>&amp;rest</tt> <i>initargs</i> => <i>compressed-data</i>

<blockquote>
Compresses the octet vector <i>data</i> and returns the compressed
data as an octet vector. <i>compressor-designator</i> should be either
a compressor object, designating itself, or a symbol, designating a
compressor created as if by <tt>(apply #'make-instance
compressor-designator initargs)</tt>.

<p>For example:

<pre>
* <b>(compress-data (sb-ext:string-to-octets "Hello, hello, hello, hello world.")
                 'zlib-compressor)</b>
#(8 153 243 72 205 201 201 215 81 200 192 164 20 202 243 139 114 82 244 0 194 64 11 139)
</pre>
</blockquote>
<a name='sect-references'><h3>References</h3></a>

<ul>

<li> Deutsch and
Gailly, <a href='http://ietf.org/rfc/rfc1950.txt'>ZLIB Compressed Data
Format Specification version 3.3 (RFC 1950)</a>

<li> Deutsch, <a href='http://ietf.org/rfc/rfc1951.txt'>DEFLATE
Compressed Data Format Specification version 1.3 (RFC 1951)</a>

<li> Deutsch, <a href='http://ietf.org/rfc/rfc1952.txt'>GZIP file
format specification version 4.3 (RFC 1952)</a>

<li>
Wikipedia, <a href='http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm'>Rabin-Karp
string search algorithm</a>

</ul>

<a name='sect-feedback'><h3>Acknowledgements &amp; Feedback</h3></a>

<p>Thanks to Paul Khuong for his help optimizing the modulo-8191
hashing.

<p>Please direct any comments, questions, bug reports, or other
feedback to <a href='mailto:xach@xach.com'>Zach Beane</a>.