/** SR3C, a symbol ranking data compressor.
 *
 * This file implements a fast and effective data compressor.
 * The compression is on par (k8: i guess ;-) with gzip -7.
 * bzip2 -2 compresses slightly better than SR3C, but takes almost
 * three times as long. Furthermore, since bzip2 is based on
 * Burrows-Wheeler block sorting, it can't be used in on-line
 * compression tasks.
 * Memory consumption of SR3C is currently around 4.5 MB per ongoing
 * compression and decompression.
 *
 * Author: Kenneth Oksanen <cessu@iki.fi>, 2008.
 * Copyright (C) Helsinki University of Technology.
 * D conversion by Ketmar // Invisible Vector <ketmar@ketmar.no-ip.org>
 *
 * This code borrows many ideas and some paragraphs of comments from
 * Matt Mahoney's symbol ranking compression program SR2 and Peter
 * Fenwick's SRANK, but otherwise all code has been implemented from
 * scratch.
 *
 * This file is distributed under the following license:
 *
 * The MIT License
 * Copyright (c) 2008 Helsinki University of Technology
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */
module iv.sr3c /*is aliced*/;
import iv.alice;
/** Prior to compression or uncompression the user of this library
 * creates a "compression context" of type `SR3C` which can, with
 * certain disciplines, be used for both compression and
 * uncompression. The compression context initialization is given a
 * function `outdg` which the compression and uncompression
 * routines call for processing the data further.
 *
 * This library is MT-safe, so any number of concurrent
 * compression or uncompression tasks may be ongoing as long as each
 * of them is passed a distinct compression context. By default each
 * compression context requires approximately 4.5 MB of memory, but
 * see the internal documentation of sr3c.d for further notes on this.
 *
 * Compression is generally NOT synchronous: once some data has been
 * passed to `compress()`, `outdg` is not necessarily passed enough
 * compressed data to uncompress all of that original data. When
 * synchronization is required, for example at the end of successful
 * compression, the user of this library should call `flush()`. The
 * decompressing side will recognize the flush from the compressed
 * data. Only when a compression and decompression have flushed in
 * this manner can the corresponding compression contexts be used in
 * the reverse direction.
 *
 * (A minimal usage sketch appears after the class body below.)
 */
public final class SR3C {
public:
  /** The type of the function used to process the compressed or
   * uncompressed data further. The function should return a non-zero
   * value on failure. The function may not free the memory referred to
   * by the 'bytes' argument. The argument 'flush' is passed a true
   * value iff `flush()` was called after compressing the data.
   */
  alias OutputDg = int delegate (const(void)[] bytes, bool flush);

private:
  enum SR_HMASK = 0xFFFFF;
  enum SR_HMULT = 480;
  enum SR_XSHFT = 20;
  /* It is possible to reduce memory consumption from 4.5 MB to, say,
     1.5 MB with the following definitions, but this will also reduce
     the compression ratio. In my tests on average this would result in
     an almost 4% increase in the size of the compressed data.
       #define SR_HMASK   0x3FFFF
       #define SR_HMULT   352
       #define SR_XSHFT   18
     Even further memory savings are possible, but again with worse
     compression. The following definitions require some 0.75 MB of memory
     and result in 7-8% larger compressed data than with the current default
     definitions. Further memory savings would require changes in the
     secondary arithmetic compression, a.k.a. state maps.
       #define SR_HMASK   0xFFFF
       #define SR_HMULT   288
       #define SR_XSHFT   16
     Conversely, by allowing a loftier memory budget, say 64.5 MB,
     compression can be improved further. The following will result in
     a little over 2% improvement in compression:
       #define SR_HMASK   0xFFFFFF
       #define SR_HMULT   704
       #define SR_XSHFT   24
  */
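
  // Not in the original sources: a compile-time sanity check that makes the
  // relationship between the hash-table size and the checksum shift explicit.
  // The 4-bit checksum is taken from the hash bits just above the table index,
  // so SR_XSHFT must equal the number of index bits in SR_HMASK; this holds
  // for all of the alternative configurations listed in the comment above.
  static assert(SR_HMASK == (1u<<SR_XSHFT)-1, "SR_XSHFT must match the width of SR_HMASK");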

  enum SM_P_HALF = 1U<<31;

  enum SM_HUFF_LO_N = 4*256*3;
  enum SM_HUFF_HI_N1 = 16;
  enum SM_HUFF_HI_N2 = 3;
  enum SM_BYTES_N = 256*256;

  enum {
    SR_FLUSHED,
    SR_COMPRESSING,
    /* The following states are for the uncompressor. */
    SR_FILLING_1, SR_FILLING_2, SR_FILLING_3,
    SR_UNCOMPRESSING_1, SR_UNCOMPRESSING_2, SR_UNCOMPRESSING_3,
    SR_UNCOMPRESSING_BYTE, SR_FLUSHING,
  }

  ubyte* outbuf_p; // current position in the output buffer
  uint[SR_HMASK+1] rank; // symbol ranking tables
  uint hash; // hash value of the current compression context
  int prev_ch; // previous byte we encoded, 0 if none
  /* statemaps map a secondary context to a probability.
   * each statemap entry contains the prediction in the upper 23 bits and a count of
   * how many times this state has been reached in the lower 9 bits. */
  // states for encoding the first three bits in each context; some of the entries in sm_huff_hi are unused
  uint[SM_HUFF_LO_N] sm_huff_lo;
  uint[SM_HUFF_HI_N2][SM_HUFF_HI_N1] sm_huff_hi;
  // states for encoding a byte not predicted by the symbol ranker, using a context-1 arithmetic encoding
  uint[SM_BYTES_N] sm_bytes;
  // arithmetic coder range, initially [0, 1), scaled by 2^32; the field x is used only during uncompression
  uint x1, x2, x;
  // output function and its context; returns non-zero on error
  OutputDg output_f;
  int status; // SR_XXX
  // the following field is used by the uncompressor while uncompressing literal bytes
  int lit_ix;

public:
  this (OutputDg outdg) { reset(outdg); }

private:
  /** Reinitialize the given compression context. If `outdg` is
   * non-null, it is assigned to the compression context's new output
   * function. If, on the other hand, `outdg` is `null`, the callback
   * function remains as it is.
   */
  public void reset (OutputDg outdg=null) {
    int i, j;
    /* Initialize statemaps. The initial values were chosen based on a
       large number of test runs. The cumulative statistics match quite
       well with those of Fenwick:
        - Least recently used matches with 45% probability (over all counts),
        - Second least recently used matches with 15% probability,
        - Third least recently used matches with 7% probability,
        - Literals match with 32% probability.
       Initializing to anything else but SM_P_HALF produces
       proportionally modest benefits for large inputs, but we wanted
       SR3C to be effective also for small inputs. */
    for (i = 0; i < 3*256; i += 3) {
      sm_huff_lo.ptr[i] = (3400<<20)|0;
      sm_huff_lo.ptr[i+1] = (150<<20)|12;
      sm_huff_lo.ptr[i+2] = (1000<<20)|12;
    }
    for (; i < 2*3*256; i += 3) {
      sm_huff_lo.ptr[i] = (1500<<20)|0;
      sm_huff_lo.ptr[i+1] = (1840<<20)|12;
      sm_huff_lo.ptr[i+2] = (780<<20)|12;
    }
    for (; i < 3*3*256; i += 3) {
      sm_huff_lo.ptr[i] = (880<<20)|0;
      sm_huff_lo.ptr[i+1] = (1840<<20)|12;
      sm_huff_lo.ptr[i+2] = (760<<20)|12;
    }
    for (; i < 4*3*256; i += 3) {
      sm_huff_lo.ptr[i] = (780<<20)|0;
      sm_huff_lo.ptr[i+1] = (1840<<20)|12;
      sm_huff_lo.ptr[i+2] = (1120<<20)|12;
    }
    for (i = 0; i < SM_HUFF_HI_N1; i++) {
      sm_huff_hi.ptr[i].ptr[0] = (400<<20)|10;
      sm_huff_hi.ptr[i].ptr[1] = (1840<<20)|12;
      sm_huff_hi.ptr[i].ptr[2] = (1180<<20)|12;
    }
    for (i = 0; i < SM_BYTES_N; i++)
      sm_bytes.ptr[i] = SM_P_HALF;
    for (i = 0; i < SR_HMASK+1; i++)
      rank.ptr[i] = 0;
    prev_ch = 0;
    hash = 0;
    x1 = 0;
    x2 = 0xFEFFFFFF;
    if (outdg !is null) output_f = outdg;
    status = SR_FLUSHED;
  }

  // what is the prediction, as a fractional probability from 0 to 4095, of the next bit being one
  enum SM_PREDICT(string sm) = "(("~sm~")>>20)";
  enum SM_COUNT(string sm) = "(("~sm~")&0x1FF)";

  enum SM_UPDATE_RANKS(string sm, string bit) = "do {\n"~
    //"assert(bit == 0 || bit == 1);\n"~
    "int count = "~sm~"&0x1FF, prediction__ = "~sm~">>9;\n"~
    "int d = ("~bit~"<<23)-prediction__;\n"~
    ""~sm~" += (d*sr_wt_ranks.ptr[count])&0xFFFFFE00;\n"~
    "if (d < 0) d = -d;\n"~
    "d >>= 17;\n"~
    "//"~sm~" += sr_ct_ranks.ptr[d].ptr[count];\n"~
    ""~sm~" ^= sr_ct_ranks.ptr[d].ptr[count];\n"~
    "} while (0);";

  enum SM_UPDATE_BYTES(string sm, string bit) = "do {\n"~
    //"assert(bit == 0 || bit == 1);\n"~
    "int count = "~sm~"&0x1FF, prediction__ = "~sm~">>9;\n"~
    "int d = ("~bit~"<<23)-prediction__;\n"~
    ""~sm~" += (d*sr_wt_bytes.ptr[count])&0xFFFFFE00;\n"~
    "if (d < 0) d = -d;\n"~
    "d >>= 17;\n"~
    ""~sm~" ^= sr_ct_bytes.ptr[d].ptr[count];\n"~
    "} while (0);";

  // compress bit in the given state map, possibly outputting bytes in the process
  enum SR_ENCODE_RANK_BIT(string sm, string bit) = "do {\n"~
    "uint xmid;\n"~
    "int prediction_ = "~SM_PREDICT!sm~";\n"~
    //"assert(bit == 0 || bit == 1);\n"~
    "assert(prediction_ >= 0 && prediction_ < 4096);\n"~
    "xmid = x1+((x2-x1)>>12)*prediction_;\n"~
    "assert(xmid >= x1 && xmid < x2);\n"~
    "if ("~bit~") x2 = xmid; else x1 = xmid+1;\n"~
    SM_UPDATE_RANKS!(sm, bit)~"\n"~
    "/* pass equal leading bytes of range */\n"~
    "while ((x1>>24) == (x2>>24)) {\n"~
    " *outbuf_p++ = x2>>24;\n"~
    " x1 <<= 8;\n"~
    " x2 = (x2<<8)+255;\n"~
    "}\n"~
    "} while (0);";

  enum SR_ENCODE_BYTE_BIT(string sm, string bit) = "do {\n"~
    "uint xmid;\n"~
    "int prediction_ = "~SM_PREDICT!sm~";\n"~
    //"assert(bit == 0 || bit == 1);\n"~
    "assert(prediction_ >= 0 && prediction_ < 4096);\n"~
    "xmid = x1+((x2-x1)>>12)*prediction_;\n"~
    "assert(xmid >= x1 && xmid < x2);\n"~
    "if ("~bit~") x2 = xmid; else x1 = xmid+1;\n"~
    SM_UPDATE_BYTES!(sm, bit)~"\n"~
    "// pass equal leading bytes of range\n"~
    "while ((x1>>24) == (x2>>24)) {\n"~
    " *outbuf_p++ = x2>>24;\n"~
    " x1 <<= 8;\n"~
    " x2 = (x2<<8)+255;\n"~
    "}\n"~
    "} while (0);";

  enum SR_ENCODE_BYTE(string ch) = "do {\n"~
    "uint* sm_ = &sm_bytes.ptr[256*prev_ch];\n"~
    "uint ix = 1, x = "~ch~", bit;\n"~
    "do {\n"~
    " bit = (x>>7)&0x1;\n"~
    " "~SR_ENCODE_BYTE_BIT!("sm_[ix]", "bit")~"\n"~
    " ix = (ix<<1)|bit;\n"~
    " x <<= 1;\n"~
    "} while (ix < 256);\n"~
    "} while (0);";

  enum OUTBUF_SIZE = 1024;

  /** Compress the given bytes, possibly sending some compressed data to
   * `outdg`. Returns zero on success, or whatever `outdg` returned
   * should it have erred. Once the context has been used for
   * compressing, it can't be used for uncompressing before `flush()`
   * has been called.
   */
  public int compress (const(void)[] data) {
    const(ubyte)* bytes = cast(const(ubyte)*)data.ptr;
    usize n_bytes = data.length;
    uint index, xsum, r;
    uint* sm;
    int ch;
    ubyte[OUTBUF_SIZE] outbuf;
    /* The theoretical worst case is that each bit is encoded into 12
       bits, and there can be 10 bits of output per bit sent to the
       arithmetic coder, or 15 bytes. Additionally there may be some
       bytes of flushing from the arithmetic coder itself, thus a margin
       of a slightly cautious 30 bytes. */
    ubyte* outbuf_end = outbuf.ptr+OUTBUF_SIZE-30;

    assert(status == SR_FLUSHED || status == SR_COMPRESSING);
    status = SR_COMPRESSING;

    while (n_bytes > 0) {
      outbuf_p = outbuf.ptr;
      while (outbuf_p < outbuf_end && n_bytes > 0) {
        --n_bytes;
        index = hash&SR_HMASK;
        xsum = (hash>>SR_XSHFT)&0xF;
        r = rank.ptr[index];
        if (((r>>24)&0xF) != xsum) {
          // hash collision: pick another index, use it if checksums match or it has a lower first-hit counter value
          int alt_index = (index+0x3A77)&SR_HMASK;
          uint alt_r = rank.ptr[alt_index];
          if (((alt_r>>24)&0xF) == xsum || alt_r < r) {
            index = alt_index;
            r = alt_r;
          }
        }
        if (r >= 0x40000000)
          sm = &sm_huff_hi.ptr[r>>28].ptr[0];
        else
          sm = &sm_huff_lo.ptr[3*(prev_ch|((r>>20)&0x300))];
        ch = *bytes++;
        xsum = (xsum<<24)|ch;
        if (ch == (r&0xFF)) {
          // is ch the least recently seen?
          mixin(SR_ENCODE_RANK_BIT!("sm[0]", "0"));
          if (r < 0xF0000000) r += 0x10000000; // increment hit count
        } else {
          mixin(SR_ENCODE_RANK_BIT!("sm[0]", "1"));
          if (ch == ((r>>8)&0xFF)) {
            // is ch the second least recent?
            mixin(SR_ENCODE_RANK_BIT!("sm[1]", "1"));
            mixin(SR_ENCODE_RANK_BIT!("sm[2]", "0"));
            if ((r>>28) >= 0xC)
              r &= 0x4FFFFFFF;
            else
              r = (r&0xFF0000)|((r&0xFF)<<8)|xsum|0x10000000;
          } else if (ch == ((r>>16)&0xFF)) {
            // is ch the third least recent?
            mixin(SR_ENCODE_RANK_BIT!("sm[1]", "1"));
            mixin(SR_ENCODE_RANK_BIT!("sm[2]", "1"));
            r = ((r&0xFFFF)<<8)|xsum|0x10000000;
          } else {
            mixin(SR_ENCODE_RANK_BIT!("sm[1]", "0"));
            mixin(SR_ENCODE_BYTE!"ch");
            r = ((r&0xFFFF)<<8)|xsum;
          }
        }
        rank.ptr[index] = r;
        prev_ch = ch;
        hash = SR_HMULT*hash+ch+1;
      }
      if (outbuf_p > outbuf.ptr) {
        if (int err = output_f(outbuf[0..outbuf_p-outbuf.ptr], false)) return err;
      }
    }
    return 0;
  }

  /** Flush the internal buffered state of the compression context to
   * `outdg` so that an uncompression of all the data provided prior
   * to the flush becomes possible. Returns zero on success, or
   * whatever `outdg` returned should it have erred. It is possible
   * to continue compression after flushing the buffers, but each flush
   * will cause at least a three-byte overhead, probably higher.
   * File compressors and archivers typically flush only at the end of
   * compression, but message-interchanging programs flush whenever
   * real-time requirements demand it or a response is required to proceed.
   *
   * Note that `flush()` may not be called for a context currently used
   * for uncompression.
   */
  public int flush () {
    uint r;
    uint* sm;
    int index, xsum, i, err;
    ubyte[128] outbuf;
    outbuf_p = &outbuf[0];

    assert(status == SR_COMPRESSING || status == SR_FLUSHED);

    /* Pick state map entry as if compressing a normal data byte. */
    index = hash&SR_HMASK;
    xsum = (hash>>SR_XSHFT)&0xF;
    r = rank.ptr[index];
    if (((r>>24)&0xF) != xsum) {
      int alt_index = (index+0x3A77)&SR_HMASK;
      uint alt_r = rank.ptr[alt_index];
      if (((alt_r>>24)&0xF) == xsum || alt_r < r) {
        index = alt_index;
        r = alt_r;
      }
    }
    if (r >= 0x40000000)
      sm = &sm_huff_hi.ptr[r>>28].ptr[0];
    else
      sm = &sm_huff_lo.ptr[3*(prev_ch|((r>>20)&0x300))];

    /* Mark end of data by coding the third least recently used byte as
       a literal. */
    mixin(SR_ENCODE_RANK_BIT!("sm[0]", "1"));
    mixin(SR_ENCODE_RANK_BIT!("sm[1]", "0"));
    mixin(SR_ENCODE_BYTE!"((r>>16)&0xFF)");

    /* Flush also the arithmetic encoder, first by the first unequal
       byte in the range and thereafter three maximum bytes. */
    *outbuf_p++ = x1>>24;
    *outbuf_p++ = 0xFF;
    *outbuf_p++ = 0xFF;
    *outbuf_p++ = 0xFF;

    /* Finally send this all out. */
    err = output_f(outbuf[0..outbuf_p-outbuf.ptr], true);
    if (err) return err;

    /* Reset internal values in the context, not however the statemaps or ranks. */
    prev_ch = 0;
    hash = 0;
    x1 = 0;
    x2 = 0xFEFFFFFF;
    status = SR_FLUSHED;
    return 0;
  }

  /** Return true if the given context is flushed, i.e. if the
   * context can now be used for either compression or uncompression.
   */
  public @property bool flushed () const pure nothrow @safe @nogc { return (status == SR_FLUSHED); }

  /** Uncompress the given bytes, possibly sending some uncompressed data
   * to `outdg`. Returns zero on success, or whatever `outdg`
   * returned should it have failed. SR3C includes no checksum, so
   * corrupted compressed messages will not be detected. Once the
   * context has been used for uncompressing, it can't be used for
   * compressing before `flush()` has been issued on the corresponding
   * compressing side and the resulting compressed data has been
   * uncompressed.
   */
  public int uncompress (const(void)[] data) {
    enum SR_INPUT(string state_name) =
      "do {\n"~
      ""~state_name~":\n"~
      " while ((x1>>24) == (x2>>24)) {\n"~
      "  if (n_bytes == 0) { status = "~state_name~"; goto out_of_input; }\n"~
      "  x1 <<= 8;\n"~
      "  x2 = (x2<<8)+255;\n"~
      "  x = (x<<8)+*bytes++;\n"~
      "  n_bytes--;\n"~
      " }\n"~
      "} while (0);";

    enum SR_DECODE_BIT(string sm, string bit, string state_name) =
      "do {\n"~
      " "~SR_INPUT!(state_name)~"\n"~
      " prediction = "~SM_PREDICT!(sm)~";\n"~
      " assert(prediction >= 0 && prediction < 4096);\n"~
      " xmid = x1+((x2-x1)>>12)*prediction;\n"~
      " assert(xmid >= x1 && xmid < x2);\n"~
      " if (x <= xmid) {\n"~
      "  "~bit~" = 1;\n"~
      "  x2 = xmid;\n"~
      " } else {\n"~
      "  "~bit~" = 0;\n"~
      "  x1 = xmid+1;\n"~
      " }\n"~
      "} while (0);";

    const(ubyte)* bytes = cast(const(ubyte)*)data.ptr;
    usize n_bytes = data.length;
    uint xmid;
    int prediction;

    uint index, xsum, r;
    uint* sm;
    int bit, ch, err;
    ubyte[OUTBUF_SIZE] outbuf;
    outbuf_p = outbuf.ptr;
    ubyte* outbuf_end = outbuf.ptr+OUTBUF_SIZE;

    assert(status != SR_COMPRESSING);

    switch (status) {
      case SR_FLUSHED:
        while (n_bytes > 0 && *bytes == 0xFF) {
          bytes++;
          n_bytes--;
        }
        if (n_bytes-- == 0) return 0;
       restart:
        x = (x<<8)|*bytes++;
        goto case;
      case SR_FILLING_1:
        if (n_bytes-- == 0) {
          status = SR_FILLING_1;
          return 0;
        }
        x = (x<<8)|*bytes++;
        goto case;
      case SR_FILLING_2:
        if (n_bytes-- == 0) {
          status = SR_FILLING_2;
          return 0;
        }
        x = (x<<8)|*bytes++;
        goto case;
      case SR_FILLING_3:
        if (n_bytes-- == 0) {
          status = SR_FILLING_3;
          return 0;
        }
        x = (x<<8)|*bytes++;
        status = SR_UNCOMPRESSING_1;
        break;
      case SR_FLUSHING:
        goto SR_FLUSHING;
      default:
        // the default branch is here only to keep the compiler happy
        break;
    }

    index = hash&SR_HMASK;
    xsum = (hash>>SR_XSHFT)&0xF;
    r = rank.ptr[index];
    if (((r>>24)&0xF) != xsum) {
      // hash collision: pick another index, use it if checksums match or it has a lower first-hit counter value
      int alt_index = (index+0x3A77)&SR_HMASK;
      uint alt_r = rank.ptr[alt_index];
      if (((alt_r>>24)&0xF) == xsum || alt_r < r) {
        index = alt_index;
        r = alt_r;
      }
    }
    if (r >= 0x40000000)
      sm = &sm_huff_hi.ptr[r>>28].ptr[0];
    else
      sm = &sm_huff_lo.ptr[3*(prev_ch|((r>>20)&0x300))];
    xsum <<= 24;

    switch (status) {
      case SR_UNCOMPRESSING_1: goto SR_UNCOMPRESSING_1;
      case SR_UNCOMPRESSING_2: goto SR_UNCOMPRESSING_2;
      case SR_UNCOMPRESSING_3: goto SR_UNCOMPRESSING_3;
      case SR_UNCOMPRESSING_BYTE: sm = &sm_bytes.ptr[256*prev_ch]; goto SR_UNCOMPRESSING_BYTE;
      default: assert(0);
    }

    for (;;) {
      index = hash&SR_HMASK;
      xsum = (hash>>SR_XSHFT)&0xF;
      r = rank.ptr[index];
      if (((r>>24)&0xF) != xsum) {
        // hash collision: pick another index, use it if checksums match or it has a lower first-hit counter value
        int alt_index = (index+0x3A77)&SR_HMASK;
        uint alt_r = rank.ptr[alt_index];
        if (((alt_r>>24)&0xF) == xsum || alt_r < r) {
          index = alt_index;
          r = alt_r;
        }
      }
      if (r >= 0x40000000)
        sm = &sm_huff_hi.ptr[r>>28].ptr[0];
      else
        sm = &sm_huff_lo.ptr[3*(prev_ch|((r>>20)&0x300))];
      xsum <<= 24;

      mixin(SR_DECODE_BIT!("sm[0]", "bit", "SR_UNCOMPRESSING_1"));
      mixin(SM_UPDATE_RANKS!("sm[0]", "bit"));
      if (bit) {
        mixin(SR_DECODE_BIT!("sm[1]", "bit", "SR_UNCOMPRESSING_2"));
        mixin(SM_UPDATE_RANKS!("sm[1]", "bit"));
        if (bit) {
          mixin(SR_DECODE_BIT!("sm[2]", "bit", "SR_UNCOMPRESSING_3"));
          mixin(SM_UPDATE_RANKS!("sm[2]", "bit"));
          if (bit) {
            // third least recent byte
            ch = (r>>16)&0xFF;
            r = ((r&0xFFFF)<<8)|ch|xsum|0x10000000;
          } else {
            // second least recent byte
            ch = (r>>8)&0xFF;
            if ((r>>28) >= 0xC)
              r &= 0x4FFFFFFF;
            else
              r = (r&0xFF0000)|((r&0xFF)<<8)|ch|xsum|0x10000000;
          }
        } else {
          sm = &sm_bytes.ptr[256*prev_ch];
          lit_ix = 1;
          do {
            mixin(SR_DECODE_BIT!("sm[lit_ix]", "bit", "SR_UNCOMPRESSING_BYTE"));
            mixin(SM_UPDATE_BYTES!("sm[lit_ix]", "bit"));
            lit_ix = (lit_ix<<1)|bit;
          } while (lit_ix < 256);
          ch = lit_ix&0xFF;
          if (ch == ((r>>16)&0xFF)) goto flush;
          r = ((r&0xFFFF)<<8)|ch|xsum;
        }
      } else {
        // least recent byte
        ch = r&0xFF;
        if (r < 0xF0000000) r += 0x10000000;
      }
      *outbuf_p++ = cast(ubyte)ch;
      rank.ptr[index] = r;
      prev_ch = ch;
      hash = SR_HMULT*hash+ch+1;
      if (outbuf_p == outbuf_end) {
        err = output_f(outbuf[0..OUTBUF_SIZE], false);
        if (err) return err;
        outbuf_p = outbuf.ptr;
      }
    }

   flush:
    // we come here when we have received a flush.
    // pass the flush-induced bytes in the data stream and reset internal values
    // in the context, not however the statemaps or ranks.
    mixin(SR_INPUT!"SR_FLUSHING");
    prev_ch = 0;
    hash = 0;
    x1 = 0;
    x2 = 0xFEFFFFFF;
    status = SR_FLUSHED;
    /* Skip 0xFF-bytes. */
    while (n_bytes > 0 && *bytes == 0xFF) {
      bytes++;
      n_bytes--;
    }
    err = output_f(outbuf[0..outbuf_p-outbuf.ptr], true);
    if (err) return err;
    outbuf_p = outbuf.ptr;
    if (n_bytes-- > 0) {
      outbuf_p = outbuf.ptr;
      goto restart;
    }
    return 0;

   out_of_input:
    if (outbuf_p != outbuf.ptr) return output_f(outbuf[0..outbuf_p-outbuf.ptr], false);
    return 0;
  }

static:
  // some precomputed tables on how the secondary arithmetical compressor adjusts to actual data; see comments later
  static immutable ubyte[512] sr_wt_bytes;
  static immutable ubyte[512] sr_wt_ranks;
  static immutable ushort[512][65] sr_ct_bytes;
  static immutable ushort[512][65] sr_ct_ranks;

  shared static this () pure nothrow @safe @nogc {
    // initialize the various precomputed tables
    foreach (int count; 0..512) {
      // a table indexed by the current state map count and returning the corresponding responsiveness to unpredicted input;
      // found with numeric optimization for the order-0 arithmetic compressor
      sr_wt_bytes[count] = cast(ubyte)(2.814086+(1.107489*count+639.588922)/(0.006940*count*count+count+6.318012));
      // as above, but for rank encoding data
      sr_wt_ranks[count] = cast(ubyte)(-1.311630+(0.616477*count+640.391038)/(0.001946*count*count+count+5.632143));
      foreach (int d; 0..64+1) {
        // a table for updating the count-part in statemaps for bytes
        int c = (d > 1.1898325 && count > 19.782085 ? cast(int)(-2+0.021466*(d-1.1898325)*(count-19.782085)) : -2);
        if (c > count) c = 0; else if (count-c > 0x1FF) c = 0x1FF; else c = count-c;
        sr_ct_bytes[d][count] = cast(ushort)(c^count);
        // a table for updating the count-part in statemaps for ranks
        c = (count > 33.861341 ? cast(int)(-2+0.005355*(d+0.981405)*(count-33.861341)) : -2);
        if (c > count) c = 0; else if (count-c > 0x1FF) c = 0x1FF; else c = count-c;
        sr_ct_ranks[d][count] = cast(ushort)(c^count);
      }
    }
  }
}
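
// A minimal usage sketch, not part of the original sources: it round-trips a
// short buffer through one compressing and one uncompressing context via the
// public API above (compress/flush/uncompress/flushed). The version identifier
// `sr3c_usage_example` and all local names are illustrative only.
version(sr3c_usage_example) unittest {
  ubyte[] packed;   // receives compressed bytes from the compressing context
  ubyte[] restored; // receives uncompressed bytes from the uncompressing context
  auto comp = new SR3C((const(void)[] bytes, bool flush) { packed ~= cast(const(ubyte)[])bytes; return 0; });
  auto deco = new SR3C((const(void)[] bytes, bool flush) { restored ~= cast(const(ubyte)[])bytes; return 0; });
  auto src = cast(const(ubyte)[])"hello, hello, hello symbol ranking world!";
  assert(comp.compress(src) == 0);
  assert(comp.flush() == 0); // make the compressed stream decodable up to this point
  assert(deco.uncompress(packed) == 0);
  assert(deco.flushed); // the decompressor has recognized the flush marker
  assert(restored == src);
}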

/* This code is based on many of the ideas in Matt Mahoney's SR2
   symbol ranking compression program
     http://www.cs.fit.edu/~mmahoney/compression/#sr2
   which in turn is based on SRANK by P. M. Fenwick in 1997-98
     ftp://ftp.cs.auckland.ac.nz/pub/staff/peter-f/srank.c
   See also Fenwick's technical report "A fast, constant-order, symbol
   ranking text compressor", The University of Auckland, Department of
   Computer Science Report No 145, Apr. 1997.

   A symbol ranking compressor maintains a list (a move-to-front
   queue) of the most recently seen symbols (bytes) in the given
   context in order of time since last seen. The original SRANK used
   a table of 2^10 to 2^18 hashed order-3 contexts, SR2 uses 2^20
   hashed order-4 contexts, and some versions of SR3 use 2^24
   contexts, which we considered excessive in terms of implied
   memory consumption. All the mentioned programs use a queue length
   of 3. SR2 furthermore uses a 6 bit count (n) of consecutive hits;
   SR3C uses only a 4 bit count but employs a 4-bit checksum to reduce
   hash collisions.

   SR2 as well as SR3C follow Fenwick's suggestion for a hardware
   implementation in sending 0 when the byte matches the first symbol
   in the queue, 110 and 111 for the second and third least recently
   seen, and 10xxxxxxxx for literals. These are compressed further
   with arithmetic coding using both an order-1 context (last coded
   byte) and the count as context, or order-0 and count if the count
   is greater than or equal to four.

   Codes and updates are as follows:

     Input    Code          Next state
                            (c1 c2 c3 n)
     -----    ----          ---------------
     Initial                (0, 0, 0, 0)
     c1       0             (c1, c2, c3, min(n+1, 15))
     c2       110           (c2, c1, c3, 1)
     c3       111           (c3, c1, c2, 1)
     other c  10cccccccc    (c, c1, c2, 0)

   As an exception, however, in SR3C if the input is c2 and n has reached
   very high counts (12 or more), then the count is reduced to four
   but the queue is kept intact.

   After coding byte c, the hash index h is updated to h * 480 + c + 1
   (mod 2^20), which depends on only the last 4 bytes. SR2 did not
   detect hash collisions, but SR3C uses a 4-bit checksum which is
   formed by the next four higher bits of the hash. The values are
   packed into a 32 bit integer as follows: c1 in bits 0-7, c2 in
   8-15, c3 in 16-23, checksum in 24-27, n in 28-31 (a sketch of this
   unpacking follows this comment).

   End of file is marked by coding c3 as a literal (SR2 used c1)
   followed by three 0xFF's. The compressor library adds no header,
   but ensures the message doesn't begin with 0xFF so as to enable
   arbitrary concatenation of messages. Additional headers and checksums
   may be added by a stand-alone archiver using this library.

   Arithmetic coding is performed as in SR2, and we refer the reader
   to its documentation. Let it only be noted that SR3C uses far fewer
   arithmetic coding states than SR2, thus saving ~1.5 MB of RAM while
   incidentally also achieving a slightly better compression ratio.
   There are also several other differences between SR2 and SR3C in how
   the predictions for the next bits are updated. */
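
// Illustrative sketch only, not used anywhere by the compressor above: it
// unpacks a rank-table word according to the bit layout described in the
// comment above (c1/c2/c3 queue bytes, 4-bit checksum, 4-bit hit count n).
// The version identifier, struct, and function names are hypothetical.
version(sr3c_rank_demo) {
  struct RankFields { ubyte c1, c2, c3; ubyte checksum; ubyte n; }
  RankFields unpackRank (uint r) pure nothrow @safe @nogc {
    RankFields f;
    f.c1 = r&0xFF;            // first queue entry (bits 0-7)
    f.c2 = (r>>8)&0xFF;       // second queue entry (bits 8-15)
    f.c3 = (r>>16)&0xFF;      // third queue entry (bits 16-23)
    f.checksum = (r>>24)&0xF; // 4-bit hash checksum (bits 24-27)
    f.n = r>>28;              // consecutive-hit count (bits 28-31)
    return f;
  }
}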