\label{arithcoding}

This annex has three parts:
\begin{enumerate}
\item a description of the principles of arithmetic coding,
\item a specification of the arithmetic decoding engine used in Dirac, and
\item a description of a compatible arithmetic encoder.
\end{enumerate}

\subsection{Arithmetic coding principles (Informative)}

This section provides an introduction to the principles underlying arithmetic
coding. It briefly describes binary arithmetic coding, that is the coding of binary symbols,
which is used in Dirac.

Arithmetic coding is an extremely powerful form of entropy coding, which can closely
approximate the Shannon information limit for given data. An arithmetic encoder consists
of a state machine that is fed with a sequence of symbols together with an estimate
of each symbol's probability. For each input symbol the arithmetic coding engine
updates its state and outputs a number of coded bits. The number of output bits
for each input symbol depends on the internal state and on the current probabilities of
the symbols being coded, and can range from zero to many bits.

The variable number of coded bits output for each input symbol complicates
the implementation but is essential to the near-optimality of arithmetic coding. Consider
a binary symbol $b$, with $p(b=\false)=p_0$ and $p(b=\true)=1-p_0$. The entropy of $b$
is the expected number of bits required to encode $b$, and is equal to
\[e(p_0)=p_0\log_2(1/p_0)+(1-p_0)\log_2(1/(1-p_0))\]

If $e(p_0)$ is plotted against $p_0$, it can be seen that unless $p_0$ is exactly
0.5, $e(p_0)<1$. This means that an optimal binary entropy encoder
operating symbol by symbol cannot produce an output for every symbol.

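As an informal check of this claim, the entropy function can be evaluated numerically. The following Python sketch is ours and informative, not part of the specification:

```python
from math import log2

def entropy(p0):
    """Expected bits per symbol, e(p0), for a binary source with p(false) = p0."""
    p1 = 1.0 - p0
    return p0 * log2(1.0 / p0) + p1 * log2(1.0 / p1)

# e(p0) reaches 1 bit only when both symbols are equally likely:
print(entropy(0.5))   # 1.0
print(entropy(0.25))  # ~0.811: a coder emitting at least 1 bit per symbol wastes ~19%
```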
\subsubsection{Interval division and scaling}
The fundamental idea of arithmetic coding is interval division and scaling. An
arithmetic code can be thought of as a single number lying in an interval
determined by the sequence of values being coded. For simplicity, this discussion
describes binary arithmetic coding, but larger symbol alphabets can be treated
in an analogous manner.

Let us begin with the interval $[0,1)$, and suppose that we know (or have some
estimate of) the probability of $\false$, $p_0$. Conceptually we divide the interval
into the intervals $[0,p_0)$ and $[p_0,1)$. Suppose we code $\false$ as the
first symbol. In this case the interval is changed to $[0,p_0)$. If we code $\true$,
then the interval is changed to $[p_0,1)$. After coding a number of symbols we
arrive at an interval $[low,high)$. To code the next symbol we partition this interval
into $[low,low+p_0(high-low))$ and $[low+p_0(high-low),high)$, and if the symbol is
$\false$ we choose the first interval, otherwise the second.

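The interval division just described can be sketched with exact rational arithmetic. The following Python illustration is ours and informative (unlimited precision, no renormalisation); it shows that any value inside the final interval identifies the whole symbol sequence:

```python
from fractions import Fraction

def encode_interval(bits, p0):
    """Narrow [0,1) once per symbol: False takes the left part, True the right."""
    low, high = Fraction(0), Fraction(1)
    for b in bits:
        split = low + p0 * (high - low)
        low, high = (split, high) if b else (low, split)
    return low, high

def decode_value(value, p0, n):
    """Recover n symbols by asking, at each step, which side of the split value lies."""
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        split = low + p0 * (high - low)
        if value < split:
            out.append(False); high = split
        else:
            out.append(True); low = split
    return out

p0 = Fraction(2, 3)
bits = [False, True, True, False]
low, high = encode_interval(bits, p0)
# Any value in [low, high) decodes back to the original sequence:
assert decode_value((low + high) / 2, p0, len(bits)) == bits
```

Note that the final width is the product of the per-symbol factors ($p_0$ for each $\false$, $1-p_0$ for each $\true$), which is what links the interval width to the entropy in the next paragraph.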
For any integer $N$, this process clearly partitions the interval $[0,1)$ into
a set of disjoint intervals that correspond to all the input sequences of $N$ bits.
Identifying such a bit sequence is equivalent to choosing a value in the
corresponding interval, and for an interval of width $w$ this in general requires
\[\left\lceil\log_2(1/w)\right\rceil\]
bits. With static probabilities, on average,
\[w=p_0^{Np_0}(1-p_0)^{N(1-p_0)}\]
resulting in
\[\left\lceil Ne(p_0)\right\rceil\]
bits being used, demonstrating the near-optimality of arithmetic coding.
Moreover, it is clearly possible to create an adaptive arithmetic code by
changing the estimate of $p_0$ based on previously coded data.

\subsubsection{Finite precision arithmetic}
As it stands, the procedure outlined in the previous section has a number of
drawbacks for practical application. Firstly, it requires unlimited precision
to scale the interval, which is not available in real hardware or software.
Secondly, it only produces an output when all values have been coded. These
problems are addressed by renormalisation and progressive output: periodically rescaling the
interval, and outputting the most significant bits of $low$ and $high$ whenever they agree.

For example, if we know that
\begin{eqnarray*}
low &=& b0xyz...\\
high &=& b0pqr...
\end{eqnarray*}
then we can output $0$, since this must prefix any value lying in the interval,
and shift $low$ and $high$ to get $low=bxyz...$ and $high=bpqr...$.
This has the effect of doubling the interval from 0 ($x\mapsto 2x$). Likewise if
\begin{eqnarray*}
low &=& b1xyz...\\
high &=& b1pqr...
\end{eqnarray*}
we can output $1$ and shift to get
$low=bxyz...$ and $high=bpqr...$ again: this is equivalent to doubling the interval
from 1 ($x\mapsto 2x-1$).

One problem remains: suppose the interval resolutely sits on the fence, straddling
$\dfrac{1}{2}$ whilst getting smaller and smaller, with the most
significant bits of $low$ and $high$ staying at 0 and 1 respectively.
In this case, when the straddle is finally resolved, $low$ and $high$ will
both be of the form $b10000...xyz$ or $b01111...pqr$.

The resolution strategy is to again rescale $low$ and $high$, but this time
double from $\dfrac{1}{2}$ (i.e. $x\mapsto 2x-\dfrac{1}{2}$), and keep a count of the number $k$
of times this is done, as this is the number of carry bits that are
required. When the straddle is resolved as 1, then 1 followed by $k$ zero bits is
output; otherwise 0 followed by $k$ 1s is output. This ensures that the
output exactly represents the small straddling interval.

A decoder can determine a symbol as soon as it has sufficient bits to distinguish
whether a value lies in one interval or another. If constraints are placed on the
size of the smallest interval before
renormalisation (for example, by renormalising often enough and by having a fixed
smallest allowable probability), then this can be accomplished within a fixed word width.

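The three rescaling rules (double from 0, double from 1, and double from $\dfrac{1}{2}$ with a deferred carry) can be sketched as a single step function. This Python illustration is ours and informative; it uses a 16-bit integer scaling of $[0,1)$ to $[0,\text{0x10000})$ with $high$ exclusive:

```python
def renormalise_step(low, high, k):
    """One finite-precision rescale of [low, high), following the rules above.
    Returns the rescaled interval, the straddle count k, and either None or a
    tuple (resolved msb, list of k pending carry bits)."""
    out = None
    if high <= 0x8000:            # both msbs 0: double from 0, output 0 then k ones
        out = (0, [1] * k); k = 0
        low, high = low * 2, high * 2
    elif low >= 0x8000:           # both msbs 1: double from 1, output 1 then k zeros
        out = (1, [0] * k); k = 0
        low, high = low * 2 - 0x10000, high * 2 - 0x10000
    else:                         # straddle: double from 1/2, defer a carry bit
        low, high = low * 2 - 0x8000, high * 2 - 0x8000
        k += 1
    return low, high, k, out

# A straddling interval produces no output but counts a carry:
print(renormalise_step(0x7000, 0x9000, 0))   # (0x6000, 0xA000, 1, None)
```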
\subsubsection{Symbol probability estimation}
\begin{comment}
The effectiveness of arithmetic coding is critically dependent on the accurate estimation of
symbol probabilities. This might be achieved by counting the number of False symbols, $n_0$,
and True symbols, $n_1$, that have previously been coded (or decoded), and estimating the
probability of zero as
\[p_0 = n_0/(n_0+n_1)\]
This requires maintaining the values of the two counts, $n_0$ and $n_1$, and performing some
arithmetic (including division) every time a symbol is coded or decoded, in order to estimate
the probability. Usually the values of $n_0$ and $n_1$ are both initialised to one for the
first symbol, giving an initial probability estimate of 0.5.

With this method of probability estimation, if False is coded the probability becomes
\[p_0=(n_0+1)/(n_0+n_1+1)\]
and if True is coded the probability becomes
\[p_0=n_0/(n_0+n_1+1)\]
In other words:
\begin{itemize}
\item if False is coded, the probability update is $p_0 \mathrel{+}= (1-p_0)/(n_0+n_1+1)$;
\item else, if True is coded, the probability update is $p_0 \mathrel{-}= p_0/(n_0+n_1+1)$.
\end{itemize}

In VC-2 the probability estimate is simplified by assuming an estimate for the total number
of symbols coded, $t$, as a function of the probability itself. That is,
\[n_0+n_1 \approx g(p_0)\]
With this simplifying assumption the probability update process becomes:
\begin{itemize}
\item if False is coded: $p_0 \mathrel{+}= (1-p_0)/(g(p_0)+1)$;
\item else, if True is coded: $p_0 \mathrel{-}= p_0/(g(p_0)+1)$.
\end{itemize}
Many choices would be possible for the definition of $g(p_0)$. For VC-2, $g(p_0)$ is given by
\[g(p_0) \approx 2^v \quad\text{where}\quad v = 8 - \frac{4\log_2(128(1-2|p_0-0.5|))}{7}\]
With this choice of $g(p_0)$, $(n_0+n_1)$ is assumed to be 16 when $p_0=0.5$ and 256 when
$p_0=1/256$ or $255/256$.

Assuming the total number of symbols coded is a function of probability significantly
simplifies the implementation. The probability update can be achieved with a single
arithmetic operation (addition or subtraction) and a pair of lookup tables. That is:
\begin{itemize}
\item if False is coded: $p_0 \mathrel{+}= LUT\_False[p_0]$;
\item else, if True is coded: $p_0 \mathrel{-}= LUT\_True[p_0]$.
\end{itemize}

In the VC-2 arithmetic codec, probability is represented as a 16-bit unsigned integer (that
is, the range [0,65536) represents the range of probabilities [0,1)). To reduce the size of
the lookup table, only the 8 most significant bits of $p_0$ are used as the index to the
lookup table. The definition of LUT\_True is:

LUT\_True(prob):
    lut\_input = prob \& 0xFF00
    if (lut\_input >= 0x8000): p0 = lut\_input/65536.0
    else: p0 = (lut\_input+256)/65536.0
    v = 8.0 - 4.0*log2(128.0*(1.0-2.0*abs(p0-0.5)))/7.0
    t = 2**v
    return round((lut\_input+128)/(t+1))

Here prob is the 16-bit representation of probability used in the VC-2 codec, and the return
value is in the same format. The 8 LSBs of prob are ignored, allowing the use of a 256-entry
lookup table. p0 is calculated using a dead-zone quantiser. The lookup table for False
symbols is calculated similarly. It turns out that LUT\_False is the mirror image of
LUT\_True, so the same table can be used for both.

Note: this pseudocode does not conform to the definitions in section 4 because it contains
floating point numbers, floating point division and the round function.
\end{comment}

\subsection{Arithmetic decoding engine}
\label{arithengine}

This section is a normative specification of the operation of the arithmetic
decoding engine and the processes for using it to extract binary values from coded streams.

The arithmetic decoding engine shall consist of two elements:
\begin{enumerate}
\item a collection of state variables representing the state of the arithmetic
decoder (Section~\ref{initarith})
\item a function for extracting binary values from the decoder
and updating the decoder state (Section~\ref{arithreadbool})
\end{enumerate}

\subsubsection{State and contexts}
\label{stateandcontexts}

The arithmetic decoder state shall consist of the following decoder state variables:

\begin{itemize}
\item $\ALow$, an integer representing the beginning of the current coding interval.
\item $\ARange$, an integer representing the size of the current coding interval.
\item $\ACode$, an integer within the interval from $\ALow$ to $\ALow+\ARange-1$, determined from the encoded bitstream.
\item $\ABitsLeft$, a decrementing count of the number of bits yet to be read in.
\item $\AContexts$, a map of all the contexts used in the Dirac decoder.
\end{itemize}

A context $context$ shall be a 16-bit unsigned integer value encapsulating
the probability of a zero symbol in the stream, such that
\[0<context\leq\text{0xFFFF}\]

Contexts shall be accessed by decoding functions via a context label passed to the function.

\subsubsection{Initialisation}
\label{initarith}

At the beginning of the decoding of any data unit, the arithmetic
decoding state shall be initialised as follows:

\begin{pseudo}{initialise\_arithmetic\_decoding}{ctx\_labels}
\bsCODE{\ALow = \text{0x0}}
\bsCODE{\ARange = \text{0xFFFF}}
\bsCODE{\ACode = \text{0x0}}
\bsFOR{i=0}{15}
\bsCODE{\ACode <<= 1}
\bsCODE{\ACode += read\_bitb()}
\bsEND
\bsCODE{init\_context\_probs(ctx\_labels)}
\end{pseudo}

The $init\_context\_probs()$ process shall be defined as follows:

\begin{pseudo}{init\_context\_probs}{ctx\_labels}
\bsFOR{i=0}{\length(ctx\_labels)-1}
\bsCODE{\AContexts[ctx\_labels[i]] = \text{0x8000}}
\bsEND
\end{pseudo}

\begin{informative}
The value 0x8000 represents 1/2, i.e.\ equal likelihood for the two binary values.
\end{informative}

\subsubsection{Data input}
\label{inputarith}

The arithmetic decoding process shall access data in a contiguous block of bytes
containing $\ABitsLeft$ bits, this value having been set prior to decoding
the block. The bits in this block shall be sufficient to allow for the
decoding of all coefficients. However, the specification of arithmetic
decoding operations in this section may occasionally cause further bits to be read,
even though they are not required for determining decoded values. For this
reason the bounded-block read function $read\_bitb()$ (Section~\ref{blockreadbit})
shall be used for data access.

Since the length of arithmetically coded data elements is given in bytes within the Dirac
stream, there may be bits left unread when all values have been extracted. These
shall be flushed as described in Section~\ref{blockreadbit}. Since arithmetically coded
blocks are byte-aligned and a whole number of bytes, this aligns data input with the
beginning of the byte after the arithmetically coded data, i.e.\ at the end of the
data chunk. $flush\_inputb()$ shall always be called at the end of decoding an
arithmetically coded data element.

\subsubsection{Decoding boolean values}
\label{arithreadbool}

The arithmetic decoding engine is a multi-context, adaptive binary
arithmetic decoder, performing binary renormalisation and producing
binary outputs. For each bit decoded, the semantics of the relevant
calling decoder function shall determine which contexts are passed to the
arithmetic decoding operations.

This section defines the operation of the $read\_boola()$ function
for extracting a boolean value from the Dirac stream. The function
shall be defined as follows:

\begin{pseudo}{read\_boola}{context\_label}
\bsCODE{prob\_zero = \AContexts[context\_label]}
\bsCODE{count = \ACode-\ALow}
\bsCODE{range\_times\_prob = (\ARange * prob\_zero)\gg 16}
\bsIF{count >= range\_times\_prob}
\bsCODE{value = \true}
\bsCODE{\ALow += range\_times\_prob}
\bsCODE{\ARange -= range\_times\_prob}
\bsELSE
\bsCODE{value = \false}
\bsCODE{\ARange = range\_times\_prob}
\bsEND
\bsCODE{update\_context(\AContexts[context\_label],value)}{\ref{contextupdate}}
\bsWHILE{\ARange <= \text{0x4000}}
\bsCODE{renormalise()}{\ref{renormalisation}}
\bsEND
\bsRET{value}
\end{pseudo}

\begin{informative}
The function scales the probability of the symbol 0 (or $\false$) from the decoding context
so that if this probability were $1$, then the interval would equal that between
$\ALow$ and
\[high=\ALow+\ARange-1\]
$count$ is the offset of $\ACode$ within this interval, and $range\_times\_prob$
is the cut-off between 0/$\false$ and 1/$\true$ within it.
\end{informative}

\subsubsection{Renormalisation}
\label{renormalisation}

Renormalisation shall be applied to stop the arithmetic decoding
engine from losing accuracy. Renormalisation shall be
applied while the range is less than or equal to a quarter of
the total available 16-bit range ($\text{0x4000}$).

Renormalisation shall double the interval and read a bit into the codeword.
The $renormalise()$ function shall be defined as follows:

\begin{pseudo}{renormalise}{}
\bsIF{((\ALow+\ARange-1)\wedge\ALow) >= \text{0x8000}}
\bsCODE{\ACode \wedge= \text{0x4000}}
\bsCODE{\ALow \wedge= \text{0x4000}}
\bsEND
\bsCODE{\ALow <<= 1}
\bsCODE{\ARange <<= 1}
\bsCODE{\ALow \&= \text{0xFFFF}}
\bsCODE{\ACode <<= 1}
\bsCODE{\ACode += read\_bitb()}
\bsCODE{\ACode \&= \text{0xFFFF}}
\end{pseudo}

\begin{informative}
For convenience let $low=\ALow$ and $high=\ALow+\ARange-1$
represent the lower and upper bounds of the interval. If the
range is $\leq\text{0x4000}$ then
one of three possibilities must hold:
\begin{enumerate}
\item the msbs of $low$ and $high$ are both 0
\item the msbs of $low$ and $high$ are both 1
\item $low=b01...$, $high=b10...$, and the interval straddles the half-way point 0x8000.
\end{enumerate}

The renormalisation
process has the effect that: in case 1, the interval $[low,high]$ is doubled from 0 (i.e. $x\mapsto 2x$); in case 2 it is doubled from 1 (i.e. $x\mapsto 2x-1$); and in case 3 it is doubled from 1/2 (i.e. $x\mapsto 2x-\frac{1}{2}$).
\end{informative}

\subsubsection{Updating contexts}
\label{contextupdate}

Context probabilities shall be updated according to a probability look-up table
$\ALUT$, which supplies a value for decrementing
or incrementing the probability of zero based on the first
8 bits of its current value, according to Table~\ref{table:lut}.

The $update\_context()$ process shall be defined as follows:

\begin{pseudo}{update\_context}{ctx\_prob,value}
\bsIF{value == \true}
\bsCODE{ctx\_prob -= \ALUT[ctx\_prob \gg 8]}{Table \ref{table:lut}}
\bsELSE
\bsCODE{ctx\_prob += \ALUT[255-(ctx\_prob \gg 8)]}{Table \ref{table:lut}}
\bsEND
\end{pseudo}

The lookup table used for updating context probabilities shall be as defined in
Table~\ref{table:lut} below. The lookup table entries are arranged in raster scan order with
rows of eight entries. The entry corresponding to index zero is in the
top left hand corner, the index increments by one from left to right and by
eight from top to bottom, and the entry corresponding to index 255 is on the right hand
side of the last row.
\begin{table}[!ht]
\centering
\begin{tabular}{|cccccccc|}
\hline
\multicolumn{8}{|c|}{{\bf \ALUT[] (indexes 0 to 255)}} \\
\hline
0,& 2,& 5,& 8,& 11,& 15,& 20,& 24,\\
29,& 35,& 41,& 47,& 53,& 60,& 67,& 74,\\
82,& 89,& 97,& 106,& 114,& 123,& 132,& 141,\\
150,& 160,& 170,& 180,& 190,& 201,& 211,& 222,\\
233,& 244,& 256,& 267,& 279,& 291,& 303,& 315,\\
327,& 340,& 353,& 366,& 379,& 392,& 405,& 419,\\
433,& 447,& 461,& 475,& 489,& 504,& 518,& 533,\\
548,& 563,& 578,& 593,& 609,& 624,& 640,& 656,\\
672,& 688,& 705,& 721,& 738,& 754,& 771,& 788,\\
805,& 822,& 840,& 857,& 875,& 892,& 910,& 928,\\
946,& 964,& 983,& 1001,& 1020,& 1038,& 1057,& 1076,\\
1095,& 1114,& 1133,& 1153,& 1172,& 1192,& 1211,& 1231,\\
1251,& 1271,& 1291,& 1311,& 1332,& 1352,& 1373,& 1393,\\
1414,& 1435,& 1456,& 1477,& 1498,& 1520,& 1541,& 1562,\\
1584,& 1606,& 1628,& 1649,& 1671,& 1694,& 1716,& 1738,\\
1760,& 1783,& 1806,& 1828,& 1851,& 1874,& 1897,& 1920,\\
1935,& 1942,& 1949,& 1955,& 1961,& 1968,& 1974,& 1980,\\
1985,& 1991,& 1996,& 2001,& 2006,& 2011,& 2016,& 2021,\\
2025,& 2029,& 2033,& 2037,& 2040,& 2044,& 2047,& 2050,\\
2053,& 2056,& 2058,& 2061,& 2063,& 2065,& 2066,& 2068,\\
2069,& 2070,& 2071,& 2072,& 2072,& 2072,& 2072,& 2072,\\
2072,& 2071,& 2070,& 2069,& 2068,& 2066,& 2065,& 2063,\\
2060,& 2058,& 2055,& 2052,& 2049,& 2045,& 2042,& 2038,\\
2033,& 2029,& 2024,& 2019,& 2013,& 2008,& 2002,& 1996,\\
1989,& 1982,& 1975,& 1968,& 1960,& 1952,& 1943,& 1934,\\
1925,& 1916,& 1906,& 1896,& 1885,& 1874,& 1863,& 1851,\\
1839,& 1827,& 1814,& 1800,& 1786,& 1772,& 1757,& 1742,\\
1727,& 1710,& 1694,& 1676,& 1659,& 1640,& 1622,& 1602,\\
1582,& 1561,& 1540,& 1518,& 1495,& 1471,& 1447,& 1422,\\
1396,& 1369,& 1341,& 1312,& 1282,& 1251,& 1219,& 1186,\\
1151,& 1114,& 1077,& 1037,& 995,& 952,& 906,& 857,\\
805,& 750,& 690,& 625,& 553,& 471,& 376,& 255\\
\hline
\end{tabular}
\caption{Look-up table for context probability adaptation}
\label{table:lut}
\end{table}

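The update process can be checked numerically against Table~\ref{table:lut}. The Python sketch below is ours and informative: it hard-codes only the two table entries its example needs (indexes 127 and 128), and returns the new probability rather than updating it in place:

```python
# Two entries copied from the 256-entry adaptation table (index 127 -> 1920,
# index 128 -> 1935); a full implementation would hold all 256 values.
ALUT = {127: 1920, 128: 1935}

def update_context(ctx_prob, value):
    """Adapt the 16-bit probability of zero after one binary value."""
    if value:                                   # a one was decoded: zero less likely
        ctx_prob -= ALUT[ctx_prob >> 8]
    else:                                       # a zero was decoded: zero more likely
        ctx_prob += ALUT[255 - (ctx_prob >> 8)]
    return ctx_prob

# Starting from equal likelihood (0x8000), the index is 128 for a one
# and 255-128 = 127 for a zero:
assert update_context(0x8000, True) == 0x8000 - 1935
assert update_context(0x8000, False) == 0x8000 + 1920
```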
\subsubsection{Efficient implementation (Informative)}
\label{arithdecopts}
The decoding operations defined in the preceding sections correspond closely
to the descriptions of arithmetic coding principles contained in the academic
literature. More efficient implementations are certainly possible, both for
hardware and software. This section describes some simple techniques.

\paragraph{Change of variables}$\ $\newline
There is in fact no need for the decoder to keep track of both $\ALow$ and
$\ACode$, since the decoding test is made against the difference of these values,
which may be defined as:
\[\ACodeMinusLow = \ACode-\ALow\]

So only this difference variable need be tracked. Since $\ALow$ is initialised
to zero, $\ACodeMinusLow$ is initialised just like $\ACode$. The $read\_boola()$
function is then re-written as:

\begin{pseudo}{read\_boola}{context\_label}
\bsCODE{prob\_zero = \AContexts[context\_label]}
\bsCODE{range\_times\_prob = (\ARange * prob\_zero)\gg 16}
\bsIF{\ACodeMinusLow >= range\_times\_prob}
\bsCODE{value = \true}
\bsCODE{\ACodeMinusLow -= range\_times\_prob}
\bsCODE{\ARange -= range\_times\_prob}
\bsELSE
\bsCODE{value = \false}
\bsCODE{\ARange = range\_times\_prob}
\bsEND
\bsCODE{update\_context(\AContexts[context\_label],value)}{\ref{contextupdate}}
\bsWHILE{\ARange <= \text{0x4000}}
\bsCODE{renormalise()}{\ref{renormalisation}}
\bsEND
\bsRET{value}
\end{pseudo}

The $renormalise()$ function is very greatly simplified, since all the masking
and bit-twiddling is eliminated to leave:

\begin{pseudo}{renormalise}{}
\bsCODE{\ACodeMinusLow <<= 1}
\bsCODE{\ARange <<= 1}
\bsCODE{\ACodeMinusLow += read\_bitb()}
\end{pseudo}

\paragraph{Bytewise operation}$\ $\newline
Accessing data bit by bit is also inefficient, so it is useful to look ahead and
read bytes into $\ACodeMinusLow$ or $\ACode$ in advance. So, for example,
$\ACodeMinusLow$ could be initialised to the first 4 bytes of the bitstream
and $\ARange$ initialised to 0xFFFF0000, with all calculations shifted up by 16 bits.
Then $read\_boola$ can be re-written as:

\begin{pseudo}{read\_boola}{context\_label}
\bsCODE{prob\_zero = \AContexts[context\_label]}
\bsCODE{range\_times\_prob = ((\ARange\gg16) * prob\_zero) \& \text{0xFFFF0000}}
\bsIF{\ACodeMinusLow >= range\_times\_prob}
\bsCODE{value = \true}
\bsCODE{\ACodeMinusLow -= range\_times\_prob}
\bsCODE{\ARange -= range\_times\_prob}
\bsELSE
\bsCODE{value = \false}
\bsCODE{\ARange = range\_times\_prob}
\bsEND
\bsCODE{update\_context(\AContexts[context\_label],value)}
\bsWHILE{\ARange <= \text{0x40000000}}
\bsCODE{renormalise()}
\bsEND
\bsRET{value}
\end{pseudo}

and the renormalisation loop uses a counter, starting at 16, to input bits in 2-byte
chunks:

\begin{pseudo}{renormalise}{}
\bsCODE{\ACodeMinusLow <<= 1}
\bsCODE{\ARange <<= 1}
\bsCODE{\ACounter -= 1}
\bsIF{\ACounter == 0}
\bsCODE{\ACodeMinusLow += read\_uint\_lit(2)}
\bsCODE{\ACounter = 16}
\bsEND
\end{pseudo}

\paragraph{Look-up table}$\ $\newline
In software it makes sense to use a modified probability LUT containing 512 elements, in
which each even element is the increment to $prob\_zero$ if 0 is coded, and each
odd element is the decrement to $prob\_zero$ if 1 is coded. This means that access
to the LUT will always be in a very local area whatever value is coded, whereas the basic
structure requires either position $p$ or $255-p$ to be accessed depending on the value.

\subsection{Arithmetic encoding (Informative)}

This document only normatively defines the decoding of arithmetic coded data.
However, whilst it is clearly vital that an encoding process matches the decoding
process, it is not entirely straightforward to derive an implementation of the
encoder by looking only at the decoder specification. Therefore this
informative section describes a possible implementation of an
arithmetic encoder that will produce output that is decodable by
the Dirac arithmetic decoder. This section is best read in conjunction with
Annex \ref{arithengine}.

\subsubsection{Encoder variables}

An arithmetic encoder requires the following unsigned integer variables, or
some mathematically equivalent set:
\begin{itemize}
\item $low$, a value indicating the bottom of the encoding interval,
\item $range$, a value indicating the width of the encoding interval,
\item $carry$, a value tracking the number of unresolved ``straddle" conditions
(described below), and
\item a set of 16-bit context probabilities, as described in Annex~\ref{stateandcontexts}.
\end{itemize}

The process for updating context probabilities, used when coding values,
is described in Annex \ref{contextupdate}.

A Dirac binary arithmetic encoder implementation codes a set of data in three stages:
\begin{enumerate}
\item initialisation,
\item processing of all values, and
\item flushing.
\end{enumerate}

\subsubsection{Initialisation}

Initialisation of the arithmetic encoder is very simple -- the internal variables are
set as:
\begin{eqnarray*}
low&=&\text{0x0} \\
range&=&\text{0xFFFF} \\
carry&=&0
\end{eqnarray*}

With 16-bit accuracy, 0xFFFF corresponds to an interval width of (almost) 1. All
context probabilities are initialised to probability $1/2$ (0x8000).

\subsubsection{Encoding binary values}
\label{arithwritebool}
The encoding process for a binary value must precisely mirror
the decoding process (Annex \ref{arithreadbool}). In
particular the interval variables $low$ and $range$ must be
updated in the same way.

Coding a boolean value consists of three sub-stages (in order):
\begin{enumerate}
\item scaling the interval $[low,low+range)$,
\item updating contexts, and
\item renormalising and outputting data.
\end{enumerate}

\paragraph{Scaling the interval}$\ $\newline
The integer interval $[low,low+range)$ represents the real interval
\[[l,h)=[low/2^{16},(low+range)/2^{16})\]
In a given context with label $label$, the probability of zero can be extracted as
\[prob\_zero=\AContexts[label]\]

If $0$ is to be encoded, the real interval $[l,h)$ should be rescaled so that $l$ is unchanged and the width $r=h-l=range/2^{16}$ is scaled to $r*p_0$, where $p_0=prob\_zero/2^{16}$.
This operation is approximated by setting
\[range=(range*prob\_zero)\gg 16\]

If 1 is to be encoded, $[l,h)$ should be rescaled so that $h$ is
unchanged and $r$ is scaled to $(1-p_0)*r$. This operation is
approximated by setting
\begin{eqnarray*}
low &+=& (range*prob\_zero)\gg 16 \\
range &-=& (range*prob\_zero)\gg 16
\end{eqnarray*}

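Worked numerically for a freshly initialised encoder with $p_0=1/2$ (an informative Python check, ours):

```python
low, range_, prob_zero = 0x0000, 0xFFFF, 0x8000   # fresh encoder, p0 = 1/2
range_times_prob = (range_ * prob_zero) >> 16

# Coding 0 keeps low and shrinks the width to (almost) half:
assert range_times_prob == 0x7FFF

# Coding 1 instead moves low up and keeps the top of the interval fixed:
low1 = low + range_times_prob        # 0x7FFF
range1 = range_ - range_times_prob   # 0x8000
assert low1 + range1 == low + range_  # top of interval unchanged
```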
\paragraph{Updating contexts}$\ $\newline
Contexts are updated in exactly the same way as in the
decoder (Annex \ref{contextupdate}).

\paragraph{Renormalisation and output}$\ $\newline
Renormalisation must cause $low$ and $range$ to be modified exactly
as in the decoder (Annex \ref{renormalisation}). In addition,
during renormalisation bits are output when $low$ and $low+range$
agree in their msbs, taking into account the carries accumulated whenever a
straddle condition is encountered.

In pseudocode, this is as follows:

\begin{pseudo*}
\bsWHILE{range <= \text{0x4000}}
\bsIF{((low+range-1)\wedge low) >= \text{0x8000}}
\bsCODE{low \wedge= \text{0x4000}}
\bsCODE{carry += 1}
\bsELSE
\bsCODE{write\_bit((low\&\text{0x8000}) != \text{0x0})}
\bsWHILE{carry>0}
\bsCODE{write\_bit((low\&\text{0x8000}) == \text{0x0})}
\bsCODE{carry -= 1}
\bsEND
\bsEND
\bsCODE{low <<= 1}
\bsCODE{range <<= 1}
\bsCODE{low \&= \text{0xFFFF}}
\bsEND
\end{pseudo*}

\paragraph{Flushing the encoder}$\ $\newline
After encoding, there may still be insufficient bits for a decoder
to determine the final few encoded symbols, partly because further
renormalisation is required -- for example, the msbs may agree but the range
may still be larger than 0x4000 -- and partly because there may be
unresolved carries.

A four-stage process will adequately flush the encoder:
\begin{enumerate}
\item output remaining resolved msbs,
\item resolve remaining straddle conditions,
\item flush carry bits, and
\item byte-align the output with padding bits.
\end{enumerate}

The remaining msbs are output as follows:

\begin{pseudo*}
\bsWHILE{((low+range-1)\wedge low) < \text{0x8000}}
\bsCODE{write\_bit((low\&\text{0x8000}) != \text{0x0})}
\bsWHILE{carry>0}
\bsCODE{write\_bit((low\&\text{0x8000}) == \text{0x0})}
\bsCODE{carry -= 1}
\bsEND
\bsCODE{low <<= 1}
\bsCODE{low \&= \text{0xFFFF}}
\bsCODE{range <<= 1}
\bsEND
\end{pseudo*}

Remaining straddles (where $low=b01...$ and $low+range-1=b10...$) can then be resolved by:

\begin{pseudo*}
\bsWHILE{((low \& \text{0x4000}) != \text{0x0}) \text{ and } (((low+range-1) \& \text{0x4000}) == \text{0x0})}
\bsCODE{carry += 1}
\bsCODE{low \wedge= \text{0x4000}}
\bsCODE{low <<= 1}
\bsCODE{range <<= 1}
\bsCODE{low \&= \text{0xFFFF}}
\bsEND
\end{pseudo*}

Carry bits can be discharged by picking a resolution of
the final straddles:

\begin{pseudo*}
\bsCODE{write\_bit((low \& \text{0x4000}) != \text{0x0})}
\bsFOR{c=0}{carry}
\bsCODE{write\_bit((low \& \text{0x4000}) == \text{0x0})}
\bsEND
\end{pseudo*}

Finally, 0--7 padding bits are added to the encoded output to make
a whole number of bytes. These are not necessary for decoding, but
are required for stream compliance.

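The stages above, together with the decoder of Annex \ref{arithreadbool}, can be exercised end to end. The following Python sketch is informative and ours, not part of the specification: bits are kept in a list rather than a padded byte stream, reads past the end return 0, the flush folds stages 2 and 3 together by writing one resolving bit plus $carry+1$ inverses of it, and a simple shift-based rule stands in for the look-up table of Table \ref{table:lut} (the round trip holds for any update rule applied identically on both sides):

```python
def update(prob, value):
    # Stand-in for the normative LUT adaptation: nudge the 16-bit probability
    # of zero towards the value just coded, clamped away from 0 and 1.
    prob = prob - (prob >> 5) if value else prob + ((0x10000 - prob) >> 5)
    return min(max(prob, 0x0100), 0xFF00)

class Encoder:
    def __init__(self):
        self.low, self.range, self.carry = 0, 0xFFFF, 0
        self.prob, self.bits = 0x8000, []

    def _emit(self, bit):
        # Output a resolved bit, then discharge any pending straddle carries.
        self.bits.append(bit)
        while self.carry:
            self.bits.append(bit ^ 1)
            self.carry -= 1

    def _renormalise(self):
        while self.range <= 0x4000:
            if ((self.low + self.range - 1) ^ self.low) >= 0x8000:
                self.low ^= 0x4000          # straddle: double from 1/2
                self.carry += 1
            else:
                self._emit(self.low >> 15)  # msbs agree: output the msb
            self.low = (self.low << 1) & 0xFFFF
            self.range <<= 1

    def encode(self, value):
        rtp = (self.range * self.prob) >> 16
        if value:
            self.low += rtp
            self.range -= rtp
        else:
            self.range = rtp
        self.prob = update(self.prob, value)
        self._renormalise()

    def flush(self):
        # Stage 1: output the remaining resolved msbs.
        while ((self.low + self.range - 1) ^ self.low) < 0x8000:
            self._emit(self.low >> 15)
            self.low = (self.low << 1) & 0xFFFF
            self.range <<= 1
        # Stages 2-3, folded together: settle any final straddle by writing
        # one resolving bit and carry+1 inverses of it.
        bit = (self.low >> 14) & 1
        self._emit(bit)
        self._emit(bit ^ 1)
        return self.bits      # byte-alignment padding omitted in this sketch

class Decoder:
    # Change-of-variables (code minus low) decoder, as in the efficient
    # implementation notes of the decoding annex.
    def __init__(self, bits):
        self.bits, self.pos = bits, 0
        self.cml, self.range, self.prob = 0, 0xFFFF, 0x8000
        for _ in range(16):
            self.cml = (self.cml << 1) + self._read_bit()

    def _read_bit(self):
        bit = self.bits[self.pos] if self.pos < len(self.bits) else 0
        self.pos += 1
        return bit

    def decode(self):
        rtp = (self.range * self.prob) >> 16
        if self.cml >= rtp:
            value = 1
            self.cml -= rtp
            self.range -= rtp
        else:
            value = 0
            self.range = rtp
        self.prob = update(self.prob, value)
        while self.range <= 0x4000:
            self.cml = (self.cml << 1) + self._read_bit()
            self.range <<= 1
        return value

data = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1] * 4
enc = Encoder()
for v in data:
    enc.encode(v)
dec = Decoder(enc.flush())
assert [dec.decode() for _ in data] == data
```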
\subsection{Efficient implementation}

Techniques similar to those described in Section~\ref{arithdecopts} can be
used in the encoder to speed up operation.

\paragraph{Bytewise operation}$\ $\newline
It is not necessary to output bits one by one. Instead, $low$ may be allowed to
accumulate bits at the lower end, which are output when a whole byte has accumulated. If the
last bit determined was a 1, this 1 must be carried into the previous byte, so
renormalisation becomes:

\begin{pseudo*}
\bsWHILE{range <= \text{0x4000}}
\bsCODE{low \ll= 1}
\bsCODE{range \ll= 1}
\bsCODE{counter -= 1}
\bsIF{counter == 0}
\bsIF{low <= 1\ll 24 \text{ and } low+range > 1\ll24}
\bsCODE{carry += 1}
\bsELSE
\bsIF{low < 1\ll 24}
\bsWHILE{carry}
\bsCODE{data[pos] = \text{0xFF}}
\bsCODE{carry -= 1}
\bsCODE{pos += 1}
\bsEND
\bsELSE
\bsCODE{data[pos-1] += 1}
\bsWHILE{carry}
\bsCODE{data[pos] = \text{0x00}}
\bsCODE{carry -= 1}
\bsCODE{pos += 1}
\bsEND
\bsEND
\bsCODE{data[pos] = low\gg16}
\bsCODE{pos += 1}
\bsCODE{low \&= \text{0xFFFF}}
\bsEND
\bsCODE{counter = 8}
\bsEND
\bsEND
\end{pseudo*}

\paragraph{Overlap and add}$\ $\newline
Renormalisation can be simplified still further by observing that carries occur if and only
if the top byte of $low$ becomes 0xFF. In this case a carried 1 would propagate up multiple
bytes, turning 0xFFs into 0x00s. So it is possible to store the top two bytes of $low$ (i.e.
bit 24, containing the carry bit, and the next byte) and do an overlap and add at the end to correctly
propagate values back to the beginning. That is, renormalisation becomes:

\begin{pseudo*}
\bsWHILE{range <= \text{0x4000}}
\bsCODE{low \ll= 1}
\bsCODE{range \ll= 1}
\bsCODE{counter -= 1}
\bsIF{counter == 0}
\bsCODE{low\_list[pos] = low\gg16}
\bsCODE{pos += 1}
\bsCODE{counter = 8}
\bsCODE{low \&= \text{0xFFFF}}
\bsEND
\bsEND
\end{pseudo*}

At the end of coding all values, a flush function will complete $low\_list$ for the remaining values
and perform the overlap and add:

\begin{pseudo*}
\bsCODE{low += 1}
\bsCODE{low \ll= counter}
\bsCODE{low\_list[pos] = (low \gg 16)\&\text{0xFFFF}}
\bsCODE{pos += 1}
\bsCODE{low\_list[pos] = (low \gg 8)\&\text{0xFFFF}}
\bsCODE{pos += 1}
\bsCODE{data[pos-1] = low\_list[pos-1] \& \text{0xFF}}
\bsCODE{data[pos-2] = low\_list[pos-2] \& \text{0xFF}}
\bsFOR{i=0}{pos-3}
\bsCODE{low\_list[pos-3-i] += low\_list[pos-2-i]\gg8}
\bsCODE{data[pos-3-i] = low\_list[pos-3-i]\&\text{0xFF}}
\bsEND
\end{pseudo*}