\label{arithcoding}

This annex has three parts:
\begin{enumerate}
\item a description of the principles of arithmetic coding,
\item a specification of the arithmetic decoding engine used in Dirac, and
\item a description of a compatible arithmetic encoder.
\end{enumerate}

\subsection{Arithmetic coding principles (Informative)}

This section provides an introduction to the principles underlying arithmetic
coding. It briefly describes binary arithmetic coding, that is the coding of binary symbols,
which is used in Dirac.

Arithmetic coding is an extremely powerful form of entropy coding, which can closely
approximate the Shannon information limit for given data. An arithmetic encoder consists
of a state machine that is fed with a sequence of symbols together with an estimate
of each symbol's probability. For each input symbol the arithmetic coding engine
updates its state and outputs a number of coded bits. The number of output bits
for each input symbol depends on the internal state and on the current probabilities of
the symbols being coded, and can range from zero to many bits.

The variable number of coded bits output for each input symbol complicates
the implementation but is essential to the near-optimality of arithmetic coding. Consider
a binary symbol $b$, with $p(b=\false)=p_0$ and $p(b=\true)=1-p_0$. The entropy of $b$
is the expected number of bits required to encode $b$, and is equal to
\[e(p_0)=p_0\log_2(1/p_0)+(1-p_0)\log_2(1/(1-p_0))\]

If $e(p_0)$ is plotted against $p_0$, it can be seen that unless $p_0$ is exactly
0.5, $e(p_0)<1$. This means that an optimal binary entropy encoder
operating symbol by symbol cannot produce an output for every symbol.

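As an informal check of this claim, the entropy function can be evaluated numerically. The following Python sketch is ours and informative, not part of the specification:

```python
from math import log2

def entropy(p0):
    """Expected bits per symbol, e(p0), for a binary source with p(false) = p0."""
    p1 = 1.0 - p0
    return p0 * log2(1.0 / p0) + p1 * log2(1.0 / p1)

# e(p0) reaches 1 bit only when both symbols are equally likely:
print(entropy(0.5))   # 1.0
print(entropy(0.25))  # ~0.811: a coder emitting at least 1 bit per symbol wastes ~19%
```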
\subsubsection{Interval division and scaling}
The fundamental idea of arithmetic coding is interval division and scaling. An
arithmetic code can be thought of as a single number lying in an interval
determined by the sequence of values being coded. For simplicity, this discussion
describes binary arithmetic coding, but larger symbol alphabets can be treated
in an analogous manner.

Let us begin with the interval $[0,1)$, and suppose that we know (or have some
estimate of) the probability of $\false$, $p_0$. Conceptually we divide the interval
into the intervals $[0,p_0)$ and $[p_0,1)$. Suppose we code $\false$ as the
first symbol. In this case the interval is changed to $[0,p_0)$. If we code $\true$,
then the interval is changed to $[p_0,1)$. After coding a number of symbols we
arrive at an interval $[low,high)$. To code the next symbol we partition this interval
into $[low,low+p_0(high-low))$ and $[low+p_0(high-low),high)$, and if the symbol is
$\false$ we choose the first interval, otherwise the second.

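The interval division just described can be sketched with exact rational arithmetic. The following Python illustration is ours and informative (unlimited precision, no renormalisation); it shows that any value inside the final interval identifies the whole symbol sequence:

```python
from fractions import Fraction

def encode_interval(bits, p0):
    """Narrow [0,1) once per symbol: False takes the left part, True the right."""
    low, high = Fraction(0), Fraction(1)
    for b in bits:
        split = low + p0 * (high - low)
        low, high = (split, high) if b else (low, split)
    return low, high

def decode_value(value, p0, n):
    """Recover n symbols by asking, at each step, which side of the split value lies."""
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        split = low + p0 * (high - low)
        if value < split:
            out.append(False); high = split
        else:
            out.append(True); low = split
    return out

p0 = Fraction(2, 3)
bits = [False, True, True, False]
low, high = encode_interval(bits, p0)
# Any value in [low, high) decodes back to the original sequence:
assert decode_value((low + high) / 2, p0, len(bits)) == bits
```

Note that the final width is the product of the per-symbol factors ($p_0$ for each $\false$, $1-p_0$ for each $\true$), which is what links the interval width to the entropy in the next paragraph.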
For any integer $N$, this process clearly partitions the interval $[0,1)$ into
a set of disjoint intervals that correspond to all the input sequences of $N$ bits.
Identifying such a bit sequence is equivalent to choosing a value in the
corresponding interval, and for an interval of width $w$ this in general requires
\[\left\lceil\log_2(1/w)\right\rceil\]
bits. With static probabilities, on average,
\[w=p_0^{Np_0}(1-p_0)^{N(1-p_0)}\]
resulting in
\[\left\lceil Ne(p_0)\right\rceil\]
bits being used, demonstrating the near-optimality of arithmetic coding.
Moreover, it is clearly possible to create an adaptive arithmetic code by
changing the estimate of $p_0$ based on previously coded data.

\subsubsection{Finite precision arithmetic}
As it stands, the procedure outlined in the previous section has a number of
drawbacks for practical application. Firstly, it requires unlimited precision
to scale the interval, which is not available in real hardware or software.
Secondly, it only produces an output when all values have been coded. These
problems are addressed by renormalisation and progressive output: periodically rescaling the
interval, and outputting the most significant bits of $low$ and $high$ whenever they agree.

For example, if we know that
\begin{eqnarray*}
low &=& b0xyz...\\
high &=& b0pqr...
\end{eqnarray*}
then we can output $0$, since this must prefix any value lying in the interval,
and shift $low$ and $high$ to get $low=bxyz...$ and $high=bpqr...$.
This has the effect of doubling the interval from 0 ($x\mapsto 2x$). Likewise if
\begin{eqnarray*}
low &=& b1xyz...\\
high &=& b1pqr...
\end{eqnarray*}
we can output $1$ and shift to get
$low=bxyz...$ and $high=bpqr...$ again: this is equivalent to doubling the interval
from 1 ($x\mapsto 2x-1$).

One problem remains: suppose the interval resolutely sits on the fence, straddling
$\dfrac{1}{2}$ whilst getting smaller and smaller, with the most
significant bits of $low$ and $high$ staying at 0 and 1 respectively.
In this case, when the straddle is finally resolved, $low$ and $high$ will
both be of the form $b10000...xyz$ or $b01111...pqr$.

The resolution strategy is to again rescale $low$ and $high$, but this time
double from $\dfrac{1}{2}$ (i.e. $x\mapsto 2x-\dfrac{1}{2}$), and keep a count of the number $k$
of times this is done, as this is the number of carry bits that are
required. When the straddle is resolved as 1, then 1 followed by $k$ zero bits is
output; otherwise 0 followed by $k$ 1s is output. This ensures that the
output exactly represents the small straddling interval.

A decoder can determine a symbol as soon as it has sufficient bits to distinguish
whether a value lies in one interval or another. If constraints are placed on the
size of the smallest interval before
renormalisation (for example, by renormalising often enough and by having a fixed
smallest allowable probability), then this can be accomplished within a fixed word width.

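The three rescaling rules (double from 0, double from 1, and double from $\dfrac{1}{2}$ with a deferred carry) can be sketched as a single step function. This Python illustration is ours and informative; it uses a 16-bit integer scaling of $[0,1)$ to $[0,\text{0x10000})$ with $high$ exclusive:

```python
def renormalise_step(low, high, k):
    """One finite-precision rescale of [low, high), following the rules above.
    Returns the rescaled interval, the straddle count k, and either None or a
    tuple (resolved msb, list of k pending carry bits)."""
    out = None
    if high <= 0x8000:            # both msbs 0: double from 0, output 0 then k ones
        out = (0, [1] * k); k = 0
        low, high = low * 2, high * 2
    elif low >= 0x8000:           # both msbs 1: double from 1, output 1 then k zeros
        out = (1, [0] * k); k = 0
        low, high = low * 2 - 0x10000, high * 2 - 0x10000
    else:                         # straddle: double from 1/2, defer a carry bit
        low, high = low * 2 - 0x8000, high * 2 - 0x8000
        k += 1
    return low, high, k, out

# A straddling interval produces no output but counts a carry:
print(renormalise_step(0x7000, 0x9000, 0))   # (0x6000, 0xA000, 1, None)
```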
\subsubsection{Symbol probability estimation}
\begin{comment}
The effectiveness of arithmetic coding is critically dependent on the accurate estimation of
symbol probabilities. This might be achieved by counting the number of False symbols, $n_0$,
and True symbols, $n_1$, that have previously been coded (or decoded), and estimating the
probability of zero as
\[p_0 = n_0/(n_0+n_1)\]
This requires maintaining the values of the two counts, $n_0$ and $n_1$, and performing some
arithmetic (including division) every time a symbol is coded or decoded, in order to estimate
the probability. Usually the values of $n_0$ and $n_1$ are both initialised to one for the
first symbol, giving an initial probability estimate of 0.5.

With this method of probability estimation, if False is coded the probability becomes
\[p_0=(n_0+1)/(n_0+n_1+1)\]
and if True is coded the probability becomes
\[p_0=n_0/(n_0+n_1+1)\]
In other words:
\begin{itemize}
\item if False is coded, the probability update is $p_0 \mathrel{+}= (1-p_0)/(n_0+n_1+1)$;
\item else, if True is coded, the probability update is $p_0 \mathrel{-}= p_0/(n_0+n_1+1)$.
\end{itemize}

In VC-2 the probability estimate is simplified by assuming an estimate for the total number
of symbols coded, $t$, as a function of the probability itself. That is,
\[n_0+n_1 \approx g(p_0)\]
With this simplifying assumption the probability update process becomes:
\begin{itemize}
\item if False is coded: $p_0 \mathrel{+}= (1-p_0)/(g(p_0)+1)$;
\item else, if True is coded: $p_0 \mathrel{-}= p_0/(g(p_0)+1)$.
\end{itemize}
Many choices would be possible for the definition of $g(p_0)$. For VC-2, $g(p_0)$ is given by
\[g(p_0) \approx 2^v \quad\text{where}\quad v = 8 - \frac{4\log_2(128(1-2|p_0-0.5|))}{7}\]
With this choice of $g(p_0)$, $(n_0+n_1)$ is assumed to be 16 when $p_0=0.5$ and 256 when
$p_0=1/256$ or $255/256$.

Assuming the total number of symbols coded is a function of probability significantly
simplifies the implementation. The probability update can be achieved with a single
arithmetic operation (addition or subtraction) and a pair of lookup tables. That is:
\begin{itemize}
\item if False is coded: $p_0 \mathrel{+}= LUT\_False[p_0]$;
\item else, if True is coded: $p_0 \mathrel{-}= LUT\_True[p_0]$.
\end{itemize}

In the VC-2 arithmetic codec, probability is represented as a 16-bit unsigned integer (that
is, the range [0,65536) represents the range of probabilities [0,1)). To reduce the size of
the lookup table, only the 8 most significant bits of $p_0$ are used as the index to the
lookup table. The definition of LUT\_True is:

LUT\_True(prob):
    lut\_input = prob \& 0xFF00
    if (lut\_input >= 0x8000): p0 = lut\_input/65536.0
    else: p0 = (lut\_input+256)/65536.0
    v = 8.0 - 4.0*log2(128.0*(1.0-2.0*abs(p0-0.5)))/7.0
    t = 2**v
    return round((lut\_input+128)/(t+1))

Here prob is the 16-bit representation of probability used in the VC-2 codec, and the return
value is in the same format. The 8 LSBs of prob are ignored, allowing the use of a 256-entry
lookup table. p0 is calculated using a dead-zone quantiser. The lookup table for False
symbols is calculated similarly. It turns out that LUT\_False is the mirror image of
LUT\_True, so the same table can be used for both.

Note: this pseudocode does not conform to the definitions in section 4 because it contains
floating point numbers, floating point division and the round function.
\end{comment}

\subsection{Arithmetic decoding engine}
\label{arithengine}

This section is a normative specification of the operation of the arithmetic
decoding engine and the processes for using it to extract binary values from coded streams.

The arithmetic decoding engine shall consist of two elements:
\begin{enumerate}
\item a collection of state variables representing the state of the arithmetic
decoder (Section~\ref{initarith})
\item a function for extracting binary values from the decoder
and updating the decoder state (Section~\ref{arithreadbool})
\end{enumerate}

\subsubsection{State and contexts}
\label{stateandcontexts}

The arithmetic decoder state shall consist of the following decoder state variables:

\begin{itemize}
\item $\ALow$, an integer representing the beginning of the current coding interval.
\item $\ARange$, an integer representing the size of the current coding interval.
\item $\ACode$, an integer within the interval from $\ALow$ to $\ALow+\ARange-1$, determined from the encoded bitstream.
\item $\ABitsLeft$, a decrementing count of the number of bits yet to be read in.
\item $\AContexts$, a map of all the contexts used in the Dirac decoder.
\end{itemize}

A context $context$ shall be a 16-bit unsigned integer value encapsulating
the probability of a zero symbol in the stream, such that
\[0<context\leq\text{0xFFFF}\]

Contexts shall be accessed by decoding functions via a context label passed to the function.

\subsubsection{Initialisation}
\label{initarith}

At the beginning of the decoding of any data unit, the arithmetic
decoding state shall be initialised as follows:

\begin{pseudo}{initialise\_arithmetic\_decoding}{ctx\_labels}
\bsCODE{\ALow = \text{0x0}}
\bsCODE{\ARange = \text{0xFFFF}}
\bsCODE{\ACode = \text{0x0}}
\bsFOR{i=0}{15}
\bsCODE{\ACode <<= 1}
\bsCODE{\ACode += read\_bitb()}
\bsEND
\bsCODE{init\_context\_probs(ctx\_labels)}
\end{pseudo}

The $init\_context\_probs()$ process shall be defined as follows:

\begin{pseudo}{init\_context\_probs}{ctx\_labels}
\bsFOR{i=0}{\length(ctx\_labels)-1}
\bsCODE{\AContexts[ctx\_labels[i]] = \text{0x8000}}
\bsEND
\end{pseudo}

\begin{informative}
The value 0x8000 represents 1/2, i.e.\ equal likelihood for the two binary values.
\end{informative}

\subsubsection{Data input}
\label{inputarith}

The arithmetic decoding process shall access data in a contiguous block of bytes
containing $\ABitsLeft$ bits, this value having been set prior to decoding
the block. The bits in this block shall be sufficient to allow for the
decoding of all coefficients. However, the specification of arithmetic
decoding operations in this section may occasionally cause further bits to be read,
even though they are not required for determining decoded values. For this
reason the bounded-block read function $read\_bitb()$ (Section~\ref{blockreadbit})
shall be used for data access.

Since the length of arithmetically coded data elements is given in bytes within the Dirac
stream, there may be bits left unread when all values have been extracted. These
shall be flushed as described in Section~\ref{blockreadbit}. Since arithmetically coded
blocks are byte-aligned and a whole number of bytes, this aligns data input with the
beginning of the byte after the arithmetically coded data, i.e.\ at the end of the
data chunk. $flush\_inputb()$ shall always be called at the end of decoding an
arithmetically coded data element.

\subsubsection{Decoding boolean values}
\label{arithreadbool}

The arithmetic decoding engine is a multi-context, adaptive binary
arithmetic decoder, performing binary renormalisation and producing
binary outputs. For each bit decoded, the semantics of the relevant
calling decoder function shall determine which contexts are passed to the
arithmetic decoding operations.

This section defines the operation of the $read\_boola()$ function
for extracting a boolean value from the Dirac stream. The function
shall be defined as follows:

\begin{pseudo}{read\_boola}{context\_label}
\bsCODE{prob\_zero = \AContexts[context\_label]}
\bsCODE{count = \ACode-\ALow}
\bsCODE{range\_times\_prob = (\ARange * prob\_zero)\gg 16}
\bsIF{count >= range\_times\_prob}
\bsCODE{value = \true}
\bsCODE{\ALow += range\_times\_prob}
\bsCODE{\ARange -= range\_times\_prob}
\bsELSE
\bsCODE{value = \false}
\bsCODE{\ARange = range\_times\_prob}
\bsEND
\bsCODE{update\_context(\AContexts[context\_label],value)}{\ref{contextupdate}}
\bsWHILE{\ARange <= \text{0x4000}}
\bsCODE{renormalise()}{\ref{renormalisation}}
\bsEND
\bsRET{value}
\end{pseudo}

\begin{informative}
The function scales the probability of the symbol 0 (or $\false$) from the decoding context
so that if this probability were $1$, then the interval would equal that between
$\ALow$ and
\[high=\ALow+\ARange-1\]
$count$ is the offset of $\ACode$ within this interval, and $range\_times\_prob$
is the cut-off between 0/$\false$ and 1/$\true$ within it.
\end{informative}

\subsubsection{Renormalisation}
\label{renormalisation}

Renormalisation shall be applied to stop the arithmetic decoding
engine from losing accuracy. Renormalisation shall be
applied while the range is less than or equal to a quarter of
the total available 16-bit range ($\text{0x4000}$).

Renormalisation shall double the interval and read a bit into the codeword.
The $renormalise()$ function shall be defined as follows:

\begin{pseudo}{renormalise}{}
\bsIF{((\ALow+\ARange-1)\wedge\ALow) >= \text{0x8000}}
\bsCODE{\ACode \wedge= \text{0x4000}}
\bsCODE{\ALow \wedge= \text{0x4000}}
\bsEND
\bsCODE{\ALow <<= 1}
\bsCODE{\ARange <<= 1}
\bsCODE{\ALow \&= \text{0xFFFF}}
\bsCODE{\ACode <<= 1}
\bsCODE{\ACode += read\_bitb()}
\bsCODE{\ACode \&= \text{0xFFFF}}
\end{pseudo}

\begin{informative}
For convenience let $low=\ALow$ and $high=\ALow+\ARange-1$
represent the lower and upper bounds of the interval. If the
range is $\leq\text{0x4000}$ then
one of three possibilities must hold:
\begin{enumerate}
\item the msbs of $low$ and $high$ are both 0
\item the msbs of $low$ and $high$ are both 1
\item $low=b01...$, $high=b10...$, and the interval straddles the half-way point 0x8000.
\end{enumerate}

The renormalisation
process has the effect that: in case 1, the interval $[low,high]$ is doubled from 0 (i.e. $x\mapsto 2x$); in case 2 it is doubled from 1 (i.e. $x\mapsto 2x-1$); and in case 3 it is doubled from 1/2 (i.e. $x\mapsto 2x-\frac{1}{2}$).
\end{informative}

\subsubsection{Updating contexts}
\label{contextupdate}

Context probabilities shall be updated according to a probability look-up table
$\ALUT$, which supplies a value for decrementing
or incrementing the probability of zero based on the first
8 bits of its current value, according to Table~\ref{table:lut}.

The $update\_context()$ process shall be defined as follows:

\begin{pseudo}{update\_context}{ctx\_prob,value}
\bsIF{value == \true}
\bsCODE{ctx\_prob -= \ALUT[ctx\_prob \gg 8]}{Table \ref{table:lut}}
\bsELSE
\bsCODE{ctx\_prob += \ALUT[255-(ctx\_prob \gg 8)]}{Table \ref{table:lut}}
\bsEND
\end{pseudo}

The lookup table used for updating context probabilities shall be as defined in
Table~\ref{table:lut} below. The lookup table entries are arranged in raster scan order with
rows of eight entries. The entry corresponding to index zero is in the
top left hand corner, the index increments by one from left to right and by
eight from top to bottom, and the entry corresponding to index 255 is on the right hand
side of the last row.
\begin{table}[!ht]
\centering
\begin{tabular}{|cccccccc|}
\hline
\multicolumn{8}{|c|}{{\bf \ALUT[] (indexes 0 to 255)}} \\
\hline
0,& 2,& 5,& 8,& 11,& 15,& 20,& 24,\\
29,& 35,& 41,& 47,& 53,& 60,& 67,& 74,\\
82,& 89,& 97,& 106,& 114,& 123,& 132,& 141,\\
150,& 160,& 170,& 180,& 190,& 201,& 211,& 222,\\
233,& 244,& 256,& 267,& 279,& 291,& 303,& 315,\\
327,& 340,& 353,& 366,& 379,& 392,& 405,& 419,\\
433,& 447,& 461,& 475,& 489,& 504,& 518,& 533,\\
548,& 563,& 578,& 593,& 609,& 624,& 640,& 656,\\
672,& 688,& 705,& 721,& 738,& 754,& 771,& 788,\\
805,& 822,& 840,& 857,& 875,& 892,& 910,& 928,\\
946,& 964,& 983,& 1001,& 1020,& 1038,& 1057,& 1076,\\
1095,& 1114,& 1133,& 1153,& 1172,& 1192,& 1211,& 1231,\\
1251,& 1271,& 1291,& 1311,& 1332,& 1352,& 1373,& 1393,\\
1414,& 1435,& 1456,& 1477,& 1498,& 1520,& 1541,& 1562,\\
1584,& 1606,& 1628,& 1649,& 1671,& 1694,& 1716,& 1738,\\
1760,& 1783,& 1806,& 1828,& 1851,& 1874,& 1897,& 1920,\\
1935,& 1942,& 1949,& 1955,& 1961,& 1968,& 1974,& 1980,\\
1985,& 1991,& 1996,& 2001,& 2006,& 2011,& 2016,& 2021,\\
2025,& 2029,& 2033,& 2037,& 2040,& 2044,& 2047,& 2050,\\
2053,& 2056,& 2058,& 2061,& 2063,& 2065,& 2066,& 2068,\\
2069,& 2070,& 2071,& 2072,& 2072,& 2072,& 2072,& 2072,\\
2072,& 2071,& 2070,& 2069,& 2068,& 2066,& 2065,& 2063,\\
2060,& 2058,& 2055,& 2052,& 2049,& 2045,& 2042,& 2038,\\
2033,& 2029,& 2024,& 2019,& 2013,& 2008,& 2002,& 1996,\\
1989,& 1982,& 1975,& 1968,& 1960,& 1952,& 1943,& 1934,\\
1925,& 1916,& 1906,& 1896,& 1885,& 1874,& 1863,& 1851,\\
1839,& 1827,& 1814,& 1800,& 1786,& 1772,& 1757,& 1742,\\
1727,& 1710,& 1694,& 1676,& 1659,& 1640,& 1622,& 1602,\\
1582,& 1561,& 1540,& 1518,& 1495,& 1471,& 1447,& 1422,\\
1396,& 1369,& 1341,& 1312,& 1282,& 1251,& 1219,& 1186,\\
1151,& 1114,& 1077,& 1037,& 995,& 952,& 906,& 857,\\
805,& 750,& 690,& 625,& 553,& 471,& 376,& 255\\
\hline
\end{tabular}
\caption{Look-up table for context probability adaptation}
\label{table:lut}
\end{table}

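The update process can be checked numerically against Table~\ref{table:lut}. The Python sketch below is ours and informative: it hard-codes only the two table entries its example needs (indexes 127 and 128), and returns the new probability rather than updating it in place:

```python
# Two entries copied from the 256-entry adaptation table (index 127 -> 1920,
# index 128 -> 1935); a full implementation would hold all 256 values.
ALUT = {127: 1920, 128: 1935}

def update_context(ctx_prob, value):
    """Adapt the 16-bit probability of zero after one binary value."""
    if value:                                   # a one was decoded: zero less likely
        ctx_prob -= ALUT[ctx_prob >> 8]
    else:                                       # a zero was decoded: zero more likely
        ctx_prob += ALUT[255 - (ctx_prob >> 8)]
    return ctx_prob

# Starting from equal likelihood (0x8000), the index is 128 for a one
# and 255-128 = 127 for a zero:
assert update_context(0x8000, True) == 0x8000 - 1935
assert update_context(0x8000, False) == 0x8000 + 1920
```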
\subsubsection{Efficient implementation (Informative)}
\label{arithdecopts}
The decoding operations defined in the preceding sections correspond closely
to the descriptions of arithmetic coding principles contained in the academic
literature. More efficient implementations are certainly possible, both for
hardware and software. This section describes some simple techniques.

\paragraph{Change of variables}$\ $\newline
There is in fact no need for the decoder to keep track of both $\ALow$ and
$\ACode$, since the decoding test is made against the difference of these values,
which may be defined as:
\[\ACodeMinusLow = \ACode-\ALow\]

So only this difference variable need be tracked. Since $\ALow$ is initialised
to zero, $\ACodeMinusLow$ is initialised just like $\ACode$. The $read\_boola()$
function is then re-written as:

\begin{pseudo}{read\_boola}{context\_label}
\bsCODE{prob\_zero = \AContexts[context\_label]}
\bsCODE{range\_times\_prob = (\ARange * prob\_zero)\gg 16}
\bsIF{\ACodeMinusLow >= range\_times\_prob}
\bsCODE{value = \true}
\bsCODE{\ACodeMinusLow -= range\_times\_prob}
\bsCODE{\ARange -= range\_times\_prob}
\bsELSE
\bsCODE{value = \false}
\bsCODE{\ARange = range\_times\_prob}
\bsEND
\bsCODE{update\_context(\AContexts[context\_label],value)}{\ref{contextupdate}}
\bsWHILE{\ARange <= \text{0x4000}}
\bsCODE{renormalise()}{\ref{renormalisation}}
\bsEND
\bsRET{value}
\end{pseudo}

The $renormalise()$ function is very greatly simplified, since all the masking
and bit-twiddling is eliminated to leave:

\begin{pseudo}{renormalise}{}
\bsCODE{\ACodeMinusLow <<= 1}
\bsCODE{\ARange <<= 1}
\bsCODE{\ACodeMinusLow += read\_bitb()}
\end{pseudo}

\paragraph{Bytewise operation}$\ $\newline
Accessing data bit by bit is also inefficient, so it is useful to look ahead and
read bytes into $\ACodeMinusLow$ or $\ACode$ in advance. So, for example,
$\ACodeMinusLow$ could be initialised to the first 4 bytes of the bitstream
and $\ARange$ initialised to 0xFFFF0000, with all calculations shifted up by 16 bits.
Then $read\_boola$ can be re-written as:

\begin{pseudo}{read\_boola}{context\_label}
\bsCODE{prob\_zero = \AContexts[context\_label]}
\bsCODE{range\_times\_prob = ((\ARange\gg16) * prob\_zero) \& \text{0xFFFF0000}}
\bsIF{\ACodeMinusLow >= range\_times\_prob}
\bsCODE{value = \true}
\bsCODE{\ACodeMinusLow -= range\_times\_prob}
\bsCODE{\ARange -= range\_times\_prob}
\bsELSE
\bsCODE{value = \false}
\bsCODE{\ARange = range\_times\_prob}
\bsEND
\bsCODE{update\_context(\AContexts[context\_label],value)}
\bsWHILE{\ARange <= \text{0x40000000}}
\bsCODE{renormalise()}
\bsEND
\bsRET{value}
\end{pseudo}

and the renormalisation loop uses a counter, starting at 16, to input bits in 2-byte
chunks:

\begin{pseudo}{renormalise}{}
\bsCODE{\ACodeMinusLow <<= 1}
\bsCODE{\ARange <<= 1}
\bsCODE{\ACounter -= 1}
\bsIF{\ACounter == 0}
\bsCODE{\ACodeMinusLow += read\_uint\_lit(2)}
\bsCODE{\ACounter = 16}
\bsEND
\end{pseudo}

\paragraph{Look-up table}$\ $\newline
In software it makes sense to use a modified probability LUT containing 512 elements, in
which each even element is the increment to $prob\_zero$ if 0 is coded, and each
odd element is the decrement to $prob\_zero$ if 1 is coded. This means that access
to the LUT will always be in a very local area whatever value is coded, whereas the basic
structure requires either position $p$ or $255-p$ to be accessed depending on the value.

\subsection{Arithmetic encoding (Informative)}

This document only normatively defines the decoding of arithmetic coded data.
However, whilst it is clearly vital that an encoding process matches the decoding
process, it is not entirely straightforward to derive an implementation of the
encoder by looking only at the decoder specification. Therefore this
informative section describes a possible implementation of an
arithmetic encoder that will produce output that is decodable by
the Dirac arithmetic decoder. This section is best read in conjunction with
Annex \ref{arithengine}.

\subsubsection{Encoder variables}

An arithmetic encoder requires the following unsigned integer variables, or
some mathematically equivalent set:
\begin{itemize}
\item $low$, a value indicating the bottom of the encoding interval,
\item $range$, a value indicating the width of the encoding interval,
\item $carry$, a value tracking the number of unresolved ``straddle" conditions
(described below), and
\item a set of 16-bit context probabilities, as described in Annex~\ref{stateandcontexts}.
\end{itemize}

The process for updating context probabilities, used when coding values,
is described in Annex \ref{contextupdate}.

A Dirac binary arithmetic encoder implementation codes a set of data in three stages:
\begin{enumerate}
\item initialisation,
\item processing of all values, and
\item flushing.
\end{enumerate}

\subsubsection{Initialisation}

Initialisation of the arithmetic encoder is very simple -- the internal variables are
set as:
\begin{eqnarray*}
low&=&\text{0x0} \\
range&=&\text{0xFFFF} \\
carry&=&0
\end{eqnarray*}

With 16-bit accuracy, 0xFFFF corresponds to an interval width of (almost) 1. All
context probabilities are initialised to probability $1/2$ (0x8000).

\subsubsection{Encoding binary values}
\label{arithwritebool}
The encoding process for a binary value must precisely mirror
the decoding process (Annex \ref{arithreadbool}). In
particular the interval variables $low$ and $range$ must be
updated in the same way.

Coding a boolean value consists of three sub-stages (in order):
\begin{enumerate}
\item scaling the interval $[low,low+range)$,
\item updating contexts, and
\item renormalising and outputting data.
\end{enumerate}

\paragraph{Scaling the interval}$\ $\newline
The integer interval $[low,low+range)$ represents the real interval
\[[l,h)=[low/2^{16},(low+range)/2^{16})\]
In a given context with label $label$, the probability of zero can be extracted as
\[prob\_zero=\AContexts[label]\]

If $0$ is to be encoded, the real interval $[l,h)$ should be rescaled so that $l$ is unchanged and the width $r=h-l=range/2^{16}$ is scaled to $r*p_0$, where $p_0=prob\_zero/2^{16}$.
This operation is approximated by setting
\[range=(range*prob\_zero)\gg 16\]

If 1 is to be encoded, $[l,h)$ should be rescaled so that $h$ is
unchanged and $r$ is scaled to $(1-p_0)*r$. This operation is
approximated by setting
\begin{eqnarray*}
low &+=& (range*prob\_zero)\gg 16 \\
range &-=& (range*prob\_zero)\gg 16
\end{eqnarray*}

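Worked numerically for a freshly initialised encoder with $p_0=1/2$ (an informative Python check, ours):

```python
low, range_, prob_zero = 0x0000, 0xFFFF, 0x8000   # fresh encoder, p0 = 1/2
range_times_prob = (range_ * prob_zero) >> 16

# Coding 0 keeps low and shrinks the width to (almost) half:
assert range_times_prob == 0x7FFF

# Coding 1 instead moves low up and keeps the top of the interval fixed:
low1 = low + range_times_prob        # 0x7FFF
range1 = range_ - range_times_prob   # 0x8000
assert low1 + range1 == low + range_  # top of interval unchanged
```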
\paragraph{Updating contexts}$\ $\newline
Contexts are updated in exactly the same way as in the
decoder (Annex \ref{contextupdate}).

\paragraph{Renormalisation and output}$\ $\newline
Renormalisation must cause $low$ and $range$ to be modified exactly
as in the decoder (Annex \ref{renormalisation}). In addition,
during renormalisation bits are output when $low$ and $low+range$
agree in their msbs, taking into account the carries accumulated whenever a
straddle condition is encountered.

In pseudocode, this is as follows:

\begin{pseudo*}
\bsWHILE{range <= \text{0x4000}}
\bsIF{((low+range-1)\wedge low) >= \text{0x8000}}
\bsCODE{low \wedge= \text{0x4000}}
\bsCODE{carry += 1}
\bsELSE
\bsCODE{write\_bit((low\&\text{0x8000}) != \text{0x0})}
\bsWHILE{carry>0}
\bsCODE{write\_bit((low\&\text{0x8000}) == \text{0x0})}
\bsCODE{carry -= 1}
\bsEND
\bsEND
\bsCODE{low <<= 1}
\bsCODE{range <<= 1}
\bsCODE{low \&= \text{0xFFFF}}
\bsEND
\end{pseudo*}

\paragraph{Flushing the encoder}$\ $\newline
After encoding, there may still be insufficient bits for a decoder
to determine the final few encoded symbols, partly because further
renormalisation is required -- for example, the msbs may agree but the range
may still be larger than 0x4000 -- and partly because there may be
unresolved carries.

A four-stage process will adequately flush the encoder:
\begin{enumerate}
\item output remaining resolved msbs,
\item resolve remaining straddle conditions,
\item flush carry bits, and
\item byte-align the output with padding bits.
\end{enumerate}

The remaining msbs are output as follows:

\begin{pseudo*}
\bsWHILE{((low+range-1)\wedge low) < \text{0x8000}}
\bsCODE{write\_bit((low\&\text{0x8000}) != \text{0x0})}
\bsWHILE{carry>0}
\bsCODE{write\_bit((low\&\text{0x8000}) == \text{0x0})}
\bsCODE{carry -= 1}
\bsEND
\bsCODE{low <<= 1}
\bsCODE{low \&= \text{0xFFFF}}
\bsCODE{range <<= 1}
\bsEND
\end{pseudo*}

Remaining straddles (where $low=b01...$ and $low+range-1=b10...$) can then be resolved by:

\begin{pseudo*}
\bsWHILE{((low \& \text{0x4000}) != \text{0x0}) \text{ and } (((low+range-1) \& \text{0x4000}) == \text{0x0})}
\bsCODE{carry += 1}
\bsCODE{low \wedge= \text{0x4000}}
\bsCODE{low <<= 1}
\bsCODE{range <<= 1}
\bsCODE{low \&= \text{0xFFFF}}
\bsEND
\end{pseudo*}

Carry bits can be discharged by picking a resolution of
the final straddles:

\begin{pseudo*}
\bsCODE{write\_bit((low \& \text{0x4000}) != \text{0x0})}
\bsFOR{c=0}{carry}
\bsCODE{write\_bit((low \& \text{0x4000}) == \text{0x0})}
\bsEND
\end{pseudo*}

Finally, 0--7 padding bits are added to the encoded output to make
a whole number of bytes. These are not necessary for decoding, but
are required for stream compliance.

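The stages above, together with the decoder of Annex \ref{arithreadbool}, can be exercised end to end. The following Python sketch is informative and ours, not part of the specification: bits are kept in a list rather than a padded byte stream, reads past the end return 0, the flush folds stages 2 and 3 together by writing one resolving bit plus $carry+1$ inverses of it, and a simple shift-based rule stands in for the look-up table of Table \ref{table:lut} (the round trip holds for any update rule applied identically on both sides):

```python
def update(prob, value):
    # Stand-in for the normative LUT adaptation: nudge the 16-bit probability
    # of zero towards the value just coded, clamped away from 0 and 1.
    prob = prob - (prob >> 5) if value else prob + ((0x10000 - prob) >> 5)
    return min(max(prob, 0x0100), 0xFF00)

class Encoder:
    def __init__(self):
        self.low, self.range, self.carry = 0, 0xFFFF, 0
        self.prob, self.bits = 0x8000, []

    def _emit(self, bit):
        # Output a resolved bit, then discharge any pending straddle carries.
        self.bits.append(bit)
        while self.carry:
            self.bits.append(bit ^ 1)
            self.carry -= 1

    def _renormalise(self):
        while self.range <= 0x4000:
            if ((self.low + self.range - 1) ^ self.low) >= 0x8000:
                self.low ^= 0x4000          # straddle: double from 1/2
                self.carry += 1
            else:
                self._emit(self.low >> 15)  # msbs agree: output the msb
            self.low = (self.low << 1) & 0xFFFF
            self.range <<= 1

    def encode(self, value):
        rtp = (self.range * self.prob) >> 16
        if value:
            self.low += rtp
            self.range -= rtp
        else:
            self.range = rtp
        self.prob = update(self.prob, value)
        self._renormalise()

    def flush(self):
        # Stage 1: output the remaining resolved msbs.
        while ((self.low + self.range - 1) ^ self.low) < 0x8000:
            self._emit(self.low >> 15)
            self.low = (self.low << 1) & 0xFFFF
            self.range <<= 1
        # Stages 2-3, folded together: settle any final straddle by writing
        # one resolving bit and carry+1 inverses of it.
        bit = (self.low >> 14) & 1
        self._emit(bit)
        self._emit(bit ^ 1)
        return self.bits      # byte-alignment padding omitted in this sketch

class Decoder:
    # Change-of-variables (code minus low) decoder, as in the efficient
    # implementation notes of the decoding annex.
    def __init__(self, bits):
        self.bits, self.pos = bits, 0
        self.cml, self.range, self.prob = 0, 0xFFFF, 0x8000
        for _ in range(16):
            self.cml = (self.cml << 1) + self._read_bit()

    def _read_bit(self):
        bit = self.bits[self.pos] if self.pos < len(self.bits) else 0
        self.pos += 1
        return bit

    def decode(self):
        rtp = (self.range * self.prob) >> 16
        if self.cml >= rtp:
            value = 1
            self.cml -= rtp
            self.range -= rtp
        else:
            value = 0
            self.range = rtp
        self.prob = update(self.prob, value)
        while self.range <= 0x4000:
            self.cml = (self.cml << 1) + self._read_bit()
            self.range <<= 1
        return value

data = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1] * 4
enc = Encoder()
for v in data:
    enc.encode(v)
dec = Decoder(enc.flush())
assert [dec.decode() for _ in data] == data
```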
\subsection{Efficient implementation}

Techniques similar to those described in Section~\ref{arithdecopts} can be
used in the encoder to speed up operation.

\paragraph{Bytewise operation}$\ $\newline
It is not necessary to output bits one by one. Instead, $low$ may be allowed to
accumulate bits at the lower end, which are output when a whole byte has accumulated. If the
last bit determined was a 1, this 1 must be carried into the previous byte, so
renormalisation becomes:

\begin{pseudo*}
\bsWHILE{range <= \text{0x4000}}
\bsCODE{low \ll= 1}
\bsCODE{range \ll= 1}
\bsCODE{counter -= 1}
\bsIF{counter == 0}
\bsIF{low <= 1\ll 24 \text{ and } low+range > 1\ll24}
\bsCODE{carry += 1}
\bsELSE
\bsIF{low < 1\ll 24}
\bsWHILE{carry}
\bsCODE{data[pos] = \text{0xFF}}
\bsCODE{carry -= 1}
\bsCODE{pos += 1}
\bsEND
\bsELSE
\bsCODE{data[pos-1] += 1}
\bsWHILE{carry}
\bsCODE{data[pos] = \text{0x00}}
\bsCODE{carry -= 1}
\bsCODE{pos += 1}
\bsEND
\bsEND
\bsCODE{data[pos] = low\gg16}
\bsCODE{pos += 1}
\bsCODE{low \&= \text{0xFFFF}}
\bsEND
\bsCODE{counter = 8}
\bsEND
\bsEND
\end{pseudo*}

\paragraph{Overlap and add}$\ $\newline
Renormalisation can be simplified still further by observing that carries occur if and only
if the top byte of $low$ becomes 0xFF. In this case a carried 1 would propagate up multiple
bytes, turning 0xFFs into 0x00s. So it is possible to store the top two bytes of $low$ (i.e.
bit 24, containing the carry bit, and the next byte) and do an overlap and add at the end to correctly
propagate values back to the beginning. That is, renormalisation becomes:

\begin{pseudo*}
\bsWHILE{range <= \text{0x4000}}
\bsCODE{low \ll= 1}
\bsCODE{range \ll= 1}
\bsCODE{counter -= 1}
\bsIF{counter == 0}
\bsCODE{low\_list[pos] = low\gg16}
\bsCODE{pos += 1}
\bsCODE{counter = 8}
\bsCODE{low \&= \text{0xFFFF}}
\bsEND
\bsEND
\end{pseudo*}

At the end of coding all values, a flush function will complete $low\_list$ for the remaining values
and perform the overlap and add:

\begin{pseudo*}
\bsCODE{low += 1}
\bsCODE{low \ll= counter}
\bsCODE{low\_list[pos] = (low \gg 16)\&\text{0xFFFF}}
\bsCODE{pos += 1}
\bsCODE{low\_list[pos] = (low \gg 8)\&\text{0xFFFF}}
\bsCODE{pos += 1}
\bsCODE{data[pos-1] = low\_list[pos-1] \& \text{0xFF}}
\bsCODE{data[pos-2] = low\_list[pos-2] \& \text{0xFF}}
\bsFOR{i=0}{pos-3}
\bsCODE{low\_list[pos-3-i] += low\_list[pos-2-i]\gg8}
\bsCODE{data[pos-3-i] = low\_list[pos-3-i]\&\text{0xFF}}
\bsEND
\end{pseudo*}