\documentclass[9pt,letterpaper]{book}
\usepackage{longtable}
14 \usepackage[pdfpagemode=None,pdfstartview=FitH,pdfview=FitH,colorlinks=true
]%
\newtheorem{theorem}{Theorem}[section]
\newcommand{\idx}[1]{{\ensuremath{\mathit{#1}}}}
\newcommand{\qti}{\idx{qti}}
\newcommand{\qtj}{\idx{qtj}}
\newcommand{\pli}{\idx{pli}}
\newcommand{\plj}{\idx{plj}}
\newcommand{\qi}{\idx{qi}}
\newcommand{\ci}{\idx{ci}}
\newcommand{\bmi}{\idx{bmi}}
\newcommand{\bmj}{\idx{bmj}}
\newcommand{\qri}{\idx{qri}}
\newcommand{\qrj}{\idx{qrj}}
\newcommand{\hti}{\idx{hti}}
\newcommand{\sbi}{\idx{sbi}}
\newcommand{\bi}{\idx{bi}}
\newcommand{\bj}{\idx{bj}}
\newcommand{\mbi}{\idx{mbi}}
\newcommand{\mbj}{\idx{mbj}}
\newcommand{\mi}{\idx{mi}}
\newcommand{\cbi}{\idx{cbi}}
\newcommand{\qii}{\idx{qii}}
\newcommand{\ti}{\idx{ti}}
\newcommand{\tj}{\idx{tj}}
\newcommand{\rfi}{\idx{rfi}}
\newcommand{\zzi}{\idx{zzi}}
\newcommand{\ri}{\idx{ri}}
%This somewhat odd construct ensures that \bitvar{\qi}, etc., will set the
% qi in bold face, even though it is in a \mathit font, yet \bitvar{VAR} will
% set VAR in a bold, roman font.
\newcommand{\bitvar}[1]{\ensuremath{\mathbf{\bm{#1}}}}
\newcommand{\locvar}[1]{\ensuremath{\mathrm{#1}}}
\newcommand{\term}[1]{{\em #1}}
\newcommand{\bin}[1]{\ensuremath{\mathtt{b#1}}}
\newcommand{\hex}[1]{\ensuremath{\mathtt{0x#1}}}
\newcommand{\ilog}{\ensuremath{\mathop{\mathrm{ilog}}\nolimits}}
\newcommand{\round}{\ensuremath{\mathop{\mathrm{round}}\nolimits}}
\newcommand{\sign}{\ensuremath{\mathop{\mathrm{sign}}\nolimits}}
\newcommand{\lflim}{\ensuremath{\mathop{\mathrm{lflim}}\nolimits}}
%Section-based table, figure, and equation numbering.
\numberwithin{equation}{chapter}
\numberwithin{figure}{chapter}
\numberwithin{table}{chapter}

\bibliographystyle{alpha}

\title{Theora Specification}
\author{Xiph.org Foundation}

\markboth{{\sc Notation and Conventions}}{{\sc Notation and Conventions}}
\chapter*{Notation and Conventions}
All parameters either passed in or out of a decoding procedure are given in

The prefix \bin{} indicates that the following value is to be interpreted as a
binary number (base 2).

{\bf Example:} The value \bin{1110100} is equal to the decimal value 116.

The prefix \hex{} indicates that the following value is to be interpreted as a
hexadecimal number (base 16).

{\bf Example:} The value \hex{74} is equal to the decimal value 116.
All arithmetic defined by this specification is exact.
However, any real numbers that do arise will always be converted back to
integers again in short order.
The entire specification can be implemented using only normal integer
All operations are to be implemented with sufficiently large integers so that
overflow cannot occur.
Where the result of a computation is to be truncated to a fixed-sized binary
representation, this will be explicitly noted.
The size given for all variables is the maximum number of bits needed to store
any value in that variable.
Intermediate computations involving that variable may require more bits.
The following operators are defined:

\begin{description}
\item[$|a|$]
The absolute value of a number $a$.
\begin{align*}
|a| & = \left\{\begin{array}{ll}
-a, & a < 0 \\
a, & a \ge 0
\end{array}\right.
\end{align*}

\item[$a*b$]
Multiplication of a number $a$ by a number $b$.

\item[$\frac{a}{b}$]
Exact division of a number $a$ by a number $b$, producing a potentially
non-integer result.

\item[$\left\lfloor a\right\rfloor$]
The largest integer less than or equal to a real number $a$.

\item[$\left\lceil a\right\rceil$]
The smallest integer greater than or equal to a real number $a$.

\item[$a//b$]
Integer division of $a$ by $b$.
\begin{align*}
a//b & = \left\{\begin{array}{ll}
\left\lceil\frac{a}{b}\right\rceil, & a < 0 \\
\left\lfloor\frac{a}{b}\right\rfloor, & a \ge 0
\end{array}\right.
\end{align*}

\item[$a\%b$]
The remainder from the integer division of $a$ by $b$.
\begin{align*}
a\%b & = a-|b|*\left\lfloor\frac{a}{|b|}\right\rfloor
\end{align*}
Note that with this definition, the result is always non-negative and less than
$|b|$.

\item[$a<<b$]
The value obtained by left-shifting the two's complement integer $a$ by $b$
bits.
For purposes of this specification, overflow is ignored, and so this is
equivalent to integer multiplication of $a$ by $2^b$.

\item[$a>>b$]
The value obtained by right-shifting the two's complement integer $a$ by $b$
bits, filling in the leftmost bits of the new value with $0$ if $a$ is
non-negative and $1$ if $a$ is negative.
This is {\em not} equivalent to integer division of $a$ by $2^b$.
\begin{align*}
a>>b & = \left\lfloor\frac{a}{2^b}\right\rfloor.
\end{align*}

\item[$\round(a)$]
Rounds a number $a$ to the nearest integer, with ties rounded away from $0$.
\begin{equation*}
\round(a) = \left\{\begin{array}{ll}
\lceil a-\frac{1}{2}\rceil & a \le 0 \\
\lfloor a+\frac{1}{2}\rfloor & a > 0
\end{array}\right.
\end{equation*}

\item[$\sign(a)$]
Returns the sign of a given number.
\begin{equation*}
\sign(a) = \left\{\begin{array}{ll}
-1, & a < 0 \\
0, & a = 0 \\
1, & a > 0
\end{array}\right.
\end{equation*}

\item[$\ilog(a)$]
The minimum number of bits required to store a positive integer $a$ in
two's complement notation, or $0$ for a non-positive integer $a$.
\begin{equation*}
\ilog(a) = \left\{\begin{array}{ll}
0, & a \le 0 \\
\left\lfloor\log_2{a}\right\rfloor+1, & a > 0
\end{array}\right.
\end{equation*}

\item[$\min(a,b)$]
The minimum of two numbers $a$ and $b$.

\item[$\max(a,b)$]
The maximum of two numbers $a$ and $b$.
\end{description}
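
As a non-normative illustration, the integer operators defined above can be sketched in Python.
The function names (\texttt{int\_div}, \texttt{remainder}, \texttt{shift\_right}, \texttt{spec\_round}, \texttt{ilog}) are hypothetical and are not part of this specification; exact rational arithmetic is used so that no floating-point rounding enters the results.

```python
from fractions import Fraction
import math


def int_div(a: int, b: int) -> int:
    """a//b as defined above: ceil(a/b) for a < 0, floor(a/b) for a >= 0."""
    q = Fraction(a, b)  # exact rational quotient, no floating point
    return math.ceil(q) if a < 0 else math.floor(q)


def remainder(a: int, b: int) -> int:
    """a%b as defined above: a - |b| * floor(a/|b|), always in [0, |b|)."""
    return a - abs(b) * math.floor(Fraction(a, abs(b)))


def shift_right(a: int, b: int) -> int:
    """a>>b: sign-filling right shift, equal to floor(a / 2**b)."""
    return a >> b  # Python's >> on int already floors for negative a


def spec_round(a) -> int:
    """round(a): nearest integer, with ties rounded away from zero."""
    a = Fraction(a)
    half = Fraction(1, 2)
    return math.ceil(a - half) if a <= 0 else math.floor(a + half)


def ilog(a: int) -> int:
    """ilog(a): floor(log2(a)) + 1 for a > 0, and 0 otherwise."""
    return a.bit_length() if a > 0 else 0
```

Note that Python's built-in \texttt{//} operator always rounds toward negative infinity, so it does not match the integer division defined here when $a$ is negative; this is why the sketch applies the ceiling and floor cases explicitly.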
\thispagestyle{plain}
\markboth{{\sc Key words}}{{\sc Key words}}

%We can't rewrite this, because this is text required by RFC 2119, so we use
% some emergency stretching to get it typeset properly.
\setlength{\emergencystretch}{2em}
The key words ``MUST'', ``MUST NOT'', ``REQUIRED'', ``SHALL'', ``SHALL NOT'',
``SHOULD'', ``SHOULD NOT'', ``RECOMMENDED'', ``MAY'', and ``OPTIONAL'' in this
document are to be interpreted as described in RFC 2119 \cite{rfc2119}.\par
\setlength{\emergencystretch}{0em}

Where such assertions are placed on the contents of a Theora bitstream itself,
implementations should be prepared to encounter bitstreams that do not follow
An application's behavior in the presence of such non-conforming bitstreams
is not defined by this specification, but any reasonable method of handling
By way of example, applications MAY discard the current frame, retain the
current output thus far, or attempt to continue on by assuming some default
values for the erroneous bits.
When such an error occurs in the bitstream headers, an application MAY refuse
to decode the entire stream.
An application SHOULD NOT allow such non-conformant bitstreams to overflow
buffers and potentially execute arbitrary code, as this represents a serious

An application MUST, however, ensure any bits marked as reserved have the value
zero, and refuse to decode the stream if they do not.
These are used as placeholders for future bitstream features with which the
current bitstream is forward-compatible.
Such features may not increment the bitstream version number, and can only be
recognized by checking the value of these reserved bits.

\pagenumbering{arabic}

\chapter{Introduction}

Theora is a general purpose, lossy video codec.
It is based on the VP3 video codec produced by On2 Technologies
(\url{http://www.on2.com/}).
On2 donated the VP3.1 source code to the Xiph.org Foundation and released it
under a BSD-like license.
On2 also made an irrevocable, royalty-free license grant for any patent claims
it might have over the software and any derivatives.
No formal specification exists for the VP3 format beyond this source code;
however, Mike Melanson maintains a detailed description \cite{Mel04}.
Portions of this specification were adopted from that text with permission.

\section{VP3 and Theora}

Theora contains a superset of the features that were available in the original
Content encoded with VP3.1 can be losslessly transcoded into the Theora format.
Theora content cannot, in general, be losslessly transcoded into the VP3
If a feature is not available in the original VP3 format, this is mentioned
when that feature is defined.
A complete list of these features appears in Appendix~\ref{app:vp3-compat}.
%TODO: VP3 - theora comparison in appendix

\section{Video Formats}

Theora currently supports progressive video data of arbitrary dimensions at a
constant frame rate in one of several $Y'C_bC_r$ color spaces.
The precise definition of the supported color spaces appears in
Section~\ref{sec:colorspaces}.
Three different chroma subsampling formats are supported: 4:2:0, 4:2:2,
and 4:4:4.
The precise details of each of these formats and their sampling locations are
described in Section~\ref{sec:pixfmts}.

The Theora format does not support interlaced material, variable frame rates,
bit depths larger than 8 bits per component, nor alternate color spaces such
as RGB or arbitrary multi-channel spaces.
Black and white content can be efficiently encoded, however, because the
uniform chroma planes compress well.
Support for interlaced material is planned for a future version.

{\bf Note:} Infrequently changing frame rates---as when film and video
sequences are cut together---can be supported in the Ogg container format by
chaining several Theora streams together.

Support for increased bit depths or additional color spaces is not planned.

\section{Classification}

Theora is a block-based lossy transform codec that utilizes an
$8\times 8$ Type-II Discrete Cosine Transform and block-based motion
This places it in the same class of codecs as MPEG-1, -2, -4, and H.263.
The details of how individual blocks are organized and how DCT coefficients are
stored in the bitstream differ substantially from these codecs, however.
Theora supports only intra frames (I frames in MPEG) and inter frames (P frames
There is no equivalent to the bi-predictive frames (B frames) found in MPEG

\section{Assumptions}

The Theora codec design assumes a complex, psychovisually-aware encoder and a
simple, low-complexity decoder.
%TODO: Talk more about implementation complexity.

Theora provides none of its own framing, synchronization, or protection against
An encoder is solely a method of accepting input video frames and
compressing these frames into raw, unformatted `packets'.
The decoder then accepts these raw packets in sequence, decodes them, and
synthesizes a facsimile of the original video frames.
Theora is a free-form variable bit rate (VBR) codec, and packets have no
minimum size, maximum size, or fixed/expected size.

Theora packets are thus intended to be used with a transport mechanism that
provides free-form framing, synchronization, positioning, and error correction
in accordance with these design assumptions, such as Ogg (for file transport)
or RTP (for network multicast).
For the purposes of a few examples in this document, we will assume that Theora
is embedded in an Ogg stream specifically, although this is by no means a
requirement or fundamental assumption in the Theora design.

The specification for embedding Theora into an Ogg transport stream is given in
Appendix~\ref{app:oggencapsulation}.

\section{Codec Setup and Probability Model}

Theora's heritage is the proprietary commercial codec VP3, and it retains a
fair amount of inflexibility when compared to Vorbis \cite{vorbis}, the first
Xiph.org codec, which began as a research codec.
However, to provide additional scope for encoder improvement, Theora adopts
some of the configurable aspects of decoder setup that are present in Vorbis.
This configuration data is not available in VP3, which uses hardcoded values

Theora makes the same controversial design decision that Vorbis made to include
the entire probability model for the DCT coefficients and all the quantization
parameters in the bitstream headers.
This is often several hundred fields.
It is therefore impossible to decode any frame in the stream without
having previously fetched the codec info and codec setup headers.

{\bf Note:} Theora {\em can} initiate decode at an arbitrary intra-frame packet
within a bitstream so long as the codec has been initialized with the setup

Thus, Theora headers are both required for decode to begin and relatively large
as bitstream headers go.
The header size is unbounded, although as a rule of thumb less than 16 kB is
recommended, and Xiph.org's reference encoder follows this suggestion.
%TODO: Is 8kB enough? My setup header is 7.4kB, that doesn't leave much room
%RG: the lesson from vorbis is that as small as possible is really
% important in some applications. Practically, what's acceptable
% depends a great deal on the target bitrate. I'd leave 16 kB in the
% spec for now. fwiw more than 1k of comments is quite unusual.

Our own design work indicates that the primary liability of the required header
is in mindshare; it is an unusual design and thus causes some amount of
complaint among engineers as this runs against current design trends and
points out limitations in some existing software/interface designs.
However, we find that it does not fundamentally limit Theora's suitable

%\subsection{Format Specification}
\section{Format Conformance}

The Theora format is well-defined by its decode specification; any encoder that
produces packets that are correctly decoded by an implementation following
this specification may be considered a proper Theora encoder.
A decoder must faithfully and completely implement the specification defined
herein %, except where noted,
to be considered a conformant Theora decoder.
A decoder need not be implemented strictly as described, but the
actual decoder process MUST be {\em entirely mathematically equivalent}
to the described process.
Where appropriate, a non-normative description of encoder processes is
These sections will be marked as such, and a proper Theora encoder is not
bound to follow them.

%TODO: \subsection{Hardware Profile}

\chapter{Coded Video Structure}

Theora's encoding and decoding process is based on $8\times 8$ blocks of
This section describes how a video frame is laid out, divided into
blocks, and how those blocks are organized.

\section{Frame Layout}

A video frame in Theora is a two-dimensional array of pixels.
Theora, like VP3, uses a right-handed coordinate system, with the origin in the
lower-left corner of the frame.
This is contrary to many video formats which use a left-handed coordinate
system with the origin in the upper-left corner of the frame.
%INT: This means that for interlaced material, the definition of `even fields'
%INT: and `odd fields' may be reversed between Theora and other video codecs.
%INT: This document will always refer to them as `top fields' and `bottom

Theora divides the pixel array up into three separate \term{color planes}, one
for each of the $Y'$, $C_b$, and $C_r$ components of the pixel.
The $Y'$ plane is also called the \term{luma plane}, and the $C_b$ and $C_r$
planes are also called the \term{chroma planes}.
Each plane is assigned a numerical value, as shown in
Table~\ref{tab:color-planes}.

\begin{tabular}{cl}\toprule
Index & Color Plane \\\midrule
$0$ & $Y'$ \\
$1$ & $C_b$ \\
$2$ & $C_r$ \\
\bottomrule\end{tabular}
\caption{Color Plane Indices}
\label{tab:color-planes}

In some pixel formats, the chroma planes are subsampled by a factor of two
in one or both directions.
This means that the width or height of the chroma planes may be half that of
the total frame width and height.
The luma plane is never subsampled.
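
The effect of subsampling on plane dimensions can be sketched as follows.
This is an illustrative Python fragment, not part of the format definition: the function name \texttt{plane\_sizes} and the table \texttt{CHROMA\_DIVISORS} are hypothetical, and the sketch assumes the conventional meaning of the format names (4:2:2 halves the chroma width only, 4:2:0 halves both width and height, 4:4:4 is not subsampled).

```python
# (horizontal divisor, vertical divisor) applied to the luma dimensions.
# Assumption: conventional meaning of the subsampling names.
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),  # chroma not subsampled
    "4:2:2": (2, 1),  # chroma width halved
    "4:2:0": (2, 2),  # chroma width and height halved
}


def plane_sizes(frame_width, frame_height, pixel_format):
    """Return the (width, height) in pixels of each color plane."""
    div_x, div_y = CHROMA_DIVISORS[pixel_format]
    luma = (frame_width, frame_height)  # the luma plane is never subsampled
    chroma = (frame_width // div_x, frame_height // div_y)
    return {"Y'": luma, "Cb": chroma, "Cr": chroma}
```

For a $240\times 48$ frame, for instance, the 4:2:0 chroma planes in this sketch are $120\times 24$ pixels each.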

\section{Picture Region}

An encoded video frame in Theora is required to have a width and height that
are multiples of sixteen, making an integral number of blocks even when the
chroma planes are subsampled.
However, inside a frame a smaller \term{picture region} may be defined
to present material whose dimensions are not a multiple of sixteen pixels, as
shown in Figure~\ref{fig:pic-frame}.
The picture region can be offset from the lower-left corner of the frame by up
to 255 pixels in each direction, and may have an arbitrary width and height,
provided that it is contained entirely within the coded frame.
It is this picture region that contains the actual video data.
The portions of the frame which lie outside the picture region may contain
arbitrary image data, so the frame must be cropped to the picture region
The picture region plays no other role in the decode process, which operates on
the entire video frame.
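
The constraints above can be summarized in a short, non-normative Python sketch.
The function and parameter names are hypothetical; offsets are measured from the lower-left corner of the frame, as in the text.

```python
def padded_frame_size(pic_width, pic_height):
    """Smallest frame size (multiples of 16) that can hold the picture
    when the picture region is placed at offset (0, 0)."""
    def round_up_16(n):
        return (n + 15) // 16 * 16
    return round_up_16(pic_width), round_up_16(pic_height)


def picture_region_is_valid(frame_w, frame_h, pic_x, pic_y, pic_w, pic_h):
    """Check the picture region constraints stated in this section."""
    # Frame dimensions must be multiples of sixteen.
    if frame_w % 16 != 0 or frame_h % 16 != 0:
        return False
    # The offset may be at most 255 pixels in each direction.
    if not (0 <= pic_x <= 255 and 0 <= pic_y <= 255):
        return False
    # The picture region must lie entirely within the coded frame.
    return pic_x + pic_w <= frame_w and pic_y + pic_h <= frame_h
```

For example, a $720\times 486$ picture needs at least a $720\times 496$ frame under this sketch, leaving ten rows of the frame outside the picture region.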

\includegraphics{pic-frame}
\caption{Location of frame and picture regions}
\label{fig:pic-frame}

\section{Blocks and Super Blocks}
\label{sec:blocks-and-sbs}

Each color plane is subdivided into \term{blocks} of $8\times 8$ pixels.
Blocks are grouped into $4\times 4$ arrays called \term{super blocks} as
shown in Figure~\ref{fig:superblock}.
Each color plane has its own set of blocks and super blocks.
If the chroma planes are subsampled, they are still divided into $8\times 8$
blocks of pixels; there are just fewer blocks than in the luma plane.
The boundaries of blocks and super blocks in the luma plane do not necessarily
coincide with those of the chroma planes, if the chroma planes have been

\includegraphics{superblock}
\caption{Subdivision of a frame into blocks and super blocks}
\label{fig:superblock}

Blocks are accessed in two different orders in the various decoder processes.
The first is \term{raster order}, illustrated in Figure~\ref{fig:raster-block}.
This accesses each block in row-major order, starting in the lower left of the
frame and continuing along the bottom row of the entire frame, followed by the
next row up, starting on the left edge of the frame, etc.

\includegraphics{raster-block}
\caption{Raster ordering of $n\times m$ blocks}
\label{fig:raster-block}

The second is \term{coded order}.
In coded order, blocks are accessed by super block.
Within each frame, super blocks are traversed in raster order,
similar to raster order for blocks.
Within each super block, however, blocks are accessed in a Hilbert curve
pattern, illustrated in Figure~\ref{fig:hilbert-block}.
If a color plane does not contain a complete super block on the top or right
sides, the same ordering is still used, simply with any blocks outside the
frame boundary omitted.

\includegraphics{hilbert-block}
\caption{Hilbert curve ordering of blocks within a super block}
\label{fig:hilbert-block}

To illustrate this ordering, consider a frame that is 240 pixels wide and
48 pixels high.
Each row of the luma plane has 30 blocks and 8 super blocks, and there are 6
rows of blocks and two rows of super blocks.

%When accessed in raster order, each block in the luma plane is assigned the
%\vspace{\baselineskip}
%\begin{tabular}{|ccccccc|}\hline
%150 & 151 & 152 & 153 & $\ldots$ & 178 & 179 \\
%120 & 121 & 122 & 123 & $\ldots$ & 148 & 149 \\\hline
% 90 & 91 & 92 & 93 & $\ldots$ & 118 & 119 \\
% 60 & 61 & 62 & 63 & $\ldots$ & 88 & 89 \\
% 30 & 31 & 32 & 33 & $\ldots$ & 58 & 59 \\
% 0 & 1 & 2 & 3 & $\ldots$ & 28 & 29 \\\hline
%\vspace{\baselineskip}

When accessed in coded order, each block in the luma plane is assigned the
following indices:

\vspace{\baselineskip}
\begin{tabular}{|cccc|c|cc|}\hline
123 & 122 & 125 & 124 & $\ldots$ & 179 & 178 \\
120 & 121 & 126 & 127 & $\ldots$ & 176 & 177 \\\hline
  5 &   6 &   9 &  10 & $\ldots$ & 117 & 118 \\
  4 &   7 &   8 &  11 & $\ldots$ & 116 & 119 \\
  3 &   2 &  13 &  12 & $\ldots$ & 115 & 114 \\
  0 &   1 &  14 &  15 & $\ldots$ & 112 & 113 \\\hline
\end{tabular}
\vspace{\baselineskip}

Here the index values specify the order in which the blocks would be accessed.
The indices of the blocks are numbered continuously from one color plane to the
next.
They do not reset to zero at the start of each plane.
Instead, the numbering increases continuously from the $Y'$ plane to the $C_b$
plane to the $C_r$ plane.
The implication is that the blocks from all planes are treated as a unit during
the various processing steps.

Although blocks are sometimes accessed in raster order, in this document the
index associated with a block is {\em always} its index in coded order.
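
The coded order of blocks within a single plane can be reproduced with a short, non-normative Python sketch.
The names \texttt{HILBERT\_4X4} and \texttt{block\_coded\_order} are hypothetical, and the traversal offsets were read off the index table in the example above rather than taken from a normative definition; positions are (column, row) pairs counted from the lower-left corner.

```python
# (column, row) offsets inside a 4x4 super block, in visiting order.
# Assumption: derived from the luma index table in the example above.
HILBERT_4X4 = [
    (0, 0), (1, 0), (1, 1), (0, 1),
    (0, 2), (0, 3), (1, 3), (1, 2),
    (2, 2), (2, 3), (3, 3), (3, 2),
    (3, 1), (2, 1), (2, 0), (3, 0),
]


def block_coded_order(blocks_wide, blocks_high):
    """Map each block position (x, y) of one plane to its coded-order index."""
    index = {}
    # Super blocks are visited in raster order, bottom row first.
    for sb_y in range(0, blocks_high, 4):
        for sb_x in range(0, blocks_wide, 4):
            for dx, dy in HILBERT_4X4:
                x, y = sb_x + dx, sb_y + dy
                # Blocks outside the plane boundary are simply omitted.
                if x < blocks_wide and y < blocks_high:
                    index[(x, y)] = len(index)
    return index
```

Calling \texttt{block\_coded\_order(30, 6)} yields the indices of the $240\times 48$ luma plane from the example; for a full frame the chroma planes would continue the numbering where the luma plane ends.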

\section{Macro Blocks}

A macro block contains a $2\times 2$ array of blocks in the luma plane
{\em and} the co-located blocks in the chroma planes, as shown in
Figure~\ref{fig:macroblock}.
Thus macro blocks can represent anywhere from six to twelve blocks, depending
on how the chroma planes are subsampled.
This is in contrast to super blocks, which only contain blocks from a single
% the whole super vs. macro blocks thing is a little confusing, and it can be
% hard to remember which is what initially. A figure would/will help here,
% but I tried to add some text emphasizing the difference in terms of
%TBT: At this point we haven't described any functionality yet.
%TBT: As far as the reader knows, the only purpose of the blocks, macro blocks
%TBT: and super blocks is for data organization---and for blocks and super
%TBT: blocks, this is essentially true.
%TBT: So lets restrict the differences we emphasize to those of data
%TBT: organization, which the sentence I just added above does.
Macro blocks contain information about coding mode and motion vectors for the
corresponding blocks in all color planes.

\includegraphics{macroblock}
\caption{Subdivision of a frame into macro blocks}
\label{fig:macroblock}

Macro blocks are also accessed in a \term{coded order}.
This coded order proceeds by examining each super block in the luma plane in
raster order, and traversing the four macro blocks inside using a smaller
Hilbert curve, as shown in Figure~\ref{fig:hilbert-mb}.
%r: I rearranged the wording to make a more formal idiom here
If the luma plane does not contain a complete super block on the top or right
sides, the same ordering is still used, with any macro blocks outside
the frame boundary simply omitted.
Because the frame size is constrained to be a multiple of 16, there are never
any partial macro blocks.
Unlike blocks, macro blocks need never be accessed in a pure raster order.

\includegraphics{hilbert-mb}
\caption{Hilbert curve ordering of macro blocks within a super block}
\label{fig:hilbert-mb}

Using the same frame size as the example above, there are 15 macro blocks in
each row and 3 rows of macro blocks.
The macro blocks are assigned the following indices:

\vspace{\baselineskip}
\begin{tabular}{|cc|cc|c|cc|c|}\hline
30 & 31 & 32 & 33 & $\cdots$ & 42 & 43 & 44 \\\hline
 1 &  2 &  5 &  6 & $\cdots$ & 25 & 26 & 29 \\
 0 &  3 &  4 &  7 & $\cdots$ & 24 & 27 & 28 \\\hline
\end{tabular}
\vspace{\baselineskip}
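
The macro block indices above can be reproduced with the following non-normative Python sketch.
The names \texttt{HILBERT\_2X2} and \texttt{macro\_block\_coded\_order} are hypothetical, and the $2\times 2$ traversal was read off the index table above; positions are (column, row) pairs counted from the lower-left corner of the luma plane.

```python
# (column, row) offsets of the four macro blocks inside a super block,
# in visiting order. Assumption: derived from the index table above.
HILBERT_2X2 = [(0, 0), (0, 1), (1, 1), (1, 0)]


def macro_block_coded_order(mbs_wide, mbs_high):
    """Map each macro block position (x, y) to its coded-order index."""
    index = {}
    # Luma super blocks (2x2 macro blocks each) are visited in raster order.
    for sb_y in range(0, mbs_high, 2):
        for sb_x in range(0, mbs_wide, 2):
            for dx, dy in HILBERT_2X2:
                x, y = sb_x + dx, sb_y + dy
                # Macro blocks outside the frame boundary are omitted.
                if x < mbs_wide and y < mbs_high:
                    index[(x, y)] = len(index)
    return index
```

For the example frame, \texttt{macro\_block\_coded\_order(15, 3)} numbers the 45 macro blocks as shown in the table.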

\section{Coding Modes and Prediction}

Each block is coded using one of a small, fixed set of \term{coding modes} that
define how the block is predicted from previous frames.
A block is predicted using one of two \term{reference frames}, selected
according to the coding mode.
A reference frame is the fully decoded version of a previous frame in the
The first available reference frame is the previous intra frame, called the
The second available reference frame is the previous frame, whether it was an
intra frame or an inter frame.
If the previous frame was an intra frame, then both reference frames are the
See Figure~\ref{fig:reference-frames} for an illustration of the reference
frames used for an inter frame that does not follow an intra frame.

\includegraphics{reference-frames}
\caption{Example of reference frames for an inter frame}
\label{fig:reference-frames}

Two coding modes in particular are worth mentioning here.
The INTRA mode is used for blocks that are not predicted from either reference
This is the only coding mode allowed in intra frames.
The INTER\_NOMV coding mode uses the co-located contents of the block in the
previous frame as the predictor.
This is the default coding mode.

\section{DCT Coefficients}
\label{sec:dct-coeffs}

A \term{residual} is added to the predicted contents of a block to form the
final reconstruction.
The residual is stored as a set of quantized coefficients from an integer
approximation of a two-dimensional Type II Discrete Cosine Transform.
The DCT takes an $8\times 8$ array of pixel values as input and returns an
$8\times 8$ array of coefficient values.
The \term{natural ordering} of these coefficients is defined to be row-major
order, from lowest to highest frequency.
They are also often indexed in \term{zig-zag order}, as shown in
Figure~\ref{tab:zig-zag}.

\begin{tabular}[c]{rr|c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c}
 &\multicolumn{1}{r}{} & && &&&&&$c$&&& && && \\
 &\multicolumn{1}{r}{} &0&&1&&2&&3&&4&&5&&6&&7 \\\cline{3-17}
 &0 &0 &$\rightarrow$&1 &&5 &$\rightarrow$&6 &&14 &$\rightarrow$&15 &&27 &$\rightarrow$&28 \\[-0.5\defaultaddspace]
 & & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
 &1 &2 & &4 &&7 & &13 &&16 & &26 &&29 & &42 \\[-0.5\defaultaddspace]
 & &$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
 &2 &3 & &8 &&12 & &17 &&25 & &30 &&41 & &43 \\[-0.5\defaultaddspace]
 & & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
 &3 &9 & &11 &&18 & &24 &&31 & &40 &&44 & &53 \\[-0.5\defaultaddspace]
$r$&&$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
 &4 &10 & &19 &&23 & &32 &&39 & &45 &&52 & &54 \\[-0.5\defaultaddspace]
 & & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
 &5 &20 & &22 &&33 & &38 &&46 & &51 &&55 & &60 \\[-0.5\defaultaddspace]
 & &$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
 &6 &21 & &34 &&37 & &47 &&50 & &56 &&59 & &61 \\[-0.5\defaultaddspace]
 & & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
 &7 &35 &$\rightarrow$&36 &&48 &$\rightarrow$&49 &&57 &$\rightarrow$&58 &&62 &$\rightarrow$&63
\end{tabular}
\caption{Zig-zag order}

{\bf Note:} the row and column indices refer to {\em frequency number} and not
The frequency numbers are defined independently of the memory organization of
They have been written from top to bottom here to follow conventional notation,
despite the right-handed coordinate system Theora uses for pixel locations.
%RG: I'd rather we were internally consistent and put dc at the lower left.
Many implementations of the DCT operate `in-place'.
That is, they return DCT coefficients in the same memory buffer that the
initial pixel values were stored in.
Due to the right-handed coordinate system used for pixel locations in Theora,
one must note carefully how both pixel values and DCT coefficients are
organized in memory in such a system.

DCT coefficient $(0,0)$ is called the \term{DC coefficient}.
All the other coefficients are called \term{AC coefficients}.
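
The zig-zag order shown in the figure can be generated by walking the anti-diagonals of the coefficient array and alternating direction on each one.
The following Python sketch is illustrative only; the function name \texttt{zigzag\_indices} is hypothetical, and \texttt{r} and \texttt{c} are the row and column frequency numbers used in the figure.

```python
def zigzag_indices(n=8):
    """Return an n x n table where entry [r][c] is the zig-zag index of
    the coefficient at row frequency r and column frequency c."""
    table = [[0] * n for _ in range(n)]
    k = 0
    # Each anti-diagonal holds the coefficients with r + c == s.
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run toward the top right (r decreasing),
        # odd diagonals toward the bottom left (r increasing).
        if s % 2 == 0:
            rows = reversed(rows)
        for r in rows:
            table[r][s - r] = k
            k += 1
    return table
```

Entry \texttt{[0][0]} of the resulting table is the DC coefficient with zig-zag index 0, and entry \texttt{[7][7]} has index 63.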

\chapter{Decoding Overview}

This section provides a high-level description of the Theora codec's
A bit-by-bit specification appears beginning in Section~\ref{sec:bitpacking}.
The later sections assume a high-level understanding of the Theora decode
process, which is provided below.

\section{Decoder Configuration}

Decoder setup consists of configuration of the quantization matrices and the
Huffman codebooks for the DCT coefficients, and a table of limit values for
the deblocking filter.
The remainder of the decoding pipeline is not configurable.

\subsection{Global Configuration}

The global codec configuration consists of a few video related fields, such as
frame rate, frame size, picture size and offset, aspect ratio, color space,
pixel format, and a version number.
The version number is divided into a major version, a minor version, and a
minor revision number.
%r: afaik the released vp3 codec called itself 3.1 and is compatible w/ theora
%r: even though we received the in-progress 3.2 codebase
For the format defined in this specification, these are `3', `2', and
`1', respectively, in reference to Theora's origin as a successor to

\subsection{Quantization Matrices}

Theora allows up to 384 different quantization matrices to be defined, one for
each combination of \term{quantization type}, \term{color plane} ($Y'$, $C_b$,
or $C_r$), and \term{quantization index}, \qi, which ranges from zero to 63,
inclusive (that is, $2\times 3\times 64=384$).
There are currently two quantization types defined, which depend on the coding
mode of the block being dequantized, as shown in Table~\ref{tab:quant-types}.

\begin{tabular}{cl}\toprule
Quantization Type & Usage \\\midrule
$0$ & INTRA-mode blocks \\
$1$ & Blocks in any other mode. \\
\bottomrule\end{tabular}
\caption{Quantization Type Indices}
\label{tab:quant-types}

%r: I think 'nominally' is more specific than 'generally' here
The quantization index, on the other hand, nominally represents a progressive
range of quality levels, from low quality near zero to high quality near 63.
However, the interpretation is arbitrary, and it is possible, for example, to
partition the scale into two completely separate ranges with 32 levels each
that are meant to represent different classes of source material, or any
other arrangement that suits the encoder's requirements.

Each quantization matrix is an $8\times 8$ matrix of 16-bit values, which is
used to quantize the output of the $8\times 8$ DCT\@.
Quantization matrices are specified using three components: a
\term{base matrix} and two \term{scale values}.
The first scale value is the \term{DC scale}, which is applied to the DC
component of the base matrix.
The second scale value is the \term{AC scale}, which is applied to all the
other components of the base matrix.
There are 64 DC scale values and 64 AC scale values, one for each \qi\ value.

There are 64 elements in each base matrix, one for each DCT coefficient.
They are stored in natural order (cf. Section~\ref{sec:dct-coeffs}).
There is a separate set of base matrices for each quantization type and each
color plane, with up to 64 possible base matrices in each set, one for each
%r: we will mention that the given matricies must bound the \qi range
%r: in the detailed section. it's not important at this level.
Typically the bitstream contains matrices for only a sparse subset of the
possible \qi\ values.
The base matrices for the remainder of the \qi\ values are computed using
linear interpolation.
This configuration allows the encoder to adjust the quantization matrices to
approximate the complex, non-linear response of the human visual system to
different quantization errors.

Finally, because the in-loop deblocking filter strength depends on the strength
of the quantization matrices defined in this header, a table of 64 \term{loop
filter limit values} is defined, one for each \qi\ value.

The precise specification of how all of this information is decoded appears in
Section~\ref{sub:loop-filter-limits} and Section~\ref{sub:quant-params}.
848 \subsection{Huffman Codebooks
}
850 Theora uses
80 configurable binary Huffman codes to represent the
32 tokens
851 used to encode DCT coefficients.
852 Each of the
32 token values has a different semantic meaning and is used to
853 represent single coefficient values, zero runs, combinations of the two, and
854 \term{End-Of-Block markers
}.
856 The
80 codes are divided up into five groups of
16, with each group
857 corresponding to a set of DCT coefficient indices.
858 The first group corresponds to the DC coefficient, while the remaining four
859 groups correspond to different subsets of the AC coefficients.
Within each frame, two pairs of 4-bit codebook indices are stored.
The first pair selects which codebooks to use from the DC coefficient group for
 the $Y'$ coefficients and the $C_b$ and $C_r$ coefficients.
The second pair selects which codebooks to use from {\em all four} of the AC
 coefficient groups for the $Y'$ coefficients and the $C_b$ and $C_r$
 coefficients.
867 The precise specification of how the codebooks are decoded appears in
868 Section~
\ref{sub:huffman-tables
}.
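
The selection scheme above can be sketched in Python as follows.
The layout of the 80 codebooks as five consecutive groups of 16 follows the description in this section, but the exact token-index boundaries between the four AC groups are an assumption of this example and should be checked against the detailed coefficient decode procedure.
All names are hypothetical.

```python
# Assumed upper bound (inclusive) of the zig-zag token index covered by each
# of the five groups: group 0 is the DC coefficient, groups 1-4 are AC subsets.
GROUP_UPPER_BOUNDS = (0, 5, 14, 27, 63)


def huffman_group(ti):
    """Return the codebook group (0..4) for token index ti (0..63)."""
    if not 0 <= ti <= 63:
        raise ValueError("token index out of range")
    for group, upper in enumerate(GROUP_UPPER_BOUNDS):
        if ti <= upper:
            return group
    raise AssertionError("unreachable")


def codebook_index(ti, frame_index):
    """Combine a group with a 4-bit per-frame index to pick one of 80 codebooks."""
    if not 0 <= frame_index <= 15:
        raise ValueError("per-frame codebook index must fit in 4 bits")
    return 16 * huffman_group(ti) + frame_index


if __name__ == "__main__":
    for ti in (0, 1, 6, 15, 28, 63):
        print(ti, codebook_index(ti, 3))
```

Because the same 4-bit AC index is reused in every AC group, a frame needs only two indices per plane class (one DC, one AC) to select codebooks from all five groups.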
870 \section{High-Level Decode Process
}
872 \subsection{Decoder Setup
}
874 Before decoding can begin, a decoder MUST be initialized using the bitstream
875 headers corresponding to the stream to be decoded.
876 Theora uses three header packets; all are required, in order, by this
878 Once set up, decode may begin at any intra-frame packet---or even inter-frame
879 packets, provided the appropriate decoded reference frames have already been
880 decoded and cached---belonging to the Theora stream.
881 In Theora I, all packets after the three initial headers are intra-frame or
884 The header packets are, in order, the identification header, the comment
885 header, and the setup header.
887 \paragraph{Identification Header
}
889 The identification header identifies the stream as Theora, provides a version
890 number, and defines the characteristics of the video stream such as frame
892 A complete description of the identification header appears in
893 Section~
\ref{sec:idheader
}.
895 \paragraph{Comment Header
}
897 The comment header includes user text comments (`tags') and a vendor string
898 for the application/library that produced the stream.
899 The format of the comment header is the same as that used in the Vorbis I and
900 Speex codecs, with slight modifications due to the use of a different bit
902 A complete description of how the comment header is coded appears in
903 Section~
\ref{sec:commentheader
}, along with a suggested set of tags.
905 \paragraph{Setup Header
}
907 The setup header includes extensive codec setup information, including the
908 complete set of quantization matrices and Huffman codebooks needed to decode
909 the DCT coefficients.
910 A complete description of the setup header appears in
911 Section~
\ref{sec:setupheader
}.
913 \subsection{Decode Procedure
}
The decoding and synthesis procedure for all video packets is fundamentally the
 same, with some steps omitted for intra frames.
\begin{enumerate}
\item Decode packet type flag.
\item Decode frame header.
\item Decode coded block information (inter frames only).
\item Decode macro block mode information (inter frames only).
\item Decode motion vectors (inter frames only).
\item Decode block-level \qi\ information.
\item Decode DC coefficient for each coded block.
\item Decode 1st AC coefficient for each coded block.
\item Decode 2nd AC coefficient for each coded block.
\item $\ldots$
\item Decode 63rd AC coefficient for each coded block.
\item Perform DC coefficient prediction.
\item Reconstruct coded blocks.
\item Copy uncoded blocks.
\item Perform loop filtering.
\end{enumerate}
947 {\bf Note:
} clever rearrangement of the steps in this process is possible.
948 As an example, in a memory-constrained environment, one can make multiple
949 passes through the DCT coefficients to avoid buffering them all in memory.
950 On the first pass, the starting location of each coefficient is identified, and
951 then
64 separate get pointers are used to read in the
64 DCT coefficients
952 required to reconstruct each coded block in sequence.
953 This operation produces entirely equivalent output and is naturally perfectly
955 It may even be a benefit in non-memory-constrained environments due to a
956 reduced cache footprint.
959 Theora makes equivalence easy to check by defining all decoding operations in
960 terms of exact integer operations.
961 No floating-point math is required, and in particular, the implementation of
962 the iDCT transform MUST be followed precisely.
963 This prevents the decoder mismatch problem commonly associated with codecs that
964 provide a less rigorous transform specification.
965 Such a mismatch problem would be devastating to Theora, since a single rounding
966 error in one frame could propagate throughout the entire succeeding frame due
969 \paragraph{Packet Type Decode
}
971 Theora uses four packet types.
972 The first three packet types mark each of the three Theora headers described
974 The fourth packet type marks a video packet.
975 All other packet types are reserved; packets marked with a reserved type should
Additionally, zero-length packets are treated as if they were an inter
 frame with no blocks coded, that is, as a duplicate frame.
981 \paragraph{Frame Header Decode
}
983 The frame header contains some global information about the current frame.
984 The first is the frame type field, which specifies if this is an intra frame or
986 Inter frames predict their contents from previously decoded reference frames.
987 Intra frames can be independently decoded with no established reference frames.
989 The next piece of information in the frame header is the list of
\qi\ values
990 allowed in the frame.
991 Theora allows from one to three different
\qi\ values to be used in a single
992 frame, each of which selects a set of six quantization matrices, one for each
993 quantization type (inter or intra), and one for each
color plane.
994 The first
\qi\ value is
{\em always
} used when dequantizing DC coefficients.
995 The
\qi\ value used when dequantizing AC coefficients, however, can vary from
997 VP3, in contrast, only allows a single
\qi\ value per frame for both the DC and
1000 \paragraph{Coded Block Information
}
1002 This stage determines which blocks in the frame are coded and which are
1004 A
\term{coded block list
} is constructed which lists all the coded blocks in
1006 For intra frames, every block is coded, and so no data needs to be read from
1009 \paragraph{Macro Block Mode Information
}
1011 For intra frames, every block is coded in INTRA mode, and this stage is
1013 In inter frames a
\term{coded macro block list
} is constructed from the coded
1015 Any macro block which has at least one of its luma blocks coded is considered
1016 coded; all other macro blocks are uncoded, even if they contain coded chroma
1018 A coding mode is decoded for each coded macro block, and assigned to all its
1019 constituent coded blocks.
1020 All coded chroma blocks in uncoded macro blocks are assigned the INTER
\_NOMV
1023 \paragraph{Motion Vectors
}
1025 Intra frames are coded entirely in INTRA mode, and so this stage is skipped.
1026 Some inter coding modes, however, require one or more motion vectors to be
1027 specified for each macro block.
1028 These are decoded in this stage, and an appropriate motion vector is assigned
1029 to each coded block in the macro block.
1031 \paragraph{Block-Level
\qi\ Information
}
1033 If a frame allows multiple
\qi\ values, the
\qi\ value assigned to each block
1035 Frames that use only a single
\qi\ value have nothing to decode.
1037 \paragraph{DCT Coefficients
}
1039 Finally, the quantized DCT coefficients are decoded.
1040 A list of DCT coefficients in zig-zag order for a single block is represented
1041 by a list of tokens.
1042 A token can take on one of
32 different values, each with a different semantic
1044 A single token can represent a single DCT coefficient, a run of zero
1045 coefficients within a single block, a combination of a run of zero
1046 coefficients followed by a single non-zero coefficient, an
1047 \term{End-Of-Block marker
}, or a run of EOB markers.
1048 EOB markers signify that the remainder of the block is one long zero run.
1049 Unlike JPEG and MPEG, there is no requirement for each block to end with
1051 If non-EOB tokens yield values for all
64 of the coefficients in a block, then
1052 no EOB marker occurs.
1054 Each token is associated with a specific
\term{token index
} in a block.
1055 For single-coefficient tokens, this index is the zig-zag index of the token in
1057 For zero-run tokens, this index is the zig-zag index of the
{\em first
}
1058 coefficient in the run.
1059 For combination tokens, the index is again the zig-zag index of the first
1060 coefficient in the zero run.
1061 For EOB markers, which signify that the remainder of the block is one long zero
1062 run, the index is the zig-zag index of the first zero coefficient in that run.
1063 For EOB runs, the token index is that of the first EOB marker in the run.
1064 Due to zero runs and EOB markers, a block does not have to have a token for
1065 every zig-zag index.
1067 Tokens are grouped in the stream by token index, not by the block they
1069 This means that for each zig-zag index in turn, the tokens with that index from
1070 {\em all
} the coded blocks are coded in coded block order.
1071 When decoding, a current token index is maintained for each coded block.
1072 This index is advanced by the number of coefficients that are added to the
1073 block as each token is decoded.
1074 After fully decoding all the tokens with token index
\ti, the current token
1075 index of every coded block will be
\ti\ or greater.
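
The bookkeeping described in this section, including the rule by which an EOB run ends blocks across successive token indices, can be sketched in Python as follows.
The tuple representation of tokens is purely hypothetical: real tokens are Huffman-coded values with extra bits, which this sketch assumes have already been parsed.
The stream is also assumed to be valid, so that no run extends past the end of a block.

```python
def decode_token_stream(tokens, num_blocks):
    """Distribute already-parsed tokens to coded blocks in stream order.

    Hypothetical token forms, in zig-zag order within each block:
      ("coeff", value)             one coefficient
      ("zero_run", length)         `length` zero coefficients
      ("run_coeff", length, value) `length` zeros, then one coefficient
      ("eob_run", n)               ends the next n eligible blocks
    """
    coeffs = [[0] * 64 for _ in range(num_blocks)]
    cur = [0] * num_blocks   # current token index of each coded block
    stream = iter(tokens)
    eob_left = 0             # blocks still to be ended by the active EOB run
    for ti in range(64):
        for b in range(num_blocks):
            if cur[b] != ti:
                continue     # this block has no token at index ti
            if eob_left == 0:
                tok = next(stream)
                kind = tok[0]
                if kind == "eob_run":
                    eob_left = tok[1]
                elif kind == "coeff":
                    coeffs[b][ti] = tok[1]
                    cur[b] = ti + 1
                elif kind == "zero_run":
                    cur[b] = ti + tok[1]
                elif kind == "run_coeff":
                    coeffs[b][ti + tok[1]] = tok[2]
                    cur[b] = ti + tok[1] + 1
                else:
                    raise ValueError("unknown token kind: %r" % (kind,))
            if eob_left > 0:
                cur[b] = 64  # the remainder of the block is zero
                eob_left -= 1
    return coeffs


if __name__ == "__main__":
    demo = [("coeff", 1), ("eob_run", 1), ("run_coeff", 2, 9), ("eob_run", 2)]
    result = decode_token_stream(demo, 3)
    print([block[:4] for block in result])
```

In the demonstration stream the final EOB run of two blocks ends the first block at token index 1 and then carries over to end the third block at token index 3, because no other block has a current token index in between.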
1077 If an EOB run of $n$ blocks is decoded at token index
\ti, then it ends the
1078 next $n$ blocks in coded block order whose current token index is equal to
1079 \ti, but not greater.
1080 If there are fewer than $n$ blocks with a current token index of
\ti, then the
1081 decoder goes through the coded block list again from the start, ending blocks
1082 with a current token index of $
\ti+
1$, and so on, until $n$ blocks have been
1085 Tokens are read by parsing a Huffman code that depends on
\ti\ and the
color
1086 plane of the next coded block whose current token index is equal to
\ti, but
1088 The Huffman codebooks are selected on a per-frame basis from the
80 codebooks
1089 defined in the setup header.
1090 Many tokens have a fixed number of
\term{extra bits
} associated with them.
1091 These bits are read from the packet immediately after the token is decoded.
1092 These are used to define things such as coefficient magnitude, sign, and the
1095 \paragraph{DC Prediction
}
1097 After the coefficients for each block are decoded, the quantized DC value of
1098 each block is adjusted based on the DC values of its neighbors.
1099 This adjustment is performed by scanning the blocks in raster order, not coded
1102 \paragraph{Reconstruction
}
1104 Finally, using the coding mode, motion vector (if applicable), quantized
1105 coefficient list, and
\qi\ value defined for each block, all the coded blocks
1107 The DCT coefficients are dequantized, an inverse DCT transform is applied, and
1108 the predictor is formed from the coding mode and motion vector and added to
1111 \paragraph{Loop Filtering
}
1113 To complete the reconstructed frame, an ``in-loop'' deblocking filter is
1114 applied to the edges of all coded blocks.
1117 \chapter{Video Formats
}
1119 This section gives a precise description of the video formats that Theora is
1121 The Theora bitstream is capable of handling video at any arbitrary resolution
1122 up to $
1048560\times 1048560$.
1123 Such video would require almost three terabytes of storage per frame for
1124 uncompressed data, so compliant decoders MAY refuse to decode images with
1125 sizes beyond their capabilities.
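
For reference, the arithmetic behind these figures can be checked directly.
The factorization in the first line is an observation about the stated limit, and the figure of three bytes per pixel assumes unsubsampled 8-bit data in all three planes; neither is spelled out at this point in the text.
\begin{align*}
1048560 & = 16\times 65535 \\
1048560^2 & = 1\,099\,478\,073\,600\ \text{pixels} \\
3\times 1048560^2 & = 3\,298\,434\,220\,800\ \text{bytes}
  \approx 2.9999\times 2^{40}\ \text{bytes}
\end{align*}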
1126 %TODO: What MUST a "compliant" decoder accept?
1127 %TODO: What SHOULD a decoder use for an upper bound? (derive from total amount
1128 %TODO: of memory and memory bandwidth)
1129 %TODO: Any lower limits?
1130 %TODO: We really need hardware device profiles, but such things should be
1131 %TODO: developed with input from the hardware community.
1132 %TODO: And even then sometimes they're useless
1134 The remainder of this section talks about two specific aspects of the video
1135 format: the
color space and the pixel format.
1136 The first describes how
color is represented and how to transform that
color
1137 representation into a device independent
color space such as CIE $XYZ$ (
1931).
1138 The second describes the various schemes for sampling the
color values in time
1141 \section{Color Space Conventions
}
1143 There are a large number of different
color standards used in digital video.
1144 Since Theora is a lossy codec, it restricts itself to only a few of them to
1146 Unlike the alternate method of describing all the parameters of the
color
1147 model, this allows a few dedicated routines for
color conversion to be written
1148 and heavily optimized in a decoder.
1149 More flexible conversion functions should instead be specified in an encoder,
1150 where additional computational complexity is more easily tolerated.
1151 The
color spaces were selected to give a fair representation of
color standards
1152 in use around the world today.
1153 Most of the standards that do not exactly match one of these can be converted
1154 to one fairly easily.
1156 All Theora
color spaces are $Y'C_bC_r$
color spaces with one luma channel and
1157 two chroma channels.
1158 Each channel contains
8-bit discrete values in the range $
0\ldots255$, which
1159 represent non-linear gamma pre-corrected signals.
1160 The Theora identification header contains an
8-bit value that describes the
This merely selects one of the color spaces available from an enumerated list.
Currently, only two color spaces are defined, with a third possibility that
 indicates the color space is ``unknown''.
1166 \section{Color Space Conversions and Parameters
}
1167 \label{sec:
color-xforms
}
1169 The parameters which describe the conversions between each
color space are
1171 These are the parameters needed to map colors from the encoded $Y'C_bC_r$
1172 representation to the device-independent
color space CIE $XYZ$ (
1931).
1173 These parameters define abstract mathematical conversion functions which are
1175 The accuracy and precision with which the conversions are performed in a real
1176 system is determined by the quality of output desired and the available
1178 Exact decoder output is defined by this specification only in the original
1182 \item[$Y'C_bC_r$ to $Y'P_bP_r$:
]
1183 \vspace{\baselineskip}\hfill
This conversion takes 8-bit discrete values in the range $[0\ldots255]$ and
 maps them to real values in the range $[0\ldots1]$ for $Y'$ and
 $[-\frac{1}{2}\ldots\frac{1}{2}]$ for $P_b$ and $P_r$.
1188 Because some values may fall outside the offset and excursion defined for each
1189 channel in the $Y'C_bC_r$ space, the results may fall outside these ranges in
1191 No clamping should be done at this stage.
\begin{align*}
Y'_\mathrm{out} & =
 \frac{Y'_\mathrm{in}-\mathrm{Offset}_Y}{\mathrm{Excursion}_Y} \\
P_b & =
 \frac{C_b-\mathrm{Offset}_{C_b}}{\mathrm{Excursion}_{C_b}} \\
P_r & =
 \frac{C_r-\mathrm{Offset}_{C_r}}{\mathrm{Excursion}_{C_r}}
\end{align*}

Parameters: $\mathrm{Offset}_{Y,C_b,C_r}$, $\mathrm{Excursion}_{Y,C_b,C_r}$.
1204 \item[$Y'P_bP_r$ to $R'G'B'$:
]
1205 \vspace{\baselineskip}\hfill
1207 This conversion takes the one luma and two chroma channel representation and
1208 maps it to the non-linear $R'G'B'$ space used to drive actual output devices.
Values should be clamped into the range $[0\ldots1]$ after this stage.

\begin{align*}
R' & = Y'+2(1-K_r)P_r \\
G' & = Y'-2\frac{(1-K_b)K_b}{1-K_b-K_r}P_b-2\frac{(1-K_r)K_r}{1-K_b-K_r}P_r\\
B' & = Y'+2(1-K_b)P_b
\end{align*}

Parameters: $K_b,K_r$.
1219 \item[$R'G'B'$ to $RGB$ (Output device gamma correction):
]
1220 \vspace{\baselineskip}\hfill
1222 This conversion takes the non-linear $R'G'B'$ voltage levels and maps them to
1223 linear light levels produced by the actual output device.
1224 Note that this conversion is only that of the output device, and its inverse is
1225 {\em not
} that used by the input device.
Because a dim viewing environment is assumed in most television standards, the
 overall gamma between the input and output devices is usually around $1.1$ to
 $1.2$, and not a strict $1.0$.

For calibration with actual output devices, the model
\begin{align*}
L & =(E'+\Delta)^\gamma
\end{align*}
 should be used, with $\Delta$ the free parameter and $\gamma$ held fixed to
 the value specified in this document.
The conversion function presented here is an idealized version with $\Delta=0$.

\begin{align*}
R & = R'^\gamma \\
G & = G'^\gamma \\
B & = B'^\gamma
\end{align*}

Parameters: $\gamma$.
1246 \item[$RGB$ to $R'G'B'$ (Input device gamma correction):
]
1247 \vspace{\baselineskip}\hfill
1249 %TODO: Tag section as non-normative
1251 This conversion takes linear light levels and maps them to the non-linear
1252 voltage levels produced in the actual input device.
1253 This information is merely informative.
1254 It is not required for building a decoder or for converting between the various
1255 formats and the actual output capabilities of a particular device.
1257 A linear segment is introduced on the low end to reduce noise in dark areas of
1259 The rest of the scale is adjusted so that the power segment of the curve
1260 intersects the linear segment with the proper slope, and so that it still maps
\begin{align*}
R' & = \left\{\begin{array}{ll}
\alpha R, & 0\le R<\delta \\
(1+\epsilon)R^\beta-\epsilon, & \delta\le R\le1
\end{array}\right. \\
G' & = \left\{\begin{array}{ll}
\alpha G, & 0\le G<\delta \\
(1+\epsilon)G^\beta-\epsilon, & \delta\le G\le1
\end{array}\right. \\
B' & = \left\{\begin{array}{ll}
\alpha B, & 0\le B<\delta \\
(1+\epsilon)B^\beta-\epsilon, & \delta\le B\le1
\end{array}\right.
\end{align*}

Parameters: $\beta$, $\alpha$, $\delta$, $\epsilon$.
1283 \item[$RGB$ to CIE $XYZ$ (
1931):
]
1284 \vspace{\baselineskip}\hfill
1286 This conversion maps a device-dependent linear RGB space to the
1287 device-independent linear CIE $XYZ$ space.
1288 The parameters are the CIE chromaticity coordinates of the three
1289 primaries---red, green, and blue---as well as the chromaticity coordinates
1290 of the white point of the device.
1291 This is how hardware manufacturers and standards typically describe a
1292 particular $RGB$ space.
1293 The math required to convert these parameters into a useful transformation
1294 matrix is reproduced below.
\begin{align*}
F & =
\left[\begin{array}{ccc}
\frac{x_r}{y_r} & \frac{x_g}{y_g} & \frac{x_b}{y_b} \\
1 & 1 & 1 \\
\frac{1-x_r-y_r}{y_r} & \frac{1-x_g-y_g}{y_g} & \frac{1-x_b-y_b}{y_b}
\end{array}\right] \\
\left[\begin{array}{c}
s_r \\
s_g \\
s_b
\end{array}\right] & =
F^{-1}\left[\begin{array}{c}
\frac{x_w}{y_w} \\
1 \\
\frac{1-x_w-y_w}{y_w}
\end{array}\right] \\
\left[\begin{array}{c}
X \\
Y \\
Z
\end{array}\right] & =
F\left[\begin{array}{c}
s_rR \\
s_gG \\
s_bB
\end{array}\right]
\end{align*}

Parameters: $x_r,x_g,x_b,x_w, y_r,y_g,y_b,y_w$.
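
As an illustration of how the chromaticity parameters turn into a usable matrix, the following Python sketch builds the $RGB$ to CIE $XYZ$ matrix by scaling each primary so that $R=G=B=1$ maps to the white point with $Y=1$.
The function names are hypothetical, and the example values are the Rec.~470BG primaries and D65 white point listed among the available color spaces in this chapter.
This is a floating-point sketch for exposition, not a prescribed implementation.

```python
def _det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m * s = v using Cramer's rule."""
    d = _det3(m)
    solution = []
    for k in range(3):
        # Replace column k of m with the right-hand side v.
        mk = [[v[r] if c == k else m[r][c] for c in range(3)] for r in range(3)]
        solution.append(_det3(mk) / d)
    return solution


def rgb_to_xyz_matrix(primaries, white):
    """Build the linear RGB -> XYZ matrix from chromaticity coordinates.

    primaries: [(x_r, y_r), (x_g, y_g), (x_b, y_b)]; white: (x_w, y_w).
    """
    # Each column holds the XYZ of one primary, normalized so that Y = 1.
    f = [[x / y for x, y in primaries],
         [1.0, 1.0, 1.0],
         [(1.0 - x - y) / y for x, y in primaries]]
    x_w, y_w = white
    # Per-primary scale factors chosen so that white has Y = 1.
    s = _solve3(f, [x_w / y_w, 1.0, (1.0 - x_w - y_w) / y_w])
    return [[f[r][c] * s[c] for c in range(3)] for r in range(3)]


if __name__ == "__main__":
    m = rgb_to_xyz_matrix([(0.64, 0.33), (0.29, 0.60), (0.15, 0.06)],
                          (0.313, 0.329))
    for row in m:
        print(["%.4f" % v for v in row])
```

The middle row of the resulting matrix gives the contribution of each linear primary to luminance $Y$, and its entries sum to one by construction.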
1328 \section{Available Color Spaces
}
1329 \label{sec:colorspaces
}
1331 These are the
color spaces currently defined for use by Theora video.
1332 Each one has a short name, with which it is referred to in this
document, and
1333 a more detailed specification of the standards from which its parameters are
1335 Some standards do not specify all the parameters necessary.
1336 For these unspecified parameters, this
document serves as the definition of
1337 what should be used when encoding or decoding Theora video.
\subsection{Rec.~470M (Rec.~ITU-R~BT.470-6 System M/NTSC with
 Rec.~ITU-R~BT.601-5)}
1343 This
color space is used by broadcast television and DVDs in much of the
1344 Americas, Japan, Korea, and the Union of Myanmar
\cite{rec470
}.
1345 This
color space may also be used for System M/PAL (Brazil), with an
1346 appropriate conversion supplied by the encoder to compensate for the
1347 different gamma value.
1348 See Section~
\ref{sec:
470bg
} for an appropriate gamma value to assume for M/PAL
1351 In the US, studio monitors are adjusted to a D65 white point
1352 ($x_w,y_w=
0.313,
0.329$).
1353 In Japan, studio monitors are adjusted to a D white of
9300K
1354 ($x_w,y_w=
0.285,
0.293$).
1356 Rec.~
470 does not specify a digital encoding of the
color signals.
1357 For Theora, Rec.~ITU-R~BT
.601-
5 \cite{rec601
} is used, starting from the
1358 $R'G'B'$ signals specified by Rec.~
470.
1360 Rec.~
470 does not specify an input gamma function.
1361 For Theora, the Rec.~
709 \cite{rec709
} input function is assumed.
1362 This is the same as that specified by SMPTE
170M
\cite{smpte170m
}, which claims
1363 to reflect modern practice in the creation of NTSC signals circa
1994.
1365 The parameters for all the
color transformations defined in
1366 Section~
\ref{sec:
color-xforms
} are given in Table~
\ref{tab:
470m
}.
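
To make the first two conversion stages concrete, the following Python sketch maps an 8-bit $Y'C_bC_r$ triple to clamped $R'G'B'$ values.
The offsets and excursions are the ones given for this color space; the luma coefficients $K_r=0.299$ and $K_b=0.114$ are the standard Rec.~601 values and are assumed here rather than quoted from the table.
The function names are hypothetical, and floating point is used only for exposition.

```python
OFFSET = (16, 128, 128)      # Offset for Y, Cb, Cr
EXCURSION = (219, 224, 224)  # Excursion for Y, Cb, Cr
K_R = 0.299                  # assumed: standard Rec. 601 luma coefficient
K_B = 0.114                  # assumed: standard Rec. 601 luma coefficient


def ycbcr_to_ypbpr(y, cb, cr):
    """Remove the offset and normalize by the excursion; no clamping here."""
    return tuple((v - o) / e
                 for v, o, e in zip((y, cb, cr), OFFSET, EXCURSION))


def ypbpr_to_rgb_prime(y, pb, pr):
    """Map Y'PbPr to non-linear R'G'B', clamping to [0, 1] afterwards."""
    k_g = 1.0 - K_B - K_R
    r = y + 2.0 * (1.0 - K_R) * pr
    g = (y - 2.0 * (1.0 - K_B) * K_B / k_g * pb
           - 2.0 * (1.0 - K_R) * K_R / k_g * pr)
    b = y + 2.0 * (1.0 - K_B) * pb
    return tuple(min(1.0, max(0.0, c)) for c in (r, g, b))


def decode_pixel(y, cb, cr):
    """Chain the two stages for one 8-bit Y'CbCr sample triple."""
    return ypbpr_to_rgb_prime(*ycbcr_to_ypbpr(y, cb, cr))


if __name__ == "__main__":
    print(decode_pixel(16, 128, 128))    # nominal black
    print(decode_pixel(235, 128, 128))   # nominal white
    print(decode_pixel(255, 128, 128))   # above nominal white, clamped
```

Clamping is applied only after the second stage, so values that overshoot the nominal range in $Y'C_bC_r$ are carried through the first stage unchanged.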
\begin{align*}
\mathrm{Offset}_{Y,C_b,C_r} & = (16, 128, 128) \\
\mathrm{Excursion}_{Y,C_b,C_r} & = (219, 224, 224) \\
\epsilon & = 0.099 \\
x_r,y_r & = 0.67, 0.33 \\
x_g,y_g & = 0.21, 0.71 \\
x_b,y_b & = 0.14, 0.08 \\
\text{(Illuminant C) } x_w,y_w & = 0.310, 0.316
\end{align*}
1384 \caption{Rec.~
470M Parameters
}
\subsection{Rec.~470BG (Rec.~ITU-R~BT.470-6 Systems B and G with
 Rec.~ITU-R~BT.601-5)}
This color space is used by the PAL and SECAM systems in much of the rest of
 the world \cite{rec470}.
This can be used directly by systems (B, B1, D, D1, G, H, I, K, N)/PAL and (B,
 D, G, H, K, K1, L)/SECAM\@.
1398 {\bf Note:
} the Rec.~
470BG chromaticity values are different from those
1399 specified in Rec.~
470M\@.
1400 When PAL and SECAM systems were first designed, they were based upon the same
1401 primaries as NTSC\@.
1402 However, as methods of making
color picture tubes have changed, the primaries
1403 used have changed as well.
1404 The U.S. recommends using correction circuitry to approximate the existing,
1405 standard NTSC primaries.
1406 Current PAL and SECAM systems have standardized on primaries in accord with
1407 more recent technology.
1410 Rec.~
470 provisionally permits the use of the NTSC chromaticity values (given
1411 in Section~
\ref{sec:
470m
}) with legacy PAL and SECAM equipment.
1412 In Theora, material must be decoded assuming the new PAL and SECAM primaries.
1413 Material intended for display on old legacy devices should be converted by the
The official Rec.~470BG specifies a gamma value of $\gamma=2.8$.
However, in practice this value is unrealistically high \cite{Poyn97}.
Rec.~470BG states that the overall system gamma should be approximately
 $\gamma\beta=1.2$.
Since most cameras pre-correct with a gamma value of $\beta=0.45$,
 this suggests an output device gamma of approximately $\gamma=2.67$.
This is the value recommended for use with PAL systems in Theora.
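
The recommended value follows from dividing the overall system gamma by the camera pre-correction exponent.
The worked step below assumes an overall system gamma of $1.2$, which is the figure consistent with the two values quoted in this paragraph.
\begin{align*}
\gamma & = \frac{1.2}{\beta} = \frac{1.2}{0.45} \approx 2.67
\end{align*}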
1424 Rec.~
470 does not specify a digital encoding of the
color signals.
1425 For Theora, Rec.~ITU-R~BT
.601-
5 \cite{rec601
} is used, starting from the
1426 $R'G'B'$ signals specified by Rec.~
470.
Rec.~470 does not specify an input gamma function.
For Theora, the Rec.~709 \cite{rec709} input function is assumed.
1431 The parameters for all the
color transformations defined in
1432 Section~
\ref{sec:
color-xforms
} are given in Table~
\ref{tab:
470bg
}.
\begin{align*}
\mathrm{Offset}_{Y,C_b,C_r} & = (16, 128, 128) \\
\mathrm{Excursion}_{Y,C_b,C_r} & = (219, 224, 224) \\
\epsilon & = 0.099 \\
x_r,y_r & = 0.64, 0.33 \\
x_g,y_g & = 0.29, 0.60 \\
x_b,y_b & = 0.15, 0.06 \\
\text{(D65) } x_w,y_w & = 0.313, 0.329
\end{align*}
1450 \caption{Rec.~
470BG Parameters
}
1454 \section{Pixel Formats
}
1457 Theora supports several different pixel formats, each of which uses different
1458 subsampling for the chroma planes relative to the luma plane.
1459 A decoder may need to recover a full resolution chroma plane with samples
1460 co-sited with the luma plane in order to convert to RGB for display or perform
1462 Decoders can assume that the chroma signal satisfies the Nyquist-Shannon
1464 The ideal low-pass reconstruction filter this implies is not practical, but any
1465 suitable approximation can be used, depending on the available computing
1467 Decoders MAY simply use a box filter, assigning to each luma sample the chroma
1468 sample closest to it.
1469 Encoders would not go wrong in assuming that this will be the most common
1472 \subsection{4:
4:
4 Subsampling
}
1475 All three
color planes are stored at full resolution---each pixel has a $Y'$,
1476 a $C_b$ and a $C_r$ value (see Figure~
\ref{fig:pixel444
}).
1477 The samples in the different planes are all at co-located sites.
1479 \begin{figure
}[htbp
]
1481 \includegraphics{pixel444
}
1483 \caption{Pixels encoded
4:
4:
4}
1484 \label{fig:pixel444
}
1498 \subsection{4:
2:
2 Subsampling
}
1501 The $C_b$ and $C_r$ planes are stored with half the horizontal resolution of
1503 Thus, each of these planes has half the number of horizontal blocks as the luma
1504 plane (see Figure~
\ref{fig:pixel422
}).
1505 Similarly, they have half the number of horizontal super blocks, rounded up.
1506 Macro blocks are defined across
color planes, and so their number does not
1507 change, but each macro block contains half as many chroma blocks.
1509 The chroma samples are vertically aligned with the luma samples, but
1510 horizontally centered between two luma samples.
1511 Thus, each luma sample has a unique closest chroma sample.
1512 A horizontal phase shift may be required to produce signals which use different
1513 horizontal chroma sampling locations for compatibility with different systems.
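
Because each luma sample has a unique closest chroma sample, the simplest reconstruction of a full-resolution chroma row for this format is a nearest-sample copy.
The following Python sketch shows that box filter for a single row, with frame-relative column indices; the function name is hypothetical, and a better interpolating filter may of course be substituted.

```python
def upsample_422_row(chroma_row, luma_width):
    """Box-filter one 4:2:2 chroma row up to the luma width.

    Chroma sample c sits midway between luma columns 2c and 2c + 1,
    so the closest chroma sample to luma column x is x // 2.
    """
    needed = (luma_width + 1) // 2
    if len(chroma_row) < needed:
        raise ValueError("chroma row too short for the requested luma width")
    return [chroma_row[x // 2] for x in range(luma_width)]


if __name__ == "__main__":
    print(upsample_422_row([10, 20, 30], 6))
```

Each chroma value is simply repeated for the two luma columns that flank it, which matches the sample positions described above.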
1515 \begin{figure
}[htbp
]
1517 \includegraphics{pixel422
}
1519 \caption{Pixels encoded
4:
2:
2}
1520 \label{fig:pixel422
}
1533 \subsection{4:
2:
0 Subsampling
}
1536 The $C_b$ and $C_r$ planes are stored with half the horizontal and half the
1537 vertical resolution of the $Y'$ plane.
1538 Thus, each of these planes has half the number of horizontal blocks and half
1539 the number of vertical blocks as the luma plane, for a total of one quarter
1540 the number of blocks (see Figure~
\ref{fig:pixel420
}).
1541 Similarly, they have half the number of horizontal super blocks and half the
1542 number of vertical super blocks, rounded up.
1543 Macro blocks are defined across
color planes, and so their number does not
1544 change, but each macro block contains within it one quarter as many
1547 The chroma samples are vertically and horizontally centered between four luma
1549 Thus, each luma sample has a unique closest chroma sample.
1550 This is the same sub-sampling pattern used with JPEG, MJPEG, and MPEG-
1, and
1551 was inherited from VP3.
1552 A horizontal or vertical phase shift may be required to produce signals which
1553 use different chroma sampling locations for compatibility with different
1556 \begin{figure
}[htbp
]
1558 \includegraphics{pixel420
}
1560 \caption{Pixels encoded
4:
2:
0}
1561 \label{fig:pixel420
}
1582 \subsection{Subsampling and the Picture Region
}
1584 Although the frame size must be an integral number of macro blocks, and thus
1585 both the number of pixels and the number of blocks in each direction must be
1586 even, no such requirement is made of the picture region.
1587 Thus, when using subsampled pixel formats, careful attention must be paid to
1588 which chroma samples correspond to which luma samples.
1590 As mentioned above, for each pixel format, there is a unique chroma sample that
1591 is the closest to each luma sample.
1592 When cropping the chroma planes to the picture region, all the chroma samples
1593 corresponding to a luma sample in the cropped picture region must be included.
1594 Thus, when dividing the width or height of the picture region by two to obtain
1595 the size of the subsampled chroma planes, they must be rounded up.
1597 Furthermore, the sampling locations are defined relative to the frame,
1598 {\em not
} the picture region.
When using the 4:2:2 and 4:2:0 formats, the locations of chroma samples
 relative to the luma samples depend on whether or not the X offset of the
 picture region is odd.
1602 If the offset is even, each column of chroma samples corresponds to two columns
1603 of luma samples (see Figure~
\ref{fig:pic_even
} for an example).
1604 The only exception is if the width is odd, in which case the last column
1605 corresponds to only one column of luma samples (see Figure~
\ref{fig:pic_even_odd
}).
1606 If the offset is odd, then the first column of chroma samples corresponds to
1607 only one column of luma samples, while the remaining columns each correspond
1608 to two (see Figure~
\ref{fig:pic_odd
}).
1609 In this case, if the width is even, the last column again corresponds to only
1610 one column of luma samples (see Figure~
\ref{fig:pic_odd_even
}).
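
The four cases above can be reproduced with a short Python sketch that counts, for each chroma column touched by the picture region, how many luma columns of the region map to it.
The function and parameter names are hypothetical, and columns are indexed relative to the frame, as the text requires.

```python
def chroma_column_coverage(pic_x, pic_width):
    """Count the luma columns of the picture region under each chroma column.

    pic_x is the X offset of the picture region within the frame and
    pic_width its width, both in luma samples. Horizontal subsampling by
    two is assumed, so luma column x belongs to chroma column x // 2.
    """
    if pic_width <= 0:
        return []
    first = pic_x // 2
    last = (pic_x + pic_width - 1) // 2
    counts = [0] * (last - first + 1)
    for x in range(pic_x, pic_x + pic_width):
        counts[x // 2 - first] += 1
    return counts


if __name__ == "__main__":
    for offset, width in ((0, 4), (0, 5), (1, 5), (1, 4)):
        print(offset, width, chroma_column_coverage(offset, width))
```

Note that with an odd offset and an even width the region touches one more chroma column than half its width, which is why the cropped chroma planes must be derived from the frame-relative sample positions rather than from the picture size alone.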
1612 A similar process is followed with the rows of a picture region of odd height
1613 encoded in the
4:
2:
0 format.
If the Y offset is even, each row of chroma samples corresponds to two rows of
 luma samples (see Figure~\ref{fig:pic_even}), except with an odd height, where
 the last row corresponds to one row of luma samples only (see
 Figure~\ref{fig:pic_even_odd}).
1618 If the offset is odd, then it is the first row of chroma samples which
1619 corresponds to only one row of luma samples, while the remaining rows each
1620 correspond to two (Figure~
\ref{fig:pic_odd
}), except with an even height,
1621 where the last row also corresponds to one (Figure~
\ref{fig:pic_odd_even
}).
1623 Encoders should be aware of these differences in the subsampling when using an
1625 In the typical case, with an even width and height, where one expects two rows
1626 or columns of luma samples for every row or column of chroma samples, the
1627 encoder must take care to ensure that the offsets used are both even.
1629 \begin{figure
}[htbp
]
1631 \includegraphics[width=
\textwidth]{pic_even
}
1633 \caption{Pixel correspondence between
color planes with even picture
1634 offset and even picture size
}
1635 \label{fig:pic_even
}
1638 \begin{figure
}[htbp
]
1640 \includegraphics[width=
\textwidth]{pic_even_odd
}
1642 \caption{Pixel correspondence with even picture offset and
1644 \label{fig:pic_even_odd
}
1647 \begin{figure
}[htbp
]
1649 \includegraphics[width=
\textwidth]{pic_odd
}
1651 \caption{Pixel correspondence with odd picture offset and
1656 \begin{figure
}[htbp
]
1658 \includegraphics[width=
\textwidth]{pic_odd_even
}
1660 \caption{Pixel correspondence with odd picture offset and
1662 \label{fig:pic_odd_even
}
1666 \chapter{Bitpacking Convention
}
1667 \label{sec:bitpacking
}
1671 The Theora codec uses relatively unstructured raw packets containing
1672 binary integer fields of arbitrary width.
1673 Logically, each packet is a bitstream in which bits are written one-by-one by
1674 the encoder and then read one-by-one in the same order by the decoder.
1675 Most current binary storage arrangements group bits into a native storage unit
1676 of eight bits (octets), sixteen bits, thirty-two bits, or less commonly other
1678 The Theora bitpacking convention specifies the correct mapping of the logical
1679 packet bitstream into an actual representation in fixed-width units.
1681 \subsection{Octets and Bytes
}
In most contemporary architectures, a `byte' is synonymous with an `octet',
 that is, eight bits.
1685 For purposes of the bitpacking convention, a byte implies the smallest native
1686 integer storage representation offered by a platform.
1687 Modern file systems invariably offer bytes as the fundamental atom of storage.
1689 The most ubiquitous architectures today consider a `byte' to be an octet.
1690 Note, however, that the Theora bitpacking convention is still well defined for
1691 any native byte size; an implementation can use the native bit-width of a
1692 given storage system.
This document assumes that a byte is one octet for purposes of example only.
1695 \subsection{Words and Byte Order
}
1697 A `word' is an integer size that is a grouped multiple of the byte size.
1698 Most architectures consider a word to be a group of two, four, or eight bytes.
Each byte in the word can be ranked by order of `significance', e.g.\ the
significance of the bits in each byte when storing a binary integer in the
word.
1702 Several byte orderings are possible in a word.
\begin{itemize}
\item{Big-endian:}
in which the most significant byte comes first, e.g.\ 3-2-1-0,
\item{Little-endian:}
in which the least significant byte comes first, e.g.\ 0-1-2-3, and
\item{Mixed-endian:}
one of the less-common orderings that cannot be put into the above two
categories, e.g.\ 3-1-2-0 or 0-2-1-3.
\end{itemize}
1714 The Theora bitpacking convention specifies storage and bitstream manipulation
1715 at the byte, not word, level.
Thus host word ordering is of concern only during optimization, when writing
1717 code that operates on a word of storage at a time rather than a byte.
Logically, bytes are always encoded and decoded in order from byte zero through
byte $n$.
1721 \subsection{Bit Order
}
A byte has a well-defined `least significant' bit (LSb), which is the only bit
set when the byte is storing the two's complement integer value $+1$.
A byte's `most significant' bit (MSb) is at the opposite end.
Bits in a byte are numbered from zero at the LSb to $n$ for the MSb, where
$n=7$ in an octet.
1729 \section{Coding Bits into Bytes
}
The Theora codec needs to encode arbitrary bit-width integers from zero to 32
bits wide into packets.
1733 These integer fields are not aligned to the boundaries of the byte
1734 representation; the next field is read at the bit position immediately
1735 after the end of the previous field.
1737 The decoder logically unpacks integers by first reading the MSb of a binary
1738 integer from the logical bitstream, followed by the next most significant
1739 bit, etc., until the required number of bits have been read.
When unpacking the bytes into bits, the decoder begins by reading the MSb of
the integer from the most significant unread bit position of the source byte,
followed by the next most significant bit of the integer from the next most
significant bit position of the source byte, and so on up to the requested
number of bits.
1744 Note that this differs from the Vorbis I codec, which
1745 begins decoding with the LSb of the source integer, reading it from the
1746 LSb of the source byte.
1747 When all the bits of the current source byte are read, decoding continues with
1748 the MSb of the next byte.
Any unfilled bits in the last byte of the packet MUST be cleared to zero by the
encoder.
1752 \subsection{Signedness
}
The binary integers decoded by the above process may be either signed or
unsigned.
This varies from integer to integer, and this specification
indicates how each value should be interpreted as it is read.
That is, depending on context, the three bit binary pattern \bin{111} can be
taken to represent either `$7$' as an unsigned integer or `$-1$' as a signed,
two's complement integer.
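The reinterpretation described above can be sketched in a few lines of Python.
This is an illustration only; the helper name \texttt{to\_signed} is an assumption of this example and does not appear elsewhere in this specification.

```python
def to_signed(value: int, nbits: int) -> int:
    """Reinterpret an nbits-wide unsigned field as a two's complement integer."""
    if nbits == 0:
        return 0
    sign_bit = 1 << (nbits - 1)
    # If the top bit of the field is set, the value is negative.
    return value - (1 << nbits) if value & sign_bit else value


print(to_signed(0b111, 3))  # -1
print(to_signed(0b011, 3))  # 3
```

The same three bits yield $7$ when left unsigned and $-1$ when passed through this helper, which is why the field's context must state its signedness.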
1762 \subsection{Encoding Example
}
The following example shows the state of an (8-bit) byte stream after several
binary integers are encoded, including the location of the put pointer for the
next bit to write to and the total length of the stream in bytes.
Encode the 4 bit unsigned integer value `12' (\bin{1100}) into an empty byte
stream.
\begin{tabular}{r|ccccccccl}
\multicolumn{1}{r}{}& &&&&$\downarrow$&&&& \\
         & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
byte 0   & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
           0 & 0 & 0 & 0 & $\leftarrow$ \\
byte 1   & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
byte 2   & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
byte 3   & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
\multicolumn{1}{c|}{$\vdots$}&\multicolumn{8}{c}{$\vdots$}& \\
byte $n$ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
byte stream length: 1 byte
\end{tabular}
1783 \vspace{\baselineskip}
Continue by encoding the 3 bit signed integer value `-1' (\bin{111}).
\begin{tabular}{r|ccccccccl}
\multicolumn{1}{r}{} &&&&&&&&$\downarrow$& \\
         & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
byte 0   & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
           \textbf{1} & \textbf{1} & \textbf{1} & 0 & $\leftarrow$ \\
byte 1   & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
byte 2   & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
byte 3   & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
\multicolumn{1}{c|}{$\vdots$}&\multicolumn{8}{c}{$\vdots$}& \\
byte $n$ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
byte stream length: 1 byte
\end{tabular}
1799 \vspace{\baselineskip}
Continue by encoding the 7 bit integer value `17' (\bin{0010001}).
\begin{tabular}{r|ccccccccl}
\multicolumn{1}{r}{} &&&&&&&$\downarrow$&& \\
         & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
byte 0   & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
           \textbf{1} & \textbf{1} & \textbf{1} & \textbf{0} & \\
byte 1   & \textbf{0} & \textbf{1} & \textbf{0} & \textbf{0} &
           \textbf{0} & \textbf{1} & 0 & 0 & $\leftarrow$ \\
byte 2   & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
byte 3   & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
\multicolumn{1}{c|}{$\vdots$}&\multicolumn{8}{c}{$\vdots$}& \\
byte $n$ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
byte stream length: 2 bytes
\end{tabular}
1816 \vspace{\baselineskip}
Continue by encoding the 13 bit integer value `6969' (\bin{11011\ 00111001}).
\begin{tabular}{r|ccccccccl}
\multicolumn{1}{r}{} &&&&$\downarrow$&&&&& \\
         & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
byte 0   & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
           \textbf{1} & \textbf{1} & \textbf{1} & \textbf{0} & \\
byte 1   & \textbf{0} & \textbf{1} & \textbf{0} & \textbf{0} &
           \textbf{0} & \textbf{1} & \textbf{1} & \textbf{1} & \\
byte 2   & \textbf{0} & \textbf{1} & \textbf{1} & \textbf{0} &
           \textbf{0} & \textbf{1} & \textbf{1} & \textbf{1} & \\
byte 3   & \textbf{0} & \textbf{0} & \textbf{1} &
           0 & 0 & 0 & 0 & 0 & $\leftarrow$ \\
\multicolumn{1}{c|}{$\vdots$}&\multicolumn{8}{c}{$\vdots$}& \\
byte $n$ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
byte stream length: 4 bytes
\end{tabular}
1835 \vspace{\baselineskip}
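The encoding example above can be reproduced with a small bit packer.
The following Python sketch assumes 8-bit bytes; the class name \texttt{BitWriter} and its members are hypothetical and chosen only for this illustration.

```python
class BitWriter:
    """Hypothetical MSb-first bit packer for 8-bit bytes."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.nbits = 0  # total number of bits written so far

    def write(self, value: int, width: int) -> None:
        # Mask to the field width so negative (two's complement) values work.
        value &= (1 << width) - 1
        for shift in range(width - 1, -1, -1):
            if self.nbits % 8 == 0:
                self.buf.append(0)  # each new byte starts out zeroed
            bit = (value >> shift) & 1
            # Fill the current byte from its most significant free position.
            self.buf[-1] |= bit << (7 - self.nbits % 8)
            self.nbits += 1


w = BitWriter()
w.write(12, 4)     # 1100
w.write(-1, 3)     # 111
w.write(17, 7)     # 0010001
w.write(6969, 13)  # 1101100111001
print(w.buf.hex())  # ce476720
```

The four output bytes match the final table: 27 bits were written, and the five unused bits in the last byte remain zero because every new byte is appended as zero.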
1837 \subsection{Decoding Example
}
The following example shows the state of the (8-bit) byte stream encoded in the
previous example after several binary integers are decoded, including the
location of the get pointer for the next bit to read.
Read a two bit unsigned integer from the example encoded above.
\begin{tabular}{r|ccccccccl}
\multicolumn{1}{r}{} &&&$\downarrow$&&&&&& \\
         & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
byte 0   & \textbf{1} & \textbf{1} & 0 & 0 & 1 & 1 & 1 & 0 & $\leftarrow$ \\
byte 1   & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
byte 2   & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
byte 3   & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
byte stream length: 4 bytes
\end{tabular}
1854 \vspace{\baselineskip}
Value read: 3 (\bin{11}).
Read another two bit unsigned integer from the example encoded above.
\begin{tabular}{r|ccccccccl}
\multicolumn{1}{r}{} &&&&&$\downarrow$&&&& \\
         & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
byte 0   & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
           1 & 1 & 1 & 0 & $\leftarrow$ \\
byte 1   & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
byte 2   & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
byte 3   & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
byte stream length: 4 bytes
\end{tabular}
1870 \vspace{\baselineskip}
Value read: 0 (\bin{00}).
Two things are worth noting here.
Although these four bits were originally written as a single four-bit integer,
reading some other combination of bit-widths from the bitstream is well
defined.
No artificial alignment boundaries are maintained in the bitstream.
The first value is the integer `$3$' only because the context stated we were
reading an unsigned integer.
Had the context stated we were reading a signed integer, the returned value
would have been the integer `$-1$'.
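The two reads in this example can be sketched as follows.
This Python fragment assumes 8-bit bytes and performs no end-of-packet handling; the function name \texttt{read\_bits} is hypothetical.

```python
from typing import Tuple


def read_bits(data: bytes, pos: int, width: int) -> Tuple[int, int]:
    """Read `width` bits MSb-first starting at bit offset `pos`.

    Returns (value, new_pos). This sketch does no bounds checking.
    """
    value = 0
    for _ in range(width):
        byte = data[pos // 8]
        bit = (byte >> (7 - pos % 8)) & 1  # most significant unread bit
        value = (value << 1) | bit
        pos += 1
    return value, pos


packet = bytes([0xCE, 0x47, 0x67, 0x20])  # the stream from the encoding example
first, pos = read_bits(packet, 0, 2)
second, pos = read_bits(packet, pos, 2)
print(first, second)  # 3 0
```

Reading the same four bits as one 4-bit field instead returns 12, the value originally written, since the bitstream carries no field boundaries.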
1888 \subsection{End-of-Packet Alignment
}
1890 The typical use of bitpacking is to produce many independent byte-aligned
1891 packets which are embedded into a larger byte-aligned container structure,
1892 such as an Ogg transport bitstream.
Externally, each bitstream encoded as a byte stream MUST begin and end on a
byte boundary.
1895 Often, the encoded packet bitstream is not an integer number of bytes, and so
1896 there is unused space in the last byte of a packet.
1898 %r: I think the generality here is necessary to be consistent with our assertions
1899 %r: elsewhere about being independent of transport and byte width
1900 When a Theora encoder produces packets for embedding in a byte-aligned
1901 container, unused space in the last byte of a packet is always zeroed during
1902 the encoding process.
1903 Thus, should this unused space be read, it will return binary zeroes.
There are no marker patterns or stuffing bits that would allow the decoder to
obtain the exact size, in bits, of the original bitstream.
1906 This knowledge is not required for decoding.
1908 Attempting to read past the end of an encoded packet results in an
1909 `end-of-packet' condition.
1910 Any further read operations after an `end-of-packet' condition shall also
1911 return `end-of-packet'.
Unlike Vorbis, Theora does not use truncated packets as a normal mode of
operation.
Therefore if a decoder encounters the `end-of-packet' condition during normal
decoding, it may attempt to use the bits that were read to recover as much of
the encoded data as possible, signal a warning or error, or both.
1918 \subsection{Reading Zero Bit Integers
}
1920 Reading a zero bit integer returns the value `$
0$' and does not increment
Reading to the end of the packet, but not past the end, so that an
`end-of-packet' condition is not triggered, and then reading a zero bit
integer shall succeed, returning `$0$', and not trigger an `end-of-packet'
condition.
1926 Reading a zero bit integer after a previous read sets the `end-of-packet'
1927 condition shall fail, also returning `end-of-packet'.
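The end-of-packet and zero-bit rules can be illustrated with a reader that keeps a sticky flag.
This is a minimal Python sketch assuming 8-bit bytes; \texttt{BitReader} and \texttt{EndOfPacket} are hypothetical names, and raising an exception is only one possible way for a decoder to report the condition.

```python
class EndOfPacket(Exception):
    """Signals the `end-of-packet' condition (hypothetical name)."""


class BitReader:
    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0      # bit offset of the next unread bit
        self.eop = False  # sticky end-of-packet flag

    def read(self, width: int) -> int:
        # Once set, the condition persists, even for zero-bit reads.
        if self.eop or self.pos + width > 8 * len(self.data):
            self.eop = True
            raise EndOfPacket
        value = 0
        for _ in range(width):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


reader = BitReader(bytes([0xCE]))
reader.read(8)         # consumes the whole one-byte packet
print(reader.read(0))  # 0
```

A zero-bit read at the exact end of the packet succeeds because it requests no bits beyond the end; any later read that does overrun sets the flag, after which even zero-bit reads fail.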
1929 \chapter{Bitstream Headers
}
1932 A Theora bitstream begins with three header packets.
1933 The header packets are, in order, the identification header, the comment
1934 header, and the setup header.
1935 All are required for decode compliance.
1936 An end-of-packet condition encountered while decoding the identification or
1937 setup header packets renders the stream undecodable.
An end-of-packet condition encountered while decoding the comment header is a
1939 non-fatal error condition, and MAY be ignored by a decoder.
1941 \paragraph{VP3 Compatibility
}
1943 VP3 relies on the headers provided by its container, usually either AVI or
1945 As such, several parameters available in these headers are not available to VP3
1947 These are indicated as they appear in the sections below.
1949 \section{Common Header Decode
}
1950 \label{sub:common-header
}
1952 \begin{figure
}[Htbp
]
1956 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1957 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1958 | header type | `t' | `h' | `e' |
1959 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1960 | `o' | `r' | `a' | data... |
1961 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1962 | ... header-specific data ... |
1964 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1967 \caption{Common Header Packet Layout
}
1968 \label{fig:commonheader
}
1972 \paragraph{Input parameters:
} None.
1974 \paragraph{Output parameters:
}\hfill\\*
1975 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
1976 \multicolumn{1}{c
}{Name
} &
1977 \multicolumn{1}{c
}{Type
} &
1978 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
1979 \multicolumn{1}{c
}{Signed?
} &
1980 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
1981 \bitvar{HEADERTYPE
} & Integer &
8 & No & The type of the header being
1983 \bottomrule\end{tabularx
}
1985 \paragraph{Variables used:
} None.
Each header packet begins with the same header fields, which are decoded as
follows:
1993 Read an
8-bit unsigned integer as
\bitvar{HEADERTYPE
}.
1994 If the most significant bit of this integer is not set, then stop.
1995 This is not a header packet.
Read six 8-bit unsigned integers.
If these do not have the values \hex{74}, \hex{68}, \hex{65}, \hex{6F},
\hex{72}, and \hex{61}, respectively, then stop.
This stream is not decodable by this specification.
These values correspond to the ASCII values of the characters `t', `h', `e',
`o', `r', and `a'.
2005 Decode continues according to
\bitvar{HEADERTYPE
}.
2006 The identification header is type
\hex{80}, the comment header is type
2007 \hex{81}, and the setup header is type
\hex{82}.
2008 These packets must occur in the order: identification, comment, setup.
2009 %r: I clarified the initial-bit scheme here
2010 %TBT: Dashes let the reader know they'll have to pick up the rest of the
2011 %TBT: sentence after the explanatory phrase.
2012 %TBT: Otherwise it just sounds like the bit must exist.
2013 All header packets have the most significant bit of the type
2014 field---which is the initial bit in the packet---set.
This distinguishes them from video data packets in which the first bit
is unset.
2017 % extra header packets are a feature Dan argued for way back when for
2018 % backward-compatible extensions (and icc colourspace for example)
2019 % I think it's reasonable
2020 %TBT: You can always just stick more stuff in the setup header.
Packets with other header types (\hex{83}--\hex{FF}) are reserved and MUST be
ignored.
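The common header checks can be sketched as follows.
This Python fragment assumes 8-bit bytes, so that each 8-bit field occupies exactly one octet of the packet; the function name and the use of \texttt{ValueError} are assumptions of this example.

```python
def decode_common_header(packet: bytes) -> int:
    """Return HEADERTYPE, or raise ValueError if the checks fail."""
    if len(packet) < 7:
        raise ValueError("end-of-packet while reading the common header")
    header_type = packet[0]
    # Header packets have the most significant bit of the type field set.
    if not header_type & 0x80:
        raise ValueError("not a header packet")
    # 0x74 0x68 0x65 0x6F 0x72 0x61 spell "theora" in ASCII.
    if packet[1:7] != b"theora":
        raise ValueError("not decodable by this specification")
    return header_type


print(hex(decode_common_header(b"\x80theora")))  # 0x80
```

The caller then dispatches on the returned type: \hex{80} for the identification header, \hex{81} for the comment header, and \hex{82} for the setup header.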
2024 \section{Identification Header Decode
}
2025 \label{sec:idheader
}
2027 \begin{figure
}[Htbp
]
2031 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2032 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2033 |
0x80 | `t' | `h' | `e' |
2034 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2035 | `o' | `r' | `a' | VMAJ |
2036 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2037 | VMIN | VREV | FMBW |
2038 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2040 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2042 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2043 | PICX | PICY | FRN... |
2044 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2046 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2047 | ...FRD | PARN... |
2048 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2050 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2052 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2053 | QUAL | KFGSHIFT| PF| Res |
2054 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2057 \caption{Identification Header Packet
}
2058 \label{fig:idheader
}
2061 \paragraph{Input parameters:
} None.
2063 \paragraph{Output parameters:
}\hfill\\*
2064 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2065 \multicolumn{1}{c
}{Name
} &
2066 \multicolumn{1}{c
}{Type
} &
2067 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2068 \multicolumn{1}{c
}{Signed?
} &
2069 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2070 \bitvar{VMAJ
} & Integer &
8 & No & The major version number. \\
2071 \bitvar{VMIN
} & Integer &
8 & No & The minor version number. \\
2072 \bitvar{VREV
} & Integer &
8 & No & The version revision number. \\
2073 \bitvar{FMBW
} & Integer &
16 & No & The width of the frame in macro
2075 \bitvar{FMBH
} & Integer &
16 & No & The height of the frame in macro
2077 \bitvar{NSBS
} & Integer &
32 & No & The total number of super blocks in a
2079 \bitvar{NBS
} & Integer &
36 & No & The total number of blocks in a
2081 \bitvar{NMBS
} & Integer &
32 & No & The total number of macro blocks in a
2083 \bitvar{PICW
} & Integer &
20 & No & The width of the picture region in
2085 \bitvar{PICH
} & Integer &
20 & No & The height of the picture region in
2087 \bitvar{PICX
} & Integer &
8 & No & The X offset of the picture region in
2089 \bitvar{PICY
} & Integer &
8 & No & The Y offset of the picture region in
2091 \bitvar{FRN
} & Integer &
32 & No & The frame-rate numerator. \\
2092 \bitvar{FRD
} & Integer &
32 & No & The frame-rate denominator. \\
2093 \bitvar{PARN
} & Integer &
24 & No & The pixel aspect-ratio numerator. \\
2094 \bitvar{PARD
} & Integer &
24 & No & The pixel aspect-ratio denominator. \\
2095 \bitvar{CS
} & Integer &
8 & No & The
color space. \\
2096 \bitvar{PF
} & Integer &
2 & No & The pixel format. \\
2097 \bitvar{NOMBR
} & Integer &
24 & No & The nominal bitrate of the stream, in
2099 \bitvar{QUAL
} & Integer &
6 & No & The quality hint. \\
2100 \bitvar{KFGSHIFT
} & Integer &
5 & No & The amount to shift the key frame
2101 number by in the granule position. \\
2102 \bottomrule\end{tabularx
}
2104 \paragraph{Variables used:
} None.
2107 The identification header is a short header with only a few fields used to
2108 declare the stream definitively as Theora and provide detailed information
2109 about the format of the fully decoded video data.
2110 The identification header is decoded as follows:
2114 Decode the common header fields according to the procedure described in
2115 Section~
\ref{sub:common-header
}.
2116 If
\bitvar{HEADERTYPE
} returned by this procedure is not
\hex{80}, then stop.
2117 This packet is not the identification header.
2119 Read an
8-bit unsigned integer as
\bitvar{VMAJ
}.
2120 If
\bitvar{VMAJ
} is not $
3$, then stop.
2121 This stream is not decodable according to this specification.
2123 Read an
8-bit unsigned integer as
\bitvar{VMIN
}.
2124 If
\bitvar{VMIN
} is not $
2$, then stop.
2125 This stream is not decodable according to this specification.
2127 Read an
8-bit unsigned integer as
\bitvar{VREV
}.
2128 If
\bitvar{VREV
} is greater than $
1$, then this stream
2129 may contain optional features or interpretational changes
2130 documented in a future version of this specification.
2131 Regardless of the value of
\bitvar{VREV
}, the stream is decodable
2132 according to this specification.
2134 Read a
16-bit unsigned integer as
\bitvar{FMBW
}.
2135 This MUST be greater than zero.
2136 This specifies the width of the coded frame in macro blocks.
2137 The actual width of the frame in pixels is $
\bitvar{FMBW
}*
16$.
2139 Read a
16-bit unsigned integer as
\bitvar{FMBH
}.
2140 This MUST be greater than zero.
2141 This specifies the height of the coded frame in macro blocks.
2142 The actual height of the frame in pixels is $
\bitvar{FMBH
}*
16$.
2144 Read a
24-bit unsigned integer as
\bitvar{PICW
}.
2145 This MUST be no greater than $(
\bitvar{FMBW
}*
16)$.
2146 Note that
24 bits are read, even though only
20 bits are sufficient to specify
2147 any value of the picture width.
2148 This is done to preserve octet alignment in this header, to allow for a
2149 simplified parser implementation.
2151 Read a
24-bit unsigned integer as
\bitvar{PICH
}.
2152 This MUST be no greater than $(
\bitvar{FMBH
}*
16)$.
2153 Together with
\bitvar{PICW
}, this specifies the size of the displayable picture
2154 region within the coded frame.
2155 See Figure~
\ref{fig:pic-frame
}.
2156 Again,
24 bits are read instead of
20.
2158 Read an
8-bit unsigned integer as
\bitvar{PICX
}.
This MUST be no greater than $(\bitvar{FMBW}*16-\bitvar{PICW})$.
2161 Read an
8-bit unsigned integer as
\bitvar{PICY
}.
This MUST be no greater than $(\bitvar{FMBH}*16-\bitvar{PICH})$.
2163 Together with
\bitvar{PICX
}, this specifies the location of the lower-left
2164 corner of the displayable picture region.
2165 See Figure~
\ref{fig:pic-frame
}.
2167 Read a
32-bit unsigned integer as
\bitvar{FRN
}.
2168 This MUST be greater than zero.
2170 Read a
32-bit unsigned integer as
\bitvar{FRD
}.
2171 This MUST be greater than zero.
Theora is a fixed-frame-rate video codec.
Frames are sampled at the constant rate of $\frac{\bitvar{FRN}}{\bitvar{FRD}}$
frames per second.
2175 The presentation time of the first frame is at zero seconds.
2176 No mechanism is provided to specify a non-zero offset for the initial
2179 Read a
24-bit unsigned integer as
\bitvar{PARN
}.
2181 Read a
24-bit unsigned integer as
\bitvar{PARD
}.
2182 Together with
\bitvar{PARN
}, these specify the aspect ratio of the pixels
2183 within a frame, defined as the ratio of the physical width of a pixel to its
2185 This is given by the ratio $
\bitvar{PARN
}:
\bitvar{PARD
}$.
If either of these fields is zero, this indicates that pixel aspect ratio
2187 information was not available to the encoder.
2188 In this case it MAY be specified by the application via an external means, or
2189 a default value of $
1:
1$ MAY be used.
2191 Read an
8-bit unsigned integer as
\bitvar{CS
}.
2192 This is a value from an enumerated list of the available
color spaces, given in
2193 Table~
\ref{tab:colorspaces
}.
2194 The `Undefined' value indicates that
color space information was not available
2196 It MAY be specified by the application via an external means.
2197 If a reserved value is given, a decoder MAY refuse to decode the stream.
2200 \begin{tabular*
}{215pt
}{cl@
{\extracolsep{\fill}}c
}\toprule
2201 Value & Color Space \\
\midrule
2203 $
1$ & Rec.~
470M (see Section~
\ref{sec:
470m
}). \\
2204 $
2$ & Rec.~
470BG (see Section~
\ref{sec:
470bg
}). \\
2208 \bottomrule\end{tabular*
}
2210 \caption{Enumerated List of Color Spaces
}
2211 \label{tab:colorspaces
}
2214 Read a
24-bit unsigned integer as
\bitvar{NOMBR
}.
2215 The
\bitvar{NOMBR
} field is used only as a hint.
2216 For pure VBR streams, this value may be considerably off.
2217 The field MAY be set to zero to indicate that the encoder did not care to
2221 Read a
6-bit unsigned integer as
\bitvar{QUAL
}.
2222 This value is used to provide a hint as to the relative quality of the stream
2223 when compared to others produced by the same encoder.
2224 Larger values indicate higher quality.
2225 This can be used, for example, to select among several streams containing the
2226 same material encoded with different settings.
2228 Read a
5-bit unsigned integer as
\bitvar{KFGSHIFT
}.
2229 The
\bitvar{KFGSHIFT
} is used to partition the granule position associated with
2230 each packet into two different parts.
2231 The frame number of the last key frame, starting from zero, is stored in the
2232 upper $
64-
\bitvar{KFGSHIFT
}$ bits, while the lower
\bitvar{KFGSHIFT
} bits
contain the number of frames since the last key frame.
2234 Complete details on the granule position mapping are specified in Section~REF.
2236 Read a
2-bit unsigned integer as
\bitvar{PF
}.
2237 The
\bitvar{PF
} field contains a value from an enumerated list of the available
2238 pixel formats, given in Table~
\ref{tab:pixel-formats
}.
2239 If the reserved value $
1$ is given, stop.
2240 This stream is not decodable according to this specification.
2244 \begin{tabular*
}{215pt
}{cl@
{\extracolsep{\fill}}c
}\toprule
2245 Value & Pixel Format \\
\midrule
2246 $
0$ &
4:
2:
0 (see Section~
\ref{sec:
420}). \\
2248 $
2$ &
4:
2:
2 (see Section~
\ref{sec:
422}). \\
2249 $
3$ &
4:
4:
4 (see Section~
\ref{sec:
444}). \\
2250 \bottomrule\end{tabular*
}
2252 \caption{Enumerated List of Pixel Formats
}
2253 \label{tab:pixel-formats
}
2257 Read a
3-bit unsigned integer.
2258 These bits are reserved.
2259 If this value is not zero, then stop.
2260 This stream is not decodable according to this specification.
2262 Assign
\bitvar{NSBS
} a value according to
\bitvar{PF
}, as given by
2263 Table~
\ref{tab:nsbs-for-pf
}.
2267 \begin{tabular
}{cc
}\toprule
2268 \bitvar{PF
} &
\bitvar{NSBS
} \\
\midrule
2269 $
0$ & $
\begin{aligned
}
2270 &((
\bitvar{FMBW
}+
1)//
2)*((
\bitvar{FMBH
}+
1)//
2)\\
2271 & +
2*((
\bitvar{FMBW
}+
3)//
4)*((
\bitvar{FMBH
}+
3)//
4)
2272 \end{aligned
}$ \\
\midrule
2273 $
2$ & $
\begin{aligned
}
2274 &((
\bitvar{FMBW
}+
1)//
2)*((
\bitvar{FMBH
}+
1)//
2)\\
2275 & +
2*((
\bitvar{FMBW
}+
3)//
4)*((
\bitvar{FMBH
}+
1)//
2)
2276 \end{aligned
}$ \\
\midrule
2277 $
3$ & $
3*((
\bitvar{FMBW
}+
1)//
2)*((
\bitvar{FMBH
}+
1)//
2)$ \\
2278 \bottomrule\end{tabular
}
2280 \caption{Number of Super Blocks for each Pixel Format
}
2281 \label{tab:nsbs-for-pf
}
2285 Assign
\bitvar{NBS
} a value according to
\bitvar{PF
}, as given by
2286 Table~
\ref{tab:nbs-for-pf
}.
2290 \begin{tabular
}{cc
}\toprule
2291 \bitvar{PF
} &
\bitvar{NBS
} \\
\midrule
2292 $
0$ & $
6*
\bitvar{FMBW
}*
\bitvar{FMBH
}$ \\
\midrule
2293 $
2$ & $
8*
\bitvar{FMBW
}*
\bitvar{FMBH
}$ \\
\midrule
2294 $
3$ & $
12*
\bitvar{FMBW
}*
\bitvar{FMBH
}$ \\
2295 \bottomrule\end{tabular
}
2297 \caption{Number of Blocks for each Pixel Format
}
2298 \label{tab:nbs-for-pf
}
2302 Assign
\bitvar{NMBS
} the value $(
\bitvar{FMBW
}*
\bitvar{FMBH
})$.
2306 \paragraph{VP3 Compatibility
}
2308 VP3 does not correctly handle frame sizes that are not a multiple of
16.
2309 Thus,
\bitvar{PICW
} and
\bitvar{PICH
} should be set to the frame width and
2310 height in pixels, respectively, and
\bitvar{PICX
} and
\bitvar{PICY
} should be
2312 VP3 headers do not specify a
color space.
2313 VP3 only supports the
4:
2:
0 pixel format.
2315 \section{Comment Header
}
2316 \label{sec:commentheader
}
2318 The Theora comment header is the second of three header packets that begin a
It is meant for short text comments, not arbitrary metadata; arbitrary metadata
2321 belongs in a separate logical stream that provides greater structure and
2322 machine parseability.
2324 %r: I tried to morph this a little more in the direction of our
2326 The comment field is meant to be used much like someone jotting a quick note on
2327 the label of a video.
2328 It should be a little information to remember the disc or tape by and explain it to
2329 others; a short, to-the-point text note that can be more than a couple words,
2330 but isn't going to be more than a short paragraph.
2331 The essentials, in other words, whatever they turn out to be, e.g.:
2335 The comment header is stored as a logical list of eight-bit clean vectors; the
2336 number of vectors is bounded at $
2^
{32}-
1$ and the length of each vector is
2337 limited to $
2^
{32}-
1$ bytes.
The vector length is encoded; the vector contents themselves are not null
terminated.
2340 In addition to the vector list, there is a single vector for a vendor name,
2341 also eight-bit clean with a length encoded in
32 bits.
2342 %TODO: The 1.0 release of libtheora sets the vendor string to ...
2344 \subsection{Comment Length Decode
}
2345 \label{sub:comment-len
}
2349 \begin{tabular
}{ | c | c |
}
2352 UTF-
8 encoded string ...\\
2356 \caption{Length encoded string layout
}
2357 \label{fig:comment-len
}
2360 \paragraph{Input parameters:
} None.
2362 \paragraph{Output parameters:
}\hfill\\*
2363 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2364 \multicolumn{1}{c
}{Name
} &
2365 \multicolumn{1}{c
}{Type
} &
2366 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2367 \multicolumn{1}{c
}{Signed?
} &
2368 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2369 \bitvar{LEN
} & Integer &
32 & No & A single
32-bit length value. \\
2370 \bottomrule\end{tabularx
}
2372 \paragraph{Variables used:
}\hfill\\*
2373 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2374 \multicolumn{1}{c
}{Name
} &
2375 \multicolumn{1}{c
}{Type
} &
2376 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2377 \multicolumn{1}{c
}{Signed?
} &
2378 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2379 \locvar{LEN0
} & Integer &
8 & No & The first octet of the string length. \\
2380 \locvar{LEN1
} & Integer &
8 & No & The second octet of the string length. \\
2381 \locvar{LEN2
} & Integer &
8 & No & The third octet of the string length. \\
2382 \locvar{LEN3
} & Integer &
8 & No & The fourth octet of the string
2384 \bottomrule\end{tabularx
}
2387 A single comment vector is decoded as follows:
2391 Read an
8-bit unsigned integer as
\locvar{LEN0
}.
2393 Read an
8-bit unsigned integer as
\locvar{LEN1
}.
2395 Read an
8-bit unsigned integer as
\locvar{LEN2
}.
2397 Read an
8-bit unsigned integer as
\locvar{LEN3
}.
2399 Assign
\bitvar{LEN
} the value $(
\locvar{LEN0
}+(
\locvar{LEN1
}<<
8)+
2400 (
\locvar{LEN2
}<<
16)+(
\locvar{LEN3
}<<
24))$.
This construction is used so that on platforms with 8-bit bytes, the memory
organization of the comment header is identical with that of Vorbis I,
allowing for common parsing code despite the different bit packing
conventions.
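The length computation can be written out directly.
In this Python sketch the function name \texttt{decode\_len} is hypothetical, and the input is assumed to be the four octets \locvar{LEN0} through \locvar{LEN3} in the order they were read.

```python
def decode_len(octets: bytes) -> int:
    """Combine LEN0..LEN3 (first octet least significant) into LEN."""
    len0, len1, len2, len3 = octets[:4]
    return len0 + (len1 << 8) + (len2 << 16) + (len3 << 24)


print(decode_len(bytes([0x0D, 0x00, 0x00, 0x00])))  # 13
```

On 8-bit bytes this is the same as interpreting the four octets as a little-endian 32-bit integer, even though the individual octets are read MSb-first.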
2407 \subsection{Comment Header Decode
}
2411 \begin{tabular
}{ | c |
}
2413 vendor string \\
\hline
2414 number of comments \\
\hline
2415 comment string \\
\hline
2416 comment string \\
\hline
2421 \caption{Comment Header Layout
}
2422 \label{fig:commentheader
}
2425 \paragraph{Input parameters:
} None.
2427 \paragraph{Output parameters:
}\hfill\\*
2428 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2429 \multicolumn{1}{c
}{Name
} &
2430 \multicolumn{1}{c
}{Type
} &
2431 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2432 \multicolumn{1}{c
}{Signed?
} &
2433 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2434 \bitvar{VENDOR
} &
\multicolumn{3}{l
}{String
} & The vendor string. \\
2435 \bitvar{NCOMMENTS
} & Integer &
32 & No & The number of user
2437 \bitvar{COMMENTS
} &
\multicolumn{3}{l
}{String Array
} & A list of
2438 \bitvar{NCOMMENTS
} user comment values. \\
2439 \bottomrule\end{tabularx
}
2441 \paragraph{Variables used:
}\hfill\\*
2442 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2443 \multicolumn{1}{c
}{Name
} &
2444 \multicolumn{1}{c
}{Type
} &
2445 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2446 \multicolumn{1}{c
}{Signed?
} &
2447 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2448 \locvar{\ci} & Integer &
32 & No & The index of the current user
2450 \bottomrule\end{tabularx
}
2453 The complete comment header is decoded as follows:
2457 Decode the common header fields according to the procedure described in
2458 Section~
\ref{sub:common-header
}.
2459 If
\bitvar{HEADERTYPE
} returned by this procedure is not
\hex{81}, then stop.
2460 This packet is not the comment header.
2462 Decode the length of the vendor string using the procedure given in
2463 Section~
\ref{sub:comment-len
} into
\bitvar{LEN
}.
2465 Read
\bitvar{LEN
} 8-bit unsigned integers.
2467 Set the string
\bitvar{VENDOR
} to the contents of these octets.
2469 Decode the number of user comments using the procedure given in
2470 Section~
\ref{sub:comment-len
} into
\bitvar{LEN
}.
2472 Assign
\bitvar{NCOMMENTS
} the value stored in
\bitvar{LEN
}.
2474 For each consecutive value of
\locvar{\ci} from $
0$ to
2475 $(
\bitvar{NCOMMENTS
}-
1)$, inclusive:
2478 Decode the length of the current user comment using the procedure given in
2479 Section~
\ref{sub:comment-len
} into
\bitvar{LEN
}.
2481 Read
\bitvar{LEN
} 8-bit unsigned integers.
2483 Set the string $
\bitvar{COMMENTS
}[\locvar{\ci}]$ to the contents of these
2488 The comment header comprises the entirety of the second header packet.
2489 Unlike the first header packet, it is not generally the only packet on the
2490 second page and may span multiple pages.
2491 The length of the comment header packet is (practically) unbounded.
2492 The comment header packet is not optional; it must be present in the stream
2493 even if it is logically empty.
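The procedure above can be sketched in Python as follows.
This is a non-normative illustration, not a reference implementation.
It assumes that the common header is the header type octet followed by the six octets `theora', and that the length procedure of Section~\ref{sub:comment-len} yields a 32-bit little-endian unsigned integer; both are defined outside this section and are assumptions here.
The name \texttt{decode\_comment\_header} is hypothetical, and raising an exception stands in for ``stop''.

```python
import struct


def decode_comment_header(packet):
    """Return (VENDOR, COMMENTS) from a comment header packet given as bytes.

    Assumptions (defined elsewhere in the specification, not in this section):
    the common header is the type octet followed by b"theora", and each
    length is a 32-bit little-endian unsigned integer.
    """
    if len(packet) < 7 or packet[0] != 0x81 or packet[1:7] != b"theora":
        raise ValueError("this packet is not the comment header")
    pos = 7  # octet offset of the next unread field

    def read_len():
        nonlocal pos
        if pos + 4 > len(packet):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        return length

    def read_octets(count):
        nonlocal pos
        if pos + count > len(packet):
            raise ValueError("truncated string")
        octets = packet[pos:pos + count]
        pos += count
        return octets

    vendor = read_octets(read_len())      # VENDOR
    ncomments = read_len()                # NCOMMENTS
    comments = [read_octets(read_len()) for _ in range(ncomments)]
    return vendor, comments
```

The strings are returned as raw octets; interpreting them is left to the caller.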
2495 %TODO: \paragraph{VP3 Compatibility}
2497 \subsection{User Comment Format
}
2499 The user comment vectors are structured similarly to a UNIX environment
2501 That is, comment fields consist of a field name and a corresponding value and
2504 \begin{tabular
}{rcl
}
2505 $
\bitvar{COMMENTS
}[0]$ & = & ``TITLE=the look of Theora'' \\
2506 $
\bitvar{COMMENTS
}[1]$ & = & ``DIRECTOR=me''
2510 The field name is case-insensitive and MUST consist of ASCII characters
2511 \hex{20} through
\hex{7D
},
\hex{3D
} (`=') excluded.
2512 ASCII
\hex{41} through
\hex{5A
} inclusive (characters `A'--`Z') are to be
2513 considered equivalent to ASCII
\hex{61} through
\hex{7A
} inclusive
2514 (characters `a'--`z').
2515 An entirely empty field name---one that is zero characters long---is not
2518 The field name is immediately followed by ASCII
\hex{3D
} (`='); this equals
2519 sign is used to terminate the field name.
2521 The data immediately after
\hex{3D
} until the end of the vector is the eight-bit
2522 clean value of the field contents encoded as a UTF-
8 string~
\cite{rfc2044
}.
2524 Field names MUST NOT be `internationalized'; this is a concession to
2525 simplicity, not an attempt to exclude the majority of the world that doesn't
2527 Applications MAY wish to present internationalized versions of the standard
2528 field names listed below to the user, but they are not to be stored in the
2530 Field
{\em contents
}, however, use the UTF-
8 character encoding to allow easy
2531 representation of any language.
2533 Individual `vendors' MAY use non-standard field names within reason.
2534 The proper use of comment fields as human-readable notes has already been
2536 Abuse will be discouraged.
2538 There is no vendor-specific prefix to `non-standard' field names.
2539 Vendors SHOULD make some effort to avoid arbitrarily polluting the common
2541 %"and other bodies"?
2542 %If you're going to be that vague, you might as well not say anything at all.
2543 Xiph.org and other bodies will generally collect and rationalize the more
2544 useful tags to help with standardization.
2546 Field names are not restricted to occur only once within a comment header.
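As an illustration of the rules above, the following Python sketch splits one comment vector into a field name and a value and groups repeated names.
The function names \texttt{parse\_comment} and \texttt{collect\_comments} are hypothetical, and folding names to upper case is only one possible way to implement the case-insensitive comparison.

```python
def parse_comment(raw):
    """Split one comment vector (bytes) into (NAME, value)."""
    name, sep, value = raw.partition(b"=")  # the first 0x3D ends the name
    if not sep:
        raise ValueError("comment has no '=' terminating the field name")
    if any(c < 0x20 or c > 0x7D for c in name):
        raise ValueError("field name contains a character outside 0x20-0x7D")
    # Only 'a'-'z' are affected by upper() here, matching the A-Z/a-z rule.
    return name.decode("ascii").upper(), value.decode("utf-8")


def collect_comments(raw_comments):
    """Group values by field name; a name may occur more than once."""
    fields = {}
    for raw in raw_comments:
        name, value = parse_comment(raw)
        fields.setdefault(name, []).append(value)
    return fields
```

Keeping a list of values per name reflects the fact that a field name may appear several times in one header.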
2549 \paragraph{Field Names
}
2551 %r should this be an appendix?
2553 Below is a proposed, minimal list of standard field names with a description of
2555 No field names are mandatory; a comment header may contain one or more, all, or
2556 none of the names in this list.
2559 \item{TITLE:
} Video name.
2560 \item{ARTIST:
} Filmmaker or other creator name.
2561 \item{VERSION:
} Subtitle, remix info, or other text distinguishing
2562 versions of a video.
2563 \item{DATE:
} Date associated with the video. Implementations SHOULD attempt
2564 to parse this field as an ISO
8601 date for machine interpretation and
2566 \item{LOCATION:
} Location associated with the video. This is usually the
2567 filming location for non-fiction works.
2568 \item{COPYRIGHT:
} Copyright statement.
2569 \item{LICENSE:
} Copyright and other licensing information.
 Implementations wishing to automatically parse, e.g.,
 distribution terms SHOULD look here for a URL uniquely defining
 the license. If no instance of this field is present, or if no
 instance contains a parseable URL, an implementation MAY look
2574 in the COPYRIGHT field for such a URL.
2575 \item{ORGANIZATION:
} Studio name, publisher, or other organization
2576 involved in the creation of the video.
2578 \item{DIRECTOR:
} Director or filmmaker credit, similar to ARTIST.
2579 \item{PRODUCER:
} Producer credit for the video.
2580 \item{COMPOSER:
} Music credit for the video.
2581 \item{ACTOR:
} Acting credit for the video.
2583 \item{TAG:
} Subject or category tag, keyword, or other content
 classification label. The value of each instance of this
 field SHOULD be treated as a single label, with multiple
 instances of the field for multiple tags. The value of
 a single field SHOULD NOT be parsed into multiple tags
 based on some internal delimiter.
2589 \item{DESCRIPTION:
} General description, summary, or blurb.
2592 \section{Setup Header
}
2593 \label{sec:setupheader
}
2595 The Theora setup header contains the limit values used to drive the loop
2596 filter, the base matrices and scale values used to build the dequantization
2597 tables, and the Huffman tables used to unpack the DCT tokens.
2598 Because the contents of this header are specific to Theora, no concessions have
2599 been made to keep the fields octet-aligned for easy parsing.
2603 \begin{tabular
}{ | c |
}
2605 common header block \\
\hline
2606 loop filter table resolution \\
\hline
2607 loop filter table \\
\hline
2608 scale table resolution \\
\hline
2609 AC scale table \\
\hline
2610 DC scale table \\
\hline
number of base matrices \\
\hline
base quantization matrices \\
\hline
2614 quant range interpolation table \\
\hline
2615 DCT token Huffman tables \\
2619 \caption{Setup Header structure
}
2620 \label{fig:setupheader
}
2623 \subsection{Loop Filter Limit Table Decode
}
2624 \label{sub:loop-filter-limits
}
2626 \paragraph{Input parameters:
} None.
2628 \paragraph{Output parameters:
}\hfill\\*
2629 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2630 \multicolumn{1}{c
}{Name
} &
2631 \multicolumn{1}{c
}{Type
} &
2632 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2633 \multicolumn{1}{c
}{Signed?
} &
2634 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2635 \bitvar{LFLIMS
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
2636 7 & No & A
64-element array of loop filter limit values. \\
2638 \bottomrule\end{tabularx
}
2640 \paragraph{Variables used:
}\hfill\\*
2641 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2642 \multicolumn{1}{c
}{Name
} &
2643 \multicolumn{1}{c
}{Type
} &
2644 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2645 \multicolumn{1}{c
}{Signed?
} &
2646 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2647 \locvar{\qi} & Integer &
6 & No & The quantization index. \\
2648 \locvar{NBITS
} & Integer &
3 & No & The size of values being read in the current table. \\
2650 \bottomrule\end{tabularx
}
2653 This procedure decodes the table of loop filter limit values used to drive the
2654 loop filter, which is described in Section~
\ref{sub:loop-filter-limits
}.
2655 It is decoded as follows:
2659 Read a
3-bit unsigned integer as
\locvar{NBITS
}.
2661 For each consecutive value of
\locvar{\qi} from $
0$ to $
63$, inclusive:
2664 Read an
\locvar{NBITS
}-bit unsigned integer as $
\bitvar{LFLIMS
}[\locvar{\qi}]$.
2668 \paragraph{VP3 Compatibility
}
2670 The loop filter limit values are hardcoded in VP3.
2671 The values used are given in Appendix~
\ref{app:vp3-loop-filter-limits
}.
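The loop filter limit table decode above can be sketched in Python as follows.
The \texttt{BitReader} class is a hypothetical helper that assumes bits are unpacked most-significant-bit first from each octet; the bit packing convention is defined elsewhere in the specification, and raising \texttt{EOFError} at the end of the data is a simplification made for this sketch.

```python
class BitReader:
    """Hypothetical helper: reads unsigned integers MSB-first from bytes."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            if self.pos >= 8 * len(self.data):
                raise EOFError("ran out of bits")
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def decode_loop_filter_limits(reader):
    """Return LFLIMS, a list of 64 limit values, one per qi."""
    nbits = reader.read(3)  # NBITS: width of every entry, 0 to 7 bits
    return [reader.read(nbits) for _ in range(64)]
```

Because \texttt{NBITS} is at most 7, each entry fits in the 7 bits listed for \bitvar{LFLIMS}; an \texttt{NBITS} of zero yields a table of all zeros.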
2673 \subsection{Quantization Parameters Decode
}
2674 \label{sub:quant-params
}
2676 \paragraph{Input parameters:
} None.
2678 \paragraph{Output parameters:
}\hfill\\*
2679 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2680 \multicolumn{1}{c
}{Name
} &
2681 \multicolumn{1}{c
}{Type
} &
2682 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2683 \multicolumn{1}{c
}{Signed?
} &
2684 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2685 \bitvar{ACSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
2686 16 & No & A
64-element array of scale values for
2687 AC coefficients for each
\qi\ value. \\
2688 \bitvar{DCSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
2689 16 & No & A
64-element array of scale values for
2690 the DC coefficient for each
\qi\ value. \\
2691 \bitvar{NBMS
} & Integer &
10 & No & The number of base matrices. \\
2692 \bitvar{BMS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
2693 8 & No & A $
\bitvar{NBMS
}\times 64$ array
2694 containing the base matrices. \\
2695 \bitvar{NQRS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
2696 6 & No & A $
2\times 3$ array containing the
2697 number of quant ranges for a given
\qti\ and
\pli, respectively.
2698 This is at most $
63$. \\
2699 \bitvar{QRSIZES
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
2700 6 & No & A $
2\times 3\times 63$ array of the
2701 sizes of each quant range for a given
\qti\ and
\pli, respectively.
2702 Only the first $
\bitvar{NQRS
}[\qti][\pli]$ values are used. \\
2703 \bitvar{QRBMIS
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
2704 9 & No & A $
2\times 3\times 64$ array of the
2705 \bmi's used for each quant range for a given
\qti\ and
\pli, respectively.
2706 Only the first $(
\bitvar{NQRS
}[\qti][\pli]+
1)$ values are used. \\
2707 \bottomrule\end{tabularx
}
2709 \paragraph{Variables used:
}\hfill\\*
2710 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2711 \multicolumn{1}{c
}{Name
} &
2712 \multicolumn{1}{c
}{Type
} &
2713 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2714 \multicolumn{1}{c
}{Signed?
} &
2715 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2716 \locvar{\qti} & Integer &
1 & No & A quantization type index.
2717 See Table~
\ref{tab:quant-types
}.\\
2718 \locvar{\qtj} & Integer &
1 & No & A quantization type index. \\
2719 \locvar{\pli} & Integer &
2 & No & A
color plane index.
2720 See Table~
\ref{tab:
color-planes
}.\\
2721 \locvar{\plj} & Integer &
2 & No & A
color plane index. \\
2722 \locvar{\qi} & Integer &
6 & No & The quantization index. \\
2723 \locvar{\ci} & Integer &
6 & No & The DCT coefficient index. \\
2724 \locvar{\bmi} & Integer &
9 & No & The base matrix index. \\
2725 \locvar{\qri} & Integer &
6 & No & The quant range index. \\
2726 \locvar{NBITS
} & Integer &
5 & No & The size of fields to read. \\
2727 \locvar{NEWQR
} & Integer &
1 & No & Flag that indicates a new set of quant
2728 ranges will be defined. \\
2729 \locvar{RPQR
} & Integer &
1 & No & Flag that indicates the quant ranges to
2730 copy will come from the same
color plane. \\
2731 \bottomrule\end{tabularx
}
2734 The AC scale and DC scale values are defined in two simple tables with
64
2735 values each, one for each
\qi\ value.
2736 The same scale values are used for every quantization type and
color plane.
2738 The base matrices for all quantization types and
color planes are stored in a
2740 These are then referenced by index in several sets of
\term{quant ranges
}.
2741 The purpose of the quant ranges is to specify which base matrices are used for
2744 A set of quant ranges is defined for each quantization type and
color plane.
2745 To save space in the header, bit flags allow a set of quant ranges to be copied
2746 from a previously defined set instead of being specified explicitly.
2747 Every set except the first one can be copied from the immediately preceding
2749 Similarly, if the quantization type is not $
0$, the set can be copied from the
2750 set defined for the same
color plane for the preceding quantization type.
2751 This formulation allows compact representation of, for example, the same
2752 set of quant ranges in both chroma channels, as is done in the original VP3,
2753 or the same set of quant ranges in INTRA and INTER modes.
2755 Each quant range is defined by a size and two base matrix indices, one for each
2757 The base matrix for the end of one range is used as the start of the next
2758 range, so that for $n$ ranges, $n+
1$ base matrices are specified.
2759 The base matrices for the
\qi\ values between the two endpoints of the range
2760 are generated by linear interpolation.
2764 The location of the endpoints of each range is encoded by their size.
2765 The
\qi\ value for the left end-point is the sum of the sizes of all preceding
2766 ranges, and the
\qi\ value for the right end-point adds the size of the
2768 Thus the sum of the sizes of all the ranges MUST be
63, so that the last range
2769 falls on the last possible
\qi\ value.
The complete set of quantization parameters is decoded as follows:
2775 Read a
4-bit unsigned integer.
2776 Assign
\locvar{NBITS
} the value read, plus one.
2778 For each consecutive value of
\locvar{\qi} from $
0$ to $
63$, inclusive:
2781 Read an
\locvar{NBITS
}-bit unsigned integer as
2782 $
\bitvar{ACSCALE
}[\locvar{\qi}]$.
2785 Read a
4-bit unsigned integer.
2786 Assign
\locvar{NBITS
} the value read, plus one.
2788 For each consecutive value of
\locvar{\qi} from $
0$ to $
63$, inclusive:
2791 Read an
\locvar{NBITS
}-bit unsigned integer as
2792 $
\bitvar{DCSCALE
}[\locvar{\qi}]$.
2795 Read a
9-bit unsigned integer.
2796 Assign
\bitvar{NBMS
} the value decoded, plus one.
2797 \bitvar{NBMS
} MUST be no greater than
384.
2799 For each consecutive value of
\locvar{\bmi} from $
0$ to $(
\bitvar{NBMS
}-
1)$, inclusive:
2803 For each consecutive value of
\locvar{\ci} from $
0$ to $
63$, inclusive:
2806 Read an
8-bit unsigned integer as $
\bitvar{BMS
}[\locvar{\bmi}][\locvar{\ci}]$.
2810 For each consecutive value of
\locvar{\qti} from $
0$ to $
1$, inclusive:
2813 For each consecutive value of
\locvar{\pli} from $
0$ to $
2$, inclusive:
2816 If $
\locvar{\qti}>
0$ or $
\locvar{\pli}>
0$, read a
1-bit unsigned integer as \locvar{NEWQR}.
2819 Else, assign
\locvar{NEWQR
} the value one.
2821 If
\locvar{NEWQR
} is zero, then we are copying a previously defined set of quant ranges.
2826 If $
\locvar{\qti}>
0$, read a
1-bit unsigned integer as
\locvar{RPQR
}.
2828 Else, assign
\locvar{RPQR
} the value zero.
2830 If
\locvar{RPQR
} is one, assign
\locvar{\qtj} the value $(
\locvar{\qti}-
1)$
2831 and assign
\locvar{\plj} the value
\locvar{\pli}.
2832 This selects the set of quant ranges defined for the same
color plane as this
2833 one, but for the previous quantization type.
2835 Else assign
\locvar{\qtj} the value $(
3*
\locvar{\qti}+
\locvar{\pli}-
1)//
3$ and
2836 assign
\locvar{\plj} the value $(
\locvar{\pli}+
2)\%
3$.
2837 This selects the most recent set of quant ranges defined.
2839 Assign $
\bitvar{NQRS
}[\locvar{\qti}][\locvar{\pli}]$ the value
2840 $
\bitvar{NQRS
}[\locvar{\qtj}][\locvar{\plj}]$.
2842 Assign $
\bitvar{QRSIZES
}[\locvar{\qti}][\locvar{\pli}]$ the values in
2843 $
\bitvar{QRSIZES
}[\locvar{\qtj}][\locvar{\plj}]$.
2845 Assign $
\bitvar{QRBMIS
}[\locvar{\qti}][\locvar{\pli}]$ the values in
2846 $
\bitvar{QRBMIS
}[\locvar{\qtj}][\locvar{\plj}]$.
2849 Else,
\locvar{NEWQR
} is one, which indicates that we are defining a new set of quant ranges.
2854 Assign $
\locvar{\qri}$ the value zero.
2856 Assign $
\locvar{\qi}$ the value zero.
2858 Read an $
\ilog(
\bitvar{NBMS
}-
1)$-bit unsigned integer as\\
2859 $
\bitvar{QRBMIS
}[\locvar{\qti}][\locvar{\pli}][\locvar{\qri}]$.
2860 If this is greater than or equal to
\bitvar{NBMS
}, stop.
2861 The stream is undecodable.
2863 \label{step:qr-loop
}
2864 Read an $
\ilog(
62-
\locvar{\qi})$-bit unsigned integer.
2865 Assign\\ $
\bitvar{QRSIZES
}[\locvar{\qti}][\locvar{\pli}][\locvar{\qri}]$ the value read, plus one.
2868 Assign
\locvar{\qi} the value $
\locvar{\qi}+
2869 \bitvar{QRSIZES
}[\locvar{\qti}][\locvar{\pli}][\locvar{\qri}]$.
2871 Assign
\locvar{\qri} the value $
\locvar{\qri}+
1$.
2873 Read an $
\ilog(
\bitvar{NBMS
}-
1)$-bit unsigned integer as\\
2874 $
\bitvar{QRBMIS
}[\locvar{\qti}][\locvar{\pli}][\locvar{\qri}]$.
2876 If
\locvar{\qi} is less than
63, go back to step~
\ref{step:qr-loop
}.
2878 If
\locvar{\qi} is greater than
63, stop.
2879 The stream is undecodable.
2881 Assign $
\bitvar{NQRS
}[\locvar{\qti}][\locvar{\pli}]$ the value
\locvar{\qri}.
2887 \paragraph{VP3 Compatibility
}
2889 The quantization parameters are hardcoded in VP3.
2890 The values used are given in Appendix~
\ref{app:vp3-quant-params
}.
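The quantization parameter decode above can be sketched in Python as follows; this is a non-normative illustration.
The \texttt{BitReader} helper (MSB-first bit order) and the function names are hypothetical.
The sketch assumes that $\ilog(a)$ is the number of bits needed to represent $a$ (zero for $a\le 0$), that each range size is the value read plus one, and, as a defensive choice beyond the text, that every base matrix index is range-checked rather than only the first.

```python
class BitReader:
    """Hypothetical helper: reads unsigned integers MSB-first from bytes."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            if self.pos >= 8 * len(self.data):
                raise EOFError("ran out of bits")
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def ilog(a):
    """Assumed definition: bits needed to represent a, 0 if a <= 0."""
    return a.bit_length() if a > 0 else 0


def decode_quant_params(r):
    nbits = r.read(4) + 1
    acscale = [r.read(nbits) for _ in range(64)]
    nbits = r.read(4) + 1
    dcscale = [r.read(nbits) for _ in range(64)]
    nbms = r.read(9) + 1
    if nbms > 384:
        raise ValueError("NBMS exceeds 384")
    bms = [[r.read(8) for _ in range(64)] for _ in range(nbms)]

    def read_bmi():
        bmi = r.read(ilog(nbms - 1))
        if bmi >= nbms:
            raise ValueError("base matrix index out of range")
        return bmi

    nqrs = [[0] * 3 for _ in range(2)]
    qrsizes = [[[] for _ in range(3)] for _ in range(2)]
    qrbmis = [[[] for _ in range(3)] for _ in range(2)]
    for qti in range(2):
        for pli in range(3):
            newqr = r.read(1) if (qti > 0 or pli > 0) else 1
            if newqr == 0:
                rpqr = r.read(1) if qti > 0 else 0
                if rpqr == 1:
                    qtj, plj = qti - 1, pli  # same plane, previous type
                else:
                    # The most recently defined set of quant ranges.
                    qtj, plj = (3 * qti + pli - 1) // 3, (pli + 2) % 3
                nqrs[qti][pli] = nqrs[qtj][plj]
                qrsizes[qti][pli] = list(qrsizes[qtj][plj])
                qrbmis[qti][pli] = list(qrbmis[qtj][plj])
            else:
                qi = 0
                sizes, bmis = [], [read_bmi()]
                while qi < 63:
                    size = r.read(ilog(62 - qi)) + 1
                    sizes.append(size)
                    qi += size
                    bmis.append(read_bmi())
                if qi > 63:
                    raise ValueError("quant range sizes exceed 63")
                nqrs[qti][pli] = len(sizes)
                qrsizes[qti][pli] = sizes
                qrbmis[qti][pli] = bmis
    return {"ACSCALE": acscale, "DCSCALE": dcscale, "NBMS": nbms, "BMS": bms,
            "NQRS": nqrs, "QRSIZES": qrsizes, "QRBMIS": qrbmis}
```

Only the used entries of \bitvar{QRSIZES} and \bitvar{QRBMIS} are stored, so each inner list has $\bitvar{NQRS}[\qti][\pli]$ and $\bitvar{NQRS}[\qti][\pli]+1$ elements, respectively.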
2892 \subsection{Computing a Quantization Matrix
}
2893 \label{sub:quant-mat
}
2895 \paragraph{Input parameters:
}\hfill\\*
2896 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2897 \multicolumn{1}{c
}{Name
} &
2898 \multicolumn{1}{c
}{Type
} &
2899 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2900 \multicolumn{1}{c
}{Signed?
} &
2901 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2902 \bitvar{ACSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
2903 16 & No & A
64-element array of scale values for
2904 AC coefficients for each
\qi\ value. \\
2905 \bitvar{DCSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
2906 16 & No & A
64-element array of scale values for
2907 the DC coefficient for each
\qi\ value. \\
2908 \bitvar{BMS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
2909 8 & No & A $
\bitvar{NBMS
}\times 64$ array
2910 containing the base matrices. \\
2911 \bitvar{NQRS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
2912 6 & No & A $
2\times 3$ array containing the
2913 number of quant ranges for a given
\qti\ and
\pli, respectively.
2914 This is at most $
63$. \\
2915 \bitvar{QRSIZES
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
2916 6 & No & A $
2\times 3\times 63$ array of the
2917 sizes of each quant range for a given
\qti\ and
\pli, respectively.
2918 Only the first $
\bitvar{NQRS
}[\qti][\pli]$ values are used. \\
2919 \bitvar{QRBMIS
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
2920 9 & No & A $
2\times 3\times 64$ array of the
2921 \bmi's used for each quant range for a given
\qti\ and
\pli, respectively.
2922 Only the first $(
\bitvar{NQRS
}[\qti][\pli]+
1)$ values are used. \\
2923 \bitvar{\qti} & Integer &
1 & No & A quantization type index.
2924 See Table~
\ref{tab:quant-types
}.\\
2925 \bitvar{\pli} & Integer &
2 & No & A
color plane index.
2926 See Table~
\ref{tab:
color-planes
}.\\
2927 \bitvar{\qi} & Integer &
6 & No & The quantization index. \\
2928 \bottomrule\end{tabularx
}
2930 \paragraph{Output parameters:
}\hfill\\*
2931 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2932 \multicolumn{1}{c
}{Name
} &
2933 \multicolumn{1}{c
}{Type
} &
2934 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2935 \multicolumn{1}{c
}{Signed?
} &
2936 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2937 \bitvar{QMAT
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
2938 16 & No & A
64-element array of quantization
2939 values for each DCT coefficient in natural order. \\
2940 \bottomrule\end{tabularx
}
2942 \paragraph{Variables used:
}\hfill\\*
2943 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
2944 \multicolumn{1}{c
}{Name
} &
2945 \multicolumn{1}{c
}{Type
} &
2946 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
2947 \multicolumn{1}{c
}{Signed?
} &
2948 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
2949 \locvar{\ci} & Integer &
6 & No & The DCT coefficient index. \\
2950 \locvar{\bmi} & Integer &
9 & No & The base matrix index. \\
2951 \locvar{\bmj} & Integer &
9 & No & The base matrix index. \\
2952 \locvar{\qri} & Integer &
6 & No & The quant range index. \\
2953 \locvar{QISTART
} & Integer &
6 & No & The left end-point of the
\qi\ range. \\
2954 \locvar{QIEND
} & Integer &
6 & No & The right end-point of the
\qi\ range. \\
2955 \locvar{BM
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
2956 8 & No & A
64-element array containing the
2957 interpolated base matrix. \\
2958 \locvar{QMIN
} & Integer &
16 & No & The minimum quantization value allowed
2959 for the current coefficient. \\
2960 \locvar{QSCALE
} & Integer &
16 & No & The current scale value. \\
2961 \bottomrule\end{tabularx
}
2964 The following procedure can be used to generate a single quantization matrix
2965 for a given quantization type,
color plane, and
\qi\ value, given the
2966 quantization parameters decoded in Section~
\ref{sub:quant-params
}.
2968 Note that the product of the scale value and the base matrix value is in units
2969 of $
100$ths of a pixel value, and thus is divided by $
100$ to return it to
2970 units of a single pixel value.
2971 This value is then scaled by four, to match the scaling of the DCT output,
2972 which is also a factor of four larger than the orthonormal version of the
2977 Assign
\locvar{\qri} the index of a quant range such that
\begin{displaymath}
\bitvar{\qi} \ge \sum_{\qrj=0}^{\locvar{\qri}-1}
 \bitvar{QRSIZES}[\bitvar{\qti}][\bitvar{\pli}][\qrj],
\end{displaymath}
and
\begin{displaymath}
\bitvar{\qi} \le \sum_{\qrj=0}^{\locvar{\qri}}
 \bitvar{QRSIZES}[\bitvar{\qti}][\bitvar{\pli}][\qrj],
\end{displaymath}
2987 where summation from $
0$ to $-
1$ is defined to be zero.
2988 If there is more than one such value of $
\locvar{\qri}$, i.e., if
\bitvar{\qi}
2989 lies on the boundary between two quant ranges, then the output will be the
2990 same regardless of which one is chosen.
2992 Assign
\locvar{QISTART
} the value
\begin{displaymath}
\sum_{\qrj=0}^{\qri-1} \bitvar{QRSIZES}[\bitvar{\qti}][\bitvar{\pli}][\qrj].
\end{displaymath}
2997 Assign
\locvar{QIEND
} the value
\begin{displaymath}
\sum_{\qrj=0}^{\qri} \bitvar{QRSIZES}[\bitvar{\qti}][\bitvar{\pli}][\qrj].
\end{displaymath}
3002 Assign
\locvar{\bmi} the value
3003 $
\bitvar{QRBMIS
}[\bitvar{\qti}][\bitvar{\pli}][\qri]$.
3005 Assign
\locvar{\bmj} the value
3006 $
\bitvar{QRBMIS
}[\bitvar{\qti}][\bitvar{\pli}][\qri+
1]$.
3008 For each consecutive value of
\locvar{\ci} from $
0$ to $
63$, inclusive:
3011 Assign $
\locvar{BM
}[\locvar{\ci}]$ the value
\begin{align*}
(&2*(\locvar{QIEND}-\bitvar{\qi})*\bitvar{BMS}[\locvar{\bmi}][\locvar{\ci}]\\
 &+2*(\bitvar{\qi}-\locvar{QISTART})*\bitvar{BMS}[\locvar{\bmj}][\locvar{\ci}]\\
 &+\bitvar{QRSIZES}[\bitvar{\qti}][\bitvar{\pli}][\locvar{\qri}])//
 (2*\bitvar{QRSIZES}[\bitvar{\qti}][\bitvar{\pli}][\locvar{\qri}])
\end{align*}
3022 Assign
\locvar{QMIN
} the value given by Table~
\ref{tab:qmin
} according to
3023 \bitvar{\qti} and
\locvar{\ci}.
3027 \begin{tabular
}{clr
}\toprule
3028 Coefficient &
\multicolumn{1}{c
}{\bitvar{\qti}}
3029 &
\locvar{QMIN
} \\
\midrule
3030 $
\locvar{\ci}=
0$ & $
0$ (Intra) & $
16$ \\
3031 $
\locvar{\ci}>
0$ & $
0$ (Intra) & $
8$ \\
3032 $
\locvar{\ci}=
0$ & $
1$ (Inter) & $
32$ \\
3033 $
\locvar{\ci}>
0$ & $
1$ (Inter) & $
16$ \\
3034 \bottomrule\end{tabular
}
3036 \caption{Minimum Quantization Values
}
3041 If
\locvar{\ci} equals zero, assign $
\locvar{QSCALE
}$ the value
3042 $
\bitvar{DCSCALE
}[\bitvar{\qi}]$.
3044 Else, assign $
\locvar{QSCALE
}$ the value
3045 $
\bitvar{ACSCALE
}[\bitvar{\qi}]$.
3047 Assign $
\bitvar{QMAT
}[\locvar{\ci}]$ the value
\begin{displaymath}
\max(\locvar{QMIN},
 \min((\locvar{QSCALE}*\locvar{BM}[\locvar{\ci}]//100)*4,4096)).
\end{displaymath}
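The procedure of this section can be sketched in Python as follows; this is a non-normative illustration.
The function name \texttt{compute\_qmat} and the list-based layout of the inputs are assumptions.
The base matrix is interpolated linearly, with rounding, between the two endpoint matrices of the selected quant range, and \locvar{QMIN} is applied as a lower bound on the clamped result.

```python
def compute_qmat(acscale, dcscale, bms, nqrs, qrsizes, qrbmis, qti, pli, qi):
    """Return QMAT, a list of 64 quantization values in natural order."""
    sizes = qrsizes[qti][pli]
    # Find a quant range [QISTART, QIEND] that contains qi.
    qri, qistart, qiend = 0, 0, 0
    for qri in range(nqrs[qti][pli]):
        qiend = qistart + sizes[qri]
        if qi <= qiend:
            break
        qistart = qiend
    bmi = qrbmis[qti][pli][qri]
    bmj = qrbmis[qti][pli][qri + 1]
    size = sizes[qri]
    qmat = []
    for ci in range(64):
        # Linear interpolation between the endpoint matrices, rounded.
        bm = (2 * (qiend - qi) * bms[bmi][ci]
              + 2 * (qi - qistart) * bms[bmj][ci]
              + size) // (2 * size)
        # Minimum quantization values: intra DC/AC, then inter DC/AC.
        if qti == 0:
            qmin = 16 if ci == 0 else 8
        else:
            qmin = 32 if ci == 0 else 16
        qscale = dcscale[qi] if ci == 0 else acscale[qi]
        qmat.append(max(qmin, min((qscale * bm // 100) * 4, 4096)))
    return qmat
```

All operands are non-negative, so Python's floor division matches the integer division used in the formulas.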
3055 \subsection{DCT Token Huffman Tables
}
3056 \label{sub:huffman-tables
}
3058 \paragraph{Input parameters:
} None.
3060 \paragraph{Output parameters:
}\hfill\\*
3061 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3062 \multicolumn{1}{c
}{Name
} &
3063 \multicolumn{1}{c
}{Type
} &
3064 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3065 \multicolumn{1}{c
}{Signed?
} &
3066 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3067 \bitvar{HTS
} &
\multicolumn{3}{l
}{Huffman table array
}
3068 & An
80-element array of Huffman tables
3069 with up to
32 entries each. \\
3070 \bottomrule\end{tabularx
}
3072 \paragraph{Variables used:
}\hfill\\*
3073 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3074 \multicolumn{1}{c
}{Name
} &
3075 \multicolumn{1}{c
}{Type
} &
3076 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3077 \multicolumn{1}{c
}{Signed?
} &
3078 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3079 \locvar{HBITS
} & Bit string &
32 & No & A string of up to
32 bits. \\
3080 \locvar{TOKEN
} & Integer &
5 & No & A single DCT token value. \\
3081 \locvar{ISLEAF
} & Integer &
1 & No & Flag that indicates if the current
3082 node of the tree being decoded is a leaf node. \\
3083 \bottomrule\end{tabularx
}
3086 The Huffman tables used to decode DCT tokens are stored in the setup header in
3087 the form of a binary tree.
3088 This enforces the requirements that the code be full---so that any sequence of
3089 bits will produce a valid sequence of tokens---and that the code be
3090 prefix-free so that there is no ambiguity when decoding.
3092 One more restriction is placed on the tables that is not explicitly enforced by
3093 the bitstream syntax, but nevertheless must be obeyed by compliant encoders.
3094 There must be no more than
32 entries in a single table.
Note that this restriction, along with the fullness requirement, limits the
3096 maximum size of a single Huffman code to
32 bits.
3097 It is probably a good idea to enforce this latter consequence explicitly when
3098 implementing the decoding procedure as a recursive algorithm, so as to prevent
3099 a possible stack overflow given an invalid bitstream.
3101 Although there are
32 different DCT tokens, and thus a normal table will have
3102 exactly
32 entries, this is not explicitly required.
3103 It is allowable to use a Huffman code that omits some---but not all---of the
3104 possible token values.
3105 It is also allowable, if not particularly useful, to specify multiple codes for
3106 the same token value in a single table.
3107 Note also that token values may appear in the tree in any order.
3108 In particular, it is not safe to assume that token value zero (which ends a
3109 single block), has a Huffman code of all zeros.
3111 The tree is decoded as follows:
3115 For each consecutive value of
\locvar{\hti} from $
0$ to $
79$, inclusive:
3118 Set
\locvar{HBITS
} to the empty string.
3120 \label{step:huff-tree-loop
}
3121 If
\locvar{HBITS
} is longer than
32 bits, stop.
3122 The stream is undecodable.
3124 Read a
1-bit unsigned integer as
\locvar{ISLEAF
}.
3126 If
\locvar{ISLEAF
} is one:
3129 If the number of entries in table $
\bitvar{HTS
}[\locvar{\hti}]$ is already
32, stop.
3131 The stream is undecodable.
3133 Read a
5-bit unsigned integer as
\locvar{TOKEN
}.
3135 Add the pair $(
\locvar{HBITS
},
\locvar{TOKEN
})$ to Huffman table
3136 $
\bitvar{HTS
}[\locvar{\hti}]$.
3142 Add a `
0' to the end of
\locvar{HBITS
}.
3144 Decode the `
0' sub-tree using this procedure, starting from
3145 step~
\ref{step:huff-tree-loop
}.
3147 Remove the `
0' from the end of
\locvar{HBITS
} and add a `
1' to the end of \locvar{HBITS}.
3150 Decode the `
1' sub-tree using this procedure, starting from
3151 step~
\ref{step:huff-tree-loop
}.
3153 Remove the `
1' from the end of
\locvar{HBITS
}.
3158 \paragraph{VP3 Compatibility
}
3160 The DCT token Huffman tables are hardcoded in VP3.
3161 The values used are given in Appendix~
\ref{app:vp3-huffman-tables
}.
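The Huffman table decode above can be sketched recursively in Python as follows; this is a non-normative illustration.
The \texttt{BitReader} helper (MSB-first bit order) and the function names are hypothetical, and codes are kept as strings of `0' and `1' characters purely for readability.
Both limits from the text are enforced explicitly, which also bounds the recursion depth.

```python
class BitReader:
    """Hypothetical helper: reads unsigned integers MSB-first from bytes."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            if self.pos >= 8 * len(self.data):
                raise EOFError("ran out of bits")
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def decode_subtree(reader, table, hbits):
    """Decode the sub-tree whose code prefix is hbits into table."""
    if len(hbits) > 32:
        raise ValueError("Huffman code longer than 32 bits")
    if reader.read(1):  # ISLEAF
        if len(table) >= 32:
            raise ValueError("more than 32 entries in one table")
        table.append((hbits, reader.read(5)))  # (HBITS, TOKEN)
    else:
        decode_subtree(reader, table, hbits + "0")
        decode_subtree(reader, table, hbits + "1")


def decode_huffman_tables(reader):
    """Return HTS, a list of 80 tables of (code, token) pairs."""
    tables = []
    for _ in range(80):
        table = []
        decode_subtree(reader, table, "")
        tables.append(table)
    return tables
```

Passing the extended prefix to each recursive call takes the place of the explicit add and remove steps on \locvar{HBITS}.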
3163 \subsection{Setup Header Decode
}
3165 \paragraph{Input parameters:
} None.
3167 \paragraph{Output parameters:
}\hfill\\*
3168 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3169 \multicolumn{1}{c
}{Name
} &
3170 \multicolumn{1}{c
}{Type
} &
3171 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3172 \multicolumn{1}{c
}{Signed?
} &
3173 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3174 \bitvar{LFLIMS
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
3175 7 & No & A
64-element array of loop filter limit values. \\
3177 \bitvar{ACSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
3178 16 & No & A
64-element array of scale values for
3179 AC coefficients for each
\qi\ value. \\
3180 \bitvar{DCSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
3181 16 & No & A
64-element array of scale values for
3182 the DC coefficient for each
\qi\ value. \\
3183 \bitvar{NBMS
} & Integer &
10 & No & The number of base matrices. \\
3184 \bitvar{BMS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
3185 8 & No & A $
\bitvar{NBMS
}\times 64$ array
3186 containing the base matrices. \\
3187 \bitvar{NQRS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
3188 6 & No & A $
2\times 3$ array containing the
3189 number of quant ranges for a given
\qti\ and
\pli, respectively.
3190 This is at most $
63$. \\
3191 \bitvar{QRSIZES
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
3192 6 & No & A $
2\times 3\times 63$ array of the
3193 sizes of each quant range for a given
\qti\ and
\pli, respectively.
3194 Only the first $
\bitvar{NQRS
}[\qti][\pli]$ values will be used. \\
3195 \bitvar{QRBMIS
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
3196 9 & No & A $
2\times 3\times 64$ array of the
3197 \bmi's used for each quant range for a given
\qti\ and
\pli, respectively.
3198 Only the first $(
\bitvar{NQRS
}[\qti][\pli]+
1)$ values will be used. \\
3199 \bitvar{HTS
} &
\multicolumn{3}{l
}{Huffman table array
}
3200 & An
80-element array of Huffman tables
3201 with up to
32 entries each. \\
3202 \bottomrule\end{tabularx
}
3204 \paragraph{Variables used:
} None.
3207 The complete setup header is decoded as follows:
3211 Decode the common header fields according to the procedure described in
3212 Section~
\ref{sub:common-header
}.
3213 If
\bitvar{HEADERTYPE
} returned by this procedure is not
\hex{82}, then stop.
3214 This packet is not the setup header.
3216 Decode the loop filter limit value table using the procedure given in
3217 Section~
\ref{sub:loop-filter-limits
} into
\bitvar{LFLIMS
}.
3219 Decode the quantization parameters using the procedure given in
3220 Section~
\ref{sub:quant-params
}.
3221 The results are stored in
\bitvar{ACSCALE
},
\bitvar{DCSCALE
},
\bitvar{NBMS
},
3222 \bitvar{BMS
},
\bitvar{NQRS
},
\bitvar{QRSIZES
}, and
\bitvar{QRBMIS
}.
3224 Decode the DCT token Huffman tables using the procedure given in
3225 Section~
\ref{sub:huffman-tables
} into
\bitvar{HTS
}.
3228 \chapter{Frame Decode
}
3230 This section describes the complete procedure necessary to decode a single
3232 This begins with the frame header, followed by coded block flags, macro block
3233 modes, motion vectors, block-level
\qi\ values, and finally the DCT residual
3234 tokens, which are used to reconstruct the frame.
3236 \section{Frame Header Decode
}
3237 \label{sub:frame-header
}
3239 \paragraph{Input parameters:
} None.
3241 \paragraph{Output parameters:
}\hfill\\*
3242 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3243 \multicolumn{1}{c
}{Name
} &
3244 \multicolumn{1}{c
}{Type
} &
3245 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3246 \multicolumn{1}{c
}{Signed?
} &
3247 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3248 \bitvar{FTYPE
} & Integer &
1 & No & The frame type. \\
3249 \bitvar{NQIS
} & Integer &
2 & No & The number of
\qi\ values. \\
3250 \bitvar{QIS
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
3251 6 & No & An
\bitvar{NQIS
}-element array of \qi\ values. \\
3253 \bottomrule\end{tabularx
}
3255 \paragraph{Variables used:
}\hfill\\*
3256 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3257 \multicolumn{1}{c
}{Name
} &
3258 \multicolumn{1}{c
}{Type
} &
3259 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3260 \multicolumn{1}{c
}{Signed?
} &
3261 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3262 \locvar{MOREQIS
} & Integer &
1 & No & A flag indicating there are more
3263 \qi\ values to be decoded. \\
3264 \bottomrule\end{tabularx
}
3267 The frame header selects which type of frame is being decoded, intra or inter,
3268 and contains the list of
\qi\ values that will be used in this frame.
3269 The first
\qi\ value will be used for
{\em all
} DC coefficients in all blocks.
3270 This is done to ensure that DC prediction, which is done in the quantized
3271 domain, works as expected.
3272 The AC coefficients, however, can be dequantized using any
\qi\ value on the
3273 list, selected on a block-by-block basis.
3277 Read a
1-bit unsigned integer.
3278 If the value read is not zero, stop.
3279 This is not a data packet.
3281 Read a
1-bit unsigned integer as
\bitvar{FTYPE
}.
3282 This is the type of frame being decoded, as given in
3283 Table~
\ref{tab:frame-type
}.
3284 If this is the first frame being decoded, this MUST be zero.
3288 \begin{tabular
}{cl
}\toprule
3289 \bitvar{FTYPE
} & Frame Type \\
\midrule
3290 $
0$ & Intra frame \\
3291 $
1$ & Inter frame \\
3292 \bottomrule\end{tabular
}
3294 \caption{Frame Type Values
}
3295 \label{tab:frame-type
}
3299 Read in a
6-bit unsigned integer as $
\bitvar{QIS
}[0]$.
3301 Read a
1-bit unsigned integer as
\locvar{MOREQIS
}.
3303 If
\locvar{MOREQIS
} is zero, set
\bitvar{NQIS
} to
1.
3308 Read in a
6-bit unsigned integer as $
\bitvar{QIS
}[1]$.
3310 Read a
1-bit unsigned integer as
\locvar{MOREQIS
}.
3312 If
\locvar{MOREQIS
} is zero, set
\bitvar{NQIS
} to
2.
3317 Read in a
6-bit unsigned integer as $
\bitvar{QIS
}[2]$.
3319 Set
\bitvar{NQIS
} to
3.
3323 If
\bitvar{FTYPE
} is
0, read a
3-bit unsigned integer.
3324 These bits are reserved.
3325 If this value is not zero, stop.
3326 This frame is not decodable according to this specification.
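
The header decode steps above can be sketched in Python as follows.
This is a minimal, non-normative illustration: the helper \texttt{read\_uint}, the function \texttt{decode\_frame\_header}, and the representation of the packet as an iterable of 0/1 integers read most significant bit first are assumptions made for this example and are not defined by this specification.

```python
def read_uint(bits, n):
    """Read an n-bit unsigned integer, most significant bit first (assumed)."""
    value = 0
    for _ in range(n):
        value = (value << 1) | next(bits)
    return value


def decode_frame_header(bits):
    """Return (FTYPE, QIS) for a data packet, or raise ValueError."""
    bits = iter(bits)
    if read_uint(bits, 1) != 0:
        raise ValueError("not a data packet")
    ftype = read_uint(bits, 1)
    qis = [read_uint(bits, 6)]
    # Up to two more qi values follow, each announced by a MOREQIS flag.
    # No MOREQIS flag is read after the third value.
    while len(qis) < 3 and read_uint(bits, 1):
        qis.append(read_uint(bits, 6))
    # Intra frames carry three reserved bits that must be zero.
    if ftype == 0 and read_uint(bits, 3) != 0:
        raise ValueError("reserved bits set; frame not decodable")
    return ftype, qis
```

The length of the returned list plays the role of \bitvar{NQIS}.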
3329 \paragraph{VP3 Compatibility
}
The precise format of the frame header is substantially different in Theora than in VP3.
3333 The original VP3 format includes a larger number of unused, reserved bits that
3334 are required to be zero.
3335 The original VP3 frame header also can contain only a single
\qi\ value,
3336 because VP3 does not support block-level
\qi\ values and uses the same
3337 \qi\ value for all the coefficients in a frame.
3339 \section{Run-Length Encoded Bit Strings
}
3341 Two variations of run-length encoding are used to store sequences of bits for
3342 the block coded flags and the block-level
\qi\ values.
The procedures to decode these bit sequences are specified in the following two sections.
3346 \subsection{Long-Run Bit String Decode
}
3347 \label{sub:long-run
}
3349 \paragraph{Input parameters:
}\hfill\\*
3350 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3351 \multicolumn{1}{c
}{Name
} &
3352 \multicolumn{1}{c
}{Type
} &
3353 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3354 \multicolumn{1}{c
}{Signed?
} &
3355 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3356 \bitvar{NBITS
} & Integer &
36 & No & The number of bits to decode. \\
3357 \bottomrule\end{tabularx
}
3359 \paragraph{Output parameters:
}\hfill\\*
3360 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3361 \multicolumn{1}{c
}{Name
} &
3362 \multicolumn{1}{c
}{Type
} &
3363 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3364 \multicolumn{1}{c
}{Signed?
} &
3365 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3366 \bitvar{BITS
} & Bit string & & & The decoded bits. \\
3367 \bottomrule\end{tabularx
}
3369 \paragraph{Variables used:
}\hfill\\*
3370 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3371 \multicolumn{1}{c
}{Name
} &
3372 \multicolumn{1}{c
}{Type
} &
3373 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3374 \multicolumn{1}{c
}{Signed?
} &
3375 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3376 \locvar{LEN
} & Integer &
36 & No & The number of bits decoded so far. \\
\locvar{BIT} & Integer & 1 & No & The value associated with the current run. \\
3379 \locvar{RLEN
} & Integer &
13 & No & The length of the current run. \\
3380 \locvar{RBITS
} & Integer &
4 & No & The number of extra bits needed to
3381 decode the run length. \\
3382 \locvar{RSTART
} & Integer &
6 & No & The start of the possible run-length
3383 values for a given Huffman code. \\
\locvar{ROFFS} & Integer & 12 & No & The offset from \locvar{RSTART} of the run length. \\
3386 \bottomrule\end{tabularx
}
3389 There is no practical limit to the number of consecutive
0's and
1's that can
3390 be decoded with this procedure.
3391 In reality, the run length is limited by the number of blocks in a single
3392 frame, because more will never be requested.
3393 A separate procedure described in Section~
\ref{sub:short-run
} is used when
3394 there is a known limit on the maximum size of the runs.
3396 For the first run, a single bit value is read, and then a Huffman-coded
3397 representation of a run length is decoded, and that many copies of the bit
3398 value are appended to the bit string.
For each consecutive run, the value of the bit is toggled instead of being read from the bitstream.
3402 The only exception is if the length of the previous run was
4129, the maximum
3403 possible length encodable by the Huffman-coded representation.
3404 In this case another bit value is read from the stream, to allow for
3405 consecutive runs of
0's or
1's longer than this maximum.
3407 Note that in both cases---for the first run and after a run of length
4129---if
3408 no more bits are needed, then no bit value is read.
3410 The complete decoding procedure is as follows:
3414 Assign
\locvar{LEN
} the value
0.
3416 Assign
\bitvar{BITS
} the empty string.
If \locvar{LEN} equals \bitvar{NBITS}, return the completely decoded string \bitvar{BITS}.
3421 Read a
1-bit unsigned integer as
\locvar{BIT
}.
3423 \label{step:long-run-loop
}
3424 Read a bit at a time until one of the Huffman codes given in
3425 Table~
\ref{tab:long-run
} is recognized.
3429 \begin{tabular
}{lrrl
}\toprule
3430 Huffman Code &
\locvar{RSTART
} &
\locvar{RBITS
} & Run Lengths \\
\midrule
3431 \bin{0} & $
1$ & $
0$ & $
1$ \\
3432 \bin{10} & $
2$ & $
1$ & $
2\ldots 3$ \\
3433 \bin{110} & $
4$ & $
1$ & $
4\ldots 5$ \\
3434 \bin{1110} & $
6$ & $
2$ & $
6\ldots 9$ \\
3435 \bin{11110} & $
10$ & $
3$ & $
10\ldots 17$ \\
3436 \bin{111110} & $
18$ & $
4$ & $
18\ldots 33$ \\
3437 \bin{111111} & $
34$ & $
12$ & $
34\ldots 4129$ \\
3438 \bottomrule\end{tabular
}
3440 \caption{Huffman Codes for Long Run Lengths
}
3441 \label{tab:long-run
}
3445 Assign
\locvar{RSTART
} and
\locvar{RBITS
} the values given in
3446 Table~
\ref{tab:long-run
} according to the Huffman code read.
3448 Read an
\locvar{RBITS
}-bit unsigned integer as
\locvar{ROFFS
}.
3450 Assign
\locvar{RLEN
} the value $(
\locvar{RSTART
}+
\locvar{ROFFS
})$.
3452 Append
\locvar{RLEN
} copies of
\locvar{BIT
} to
\bitvar{BITS
}.
3454 Add
\locvar{RLEN
} to the value
\locvar{LEN
}.
3455 \locvar{LEN
} MUST be less than or equal to
\bitvar{NBITS
}.
If \locvar{LEN} equals \bitvar{NBITS}, return the completely decoded string \bitvar{BITS}.
3460 If
\locvar{RLEN
} equals
4129, read a
1-bit unsigned integer as
\locvar{BIT
}.
3462 Otherwise, assign
\locvar{BIT
} the value $(
1-
\locvar{BIT
})$.
3464 Continue decoding runs from step~
\ref{step:long-run-loop
}.
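
The long-run procedure above can be sketched in Python as follows.
This is an illustrative sketch rather than a normative implementation: the names \texttt{read\_uint}, \texttt{LONG\_RUN\_CODES}, and \texttt{decode\_long\_run}, and the modelling of the bitstream as an iterable of 0/1 integers read most significant bit first, are assumptions introduced for the example.
The table of \locvar{RSTART} and \locvar{RBITS} values is copied from Table~\ref{tab:long-run}.

```python
# (RSTART, RBITS) for the codes 0, 10, 110, 1110, 11110, 111110, 111111,
# indexed by the number of leading 1 bits in the code.
LONG_RUN_CODES = [(1, 0), (2, 1), (4, 1), (6, 2), (10, 3), (18, 4), (34, 12)]


def read_uint(bits, n):
    """Read an n-bit unsigned integer, most significant bit first (assumed)."""
    value = 0
    for _ in range(n):
        value = (value << 1) | next(bits)
    return value


def decode_long_run(bits, nbits):
    """Decode an nbits-long bit string from an iterable of 0/1 integers."""
    bits = iter(bits)
    out = []
    if nbits == 0:
        return out  # no bit value is read when no bits are needed
    bit = read_uint(bits, 1)
    while True:
        # Count leading 1 bits; the sixth 1 completes the code 111111.
        ones = 0
        while ones < 6 and next(bits) == 1:
            ones += 1
        rstart, rbits = LONG_RUN_CODES[ones]
        rlen = rstart + read_uint(bits, rbits)
        out.extend([bit] * rlen)
        if len(out) > nbits:
            raise ValueError("LEN exceeds NBITS")
        if len(out) == nbits:
            return out
        # After a maximal run the next bit value is read, otherwise toggled.
        bit = read_uint(bits, 1) if rlen == 4129 else 1 - bit
```

For instance, the input bits 1, 101, 0, 101 (a starting bit value followed by three run-length codes) decode to the seven-bit string 1110111.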
3467 \paragraph{VP3 Compatibility
}
3469 VP3 does not read a new bit value after decoding a run length of
4129.
3470 This limits the maximum number of consecutive
0's or
1's to
4129 in
3471 VP3-compatible streams.
For reasonable video sizes of $1920\times 1080$ or less in 4:2:0 format (the only pixel format VP3 supports), this does not pose any problems because runs longer than 4129 are not needed.
3476 \subsection{Short-Run Bit String Decode
}
3477 \label{sub:short-run
}
3479 \paragraph{Input parameters:
}\hfill\\*
3480 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3481 \multicolumn{1}{c
}{Name
} &
3482 \multicolumn{1}{c
}{Type
} &
3483 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3484 \multicolumn{1}{c
}{Signed?
} &
3485 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3486 \bitvar{NBITS
} & Integer &
36 & No & The number of bits to decode. \\
3487 \bottomrule\end{tabularx
}
3489 \paragraph{Output parameters:
}\hfill\\*
3490 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3491 \multicolumn{1}{c
}{Name
} &
3492 \multicolumn{1}{c
}{Type
} &
3493 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3494 \multicolumn{1}{c
}{Signed?
} &
3495 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3496 \bitvar{BITS
} & Bit string & & & The decoded bits. \\
3497 \bottomrule\end{tabularx
}
3499 \paragraph{Variables used:
}\hfill\\*
3500 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3501 \multicolumn{1}{c
}{Name
} &
3502 \multicolumn{1}{c
}{Type
} &
3503 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3504 \multicolumn{1}{c
}{Signed?
} &
3505 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3506 \locvar{LEN
} & Integer &
36 & No & The number of bits decoded so far. \\
\locvar{BIT} & Integer & 1 & No & The value associated with the current run. \\
3509 \locvar{RLEN
} & Integer &
13 & No & The length of the current run. \\
3510 \locvar{RBITS
} & Integer &
4 & No & The number of extra bits needed to
3511 decode the run length. \\
3512 \locvar{RSTART
} & Integer &
6 & No & The start of the possible run-length
3513 values for a given Huffman code. \\
\locvar{ROFFS} & Integer & 12 & No & The offset from \locvar{RSTART} of the run length. \\
3516 \bottomrule\end{tabularx
}
3519 This procedure is similar to the procedure outlined in
3520 Section~
\ref{sub:long-run
}, except that the maximum number of consecutive
0's
3521 or
1's is limited to
30.
This is the maximum run length needed when encoding a bit for each of the 16 blocks in a super block when it is known that not all the bits in a super block are the same.
3526 The complete decoding procedure is as follows:
3530 Assign
\locvar{LEN
} the value
0.
3532 Assign
\bitvar{BITS
} the empty string.
If \locvar{LEN} equals \bitvar{NBITS}, return the completely decoded string \bitvar{BITS}.
3537 Read a
1-bit unsigned integer as
\locvar{BIT
}.
3539 \label{step:short-run-loop
}
3540 Read a bit at a time until one of the Huffman codes given in
3541 Table~
\ref{tab:short-run
} is recognized.
3545 \begin{tabular
}{lrrl
}\toprule
3546 Huffman Code &
\locvar{RSTART
} &
\locvar{RBITS
} & Run Lengths \\
\midrule
3547 \bin{0} & $
1$ & $
1$ & $
1\ldots 2$ \\
3548 \bin{10} & $
3$ & $
1$ & $
3\ldots 4$ \\
3549 \bin{110} & $
5$ & $
1$ & $
5\ldots 6$ \\
3550 \bin{1110} & $
7$ & $
2$ & $
7\ldots 10$ \\
3551 \bin{11110} & $
11$ & $
2$ & $
11\ldots 14$ \\
3552 \bin{11111} & $
15$ & $
4$ & $
15\ldots 30$ \\
3553 \bottomrule\end{tabular
}
3555 \caption{Huffman Codes for Short Run Lengths
}
3556 \label{tab:short-run
}
3560 Assign
\locvar{RSTART
} and
\locvar{RBITS
} the values given in
3561 Table~
\ref{tab:short-run
} according to the Huffman code read.
3563 Read an
\locvar{RBITS
}-bit unsigned integer as
\locvar{ROFFS
}.
3565 Assign
\locvar{RLEN
} the value $(
\locvar{RSTART
}+
\locvar{ROFFS
})$.
3567 Append
\locvar{RLEN
} copies of
\locvar{BIT
} to
\bitvar{BITS
}.
3569 Add
\locvar{RLEN
} to the value
\locvar{LEN
}.
3570 \locvar{LEN
} MUST be less than or equal to
\bitvar{NBITS
}.
If \locvar{LEN} equals \bitvar{NBITS}, return the completely decoded string \bitvar{BITS}.
3575 Assign
\locvar{BIT
} the value $(
1-
\locvar{BIT
})$.
3577 Continue decoding runs from step~
\ref{step:short-run-loop
}.
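
The short-run procedure above can be sketched in Python in the same way.
As before, this is only an illustration: \texttt{read\_uint}, \texttt{SHORT\_RUN\_CODES}, and \texttt{decode\_short\_run} are hypothetical names, and the bitstream is assumed to be an iterable of 0/1 integers read most significant bit first.
The \locvar{RSTART} and \locvar{RBITS} values are copied from Table~\ref{tab:short-run}.

```python
# (RSTART, RBITS) for the codes 0, 10, 110, 1110, 11110, 11111,
# indexed by the number of leading 1 bits in the code.
SHORT_RUN_CODES = [(1, 1), (3, 1), (5, 1), (7, 2), (11, 2), (15, 4)]


def read_uint(bits, n):
    """Read an n-bit unsigned integer, most significant bit first (assumed)."""
    value = 0
    for _ in range(n):
        value = (value << 1) | next(bits)
    return value


def decode_short_run(bits, nbits):
    """Decode an nbits-long bit string from an iterable of 0/1 integers."""
    bits = iter(bits)
    out = []
    if nbits == 0:
        return out  # nothing to decode, so no bit value is read
    bit = read_uint(bits, 1)
    while True:
        # Count leading 1 bits; the fifth 1 completes the code 11111.
        ones = 0
        while ones < 5 and next(bits) == 1:
            ones += 1
        rstart, rbits = SHORT_RUN_CODES[ones]
        rlen = rstart + read_uint(bits, rbits)
        out.extend([bit] * rlen)
        if len(out) > nbits:
            raise ValueError("LEN exceeds NBITS")
        if len(out) == nbits:
            return out
        bit = 1 - bit  # runs always alternate; no run-length escape exists
```

Unlike the long-run sketch, the bit value is toggled unconditionally after each run, since the longest code already covers the maximum run of 30.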
3580 \section{Coded Block Flags Decode
}
3581 \label{sub:coded-blocks
}
3583 \paragraph{Input parameters:
}\hfill\\*
3584 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3585 \multicolumn{1}{c
}{Name
} &
3586 \multicolumn{1}{c
}{Type
} &
3587 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3588 \multicolumn{1}{c
}{Signed?
} &
3589 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3590 \bitvar{FTYPE
} & Integer &
1 & No & The frame type. \\
\bitvar{NSBS} & Integer & 32 & No & The total number of super blocks in a frame. \\
\bitvar{NBS} & Integer & 36 & No & The total number of blocks in a frame. \\
3595 \bottomrule\end{tabularx
}
3597 \paragraph{Output parameters:
}\hfill\\*
3598 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3599 \multicolumn{1}{c
}{Name
} &
3600 \multicolumn{1}{c
}{Type
} &
3601 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3602 \multicolumn{1}{c
}{Signed?
} &
3603 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3604 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
3605 1 & No & An
\bitvar{NBS
}-element array of flags
3606 indicating which blocks are coded. \\
3607 \bottomrule\end{tabularx
}
3609 \paragraph{Variables used:
}\hfill\\*
3610 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3611 \multicolumn{1}{c
}{Name
} &
3612 \multicolumn{1}{c
}{Type
} &
3613 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3614 \multicolumn{1}{c
}{Signed?
} &
3615 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3616 \locvar{NBITS
} & Integer &
36 & No & The length of a bit string to decode. \\
3617 \locvar{BITS
} & Bit string & & & A decoded set of flags. \\
3618 \locvar{SBPCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
3619 1 & No & An
\bitvar{NSBS
}-element array of flags
3620 indicating whether or not each super block is partially coded. \\
3621 \locvar{SBFCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
3622 1 & No & An
\bitvar{NSBS
}-element array of flags
3623 indicating whether or not each non-partially coded super block is fully
3625 \locvar{\sbi} & Integer &
32 & No & The index of the current super
3627 \locvar{\bi} & Integer &
36 & No & The index of the current block in coded
3629 \bottomrule\end{tabularx
}
3632 This procedure determines which blocks are coded in a given frame.
3633 In an intra frame, it marks all blocks coded.
3634 In an inter frame, however, any or all of the blocks may remain uncoded.
The output is a list of bit flags, one for each block, marking it coded or not coded.
3638 It is important to note that flags are still decoded for any blocks which lie
3639 entirely outside the picture region, even though they are not displayed.
3640 Encoders MAY choose to code such blocks.
3641 Decoders MUST faithfully reconstruct such blocks, because their contents can be
3642 used for predictors in future frames.
3643 Flags are
\textit{not
} decoded for portions of a super block which lie outside
3644 the full frame, as there are no blocks in those regions.
3646 The complete procedure is as follows:
3650 If
\bitvar{FTYPE
} is zero (intra frame):
3653 For each consecutive value of
\locvar{\bi} from
0 to $(
\locvar{NBS
}-
1)$, assign
3654 $
\bitvar{BCODED
}[\locvar{\bi}]$ the value one.
3657 Otherwise (inter frame):
3660 Assign
\locvar{NBITS
} the value
\bitvar{NSBS
}.
3662 Read an
\locvar{NBITS
}-bit bit string into
\locvar{BITS
}, using the procedure
3663 described in Section~
\ref{sub:long-run
}.
3664 This represents the list of partially coded super blocks.
3666 For each consecutive value of
\locvar{\sbi} from
0 to $(
\locvar{NSBS
}-
1)$,
3667 remove the bit at the head of the string
\locvar{BITS
} and assign it to
3668 $
\locvar{SBPCODED
}[\locvar{\sbi}]$.
3670 Assign
\locvar{NBITS
} the total number of super blocks such that \\
3671 $
\locvar{SBPCODED
}[\locvar{\sbi}]$ equals zero.
3673 Read an
\locvar{NBITS
}-bit bit string into
\locvar{BITS
}, using the procedure
3674 described in Section~
\ref{sub:long-run
}.
3675 This represents the list of fully coded super blocks.
3677 For each consecutive value of
\locvar{\sbi} from
0 to $(
\locvar{NSBS
}-
1)$ such
3678 that $
\locvar{SBPCODED
}[\locvar{\sbi}]$ equals zero, remove the bit at the
3679 head of the string
\locvar{BITS
} and assign it to
3680 $
\locvar{SBFCODED
}[\locvar{\sbi}]$.
3682 Assign
\locvar{NBITS
} the number of blocks contained in super blocks where
3683 $
\locvar{SBPCODED
}[\locvar{\sbi}]$ equals one.
3684 Note that this might
{\em not
} be equal to
16 times the number of partially
3685 coded super blocks, since super blocks which overlap the edge of the frame
3686 will have fewer than
16 blocks in them.
3688 Read an
\locvar{NBITS
}-bit bit string into
\locvar{BITS
}, using the procedure
3689 described in Section~
\ref{sub:short-run
}.
3691 For each block in coded order---indexed by
\locvar{\bi}:
Assign \locvar{\sbi} the index of the super block containing block \locvar{\bi}.
3697 If $
\locvar{SBPCODED
}[\locvar{\sbi}]$ is zero, assign
3698 $
\bitvar{BCODED
}[\locvar{\bi}]$ the value $
\locvar{SBFCODED
}[\locvar{\sbi}]$.
3700 Otherwise, remove the bit at the head of the string
\locvar{BITS
} and assign it
3701 to $
\bitvar{BCODED
}[\locvar{\bi}]$.
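
The way the three bit strings combine into \bitvar{BCODED} can be sketched in Python as follows.
This is a hedged illustration, not a reference implementation: the function name, the \texttt{sb\_of\_block} list (mapping each block index in coded order to the index of its super block), and the two reader callbacks (standing in for the procedures of Sections~\ref{sub:long-run} and~\ref{sub:short-run}) are all assumptions introduced for the example.

```python
def decode_coded_block_flags(ftype, nsbs, nbs, sb_of_block,
                             read_long_run, read_short_run):
    """Return BCODED as a list of nbs flags.

    read_long_run(n) and read_short_run(n) are assumed to return a list
    of n decoded bits each.
    """
    if ftype == 0:
        return [1] * nbs  # intra frame: every block is coded

    # Partially coded flags, one per super block.
    sbpcoded = read_long_run(nsbs)

    # Fully coded flags, only for super blocks that are not partially coded.
    full_bits = iter(read_long_run(sbpcoded.count(0)))
    # Entries for partially coded super blocks are unused placeholders.
    sbfcoded = [0 if partial else next(full_bits) for partial in sbpcoded]

    # One flag per block inside partially coded super blocks. This count is
    # taken from the block list, so edge super blocks with fewer than 16
    # blocks are handled correctly.
    npartial = sum(1 for bi in range(nbs) if sbpcoded[sb_of_block[bi]])
    block_bits = iter(read_short_run(npartial))

    bcoded = []
    for bi in range(nbs):
        sbi = sb_of_block[bi]
        if sbpcoded[sbi] == 0:
            bcoded.append(sbfcoded[sbi])
        else:
            bcoded.append(next(block_bits))
    return bcoded
```

Passing the readers as callbacks keeps the sketch independent of any particular bit reader.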
3706 \section{Macro Block Coding Modes
}
3707 \label{sub:mb-modes
}
3709 \paragraph{Input parameters:
}\hfill\\*
3710 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3711 \multicolumn{1}{c
}{Name
} &
3712 \multicolumn{1}{c
}{Type
} &
3713 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3714 \multicolumn{1}{c
}{Signed?
} &
3715 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3716 \bitvar{FTYPE
} & Integer &
1 & No & The frame type. \\
3717 \bitvar{NMBS
} & Integer &
32 & No & The total number of macro blocks in a
3719 \bitvar{NBS
} & Integer &
36 & No & The total number of blocks in a
3721 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
3722 1 & No & An
\bitvar{NBS
}-element array of flags
3723 indicating which blocks are coded. \\
3724 \bottomrule\end{tabularx
}
3726 \paragraph{Output parameters:
}\hfill\\*
3727 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3728 \multicolumn{1}{c
}{Name
} &
3729 \multicolumn{1}{c
}{Type
} &
3730 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3731 \multicolumn{1}{c
}{Signed?
} &
3732 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3733 \bitvar{MBMODES
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
3734 3 & No & An
\bitvar{NMBS
}-element array of coding
3735 modes for each macro block. \\
3736 \bottomrule\end{tabularx
}
3738 \paragraph{Variables used:
}\hfill\\*
3739 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3740 \multicolumn{1}{c
}{Name
} &
3741 \multicolumn{1}{c
}{Type
} &
3742 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3743 \multicolumn{1}{c
}{Signed?
} &
3744 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3745 \locvar{MSCHEME
} & Integer &
3 & No & The mode coding scheme. \\
3746 \locvar{MALPHABET
} &
\multicolumn{1}{p
{40pt
}}{Integer array
}
3747 &
3 & No & The list of modes corresponding to each
3749 \locvar{\mbi} & Integer &
32 & No & The index of the current macro
3751 \locvar{\bi} & Integer &
36 & No & The index of the current block in
3753 \locvar{\mi} & Integer &
32 & No & The index of a Huffman code from
3754 Table~
\ref{tab:mode-codes
}, starting from $
0$. \\
3755 \bottomrule\end{tabularx
}
In an intra frame, every macro block is marked as coded in INTRA mode.
3759 In an inter frame, however, a macro block can be coded in one of eight coding
3760 modes, given in Table~
\ref{tab:coding-modes
}.
3761 All of the blocks in all
color planes contained in a macro block will be
3762 assigned the coding mode of that macro block.
3766 \begin{tabular
}{cl
}\toprule
3767 Index & Coding Mode \\
\midrule
$0$ & INTER\_NOMV \\
$1$ & INTRA \\
$2$ & INTER\_MV \\
3771 $
3$ & INTER
\_MV\_LAST \\
3772 $
4$ & INTER
\_MV\_LAST2 \\
3773 $
5$ & INTER
\_GOLDEN\_NOMV \\
3774 $
6$ & INTER
\_GOLDEN\_MV \\
3775 $
7$ & INTER
\_MV\_FOUR \\
3776 \bottomrule\end{tabular
}
3778 \caption{Coding Modes
}
3779 \label{tab:coding-modes
}
3782 An important thing to note is that a coding mode is only stored in the
3783 bitstream for a macro block if it has at least one
{\em luma
} block coded.
3784 A macro block that contains coded blocks in the chroma planes, but not in the
3785 luma plane, MUST be coded in INTER
\_NOMV mode.
3786 Thus, no coding mode needs to be decoded for such a macro block.
3788 Coding modes are encoded using one of eight different schemes.
3789 Schemes
0 through
6 use the same simple Huffman code to represent the mode
3790 numbers, as given in Table~
\ref{tab:mode-codes
}.
3791 The difference in the schemes is the mode number assigned to each code.
Scheme 0 uses an assignment specified in the bitstream, while schemes 1--6 use a fixed assignment, also given in Table~\ref{tab:mode-codes}.
3794 Scheme
7 simply codes each mode directly in the bitstream using three bits.
3798 \begin{tabular
}{lcccccc
}\toprule
3799 Scheme & $
1$ & $
2$ & $
3$ & $
4$ & $
5$ & $
6$ \\
\cmidrule{2-
7}
3800 Huffman Code &
\multicolumn{6}{c
}{Coding Mode
} \\
\midrule
3801 \bin{0} & $
3$ & $
3$ & $
3$ & $
3$ & $
0$ & $
0$ \\
3802 \bin{10} & $
4$ & $
4$ & $
2$ & $
2$ & $
3$ & $
5$ \\
3803 \bin{110} & $
2$ & $
0$ & $
4$ & $
0$ & $
4$ & $
3$ \\
3804 \bin{1110} & $
0$ & $
2$ & $
0$ & $
4$ & $
2$ & $
4$ \\
3805 \bin{11110} & $
1$ & $
1$ & $
1$ & $
1$ & $
1$ & $
2$ \\
3806 \bin{111110} & $
5$ & $
5$ & $
5$ & $
5$ & $
5$ & $
1$ \\
3807 \bin{1111110} & $
6$ & $
6$ & $
6$ & $
6$ & $
6$ & $
6$ \\
3808 \bin{1111111} & $
7$ & $
7$ & $
7$ & $
7$ & $
7$ & $
7$ \\
3809 \bottomrule\end{tabular
}
\caption{Macro Block Mode Schemes}
3812 \label{tab:mode-codes
}
3817 If
\bitvar{FTYPE
} is
0 (intra frame):
3820 For each consecutive value of
\locvar{\mbi} from
0 to $(
\bitvar{NMBS
}-
1)$,
3821 inclusive, assign $
\bitvar{MBMODES
}[\mbi]$ the value
1 (INTRA).
3824 Otherwise (inter frame):
3827 Read a
3-bit unsigned integer as
\locvar{MSCHEME
}.
3829 If
\locvar{MSCHEME
} is
0:
3832 For each consecutive value of
\locvar{MODE
} from
0 to
7, inclusive:
3835 Read a
3-bit unsigned integer as
\locvar{\mi}.
3837 Assign $
\locvar{MALPHABET
}[\mi]$ the value
\locvar{MODE
}.
3841 Otherwise, if
\locvar{MSCHEME
} is not
7, assign the entries of
3842 \locvar{MALPHABET
} the values in the corresponding column of
3843 Table~
\ref{tab:mode-codes
}.
3845 For each consecutive macro block in coded order (cf.
3846 Section~
\ref{sec:mbs
})---indexed by
\locvar{\mbi}:
3849 If a block
\locvar{\bi} in the luma plane of macro block
\locvar{\mbi} exists
3850 such that $
\bitvar{BCODED
}[\locvar{\bi}]$ is
1:
3853 If
\locvar{MSCHEME
} is not
7, read one bit at a time until one of the Huffman
3854 codes in Table~
\ref{tab:mode-codes
} is recognized, and assign
3855 $
\bitvar{MBMODES
}[\locvar{\mbi}]$ the value
3856 $
\locvar{MALPHABET
}[\locvar{\mi}]$, where
\locvar{\mi} is the index of the
3857 Huffman code decoded.
Otherwise (\locvar{MSCHEME} is 7), read a 3-bit unsigned integer as $\bitvar{MBMODES}[\locvar{\mbi}]$.
Otherwise, if no luma-plane blocks in the macro block are coded, assign $\bitvar{MBMODES}[\locvar{\mbi}]$ the value 0 (INTER\_NOMV).
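
The mode decoding steps can be sketched in Python as follows.
This is a non-normative illustration under stated assumptions: the names \texttt{read\_uint}, \texttt{MODE\_SCHEMES}, and \texttt{decode\_mb\_modes} are hypothetical, the bitstream is modelled as an iterable of 0/1 integers read most significant bit first, and \texttt{has\_coded\_luma} is assumed to be a precomputed list saying, for each macro block in coded order, whether any of its luma blocks has \bitvar{BCODED} equal to 1.
The fixed alphabets are the columns of Table~\ref{tab:mode-codes}.

```python
# Fixed mode alphabets for schemes 1-6, indexed by Huffman code index.
MODE_SCHEMES = {
    1: [3, 4, 2, 0, 1, 5, 6, 7],
    2: [3, 4, 0, 2, 1, 5, 6, 7],
    3: [3, 2, 4, 0, 1, 5, 6, 7],
    4: [3, 2, 0, 4, 1, 5, 6, 7],
    5: [0, 3, 4, 2, 1, 5, 6, 7],
    6: [0, 5, 3, 4, 2, 1, 6, 7],
}


def read_uint(bits, n):
    """Read an n-bit unsigned integer, most significant bit first (assumed)."""
    value = 0
    for _ in range(n):
        value = (value << 1) | next(bits)
    return value


def decode_mb_modes(bits, ftype, has_coded_luma):
    """Return MBMODES, one coding mode per macro block."""
    bits = iter(bits)
    if ftype == 0:
        return [1] * len(has_coded_luma)  # intra frame: all INTRA
    mscheme = read_uint(bits, 3)
    malphabet = None
    if mscheme == 0:
        # The alphabet is transmitted: for each mode, its code index.
        malphabet = [0] * 8
        for mode in range(8):
            malphabet[read_uint(bits, 3)] = mode
    elif mscheme != 7:
        malphabet = MODE_SCHEMES[mscheme]
    mbmodes = []
    for coded in has_coded_luma:
        if not coded:
            mbmodes.append(0)  # INTER_NOMV, nothing is read
        elif mscheme != 7:
            # Codes are 0, 10, ..., 1111110, 1111111: count leading 1 bits.
            mi = 0
            while mi < 7 and next(bits) == 1:
                mi += 1
            mbmodes.append(malphabet[mi])
        else:
            mbmodes.append(read_uint(bits, 3))
    return mbmodes
```

With scheme 1, for example, the code \bin{0} selects mode 3 (INTER\_MV\_LAST) and the code \bin{110} selects mode 2.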
3868 \section{Motion Vectors
}
In an intra frame, no motion vectors are used, and so motion vector decoding is skipped.
3872 In an inter frame, however, many of the inter coding modes require a motion
3873 vector in order to specify an offset into the reference frame from which to
These procedures assign such a motion vector to every block.
3877 \subsection{Motion Vector Decode
}
3878 \label{sub:mv-decode
}
3880 \paragraph{Input parameters:
}\hfill\\*
3881 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3882 \multicolumn{1}{c
}{Name
} &
3883 \multicolumn{1}{c
}{Type
} &
3884 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3885 \multicolumn{1}{c
}{Signed?
} &
3886 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3887 \bitvar{MVMODE
} & Integer &
1 & No & The motion vector decoding method. \\
3888 \bottomrule\end{tabularx
}
3890 \paragraph{Output parameters:
}\hfill\\*
3891 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3892 \multicolumn{1}{c
}{Name
} &
3893 \multicolumn{1}{c
}{Type
} &
3894 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3895 \multicolumn{1}{c
}{Signed?
} &
3896 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
\bitvar{MVX} & Integer & 6 & Yes & The X component of the motion vector. \\
\bitvar{MVY} & Integer & 6 & Yes & The Y component of the motion vector. \\
3901 \bottomrule\end{tabularx
}
3903 \paragraph{Variables used:
}\hfill\\*
3904 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
3905 \multicolumn{1}{c
}{Name
} &
3906 \multicolumn{1}{c
}{Type
} &
3907 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
3908 \multicolumn{1}{c
}{Signed?
} &
3909 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
3910 \locvar{MVSIGN
} & Integer &
1 & No & The sign of the motion vector component
3912 \bottomrule\end{tabularx
}
The individual components of a motion vector can be coded using one of two methods.
The first uses a variable-length Huffman code, given in Table~\ref{tab:mv-huff-codes}.
The second encodes the magnitude of the component directly in 5 bits, and the sign in one bit.
Note that in this case there are two representations for the value zero.
For compatibility with VP3, a sign bit is read even if the magnitude read is zero.
One scheme is chosen and used for the entire frame.
3926 Each component can take on integer values from $-
31\ldots 31$, inclusive, at
3927 half-pixel resolution, i.e. $-
15.5\ldots 15.5$ pixels in the luma plane.
3928 For each subsampled axis in the chroma planes, the corresponding motion vector
3929 component is interpreted as being at quarter-pixel resolution, i.e.
3930 $-
7.75\ldots 7.75$ pixels.
3931 The precise details of how these vectors are used to compute predictors for
3932 each block are described in Section~
\ref{sec:predictors
}.
3936 \begin{tabular
}{lrlr
}\toprule
Huffman Code & Value & Huffman Code & Value \\
\midrule
\bin{000} & $0$ \\
3939 \bin{001} & $
1$ &
\bin{010} & $-
1$ \\
3940 \bin{0110} & $
2$ &
\bin{0111} & $-
2$ \\
3941 \bin{1000} & $
3$ &
\bin{1001} & $-
3$ \\
3942 \bin{101000} & $
4$ &
\bin{101001} & $-
4$ \\
3943 \bin{101010} & $
5$ &
\bin{101011} & $-
5$ \\
3944 \bin{101100} & $
6$ &
\bin{101101} & $-
6$ \\
3945 \bin{101110} & $
7$ &
\bin{101111} & $-
7$ \\
3946 \bin{1100000} & $
8$ &
\bin{1100001} & $-
8$ \\
3947 \bin{1100010} & $
9$ &
\bin{1100011} & $-
9$ \\
3948 \bin{1100100} & $
10$ &
\bin{1100101} & $-
10$ \\
3949 \bin{1100110} & $
11$ &
\bin{1100111} & $-
11$ \\
3950 \bin{1101000} & $
12$ &
\bin{1101001} & $-
12$ \\
3951 \bin{1101010} & $
13$ &
\bin{1101011} & $-
13$ \\
3952 \bin{1101100} & $
14$ &
\bin{1101101} & $-
14$ \\
3953 \bin{1101110} & $
15$ &
\bin{1101111} & $-
15$ \\
3954 \bin{11100000} & $
16$ &
\bin{11100001} & $-
16$ \\
3955 \bin{11100010} & $
17$ &
\bin{11100011} & $-
17$ \\
3956 \bin{11100100} & $
18$ &
\bin{11100101} & $-
18$ \\
3957 \bin{11100110} & $
19$ &
\bin{11100111} & $-
19$ \\
3958 \bin{11101000} & $
20$ &
\bin{11101001} & $-
20$ \\
3959 \bin{11101010} & $
21$ &
\bin{11101011} & $-
21$ \\
3960 \bin{11101100} & $
22$ &
\bin{11101101} & $-
22$ \\
3961 \bin{11101110} & $
23$ &
\bin{11101111} & $-
23$ \\
3962 \bin{11110000} & $
24$ &
\bin{11110001} & $-
24$ \\
3963 \bin{11110010} & $
25$ &
\bin{11110011} & $-
25$ \\
3964 \bin{11110100} & $
26$ &
\bin{11110101} & $-
26$ \\
3965 \bin{11110110} & $
27$ &
\bin{11110111} & $-
27$ \\
3966 \bin{11111000} & $
28$ &
\bin{11111001} & $-
28$ \\
3967 \bin{11111010} & $
29$ &
\bin{11111011} & $-
29$ \\
3968 \bin{11111100} & $
30$ &
\bin{11111101} & $-
30$ \\
3969 \bin{11111110} & $
31$ &
\bin{11111111} & $-
31$ \\
3970 \bottomrule\end{tabular
}
3972 \caption{Huffman Codes for Motion Vector Components
}
3973 \label{tab:mv-huff-codes
}
A single motion vector is decoded as follows:
3980 If
\bitvar{MVMODE
} is
0:
Read 1 bit at a time until one of the Huffman codes in Table~\ref{tab:mv-huff-codes} is recognized, and assign the value to \bitvar{MVX}.
Read 1 bit at a time until one of the Huffman codes in Table~\ref{tab:mv-huff-codes} is recognized, and assign the value to \bitvar{MVY}.
Otherwise (\bitvar{MVMODE} is 1):
3995 Read a
5-bit unsigned integer as
\bitvar{MVX
}.
3997 Read a
1-bit unsigned integer as
\locvar{MVSIGN
}.
3999 If
\locvar{MVSIGN
} is
1, assign
\bitvar{MVX
} the value $-
\bitvar{MVX
}$.
4001 Read a
5-bit unsigned integer as
\bitvar{MVY
}.
4003 Read a
1-bit unsigned integer as
\locvar{MVSIGN
}.
4005 If
\locvar{MVSIGN
} is
1, assign
\bitvar{MVY
} the value $-
\bitvar{MVY
}$.
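
Both motion vector coding methods can be sketched in Python as follows.
This is an illustration under stated assumptions, not a normative decoder: the function names are hypothetical, the bitstream is modelled as an iterable of 0/1 integers read most significant bit first, and the value 0 is assumed to use \bin{000}, the one codeword left unused by the other entries of Table~\ref{tab:mv-huff-codes}.
The sketch exploits the regular structure of the table (a 3-bit prefix, optional magnitude bits, and a trailing sign bit) instead of storing all 63 codes.

```python
def read_uint(bits, n):
    """Read an n-bit unsigned integer, most significant bit first (assumed)."""
    value = 0
    for _ in range(n):
        value = (value << 1) | next(bits)
    return value


def decode_mv_component_huffman(bits):
    """Decode one component using the variable-length code."""
    prefix = read_uint(bits, 3)
    if prefix == 0b000:
        return 0  # assumed codeword for zero
    if prefix == 0b001:
        return 1
    if prefix == 0b010:
        return -1
    if prefix == 0b011:
        return -2 if next(bits) else 2
    if prefix == 0b100:
        return -3 if next(bits) else 3
    # Prefixes 101, 110, 111 carry 2, 3, 4 magnitude bits, then a sign bit.
    extra = prefix - 3
    magnitude = (1 << extra) + read_uint(bits, extra)
    return -magnitude if next(bits) else magnitude


def decode_mv_component_direct(bits):
    """Decode one component as a 5-bit magnitude followed by a sign bit."""
    magnitude = read_uint(bits, 5)
    sign = read_uint(bits, 1)  # read even when the magnitude is zero
    return -magnitude if sign else magnitude


def decode_mv(bits, mvmode):
    """Return (MVX, MVY) using the method selected by MVMODE."""
    bits = iter(bits)
    decode = (decode_mv_component_huffman if mvmode == 0
              else decode_mv_component_direct)
    mvx = decode(bits)
    mvy = decode(bits)
    return mvx, mvy
```

In the direct method, a zero magnitude with the sign bit set still decodes to zero, which is the second representation of zero mentioned above.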
4009 \subsection{Macro Block Motion Vector Decode
}
4010 \label{sub:mb-mv-decode
}
4012 \paragraph{Input parameters:
}\hfill\\*
4013 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4014 \multicolumn{1}{c
}{Name
} &
4015 \multicolumn{1}{c
}{Type
} &
4016 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4017 \multicolumn{1}{c
}{Signed?
} &
4018 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4019 \bitvar{PF
} & Integer &
2 & No & The pixel format. \\
4020 \bitvar{NMBS
} & Integer &
32 & No & The total number of macro blocks in a
4022 \bitvar{MBMODES
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4023 3 & No & An
\bitvar{NMBS
}-element array of coding
4024 modes for each macro block. \\
4025 \bitvar{NBS
} & Integer &
36 & No & The total number of blocks in a
4027 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4028 1 & No & An
\bitvar{NBS
}-element array of flags
4029 indicating which blocks are coded. \\
4030 \bottomrule\end{tabularx
}
4032 \paragraph{Output parameters:
}\hfill\\*
4033 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4034 \multicolumn{1}{c
}{Name
} &
4035 \multicolumn{1}{c
}{Type
} &
4036 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4037 \multicolumn{1}{c
}{Signed?
} &
4038 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4039 \bitvar{MVECTS
} &
\multicolumn{1}{p
{50pt
}}{Array of
2D Integer Vectors
} &
4040 6 & Yes & An
\bitvar{NBS
}-element array of
4041 motion vectors for each block. \\
4042 \bottomrule\end{tabularx
}
4044 \paragraph{Variables used:
}\hfill\\*
4045 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4046 \multicolumn{1}{c
}{Name
} &
4047 \multicolumn{1}{c
}{Type
} &
4048 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4049 \multicolumn{1}{c
}{Signed?
} &
4050 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4051 \locvar{LAST1
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Vector
} &
4052 6 & Yes & The last motion vector. \\
4053 \locvar{LAST2
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Vector
} &
4054 6 & Yes & The second to last motion vector. \\
4055 \locvar{MVX
} & Integer &
6 & Yes & The X component of a motion vector. \\
4056 \locvar{MVY
} & Integer &
6 & Yes & The Y component of a motion vector. \\
4057 \locvar{\mbi} & Integer &
32 & No & The index of the current macro
4059 \locvar{A
} & Integer &
36 & No & The index of the lower-left luma block
4060 in the macro block. \\
4061 \locvar{B
} & Integer &
36 & No & The index of the lower-right luma
4062 block in the macro block. \\
4063 \locvar{C
} & Integer &
36 & No & The index of the upper-left luma block
4064 in the macro block. \\
4065 \locvar{D
} & Integer &
36 & No & The index of the upper-right luma
4066 block in the macro block. \\
4067 \locvar{E
} & Integer &
36 & No & The index of a chroma block in the
4068 macro block, depending on the pixel format. \\
4069 \locvar{F
} & Integer &
36 & No & The index of a chroma block in the
4070 macro block, depending on the pixel format. \\
4071 \locvar{G
} & Integer &
36 & No & The index of a chroma block in the
4072 macro block, depending on the pixel format. \\
4073 \locvar{H
} & Integer &
36 & No & The index of a chroma block in the
4074 macro block, depending on the pixel format. \\
4075 \locvar{I
} & Integer &
36 & No & The index of a chroma block in the
4076 macro block, depending on the pixel format. \\
4077 \locvar{J
} & Integer &
36 & No & The index of a chroma block in the
4078 macro block, depending on the pixel format. \\
4079 \locvar{K
} & Integer &
36 & No & The index of a chroma block in the
4080 macro block, depending on the pixel format. \\
4081 \locvar{L
} & Integer &
36 & No & The index of a chroma block in the
4082 macro block, depending on the pixel format. \\
4083 \bottomrule\end{tabularx
}
4086 Motion vectors are stored for each macro block.
In every mode except for INTER\_MV\_FOUR, every block in all the color planes is assigned the same motion vector.
4089 In INTER
\_MV\_FOUR mode, all four blocks in the luma plane are assigned their
4090 own motion vector, and motion vectors for blocks in the chroma planes are
4091 computed from these, using averaging appropriate to the pixel format.
4093 For INTER
\_MV and INTER
\_GOLDEN\_MV modes, a single motion vector is decoded
4094 and applied to each block.
For INTER\_MV\_FOUR macro blocks, a motion vector is decoded for each coded luma block.
4097 Uncoded luma blocks receive the default $(
0,
0)$ vector for the purposes of
4098 computing the chroma motion vectors.
4100 None of the remaining macro block coding modes require decoding motion vectors
4102 INTRA mode does not use a motion-compensated predictor, and so requires no
4103 motion vector, and INTER
\_NOMV and INTER
\_GOLDEN\_NOMV modes use the default
4104 vector $(
0,
0)$ for each block.
4105 This also includes all macro blocks with no coded luma blocks, as they are
4106 coded in INTER
\_NOMV mode by definition.
4108 The modes INTER
\_MV\_LAST and INTER
\_MV\_LAST2 use the motion vector from the
4109 last macro block (in coded order) and the second to last macro block,
4110 respectively, that contained a motion vector pointing to the previous frame.
4111 Thus no explicit motion vector needs to be decoded for these modes.
4112 Macro blocks coded in INTRA mode or one of the GOLDEN modes are not considered
4114 If an insufficient number of macro blocks have been coded in one of the INTER
4115 modes, then the $(
0,
0)$ vector is used instead.
4116 For macro blocks coded in INTER
\_MV\_FOUR mode, the vector from the upper-right
4117 luma block is used, even if the upper-right block is not coded.
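
The bookkeeping for the last and second to last vectors can be sketched in C as
 follows.
This is only an illustration under stated assumptions, not part of the
 normative text: the names \texttt{mv\_t}, \texttt{mv\_history\_t}, and
 \texttt{resolve\_mv} are hypothetical, and the caller is assumed to have
 already decoded a vector from the stream for the modes that carry one.
For INTER\_MV\_FOUR the caller passes the upper-right luma vector, or
 $(0,0)$ if that block is not coded.

```c
#include <assert.h>

/* Mode numbers as used in this section. */
enum {
  MODE_INTER_NOMV = 0,
  MODE_INTRA = 1,
  MODE_INTER_MV = 2,
  MODE_INTER_MV_LAST = 3,
  MODE_INTER_MV_LAST2 = 4,
  MODE_INTER_GOLDEN_NOMV = 5,
  MODE_INTER_GOLDEN_MV = 6,
  MODE_INTER_MV_FOUR = 7
};

typedef struct { int x, y; } mv_t;                /* hypothetical vector type */
typedef struct { mv_t last1, last2; } mv_history_t; /* both start at (0,0) */

/* Returns the vector tracked for this macro block and updates the history.
   'decoded' is only used by modes 2, 6, and 7. */
mv_t resolve_mv(mv_history_t *h, int mode, mv_t decoded) {
  mv_t zero = {0, 0};
  mv_t mv;
  assert(h != 0 && mode >= 0 && mode <= 7);
  switch (mode) {
  case MODE_INTER_MV:
  case MODE_INTER_MV_FOUR:
    /* A new previous-frame vector pushes the history down by one. */
    mv = decoded;
    h->last2 = h->last1;
    h->last1 = mv;
    return mv;
  case MODE_INTER_MV_LAST:
    /* Reusing the most recent vector leaves the history unchanged. */
    return h->last1;
  case MODE_INTER_MV_LAST2:
    /* Reusing the second most recent vector swaps the two entries. */
    mv = h->last2;
    h->last2 = h->last1;
    h->last1 = mv;
    return mv;
  case MODE_INTER_GOLDEN_MV:
    /* Golden-frame vectors never enter the history. */
    return decoded;
  default:
    /* INTER_NOMV, INTRA, INTER_GOLDEN_NOMV use the default vector. */
    return zero;
  }
}
```

Note that INTER\_MV\_LAST2 exchanges the two history entries, so two
 consecutive INTER\_MV\_LAST2 macro blocks alternate between the same two
 vectors.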
4119 The motion vectors are decoded from the stream as follows:
4123 Assign
\locvar{LAST1
} and
\locvar{LAST2
} both the value $(
0,
0)$.
Read a 1-bit unsigned integer as \locvar{MVMODE}.
Note that this value is read even if no macro blocks require a motion vector to
 be decoded.
4129 For each consecutive value of
\locvar{\mbi} from
0 to $(
\bitvar{NMBS
}-
1)$:
4132 If $
\bitvar{MBMODES
}[\locvar{\mbi}]$ is
7 (INTER
\_MV\_FOUR):
Let \locvar{A}, \locvar{B}, \locvar{C}, and \locvar{D} be the indices in coded
 order \locvar{\bi} of the luma blocks in macro block \locvar{\mbi}, arranged
 into raster order.
Thus, \locvar{A} is the index in coded order of the block in the lower left,
 \locvar{B} the lower right, \locvar{C} the upper left, and \locvar{D} the
 upper right. % TODO: as shown in Figure~REF.
4142 If $
\bitvar{BCODED
}[\locvar{A
}]$ is non-zero, decode a single motion vector
4143 into
\locvar{MVX
} and
\locvar{MVY
} using the procedure described in
4144 Section~
\ref{sub:mv-decode
}.
4146 Otherwise, assign
\locvar{MVX
} and
\locvar{MVY
} both the value zero.
4148 Assign $
\bitvar{MVECTS
}[\locvar{A
}]$ the value $(
\locvar{MVX
},
\locvar{MVY
})$.
4150 If $
\bitvar{BCODED
}[\locvar{B
}]$ is non-zero, decode a single motion vector
4151 into
\locvar{MVX
} and
\locvar{MVY
} using the procedure described in
4152 Section~
\ref{sub:mv-decode
}.
4154 Otherwise, assign
\locvar{MVX
} and
\locvar{MVY
} both the value zero.
4156 Assign $
\bitvar{MVECTS
}[\locvar{B
}]$ the value $(
\locvar{MVX
},
\locvar{MVY
})$.
4158 If $
\bitvar{BCODED
}[\locvar{C
}]$ is non-zero, decode a single motion vector
4159 into
\locvar{MVX
} and
\locvar{MVY
} using the procedure described in
4160 Section~
\ref{sub:mv-decode
}.
4162 Otherwise, assign
\locvar{MVX
} and
\locvar{MVY
} both the value zero.
4164 Assign $
\bitvar{MVECTS
}[\locvar{C
}]$ the value $(
\locvar{MVX
},
\locvar{MVY
})$.
4166 If $
\bitvar{BCODED
}[\locvar{D
}]$ is non-zero, decode a single motion vector
4167 into
\locvar{MVX
} and
\locvar{MVY
} using the procedure described in
4168 Section~
\ref{sub:mv-decode
}.
4170 Otherwise, assign
\locvar{MVX
} and
\locvar{MVY
} both the value zero.
4172 Assign $
\bitvar{MVECTS
}[\locvar{D
}]$ the value $(
\locvar{MVX
},
\locvar{MVY
})$.
4173 Note that
\locvar{MVX
} and
\locvar{MVY
} retain this last value.
4175 If
\bitvar{PF
} is
0 (
4:
2:
0):
4178 Let
\locvar{E
} and
\locvar{F
} be the index in coded order of the one block in
4179 the macro block from the $C_b$ and $C_r$ planes, respectively.
Assign $\bitvar{MVECTS}[\locvar{E}]$ and $\bitvar{MVECTS}[\locvar{F}]$ the
 value
\begin{multline*}
(\round\biggl(\frac{\begin{aligned}
 \bitvar{MVECTS}[\locvar{A}]_x+\bitvar{MVECTS}[\locvar{B}]_x+\\
 \bitvar{MVECTS}[\locvar{C}]_x+\bitvar{MVECTS}[\locvar{D}]_x
 \end{aligned}}{4}\biggr), \\
 \round\biggl(\frac{\begin{aligned}
 \bitvar{MVECTS}[\locvar{A}]_y+\bitvar{MVECTS}[\locvar{B}]_y+\\
 \bitvar{MVECTS}[\locvar{C}]_y+\bitvar{MVECTS}[\locvar{D}]_y
 \end{aligned}}{4}\biggr))
\end{multline*}
4195 If
\bitvar{PF
} is
2 (
4:
2:
2):
Let \locvar{E} and \locvar{F} be the indices in coded order of the bottom and
 top blocks in the macro block from the $C_b$ plane, respectively, and
 \locvar{G} and \locvar{H} be the indices in coded order of the bottom and top
 blocks in the $C_r$ plane, respectively. %TODO: as shown in Figure~REF.
Assign $\bitvar{MVECTS}[\locvar{E}]$ and $\bitvar{MVECTS}[\locvar{G}]$ the
 value
\begin{multline*}
(\round\left(\frac{
 \bitvar{MVECTS}[\locvar{A}]_x+\bitvar{MVECTS}[\locvar{B}]_x}{2}\right), \\
 \round\left(\frac{
 \bitvar{MVECTS}[\locvar{A}]_y+\bitvar{MVECTS}[\locvar{B}]_y}{2}\right))
\end{multline*}
Assign $\bitvar{MVECTS}[\locvar{F}]$ and $\bitvar{MVECTS}[\locvar{H}]$ the
 value
\begin{multline*}
(\round\left(\frac{
 \bitvar{MVECTS}[\locvar{C}]_x+\bitvar{MVECTS}[\locvar{D}]_x}{2}\right), \\
 \round\left(\frac{
 \bitvar{MVECTS}[\locvar{C}]_y+\bitvar{MVECTS}[\locvar{D}]_y}{2}\right))
\end{multline*}
4222 If
\bitvar{PF
} is
3 (
4:
4:
4):
4225 Let
\locvar{E
},
\locvar{F
},
\locvar{G
}, and
\locvar{H
} be the indices
4226 \locvar{\bi} in coded order of the $C_b$ plane blocks in macro block
4227 \locvar{\mbi}, arranged into raster order, and
\locvar{I
},
\locvar{J
},
4228 \locvar{K
}, and
\locvar{L
} be the indices
\locvar{\bi} in coded order of the
4229 $C_r$ plane blocks in macro block
\locvar{\mbi}, arranged into raster order.
4230 %TODO: as shown in Figure~REF.
4232 Assign $
\bitvar{MVECTS
}[\locvar{E
}]$ and $
\bitvar{MVECTS
}[\locvar{I
}]$ the
4233 value \\ $
\bitvar{MVECTS
}[\locvar{A
}]$.
4235 Assign $
\bitvar{MVECTS
}[\locvar{F
}]$ and $
\bitvar{MVECTS
}[\locvar{J
}]$ the
4236 value \\ $
\bitvar{MVECTS
}[\locvar{B
}]$.
4238 Assign $
\bitvar{MVECTS
}[\locvar{G
}]$ and $
\bitvar{MVECTS
}[\locvar{K
}]$ the
4239 value \\ $
\bitvar{MVECTS
}[\locvar{C
}]$.
4241 Assign $
\bitvar{MVECTS
}[\locvar{H
}]$ and $
\bitvar{MVECTS
}[\locvar{L
}]$ the
4242 value \\ $
\bitvar{MVECTS
}[\locvar{D
}]$.
4245 Assign
\locvar{LAST2
} the value
\locvar{LAST1
}.
4247 Assign
\locvar{LAST1
} the value $(
\locvar{MVX
},
\locvar{MVY
})$.
4250 Otherwise, if $
\bitvar{MBMODES
}[\locvar{\mbi}]$ is
6 (INTER
\_GOLDEN\_MV),
4251 decode a single motion vector into
\locvar{MVX
} and
\locvar{MVY
} using the
4252 procedure described in Section~
\ref{sub:mv-decode
}.
4254 Otherwise, if $
\bitvar{MBMODES
}[\locvar{\mbi}]$ is
4 (INTER
\_MV\_LAST2):
4257 Assign $(
\locvar{MVX
},
\locvar{MVY
})$ the value
\locvar{LAST2
}.
4259 Assign
\locvar{LAST2
} the value
\locvar{LAST1
}.
4261 Assign
\locvar{LAST1
} the value $(
\locvar{MVX
},
\locvar{MVY
})$.
4264 Otherwise, if $
\bitvar{MBMODES
}[\locvar{\mbi}]$ is
3 (INTER
\_MV\_LAST), assign
4265 $(
\locvar{MVX
},
\locvar{MVY
})$ the value
\locvar{LAST1
}.
4267 Otherwise, if $
\bitvar{MBMODES
}[\locvar{\mbi}]$ is
2 (INTER
\_MV):
4270 Decode a single motion vector into
\locvar{MVX
} and
\locvar{MVY
} using the
4271 procedure described in Section~
\ref{sub:mv-decode
}.
4273 Assign
\locvar{LAST2
} the value
\locvar{LAST1
}.
4275 Assign
\locvar{LAST1
} the value $(
\locvar{MVX
},
\locvar{MVY
})$.
Otherwise ($\bitvar{MBMODES}[\locvar{\mbi}]$ is 5:~INTER\_GOLDEN\_NOMV,
 1:~INTRA, or 0:~INTER\_NOMV), assign \locvar{MVX} and \locvar{MVY} the value
 zero.
4282 If $
\bitvar{MBMODES
}[\locvar{\mbi}]$ is not
7 (not INTER
\_MV\_FOUR), then for
4283 each coded block
\locvar{\bi} in macro block
\locvar{\mbi}:
4286 Assign $
\bitvar{MVECTS
}[\locvar{\bi}]$ the value $(
\locvar{MVX
},
\locvar{MVY
})$.
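
The chroma averaging used for INTER\_MV\_FOUR macro blocks in the subsampled
 pixel formats can be sketched in C as follows.
This is an illustration only.
It assumes that $\round$ rounds to the nearest integer with ties rounded away
 from zero, and the names \texttt{div\_round}, \texttt{chroma\_mv\_420}, and
 \texttt{chroma\_mv\_422} are hypothetical.

```c
#include <assert.h>

typedef struct { int x, y; } mv_t; /* hypothetical vector type */

/* Divide sum by den (den > 0), rounding to nearest.
   ASSUMPTION: ties are rounded away from zero. */
int div_round(int sum, int den) {
  assert(den > 0);
  if (sum >= 0) return (2 * sum + den) / (2 * den);
  return -((-2 * sum + den) / (2 * den));
}

/* 4:2:0: one chroma vector per plane, averaged over all four luma
   vectors, given in the order A, B, C, D. */
mv_t chroma_mv_420(const mv_t luma[4]) {
  mv_t out;
  out.x = div_round(luma[0].x + luma[1].x + luma[2].x + luma[3].x, 4);
  out.y = div_round(luma[0].y + luma[1].y + luma[2].y + luma[3].y, 4);
  return out;
}

/* 4:2:2: one chroma vector per row of luma blocks, averaged over the
   left and right luma vectors of that row (A and B, or C and D). */
mv_t chroma_mv_422(mv_t left, mv_t right) {
  mv_t out;
  out.x = div_round(left.x + right.x, 2);
  out.y = div_round(left.y + right.y, 2);
  return out;
}
```

In the 4:4:4 format no averaging is needed, since each chroma block simply
 copies the vector of the co-located luma block.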
4291 \paragraph{VP3 Compatibility
}
Unless all four luma blocks in the macro block are coded, the VP3 encoder does
 not select mode INTER\_MV\_FOUR.
Theora removes this restriction by treating the motion vector for an uncoded
 luma block as the default $(0,0)$ vector.
This is consistent with the premise that the block has not changed since the
 previous frame and that chroma information can be largely ignored when
 estimating motion.

No modification is required for INTER\_MV\_FOUR macro blocks in VP3 streams to
 be decoded correctly by a Theora decoder.
However, regardless of how many of the luma blocks are actually coded, the VP3
 decoder always reads four motion vectors from the stream for INTER\_MV\_FOUR
 mode.
The motion vectors read are used to calculate the motion vectors for the chroma
 blocks, but are otherwise ignored.
Thus, care should be taken when creating Theora streams meant to be backwards
 compatible with VP3 to only use INTER\_MV\_FOUR mode when all four luma
 blocks are coded.
4312 \section{Block-Level
\qi\ Decode
}
4313 \label{sub:block-qis
}
4315 \paragraph{Input parameters:
}\hfill\\*
4316 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4317 \multicolumn{1}{c
}{Name
} &
4318 \multicolumn{1}{c
}{Type
} &
4319 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4320 \multicolumn{1}{c
}{Signed?
} &
4321 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
\bitvar{NBS} & Integer & 36 & No & The total number of blocks in a
 frame. \\
4324 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4325 1 & No & An
\bitvar{NBS
}-element array of flags
4326 indicating which blocks are coded. \\
4327 \bitvar{NQIS
} & Integer &
2 & No & The number of
\qi\ values. \\
4328 \bottomrule\end{tabularx
}
4330 \paragraph{Output parameters:
}\hfill\\*
4331 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4332 \multicolumn{1}{c
}{Name
} &
4333 \multicolumn{1}{c
}{Type
} &
4334 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4335 \multicolumn{1}{c
}{Signed?
} &
4336 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4337 \bitvar{QIIS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4338 2 & No & An
\bitvar{NBS
}-element array of
4339 \locvar{\qii} values for each block. \\
4340 \bottomrule\end{tabularx
}
4342 \paragraph{Variables used:
}\hfill\\*
4343 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4344 \multicolumn{1}{c
}{Name
} &
4345 \multicolumn{1}{c
}{Type
} &
4346 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4347 \multicolumn{1}{c
}{Signed?
} &
4348 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4349 \locvar{NBITS
} & Integer &
36 & No & The length of a bit string to decode. \\
4350 \locvar{BITS
} & Bit string & & & A decoded set of flags. \\
\locvar{\bi} & Integer & 36 & No & The index of the current block in
 coded order. \\
\locvar{\qii} & Integer & 2 & No & The index of a \qi\ value in the list of
 \qi\ values defined for this frame. \\
4355 \bottomrule\end{tabularx
}
4358 This procedure selects the
\qi\ value to be used for dequantizing the AC
4359 coefficients of each block.
4360 DC coefficients all use the same
\qi\ value, so as to avoid interference with
4361 the DC prediction mechanism, which occurs in the quantized domain.
4363 The value is actually represented by an index
\locvar{\qii} into the list of
4364 \qi\ values defined for the frame.
4365 The decoder makes multiple passes through the list of coded blocks, one for
4366 each
\qi\ value except the last one.
4367 In each pass, an RLE-coded bitmask is decoded to divide the blocks into two
4368 groups: those that use the current
\qi\ value in the list, and those that use
4369 a value from later in the list.
4370 Each subsequent pass is restricted to the blocks in the second group.
4374 For each value of
\locvar{\bi} from
0 to $(
\bitvar{NBS
}-
1)$, assign
4375 $
\bitvar{QIIS
}[\locvar{\bi}]$ the value zero.
4377 For each consecutive value of
\locvar{\qii} from
0 to $(
\bitvar{NQIS
}-
2)$:
Assign \locvar{NBITS} the number of blocks \locvar{\bi} such that
 $\bitvar{BCODED}[\locvar{\bi}]$ is non-zero and $\bitvar{QIIS}[\locvar{\bi}]$
 equals $\locvar{\qii}$.
4384 Read an
\locvar{NBITS
}-bit bit string into
\locvar{BITS
}, using the procedure
4385 described in Section~
\ref{sub:long-run
}.
4386 This represents the list of blocks that use
\qi\ value
\locvar{\qii} or higher.
4388 For each consecutive value of
\locvar{\bi} from
0 to $(
\bitvar{NBS
}-
1)$ such
4389 that $
\bitvar{BCODED
}[\locvar{\bi}]$ is non-zero and
4390 $
\bitvar{QIIS
}[\locvar{\bi}]$ equals $
\locvar{\qii}$:
4393 Remove the bit at the head of the string
\locvar{BITS
} and add its value to
4394 $
\bitvar{QIIS
}[\locvar{\bi}]$.
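
The multi-pass structure of this procedure can be sketched in C as follows.
This is an illustration under stated assumptions: \texttt{bit\_queue\_t} and
 \texttt{next\_bit} are hypothetical stand-ins for the bit strings produced by
 the procedure in Section~\ref{sub:long-run}, with the strings for successive
 passes simply concatenated, and \texttt{decode\_qiis} is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source of already-decoded flag bits (one bit per entry). */
typedef struct {
  const unsigned char *bits;
  size_t len;
  size_t pos;
} bit_queue_t;

/* Remove and return the bit at the head of the queue. */
int next_bit(bit_queue_t *q) {
  assert(q->pos < q->len);
  return q->bits[q->pos++] & 1;
}

/* Fill qiis[0..nbs-1] with the qi index chosen for each block. */
void decode_qiis(const unsigned char *bcoded, unsigned char *qiis,
                 size_t nbs, int nqis, bit_queue_t *q) {
  size_t bi;
  int qii;
  assert(nqis >= 1);
  for (bi = 0; bi < nbs; bi++) qiis[bi] = 0;
  /* One pass per qi value except the last one. */
  for (qii = 0; qii < nqis - 1; qii++) {
    for (bi = 0; bi < nbs; bi++) {
      /* Only coded blocks still at the current index consume a bit;
         a set bit defers the block to a later qi value. */
      if (bcoded[bi] && qiis[bi] == qii) {
        qiis[bi] = (unsigned char)(qiis[bi] + next_bit(q));
      }
    }
  }
}
```

When \texttt{nqis} is 1 the outer loop body never runs, so no bits are
 consumed and every block keeps index zero.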
4399 \paragraph{VP3 Compatibility
}
For VP3 compatible streams, only one \qi\ value can be specified in the frame
 header, so the main loop of the above procedure, which would iterate from $0$
 to $-1$, is never executed.
Thus, no bits are read, and each block uses the one \qi\ value defined for the
 frame.
4409 \section{DCT Coefficients
}
4410 \label{sec:dct-decode
}
The quantized DCT coefficients are decoded by making 64 passes through the list
 of coded blocks, one for each token index in zig-zag order.
For the DC tokens, two Huffman tables are chosen from among the first 16, one
 for the luma plane and one for the chroma planes.
The AC tokens, however, are divided into four different groups.
Again, two 4-bit indices are decoded, one for the luma plane, and one for the
 chroma planes, but these select the codebooks for {\em all four} groups.
AC coefficients in group one use codebooks $16\ldots 31$, while group two uses
 $32\ldots 47$, etc.
Note that this second set of indices is decoded even if there are no non-zero
 AC coefficients in the frame.
4424 Tokens are divided into two major types: EOB tokens, which fill the remainder
4425 of one or more blocks with zeros, and coefficient tokens, which fill in one or
4426 more coefficients within a single block.
4427 A decoding procedure for the first is given in Section~
\ref{sub:eob-token
}, and
4428 for the second in Section~
\ref{sub:coeff-token
}.
4429 The decoding procedure for the complete set of quantized coefficients is given
4430 in Section~
\ref{sub:dct-coeffs
}.
4432 \subsection{EOB Token Decode
}
4433 \label{sub:eob-token
}
4435 \paragraph{Input parameters:
}\hfill\\*
4436 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4437 \multicolumn{1}{c
}{Name
} &
4438 \multicolumn{1}{c
}{Type
} &
4439 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4440 \multicolumn{1}{c
}{Signed?
} &
4441 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4442 \bitvar{TOKEN
} & Integer &
5 & No & The token being decoded.
4443 This must be in the range $
0\ldots 6$. \\
\bitvar{NBS} & Integer & 36 & No & The total number of blocks in a
 frame. \\
4446 \bitvar{TIS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4447 7 & No & An
\bitvar{NBS
}-element array of the
4448 current token index for each block. \\
4449 \bitvar{NCOEFFS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4450 7 & No & An
\bitvar{NBS
}-element array of the
4451 coefficient count for each block. \\
4452 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
4453 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
4454 quantized DCT coefficient values for each block in zig-zag order. \\
\bitvar{\bi} & Integer & 36 & No & The index of the current block in
 coded order. \\
4457 \bitvar{\ti} & Integer &
6 & No & The current token index. \\
4458 \bottomrule\end{tabularx
}
4460 \paragraph{Output parameters:
}\hfill\\*
4461 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4462 \multicolumn{1}{c
}{Name
} &
4463 \multicolumn{1}{c
}{Type
} &
4464 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4465 \multicolumn{1}{c
}{Signed?
} &
4466 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4467 \bitvar{TIS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4468 7 & No & An
\bitvar{NBS
}-element array of the
4469 current token index for each block. \\
4470 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
4471 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
4472 quantized DCT coefficient values for each block in zig-zag order. \\
\bitvar{EOBS} & Integer & 36 & No & The remaining length of the current
 EOB run. \\
4475 \bottomrule\end{tabularx
}
4477 \paragraph{Variables used:
}\hfill\\*
4478 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4479 \multicolumn{1}{c
}{Name
} &
4480 \multicolumn{1}{c
}{Type
} &
4481 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4482 \multicolumn{1}{c
}{Signed?
} &
4483 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
\locvar{\bj} & Integer & 36 & No & Another index of a block in coded
 order. \\
4486 \locvar{\tj} & Integer &
6 & No & Another token index. \\
4487 \bottomrule\end{tabularx
}
A summary of the EOB tokens is given in Table~\ref{tab:eob-tokens}.
An important thing to note is that token 6 does not add an offset to the
 decoded run value, even though in general it should only be used for runs of
 size 32 or longer.
If a value of zero is decoded for this run, it is treated as an EOB run the
 size of the remaining coded blocks.
\begin{table}[htbp]
\begin{center}
\begin{tabular}{ccl}\toprule
Token Value & Extra Bits & EOB Run Lengths \\\midrule
$0$ & $0$ & $1$ \\
$1$ & $0$ & $2$ \\
$2$ & $0$ & $3$ \\
$3$ & $2$ & $4\ldots 7$ \\
$4$ & $3$ & $8\ldots 15$ \\
$5$ & $4$ & $16\ldots 31$ \\
$6$ & $12$ & $1\ldots 4095$, or all remaining blocks \\
\bottomrule\end{tabular}
\end{center}
\caption{EOB Token Summary}
\label{tab:eob-tokens}
\end{table}
There is no restriction that one EOB token cannot be immediately followed by
 another, so no special cases are necessary to extend the range of the maximum
 run length as were required in Section~\ref{sub:long-run}.
Indeed, depending on the lengths of the Huffman codes, it may even be cheaper
 to encode, by way of example, an EOB run of length 31 followed by an EOB run
 of length 1 than to encode an EOB run of length 32 directly.
There is also no restriction that an EOB run stop at the end of a color plane.
The run MUST, however, end at or before the end of the frame.
4526 If
\bitvar{TOKEN
} is
0, assign
\bitvar{EOBS
} the value
1.
4528 Otherwise, if
\bitvar{TOKEN
} is
1, assign
\bitvar{EOBS
} the value
2.
4530 Otherwise, if
\bitvar{TOKEN
} is
2, assign
\bitvar{EOBS
} the value
3.
4532 Otherwise, if
\bitvar{TOKEN
} is
3:
4535 Read a
2-bit unsigned integer as
\bitvar{EOBS
}.
4537 Assign
\bitvar{EOBS
} the value $(
\bitvar{EOBS
}+
4)$.
4540 Otherwise, if
\bitvar{TOKEN
} is
4:
4543 Read a
3-bit unsigned integer as
\bitvar{EOBS
}.
4545 Assign
\bitvar{EOBS
} the value $(
\bitvar{EOBS
}+
8)$.
4548 Otherwise, if
\bitvar{TOKEN
} is
5:
4551 Read a
4-bit unsigned integer as
\bitvar{EOBS
}.
4553 Assign
\bitvar{EOBS
} the value $(
\bitvar{EOBS
}+
16)$.
4556 Otherwise,
\bitvar{TOKEN
} is
6:
4559 Read a
12-bit unsigned integer as
\bitvar{EOBS
}.
4561 If
\bitvar{EOBS
} is zero, assign
\bitvar{EOBS
} to be the number of coded blocks
4562 \locvar{\bj} such that $
\bitvar{TIS
}[\locvar{\bj}]$ is less than
64.
4565 For each value of
\locvar{\tj} from $
\bitvar{\ti}$ to
63, assign
4566 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
4568 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4570 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value
64.
4572 Assign
\bitvar{EOBS
} the value $(
\bitvar{EOBS
}-
1)$.
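
The mapping from an EOB token and its extra bits to a run length can be
 sketched in C as follows.
This is an illustration only: the names \texttt{eob\_extra\_bits} and
 \texttt{eob\_run\_length} are hypothetical, the caller is assumed to have
 already read the extra bits as an unsigned integer, and
 \texttt{remaining} is assumed to hold the number of coded blocks whose token
 index is still less than 64.

```c
#include <assert.h>

/* Number of extra bits and run-length offset for EOB tokens 0..6.
   Token 6 adds no offset to its 12-bit value. */
static const int EOB_EXTRA_BITS[7] = {0, 0, 0, 2, 3, 4, 12};
static const int EOB_BASE[7] = {1, 2, 3, 4, 8, 16, 0};

/* How many extra bits follow the given EOB token. */
int eob_extra_bits(int token) {
  assert(token >= 0 && token <= 6);
  return EOB_EXTRA_BITS[token];
}

/* Run length (the initial value of EOBS) for the given token.
   'extra' is the value of the extra bits, 0 if there are none. */
long eob_run_length(int token, unsigned extra, long remaining) {
  assert(token >= 0 && token <= 6);
  assert(extra < (1u << EOB_EXTRA_BITS[token]));
  if (token == 6 && extra == 0) {
    /* A zero-length run means "all remaining coded blocks". */
    return remaining;
  }
  return EOB_BASE[token] + (long)extra;
}
```

The returned value counts the current block as well, which is why the
 procedure decrements \bitvar{EOBS} by one after terminating the current
 block.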
4575 \paragraph{VP3 Compatibility
}
4577 The VP3 encoder does not use the special interpretation of a zero-length EOB
4578 run, though its decoder
{\em does
} support it.
4579 That may be due more to a happy accident in the way the decoder was written
4580 than intentional design, however, and other VP3 implementations might not
4581 reproduce it faithfully.
4582 For backwards compatibility, it may be wise to avoid it, especially as for most
4583 frame sizes there are fewer than
4095 blocks, making it unnecessary.
4585 \subsection{Coefficient Token Decode
}
4586 \label{sub:coeff-token
}
4588 \paragraph{Input parameters:
}\hfill\\*
4589 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4590 \multicolumn{1}{c
}{Name
} &
4591 \multicolumn{1}{c
}{Type
} &
4592 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4593 \multicolumn{1}{c
}{Signed?
} &
4594 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4595 \bitvar{TOKEN
} & Integer &
5 & No & The token being decoded.
4596 This must be in the range $
7\ldots 31$. \\
\bitvar{NBS} & Integer & 36 & No & The total number of blocks in a
 frame. \\
4599 \bitvar{TIS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4600 7 & No & An
\bitvar{NBS
}-element array of the
4601 current token index for each block. \\
4602 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
4603 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
4604 quantized DCT coefficient values for each block in zig-zag order. \\
\bitvar{\bi} & Integer & 36 & No & The index of the current block in
 coded order. \\
4607 \bitvar{\ti} & Integer &
6 & No & The current token index. \\
4608 \bottomrule\end{tabularx
}
4610 \paragraph{Output parameters:
}\hfill\\*
4611 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4612 \multicolumn{1}{c
}{Name
} &
4613 \multicolumn{1}{c
}{Type
} &
4614 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4615 \multicolumn{1}{c
}{Signed?
} &
4616 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4617 \bitvar{TIS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4618 7 & No & An
\bitvar{NBS
}-element array of the
4619 current token index for each block. \\
4620 \bitvar{NCOEFFS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
4621 7 & No & An
\bitvar{NBS
}-element array of the
4622 coefficient count for each block. \\
4623 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
4624 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
4625 quantized DCT coefficient values for each block in zig-zag order. \\
4626 \bottomrule\end{tabularx
}
4628 \paragraph{Variables used:
}\hfill\\*
4629 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
4630 \multicolumn{1}{c
}{Name
} &
4631 \multicolumn{1}{c
}{Type
} &
4632 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
4633 \multicolumn{1}{c
}{Signed?
} &
4634 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
4635 \locvar{SIGN
} & Integer &
1 & No & A flag indicating the sign of the
4636 current coefficient. \\
\locvar{MAG} & Integer & 10 & No & The magnitude of the current
 coefficient. \\
4639 \locvar{RLEN
} & Integer &
6 & No & The length of the current zero run. \\
4640 \locvar{\tj} & Integer &
6 & No & Another token index. \\
4641 \bottomrule\end{tabularx
}
Each of these tokens decodes one or more coefficients in the current block.
A summary of the meanings of the token values is presented in
 Table~\ref{tab:coeff-tokens}.
There are often several different ways to tokenize a given coefficient list.
Which one is optimal depends on the exact lengths of the Huffman codes used to
 represent each token.
Note that we do not update the coefficient count for the block if we decode a
 pure zero run.
\begin{table}[htbp]
\begin{center}
\begin{tabularx}{\textwidth}{cclX}\toprule
Token Value & Extra Bits & \multicolumn{1}{p{55pt}}{Number of Coefficients}
 & Description \\\midrule
$7$ & $3$ & $1\ldots 8$ & Short zero run. \\
$8$ & $6$ & $1\ldots 64$ & Zero run. \\
$9$ & $0$ & $1$ & $1$. \\
$10$ & $0$ & $1$ & $-1$. \\
$11$ & $0$ & $1$ & $2$. \\
$12$ & $0$ & $1$ & $-2$. \\
$13$ & $1$ & $1$ & $\pm 3$. \\
$14$ & $1$ & $1$ & $\pm 4$. \\
$15$ & $1$ & $1$ & $\pm 5$. \\
$16$ & $1$ & $1$ & $\pm 6$. \\
$17$ & $2$ & $1$ & $\pm 7\ldots 8$. \\
$18$ & $3$ & $1$ & $\pm 9\ldots 12$. \\
$19$ & $4$ & $1$ & $\pm 13\ldots 20$. \\
$20$ & $5$ & $1$ & $\pm 21\ldots 36$. \\
$21$ & $6$ & $1$ & $\pm 37\ldots 68$. \\
$22$ & $10$ & $1$ & $\pm 69\ldots 580$. \\
$23$ & $1$ & $2$ & One zero followed by $\pm 1$. \\
$24$ & $1$ & $3$ & Two zeros followed by $\pm 1$. \\
$25$ & $1$ & $4$ & Three zeros followed by
 $\pm 1$. \\
$26$ & $1$ & $5$ & Four zeros followed by
 $\pm 1$. \\
$27$ & $1$ & $6$ & Five zeros followed by
 $\pm 1$. \\
$28$ & $3$ & $7\ldots 10$ & $6\ldots 9$ zeros followed by
 $\pm 1$. \\
$29$ & $4$ & $11\ldots 18$ & $10\ldots 17$ zeros followed by
 $\pm 1$. \\
$30$ & $2$ & $2$ & One zero followed by
 $\pm 2\ldots 3$. \\
$31$ & $3$ & $3\ldots 4$ & $2\ldots 3$ zeros followed by
 $\pm 2\ldots 3$. \\
\bottomrule\end{tabularx}
\end{center}
\caption{Coefficient Token Summary}
\label{tab:coeff-tokens}
\end{table}
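
The single-coefficient tokens 9 through 22 differ only in how the sign is
 obtained and in the offset and width of the magnitude, so they lend themselves
 to a table-driven decoder.
The following C sketch illustrates this.
The names \texttt{coeff\_mag\_bits} and \texttt{coeff\_value} are hypothetical,
 and the caller is assumed to have already read the sign bit and the magnitude
 bits, in that order, for the tokens that carry them.

```c
#include <assert.h>

/* Magnitude offset and number of magnitude bits for tokens 13..22.
   Each of these tokens also carries one sign bit. */
static const int MAG_BASE[10] = {3, 4, 5, 6, 7, 9, 13, 21, 37, 69};
static const int MAG_BITS[10] = {0, 0, 0, 0, 1, 2, 3, 4, 5, 9};

/* Number of magnitude bits that follow the sign bit (0 for tokens 9..16). */
int coeff_mag_bits(int token) {
  assert(token >= 9 && token <= 22);
  return token < 13 ? 0 : MAG_BITS[token - 13];
}

/* Coefficient value for a single-coefficient token.
   'sign' and 'mag_extra' are ignored for tokens 9..12, whose value is
   fixed; 'mag_extra' must be 0 for tokens 13..16. */
int coeff_value(int token, int sign, unsigned mag_extra) {
  int mag;
  assert(token >= 9 && token <= 22);
  if (token <= 12) {
    /* Tokens 9, 10, 11, 12 encode 1, -1, 2, -2 with no extra bits. */
    mag = 1 + (token - 9) / 2;
    return ((token - 9) % 2) ? -mag : mag;
  }
  assert(mag_extra < (1u << MAG_BITS[token - 13]));
  mag = MAG_BASE[token - 13] + (int)mag_extra;
  return sign ? -mag : mag;
}
```

After storing the value, the token index for the block advances by one and the
 coefficient count is set to the new token index, exactly as for every other
 token that produces a non-zero coefficient.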
4696 For tokens which represent more than one coefficient, they MUST NOT bring the
4697 total number of coefficients in the block to more than
64.
4698 Care should be taken in a decoder to check for this, as otherwise it may permit
4699 buffer overflows from invalidly formed packets.
4701 {\bf Note:
} One way to achieve this efficiently is to combine the inverse
4702 zig-zag mapping (described later in Section~
\ref{sub:dequant
}) with
4703 coefficient decode, and use a table look-up to map zig-zag indices greater
4704 than
63 to a safe location.
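
As an alternative to the table look-up mentioned in the note, a decoder can
 reject an over-long run with an explicit bounds check.
The following C sketch shows this simpler approach for a pure zero run.
It is an illustration only, and the name \texttt{apply\_zero\_run} and its
 error convention are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Write 'rlen' zeros into coeffs[ti .. ti+rlen-1] for one block.
   Returns the new token index, or -1 without writing anything if the
   run would bring the block past 64 coefficients. */
int apply_zero_run(int16_t coeffs[64], int ti, int rlen) {
  int tj;
  assert(ti >= 0 && ti < 64 && rlen >= 1);
  if (rlen > 64 - ti) {
    /* Invalid packet: refuse rather than overflow the buffer. */
    return -1;
  }
  for (tj = ti; tj < ti + rlen; tj++) {
    coeffs[tj] = 0;
  }
  return ti + rlen;
}
```

The same check applies to the tokens that combine a zero run with a non-zero
 value, using the total number of coefficients the token produces.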
4709 If
\bitvar{TOKEN
} is
7:
4712 Read in a
3-bit unsigned integer as
\locvar{RLEN
}.
4714 Assign
\locvar{RLEN
} the value $(
\locvar{RLEN
}+
1)$.
4716 For each value of
\locvar{\tj} from
\bitvar{\ti} to
4717 $(
\bitvar{\ti}+
\locvar{RLEN
}-
1)$, assign
4718 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
4720 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value
4721 $
\bitvar{TIS
}[\bitvar{\bi}]+
\locvar{RLEN
}$.
4724 Otherwise, if
\bitvar{TOKEN
} is
8:
4727 Read in a
6-bit unsigned integer as
\locvar{RLEN
}.
4729 Assign
\locvar{RLEN
} the value $(
\locvar{RLEN
}+
1)$.
4731 For each value of
\locvar{\tj} from
\bitvar{\ti} to
4732 $(
\bitvar{\ti}+
\locvar{RLEN
}-
1)$, assign
4733 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
4735 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value
4736 $
\bitvar{TIS
}[\bitvar{\bi}]+
\locvar{RLEN
}$.
4739 Otherwise, if
\bitvar{TOKEN
} is
9:
4742 Assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$ the value $
1$.
4744 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4746 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4749 Otherwise, if
\bitvar{TOKEN
} is
10:
4752 Assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$ the value $-
1$.
4754 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4756 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4759 Otherwise, if
\bitvar{TOKEN
} is
11:
4762 Assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$ the value $
2$.
4764 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4766 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4769 Otherwise, if
\bitvar{TOKEN
} is
12:
4772 Assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$ the value $-
2$.
4774 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4776 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4779 Otherwise, if
\bitvar{TOKEN
} is
13:
4782 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
If \locvar{SIGN} is zero, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$
 the value $3$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value $-3$.
4789 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4791 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4794 Otherwise, if
\bitvar{TOKEN
} is
14:
4797 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
If \locvar{SIGN} is zero, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$
 the value $4$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value $-4$.
4804 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4806 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4809 Otherwise, if
\bitvar{TOKEN
} is
15:
4812 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
If \locvar{SIGN} is zero, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$
 the value $5$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value $-5$.
4819 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4821 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4824 Otherwise, if
\bitvar{TOKEN
} is
16:
4827 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
If \locvar{SIGN} is zero, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$
 the value $6$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value $-6$.
4834 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4836 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4839 Otherwise, if
\bitvar{TOKEN
} is
17:
4842 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
4844 Read a
1-bit unsigned integer as
\locvar{MAG
}.
4846 Assign
\locvar{MAG
} the value $(
\locvar{MAG
}+
7)$.
4848 If
\locvar{SIGN
} is zero, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$
4849 the value $
\locvar{MAG
}$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value
 $-\locvar{MAG}$.
4854 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4856 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4859 Otherwise, if
\bitvar{TOKEN
} is
18:
4862 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
4864 Read a
2-bit unsigned integer as
\locvar{MAG
}.
4866 Assign
\locvar{MAG
} the value $(
\locvar{MAG
}+
9)$.
4868 If
\locvar{SIGN
} is zero, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$
4869 the value $
\locvar{MAG
}$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value
 $-\locvar{MAG}$.
4874 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4876 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4879 Otherwise, if
\bitvar{TOKEN
} is
19:
4882 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
4884 Read a
3-bit unsigned integer as
\locvar{MAG
}.
4886 Assign
\locvar{MAG
} the value $(
\locvar{MAG
}+
13)$.
4888 If
\locvar{SIGN
} is zero, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$
4889 the value $
\locvar{MAG
}$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value
 $-\locvar{MAG}$.
4894 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4896 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4899 Otherwise, if
\bitvar{TOKEN
} is
20:
4902 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
4904 Read a
4-bit unsigned integer as
\locvar{MAG
}.
4906 Assign
\locvar{MAG
} the value $(
\locvar{MAG
}+
21)$.
4908 If
\locvar{SIGN
} is zero, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$
4909 the value $
\locvar{MAG
}$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value
 $-\locvar{MAG}$.
4914 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4916 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4919 Otherwise, if
\bitvar{TOKEN
} is
21:
4922 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
4924 Read a
5-bit unsigned integer as
\locvar{MAG
}.
4926 Assign
\locvar{MAG
} the value $(
\locvar{MAG
}+
37)$.
4928 If
\locvar{SIGN
} is zero, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$
4929 the value $
\locvar{MAG
}$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value
 $-\locvar{MAG}$.
4934 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4936 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4939 Otherwise, if
\bitvar{TOKEN
} is
22:
4942 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
4944 Read a
9-bit unsigned integer as
\locvar{MAG
}.
4946 Assign
\locvar{MAG
} the value $(
\locvar{MAG
}+
69)$.
4948 If
\locvar{SIGN
} is zero, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$
4949 the value $
\locvar{MAG
}$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}]$ the value
 $-\locvar{MAG}$.
4954 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
1$.
4956 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4959 Otherwise, if
\bitvar{TOKEN
} is
23:
4962 Assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$ the value zero.
Read a 1-bit unsigned integer as \locvar{SIGN}.
If \locvar{SIGN} is zero, assign
 $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}+1]$ the value $1$.
Otherwise, assign $\bitvar{COEFFS}[\bitvar{\bi}][\bitvar{\ti}+1]$ the value
 $-1$.
4972 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
2$.
4974 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4977 Otherwise, if
\bitvar{TOKEN
} is
24:
4980 For each value of
\locvar{\tj} from
\bitvar{\ti} to $(
\bitvar{\ti}+
1)$, assign
4981 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
4983 Read a
1-bit unsigned integer as \locvar{SIGN}.
4985 If
\locvar{SIGN
} is zero, assign
4986 $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
2]$ the value $
1$.
4988 Otherwise, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
2]$ the value $-1$.
4991 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
3$.
4993 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
4996 Otherwise, if
\bitvar{TOKEN
} is
25:
4999 For each value of
\locvar{\tj} from
\bitvar{\ti} to $(
\bitvar{\ti}+
2)$, assign
5000 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
5002 Read a
1-bit unsigned integer as \locvar{SIGN}.
5004 If
\locvar{SIGN
} is zero, assign
5005 $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
3]$ the value $
1$.
5007 Otherwise, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
3]$ the value $-1$.
5010 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
4$.
5012 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
5015 Otherwise, if
\bitvar{TOKEN
} is
26:
5018 For each value of
\locvar{\tj} from
\bitvar{\ti} to $(
\bitvar{\ti}+
3)$, assign
5019 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
5021 Read a
1-bit unsigned integer as \locvar{SIGN}.
5023 If
\locvar{SIGN
} is zero, assign
5024 $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
4]$ the value $
1$.
5026 Otherwise, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
4]$ the value $-1$.
5029 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
5$.
5031 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
5034 Otherwise, if
\bitvar{TOKEN
} is
27:
5037 For each value of
\locvar{\tj} from
\bitvar{\ti} to $(
\bitvar{\ti}+
4)$, assign
5038 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
5040 Read a
1-bit unsigned integer as \locvar{SIGN}.
5042 If
\locvar{SIGN
} is zero, assign
5043 $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
5]$ the value $
1$.
5045 Otherwise, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
5]$ the value $-1$.
5048 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
6$.
5050 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
5053 Otherwise, if
\bitvar{TOKEN
} is
28:
5056 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
5058 Read a
2-bit unsigned integer as
\locvar{RLEN
}.
5060 Assign
\locvar{RLEN
} the value $(
\locvar{RLEN
}+
6)$.
5062 For each value of
\locvar{\tj} from
\bitvar{\ti} to
5063 $(
\bitvar{\ti}+
\locvar{RLEN
}-
1)$, assign
5064 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
5066 If
\locvar{SIGN
} is zero, assign
5067 $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
\locvar{RLEN
}]$ the value $
1$.
5069 Otherwise, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
\locvar{RLEN
}]$ the value $-1$.
5072 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value
5073 $
\bitvar{TIS
}[\bitvar{\bi}]+
\locvar{RLEN
}+
1$.
5075 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
5078 Otherwise, if
\bitvar{TOKEN
} is
29:
5081 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
5083 Read a
3-bit unsigned integer as
\locvar{RLEN
}.
5085 Assign
\locvar{RLEN
} the value $(
\locvar{RLEN
}+
10)$.
5087 For each value of
\locvar{\tj} from
\bitvar{\ti} to
5088 $(
\bitvar{\ti}+
\locvar{RLEN
}-
1)$, assign
5089 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
5091 If
\locvar{SIGN
} is zero, assign
5092 $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
\locvar{RLEN
}]$ the value $
1$.
5094 Otherwise, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
\locvar{RLEN
}]$ the value $-1$.
5097 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value
5098 $
\bitvar{TIS
}[\bitvar{\bi}]+
\locvar{RLEN
}+
1$.
5099 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
5102 Otherwise, if
\bitvar{TOKEN
} is
30:
5105 Assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}]$ the value zero.
5107 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
5109 Read a
1-bit unsigned integer as
\locvar{MAG
}.
5111 Assign
\locvar{MAG
} the value $(
\locvar{MAG
}+
2)$.
5113 If
\locvar{SIGN
} is zero, assign
5114 $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
1]$ the value $
\locvar{MAG
}$.
5116 Otherwise, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
1]$ the value $-\locvar{MAG}$.
5119 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]+
2$.
5120 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
5123 Otherwise, if
\bitvar{TOKEN
} is
31:
5126 Read a
1-bit unsigned integer as
\locvar{SIGN
}.
5128 Read a
1-bit unsigned integer as
\locvar{MAG
}.
5130 Assign
\locvar{MAG
} the value $(
\locvar{MAG
}+
2)$.
5132 Read a
1-bit unsigned integer as
\locvar{RLEN
}.
5134 Assign
\locvar{RLEN
} the value $(
\locvar{RLEN
}+
2)$.
5136 For each value of
\locvar{\tj} from
\bitvar{\ti} to
5137 $(
\bitvar{\ti}+
\locvar{RLEN
}-
1)$, assign
5138 $
\bitvar{COEFFS
}[\bitvar{\bi}][\locvar{\tj}]$ the value zero.
5140 If
\locvar{SIGN
} is zero, assign
5141 $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
\locvar{RLEN
}]$ the value $\locvar{MAG}$.
5144 Otherwise, assign $
\bitvar{COEFFS
}[\bitvar{\bi}][\bitvar{\ti}+
\locvar{RLEN
}]$
5145 the value $-
\locvar{MAG
}$.
5147 Assign $
\bitvar{TIS
}[\bitvar{\bi}]$ the value
5148 $
\bitvar{TIS
}[\bitvar{\bi}]+
\locvar{RLEN
}+
1$.
5149 Assign $
\bitvar{NCOEFFS
}[\bitvar{\bi}]$ the value $
\bitvar{TIS
}[\bitvar{\bi}]$.
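The tokens 23 through 31 described above share one shape: a run of zeros followed by a single signed value.
The following Python sketch illustrates that shape under stated assumptions; it is not taken from any reference decoder.
The names \texttt{make\_reader}, \texttt{read\_bits}, and \texttt{expand\_run\_token} are hypothetical, bits are assumed to be read most-significant first, and \texttt{coeffs} stands for the single row $\bitvar{COEFFS}[\bitvar{\bi}]$.

```python
def make_reader(bits):
    """Hypothetical bit source: yields unsigned integers from a list of 0/1 bits.

    Assumption: bits are consumed most-significant first.
    """
    it = iter(bits)

    def read_bits(n):
        value = 0
        for _ in range(n):
            value = (value << 1) | next(it)
        return value

    return read_bits


def expand_run_token(token, read_bits, coeffs, ti):
    """Expand one zero-run-plus-value token (23..31) into coeffs.

    Returns the updated token index, which is also the new coefficient count.
    """
    if 23 <= token <= 27:
        # Fixed run of 1..5 zeros followed by +1 or -1.
        sign = read_bits(1)
        rlen, mag = token - 22, 1
    elif token == 28:
        sign = read_bits(1)
        rlen, mag = read_bits(2) + 6, 1
    elif token == 29:
        sign = read_bits(1)
        rlen, mag = read_bits(3) + 10, 1
    elif token == 30:
        # A single zero followed by +/-2 or +/-3.
        sign = read_bits(1)
        mag = read_bits(1) + 2
        rlen = 1
    elif token == 31:
        # SIGN, then MAG, then RLEN, in that order.
        sign = read_bits(1)
        mag = read_bits(1) + 2
        rlen = read_bits(1) + 2
    else:
        raise ValueError("not a zero-run token")

    for tj in range(ti, ti + rlen):
        coeffs[tj] = 0
    coeffs[ti + rlen] = mag if sign == 0 else -mag
    return ti + rlen + 1
```

The sketch does not check that the run stays within the 64 coefficients of a block; a real decoder would need to handle that case.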
5153 \subsection{DCT Coefficient Decode
}
5154 \label{sub:dct-coeffs
}
5156 \paragraph{Input parameters:
}\hfill\\*
5157 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5158 \multicolumn{1}{c
}{Name
} &
5159 \multicolumn{1}{c
}{Type
} &
5160 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5161 \multicolumn{1}{c
}{Signed?
} &
5162 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5163 \bitvar{NBS
} & Integer &
36 & No & The total number of blocks in a frame. \\
5165 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5166 1 & No & An
\bitvar{NBS
}-element array of flags
5167 indicating which blocks are coded. \\
5168 \bitvar{NMBS
} & Integer &
32 & No & The total number of macro blocks in a frame. \\
5170 \bitvar{HTS
} &
\multicolumn{3}{l
}{Huffman table array
}
5171 & An
80-element array of Huffman tables
5172 with up to
32 entries each. \\
5173 \bottomrule\end{tabularx
}
5175 \paragraph{Output parameters:
}\hfill\\*
5176 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5177 \multicolumn{1}{c
}{Name
} &
5178 \multicolumn{1}{c
}{Type
} &
5179 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5180 \multicolumn{1}{c
}{Signed?
} &
5181 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5182 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5183 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
5184 quantized DCT coefficient values for each block in zig-zag order. \\
5185 \bitvar{NCOEFFS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5186 7 & No & An
\bitvar{NBS
}-element array of the
5187 coefficient count for each block. \\
5188 \bottomrule\end{tabularx
}
5190 \paragraph{Variables used:
}\hfill\\*
5191 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5192 \multicolumn{1}{c
}{Name
} &
5193 \multicolumn{1}{c
}{Type
} &
5194 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5195 \multicolumn{1}{c
}{Signed?
} &
5196 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5197 \locvar{NLBS
} & Integer &
34 & No & The number of blocks in the luma plane. \\
5199 \locvar{TIS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5200 7 & No & An
\bitvar{NBS
}-element array of the
5201 current token index for each block. \\
5202 \locvar{EOBS
} & Integer &
36 & No & The remaining length of the current EOB run. \\
5204 \locvar{TOKEN
} & Integer &
5 & No & The current token being decoded. \\
5205 \locvar{HG
} & Integer &
3 & No & The current Huffman table group. \\
5206 \locvar{\cbi} & Integer &
36 & No & The index of the current block in the
5207 coded block list. \\
5208 \locvar{\bi} & Integer &
36 & No & The index of the current block in coded order. \\
5210 \locvar{\bj} & Integer &
36 & No & Another index of a block in coded order. \\
5212 \locvar{\ti} & Integer &
6 & No & The current token index. \\
5213 \locvar{\tj} & Integer &
6 & No & Another token index. \\
5214 \locvar{\hti_L} & Integer &
4 & No & The index of the current Huffman table
5215 to use for the luma plane within a group. \\
5216 \locvar{\hti_C} & Integer &
4 & No & The index of the current Huffman table
5217 to use for the chroma planes within a group. \\
5218 \locvar{\hti} & Integer &
7 & No & The index of the current Huffman table to use. \\
5220 \bottomrule\end{tabularx
}
5223 This procedure puts the above two procedures to work to decode the entire set
5224 of DCT coefficients for the frame.
5225 At the end of this procedure,
\locvar{EOBS
} MUST be zero, and
5226 $
\locvar{TIS
}[\locvar{\bi}]$ MUST be
64 for every coded
\locvar{\bi}.
5228 Note that we update the coefficient count of every block before continuing an
5229 EOB run or decoding a token, despite the fact that it is already up to date
5230 unless the previous token was a pure zero run.
5231 This is done intentionally to mimic the VP3 accounting rules.
5232 Thus the only time the coefficient count does not include the coefficients in a
pure zero run is when that run reaches all the way to coefficient
63.
5234 Note, however, that regardless of the coefficient count, any additional
5235 coefficients are still set to zero.
5236 The only use of the count is in determining if a special case of the inverse
5237 DCT can be used in Section~
\ref{sub:
2d-idct
}.
5241 Assign
\locvar{NLBS
} the value $(
\bitvar{NMBS
}*
4)$.
5243 For each consecutive value of
\locvar{\bi} from
0 to $(
\bitvar{NBS
}-
1)$,
5244 assign $
\locvar{TIS
}[\locvar{\bi}]$ the value zero.
5246 Assign
\locvar{EOBS
} the value
0.
5248 For each consecutive value of
\locvar{\ti} from
0 to
63:
5251 If
\locvar{\ti} is $
0$ or $
1$:
5254 Read a
4-bit unsigned integer as
\locvar{\hti_L}.
5256 Read a
4-bit unsigned integer as
\locvar{\hti_C}.
5259 For each consecutive value of
\locvar{\bi} from
0 to $(
\bitvar{NBS
}-
1)$ for
5260 which $
\bitvar{BCODED
}[\locvar{\bi}]$ is non-zero and
5261 $
\locvar{TIS
}[\locvar{\bi}]$ equals
\locvar{\ti}:
5264 Assign $
\bitvar{NCOEFFS
}[\locvar{\bi}]$ the value
\locvar{\ti}.
5266 If
\locvar{EOBS
} is greater than zero:
5269 For each value of
\locvar{\tj} from $
\locvar{\ti}$ to
63, assign
5270 $
\bitvar{COEFFS
}[\locvar{\bi}][\locvar{\tj}]$ the value zero.
5272 Assign $
\locvar{TIS
}[\locvar{\bi}]$ the value
64.
5274 Assign
\locvar{EOBS
} the value $(
\locvar{EOBS
}-
1)$.
5280 Assign
\locvar{HG
} a value based on
\locvar{\ti} from
5281 Table~
\ref{tab:huff-groups
}.
5285 \begin{tabular
}{lc
}\toprule
5286 \locvar{\ti} &
\locvar{HG
} \\
\midrule
$0$ & $0$ \\
5288 $
1\ldots 5$ & $
1$ \\
5289 $
6\ldots 14$ & $
2$ \\
5290 $
15\ldots 27$ & $
3$ \\
5291 $
28\ldots 63$ & $
4$ \\
5292 \bottomrule\end{tabular
}
5294 \caption{Huffman Table Groups
}
5295 \label{tab:huff-groups
}
5299 If
\locvar{\bi} is less than
\locvar{NLBS
}, assign
\locvar{\hti} the value
5300 $(
16*
\locvar{HG
}+
\locvar{\hti_L})$.
5302 Otherwise, assign
\locvar{\hti} the value
5303 $(
16*
\locvar{HG
}+
\locvar{\hti_C})$.
5305 Read one bit at a time until one of the codes in $
\bitvar{HTS
}[\locvar{\hti}]$
5306 is recognized, and assign the value to
\locvar{TOKEN
}.
5308 If
\locvar{TOKEN
} is less than
7, expand an EOB token using the procedure given
5309 in Section~
\ref{sub:eob-token
} to update $
\locvar{TIS
}[\locvar{\bi}]$,
5310 $
\bitvar{COEFFS
}[\locvar{\bi}]$, and
\locvar{EOBS
}.
5312 Otherwise, expand a coefficient token using the procedure given in
5313 Section~
\ref{sub:coeff-token
} to update $
\locvar{TIS
}[\locvar{\bi}]$,
5314 $
\bitvar{COEFFS
}[\locvar{\bi}]$, and $
\bitvar{NCOEFFS
}[\locvar{\bi}]$.
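The table selection in the procedure above can be sketched in a few lines of Python.
This is only an illustration: the function names are hypothetical, and the group for token index 0 is assumed to be 0, an inference from \bitvar{HTS} holding 80 tables (five groups of sixteen).

```python
def huffman_group(ti):
    """Map a token index (0..63) to its Huffman table group.

    Assumption: index 0 maps to group 0; the other ranges follow the
    Huffman Table Groups table.
    """
    if not 0 <= ti <= 63:
        raise ValueError("token index out of range")
    if ti == 0:
        return 0
    if ti <= 5:
        return 1
    if ti <= 14:
        return 2
    if ti <= 27:
        return 3
    return 4


def huffman_table_index(ti, bi, nlbs, hti_l, hti_c):
    """Compute hti = 16*HG + (hti_L for luma blocks, hti_C for chroma blocks).

    A block is treated as luma when its coded-order index bi is below nlbs.
    """
    hg = huffman_group(ti)
    return 16 * hg + (hti_l if bi < nlbs else hti_c)
```

For example, with \locvar{NLBS} equal to 4, token index 6 selects table $16*2+\locvar{\hti_L}$ for block 0 and table $16*2+\locvar{\hti_C}$ for block 4.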
5320 \section{Undoing DC Prediction
}
5322 The actual value of a DC coefficient decoded by Section~
\ref{sec:dct-decode
} is
5323 the residual from a predicted value computed by the encoder.
5324 This prediction is only applied to DC coefficients.
5325 Quantized AC coefficients are encoded directly.
This section describes how to undo this prediction to recover the original DC values.
5329 The predicted DC value for a block is computed from the DC values of its
5330 immediate neighbors which precede the block in raster order.
Thus, reversing this prediction must proceed in raster order, instead of coded order.
5334 Note that this step comes before dequantizing the coefficients.
5335 For this reason, DC coefficients are all quantized with the same
\qi\ value,
5336 regardless of the block-level
\qi\ values decoded in
5337 Section~
\ref{sub:block-qis
}.
5338 Those
\qi\ values are applied only to the AC coefficients.
5340 \subsection{Computing the DC Predictor
}
\label{sub:dc-pred}
5343 \paragraph{Input parameters:
}\hfill\\*
5344 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5345 \multicolumn{1}{c
}{Name
} &
5346 \multicolumn{1}{c
}{Type
} &
5347 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5348 \multicolumn{1}{c
}{Signed?
} &
5349 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5350 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5351 1 & No & An
\bitvar{NBS
}-element array of flags
5352 indicating which blocks are coded. \\
5353 \bitvar{MBMODES
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5354 3 & No & An
\bitvar{NMBS
}-element array of
5355 coding modes for each macro block. \\
5356 \bitvar{LASTDC
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5357 16 & Yes & A
3-element array containing the
most recently decoded DC value, one for intra mode and one for each reference frame. \\
5360 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5361 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
5362 quantized DCT coefficient values for each block in zig-zag order. \\
5363 \bitvar{\bi} & Integer &
36 & No & The index of the current block in coded order. \\
5365 \bottomrule\end{tabularx
}
5367 \paragraph{Output parameters:
}\hfill\\*
5368 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5369 \multicolumn{1}{c
}{Name
} &
5370 \multicolumn{1}{c
}{Type
} &
5371 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5372 \multicolumn{1}{c
}{Signed?
} &
5373 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5374 \bitvar{DCPRED
} & Integer &
16 & Yes & The predicted DC value for the current block. \\
5376 \bottomrule\end{tabularx
}
5378 \paragraph{Variables used:
}\hfill\\*
5379 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5380 \multicolumn{1}{c
}{Name
} &
5381 \multicolumn{1}{c
}{Type
} &
5382 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5383 \multicolumn{1}{c
}{Signed?
} &
5384 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5385 \locvar{P
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5386 1 & No & A
4-element array indicating which
5387 neighbors can be used for DC prediction. \\
5388 \locvar{PBI
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5389 36 & No & A
4-element array containing the
5390 coded-order block index of the current block's neighbors. \\
5391 \locvar{W
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5392 7 & Yes & A
4-element array of the weights to
5393 apply to each neighboring DC value. \\
5394 \locvar{PDIV
} & Integer &
8 & No & The value to divide the weighted sum by. \\
5396 \locvar{\bj} & Integer &
36 & No & The index of a neighboring block in coded order. \\
5398 \locvar{\mbi} & Integer &
32 & No & The index of the macro block
5399 containing block
\locvar{\bi}. \\
\locvar{\mbj} & Integer &
32 & No & The index of the macro block
5401 containing block
\locvar{\bj}. \\
5402 \locvar{\rfi} & Integer &
2 & No & The index of the reference frame
5403 indicated by the coding mode for macro block
\locvar{\mbi}. \\
5404 \bottomrule\end{tabularx
}
5407 This procedure outlines how a predictor is formed for a single block.
5409 The predictor is computed as a weighted sum of the neighboring DC values from
5410 coded blocks which use the same reference frame.
This latter condition is determined only by checking the coding mode for the block.
5413 Even if the golden frame and the previous frame are in fact the same, e.g. for
5414 the first inter frame after an intra frame, they are still treated as being
5415 different for the purposes of DC prediction.
5416 The weighted sum is divided by a power of two, with truncation towards zero,
5417 and the result is checked for outranging if necessary.
5419 If there are no neighboring coded blocks which use the same reference frame as
5420 the current block, then the most recent DC value of any block that used that
5421 reference frame is used instead.
5422 If no such block exists, then the predictor is set to zero.
5426 Assign
\locvar{\mbi} the index of the macro block containing block \locvar{\bi}.
5429 Assign
\locvar{\rfi} the value of the Reference Frame Index column of
5430 Table~
\ref{tab:cm-refs
} corresponding to $
\bitvar{MBMODES
}[\locvar{\mbi}]$.
5434 \begin{tabular
}{ll
}\toprule
5435 Coding Mode & Reference Frame Index \\
\midrule
5436 $
0$ (INTER
\_NOMV) & $
1$ (Previous) \\
5437 $
1$ (INTRA) & $
0$ (None) \\
5438 $
2$ (INTER
\_MV) & $
1$ (Previous) \\
5439 $
3$ (INTER
\_MV\_LAST) & $
1$ (Previous) \\
5440 $
4$ (INTER
\_MV\_LAST2) & $
1$ (Previous) \\
5441 $
5$ (INTER
\_GOLDEN\_NOMV) & $
2$ (Golden) \\
5442 $
6$ (INTER
\_GOLDEN\_MV) & $
2$ (Golden) \\
5443 $
7$ (INTER
\_MV\_FOUR) & $
1$ (Previous) \\
5444 \bottomrule\end{tabular
}
5446 \caption{Reference Frames for Each Coding Mode
}
\label{tab:cm-refs}
5451 If block
\locvar{\bi} is not along the left edge of the coded frame:
5454 Assign
\locvar{\bj} the coded-order index of block
\locvar{\bi}'s left
5455 neighbor, i.e., in the same row but one column to the left.
5457 If $
\bitvar{BCODED
}[\bj]$ is not zero:
5460 Assign
\locvar{\mbj} the index of the macro block containing block \locvar{\bj}.
5463 If the value of the Reference Frame Index column of Table~
\ref{tab:cm-refs
}
corresponding to $
\bitvar{MBMODES
}[\locvar{\mbj}]$ equals
\locvar{\rfi}:
5467 Assign $
\locvar{P
}[0]$ the value $
1$.
5469 Assign $
\locvar{PBI
}[0]$ the value
\locvar{\bj}.
5472 Otherwise, assign $
\locvar{P
}[0]$ the value zero.
5475 Otherwise, assign $
\locvar{P
}[0]$ the value zero.
5478 Otherwise, assign $
\locvar{P
}[0]$ the value zero.
5481 If block
\locvar{\bi} is not along the left edge nor the bottom edge of the coded frame:
5485 Assign
\locvar{\bj} the coded-order index of block
\locvar{\bi}'s lower-left
5486 neighbor, i.e., one row down and one column to the left.
5488 If $
\bitvar{BCODED
}[\bj]$ is not zero:
5491 Assign
\locvar{\mbj} the index of the macro block containing block \locvar{\bj}.
5494 If the value of the Reference Frame Index column of Table~
\ref{tab:cm-refs
}
corresponding to $
\bitvar{MBMODES
}[\locvar{\mbj}]$ equals
\locvar{\rfi}:
5498 Assign $
\locvar{P
}[1]$ the value $
1$.
5500 Assign $
\locvar{PBI
}[1]$ the value
\locvar{\bj}.
5503 Otherwise, assign $
\locvar{P
}[1]$ the value zero.
5506 Otherwise, assign $
\locvar{P
}[1]$ the value zero.
5509 Otherwise, assign $
\locvar{P
}[1]$ the value zero.
5512 If block
\locvar{\bi} is not along the bottom edge of the coded frame:
5515 Assign
\locvar{\bj} the coded-order index of block
\locvar{\bi}'s lower
5516 neighbor, i.e., in the same column but one row down.
5518 If $
\bitvar{BCODED
}[\bj]$ is not zero:
5521 Assign
\locvar{\mbj} the index of the macro block containing block \locvar{\bj}.
5524 If the value of the Reference Frame Index column of Table~
\ref{tab:cm-refs
}
corresponding to $
\bitvar{MBMODES
}[\locvar{\mbj}]$ equals
\locvar{\rfi}:
5528 Assign $
\locvar{P
}[2]$ the value $
1$.
5530 Assign $
\locvar{PBI
}[2]$ the value
\locvar{\bj}.
5533 Otherwise, assign $
\locvar{P
}[2]$ the value zero.
5536 Otherwise, assign $
\locvar{P
}[2]$ the value zero.
5539 Otherwise, assign $
\locvar{P
}[2]$ the value zero.
5542 If block
\locvar{\bi} is not along the right edge nor the bottom edge of the coded frame:
5546 Assign
\locvar{\bj} the coded-order index of block
\locvar{\bi}'s lower-right
5547 neighbor, i.e., one row down and one column to the right.
5549 If $
\bitvar{BCODED
}[\bj]$ is not zero:
5552 Assign
\locvar{\mbj} the index of the macro block containing block \locvar{\bj}.
5555 If the value of the Reference Frame Index column of Table~
\ref{tab:cm-refs
}
corresponding to $
\bitvar{MBMODES
}[\locvar{\mbj}]$ equals
\locvar{\rfi}:
5559 Assign $
\locvar{P
}[3]$ the value $
1$.
5561 Assign $
\locvar{PBI
}[3]$ the value
\locvar{\bj}.
5564 Otherwise, assign $
\locvar{P
}[3]$ the value zero.
5567 Otherwise, assign $
\locvar{P
}[3]$ the value zero.
5570 Otherwise, assign $
\locvar{P
}[3]$ the value zero.
5573 If none of the values $
\locvar{P
}[0]$, $
\locvar{P
}[1]$, $
\locvar{P
}[2]$, nor
5574 $
\locvar{P
}[3]$ are non-zero, then assign
\bitvar{DCPRED
} the value
5575 $
\bitvar{LASTDC
}[\locvar{\rfi}]$.
5580 Assign the array
\locvar{W
} and the variable
\locvar{PDIV
} the values from the
5581 row of Table~
\ref{tab:dc-weights
} corresponding to the values of each
5582 $
\locvar{P
}[\idx{i
}]$.
5586 \begin{tabular
}{ccccrrrrr
}\toprule
5587 \multicolumn{1}{p
{25pt
}}{\centering$
\locvar{P
}[0]$ (L)
} &
5588 \multicolumn{1}{p
{25pt
}}{\centering$
\locvar{P
}[1]$ (DL)
} &
5589 \multicolumn{1}{p
{25pt
}}{\centering$
\locvar{P
}[2]$ (D)
} &
5590 \multicolumn{1}{p
{25pt
}}{\centering$
\locvar{P
}[3]$ (DR)
} &
5591 \multicolumn{1}{p
{25pt
}}{\centering$
\locvar{W
}[0]$ (L)
} &
5592 \multicolumn{1}{p
{25pt
}}{\centering$
\locvar{W
}[1]$ (DL)
} &
5593 \multicolumn{1}{p
{25pt
}}{\centering$
\locvar{W
}[2]$ (D)
} &
5594 \multicolumn{1}{p
{25pt
}}{\centering$
\locvar{W
}[3]$ (DR)
} &
5595 \locvar{PDIV
} \\
\midrule
5596 $
1$ & $
0$ & $
0$ & $
0$ & $
1$ & $
0$ & $
0$ & $
0$ & $
1$ \\
5597 $
0$ & $
1$ & $
0$ & $
0$ & $
0$ & $
1$ & $
0$ & $
0$ & $
1$ \\
5598 $
1$ & $
1$ & $
0$ & $
0$ & $
1$ & $
0$ & $
0$ & $
0$ & $
1$ \\
5599 $
0$ & $
0$ & $
1$ & $
0$ & $
0$ & $
0$ & $
1$ & $
0$ & $
1$ \\
5600 $
1$ & $
0$ & $
1$ & $
0$ & $
1$ & $
0$ & $
1$ & $
0$ & $
2$ \\
5601 $
0$ & $
1$ & $
1$ & $
0$ & $
0$ & $
0$ & $
1$ & $
0$ & $
1$ \\
5602 $
1$ & $
1$ & $
1$ & $
0$ & $
29$ & $-
26$ & $
29$ & $
0$ & $
32$ \\
5603 $
0$ & $
0$ & $
0$ & $
1$ & $
0$ & $
0$ & $
0$ & $
1$ & $
1$ \\
5604 $
1$ & $
0$ & $
0$ & $
1$ & $
75$ & $
0$ & $
0$ & $
53$ & $
128$ \\
5605 $
0$ & $
1$ & $
0$ & $
1$ & $
0$ & $
1$ & $
0$ & $
1$ & $
2$ \\
5606 $
1$ & $
1$ & $
0$ & $
1$ & $
75$ & $
0$ & $
0$ & $
53$ & $
128$ \\
5607 $
0$ & $
0$ & $
1$ & $
1$ & $
0$ & $
0$ & $
1$ & $
0$ & $
1$ \\
5608 $
1$ & $
0$ & $
1$ & $
1$ & $
75$ & $
0$ & $
0$ & $
53$ & $
128$ \\
5609 $
0$ & $
1$ & $
1$ & $
1$ & $
0$ & $
3$ & $
10$ & $
3$ & $
16$ \\
5610 $
1$ & $
1$ & $
1$ & $
1$ & $
29$ & $-
26$ & $
29$ & $
0$ & $
32$ \\
5611 \bottomrule\end{tabular
}
5613 \caption{Weights and Divisors for Each Set of Available DC Predictors
}
5614 \label{tab:dc-weights
}
5618 Assign
\bitvar{DCPRED
} the value zero.
5620 If $
\locvar{P
}[0]$ is non-zero, assign
\bitvar{DCPRED
} the value
5621 $(
\bitvar{DCPRED
}+
\locvar{W
}[0]*
\bitvar{COEFFS
}[\locvar{PBI
}[0]][0])$.
5623 If $
\locvar{P
}[1]$ is non-zero, assign
\bitvar{DCPRED
} the value
5624 $(
\bitvar{DCPRED
}+
\locvar{W
}[1]*
\bitvar{COEFFS
}[\locvar{PBI
}[1]][0])$.
5626 If $
\locvar{P
}[2]$ is non-zero, assign
\bitvar{DCPRED
} the value
5627 $(
\bitvar{DCPRED
}+
\locvar{W
}[2]*
\bitvar{COEFFS
}[\locvar{PBI
}[2]][0])$.
5629 If $
\locvar{P
}[3]$ is non-zero, assign
\bitvar{DCPRED
} the value
5630 $(
\bitvar{DCPRED
}+
\locvar{W
}[3]*
\bitvar{COEFFS
}[\locvar{PBI
}[3]][0])$.
5632 Assign
\bitvar{DCPRED
} the value $(
\bitvar{DCPRED
}//
\locvar{PDIV
})$.
5634 If $
\locvar{P
}[0]$, $
\locvar{P
}[1]$, and $
\locvar{P
}[2]$ are all non-zero:
5637 If $|
\bitvar{DCPRED
}-
\bitvar{COEFFS
}[\locvar{PBI
}[2]][0]|$ is greater than
5638 $
128$, assign
\bitvar{DCPRED
} the value $
\bitvar{COEFFS
}[\locvar{PBI
}[2]][0]$.
5640 Otherwise, if $|
\bitvar{DCPRED
}-
\bitvar{COEFFS
}[\locvar{PBI
}[0]][0]|$ is
5641 greater than $
128$, assign
\bitvar{DCPRED
} the value
5642 $
\bitvar{COEFFS
}[\locvar{PBI
}[0]][0]$.
5644 Otherwise, if $|
\bitvar{DCPRED
}-
\bitvar{COEFFS
}[\locvar{PBI
}[1]][0]|$ is
5645 greater than $
128$, assign
\bitvar{DCPRED
} the value
5646 $
\bitvar{COEFFS
}[\locvar{PBI
}[1]][0]$.
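The arithmetic of the predictor can be sketched in Python as follows.
This is an illustrative sketch under stated assumptions, not a definitive implementation: the function name \texttt{dc\_predictor} and its arguments are hypothetical, \texttt{dc[i]} stands for $\bitvar{COEFFS}[\locvar{PBI}[i]][0]$, and \texttt{last\_dc} stands for $\bitvar{LASTDC}[\locvar{\rfi}]$.
Note that Python's \texttt{//} rounds towards negative infinity, so the truncation towards zero required here is written out explicitly.

```python
# (P[0], P[1], P[2], P[3]) -> ((W[0], W[1], W[2], W[3]), PDIV),
# transcribed from the weights-and-divisors table.
DC_WEIGHTS = {
    (1, 0, 0, 0): ((1, 0, 0, 0), 1),
    (0, 1, 0, 0): ((0, 1, 0, 0), 1),
    (1, 1, 0, 0): ((1, 0, 0, 0), 1),
    (0, 0, 1, 0): ((0, 0, 1, 0), 1),
    (1, 0, 1, 0): ((1, 0, 1, 0), 2),
    (0, 1, 1, 0): ((0, 0, 1, 0), 1),
    (1, 1, 1, 0): ((29, -26, 29, 0), 32),
    (0, 0, 0, 1): ((0, 0, 0, 1), 1),
    (1, 0, 0, 1): ((75, 0, 0, 53), 128),
    (0, 1, 0, 1): ((0, 1, 0, 1), 2),
    (1, 1, 0, 1): ((75, 0, 0, 53), 128),
    (0, 0, 1, 1): ((0, 0, 1, 0), 1),
    (1, 0, 1, 1): ((75, 0, 0, 53), 128),
    (0, 1, 1, 1): ((0, 3, 10, 3), 16),
    (1, 1, 1, 1): ((29, -26, 29, 0), 32),
}


def dc_predictor(p, dc, last_dc):
    """Form the DC predictor from neighbor availability flags and DC values.

    p       -- four 0/1 flags for the L, DL, D, and DR neighbors
    dc      -- the four neighboring DC values (ignored where p[i] is 0)
    last_dc -- fallback value used when no neighbor is available
    """
    p = tuple(1 if flag else 0 for flag in p)
    if not any(p):
        return last_dc

    weights, pdiv = DC_WEIGHTS[p]
    total = sum(weights[i] * dc[i] for i in range(4) if p[i])

    # Divide with truncation towards zero (not Python's floor division).
    pred = abs(total) // pdiv
    if total < 0:
        pred = -pred

    # Outranging check, applied only when L, DL, and D are all available.
    if p[0] and p[1] and p[2]:
        if abs(pred - dc[2]) > 128:
            pred = dc[2]
        elif abs(pred - dc[0]) > 128:
            pred = dc[0]
        elif abs(pred - dc[1]) > 128:
            pred = dc[1]
    return pred
```

With only the left and lower neighbors available and DC values $10$ and $-13$, the weighted sum is $-3$ and the divisor is $2$, so the sketch returns $-1$, whereas floor division would give $-2$.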
5651 \subsection{Inverting the DC Prediction Process
}
5652 \label{sub:dc-pred-undo
}
5654 \paragraph{Input parameters:
}\hfill\\*
5655 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5656 \multicolumn{1}{c
}{Name
} &
5657 \multicolumn{1}{c
}{Type
} &
5658 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5659 \multicolumn{1}{c
}{Signed?
} &
5660 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5661 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5662 1 & No & An
\bitvar{NBS
}-element array of flags
5663 indicating which blocks are coded. \\
5664 \bitvar{MBMODES
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5665 3 & No & An
\bitvar{NMBS
}-element array of
5666 coding modes for each macro block. \\
5667 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5668 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
5669 quantized DCT coefficient values for each block in zig-zag order. \\
5670 \bottomrule\end{tabularx
}
5672 \paragraph{Output parameters:
}\hfill\\*
5673 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5674 \multicolumn{1}{c
}{Name
} &
5675 \multicolumn{1}{c
}{Type
} &
5676 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5677 \multicolumn{1}{c
}{Signed?
} &
5678 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5679 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5680 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
5681 quantized DCT coefficient values for each block in zig-zag order. The DC
5682 value of each block will be updated. \\
5683 \bottomrule\end{tabularx
}
5685 \paragraph{Variables used:
}\hfill\\*
5686 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5687 \multicolumn{1}{c
}{Name
} &
5688 \multicolumn{1}{c
}{Type
} &
5689 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5690 \multicolumn{1}{c
}{Signed?
} &
5691 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5692 \locvar{LASTDC
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
5693 16 & Yes & A
3-element array containing the
most recently decoded DC value, one for intra mode and one for each reference frame. \\
5696 \locvar{DCPRED
} & Integer &
11 & Yes & The predicted DC value for the current block. \\
5698 \locvar{DC
} & Integer &
17 & Yes & The actual DC value for the current block. \\
5700 \locvar{\bi} & Integer &
36 & No & The index of the current block in coded order. \\
5702 \locvar{\mbi} & Integer &
32 & No & The index of the macro block
5703 containing block
\locvar{\bi}. \\
5704 \locvar{\rfi} & Integer &
2 & No & The index of the reference frame
5705 indicated by the coding mode for macro block
\locvar{\mbi}. \\
5706 \locvar{\pli} & Integer &
2 & No & A
color plane index. \\
5707 \bottomrule\end{tabularx
}
5710 This procedure describes the complete process of undoing the DC prediction to
5711 recover the original DC values.
5712 Because it is possible to add a value as large as $
580$ to the predicted DC
5713 coefficient value at every block, which will then be used to increase the
predictor for the next block, the reconstructed DC value could overflow a 16-bit integer.
5716 This is handled by truncating the result to a
16-bit signed representation,
simply throwing away any higher bits in the two's complement representation of the result.
5722 For each consecutive value of
\locvar{\pli} from $
0$ to $
2$:
5725 Assign $
\locvar{LASTDC
}[0]$ the value zero.
5727 Assign $
\locvar{LASTDC
}[1]$ the value zero.
5729 Assign $
\locvar{LASTDC
}[2]$ the value zero.
5731 For each block of
color plane
\locvar{\pli} in
{\em raster
} order, with
5732 coded-order index
\locvar{\bi}:
5735 If $
\bitvar{BCODED
}[\locvar{\bi}]$ is non-zero:
5738 Compute the value
\locvar{DCPRED
} using the procedure outlined in
5739 Section~
\ref{sub:dc-pred
}.
5741 Assign
\locvar{DC
} the value
5742 $(
\bitvar{COEFFS
}[\locvar{\bi}][0]+
\locvar{DCPRED
})$.
5744 Truncate
\locvar{DC
} to a
16-bit representation by dropping any higher-order bits.
5747 Assign $
\bitvar{COEFFS
}[\locvar{\bi}][0]$ the value
\locvar{DC
}.
5749 Assign
\locvar{\mbi} the index of the macro block containing block \locvar{\bi}.
5752 Assign
\locvar{\rfi} the value of the Reference Frame Index column of
5753 Table~
\ref{tab:cm-refs
} corresponding to $
\bitvar{MBMODES
}[\locvar{\mbi}]$.
5755 Assign $
\locvar{LASTDC
}[\rfi]$ the value $
\locvar{DC
}$.
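The truncation step above amounts to keeping the low 16 bits of the sum and reinterpreting them as a two's complement value.
A minimal Python sketch, with the hypothetical helper names \texttt{wrap\_int16} and \texttt{reconstruct\_dc}:

```python
def wrap_int16(value):
    """Keep the low 16 bits of value and reinterpret them as signed."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def reconstruct_dc(residual, dcpred):
    """Add the decoded residual to the predictor, wrapping to 16 bits."""
    return wrap_int16(residual + dcpred)
```

Values already in the 16-bit signed range pass through unchanged; a sum such as $32767+580$ wraps around to a negative value instead of being clamped.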
5761 \section{Reconstruction
}
5763 At this stage, the complete contents of the data packet have been decoded.
5764 All that remains is to reconstruct the contents of the new frame.
5765 This is applied on a block by block basis, and as each block is independent,
5766 the order they are processed in does not matter.
5768 \subsection{Predictors
}
5769 \label{sec:predictors
}
For each block, a predictor is formed based on its coding mode and motion vector.
5773 There are three basic types of predictors: the intra predictor, the whole-pixel
5774 predictor, and the half-pixel predictor.
The first is used for all blocks coded in INTRA mode, while all other blocks
5776 use one of the latter two.
5777 The whole-pixel predictor is used if the fractional part of both motion vector
5778 components is zero, otherwise the half-pixel predictor is used.
5780 \subsubsection{The Intra Predictor
}
5781 \label{sub:predintra
}
5783 \paragraph{Input parameters:
} None.
5785 \paragraph{Output parameters:
}\hfill\\*
5786 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5787 \multicolumn{1}{c
}{Name
} &
5788 \multicolumn{1}{c
}{Type
} &
5789 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5790 \multicolumn{1}{c
}{Signed?
} &
5791 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5792 \bitvar{PRED
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5793 8 & No & An $
8\times 8$ array of predictor
5794 values to use for INTRA coded blocks. \\
5795 \bottomrule\end{tabularx
}
5797 \paragraph{Variables used:
}\hfill\\*
5798 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5799 \multicolumn{1}{c
}{Name
} &
5800 \multicolumn{1}{c
}{Type
} &
5801 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5802 \multicolumn{1}{c
}{Signed?
} &
5803 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5804 \locvar{\idx{bx
}} & Integer &
3 & No & The horizontal pixel index in the block. \\
5806 \locvar{\idx{by
}} & Integer &
3 & No & The vertical pixel index in the block. \\
5808 \bottomrule\end{tabularx
}
5811 The intra predictor is nothing more than the constant value $
128$.
5812 This is applied for the sole purpose of centering the range of possible DC
5813 values for INTRA blocks around zero.
5817 For each value of
\locvar{\idx{by
}} from $
0$ to $
7$, inclusive:
5820 For each value of
\locvar{\idx{bx
}} from $
0$ to $
7$, inclusive:
5823 Assign $
\bitvar{PRED
}[\locvar{\idx{by
}}][\locvar{\idx{bx
}}]$ the value $
128$.
5828 \subsubsection{The Whole-Pixel Predictor
}
5829 \label{sub:predfullpel
}
5831 \paragraph{Input parameters:
}\hfill\\*
5832 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5833 \multicolumn{1}{c
}{Name
} &
5834 \multicolumn{1}{c
}{Type
} &
5835 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5836 \multicolumn{1}{c
}{Signed?
} &
5837 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5838 \bitvar{RPW
} & Integer &
20 & No & The width of the current plane of the
5839 reference frame in pixels. \\
5840 \bitvar{RPH
} & Integer &
20 & No & The height of the current plane of the
5841 reference frame in pixels. \\
5842 \bitvar{REFP
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5843 8 & No & A $
\bitvar{RPH
}\times\bitvar{RPW
}$
5844 array containing the contents of the current plane of the reference frame. \\
5845 \bitvar{BX
} & Integer &
20 & No & The horizontal pixel index of the
5846 lower-left corner of the current block. \\
5847 \bitvar{BY
} & Integer &
20 & No & The vertical pixel index of the
5848 lower-left corner of the current block. \\
\bitvar{MVX} & Integer & 5 & No & The horizontal component of the block
 motion vector.
This is always a whole-pixel value. \\
\bitvar{MVY} & Integer & 5 & No & The vertical component of the block
 motion vector.
This is always a whole-pixel value. \\
5855 \bottomrule\end{tabularx
}
5857 \paragraph{Output parameters:
}\hfill\\*
5858 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5859 \multicolumn{1}{c
}{Name
} &
5860 \multicolumn{1}{c
}{Type
} &
5861 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5862 \multicolumn{1}{c
}{Signed?
} &
5863 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5864 \bitvar{PRED
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5865 8 & No & An $
8\times 8$ array of predictor
5866 values to use for INTER coded blocks. \\
5867 \bottomrule\end{tabularx
}
5869 \paragraph{Variables used:
}\hfill\\*
5870 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5871 \multicolumn{1}{c
}{Name
} &
5872 \multicolumn{1}{c
}{Type
} &
5873 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5874 \multicolumn{1}{c
}{Signed?
} &
5875 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
\locvar{\idx{bx}} & Integer & 3 & Yes & The horizontal pixel index in the
 block. \\
\locvar{\idx{by}} & Integer & 3 & Yes & The vertical pixel index in the
 block. \\
\locvar{\idx{rx}} & Integer & 20 & No & The horizontal pixel index in the
 reference frame. \\
\locvar{\idx{ry}} & Integer & 20 & No & The vertical pixel index in the
 reference frame. \\
5884 \bottomrule\end{tabularx
}
5887 The whole pixel predictor simply copies verbatim the contents of the reference
5888 frame pointed to by the block's motion vector.
5889 If the vector points outside the reference frame, then the closest value on the
5890 edge of the reference frame is used instead.
5891 In practice, this is usually implemented by expanding the size of the reference
5892 frame by $
8$ or $
16$ pixels on each side---depending on whether or not the
5893 corresponding axis is subsampled in the current plane---and copying the border
5894 pixels into this region.
\begin{enumerate}
\item
For each value of \locvar{\idx{by}} from $0$ to $7$, inclusive:
\begin{enumerate}
\item
Assign \locvar{\idx{ry}} the value
 $(\bitvar{BY}+\bitvar{MVY}+\locvar{\idx{by}})$.
\item
If \locvar{\idx{ry}} is greater than $(\bitvar{RPH}-1)$, assign
 \locvar{\idx{ry}} the value $(\bitvar{RPH}-1)$.
\item
If \locvar{\idx{ry}} is less than zero, assign \locvar{\idx{ry}} the value
 zero.
\item
For each value of \locvar{\idx{bx}} from $0$ to $7$, inclusive:
\begin{enumerate}
\item
Assign \locvar{\idx{rx}} the value
 $(\bitvar{BX}+\bitvar{MVX}+\locvar{\idx{bx}})$.
\item
If \locvar{\idx{rx}} is greater than $(\bitvar{RPW}-1)$, assign
 \locvar{\idx{rx}} the value $(\bitvar{RPW}-1)$.
\item
If \locvar{\idx{rx}} is less than zero, assign \locvar{\idx{rx}} the value
 zero.
\item
Assign $\bitvar{PRED}[\locvar{\idx{by}}][\locvar{\idx{bx}}]$ the value
 $\bitvar{REFP}[\locvar{\idx{ry}}][\locvar{\idx{rx}}]$.
\end{enumerate}
\end{enumerate}
\end{enumerate}
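
The clamped copy above can be sketched in C as follows.
This is only an illustration: the function names \texttt{clamp\_index} and
 \texttt{pred\_whole\_pixel} are hypothetical, and the reference plane is
 assumed to be stored row-major as $\bitvar{RPH}$ rows of $\bitvar{RPW}$
 bytes.

```c
#include <stdint.h>

/* Clamp v into [0, hi]; mirrors the two "if" steps applied to ry and rx. */
static int clamp_index(int v, int hi) {
    if (v > hi) return hi;
    if (v < 0) return 0;
    return v;
}

/* Hypothetical helper (not part of the specification).
   refp:      row-major plane of rph rows by rpw columns (assumed layout).
   bx0, by0:  BX and BY, the corner of the current block.
   mvx, mvy:  the whole-pixel motion vector, which may be negative. */
static void pred_whole_pixel(const uint8_t *refp, int rpw, int rph,
                             int bx0, int by0, int mvx, int mvy,
                             uint8_t pred[8][8]) {
    for (int by = 0; by < 8; by++) {
        int ry = clamp_index(by0 + mvy + by, rph - 1);
        for (int bx = 0; bx < 8; bx++) {
            int rx = clamp_index(bx0 + mvx + bx, rpw - 1);
            pred[by][bx] = refp[ry * rpw + rx];
        }
    }
}
```

Clamping each index gives the same result as reading from a reference frame
 whose border pixels have been replicated outwards, at the cost of two
 comparisons per coordinate.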
5928 \subsubsection{The Half-Pixel Predictor
}
5929 \label{sub:predhalfpel
}
5931 \paragraph{Input parameters:
}\hfill\\*
5932 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5933 \multicolumn{1}{c
}{Name
} &
5934 \multicolumn{1}{c
}{Type
} &
5935 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5936 \multicolumn{1}{c
}{Signed?
} &
5937 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5938 \bitvar{RPW
} & Integer &
20 & No & The width of the current plane of the
5939 reference frame in pixels. \\
5940 \bitvar{RPH
} & Integer &
20 & No & The height of the current plane of the
5941 reference frame in pixels. \\
5942 \bitvar{REFP
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5943 8 & No & A $
\bitvar{RPH
}\times\bitvar{RPW
}$
5944 array containing the contents of the current plane of the reference frame. \\
5945 \bitvar{BX
} & Integer &
20 & No & The horizontal pixel index of the
5946 lower-left corner of the current block. \\
5947 \bitvar{BY
} & Integer &
20 & No & The vertical pixel index of the
5948 lower-left corner of the current block. \\
5949 \bitvar{MVX
} & Integer &
5 & No & The horizontal component of the first
5950 whole-pixel motion vector. \\
5951 \bitvar{MVY
} & Integer &
5 & No & The vertical component of the first
5952 whole-pixel motion vector. \\
5953 \bitvar{MVX2
} & Integer &
5 & No & The horizontal component of the second
5954 whole-pixel motion vector. \\
5955 \bitvar{MVY2
} & Integer &
5 & No & The vertical component of the second
5956 whole-pixel motion vector. \\
5957 \bottomrule\end{tabularx
}
5959 \paragraph{Output parameters:
}\hfill\\*
5960 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5961 \multicolumn{1}{c
}{Name
} &
5962 \multicolumn{1}{c
}{Type
} &
5963 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5964 \multicolumn{1}{c
}{Signed?
} &
5965 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
5966 \bitvar{PRED
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
5967 8 & No & An $
8\times 8$ array of predictor
5968 values to use for INTER coded blocks. \\
5969 \bottomrule\end{tabularx
}
5971 \paragraph{Variables used:
}\hfill\\*
5972 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
5973 \multicolumn{1}{c
}{Name
} &
5974 \multicolumn{1}{c
}{Type
} &
5975 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
5976 \multicolumn{1}{c
}{Signed?
} &
5977 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
\locvar{\idx{bx}} & Integer & 3 & Yes & The horizontal pixel index in the
 block. \\
\locvar{\idx{by}} & Integer & 3 & Yes & The vertical pixel index in the
 block. \\
\locvar{\idx{rx1}} & Integer & 20 & No & The first horizontal pixel index in
 the reference frame. \\
\locvar{\idx{ry1}} & Integer & 20 & No & The first vertical pixel index in the
 reference frame. \\
5986 \locvar{\idx{rx2
}} & Integer &
20 & No & The second horizontal pixel index in
5987 the reference frame. \\
5988 \locvar{\idx{ry2
}} & Integer &
20 & No & The second vertical pixel index in
5989 the reference frame. \\
5990 \bottomrule\end{tabularx
}
5993 If one or both of the components of the block motion vector is not a
5994 whole-pixel value, then the half-pixel predictor is used.
5995 The half-pixel predictor converts the fractional motion vector into two
5996 whole-pixel motion vectors.
5997 The first is formed by truncating the values of each component towards zero,
5998 and the second is formed by truncating them away from zero.
5999 The contributions from the reference frame at the locations pointed to by each
6000 vector are averaged, truncating towards negative infinity.
Only two samples from the reference frame contribute to each predictor value,
 even if both components of the motion vector have non-zero fractional
 components.
6005 Motion vector components with quarter-pixel accuracy in the chroma planes are
6006 treated exactly the same as those with half-pixel accuracy.
6007 Any non-zero fractional part gets rounded one way in the first vector, and the
6008 other way in the second.
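
As an illustration of this split, the sketch below derives the two whole-pixel
 values for a single vector component.
The function name \texttt{split\_mv\_component} and the parameter
 \texttt{denom} (the number of fractional units per pixel, assumed to be $2$
 for half-pixel and $4$ for quarter-pixel components) are hypothetical.

```c
/* Hypothetical helper: split one fractional motion vector component into
   the two whole-pixel components used by the half-pixel predictor.
   mv is expressed in units of 1/denom pixel (assumption: denom is 2 or 4). */
static void split_mv_component(int mv, int denom, int *first, int *second) {
    int whole = mv / denom;  /* C99 integer division truncates towards zero. */
    *first = whole;
    if (mv % denom != 0) {
        /* Non-zero fractional part: step one pixel away from zero. */
        *second = whole + (mv > 0 ? 1 : -1);
    } else {
        /* Whole-pixel component: both vectors agree. */
        *second = whole;
    }
}
```

For example, a half-pixel component of $-3$ (that is, $-1.5$ pixels) yields
 $-1$ for the first vector and $-2$ for the second.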
\begin{enumerate}
\item
For each value of \locvar{\idx{by}} from $0$ to $7$, inclusive:
\begin{enumerate}
\item
Assign \locvar{\idx{ry1}} the value
 $(\bitvar{BY}+\bitvar{MVY}+\locvar{\idx{by}})$.
\item
If \locvar{\idx{ry1}} is greater than $(\bitvar{RPH}-1)$, assign
 \locvar{\idx{ry1}} the value $(\bitvar{RPH}-1)$.
\item
If \locvar{\idx{ry1}} is less than zero, assign \locvar{\idx{ry1}} the value
 zero.
\item
Assign \locvar{\idx{ry2}} the value
 $(\bitvar{BY}+\bitvar{MVY2}+\locvar{\idx{by}})$.
\item
If \locvar{\idx{ry2}} is greater than $(\bitvar{RPH}-1)$, assign
 \locvar{\idx{ry2}} the value $(\bitvar{RPH}-1)$.
\item
If \locvar{\idx{ry2}} is less than zero, assign \locvar{\idx{ry2}} the value
 zero.
\item
For each value of \locvar{\idx{bx}} from $0$ to $7$, inclusive:
\begin{enumerate}
\item
Assign \locvar{\idx{rx1}} the value
 $(\bitvar{BX}+\bitvar{MVX}+\locvar{\idx{bx}})$.
\item
If \locvar{\idx{rx1}} is greater than $(\bitvar{RPW}-1)$, assign
 \locvar{\idx{rx1}} the value $(\bitvar{RPW}-1)$.
\item
If \locvar{\idx{rx1}} is less than zero, assign \locvar{\idx{rx1}} the value
 zero.
\item
Assign \locvar{\idx{rx2}} the value
 $(\bitvar{BX}+\bitvar{MVX2}+\locvar{\idx{bx}})$.
\item
If \locvar{\idx{rx2}} is greater than $(\bitvar{RPW}-1)$, assign
 \locvar{\idx{rx2}} the value $(\bitvar{RPW}-1)$.
\item
If \locvar{\idx{rx2}} is less than zero, assign \locvar{\idx{rx2}} the value
 zero.
\item
Assign $\bitvar{PRED}[\locvar{\idx{by}}][\locvar{\idx{bx}}]$ the value
 $(\bitvar{REFP}[\locvar{\idx{ry1}}][\locvar{\idx{rx1}}]+
 \bitvar{REFP}[\locvar{\idx{ry2}}][\locvar{\idx{rx2}}])>>1$.
\end{enumerate}
\end{enumerate}
\end{enumerate}
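
A possible C rendering of the averaging procedure is sketched below.
The names \texttt{clamp\_index} and \texttt{pred\_half\_pixel} are
 hypothetical, and a row-major reference plane is assumed.
Because both samples are non-negative, the right shift of their sum is the
 same as a division by two truncated towards negative infinity.

```c
#include <stdint.h>

/* Clamp v into [0, hi], as done for each of ry1, ry2, rx1 and rx2. */
static int clamp_index(int v, int hi) {
    return v > hi ? hi : (v < 0 ? 0 : v);
}

/* Hypothetical helper (not part of the specification).
   (mvx, mvy) and (mvx2, mvy2) are the two whole-pixel vectors derived
   from the fractional block motion vector. */
static void pred_half_pixel(const uint8_t *refp, int rpw, int rph,
                            int bx0, int by0,
                            int mvx, int mvy, int mvx2, int mvy2,
                            uint8_t pred[8][8]) {
    for (int by = 0; by < 8; by++) {
        int ry1 = clamp_index(by0 + mvy + by, rph - 1);
        int ry2 = clamp_index(by0 + mvy2 + by, rph - 1);
        for (int bx = 0; bx < 8; bx++) {
            int rx1 = clamp_index(bx0 + mvx + bx, rpw - 1);
            int rx2 = clamp_index(bx0 + mvx2 + bx, rpw - 1);
            int sum = refp[ry1 * rpw + rx1] + refp[ry2 * rpw + rx2];
            pred[by][bx] = (uint8_t)(sum >> 1);  /* average, rounded down */
        }
    }
}
```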
6063 \subsection{Dequantization
}
6066 \paragraph{Input parameters:
}\hfill\\*
6067 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6068 \multicolumn{1}{c
}{Name
} &
6069 \multicolumn{1}{c
}{Type
} &
6070 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6071 \multicolumn{1}{c
}{Signed?
} &
6072 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6073 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6074 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
6075 quantized DCT coefficient values for each block in zig-zag order. \\
6076 \bitvar{ACSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
6077 16 & No & A
64-element array of scale values for
6078 AC coefficients for each
\qi\ value. \\
6079 \bitvar{DCSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
6080 16 & No & A
64-element array of scale values for
6081 the DC coefficient for each
\qi\ value. \\
6082 \bitvar{BMS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
6083 8 & No & A $
\bitvar{NBMS
}\times 64$ array
6084 containing the base matrices. \\
6085 \bitvar{NQRS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
6086 6 & No & A $
2\times 3$ array containing the
6087 number of quant ranges for a given
\qti\ and
\pli, respectively.
6088 This is at most $
63$. \\
6089 \bitvar{QRSIZES
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
6090 6 & No & A $
2\times 3\times 63$ array of the
6091 sizes of each quant range for a given
\qti\ and
\pli, respectively.
6092 Only the first $
\bitvar{NQRS
}[\qti][\pli]$ values are used. \\
6093 \bitvar{QRBMIS
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
6094 9 & No & A $
2\times 3\times 64$ array of the
6095 \bmi's used for each quant range for a given
\qti\ and
\pli, respectively.
6096 Only the first $(
\bitvar{NQRS
}[\qti][\pli]+
1)$ values are used. \\
6097 \bitvar{\qti} & Integer &
1 & No & A quantization type index.
6098 See Table~
\ref{tab:quant-types
}.\\
6099 \bitvar{\pli} & Integer &
2 & No & A
color plane index.
6100 See Table~
\ref{tab:
color-planes
}.\\
\bitvar{\idx{qi0}} & Integer & 6 & No & The quantization index of the DC
 coefficient. \\
\bitvar{\qi} & Integer & 6 & No & The quantization index of the AC
 coefficients. \\
\bitvar{\bi} & Integer & 36 & No & The index of the current block. \\
}
6109 \paragraph{Output parameters:
}\hfill\\*
6110 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6111 \multicolumn{1}{c
}{Name
} &
6112 \multicolumn{1}{c
}{Type
} &
6113 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6114 \multicolumn{1}{c
}{Signed?
} &
6115 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6116 \bitvar{DQC
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6117 14 & Yes & A $
64$-element array of dequantized
6118 DCT coefficients in natural order (cf. Section~
\ref{sec:dct-coeffs
}). \\
6119 \bottomrule\end{tabularx
}
6121 \paragraph{Variables used:
}\hfill\\*
6122 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6123 \multicolumn{1}{c
}{Name
} &
6124 \multicolumn{1}{c
}{Type
} &
6125 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6126 \multicolumn{1}{c
}{Signed?
} &
6127 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6128 \locvar{QMAT
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
6129 16 & No & A
64-element array of quantization
6130 values for each DCT coefficient in natural order. \\
\locvar{\ci} & Integer & 6 & No & The DCT coefficient index in natural
 order. \\
\locvar{\zzi} & Integer & 6 & No & The DCT coefficient index in zig-zag
 order. \\
6135 \locvar{C
} & Integer &
29 & Yes & A single dequantized coefficient. \\
6136 \bottomrule\end{tabularx
}
6139 This procedure takes the quantized DCT coefficient values in zig-zag order for
6140 a single block---after DC prediction has been undone---and returns the
6141 dequantized values in natural order.
6142 If large coefficient values are decoded for coarsely quantized coefficients,
6143 the resulting dequantized value can be significantly larger than
16 bits.
6144 Such a coefficient is truncated to a signed
16-bit representation by discarding
the higher-order bits of its two's complement representation.
6147 Although this procedure recomputes the quantization matrices from the
6148 parameters in the setup header for each block, there are at most six different
6149 ones used for each
color plane.
6150 An efficient implementation could compute them once in advance.
\begin{enumerate}
\item
Using \bitvar{ACSCALE}, \bitvar{DCSCALE}, \bitvar{BMS}, \bitvar{NQRS},
 \bitvar{QRSIZES}, \bitvar{QRBMIS}, \bitvar{\qti}, \bitvar{\pli}, and
 \bitvar{\idx{qi0}}, use the procedure given in Section~\ref{sub:quant-mat} to
 compute the DC quantization matrix \locvar{QMAT}.
\item
Assign \locvar{C} the value
 $\bitvar{COEFFS}[\bitvar{\bi}][0]*\locvar{QMAT}[0]$.
\item
Truncate \locvar{C} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{DQC}[0]$ the value \locvar{C}.
\item
Using \bitvar{ACSCALE}, \bitvar{DCSCALE}, \bitvar{BMS}, \bitvar{NQRS},
 \bitvar{QRSIZES}, \bitvar{QRBMIS}, \bitvar{\qti}, \bitvar{\pli}, and
 \bitvar{\qi}, use the procedure given in Section~\ref{sub:quant-mat} to
 compute the AC quantization matrix \locvar{QMAT}.
\item
For each value of \locvar{\ci} from 1 to 63, inclusive:
\begin{enumerate}
\item
Assign \locvar{\zzi} the index in zig-zag order corresponding to \locvar{\ci}.
E.g., the value at row $(\locvar{\ci}//8)$ and column $(\locvar{\ci}\%8)$ in
 Figure~\ref{tab:zig-zag}.
\item
Assign \locvar{C} the value
 $\bitvar{COEFFS}[\bitvar{\bi}][\locvar{\zzi}]*\locvar{QMAT}[\locvar{\ci}]$.
\item
Truncate \locvar{C} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{DQC}[\locvar{\ci}]$ the value \locvar{C}.
\end{enumerate}
\end{enumerate}
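
The multiply-and-truncate loop can be sketched in C as shown below.
This is a minimal sketch under stated assumptions: the two quantization
 matrices are taken to be precomputed by the caller, the zig-zag mapping is
 passed in as a table rather than reproduced here, and the names
 \texttt{trunc16} and \texttt{dequantize\_block} are hypothetical.

```c
#include <stdint.h>

/* Keep the low 16 bits of v and reinterpret them as a signed value,
   i.e. discard the higher-order bits of the two's complement form. */
static int16_t trunc16(int32_t v) {
    v &= 0xFFFF;
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

/* Hypothetical helper (not part of the specification).
   coeffs:  the 64 quantized coefficients of one block, in zig-zag order.
   dc_qmat: matrix computed for qi0; only element 0 is used.
   ac_qmat: matrix computed for qi; elements 1..63 are used.
   zigzag:  zigzag[ci] is the zig-zag index of natural index ci
            (assumed to be supplied by the caller).
   dqc:     receives the dequantized coefficients in natural order. */
static void dequantize_block(const int16_t coeffs[64],
                             const uint16_t dc_qmat[64],
                             const uint16_t ac_qmat[64],
                             const uint8_t zigzag[64],
                             int16_t dqc[64]) {
    dqc[0] = trunc16((int32_t)coeffs[0] * dc_qmat[0]);
    for (int ci = 1; ci < 64; ci++) {
        int zzi = zigzag[ci];
        dqc[ci] = trunc16((int32_t)coeffs[zzi] * ac_qmat[ci]);
    }
}
```

A 16-bit coefficient multiplied by a 16-bit scale always fits in a signed
 32-bit product, so the only loss of information is the explicit truncation.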
6189 \subsection{The Inverse DCT
}
6191 The
2D inverse DCT is separated into two applications of the
1D inverse DCT.
The transform is first applied to each row, and then applied to each column of
 the result.
6195 Each application of the
1D inverse DCT scales the values by a factor of two
6196 relative to the orthonormal version of the transform, for a total scale factor
6197 of four for the
2D transform.
6198 It is assumed that a similar scale factor is applied during the forward DCT
6199 used in the encoder, so that a division by
16 is required after the transform
6200 has been applied in both directions.
6201 The inclusion of this scale factor allows the integerized transform to operate
6202 with increased precision.
6203 All divisions throughout the transform are implemented with right shifts.
6204 Only the final division by $
16$ is rounded, with ties rounded towards positive
 infinity.
6207 All intermediate values are truncated to a
32-bit signed representation by
6208 discarding any higher-order bits in their two's complement representation.
6209 The final output of each
1D transform is truncated to
16-bits in the same
6211 In practice, if the high word of a $
16\times 16$ bit multiplication can be
6212 obtained directly,
16 bits is sufficient for every calculation except scaling
6214 Here we specify truncating to
16 bits before the multiplication to simplify
6215 implementations using hardware or common SIMD instruction sets.
6217 Note that if
16-bit registers are used, overflow in the additions and
6218 subtractions should be handled using
\textit{unsaturated
} arithmetic.
6219 That is, the high-order bits should be discarded and the low-order bits
6220 retained, instead of clamping the result to the maximum or minimum value.
6221 This allows the maximum flexibility in re-ordering these instructions without
6222 deviating from this specification.
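
The difference between the two overflow behaviours can be illustrated with a
 short C sketch.
The helper names are hypothetical; the point is only that wrapping additions
 give the same final result when intermediate sums overflow, whereas
 clamping does not.

```c
#include <stdint.h>

/* Keep the low 16 bits of v, reinterpreted as a signed value. */
static int16_t wrap16(int32_t v) {
    v &= 0xFFFF;
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

/* Unsaturated addition: high-order bits are discarded. */
static int16_t add16_unsaturated(int16_t a, int16_t b) {
    return wrap16((int32_t)a + b);
}

/* Saturated addition, shown only for contrast: the result is clamped. */
static int16_t add16_saturated(int16_t a, int16_t b) {
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}
```

Computing $30000+10000-20000$ with the unsaturated helper yields $20000$
 even though the intermediate sum overflows; the saturated helper yields
 $12767$, which depends on the order of the operations.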
6224 The
1D transform can only overflow if input coefficients larger than $
\pm 6201$
6226 However, the result of applying the
2D forward transform on pixel values in the
6227 range $-
255\ldots 255$ can be as large as $
\pm 8157$ due to the scale factor
6228 of four that is applied, and quantization errors could make this even larger.
6229 Therefore, the coefficients cannot simply be clamped into a valid range before
\subsubsection{The 1D Inverse DCT}
\label{sub:1d-idct}
6235 \paragraph{Input parameters:
}\hfill\\*
6236 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6237 \multicolumn{1}{c
}{Name
} &
6238 \multicolumn{1}{c
}{Type
} &
6239 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6240 \multicolumn{1}{c
}{Signed?
} &
6241 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6242 \bitvar{Y
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
16 & Yes & An 8-element array of DCT
 coefficients. \\
6245 \bottomrule\end{tabularx
}
6247 \paragraph{Output parameters:
}\hfill\\*
6248 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6249 \multicolumn{1}{c
}{Name
} &
6250 \multicolumn{1}{c
}{Type
} &
6251 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6252 \multicolumn{1}{c
}{Signed?
} &
6253 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6254 \bitvar{X
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6255 16 & Yes & An
8-element array of output values. \\
6256 \bottomrule\end{tabularx
}
6258 \paragraph{Variables used:
}\hfill\\*
6259 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6260 \multicolumn{1}{c
}{Name
} &
6261 \multicolumn{1}{c
}{Type
} &
6262 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6263 \multicolumn{1}{c
}{Signed?
} &
6264 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6265 \locvar{T
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6266 32 & Yes & An
8-element array containing the
6267 current value of each signal line. \\
6268 \locvar{R
} & Integer &
32 & Yes & A temporary value. \\
6269 \bottomrule\end{tabularx
}
6272 A compliant decoder MUST use the exact implementation of the inverse DCT
6273 defined in this specification.
6274 Some operations may be re-ordered, but the result must be precisely equivalent.
6275 This is a design decision that limits some avenues of decoder optimization, but
6276 prevents any drift in the prediction loop.
6277 Theora uses a
16-bit integerized approximation of the
8-point
1D inverse DCT
6278 based on the Chen factorization
\cite{CSF77
}.
6279 It requires
16 multiplications and
26 additions and subtractions.
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=\textwidth]{idct}
\end{center}
\caption{Signal Flow Graph for the 1D Inverse DCT}
\label{fig:idct}
\end{figure}
6289 A signal flow graph of the transformation is presented in
6290 Figure~
\ref{fig:idct
}.
6291 This graph provides a good visualization of which parts of the transform are
6293 Time increases from left to right.
6295 Each signal line is involved in an operation where the line is marked with a
6296 dot $
\cdot$ or a circled plus sign $
\oplus$.
6297 The constants $
\locvar{C
}i$ and $
\locvar{S
}j$ are the
16-bit integer
6298 approximations of $
\cos(
\frac{i
\pi}{16})$ and $
\sin(
\frac{j
\pi}{16})$ listed
6299 in Table~
\ref{tab:dct-consts
}.
6300 When they appear next to a signal line, the value on that line is scaled by the
6302 A circled minus sign $
\ominus$ next to a signal line indicates that the value
6303 on that line is negated.
6305 Operations on a single signal path through the graph cannot be reordered, but
6306 operations on different paths may be, or may be executed in parallel.
6307 Different graphs may be obtainable using the associative, commutative, and
6308 distributive properties of unsaturated arithmetic.
6309 The column of numbers on the left represents an initial permutation of the
6310 input DCT coefficients.
6311 The column on the right represents the unpermuted output.
6312 One can be obtained by bit-reversing the
3-bit binary representation of the
\begin{table}[htbp]
\begin{center}
\begin{tabular}{llr}\toprule
$\locvar{C}i$ & $\locvar{S}j$ & Value \\\midrule
$\locvar{C1}$ & $\locvar{S7}$ & $64277$ \\
$\locvar{C2}$ & $\locvar{S6}$ & $60547$ \\
$\locvar{C3}$ & $\locvar{S5}$ & $54491$ \\
$\locvar{C4}$ & $\locvar{S4}$ & $46341$ \\
$\locvar{C5}$ & $\locvar{S3}$ & $36410$ \\
$\locvar{C6}$ & $\locvar{S2}$ & $25080$ \\
$\locvar{C7}$ & $\locvar{S1}$ & $12785$ \\
\bottomrule\end{tabular}
\end{center}
\caption{16-bit Approximations of Sines and Cosines}
\label{tab:dct-consts}
\end{table}
\begin{enumerate}
\item
Assign $\locvar{T}[0]$ the value $\bitvar{Y}[0]+\bitvar{Y}[4]$.
\item
Truncate $\locvar{T}[0]$ to a 16-bit representation by dropping any
 higher-order bits.
\item
Assign $\locvar{T}[0]$ the value $\locvar{C4}*\locvar{T}[0]>>16$.
\item
Assign $\locvar{T}[1]$ the value $\bitvar{Y}[0]-\bitvar{Y}[4]$.
\item
Truncate $\locvar{T}[1]$ to a 16-bit representation by dropping any
 higher-order bits.
\item
Assign $\locvar{T}[1]$ the value $\locvar{C4}*\locvar{T}[1]>>16$.
\item
Assign $\locvar{T}[2]$ the value $(\locvar{C6}*\bitvar{Y}[2]>>16)-
 (\locvar{S6}*\bitvar{Y}[6]>>16)$.
\item
Assign $\locvar{T}[3]$ the value $(\locvar{S6}*\bitvar{Y}[2]>>16)+
 (\locvar{C6}*\bitvar{Y}[6]>>16)$.
\item
Assign $\locvar{T}[4]$ the value $(\locvar{C7}*\bitvar{Y}[1]>>16)-
 (\locvar{S7}*\bitvar{Y}[7]>>16)$.
\item
Assign $\locvar{T}[5]$ the value $(\locvar{C3}*\bitvar{Y}[5]>>16)-
 (\locvar{S3}*\bitvar{Y}[3]>>16)$.
\item
Assign $\locvar{T}[6]$ the value $(\locvar{S3}*\bitvar{Y}[5]>>16)+
 (\locvar{C3}*\bitvar{Y}[3]>>16)$.
\item
Assign $\locvar{T}[7]$ the value $(\locvar{S7}*\bitvar{Y}[1]>>16)+
 (\locvar{C7}*\bitvar{Y}[7]>>16)$.
\item
Assign \locvar{R} the value $\locvar{T}[4]+\locvar{T}[5]$.
\item
Assign $\locvar{T}[5]$ the value $\locvar{T}[4]-\locvar{T}[5]$.
\item
Truncate $\locvar{T}[5]$ to a 16-bit representation by dropping any
 higher-order bits.
\item
Assign $\locvar{T}[5]$ the value $\locvar{C4}*\locvar{T}[5]>>16$.
\item
Assign $\locvar{T}[4]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[7]+\locvar{T}[6]$.
\item
Assign $\locvar{T}[6]$ the value $\locvar{T}[7]-\locvar{T}[6]$.
\item
Truncate $\locvar{T}[6]$ to a 16-bit representation by dropping any
 higher-order bits.
\item
Assign $\locvar{T}[6]$ the value $\locvar{C4}*\locvar{T}[6]>>16$.
\item
Assign $\locvar{T}[7]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[0]+\locvar{T}[3]$.
\item
Assign $\locvar{T}[3]$ the value $\locvar{T}[0]-\locvar{T}[3]$.
\item
Assign $\locvar{T}[0]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[1]+\locvar{T}[2]$.
\item
Assign $\locvar{T}[2]$ the value $\locvar{T}[1]-\locvar{T}[2]$.
\item
Assign $\locvar{T}[1]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[6]+\locvar{T}[5]$.
\item
Assign $\locvar{T}[5]$ the value $\locvar{T}[6]-\locvar{T}[5]$.
\item
Assign $\locvar{T}[6]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[0]+\locvar{T}[7]$.
\item
Truncate \locvar{R} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{X}[0]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[1]+\locvar{T}[6]$.
\item
Truncate \locvar{R} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{X}[1]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[2]+\locvar{T}[5]$.
\item
Truncate \locvar{R} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{X}[2]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[3]+\locvar{T}[4]$.
\item
Truncate \locvar{R} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{X}[3]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[3]-\locvar{T}[4]$.
\item
Truncate \locvar{R} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{X}[4]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[2]-\locvar{T}[5]$.
\item
Truncate \locvar{R} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{X}[5]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[1]-\locvar{T}[6]$.
\item
Truncate \locvar{R} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{X}[6]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[0]-\locvar{T}[7]$.
\item
Truncate \locvar{R} to a 16-bit representation by dropping any higher-order
 bits.
\item
Assign $\bitvar{X}[7]$ the value \locvar{R}.
\end{enumerate}
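
For illustration, the steps of the 1D inverse transform can be sketched in C
 as follows.
The names \texttt{idct8\_1d}, \texttt{trunc16}, \texttt{asr16}, and
 \texttt{mul} are hypothetical, the constants are those of
 Table~\ref{tab:dct-consts}, and the right shift is assumed to round towards
 negative infinity.
The sketch avoids relying on the behaviour of \texttt{>>} for negative
 operands, which C leaves implementation-defined.

```c
#include <stdint.h>

/* 16-bit cosine and sine approximations used by the procedure. */
enum { C3 = 54491, C4 = 46341, C6 = 25080, C7 = 12785,
       S3 = 36410, S6 = 60547, S7 = 64277 };

/* Keep the low 16 bits of v, reinterpreted as a signed value. */
static int32_t trunc16(int32_t v) {
    v &= 0xFFFF;
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Portable v >> 16 rounding towards negative infinity (assumed semantics). */
static int32_t asr16(int32_t v) {
    return v >= 0 ? v >> 16 : -((-v + 0xFFFF) >> 16);
}

/* Scale a 16-bit value v by the constant c: (c * v) >> 16. */
static int32_t mul(int32_t c, int32_t v) {
    return asr16(c * v);
}

/* Hypothetical sketch of the 8-point 1D inverse DCT. */
static void idct8_1d(const int16_t y[8], int16_t x[8]) {
    int32_t t[8], r;
    t[0] = mul(C4, trunc16(y[0] + y[4]));
    t[1] = mul(C4, trunc16(y[0] - y[4]));
    t[2] = mul(C6, y[2]) - mul(S6, y[6]);
    t[3] = mul(S6, y[2]) + mul(C6, y[6]);
    t[4] = mul(C7, y[1]) - mul(S7, y[7]);
    t[5] = mul(C3, y[5]) - mul(S3, y[3]);
    t[6] = mul(S3, y[5]) + mul(C3, y[3]);
    t[7] = mul(S7, y[1]) + mul(C7, y[7]);
    /* Butterflies; the differences are truncated and scaled by C4. */
    r = t[4] + t[5];
    t[5] = mul(C4, trunc16(t[4] - t[5]));
    t[4] = r;
    r = t[7] + t[6];
    t[6] = mul(C4, trunc16(t[7] - t[6]));
    t[7] = r;
    r = t[0] + t[3]; t[3] = t[0] - t[3]; t[0] = r;
    r = t[1] + t[2]; t[2] = t[1] - t[2]; t[1] = r;
    r = t[6] + t[5]; t[5] = t[6] - t[5]; t[6] = r;
    /* Final butterflies, each output truncated to 16 bits. */
    x[0] = (int16_t)trunc16(t[0] + t[7]);
    x[1] = (int16_t)trunc16(t[1] + t[6]);
    x[2] = (int16_t)trunc16(t[2] + t[5]);
    x[3] = (int16_t)trunc16(t[3] + t[4]);
    x[4] = (int16_t)trunc16(t[3] - t[4]);
    x[5] = (int16_t)trunc16(t[2] - t[5]);
    x[6] = (int16_t)trunc16(t[1] - t[6]);
    x[7] = (int16_t)trunc16(t[0] - t[7]);
}
```

Every product involves a constant below $2^{16}$ and a value that fits in
 16 bits, so a signed 32-bit intermediate is wide enough in this sketch.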
6464 \subsubsection{The
2D Inverse DCT
}
6467 \paragraph{Input parameters:
}\hfill\\*
6468 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6469 \multicolumn{1}{c
}{Name
} &
6470 \multicolumn{1}{c
}{Type
} &
6471 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6472 \multicolumn{1}{c
}{Signed?
} &
6473 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6474 \bitvar{DQC
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6475 14 & Yes & A $
64$-element array of dequantized
6476 DCT coefficients in natural order (cf. Section~
\ref{sec:dct-coeffs
}). \\
6477 \bottomrule\end{tabularx
}
6479 \paragraph{Output parameters:
}\hfill\\*
6480 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6481 \multicolumn{1}{c
}{Name
} &
6482 \multicolumn{1}{c
}{Type
} &
6483 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6484 \multicolumn{1}{c
}{Signed?
} &
6485 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6486 \bitvar{RES
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6487 16 & Yes & An $
8\times 8$ array containing the
6488 decoded residual for the current block. \\
6489 \bottomrule\end{tabularx
}
6491 \paragraph{Variables used:
}\hfill\\*
6492 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6493 \multicolumn{1}{c
}{Name
} &
6494 \multicolumn{1}{c
}{Type
} &
6495 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6496 \multicolumn{1}{c
}{Signed?
} &
6497 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6498 \locvar{\ci} & Integer &
3 & No & The column index. \\
6499 \locvar{\ri} & Integer &
3 & No & The row index. \\
\locvar{Y} & \multicolumn{1}{p{40pt}}{Integer Array} &
 16 & Yes & An 8-element array of 1D iDCT input values. \\
\locvar{X} & \multicolumn{1}{p{40pt}}{Integer Array} &
 16 & Yes & An 8-element array of 1D iDCT output values. \\
6506 \bottomrule\end{tabularx
}
6509 This procedure applies the
1D inverse DCT transform
16 times to a block of
6510 dequantized coefficients: once for each of the
8 rows, and once for each of
6511 the
8 columns of the result.
6512 Note that the coordinate system used for the columns is the same right-handed
6513 coordinate system used by the rest of Theora.
6514 Thus, the column is indexed from bottom to top, not top to bottom.
The final values are divided by sixteen, rounding with ties rounded towards
 positive infinity.
\begin{enumerate}
\item
For each value of \locvar{\ri} from 0 to 7:
\begin{enumerate}
\item
For each value of \locvar{\ci} from 0 to 7:
\begin{enumerate}
\item
Assign $\locvar{Y}[\locvar{\ci}]$ the value
 $\bitvar{DQC}[\locvar{\ri}*8+\locvar{\ci}]$.
\end{enumerate}
\item
Compute \locvar{X}, the 1D inverse DCT of \locvar{Y} using the procedure
 described in Section~\ref{sub:1d-idct}.
\item
For each value of \locvar{\ci} from 0 to 7:
\begin{enumerate}
\item
Assign $\bitvar{RES}[\locvar{\ri}][\locvar{\ci}]$ the value
 $\locvar{X}[\locvar{\ci}]$.
\end{enumerate}
\end{enumerate}
\item
For each value of \locvar{\ci} from 0 to 7:
\begin{enumerate}
\item
For each value of \locvar{\ri} from 0 to 7:
\begin{enumerate}
\item
Assign $\locvar{Y}[\locvar{\ri}]$ the value
 $\bitvar{RES}[\locvar{\ri}][\locvar{\ci}]$.
\end{enumerate}
\item
Compute \locvar{X}, the 1D inverse DCT of \locvar{Y} using the procedure
 described in Section~\ref{sub:1d-idct}.
\item
For each value of \locvar{\ri} from 0 to 7:
\begin{enumerate}
\item
Assign $\bitvar{RES}[\locvar{\ri}][\locvar{\ci}]$ the value
 $(\locvar{X}[\locvar{\ri}]+8)>>4$.
\end{enumerate}
\end{enumerate}
\end{enumerate}
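
The row and column passes can be sketched in C as a small driver around a 1D
 transform.
In this sketch the 1D transform is passed in as a function pointer so that the
 block stays self-contained; the names \texttt{idct1d\_fn},
 \texttt{idct8x8}, and \texttt{passthrough\_1d} are hypothetical, and
 \texttt{passthrough\_1d} is only a stand-in for exercising the driver, not a
 DCT.

```c
#include <stdint.h>

/* Signature assumed for a 1D inverse DCT: 8 inputs y, 8 outputs x. */
typedef void (*idct1d_fn)(const int16_t y[8], int16_t x[8]);

/* Stand-in that copies its input; useful only for testing the driver. */
static void passthrough_1d(const int16_t y[8], int16_t x[8]) {
    for (int i = 0; i < 8; i++) x[i] = y[i];
}

/* Hypothetical driver: rows first, then columns of the result, with the
   final (X + 8) >> 4 rounding applied after the column pass only. */
static void idct8x8(const int16_t dqc[64], int16_t res[8][8],
                    idct1d_fn idct1d) {
    int16_t y[8], x[8];
    for (int ri = 0; ri < 8; ri++) {
        for (int ci = 0; ci < 8; ci++) y[ci] = dqc[ri * 8 + ci];
        idct1d(y, x);
        for (int ci = 0; ci < 8; ci++) res[ri][ci] = x[ci];
    }
    for (int ci = 0; ci < 8; ci++) {
        for (int ri = 0; ri < 8; ri++) y[ri] = res[ri][ci];
        idct1d(y, x);
        for (int ri = 0; ri < 8; ri++) {
            int32_t v = (int32_t)x[ri] + 8;
            /* Shift rounding towards negative infinity, written portably. */
            res[ri][ci] = (int16_t)(v >= 0 ? v >> 4 : -((-v + 15) >> 4));
        }
    }
}
```

Adding $8$ before the shift is what makes exact ties round towards positive
 infinity: an input of $8$ produces $1$, while an input of $-8$ produces $0$.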
6563 \subsubsection{The
1D Forward DCT (Non-Normative)
}
6565 \paragraph{Input parameters:
}\hfill\\*
6566 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6567 \multicolumn{1}{c
}{Name
} &
6568 \multicolumn{1}{c
}{Type
} &
6569 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6570 \multicolumn{1}{c
}{Signed?
} &
6571 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6572 \bitvar{X
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6573 14 & Yes & An
8-element array of input values. \\
6574 \bottomrule\end{tabularx
}
6576 \paragraph{Output parameters:
}\hfill\\*
6577 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6578 \multicolumn{1}{c
}{Name
} &
6579 \multicolumn{1}{c
}{Type
} &
6580 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6581 \multicolumn{1}{c
}{Signed?
} &
6582 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6583 \bitvar{Y
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
16 & Yes & An 8-element array of DCT
 coefficients. \\
6586 \bottomrule\end{tabularx
}
6588 \paragraph{Variables used:
}\hfill\\*
6589 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6590 \multicolumn{1}{c
}{Name
} &
6591 \multicolumn{1}{c
}{Type
} &
6592 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6593 \multicolumn{1}{c
}{Signed?
} &
6594 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6595 \locvar{T
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6596 16 & Yes & An
8-element array containing the
6597 current value of each signal line. \\
6598 \locvar{R
} & Integer &
16 & Yes & A temporary value. \\
6599 \bottomrule\end{tabularx
}
6602 The forward transform used in the encoder is not mandated by this standard as
6604 Precise equivalence in the inverse transform alone is all that is required to
6605 guarantee that there is no mismatch in the prediction loop between encoder and
6606 any compliant decoder implementation.
6607 However, a forward transform is provided here as a convenience for implementing
6609 This is the version of the transform used by Xiph.org's Theora encoder, which
6610 is the same as that used by VP3.
6611 Like the inverse DCT, it is first applied to each row, and then applied to each
6612 column of the result.
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=\textwidth]{fdct}
\end{center}
\caption{Signal Flow Graph for the 1D Forward DCT}
\label{fig:fdct}
\end{figure}
6622 The signal flow graph for the forward transform is given in
6623 Figure~
\ref{fig:fdct
}.
6624 It is largely the reverse of the flow graph given for the inverse DCT.
6625 It is important to note that the signs on the constants in the rotations have
6626 changed, and the
\locvar{C4
} scale factors on one of the lower butterflies now
6627 appear on the opposite side.
6628 The column of numbers on the left represents the unpermuted input, and the
6629 column on the right the permuted output DCT coefficients.
A proper division by $2^{16}$ is done after the multiplications instead of a
 shift in the forward transform.
This can be implemented quickly by adding an offset of $\hex{FFFF}$ if the
 number is negative, and then shifting as before.
This slightly increases the computational complexity of the transform.
Unlike the inverse DCT, 16-bit registers and a $16\times16\rightarrow32$ bit
 multiply are sufficient to avoid any overflow, so long as the input is in the
 range $-6270\ldots 6270$, which is larger than required.

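
The add-then-shift division described above can be sketched in C as follows.
This is a minimal illustration, not part of the standard: the function name
 \texttt{div\_2\_16} is hypothetical, and the sketch assumes that right-shifting
 a negative value is an arithmetic shift, which ISO C leaves
 implementation-defined.

```c
#include <stdint.h>

/* Divide by 2^16 with rounding toward zero, using the offset-then-shift
 * trick from the text. Hypothetical helper name.
 * Assumption: >> on a negative int32_t is an arithmetic shift (true of
 * common compilers, but implementation-defined in ISO C). */
static int32_t div_2_16(int32_t x)
{
    if (x < 0) {
        x += 0xFFFF;   /* bias negative values so the shift rounds toward zero */
    }
    return x >> 16;
}
```

Under that assumption the result agrees with C's truncating integer division
 \texttt{x / 65536}, which may be used instead where portability matters more
 than speed.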
\begin{enumerate}
\item
Assign $\locvar{T}[0]$ the value $\bitvar{X}[0]+\bitvar{X}[7]$.
\item
Assign $\locvar{T}[1]$ the value $\bitvar{X}[1]+\bitvar{X}[6]$.
\item
Assign $\locvar{T}[2]$ the value $\bitvar{X}[2]+\bitvar{X}[5]$.
\item
Assign $\locvar{T}[3]$ the value $\bitvar{X}[3]+\bitvar{X}[4]$.
\item
Assign $\locvar{T}[4]$ the value $\bitvar{X}[3]-\bitvar{X}[4]$.
\item
Assign $\locvar{T}[5]$ the value $\bitvar{X}[2]-\bitvar{X}[5]$.
\item
Assign $\locvar{T}[6]$ the value $\bitvar{X}[1]-\bitvar{X}[6]$.
\item
Assign $\locvar{T}[7]$ the value $\bitvar{X}[0]-\bitvar{X}[7]$.
\item
Assign \locvar{R} the value $\locvar{T}[0]+\locvar{T}[3]$.
\item
Assign $\locvar{T}[3]$ the value $\locvar{T}[0]-\locvar{T}[3]$.
\item
Assign $\locvar{T}[0]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[1]+\locvar{T}[2]$.
\item
Assign $\locvar{T}[2]$ the value $\locvar{T}[1]-\locvar{T}[2]$.
\item
Assign $\locvar{T}[1]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[6]-\locvar{T}[5]$.
\item
Assign $\locvar{T}[6]$ the value
 $(\locvar{C4}*(\locvar{T}[6]+\locvar{T}[5]))//16$.
\item
Assign $\locvar{T}[5]$ the value $(\locvar{C4}*\locvar{R})//16$.
\item
Assign \locvar{R} the value $\locvar{T}[4]+\locvar{T}[5]$.
\item
Assign $\locvar{T}[5]$ the value $\locvar{T}[4]-\locvar{T}[5]$.
\item
Assign $\locvar{T}[4]$ the value \locvar{R}.
\item
Assign \locvar{R} the value $\locvar{T}[7]+\locvar{T}[6]$.
\item
Assign $\locvar{T}[6]$ the value $\locvar{T}[7]-\locvar{T}[6]$.
\item
Assign $\locvar{T}[7]$ the value \locvar{R}.
\item
Assign $\bitvar{Y}[0]$ the value
 $(\locvar{C4}*(\locvar{T}[0]+\locvar{T}[1]))//16$.
\item
Assign $\bitvar{Y}[4]$ the value
 $(\locvar{C4}*(\locvar{T}[0]-\locvar{T}[1]))//16$.
\item
Assign $\bitvar{Y}[2]$ the value
 $((\locvar{S6}*\locvar{T}[3])//16)+
 ((\locvar{C6}*\locvar{T}[2])//16)$.
\item
Assign $\bitvar{Y}[6]$ the value
 $((\locvar{C6}*\locvar{T}[3])//16)-
 ((\locvar{S6}*\locvar{T}[2])//16)$.
\item
Assign $\bitvar{Y}[1]$ the value
 $((\locvar{S7}*\locvar{T}[7])//16)+
 ((\locvar{C7}*\locvar{T}[4])//16)$.
\item
Assign $\bitvar{Y}[5]$ the value
 $((\locvar{S3}*\locvar{T}[6])//16)+
 ((\locvar{C3}*\locvar{T}[5])//16)$.
\item
Assign $\bitvar{Y}[3]$ the value
 $((\locvar{C3}*\locvar{T}[6])//16)-
 ((\locvar{S3}*\locvar{T}[5])//16)$.
\item
Assign $\bitvar{Y}[7]$ the value
 $((\locvar{C7}*\locvar{T}[7])//16)-
 ((\locvar{S7}*\locvar{T}[4])//16)$.
\end{enumerate}

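
The steps of the 1D forward DCT can be sketched in C roughly as follows.
This is an illustration under stated assumptions rather than a reference
 implementation: the names \texttt{fdct\_1d} and \texttt{mul\_div} are
 hypothetical, the numeric constants are assumed to be the 16-bit
 approximations tabulated with the inverse DCT (they are not restated in this
 section), and each $//16$ is interpreted as the division by $2^{16}$
 described in the preceding text.
A 64-bit intermediate product is used for safety instead of the minimal
 register widths discussed above.

```c
#include <stdint.h>

/* Assumed values: cosine/sine constants scaled by 2^16, taken to match the
 * constants table given with the inverse DCT. */
enum {
    C3 = 54491, C4 = 46341, C6 = 25080, C7 = 12785,
    S3 = 36410, S6 = 60547, S7 = 64277
};

/* (c*v) divided by 2^16 with rounding toward zero. Hypothetical helper;
 * the 64-bit product avoids any overflow concerns in this sketch. */
static int32_t mul_div(int32_t c, int32_t v)
{
    return (int32_t)(((int64_t)c * v) / 65536);
}

/* One 1D pass: x holds the 8 unpermuted inputs, y receives the 8 outputs. */
static void fdct_1d(const int32_t x[8], int32_t y[8])
{
    int32_t t[8], r;
    int i;

    /* First butterfly stage: sums in t[0..3], differences in t[4..7]. */
    for (i = 0; i < 4; i++) {
        t[i]     = x[i] + x[7 - i];
        t[7 - i] = x[i] - x[7 - i];
    }

    r = t[0] + t[3];  t[3] = t[0] - t[3];  t[0] = r;
    r = t[1] + t[2];  t[2] = t[1] - t[2];  t[1] = r;

    r = t[6] - t[5];
    t[6] = mul_div(C4, t[6] + t[5]);
    t[5] = mul_div(C4, r);

    r = t[4] + t[5];  t[5] = t[4] - t[5];  t[4] = r;
    r = t[7] + t[6];  t[6] = t[7] - t[6];  t[7] = r;

    /* Final rotations produce the output coefficients. */
    y[0] = mul_div(C4, t[0] + t[1]);
    y[4] = mul_div(C4, t[0] - t[1]);
    y[2] = mul_div(S6, t[3]) + mul_div(C6, t[2]);
    y[6] = mul_div(C6, t[3]) - mul_div(S6, t[2]);
    y[1] = mul_div(S7, t[7]) + mul_div(C7, t[4]);
    y[5] = mul_div(S3, t[6]) + mul_div(C3, t[5]);
    y[3] = mul_div(C3, t[6]) - mul_div(S3, t[5]);
    y[7] = mul_div(C7, t[7]) - mul_div(S7, t[4]);
}
```

A full 2D transform would apply \texttt{fdct\_1d} to each row of a block and
 then to each column of the result, as described above.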
6720 \subsection{The Complete Reconstruction Algorithm
}
6723 \paragraph{Input parameters:
}\hfill\\*
6724 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6725 \multicolumn{1}{c
}{Name
} &
6726 \multicolumn{1}{c
}{Type
} &
6727 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6728 \multicolumn{1}{c
}{Signed?
} &
6729 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6730 \bitvar{ACSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
6731 16 & No & A
64-element array of scale values
6732 for AC coefficients for each
\qi\ value. \\
6733 \bitvar{DCSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
6734 16 & No & A
64-element array of scale values
6735 for the DC coefficient for each
\qi\ value. \\
6736 \bitvar{BMS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
6737 8 & No & A $
\bitvar{NBMS
}\times 64$ array
6738 containing the base matrices. \\
6739 \bitvar{NQRS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
6740 6 & No & A $
2\times 3$ array containing the
6741 number of quant ranges for a given
\qti\ and
\pli, respectively.
6742 This is at most $
63$. \\
6743 \bitvar{QRSIZES
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
6744 6 & No & A $
2\times 3\times 63$ array of the
6745 sizes of each quant range for a given
\qti\ and
\pli, respectively.
6746 Only the first $
\bitvar{NQRS
}[\qti][\pli]$ values are used. \\
6747 \bitvar{QRBMIS
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
6748 9 & No & A $
2\times 3\times 64$ array of the
6749 \bmi's used for each quant range for a given
\qti\ and
\pli, respectively.
6750 Only the first $(
\bitvar{NQRS
}[\qti][\pli]+
1)$ values are used. \\
6751 \bitvar{RPYW
} & Integer &
20 & No & The width of the $Y'$ plane of the
6752 reference frames in pixels. \\
6753 \bitvar{RPYH
} & Integer &
20 & No & The height of the $Y'$ plane of the
6754 reference frames in pixels. \\
6755 \bitvar{RPCW
} & Integer &
20 & No & The width of the $C_b$ and $C_r$
6756 planes of the reference frames in pixels. \\
6757 \bitvar{RPCH
} & Integer &
20 & No & The height of the $C_b$ and $C_r$
6758 planes of the reference frames in pixels. \\
6759 \bitvar{GOLDREFY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6760 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
6761 array containing the contents of the $Y'$ plane of the golden reference
6763 \bitvar{GOLDREFCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6764 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
6765 array containing the contents of the $C_b$ plane of the golden reference
6767 \bitvar{GOLDREFCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6768 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
6769 array containing the contents of the $C_r$ plane of the golden reference
6771 \bitvar{PREVREFY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6772 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
6773 array containing the contents of the $Y'$ plane of the previous reference
6775 \bitvar{PREVREFCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6776 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
6777 array containing the contents of the $C_b$ plane of the previous reference
6779 \bitvar{PREVREFCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6780 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
6781 array containing the contents of the $C_r$ plane of the previous reference
6783 \bitvar{NBS
} & Integer &
36 & No & The total number of blocks in a
6785 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6786 1 & No & An
\bitvar{NBS
}-element array of
6787 flags indicating which blocks are coded. \\
6788 \bitvar{MBMODES
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6789 3 & No & An
\bitvar{NMBS
}-element array of
6790 coding modes for each macro block. \\
6791 \bitvar{MVECTS
} &
\multicolumn{1}{p
{50pt
}}{Array of
2D Integer Vectors
} &
6792 6 & Yes & An
\bitvar{NBS
}-element array of
6793 motion vectors for each block. \\
6794 \bitvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6795 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
6796 quantized DCT coefficient values for each block in zig-zag order. \\
6797 \bitvar{NCOEFFS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6798 7 & No & An
\bitvar{NBS
}-element array of the
6799 coefficient count for each block. \\
6800 \bitvar{QIS
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
6801 6 & No & An
\bitvar{NQIS
}-element array of
6803 \bitvar{QIIS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
6804 2 & No & An
\bitvar{NBS
}-element array of
6805 \locvar{\qii} values for each block. \\
6806 \bottomrule\end{tabularx
}
6808 \paragraph{Output parameters:
}\hfill\\*
6809 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6810 \multicolumn{1}{c
}{Name
} &
6811 \multicolumn{1}{c
}{Type
} &
6812 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6813 \multicolumn{1}{c
}{Signed?
} &
6814 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6815 \bitvar{RECY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6816 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
6817 array containing the contents of the $Y'$ plane of the reconstructed frame. \\
6818 \bitvar{RECCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6819 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
6820 array containing the contents of the $C_b$ plane of the reconstructed frame. \\
6821 \bitvar{RECCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6822 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
6823 array containing the contents of the $C_r$ plane of the reconstructed frame. \\
6824 \bottomrule\end{tabularx
}
6826 \paragraph{Variables used:
}\hfill\\*
6827 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
6828 \multicolumn{1}{c
}{Name
} &
6829 \multicolumn{1}{c
}{Type
} &
6830 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
6831 \multicolumn{1}{c
}{Signed?
} &
6832 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
6833 \locvar{RPW
} & Integer &
20 & No & The width of the current plane of the
6834 current reference frame in pixels. \\
6835 \locvar{RPH
} & Integer &
20 & No & The height of the current plane of
6836 the current reference frame in pixels. \\
6837 \locvar{REFP
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6838 8 & No & A $
\bitvar{RPH
}\times\bitvar{RPW
}$
6839 array containing the contents of the current plane of the current reference
6841 \locvar{BX
} & Integer &
20 & No & The horizontal pixel index of the
6842 lower-left corner of the current block. \\
6843 \locvar{BY
} & Integer &
20 & No & The vertical pixel index of the
6844 lower-left corner of the current block. \\
6845 \locvar{MVX
} & Integer &
5 & No & The horizontal component of the first
6846 whole-pixel motion vector. \\
6847 \locvar{MVY
} & Integer &
5 & No & The vertical component of the first
6848 whole-pixel motion vector. \\
6849 \locvar{MVX2
} & Integer &
5 & No & The horizontal component of the second
6850 whole-pixel motion vector. \\
6851 \locvar{MVY2
} & Integer &
5 & No & The vertical component of the second
6852 whole-pixel motion vector. \\
6853 \locvar{PRED
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6854 8 & No & An $
8\times 8$ array of predictor
6855 values to use for the current block. \\
6856 \locvar{RES
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
6857 16 & Yes & An $
8\times 8$ array containing the
6858 decoded residual for the current block. \\
6859 \locvar{QMAT
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
6860 16 & No & A
64-element array of quantization
6861 values for each DCT coefficient in natural order. \\
6862 \locvar{DC
} & Integer &
29 & Yes & The dequantized DC coefficient of a
6864 \locvar{P
} & Integer &
17 & Yes & A reconstructed pixel value. \\
6865 \locvar{\bi} & Integer &
36 & No & The index of the current block in
6867 \locvar{\mbi} & Integer &
32 & No & The index of the macro block
6868 containing block
\locvar{\bi}. \\
6869 \locvar{\pli} & Integer &
2 & No & The
color plane index of the current
6871 \locvar{\rfi} & Integer &
2 & No & The index of the reference frame
6872 indicated by the coding mode for macro block
\locvar{\mbi}. \\
6873 \locvar{\idx{bx
}} & Integer &
3 & No & The horizontal pixel index in the
6875 \locvar{\idx{by
}} & Integer &
3 & No & The vertical pixel index in the
6877 \locvar{\qti} & Integer &
1 & No & A quantization type index.
6878 See Table~
\ref{tab:quant-types
}.\\
6879 \locvar{\idx{qi0
}} & Integer &
6 & No & The quantization index of the DC
6881 \locvar{\qi} & Integer &
6 & No & The quantization index of the AC
6883 \bottomrule\end{tabularx
}

This section takes the decoded packet data and uses the previously defined
 procedures to reconstruct each block of the current frame.
For coded blocks, a predictor is formed using the coding mode and, if
 applicable, the motion vector, and then the residual is computed from the
 quantized DCT coefficients.
For uncoded blocks, the contents of the co-located block are copied from the
 previous frame and the residual is cleared to zero.
Then the predictor and residual are added, and the result is clamped to the
 range $0\ldots 255$ and stored in the current frame.

In the special case that a block contains only a DC coefficient, the
 dequantization and inverse DCT transform are skipped.
Instead the constant pixel value for the entire block is computed in one step.
Note that the truncation of intermediate operations is omitted and the final
 rounding is slightly different in this case.
The check for whether or not the block contains only a DC coefficient is based
 on the coefficient count returned from the token decode procedure of
 Section~\ref{sec:dct-decode}, and not by checking to see if the remaining
 coefficient values are zero.
Also note that even when the coefficient count indicates the block contains
 zero coefficients, the DC coefficient is still processed, as undoing DC
 prediction might have made it non-zero.

After this procedure, the frame is completely reconstructed, but before it can
 be used as a reference frame, a loop filter must be run over it to help reduce
 blocking artifacts.
This is detailed in Section~\ref{sec:loopfilter}.

6916 Assign
\locvar{\idx{qi0
}} the value $
\bitvar{QIS
}[0]$.
6918 For each value of
\locvar{\bi} from
0 to $(
\bitvar{NBS
}-
1)$:
6921 Assign
\locvar{\pli} the index of the
color plane block
\locvar{\bi} belongs
6924 Assign
\locvar{BX
} the horizontal pixel index of the lower-left corner of block
6927 Assign
\locvar{BY
} the vertical pixel index of the lower-left corner of block
6930 If $
\bitvar{BCODED
}[\locvar{\bi}]$ is non-zero:
6933 Assign
\locvar{\mbi} the index of the macro block containing block
6936 If $
\bitvar{MBMODES
}[\locvar{\mbi}]$ is
1 (INTRA), assign
\locvar{\qti} the
6939 Otherwise, assign
\locvar{\qti} the value $
1$.
6941 Assign
\locvar{\rfi} the value of the Reference Frame Index column of
6942 Table~
\ref{tab:cm-refs
} corresponding to $
\bitvar{MBMODES
}[\locvar{\mbi}]$.
6944 If
\locvar{\rfi} is zero, compute
\locvar{PRED
} using the procedure given in
6945 Section~
\ref{sub:predintra
}.
6950 Assign
\locvar{REFP
},
\locvar{RPW
}, and
\locvar{RPH
} the values given in
6951 Table~
\ref{tab:refp
} corresponding to the current value of
\locvar{\rfi} and
6956 \begin{tabular
}{cclll
}\toprule
6957 \locvar{\rfi} &
\locvar{\pli} &
6958 \locvar{REFP
} &
\locvar{RPW
} &
\locvar{RPH
} \\
\midrule
6959 $
1$ & $
0$ &
\bitvar{PREVREFY
} &
\bitvar{RPYW
} &
\bitvar{RPYH
} \\
6960 $
1$ & $
1$ &
\bitvar{PREVREFCB
} &
\bitvar{RPCW
} &
\bitvar{RPCH
} \\
6961 $
1$ & $
2$ &
\bitvar{PREVREFCR
} &
\bitvar{RPCW
} &
\bitvar{RPCH
} \\
6962 $
2$ & $
0$ &
\bitvar{GOLDREFY
} &
\bitvar{RPYW
} &
\bitvar{RPYH
} \\
6963 $
2$ & $
1$ &
\bitvar{GOLDREFCB
} &
\bitvar{RPCW
} &
\bitvar{RPCH
} \\
6964 $
2$ & $
2$ &
\bitvar{GOLDREFCR
} &
\bitvar{RPCW
} &
\bitvar{RPCH
} \\
6965 \bottomrule\end{tabular
}
6967 \caption{Reference Planes and Sizes for Each
\locvar{\rfi} and
\locvar{\pli}}
Assign \locvar{MVX} the value
\begin{align*}
\left\lfloor\lvert\bitvar{MVECTS}[\locvar{\bi}]_x\rvert\right\rfloor*
 \sign(\bitvar{MVECTS}[\locvar{\bi}]_x).
\end{align*}
Assign \locvar{MVY} the value
\begin{align*}
\left\lfloor\lvert\bitvar{MVECTS}[\locvar{\bi}]_y\rvert\right\rfloor*
 \sign(\bitvar{MVECTS}[\locvar{\bi}]_y).
\end{align*}
Assign \locvar{MVX2} the value
\begin{align*}
\left\lceil\lvert\bitvar{MVECTS}[\locvar{\bi}]_x\rvert\right\rceil*
 \sign(\bitvar{MVECTS}[\locvar{\bi}]_x).
\end{align*}
Assign \locvar{MVY2} the value
\begin{align*}
\left\lceil\lvert\bitvar{MVECTS}[\locvar{\bi}]_y\rvert\right\rceil*
 \sign(\bitvar{MVECTS}[\locvar{\bi}]_y).
\end{align*}
If \locvar{MVX} equals \locvar{MVX2} and \locvar{MVY} equals \locvar{MVY2},
 use the values \locvar{REFP}, \locvar{RPW}, \locvar{RPH}, \locvar{BX},
 \locvar{BY}, \locvar{MVX}, and \locvar{MVY} to compute \locvar{PRED} using the
 procedure given in Section~\ref{sub:predfullpel}.
7001 Otherwise, use the values
\locvar{REFP
},
\locvar{RPW
},
\locvar{RPH
},
7002 \locvar{BX
},
\locvar{BY
},
\locvar{MVX
},
\locvar{MVY
},
\locvar{MVX2
}, and
7003 \locvar{MVY2
} to compute
\locvar{PRED
} using the procedure given in
7004 Section~
\ref{sub:predhalfpel
}.
7007 If $
\bitvar{NCOEFFS
}[\locvar{\bi}]$ is less than
2:
7010 Using
\bitvar{ACSCALE
},
\bitvar{DCSCALE
},
\bitvar{BMS
},
\bitvar{NQRS
}, \\
7011 \bitvar{QRSIZES
},
\bitvar{QRBMIS
},
\locvar{\qti},
\locvar{\pli}, and
7012 \locvar{\idx{qi0
}}, use the procedure given in Section~
\ref{sub:quant-mat
} to
7013 compute the DC quantization matrix
\locvar{QMAT
}.
Assign \locvar{DC} the value
\begin{align*}
(\bitvar{COEFFS}[\locvar{\bi}][0]*\locvar{QMAT}[0]+15)>>5.
\end{align*}
7020 Truncate
\locvar{DC
} to a
16-bit representation by dropping any higher-order
7023 For each value of
\locvar{\idx{by
}} from
0 to
7, and each value of
7024 \locvar{\idx{bx
}} from
0 to
7, assign
7025 $
\locvar{RES
}[\locvar{\idx{by
}}][\locvar{\idx{bx
}}]$ the value
\locvar{DC
}.
7031 Assign
\locvar{\qi} the value $
\bitvar{QIS
}[\bitvar{QIIS
}[\locvar{\bi}]]$.
7033 Using
\bitvar{ACSCALE
},
\bitvar{DCSCALE
},
\bitvar{BMS
},
\bitvar{NQRS
}, \\
7034 \bitvar{QRSIZES
},
\bitvar{QRBMIS
},
\locvar{\qti},
\locvar{\pli},
7035 \locvar{\idx{qi0
}}, and
\locvar{\qi}, compute
\locvar{DQC
} using the procedure
7036 given in Section~
\ref{sub:dequant
}.
7038 Using
\locvar{DQC
}, compute
\locvar{RES
} using the procedure given in
7039 Section~
\ref{sub:
2d-idct
}.
7046 Assign
\locvar{\rfi} the value
1.
7048 Assign
\locvar{REFP
},
\locvar{RPW
}, and
\locvar{RPH
} the values given in
7049 Table~
\ref{tab:refp
} corresponding to the current value of
\locvar{\rfi} and
7052 Assign
\locvar{MVX
} the value
0.
7054 Assign
\locvar{MVY
} the value
0.
7056 Using the values
\locvar{REFP
},
\locvar{RPW
},
\locvar{RPH
},
\locvar{BX
},
7057 \locvar{BY
},
\locvar{MVX
}, and
\locvar{MVY
}, compute
\locvar{PRED
} using the
7058 procedure given in Section~
\ref{sub:predfullpel
}.
7059 This is simply a copy of the co-located block in the previous reference frame.
7061 For each value of
\locvar{\idx{by
}} from
0 to
7, and each value of
7062 \locvar{\idx{bx
}} from
0 to
7, assign
7063 $
\locvar{RES
}[\locvar{\idx{by
}}][\locvar{\idx{bx
}}]$ the value
0.
7066 For each value of
\locvar{\idx{by
}} from
0 to
7, and each value of
7067 \locvar{\idx{bx
}} from
0 to
7:
7070 Assign
\locvar{P
} the value
7071 $(
\locvar{PRED
}[\locvar{\idx{by
}}][\locvar{\idx{bx
}}]+
7072 \locvar{RES
}[\locvar{\idx{by
}}][\locvar{\idx{bx
}}])$.
7074 If
\locvar{P
} is greater than $
255$, assign
\locvar{P
} the value $
255$.
7076 If
\locvar{P
} is less than $
0$, assign
\locvar{P
} the value $
0$.
7078 If
\locvar{\pli} equals
0, assign
7079 $
\bitvar{RECY
}[\locvar{BY
}+
\locvar{\idx{by
}}][\locvar{BX
}+
\locvar{\idx{bx
}}]$
7080 the value
\locvar{P
}.
7082 Otherwise, if
\locvar{\pli} equals
1, assign
7083 $
\bitvar{RECCB
}[\locvar{BY
}+
\locvar{\idx{by
}}][\locvar{BX
}+
\locvar{\idx{bx
}}]$
7084 the value
\locvar{P
}.
7086 Otherwise,
\locvar{\pli} equals
2, so assign
7087 $
\bitvar{RECCR
}[\locvar{BY
}+
\locvar{\idx{by
}}][\locvar{BX
}+
\locvar{\idx{bx
}}]$
7088 the value
\locvar{P
}.
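
The DC-only shortcut and the final clamp in the procedure above can be
 sketched in C as follows.
This is a minimal sketch, not a reference implementation: the function names
 are hypothetical, the predictor is assumed to have been formed already, and
 \texttt{floor\_shr} is written out by hand so that the shift rounds toward
 negative infinity regardless of how the compiler treats negative operands.

```c
#include <stdint.h>

/* v >> s with rounding toward negative infinity (hypothetical helper). */
static int64_t floor_shr(int64_t v, int s)
{
    if (v >= 0) return v >> s;
    return -((-v + ((int64_t)1 << s) - 1) >> s);
}

/* Keep only the low 16 bits, interpreted as a signed value. */
static int32_t trunc16(int64_t v)
{
    uint16_t u = (uint16_t)v;
    return (u >= 0x8000) ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Clamp a reconstructed pixel to 0...255. */
static uint8_t clamp255(int32_t p)
{
    if (p < 0) return 0;
    if (p > 255) return 255;
    return (uint8_t)p;
}

/* Reconstruct one block whose coefficient count is less than 2.
 * coeff0 is the quantized DC coefficient, qmat0 the DC quantizer,
 * pred the 8x8 predictor, out the 8x8 reconstructed block. */
static void recon_dc_only(int32_t coeff0, int32_t qmat0,
                          uint8_t pred[8][8], uint8_t out[8][8])
{
    int32_t dc = trunc16(floor_shr((int64_t)coeff0 * qmat0 + 15, 5));
    int by, bx;
    for (by = 0; by < 8; by++) {
        for (bx = 0; bx < 8; bx++) {
            /* The residual is the constant dc for every pixel. */
            out[by][bx] = clamp255(pred[by][bx] + dc);
        }
    }
}
```

In a full decoder \texttt{out} would be the $8\times 8$ region of the
 reconstructed plane selected by \locvar{\pli}, \locvar{BX}, and \locvar{BY}.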
\section{Loop Filtering}
\label{sec:loopfilter}

The loop filter is a simple deblocking filter that is based on running a small
 edge detecting filter over the coded block edges and adjusting the pixel
 values by a tapered response.
The filter response is modulated by the following non-linear function:
\begin{align*}
\lflim(\locvar{R},\bitvar{L})&=\left\{\begin{array}{ll}
0,                        & \locvar{R}\le-2*\bitvar{L} \\
-\locvar{R}-2*\bitvar{L}, & -2*\bitvar{L}<\locvar{R}\le-\bitvar{L} \\
\locvar{R},               & -\bitvar{L}<\locvar{R}<\bitvar{L} \\
-\locvar{R}+2*\bitvar{L}, & \bitvar{L}\le\locvar{R}<2*\bitvar{L} \\
0,                        & 2*\bitvar{L}\le\locvar{R}
\end{array}\right.
\end{align*}

Here \bitvar{L} is a limiting value equal to $\bitvar{LFLIMS}[\idx{qi0}]$.
It defines the peaks of the function.
\bitvar{LFLIMS} is an array of values specified in the setup header and is
 indexed by \idx{qi0}, the first quantization index for the frame, the one used
 for all the DC coefficients.
Larger values of \bitvar{L} indicate a stronger filter.

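
The piecewise definition of $\lflim$ translates directly into a chain of
 comparisons.
The following C sketch is illustrative only; the function name and the use of
 plain \texttt{int} arguments are assumptions made for the example.

```c
/* Loop filter limit function: r is the edge detector response,
 * l the limit value taken from LFLIMS. */
static int lflim(int r, int l)
{
    if (r <= -2 * l) return 0;            /* R <= -2L        */
    if (r <= -l)     return -r - 2 * l;   /* -2L < R <= -L   */
    if (r < l)       return r;            /* -L  < R <  L    */
    if (r < 2 * l)   return -r + 2 * l;   /*  L <= R <  2L   */
    return 0;                             /* 2L <= R         */
}
```

For example, with a limit of 4 the response passes through unchanged for
 magnitudes below 4, tapers back toward zero between 4 and 8, and is zero
 beyond that.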
7116 \subsection{Horizontal Filter
}
7119 \paragraph{Input parameters:
}\hfill\\*
7120 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7121 \multicolumn{1}{c
}{Name
} &
7122 \multicolumn{1}{c
}{Type
} &
7123 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7124 \multicolumn{1}{c
}{Signed?
} &
7125 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7126 \bitvar{RECP
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7127 8 & No & A $
\bitvar{RPH
}\times\bitvar{RPW
}$
7128 array containing the contents of a plane of the reconstructed frame. \\
7129 \bitvar{FX
} & Integer &
20 & No & The horizontal pixel index of the
7130 lower-left corner of the area to be filtered. \\
7131 \bitvar{FY
} & Integer &
20 & No & The vertical pixel index of the
7132 lower-left corner of the area to be filtered. \\
7133 \bitvar{L
} & Integer &
7 & No & The loop filter limit value. \\
7134 \bottomrule\end{tabularx
}
7136 \paragraph{Output parameters:
}\hfill\\*
7137 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7138 \multicolumn{1}{c
}{Name
} &
7139 \multicolumn{1}{c
}{Type
} &
7140 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7141 \multicolumn{1}{c
}{Signed?
} &
7142 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7143 \bitvar{RECP
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7144 8 & No & A $
\bitvar{RPH
}\times\bitvar{RPW
}$
7145 array containing the contents of a plane of the reconstructed frame. \\
7146 \bottomrule\end{tabularx
}
7148 \paragraph{Variables used:
}\hfill\\*
7149 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7150 \multicolumn{1}{c
}{Name
} &
7151 \multicolumn{1}{c
}{Type
} &
7152 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7153 \multicolumn{1}{c
}{Signed?
} &
7154 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7155 \locvar{R
} & Integer &
9 & Yes & The edge detector response. \\
7156 \locvar{P
} & Integer &
9 & Yes & A filtered pixel value. \\
7157 \locvar{\idx{by
}} & Integer &
20 & No & The vertical pixel index in the
7159 \bottomrule\end{tabularx
}
7162 This procedure applies a $
4$-tap horizontal filter to each row of a vertical
7167 For each value of
\locvar{\idx{by
}} from $
0$ to $
7$:
Assign \locvar{R} the value
\begin{multline*}
(\bitvar{RECP}[\bitvar{FY}+\locvar{\idx{by}}][\bitvar{FX}]-
 3*\bitvar{RECP}[\bitvar{FY}+\locvar{\idx{by}}][\bitvar{FX}+1]+\\
 3*\bitvar{RECP}[\bitvar{FY}+\locvar{\idx{by}}][\bitvar{FX}+2]-
 \bitvar{RECP}[\bitvar{FY}+\locvar{\idx{by}}][\bitvar{FX}+3]+4)>>3
\end{multline*}
7178 Assign
\locvar{P
} the value
7179 $(
\bitvar{RECP
}[\bitvar{FY
}+
\locvar{\idx{by
}}][\bitvar{FX
}+
1]+
7180 \lflim(
\locvar{R
},
\bitvar{L
}))$.
7182 If
\locvar{P
} is less than zero, assign
7183 $
\bitvar{RECP
}[\bitvar{FY
}+
\locvar{\idx{by
}}][\bitvar{FX
}+
1]$ the value zero.
7185 Otherwise, if
\locvar{P
} is greater than $
255$, assign
7186 $
\bitvar{RECP
}[\bitvar{FY
}+
\locvar{\idx{by
}}][\bitvar{FX
}+
1]$ the value $
255$.
Otherwise, assign
 $\bitvar{RECP}[\bitvar{FY}+\locvar{\idx{by}}][\bitvar{FX}+1]$ the value
 \locvar{P}.
7192 Assign
\locvar{P
} the value
7193 $(
\bitvar{RECP
}[\bitvar{FY
}+
\locvar{\idx{by
}}][\bitvar{FX
}+
2]-
7194 \lflim(
\locvar{R
},
\bitvar{L
}))$.
7196 If
\locvar{P
} is less than zero, assign
7197 $
\bitvar{RECP
}[\bitvar{FY
}+
\locvar{\idx{by
}}][\bitvar{FX
}+
2]$ the value zero.
7199 Otherwise, if
\locvar{P
} is greater than $
255$, assign
7200 $
\bitvar{RECP
}[\bitvar{FY
}+
\locvar{\idx{by
}}][\bitvar{FX
}+
2]$ the value $
255$.
Otherwise, assign
 $\bitvar{RECP}[\bitvar{FY}+\locvar{\idx{by}}][\bitvar{FX}+2]$ the value
 \locvar{P}.
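
The horizontal filter procedure can be sketched in C as follows.
This is an illustration under stated assumptions: the plane is taken to be
 stored in row-major order with a caller-supplied \texttt{stride}, the function
 names are hypothetical, and the right shift by 3 is written as an explicit
 floor operation so that it does not depend on compiler behavior for negative
 values.

```c
#include <stdint.h>

/* Loop filter limit function, as defined at the start of this section. */
static int lflim(int r, int l)
{
    if (r <= -2 * l) return 0;
    if (r <= -l)     return -r - 2 * l;
    if (r < l)       return r;
    if (r < 2 * l)   return -r + 2 * l;
    return 0;
}

/* v >> 3 with rounding toward negative infinity (hypothetical helper). */
static int floor_shr3(int v)
{
    return (v >= 0) ? (v >> 3) : -((-v + 7) >> 3);
}

/* Clamp a filtered value to 0...255. */
static uint8_t clamp255(int p)
{
    if (p < 0) return 0;
    if (p > 255) return 255;
    return (uint8_t)p;
}

/* Filter the vertical edge lying between columns fx+1 and fx+2, for the
 * 8 rows starting at fy. Assumed layout: recp[row * stride + column]. */
static void filter_h(uint8_t *recp, int stride, int fx, int fy, int l)
{
    int by;
    for (by = 0; by < 8; by++) {
        uint8_t *p = recp + (fy + by) * stride + fx;
        /* 4-tap edge detector response with rounding. */
        int r = floor_shr3(p[0] - 3 * p[1] + 3 * p[2] - p[3] + 4);
        int f = lflim(r, l);
        /* The two pixels adjacent to the edge move toward each other. */
        p[1] = clamp255(p[1] + f);
        p[2] = clamp255(p[2] - f);
    }
}
```

The vertical filter has the same structure with the roles of rows and columns
 exchanged.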
7208 \subsection{Vertical Filter
}
7211 \paragraph{Input parameters:
}\hfill\\*
7212 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7213 \multicolumn{1}{c
}{Name
} &
7214 \multicolumn{1}{c
}{Type
} &
7215 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7216 \multicolumn{1}{c
}{Signed?
} &
7217 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7218 \bitvar{RECP
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7219 8 & No & A $
\bitvar{RPH
}\times\bitvar{RPW
}$
7220 array containing the contents of a plane of the reconstructed frame. \\
7221 \bitvar{FX
} & Integer &
20 & No & The horizontal pixel index of the
7222 lower-left corner of the area to be filtered. \\
7223 \bitvar{FY
} & Integer &
20 & No & The vertical pixel index of the
7224 lower-left corner of the area to be filtered. \\
7225 \bitvar{L
} & Integer &
7 & No & The loop filter limit value. \\
7226 \bottomrule\end{tabularx
}
7228 \paragraph{Output parameters:
}\hfill\\*
7229 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7230 \multicolumn{1}{c
}{Name
} &
7231 \multicolumn{1}{c
}{Type
} &
7232 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7233 \multicolumn{1}{c
}{Signed?
} &
7234 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7235 \bitvar{RECP
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7236 8 & No & A $
\bitvar{RPH
}\times\bitvar{RPW
}$
7237 array containing the contents of a plane of the reconstructed frame. \\
7238 \bottomrule\end{tabularx
}
7240 \paragraph{Variables used:
}\hfill\\*
7241 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7242 \multicolumn{1}{c
}{Name
} &
7243 \multicolumn{1}{c
}{Type
} &
7244 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7245 \multicolumn{1}{c
}{Signed?
} &
7246 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7247 \locvar{R
} & Integer &
9 & Yes & The edge detector response. \\
7248 \locvar{P
} & Integer &
9 & Yes & A filtered pixel value. \\
7249 \locvar{\idx{bx
}} & Integer &
20 & No & The horizontal pixel index in the
7251 \bottomrule\end{tabularx
}
7254 This procedure applies a $
4$-tap vertical filter to each column of a horizontal
7259 For each value of
\locvar{\idx{bx
}} from $
0$ to $
7$:
Assign \locvar{R} the value
\begin{multline*}
(\bitvar{RECP}[\bitvar{FY}][\bitvar{FX}+\locvar{\idx{bx}}]-
 3*\bitvar{RECP}[\bitvar{FY}+1][\bitvar{FX}+\locvar{\idx{bx}}]+\\
 3*\bitvar{RECP}[\bitvar{FY}+2][\bitvar{FX}+\locvar{\idx{bx}}]-
 \bitvar{RECP}[\bitvar{FY}+3][\bitvar{FX}+\locvar{\idx{bx}}]+4)>>3
\end{multline*}
7270 Assign
\locvar{P
} the value
7271 $(
\bitvar{RECP
}[\bitvar{FY
}+
1][\bitvar{FX
}+
\locvar{\idx{bx
}}]+
7272 \lflim(
\locvar{R
},
\bitvar{L
}))$.
7274 If
\locvar{P
} is less than zero, assign
7275 $
\bitvar{RECP
}[\bitvar{FY
}+
1][\bitvar{FX
}+
\locvar{\idx{bx
}}]$ the value zero.
7277 Otherwise, if
\locvar{P
} is greater than $
255$, assign
7278 $
\bitvar{RECP
}[\bitvar{FY
}+
1][\bitvar{FX
}+
\locvar{\idx{bx
}}]$ the value $
255$.
Otherwise, assign
 $\bitvar{RECP}[\bitvar{FY}+1][\bitvar{FX}+\locvar{\idx{bx}}]$ the value
 \locvar{P}.
7284 Assign
\locvar{P
} the value
7285 $(
\bitvar{RECP
}[\bitvar{FY
}+
2][\bitvar{FX
}+
\locvar{\idx{bx
}}]-
7286 \lflim(
\locvar{R
},
\bitvar{L
}))$.
7288 If
\locvar{P
} is less than zero, assign
7289 $
\bitvar{RECP
}[\bitvar{FY
}+
2][\bitvar{FX
}+
\locvar{\idx{bx
}}]$ the value zero.
7291 Otherwise, if
\locvar{P
} is greater than $
255$, assign
7292 $
\bitvar{RECP
}[\bitvar{FY
}+
2][\bitvar{FX
}+
\locvar{\idx{bx
}}]$ the value $
255$.
Otherwise, assign
 $\bitvar{RECP}[\bitvar{FY}+2][\bitvar{FX}+\locvar{\idx{bx}}]$ the value
 \locvar{P}.
7300 \subsection{Complete Loop Filter
}
7301 \label{sub:loop-filt
}
7303 \paragraph{Input parameters:
}\hfill\\*
7304 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7305 \multicolumn{1}{c
}{Name
} &
7306 \multicolumn{1}{c
}{Type
} &
7307 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7308 \multicolumn{1}{c
}{Signed?
} &
7309 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7310 \bitvar{LFLIMS
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
7311 7 & No & A
64-element array of loop filter limit
7313 \bitvar{RPYW
} & Integer &
20 & No & The width of the $Y'$ plane of the
 reconstructed frame in pixels. \\
7315 \bitvar{RPYH
} & Integer &
20 & No & The height of the $Y'$ plane of the
 reconstructed frame in pixels. \\
7317 \bitvar{RPCW
} & Integer &
20 & No & The width of the $C_b$ and $C_r$
 planes of the reconstructed frame in pixels. \\
7319 \bitvar{RPCH
} & Integer &
20 & No & The height of the $C_b$ and $C_r$
 planes of the reconstructed frame in pixels. \\
7321 \bitvar{NBS
} & Integer &
36 & No & The total number of blocks in a
7323 \bitvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
7324 1 & No & An
\bitvar{NBS
}-element array of
7325 flags indicating which blocks are coded. \\
7326 \bitvar{QIS
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
7327 6 & No & An
\bitvar{NQIS
}-element array of
7329 \bitvar{RECY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7330 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
7331 array containing the contents of the $Y'$ plane of the reconstructed frame. \\
7332 \bitvar{RECCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7333 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7334 array containing the contents of the $C_b$ plane of the reconstructed frame. \\
7335 \bitvar{RECCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7336 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7337 array containing the contents of the $C_r$ plane of the reconstructed frame. \\
7338 \bottomrule\end{tabularx
}
7340 \paragraph{Output parameters:
}\hfill\\*
7341 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7342 \multicolumn{1}{c
}{Name
} &
7343 \multicolumn{1}{c
}{Type
} &
7344 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7345 \multicolumn{1}{c
}{Signed?
} &
7346 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7347 \bitvar{RECY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7348 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
7349 array containing the contents of the $Y'$ plane of the reconstructed frame. \\
7350 \bitvar{RECCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7351 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7352 array containing the contents of the $C_b$ plane of the reconstructed frame. \\
7353 \bitvar{RECCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7354 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7355 array containing the contents of the $C_r$ plane of the reconstructed frame. \\
7356 \bottomrule\end{tabularx
}
7358 \paragraph{Variables used:
}\hfill\\*
7359 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7360 \multicolumn{1}{c
}{Name
} &
7361 \multicolumn{1}{c
}{Type
} &
7362 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7363 \multicolumn{1}{c
}{Signed?
} &
7364 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7365 \locvar{RPW
} & Integer &
20 & No & The width of the current plane of the
7366 reconstructed frame in pixels. \\
7367 \locvar{RPH
} & Integer &
20 & No & The height of the current plane of
7368 the reconstructed frame in pixels. \\
7369 \locvar{RECP
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7370 8 & No & A $
\bitvar{RPH
}\times\bitvar{RPW
}$
 array containing the contents of the current plane of the reconstructed
7373 \locvar{BX
} & Integer &
20 & No & The horizontal pixel index of the
7374 lower-left corner of the current block. \\
7375 \locvar{BY
} & Integer &
20 & No & The vertical pixel index of the
7376 lower-left corner of the current block. \\
7377 \locvar{FX
} & Integer &
20 & No & The horizontal pixel index of the
7378 lower-left corner of the area to be filtered. \\
7379 \locvar{FY
} & Integer &
20 & No & The vertical pixel index of the
7380 lower-left corner of the area to be filtered. \\
7381 \locvar{L
} & Integer &
7 & No & The loop filter limit value. \\
7382 \locvar{\bi} & Integer &
36 & No & The index of the current block in
7384 \locvar{\bj} & Integer &
36 & No & The index of a neighboring block in
7386 \locvar{\pli} & Integer &
2 & No & The
color plane index of the current
7388 \bottomrule\end{tabularx
}
This procedure defines the order in which the various block edges are filtered.
Because each application of one of the two filters above destructively modifies
the contents of the reconstructed image, the precise output obtained differs
depending on the order in which horizontal and vertical filters are applied to the
edges of a single block.
The order defined here conforms to that used by VP3.
7400 Assign
\locvar{L
} the value $
\bitvar{LFLIMS
}[\bitvar{QIS
}[0]]$.
7402 For each block in
{\em raster
} order, with coded-order index
\locvar{\bi}:
7405 If $
\bitvar{BCODED
}[\locvar{\bi}]$ is non-zero:
7408 Assign
\locvar{\pli} the index of the
color plane block
\locvar{\bi} belongs
7411 Assign
\locvar{RECP
},
\locvar{RPW
}, and
\locvar{RPH
} the values given in
7412 Table~
\ref{tab:recp
} corresponding to the value of
\locvar{\pli}.
7416 \begin{tabular
}{clll
}\toprule
7417 \locvar{\pli} &
\locvar{RECP
} &
\locvar{RPW
} &
\locvar{RPH
} \\
\midrule
7418 $
0$ &
\bitvar{RECY
} &
\bitvar{RPYW
} &
\bitvar{RPYH
} \\
7419 $
1$ &
\bitvar{RECCB
} &
\bitvar{RPCW
} &
\bitvar{RPCH
} \\
7420 $
2$ &
\bitvar{RECCR
} &
\bitvar{RPCW
} &
\bitvar{RPCH
} \\
7421 \bottomrule\end{tabular
}
7423 \caption{Reconstructed Planes and Sizes for Each
\locvar{\pli}}
7428 Assign
\locvar{BX
} the horizontal pixel index of the lower-left corner of the
7431 Assign
\locvar{BY
} the vertical pixel index of the lower-left corner of the
7434 If
\locvar{BX
} is greater than zero:
7437 Assign
\locvar{FX
} the value $(
\locvar{BX
}-
2)$.
7439 Assign
\locvar{FY
} the value
\locvar{BY
}.
7441 Using
\locvar{RECP
},
\locvar{FX
},
\locvar{FY
}, and
\locvar{L
}, apply the
7442 horizontal block filter to the left edge of block
\locvar{\bi} with the
7443 procedure described in Section~
\ref{sub:filth
}.
7446 If
\locvar{BY
} is greater than zero:
7449 Assign
\locvar{FX
} the value
\locvar{BX
}.
7451 Assign
\locvar{FY
} the value $(
\locvar{BY
}-
2)$
7453 Using
\locvar{RECP
},
\locvar{FX
},
\locvar{FY
}, and
\locvar{L
}, apply the
7454 vertical block filter to the bottom edge of block
\locvar{\bi} with the
7455 procedure described in Section~
\ref{sub:filtv
}.
7458 If $(
\locvar{BX
}+
8)$ is less than
\locvar{RPW
} and
7459 $
\bitvar{BCODED
}[\locvar{\bj}]$ is zero, where
\locvar{\bj} is the coded-order
7460 index of the block adjacent to
\locvar{\bi} on the right:
7463 Assign
\locvar{FX
} the value $(
\locvar{BX
}+
6)$.
7465 Assign
\locvar{FY
} the value
\locvar{BY
}.
7467 Using
\locvar{RECP
},
\locvar{FX
},
\locvar{FY
}, and
\locvar{L
}, apply the
7468 horizontal block filter to the right edge of block
\locvar{\bi} with the
7469 procedure described in Section~
\ref{sub:filth
}.
7472 If $(
\locvar{BY
}+
8)$ is less than
\locvar{RPH
} and
7473 $
\bitvar{BCODED
}[\locvar{\bj}]$ is zero, where
\locvar{\bj} is the coded-order
7474 index of the block adjacent to
\locvar{\bi} above:
7477 Assign
\locvar{FX
} the value
\locvar{BX
}.
7479 Assign
\locvar{FY
} the value $(
\locvar{BY
}+
6)$
7481 Using
\locvar{RECP
},
\locvar{FX
},
\locvar{FY
}, and
\locvar{L
}, apply the
7482 vertical block filter to the top edge of block
\locvar{\bi} with the
7483 procedure described in Section~
\ref{sub:filtv
}.
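
The edge ordering above can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions, not a reference implementation: it handles a single plane, it represents the coded flags as a hypothetical grid {\tt coded[row][col]} with row 0 at the bottom of the plane (rather than the coded-order array \locvar{BCODED}), and it assumes raster order runs left to right, then bottom to top.
The function name {\tt loop\_filter\_edges} is invented for this example.
It yields the filter direction and the $(\locvar{FX},\locvar{FY})$ position for each edge instead of actually filtering pixels.

```python
def loop_filter_edges(coded):
    """Yield (filter, FX, FY) in the order the procedure visits block edges.

    `coded` is a hypothetical per-plane stand-in for BCODED:
    coded[row][col] is truthy if that 8x8 block is coded, row 0 at the bottom.
    """
    rows = len(coded)
    cols = len(coded[0]) if rows else 0
    rpw, rph = 8 * cols, 8 * rows  # plane width and height in pixels
    for row in range(rows):        # assumed raster order: bottom to top
        for col in range(cols):    # and left to right within a row
            if not coded[row][col]:
                continue
            bx, by = 8 * col, 8 * row  # lower-left corner of the block
            if bx > 0:
                yield ("horizontal", bx - 2, by)   # left edge
            if by > 0:
                yield ("vertical", bx, by - 2)     # bottom edge
            # Right and top edges are filtered here only when the neighbor
            # is uncoded; a coded neighbor filters the shared edge itself.
            if bx + 8 < rpw and not coded[row][col + 1]:
                yield ("horizontal", bx + 6, by)   # right edge
            if by + 8 < rph and not coded[row + 1][col]:
                yield ("vertical", bx, by + 6)     # top edge


if __name__ == "__main__":
    for edge in loop_filter_edges([[1, 0], [0, 1]]):
        print(edge)
```

Note that an edge shared by two coded blocks is visited exactly once, as the left or bottom edge of the later block.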
7489 \paragraph{VP3 Compatibility
}
7491 The original VP3 decoder implemented unrestricted motion vectors by enlarging
7492 the reconstructed frame buffers and repeating the pixels on its edges into the
However, for the previous reference frame this padding occurred before the loop
7495 filter was applied, but for the golden reference frame it occurred afterwards.
7497 This means that for the previous reference frame, the padding values were
7498 required to be stored separately from the main image values.
7499 Furthermore, even if the previous and golden reference frames were in fact the
7500 same frame, they could have different padding values.
7501 Finally, the encoder did not apply the loop filter at all, which resulted in
7502 artifacts, particularly in near-static scenes, due to prediction-loop
7504 This last can only be considered a bug in the VP3 encoder.
7506 Given all these things, Theora now uniformly applies the loop filter before
7507 the reference frames are padded.
7508 This means it is possible to use the same buffer for the previous and golden
7509 reference frames when they do indeed refer to the same frame.
7510 It also means that on architectures where memory bandwidth is limited, it is
7511 possible to avoid storing padding values, and simply clamp the motion vectors
7512 applied to each pixel as described in Sections~
\ref{sub:predfullpel
}
7513 and~
\ref{sub:predhalfpel
}.
7514 This means that the predicted pixel values along the edges of the frame might
7515 differ slightly between VP3 and Theora, but since the VP3 encoder did not
apply the loop filter in the first place, this is not likely to pose any
7517 serious compatibility issues.
7519 \section{Complete Frame Decode
}
7521 \paragraph{Input parameters:
}\hfill\\*
7522 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7523 \multicolumn{1}{c
}{Name
} &
7524 \multicolumn{1}{c
}{Type
} &
7525 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7526 \multicolumn{1}{c
}{Signed?
} &
7527 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7528 \bitvar{FMBW
} & Integer &
16 & No & The width of the frame in macro
7530 \bitvar{FMBH
} & Integer &
16 & No & The height of the frame in macro
7532 \bitvar{NSBS
} & Integer &
32 & No & The total number of super blocks in a
7534 \bitvar{NBS
} & Integer &
36 & No & The total number of blocks in a
7536 \bitvar{NMBS
} & Integer &
32 & No & The total number of macro blocks in a
7538 \bitvar{FRN
} & Integer &
32 & No & The frame-rate numerator. \\
7539 \bitvar{FRD
} & Integer &
32 & No & The frame-rate denominator. \\
7540 \bitvar{PARN
} & Integer &
24 & No & The pixel aspect-ratio numerator. \\
7541 \bitvar{PARD
} & Integer &
24 & No & The pixel aspect-ratio
7543 \bitvar{CS
} & Integer &
8 & No & The
color space. \\
7544 \bitvar{PF
} & Integer &
2 & No & The pixel format. \\
7545 \bitvar{NOMBR
} & Integer &
24 & No & The nominal bitrate of the stream, in
7547 \bitvar{QUAL
} & Integer &
6 & No & The quality hint. \\
7548 \bitvar{KFGSHIFT
} & Integer &
5 & No & The amount to shift the key frame
7549 number by in the granule position. \\
7550 \bitvar{LFLIMS
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
7551 7 & No & A
64-element array of loop filter
7553 \bitvar{ACSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
7554 16 & No & A
64-element array of scale values
7555 for AC coefficients for each
\qi\ value. \\
7556 \bitvar{DCSCALE
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
7557 16 & No & A
64-element array of scale values
7558 for the DC coefficient for each
\qi\ value. \\
7559 \bitvar{NBMS
} & Integer &
10 & No & The number of base matrices. \\
7560 \bitvar{BMS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
7561 8 & No & A $
\bitvar{NBMS
}\times 64$ array
7562 containing the base matrices. \\
7563 \bitvar{NQRS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer array
} &
7564 6 & No & A $
2\times 3$ array containing the
7565 number of quant ranges for a given
\qti\ and
\pli, respectively.
7566 This is at most $
63$. \\
7567 \bitvar{QRSIZES
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
7568 6 & No & A $
2\times 3\times 63$ array of the
7569 sizes of each quant range for a given
\qti\ and
\pli, respectively.
7570 Only the first $
\bitvar{NQRS
}[\qti][\pli]$ values will be used. \\
7571 \bitvar{QRBMIS
} &
\multicolumn{1}{p
{50pt
}}{3D Integer array
} &
7572 9 & No & A $
2\times 3\times 64$ array of the
7573 \bmi's used for each quant range for a given
\qti\ and
\pli, respectively.
7574 Only the first $(
\bitvar{NQRS
}[\qti][\pli]+
1)$ values will be used. \\
7575 \bitvar{HTS
} &
\multicolumn{3}{l
}{Huffman table array
}
7576 & An
80-element array of Huffman tables
7577 with up to
32 entries each. \\
7578 \bitvar{GOLDREFY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7579 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
7580 array containing the contents of the $Y'$ plane of the golden reference
7582 \bitvar{GOLDREFCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7583 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7584 array containing the contents of the $C_b$ plane of the golden reference
7586 \bitvar{GOLDREFCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7587 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7588 array containing the contents of the $C_r$ plane of the golden reference
7590 \bitvar{PREVREFY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7591 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
7592 array containing the contents of the $Y'$ plane of the previous reference
7594 \bitvar{PREVREFCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7595 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7596 array containing the contents of the $C_b$ plane of the previous reference
7598 \bitvar{PREVREFCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7599 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7600 array containing the contents of the $C_r$ plane of the previous reference
7602 \bottomrule\end{tabularx
}
7604 \paragraph{Output parameters:
}\hfill\\*
7605 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7606 \multicolumn{1}{c
}{Name
} &
7607 \multicolumn{1}{c
}{Type
} &
7608 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7609 \multicolumn{1}{c
}{Signed?
} &
7610 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7611 \bitvar{RECY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7612 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
7613 array containing the contents of the $Y'$ plane of the reconstructed frame. \\
7614 \bitvar{RECCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7615 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7616 array containing the contents of the $C_b$ plane of the reconstructed
7618 \bitvar{RECCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7619 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7620 array containing the contents of the $C_r$ plane of the reconstructed
7622 \bitvar{GOLDREFY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7623 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
7624 array containing the contents of the $Y'$ plane of the golden reference
7626 \bitvar{GOLDREFCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7627 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7628 array containing the contents of the $C_b$ plane of the golden reference
7630 \bitvar{GOLDREFCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7631 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7632 array containing the contents of the $C_r$ plane of the golden reference
7634 \bitvar{PREVREFY
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7635 8 & No & A $
\bitvar{RPYH
}\times\bitvar{RPYW
}$
7636 array containing the contents of the $Y'$ plane of the previous reference
7638 \bitvar{PREVREFCB
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7639 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7640 array containing the contents of the $C_b$ plane of the previous reference
7642 \bitvar{PREVREFCR
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7643 8 & No & A $
\bitvar{RPCH
}\times\bitvar{RPCW
}$
7644 array containing the contents of the $C_r$ plane of the previous reference
7646 \bottomrule\end{tabularx
}
7648 \paragraph{Variables used:
}\hfill\\*
7649 \begin{tabularx
}{\textwidth}{@
{}llrcX@
{}}\toprule
7650 \multicolumn{1}{c
}{Name
} &
7651 \multicolumn{1}{c
}{Type
} &
7652 \multicolumn{1}{p
{30pt
}}{\centering Size (bits)
} &
7653 \multicolumn{1}{c
}{Signed?
} &
7654 \multicolumn{1}{c
}{Description and restrictions
} \\
\midrule\endhead
7655 \locvar{FTYPE
} & Integer &
1 & No & The frame type. \\
7656 \locvar{NQIS
} & Integer &
2 & No & The number of
\qi\ values. \\
7657 \locvar{QIS
} &
\multicolumn{1}{p
{40pt
}}{Integer array
} &
7658 6 & No & An
\locvar{NQIS
}-element array of
7660 \locvar{BCODED
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
7661 1 & No & An
\bitvar{NBS
}-element array of flags
7662 indicating which blocks are coded. \\
7663 \locvar{MBMODES
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
7664 3 & No & An
\bitvar{NMBS
}-element array of
7665 coding modes for each macro block. \\
7666 \locvar{MVECTS
} &
\multicolumn{1}{p
{50pt
}}{Array of
2D Integer Vectors
} &
7667 6 & Yes & An
\bitvar{NBS
}-element array of motion
7668 vectors for each block. \\
7669 \locvar{QIIS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
7670 2 & No & An
\bitvar{NBS
}-element array of
7671 \locvar{\qii} values for each block. \\
7672 \locvar{COEFFS
} &
\multicolumn{1}{p
{50pt
}}{2D Integer Array
} &
7673 16 & Yes & An $
\bitvar{NBS
}\times 64$ array of
7674 quantized DCT coefficient values for each block in zig-zag order. \\
7675 \locvar{NCOEFFS
} &
\multicolumn{1}{p
{40pt
}}{Integer Array
} &
7676 7 & No & An
\bitvar{NBS
}-element array of the
7677 coefficient count for each block. \\
7678 \bitvar{RPYW
} & Integer &
20 & No & The width of the $Y'$ plane of the
7679 reference frames in pixels. \\
7680 \bitvar{RPYH
} & Integer &
20 & No & The height of the $Y'$ plane of the
7681 reference frames in pixels. \\
7682 \bitvar{RPCW
} & Integer &
20 & No & The width of the $C_b$ and $C_r$
7683 planes of the reference frames in pixels. \\
7684 \bitvar{RPCH
} & Integer &
20 & No & The height of the $C_b$ and $C_r$
7685 planes of the reference frames in pixels. \\
7686 \locvar{\bi} & Integer &
36 & No & The index of the current block in coded
7688 \bottomrule\end{tabularx
}
7691 This procedure uses all the procedures defined in the previous section of this
7692 chapter to decode and reconstruct a complete frame.
7693 It takes as input values decoded from the headers, as well as the current
7695 As output, it gives the uncropped, reconstructed frame.
This should be cropped to the picture region before display.
7697 As a special case, a
0-byte packet is treated exactly like an inter frame with
7702 If the size of the data packet is non-zero:
7705 Decode the frame header values
\locvar{FTYPE
},
\locvar{NQIS
}, and
\locvar{QIS
}
7706 using the procedure given in Section~
\ref{sub:frame-header
}.
7708 Using
\locvar{FTYPE
},
\bitvar{NSBS
}, and
\bitvar{NBS
}, decode the list of coded
7709 block flags into
\locvar{BCODED
} using the procedure given in
7710 Section~
\ref{sub:coded-blocks
}.
7712 Using
\locvar{FTYPE
},
\bitvar{NMBS
},
\bitvar{NBS
}, and
\bitvar{BCODED
}, decode
7713 the macro block coding modes into
\locvar{MBMODES
} using the procedure given
7714 in Section~
\ref{sub:mb-modes
}.
7716 If
\locvar{FTYPE
} is non-zero (inter frame), using
\bitvar{PF
},
\bitvar{NMBS
},
7717 \locvar{MBMODES
},
\bitvar{NBS
}, and
\locvar{BCODED
}, decode the motion vectors
7718 into
\locvar{MVECTS
} using the procedure given in Section~
\ref{sub:mv-decode
}.
7720 Using
\bitvar{NBS
},
\locvar{BCODED
}, and
\locvar{NQIS
}, decode the block-level
7721 \qi\ values into
\locvar{QIIS
} using the procedure given in
7722 Section~
\ref{sub:block-qis
}.
7724 Using
\bitvar{NBS
},
\bitvar{NMBS
},
\locvar{BCODED
}, and
\bitvar{HTS
}, decode
7725 the DCT coefficients into
\locvar{COEFFS
} and
\locvar{NCOEFFS
} using the
7726 procedure given in Section~
\ref{sub:dct-coeffs
}.
7728 Using
\locvar{BCODED
} and
\locvar{MBMODES
}, undo the DC prediction on the DC
7729 coefficients stored in
\locvar{COEFFS
} using the procedure given in
7730 Section~
\ref{sub:dc-pred-undo
}.
7736 Assign
\locvar{FTYPE
} the value
1 (inter frame).
7738 Assign
\locvar{NQIS
} the value
1.
7740 Assign $
\locvar{QIS
}[0]$ the value
63.
7742 For each value of
\locvar{\bi} from
0 to $(
\bitvar{NBS
}-
1)$, assign
7743 $
\locvar{BCODED
}[\locvar{\bi}]$ the value zero.
7746 Assign
\locvar{RPYW
} and
\locvar{RPYH
} the values $(
16*
\bitvar{FMBW
})$ and
7747 $(
16*
\bitvar{FMBH
})$, respectively.
7749 Assign
\locvar{RPCW
} and
\locvar{RPCH
} the values from the row of
7750 Table~
\ref{tab:rpcwh-for-pf
} corresponding to
\bitvar{PF
}.
7754 \begin{tabular
}{crr
}\toprule
7755 \bitvar{PF
} &
\multicolumn{1}{c
}{\locvar{RPCW
}}
7756 &
\multicolumn{1}{c
}{\locvar{RPCH
}} \\
\midrule
7757 $
0$ & $
8*
\bitvar{FMBW
}$ & $
8*
\bitvar{FMBH
}$ \\
7758 $
2$ & $
8*
\bitvar{FMBW
}$ & $
16*
\bitvar{FMBH
}$ \\
7759 $
3$ & $
16*
\bitvar{FMBW
}$ & $
16*
\bitvar{FMBH
}$ \\
7760 \bottomrule\end{tabular
}
7762 \caption{Width and Height of Chroma Planes for each Pixel Format
}
7763 \label{tab:rpcwh-for-pf
}
7767 Using
\bitvar{ACSCALE
},
\bitvar{DCSCALE
},
\bitvar{BMS
},
\bitvar{NQRS
},
7768 \bitvar{QRSIZES
},
\bitvar{QRBMIS
},
\bitvar{NBS
},
\locvar{BCODED
},
7769 \locvar{MBMODES
},
\locvar{MVECTS
},
\locvar{COEFFS
},
\locvar{NCOEFFS
},
7770 \locvar{QIS
},
\locvar{QIIS
},
\locvar{RPYW
},
\locvar{RPYH
},
\locvar{RPCW
},
7771 \locvar{RPCH
},
\bitvar{GOLDREFY
},
\bitvar{GOLDREFCB
},
\bitvar{GOLDREFCR
},
7772 \bitvar{PREVREFY
},
\bitvar{PREVREFCB
}, and
\bitvar{PREVREFCR
}, reconstruct the
7773 complete frame into
\bitvar{RECY
},
\bitvar{RECCB
}, and
\bitvar{RECCR
} using
7774 the procedure given in Section~
\ref{sub:recon
}.
7776 Using
\bitvar{LFLIMS
},
\locvar{RPYW
},
\locvar{RPYH
},
\locvar{RPCW
},
7777 \locvar{RPCH
},
\bitvar{NBS
},
\locvar{BCODED
}, and
\locvar{QIS
}, apply the loop
7778 filter to the reconstructed frame in
\bitvar{RECY
},
\bitvar{RECCB
}, and
7779 \bitvar{RECCR
} using the procedure given in Section~
\ref{sub:loop-filt
}.
7781 If
\locvar{FTYPE
} is zero (intra frame), assign
\bitvar{GOLDREFY
},
7782 \bitvar{GOLDREFCB
}, and
\bitvar{GOLDREFCR
} the values
\bitvar{RECY
},
7783 \bitvar{RECCB
}, and
\bitvar{RECCR
}, respectively.
7785 Assign
\bitvar{PREVREFY
},
\bitvar{PREVREFCB
}, and
\bitvar{PREVREFCR
} the values
7786 \bitvar{RECY
},
\bitvar{RECCB
}, and
\bitvar{RECCR
}, respectively.
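
The bookkeeping steps of this procedure (the defaults for a 0-byte packet, the plane sizes, and the reference frame update) can be sketched as follows.
This is a minimal Python illustration, not a decoder: the function names {\tt empty\_packet\_defaults}, {\tt plane\_sizes}, and {\tt update\_references}, and the dictionary used to hold the reference frames, are assumptions made for this example.
The actual header, coefficient, reconstruction, and loop filter steps are omitted.

```python
def empty_packet_defaults(nbs):
    """Values assigned when the data packet has size zero."""
    return {
        "FTYPE": 1,            # inter frame
        "NQIS": 1,
        "QIS": [63],
        "BCODED": [0] * nbs,   # no block is coded
    }


def plane_sizes(fmbw, fmbh, pf):
    """Return (RPYW, RPYH, RPCW, RPCH) for FMBW x FMBH macro blocks."""
    chroma = {
        0: (8 * fmbw, 8 * fmbh),    # chroma halved in both directions
        2: (8 * fmbw, 16 * fmbh),   # chroma halved horizontally only
        3: (16 * fmbw, 16 * fmbh),  # chroma at full resolution
    }
    if pf not in chroma:
        raise ValueError("pixel format %r has no row in the table" % (pf,))
    rpcw, rpch = chroma[pf]
    return 16 * fmbw, 16 * fmbh, rpcw, rpch


def update_references(ftype, rec, refs):
    """Apply the final two steps; `refs` is a hypothetical dict of frames."""
    if ftype == 0:           # intra frame: it becomes the golden reference
        refs["gold"] = rec
    refs["prev"] = rec       # every frame becomes the previous reference
    return refs


if __name__ == "__main__":
    print(plane_sizes(2, 3, 0))
    print(empty_packet_defaults(4))
```

Because a 0-byte packet marks every block as uncoded, the reconstructed frame in that case is expected to repeat the previous reference frame.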
7792 \chapter{Ogg Bitstream Encapsulation
}
7793 \label{app:oggencapsulation
}
7797 This
document specifies the embedding or encapsulation of Theora packets
7798 in an Ogg transport stream.
Ogg is a stream-oriented wrapper for coded, linear time-based data.
It provides synchronization, multiplexing, framing, error detection, and
7802 seeking landmarks for the decoder and complements the raw packet format
7803 used by the Theora codec.
7805 This
document assumes familiarity with the details of the Ogg standard.
7806 The Xiph.org documentation provides an overview of the Ogg transport stream
7807 format at
\url{http://www.xiph.org/ogg/doc/oggstream.html
} and a detailed
7808 description at
\url{http://www.xiph.org/ogg/doc/framing.html
}.
7809 The format is also defined in RFC~
3533 \cite{rfc3533
}.
7810 While Theora packets can be embedded in a wide variety of media
7811 containers and streaming mechanisms, the Xiph.org Foundation
7812 recommends Ogg as the native format for Theora video in file-oriented
7813 storage and transmission contexts.
7815 \subsection{MIME type
}
7817 The generic MIME type of any Ogg file is
{\tt application/ogg
}.
7818 The specific MIME type for the Ogg Theora profile documented here
7819 is
{\tt video/ogg
}. This is the MIME type recommended for files
7820 conforming to this appendix. The recommended filename extension
Outside of an encapsulation, the MIME type
{\tt video/theora
} may
7824 be used to refer specifically to the Theora compressed video stream.
7826 \section{Embedding in a logical bitstream
}
7828 Ogg separates the concept of a
{\em logical bitstream
} consisting of the
7829 framing of a particular sequence of packets and complete within itself
7830 from the
{\em physical bitstream
} which may consist either of a single
7831 logical bitstream or a number of logical bitstreams multiplexed
7833 This section specifies the embedding of Theora packets in a logical Ogg
7835 The mapping of Ogg Theora logical bitstreams into a multiplexed physical Ogg
7836 stream is described in the next section.
7838 \subsection{Headers
}
7840 The initial identification header packet appears by itself in a
7842 This page defines the start of the logical stream and MUST have
7843 the `beginning of stream' flag set.
7845 The second and third header packets (comment metadata and decoder
7846 setup data) can together span one or more Ogg pages.
7847 If there are additional non-normative header packets, they MUST be
7848 included in this sequence of pages as well.
7849 The comment header packet MUST begin the second Ogg page in the logical
7850 bitstream, and there MUST be a page break between the last header
7851 packet and the first frame data packet.
7853 These two page break requirements facilitate stream identification and
7854 simplify header acquisition for seeking and live streaming applications.
7856 All header pages MUST have their granule position field set to zero.
7858 \subsection{Frame data
}
7860 The first frame data packet in a logical bitstream MUST begin a new Ogg
7862 All other data packets are placed one at a time into Ogg pages
7863 until the end of the stream.
7864 Packets can span pages and multiple packets can be placed within any
7866 The last page in the logical bitstream SHOULD have its
`end of stream' flag set to indicate complete transmission
7868 of the available video.
7870 Frame data pages MUST be marked with a granule position corresponding to
7871 the end of the display interval of the last frame/packet that finishes
7872 in that page. See the next section for details.
7874 \subsection{Granule position
}
7876 Data packets are marked by a granulepos derived from the count of decodable
7877 frames after that packet is processed. The field itself is divided into two
7878 sections, the width of the less significant section being given by the KFGSHIFT
7879 parameter decoded from the identification header
7880 (Section~
\ref{sec:idheader
}).
7881 The more significant portion of the field gives the count of coded
7882 frames after the coding of the last keyframe in stream, and the less
7883 significant portion gives the count of frames since the last keyframe.
7884 Thus a stream would begin with a split granulepos of $
1|
0$ (a keyframe),
7885 followed by $
1|
1$, $
1|
2$, $
1|
3$, etc. Around a keyframe in the
7886 middle of the stream the granulepos sequence might be $
1234|
35$,
7887 $
1234|
36$, $
1234|
37$, $
1271|
0$ (for the keyframe), $
1271|
1$, and so
on. In this way the granulepos field increases monotonically as required
7889 by the Ogg format, but contains information necessary to efficiently
7890 find the previous keyframe to continue decoding after a seek.
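
The split granulepos described above can be illustrated with a short Python sketch.
The helper names {\tt pack\_granulepos}, {\tt split\_granulepos}, and {\tt frame\_count} are invented for this example, and the value $6$ used for KFGSHIFT is an arbitrary choice; a real stream takes it from the identification header.

```python
def pack_granulepos(keyframe_count, offset, kfgshift):
    """Combine the two fields of a split granulepos into one integer."""
    if offset >> kfgshift:
        raise ValueError("offset does not fit in KFGSHIFT bits")
    return (keyframe_count << kfgshift) | offset


def split_granulepos(granulepos, kfgshift):
    """Return (more significant field, less significant field)."""
    return granulepos >> kfgshift, granulepos & ((1 << kfgshift) - 1)


def frame_count(granulepos, kfgshift):
    """Count of frames decodable once the packet has been processed."""
    keyframe_count, offset = split_granulepos(granulepos, kfgshift)
    return keyframe_count + offset


if __name__ == "__main__":
    kfgshift = 6  # example value only
    for hi, lo in [(1234, 35), (1234, 36), (1234, 37), (1271, 0), (1271, 1)]:
        gp = pack_granulepos(hi, lo, kfgshift)
        print(gp, split_granulepos(gp, kfgshift), frame_count(gp, kfgshift))
```

The packed values grow across the keyframe at $1271|0$ because the more significant field jumps by exactly the number of frames counted in the less significant field plus one.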
7892 Prior to bitstream version
3.2.1, data packets were marked by a
7893 granulepos derived from the index of the frame being decoded,
7894 rather than the count. That is they marked the beginning of the
7895 display interval of a frame rather than the end. Such streams
7896 have the VREV field of the identification header set to `
0'
7897 instead of `
1'. They can be interpreted according to the description
7898 above by adding
1 to the more significant field of the split
7899 granulepos when VREV is less than
1.
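
As a sketch of the adjustment just described, the following Python function splits a granulepos and adds $1$ to the more significant field for older streams.
The function name {\tt normalized\_split} is hypothetical, and the code assumes the caller has already read VREV and KFGSHIFT from the identification header.

```python
def normalized_split(granulepos, kfgshift, vrev):
    """Split a granulepos, mapping pre-3.2.1 streams onto the newer convention."""
    keyframe_field = granulepos >> kfgshift
    offset_field = granulepos & ((1 << kfgshift) - 1)
    if vrev < 1:
        # Older streams mark the start of the display interval (frame index),
        # so add 1 to the more significant field to obtain a frame count.
        keyframe_field += 1
    return keyframe_field, offset_field


if __name__ == "__main__":
    # With KFGSHIFT = 6 (example value), the first keyframe of an old stream
    # and of a new stream both normalize to the same split value.
    print(normalized_split(0, 6, 0))
    print(normalized_split(1 << 6, 6, 1))
```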
7901 \section{Multiplexed stream mapping
}
7903 Applications supporting Ogg Theora must support Theora bitstreams
7904 multiplexed with compressed audio data in the Vorbis I and Speex
7905 formats, and should support Ogg-encapsulated MNG graphics for overlays.
7907 Multiple audio and video bitstreams may be multiplexed together.
7908 How playback of multiple/alternate streams is handled is up to the
Some conventions based on included metadata aid interoperability
7912 %TODO: describe multiple vs. alternate streams, language mapping
7913 % and reference metadata descriptions.
7915 \subsection{Chained streams
}
7917 Ogg Theora decoders and playback applications MUST support both grouped
7918 streams (multiplexed concurrent logical streams) and chained streams
7919 (sequential concatenation of independent physical bitstreams).
7921 The number and codec data types of multiplexed streams and the decoder
7922 parameters for those stream types that re-occur can all change at a
7924 A playback application MUST be prepared to handle such changes and
7925 SHOULD do so smoothly with the minimum possible visible disruption.
7926 The specification of grouped streams below applies independently to each
7927 segment of a chained bitstream.
7929 \subsection{Grouped streams
}
7931 At the beginning of a multiplexed stream, the `beginning of stream'
7932 pages for each logical bitstream will be grouped together.
7933 Within these, the first page to occur MUST be the Theora page.
7934 This facilitates identification of Ogg Theora files among other
7935 Ogg-encapsulated content.
7936 A playback application must nevertheless handle streams where this
7937 arrangement is not correct.
7938 %TBT: Then what's the point of requiring it in the spec?
If there is more than one Theora logical stream, the first page should
be from the primary stream, that is, the stream a generic player should begin
displaying without special user direction.
If there is more than one audio stream, or more than one stream of any other
type, the identification page of the primary stream of that type
should be placed before the others.
7947 %TBT: That's all pretty vague.
7949 After the `beginning of stream' pages, the header pages of each of
7950 the logical streams MUST be grouped together before any data pages
7953 After all the header pages have been placed,
7954 the data pages are multiplexed together.
7955 They should be placed in the stream in increasing order by the
7956 time equivalents of their granule position fields.
7957 This facilitates seeking while limiting the buffering requirements of the
7958 playback demultiplexer.
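
One way to picture this ordering is the following Python sketch, which converts a Theora granulepos to a time and merges per-stream page lists by that time.
It rests on several assumptions that are not part of this specification: pages are modeled as {\tt (time, label)} tuples, each per-stream list is already sorted, the Theora stream uses the current granulepos convention (VREV of at least $1$), and the time of a page is taken to be the frame count multiplied by $\bitvar{FRD}/\bitvar{FRN}$.
The names {\tt theora\_page\_time} and {\tt interleave} are invented for the example.

```python
import heapq
from fractions import Fraction


def theora_page_time(granulepos, kfgshift, frn, frd):
    """Time in seconds at the end of the last frame finishing in a page."""
    keyframe_field = granulepos >> kfgshift
    offset_field = granulepos & ((1 << kfgshift) - 1)
    frames = keyframe_field + offset_field
    return Fraction(frames * frd, frn)  # frame rate is FRN / FRD


def interleave(*streams):
    """Merge already-sorted lists of (time, label) pages by increasing time."""
    return list(heapq.merge(*streams, key=lambda page: page[0]))


if __name__ == "__main__":
    kfgshift, frn, frd = 6, 30, 1  # example values only
    video = [
        (theora_page_time(1 << kfgshift, kfgshift, frn, frd), "video"),
        (theora_page_time((1 << kfgshift) | 15, kfgshift, frn, frd), "video"),
    ]
    audio = [(Fraction(1, 10), "audio"), (Fraction(1, 2), "audio")]
    for time, label in interleave(video, audio):
        print(float(time), label)
```

Exact rational arithmetic is used here so that pages from streams with different time bases compare without rounding error.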
7959 %TODO: A lot of this language is encoder-oriented.
7960 %TODO: We define a decoder-oriented specification.
7961 %TODO: The language should be changed to match.
7966 \section{VP3 Compatibility
}
7967 \label{app:vp3-compat
}
7968 This section lists all of the encoder and decoder issues that may affect VP3
7970 Each is described in more detail in the text itself.
7971 This list is provided merely for reference.
7975 Bitstream headers (Section~
\ref{sec:headers
}).
7978 Identification header (Section~
\ref{sec:idheader
}).
7981 Non-multiple of
16 picture sizes.
7983 Standardized
color spaces.
7985 Support for $
4:
4:
4$ and $
4:
2:
2$ pixel formats.
7991 Loop filter limit values (Section~
\ref{sub:loop-filter-limits
}).
7993 Quantization parameters (Section~
\ref{sub:quant-params
}).
7995 Huffman tables (Section~
\ref{sub:huffman-tables
}).
7999 Frame header format (Section~
\ref{sub:frame-header
}).
8001 Extended long-run bit strings (Section~
\ref{sub:long-run
}).
8003 INTER
\_MV\_FOUR handling of uncoded blocks (Section~
\ref{sub:mb-mv-decode
}).
8005 Block-level
\qi\ values (Section~
\ref{sub:block-qis
}).
8007 Zero-length EOB runs (Section~
\ref{sub:eob-token
}).
8009 Unrestricted motion vector padding and the loop filter
8010 (Section~
\ref{sub:loop-filt
}).
8013 \section{Loop Filter Limit Values
}
8014 \label{app:vp3-loop-filter-limits
}
8016 The hard-coded loop filter limit values used in VP3 are defined as follows:
\begin{align*}
\bitvar{LFLIMS} = & \begin{array}[t]{r@{}rrrrrrrr@{}l}
\{ & 30, & 25, & 20, & 20, & 15, & 15, & 14, & 14, & \\
   & 13, & 13, & 12, & 12, & 11, & 11, & 10, & 10, & \\
   &  9, &  9, &  8, &  8, &  7, &  7, &  7, &  7, & \\
   &  6, &  6, &  6, &  6, &  5, &  5, &  5, &  5, & \\
   &  4, &  4, &  4, &  4, &  3, &  3, &  3, &  3, & \\
   &  2, &  2, &  2, &  2, &  2, &  2, &  2, &  2, & \\
   &  0, &  0, &  0, &  0, &  0, &  0, &  0, &  0, & \\
   &  0, &  0, &  0, &  0, &  0, &  0, &  0, &  0\;\ & \!\} \\
\end{array}
\end{align*}
8030 \section{Quantization Parameters
}
8031 \label{app:vp3-quant-params
}
8033 The hard-coded quantization parameters used by VP3 are defined as follows:
\begin{align*}
\bitvar{ACSCALE} = & \begin{array}[t]{r@{}rrrrrrrr@{}l}
\{ & 500, & 450, & 400, & 370, & 340, & 310, & 285, & 265, & \\
   & 245, & 225, & 210, & 195, & 185, & 180, & 170, & 160, & \\
   & 150, & 145, & 135, & 130, & 125, & 115, & 110, & 107, & \\
   & 100, &  96, &  93, &  89, &  85, &  82, &  75, &  74, & \\
   &  70, &  68, &  64, &  60, &  57, &  56, &  52, &  50, & \\
   &  49, &  45, &  44, &  43, &  40, &  38, &  37, &  35, & \\
   &  33, &  32, &  30, &  29, &  28, &  25, &  24, &  22, & \\
   &  21, &  19, &  18, &  17, &  15, &  13, &  12, &  10\;\ & \!\} \\
\end{array} \\
\bitvar{DCSCALE} = & \begin{array}[t]{r@{}rrrrrrrr@{}l}
\{ & 220, & 200, & 190, & 180, & 170, & 170, & 160, & 160, & \\
   & 150, & 150, & 140, & 140, & 130, & 130, & 120, & 120, & \\
   & 110, & 110, & 100, & 100, &  90, &  90, &  90, &  80, & \\
   &  80, &  80, &  70, &  70, &  70, &  60, &  60, &  60, & \\
   &  60, &  50, &  50, &  50, &  50, &  40, &  40, &  40, & \\
   &  40, &  40, &  30, &  30, &  30, &  30, &  30, &  30, & \\
   &  30, &  20, &  20, &  20, &  20, &  20, &  20, &  20, & \\
   &  20, &  10, &  10, &  10, &  10, &  10, &  10, &  10\;\ & \!\} \\
\end{array}
\end{align*}
8058 VP3 defines only a single quantization range for each quantization type and
8059 color plane, and the base matrix used is constant throughout the range.
8060 There are three base matrices defined.
8061 The first is used for the $Y'$ channel of INTRA mode blocks, and the second for
8062 both the $C_b$ and $C_r$ channels of INTRA mode blocks.
8063 The last is used for INTER mode blocks of all channels.
\begin{align*}
\bitvar{BMS} = \{ & \begin{array}[t]{r@{}rrrrrrrr@{}l}
\{ & 16, & 11, & 10, & 16, &  24, &  40, &  51, &  61, & \\
   & 12, & 12, & 14, & 19, &  26, &  58, &  60, &  55, & \\
   & 14, & 13, & 16, & 24, &  40, &  57, &  69, &  56, & \\
   & 14, & 17, & 22, & 29, &  51, &  87, &  80, &  62, & \\
   & 18, & 22, & 37, & 58, &  68, & 109, & 103, &  77, & \\
   & 24, & 35, & 55, & 64, &  81, & 104, & 113, &  92, & \\
   & 49, & 64, & 78, & 87, & 103, & 121, & 120, & 101, & \\
   & 72, & 92, & 95, & 98, & 112, & 100, & 103, &  99\;\ & \!\}, \\
\{ & 17, & 18, & 24, & 47, &  99, &  99, &  99, &  99, & \\
   & 18, & 21, & 26, & 66, &  99, &  99, &  99, &  99, & \\
   & 24, & 26, & 56, & 99, &  99, &  99, &  99, &  99, & \\
   & 47, & 66, & 99, & 99, &  99, &  99, &  99, &  99, & \\
   & 99, & 99, & 99, & 99, &  99, &  99, &  99, &  99, & \\
   & 99, & 99, & 99, & 99, &  99, &  99, &  99, &  99, & \\
   & 99, & 99, & 99, & 99, &  99, &  99, &  99, &  99, & \\
   & 99, & 99, & 99, & 99, &  99, &  99, &  99, &  99\;\ & \!\}, \\
\{ & 16, & 16, & 16, & 20, &  24, &  28, &  32, &  40, & \\
   & 16, & 16, & 20, & 24, &  28, &  32, &  40, &  48, & \\
   & 16, & 20, & 24, & 28, &  32, &  40, &  48, &  64, & \\
   & 20, & 24, & 28, & 32, &  40, &  48, &  64, &  64, & \\
   & 24, & 28, & 32, & 40, &  48, &  64, &  64, &  64, & \\
   & 28, & 32, & 40, & 48, &  64, &  64, &  64, &  96, & \\
   & 32, & 40, & 48, & 64, &  64, &  64, &  96, & 128, & \\
   & 40, & 48, & 64, & 64, &  64, &  96, & 128, & 128\;\ & \!\}\;\;\} \\
\end{array}
\end{align*}
8098 The remaining parameters simply assign these matrices to the proper quant
\begin{align*}
\bitvar{NQRS} = & \{ \{1, 1, 1\}, \{1, 1, 1\} \} \\
\bitvar{QRSIZES} = &
 \{ \{ \{1\}, \{1\}, \{1\} \}, \{ \{1\}, \{1\}, \{1\} \} \} \\
\bitvar{QRBMIS} = &
 \{ \{ \{0, 0\}, \{1, 1\}, \{1, 1\} \}, \{ \{2, 2\}, \{2, 2\}, \{2, 2\} \} \} \\
\end{align*}
8109 \section{Huffman Tables
}
8110 \label{app:vp3-huffman-tables
}
8112 The following tables contain the hard-coded Huffman codes used by VP3.
8113 There are
80 tables in all, each with a Huffman code for all
32 token values.
8114 The tokens are sorted by the most significant bits of their Huffman code.
8115 This is the same order in which they will be decoded from the setup header.
8122 Ogg is a
\href{http://www.xiph.org
}{Xiph.org Foundation
} effort to protect
8123 essential tenets of Internet multimedia from corporate hostage-taking; Open
8124 Source is the net's greatest tool to keep everyone honest.
8125 See
\href{http://www.xiph.org/about.html
}{About the Xiph.org Foundation
} for
8128 Ogg Theora is the first Ogg video codec.
8129 Anyone may freely use and distribute the Ogg and Theora specifications, whether
8130 in private, public, or corporate capacity.
8131 However, the Xiph.org Foundation and the Ogg project reserve the right to set
8132 the Ogg Theora specification and certify specification compliance.
8134 Xiph.org's Theora software codec implementation is distributed under a BSD-like
8136 This does not restrict third parties from distributing independent
8137 implementations of Theora software under other licenses.
8139 \begin{wrapfigure
}{l
}{0pt
}
8140 \includegraphics[width=
2.5cm
]{xifish
}
8143 These pages are Copyright
\textcopyright{} 2004-
2007 Xiph.org Foundation.
8144 All rights reserved.
8145 Ogg, Theora, Vorbis, Xiph.org Foundation and their logos are trademarks
8146 (
\texttrademark) of the
\href{http://www.xiph.org
}{Xiph.org Foundation
}.
8148 This
document is set in
\LaTeX.