third_party/dav1d/NEWS

   1 Changes for 1.0.0 'Peregrine falcon':
   2 -------------------------------------
   3
   4 1.0.0 is a major release of dav1d, adding important features and bug fixes.
   5
   6 It notably changes the way threading works, in an important way, by adding
   7 automatic thread management.
   8
   9 It also adds support for AVX-512 acceleration, and adds speedups to existing x86
  10 code (from SSE2 to AVX2).
  11
  12 1.0.0 adds a new grain API to ease acceleration on the GPU, and adds an API call
  13 to get information about which frame failed to decode, in error cases.
  14
  15 Finally, 1.0.0 fixes numerous small bugs reported since the beginning of the
  16 project, in order to have a proper release.
  17
  18                                      .''.
  19          .''.      .        *''*    :_\/_:     .
  20         :_\/_:   _\(/_  .:.*_\/_*   : /\ :  .'.:.'.
  21     .''.: /\ :   ./)\   ':'* /\ * :  '..'.  -=:o:=-
  22    :_\/_:'.:::.    ' *''*    * '.\'/.' _\(/_'.':'.'
  23    : /\ : :::::     *_\/_*     -= o =-  /)\    '  *
  24     '..'  ':::'     * /\ *     .'/.\'.   '
  25         *            *..*         :
  26           *                       :
  27           *         1.0.0
  28
  29
  30
  31 Changes for 0.9.2 'Golden Eagle':
  32 ---------------------------------
  33
  34 0.9.2 is a small update of dav1d on the 0.9.x branch:
  35  - x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
  36  - x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
  37  - x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
  38  - ARM NEON optimizations for FilmGrain Gen_grain functions
  39  - Optimizations for splat_mv in SSE2/AVX2 and NEON
  40  - x86: SGR improvements for SSSE3 CPUs
  41  - x86: AVX2 optimizations for cfl_ac
  42
  43
  44 Changes for 0.9.1 'Golden Eagle':
  45 ---------------------------------
  46
  47 0.9.1 is a medium-size revision of dav1d, notably adding 10b acceleration for SSSE3:
  48  - 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge),
  49    prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener,
  50    sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
  51  - Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12b on ARM32
  52  - Fixes for filmgrain on ARM
  53  - itx 10bit optimizations for 4x4/x8/x16, 8x4/x8/x16 for SSE4
  54  - Misc improvements on SSE2, SSE4
  55
  56
  57 Changes for 0.9.0 'Golden Eagle':
  58 ---------------------------------
  59
  60 0.9.0 is a major version of dav1d, notably adding 10b acceleration on x64.
  61
  62 Details:
  63  - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
  64    a large boost for high-bitdepth decoding on modern x86 computers and servers.
  65  - ARM64 NEON implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
  66  - New API to signal events happening during the decoding process
  67
  68
  69 Changes for 0.8.2 'Eurasian hobby':
  70 -----------------------------------
  71
  72 0.8.2 is a medium-size update of the 0.8.0 branch:
  73  - ARM32 optimizations for ipred and itx in 10/12bits,
  74    completing the 10b/12b work on ARM64 and ARM32
  75  - Give the post-filters their own threads
  76  - ARM64: rewrite the wiener functions
  77  - Speed up coefficient decoding, 0.5%-3% global decoding gain
  78  - x86 optimizations for CDEF_filter and wiener in 10/12bit
  79  - x86: rewrite the SGR AVX2 asm
  80  - x86: improve msac speed on SSE2+ machines
  81  - ARM32: improve speed of ipred and warp
  82  - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
  83  - ARM32/64: improve speed of looprestoration
  84  - Add seeking and pausing to the player
  85  - Update the player for rendering of 10b/12b
  86  - Misc speed improvements and fixes on all platforms
  87  - Add a xxh3 muxer in the dav1d application
  88
  89
  90 Changes for 0.8.1 'Eurasian hobby':
  91 -----------------------------------
  92
  93 0.8.1 is a minor update on 0.8.0:
  94  - Keep references to buffers valid after dav1d_close(). Fixes a regression
  95    caused by the picture buffer pool added in 0.8.0.
  96  - ARM32 optimizations for 10bit bitdepth for SGR
  97  - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
  98  - ARM64 optimizations for 10bit bitdepth for SGR
  99  - x86 optimizations for wiener in SSE2/SSSE3/AVX2
 100
 101
 102 Changes for 0.8.0 'Eurasian hobby':
 103 -----------------------------------
 104
 105 0.8.0 is a major update for dav1d:
 106  - Improve performance by using a picture buffer pool;
 107    the improvements can reach 10% in some cases on Windows.
 108  - Support for Apple ARM Silicon
 109  - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 110  - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
 111    put/prep 8tap/bilin, wiener and CDEF filters
 112  - ARM64 optimizations for cfl_ac 444 for all bitdepths
 113  - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 114  - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3
 115
 116
 117 Changes for 0.7.1 'Frigatebird':
 118 ------------------------------
 119
 120 0.7.1 is a minor update on 0.7.0:
 121  - ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
 122  - SSE2 optimizations for prep_bilin and prep_8tap
 123  - AVX2 optimizations for MC scaled
 124  - Fix a clamping issue in motion vector projection
 125  - Fix an issue on some specific Haswell CPUs in the ipred_z AVX2 functions
 126  - Improvements on the dav1dplay utility player to support resizing
 127
 128
 129 Changes for 0.7.0 'Frigatebird':
 130 ------------------------------
 131
 132 0.7.0 is a major release for dav1d:
 133  - Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
 134  - 10b/12b ARM64 optimizations are mostly complete:
 135    - ipred (paeth, smooth, dc, pal, filter, cfl)
 136    - itxfm (only 10b)
 137  - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 138  - AVX2 for cfl 4:4:4
 139  - AVX-512 CDEF filter
 140  - ARM64 8b improvements for cfl_ac and itxfm
 141  - ARM64 implementation for emu_edge in 8b/10b/12b
 142  - ARM32 implementation for emu_edge in 8b
 143  - Improvements on the dav1dplay utility player to support 10 bit,
 144    non-4:2:0 pixel formats and film grain on the GPU
 145
 146
 147 Changes for 0.6.0 'Gyrfalcon':
 148 ------------------------------
 149
 150 0.6.0 is a major release for dav1d:
 151  - New ARM64 optimizations for the 10/12bit depth:
 152     - mc_avg, mc_w_avg, mc_mask
 153     - mc_put/mc_prep 8tap/bilin
 154     - mc_warp_8x8
 155     - mc_w_mask
 156     - mc_blend
 157     - wiener
 158     - SGR
 159     - loopfilter
 160     - cdef
 161  - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 162  - New SSSE3 optimizations for film grain
 163  - New AVX2 optimizations for msac_adapt16
 164  - Fix rare mismatches against the reference decoder, notably because of clipping
 165  - Improvements on ARM64 on msac, cdef and looprestoration optimizations
 166  - Improvements on AVX2 optimizations for cdef_filter
 167  - Improvements in the C version for itxfm, cdef_filter
 168
 169
 170 Changes for 0.5.2 'Asiatic Cheetah':
 171 ------------------------------------
 172
 173 0.5.2 is a small release improving speed for ARM32 and adding minor features:
 174  - ARM32 optimizations for loopfilter, ipred_dc|h|v
 175  - Add section-5 raw OBU demuxer
 176  - Improve the speed by reducing the L2 cache collisions
 177  - Fix minor issues
 178
 179
 180 Changes for 0.5.1 'Asiatic Cheetah':
 181 ------------------------------------
 182
 183 0.5.1 is a small release improving speeds and fixing minor issues
 184 compared to 0.5.0:
 185  - SSE2 optimizations for CDEF, wiener and warp_affine
 186  - NEON optimizations for SGR on ARM32
 187  - Fix mismatch issue in x86 asm in inverse identity transforms
 188  - Fix build issue in ARM64 assembly if debug info was enabled
 189  - Add a workaround for Xcode 11 -fstack-check bug
 190
 191
 192 Changes for 0.5.0 'Asiatic Cheetah':
 193 ------------------------------------
 194
 195 0.5.0 is a medium release fixing regressions and minor issues,
 196 and improving speed significantly:
 197  - Export ITU T.35 metadata
 198  - Speed improvements on blend_ on ARM
 199  - Speed improvements on decode_coef and MSAC
 200  - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 201  - NEON optimizations for CDEF and warp on ARM32
 202  - SSE2 optimizations for MSAC hi_tok decoding
 203  - SSSE3 optimizations for deblocking loopfilters and warp_affine
 204  - AVX2 optimizations for film grain and ipred_z2
 205  - SSE4 optimizations for warp_affine
 206  - VSX optimizations for wiener
 207  - Fix inverse transform overflows in x86 and NEON asm
 208  - Fix integer overflows with large frames
 209  - Improve film grain generation to match reference code
 210  - Improve compatibility with older binutils for ARM
 211  - More advanced Player example in tools
 212
 213
 214 Changes for 0.4.0 'Cheetah':
 215 ----------------------------
 216
 217  - Fix playback with unknown OBUs
 218  - Add an option to limit the maximum frame size
 219  - SSE2 and ARM64 optimizations for MSAC
 220  - Improve speed on 32-bit systems
 221  - Optimization in obmc blend
 222  - Reduce RAM usage significantly
 223  - The initial PPC SIMD code, cdef_filter
 224  - NEON optimizations for blend functions on ARM
 225  - NEON optimizations for w_mask functions on ARM
 226  - NEON optimizations for inverse transforms on ARM64
 227  - VSX optimizations for CDEF filter
 228  - Improve handling of malloc failures
 229  - Simple Player example in tools
 230
 231
 232 Changes for 0.3.1 'Sailfish':
 233 ------------------------------
 234
 235  - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 236  - Reduce binary size, notably on Windows
 237  - SSSE3 optimizations for ipred_filter
 238  - ARM optimizations for MSAC
 239
 240
 241 Changes for 0.3.0 'Sailfish':
 242 ------------------------------
 243
 244 This is the final release for the numerous speed improvements of 0.3.0-rc.
 245 It mostly:
 246  - Fixes an annoying crash on SSSE3 that happened in the itx functions
 247
 248
 249 Changes for 0.2.2 (0.3.0-rc) 'Antelope':
 250 -----------------------------
 251
 252  - Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
 253    The impact is important on SSSE3, SSE4 and AVX2 CPUs
 254  - SSSE3 optimizations for all block sizes in itx
 255  - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 256  - Speed improvements on CDEF for SSE4 CPUs
 257  - NEON optimizations for SGR and loop filter
 258  - Fixes for minor crashes, improvements and build changes
 259
 260
 261 Changes for 0.2.1 'Antelope':
 262 ----------------------------
 263
 264  - SSSE3 optimization for cdef_dir
 265  - AVX2 improvements of the existing CDEF optimizations
 266  - NEON improvements of the existing CDEF and wiener optimizations
 267  - Clarification about the numbering/versioning scheme
 268
 269
 270 Changes for 0.2.0 'Antelope':
 271 ----------------------------
 272
 273  - ARM64 and ARM optimizations using NEON instructions
 274  - SSSE3 optimizations for both 32 and 64bits
 275  - More AVX2 assembly, almost reaching completion
 276  - Fix installation of includes
 277  - Rewrite inverse transforms to avoid overflows
 278  - Snap packaging for Linux
 279  - Updated API (ABI and API break)
 280  - Fixes for un-decodable samples
 281
 282
 283 Changes for 0.1.0 'Gazelle':
 284 ----------------------------
 285
 286 Initial release of dav1d, the fast and small AV1 decoder.
 287  - Support for all features of the AV1 bitstream
 288  - Support for all bitdepths: 8, 10 and 12 bits
 289  - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 290  - Full acceleration for 64-bit AVX2 processors, making it the fastest decoder
 291  - Partial acceleration for SSSE3 processors
 292  - Partial acceleration for NEON processors