Changes for 0.8.2 'Eurasian hobby':
-----------------------------------

0.8.2 is a medium-sized update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for CDEF_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking and pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add an xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve the performance by using a picture buffer pool;
   the improvement can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm (up to 28% speedup) and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue in the ipred_z AVX2 functions on some specific Haswell CPUs
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl 4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - mc_w_mask
    - mc_blend
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve the speed by reducing the L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - Initial PPC SIMD code, for cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
------------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
------------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
-----------------------------

 - Large improvement on MSAC decoding with SSE, bringing 4-6% speed increase
   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Fixes for minor crashes, other improvements and build changes


Changes for 0.2.1 'Antelope':
----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32- and 64-bit
 - More AVX2 assembly, now nearly complete
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bit depths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64-bit processors, making it the fastest AV1 decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors