Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for 10/12-bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - mc_w_mask
    - mc_blend
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements to the ARM64 optimizations for msac, cdef and looprestoration
 - Improvements to the AVX2 optimizations for cdef_filter
 - Improvements to the C versions of itxfm and cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve speed by reducing L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speed and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements for blend_* on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - Initial PPC SIMD code, for cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
------------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
------------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement in MSAC decoding with SSE, bringing a 4-6% speed increase
   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crash fixes, improvements and build changes


Changes for 0.2.1 'Antelope':
----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32- and 64-bit
 - More AVX2 assembly, nearing completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for undecodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bit depths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 on 64-bit processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors