SSSE3 CDEF
added performance label
mentioned in issue #216
A very impressive performance increase!
| decoder | dav1d | dav1d | Relative |
| --- | --- | --- | --- |
| Build | 222bf249 | 1d4119a0 | |
| Build date | 2019-02-15 | 2019-02-18 | |
| ISA | SSSE3 | SSSE3 | |
| Morocco ST | 15,89 | 19,48 | 122,6% |
| Morocco MT | 66,21 | 97,48 | 147,2% |
| Dua Lipa ST | 8,45 | 12,97 | 153,5% |
| Dua Lipa MT | 37,66 | 59,36 | 157,6% |

Edited by Ewout ter Hoeven
Intel Xeon w3680 Westmere 6x3.33 GHz (12 logical cores)
Chimera 8-bit 1080p

stock f26bf7fa

| frame threads | time |
| --- | --- |
| 1 | 13m27.962s |
| 6 | 4m8.488s |

patched 1d4119a0

| frame threads | time | diff |
| --- | --- | --- |
| 1 | 9m35.266s | 40% faster |
| 6 | 2m49.121s | 47% faster |

Edited by Bradley Sepos
Intel Xeon E5-2699 v4 Broadwell 22x2.2 GHz (44 logical cores)
Liquid cooled with sustained turbo 2.8 GHz
Chimera 8-bit 1080p

stock f26bf7fa (cross GCC 8.2 i686)

| frame threads | time |
| --- | --- |
| 1 | 15m55.161s |
| 11 | 4m53.511s |
| 22 | 4m38.825s |

patched 1d4119a0 (cross GCC 8.2 i686)

| frame threads | time | diff |
| --- | --- | --- |
| 1 | crash | |
| 11 | crash | |
| 22 | crash | |

It seems when cross-compiled using mingw-w64 and GCC 8.2, dav1d.exe crashes immediately (only with this patchset).
Edited by Bradley Sepos
added 6 commits
- 1d4119a0...f26bf7fa - 2 commits from branch videolan:master
- 87333927 - cdef: remove redundant AVX2 code
- 99e4ad64 - not so relevant avx2 changes
- b97f4fc0 - cdef: add SSSE3 implementation
- 3686c068 - dbg
added 10 commits
mentioned in issue #15 (closed)
added 2 commits
- Resolved by Jean-Baptiste Kempf
added 2 commits
- Resolved by Victorien Le Couviour--Tuffet
added 2 commits
@psilokos I see a few AVX2 commits, do you expect significant changes in AVX2 performance?
Edited by Ewout ter Hoeven
added 2 commits
@EwoutH not for what was already there, but I added the YUV 422 case, so you can make a bench of this one yes :)
@psilokos Then I will go on a great quest for 4:2:2 samples :)
@EwoutH you already have one here https://drive.google.com/open?id=18Sg7Kk37mOmYUlk6ycbu4_GCuC7sLvGt
assigned to @gramner
changed milestone to %0.2.0
Intel Xeon w3680 Westmere 6x3.33 GHz (12 logical cores)
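Chimera 8-bit 1080p

stock 1ba8423a

| frame threads | tile threads | time |
| --- | --- | --- |
| 1 | 1 | 13m21.751s |
| 6 | 1 | 4m7.455s |
| 6 | 2 | 3m1.221s |

patched e88293fb

| frame threads | tile threads | time | diff |
| --- | --- | --- | --- |
| 1 | 1 | 9m32.778s | 40% faster |
| 6 | 1 | 2m48.903s | 46% faster |
| 6 | 2 | 1m47.225s | 69% faster |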
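
For anyone comparing their own runs: the "% faster" figures in these timings appear consistent with stock time divided by patched time, minus one. The sketch below is only an illustration of that arithmetic, not the script used to produce the numbers; `parse_time` and `percent_faster` are hypothetical helper names, and the input format is assumed to be the `XmY.Zs` wall-clock strings quoted above.

```python
import re


def parse_time(text: str) -> float:
    """Convert a string such as '13m21.751s' (or '47.5s') to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def percent_faster(stock: str, patched: str) -> float:
    """Speed-up of patched over stock, as (stock / patched - 1) * 100."""
    return (parse_time(stock) / parse_time(patched) - 1.0) * 100.0


# Timings taken from the w3680 results (stock 1ba8423a vs patched e88293fb).
print(round(percent_faster("13m21.751s", "9m32.778s")))  # 1 frame thread, 1 tile thread
print(round(percent_faster("3m1.221s", "1m47.225s")))    # 6 frame threads, 2 tile threads
```

Note that with this definition a value above 100% means the patched build takes less than half the stock time.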
Intel Xeon E5-2699 v4 Broadwell 22x2.2 GHz (44 logical cores)
Liquid cooled with sustained turbo 2.8 GHz
Chimera 8-bit 1080p

stock 1ba8423a (cross GCC 8.2 i686)

| frame threads | tile threads | time |
| --- | --- | --- |
| 1 | 1 | 16m22.632s |
| 11 | 1 | 5m5.261s |
| 22 | 1 | 4m46.58s |
| 11 | 2 | 3m59.2s |
| 22 | 2 | 3m44.466s |

patched e88293fb (cross GCC 8.2 i686)

| frame threads | tile threads | time | diff |
| --- | --- | --- | --- |
| 1 | 1 | 10m9.794s | 61% faster |
| 11 | 1 | 2m39.771s | 91% faster |
| 22 | 1 | 2m29.10s | 92% faster |
| 11 | 2 | 1m28.825s | 169% faster |
| 22 | 2 | 1m23.607s | 168% faster |

Edited by Bradley Sepos
- Resolved by Jean-Baptiste Kempf
- Resolved by Victorien Le Couviour--Tuffet
- Resolved by Victorien Le Couviour--Tuffet
mentioned in issue #78
added 17 commits
- e88293fb...9cb94d29 - 12 commits from branch videolan:master
- 6e774b23 - checkasm: widen cdef filter strengh max value
- 18ea9ba0 - cdef: improve AVX2 cdef_filter macro consistency
- 6b62d725 - cdef: remove redundant AVX2 code
- 018b8db0 - cdef: add AVX2 4x8 filter
- 1e2fc370 - cdef: add SSSE3 implementation
added 6 commits
- 3b19f5b6 - checkasm: decrease cdef filter min damping value
- ee3c5aa5 - cdef: improve AVX2 cdef_filter macro consistency
- f9f50601 - cdef: remove redundant AVX2 code
- 50310d80 - cdef: optimize 4 by X filters for HAVE_RIGHT=0
- 040598ba - cdef: add AVX2 4x8 filter
- 2f3ce00c - cdef: add SSSE3 implementation
- Resolved by Victorien Le Couviour--Tuffet
- Resolved by Victorien Le Couviour--Tuffet