cdef_filter_{4x{4,8},8x8}_avx2 optimizations

Add a separate fully edged case.

---------------------
fully edged blocks perf
------------------------------------------
before: cdef_filter_4x4_8bpc_avx2: 91.0
 after: cdef_filter_4x4_8bpc_avx2: 75.7
---------------------
before: cdef_filter_4x8_8bpc_avx2: 154.6
 after: cdef_filter_4x8_8bpc_avx2: 131.8
---------------------
before: cdef_filter_8x8_8bpc_avx2: 214.1
 after: cdef_filter_8x8_8bpc_avx2: 195.9
------------------------------------------

See #305.
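
As a rough illustration of the fully edged case, here is a minimal C sketch of the dispatch idea. The real change lives in the AVX2 assembly. The `FilterPath` enum and `select_edge_path` are hypothetical names used only for this sketch, and the flag values are written as I recall them from dav1d's `src/cdef.h`, so treat them as an assumption.

```c
#include <assert.h>

/* Edge availability flags (assumed to match dav1d's enum CdefEdgeFlags). */
enum CdefEdgeFlags {
    CDEF_HAVE_LEFT   = 1 << 0,
    CDEF_HAVE_RIGHT  = 1 << 1,
    CDEF_HAVE_TOP    = 1 << 2,
    CDEF_HAVE_BOTTOM = 1 << 3,
};

/* Hypothetical labels for the two code paths. */
enum FilterPath { PATH_GENERIC, PATH_FULLY_EDGED };

/* Pick the dedicated path only when all four neighbours are available
 * (edges == 0xf); every other combination keeps the generic path with
 * its edge handling. */
static enum FilterPath select_edge_path(const int edges) {
    assert(edges >= 0 && edges <= 0xf);
    const int all = CDEF_HAVE_LEFT | CDEF_HAVE_RIGHT |
                    CDEF_HAVE_TOP | CDEF_HAVE_BOTTOM;
    return edges == all ? PATH_FULLY_EDGED : PATH_GENERIC;
}
```

With these values, `select_edge_path(0xf)` selects the dedicated path, while e.g. `CDEF_HAVE_LEFT | CDEF_HAVE_TOP` (0x5) stays on the generic one.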


Add 2 separate code paths for the cases where the pri or sec strength equals 0. Having both strengths non-zero is uncommon, so branching to skip the unnecessary computations is beneficial.

------------------------------------------
before: cdef_filter_4x4_8bpc_avx2: 93.8
 after: cdef_filter_4x4_8bpc_avx2: 71.7
---------------------
before: cdef_filter_4x8_8bpc_avx2: 161.5
 after: cdef_filter_4x8_8bpc_avx2: 116.3
---------------------
before: cdef_filter_8x8_8bpc_avx2: 221.8
 after: cdef_filter_8x8_8bpc_avx2: 156.4
------------------------------------------
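
To show why a zero strength allows skipping work, here is a small C sketch, not the AVX2 code itself. `constrain` follows the CDEF constrain function as I understand it from the AV1 definition. The `StrengthPath` enum and `select_strength_path` are hypothetical names for this illustration. With a threshold (strength) of 0 the constrained difference is always 0, so the corresponding taps contribute nothing and can be skipped entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Clamp a pixel difference by a strength-dependent limit. */
static int constrain(const int diff, const int threshold, const int shift) {
    assert(shift >= 0);
    const int adiff = abs(diff);
    int limit = threshold - (adiff >> shift);
    if (limit < 0) limit = 0;
    const int mag = adiff < limit ? adiff : limit;
    return diff < 0 ? -mag : mag;
}

/* Hypothetical labels for the three code paths. */
enum StrengthPath { PATH_NONE, PATH_PRI_ONLY, PATH_SEC_ONLY, PATH_PRI_SEC };

/* Branch once per block instead of computing taps that are known to
 * contribute nothing. PATH_NONE is included for completeness; this sketch
 * assumes the caller normally skips blocks where both strengths are 0. */
static enum StrengthPath select_strength_path(const int pri_strength,
                                              const int sec_strength) {
    assert(pri_strength >= 0 && sec_strength >= 0);
    if (!pri_strength && !sec_strength) return PATH_NONE;
    if (!sec_strength) return PATH_PRI_ONLY;
    if (!pri_strength) return PATH_SEC_ONLY;
    return PATH_PRI_SEC;
}
```

The single up-front branch is cheap compared with the per-pixel tap computations it avoids, which is consistent with the gains in the numbers above.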

Full decode comparison against my local master (this branch is based on my local master too):

"e3dbf926 - arm64: looprestoration: NEON implementation of SGR for 10 bpc - Martin Storsjö"

https://pastebin.com/1n2Rt4qY


Fully edged blocks checkasm bench - 10 runs

The test was changed because the stock benchmark only covers edges == 0xf with both pri and sec strength non-zero:

 tests/checkasm/cdef.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/tests/checkasm/cdef.c b/tests/checkasm/cdef.c
index cde4f45..db25415 100644
--- a/tests/checkasm/cdef.c
+++ b/tests/checkasm/cdef.c
@@ -57,7 +57,7 @@ static void check_cdef_filter(const cdef_fn fn, const int w, const int h) {

     if (check_func(fn, "cdef_filter_%dx%d_%dbpc", w, h, BITDEPTH)) {
         for (int dir = 0; dir < 8; dir++) {
-            for (enum CdefEdgeFlags edges = 0x0; edges <= 0xf; edges++) {
+            for (enum CdefEdgeFlags edges = 0xf; edges <= 0xf; edges++) {
 #if BITDEPTH == 16
                 const int bitdepth_max = rnd() & 1 ? 0x3ff : 0xfff;
 #else
@@ -85,17 +85,8 @@ static void check_cdef_filter(const cdef_fn fn, const int w, const int h) {
                             pri_strength, sec_strength, dir, damping, to_binary(edges));
                     return;
                 }
-                if (dir == 7 && (edges == 0x5 || edges == 0xa || edges == 0xf)) {
-                    /* Benchmark a fixed set of cases to get consistent results:
-                     *  1) top/left edges and pri_strength only
-                     *  2) bottom/right edges and sec_strength only
-                     *  3) all edges and both pri_strength and sec_strength
-                     */
-                    pri_strength = (edges & 1) << bitdepth_min_8;
-                    sec_strength = (edges & 2) << bitdepth_min_8;
                     bench_new(a_dst, stride, left, top, pri_strength, sec_strength,
                               dir, damping, edges HIGHBD_TAIL_SUFFIX);
-                }
             }
         }
     }
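
For reference, the strengths that the removed benchmark block derived from the edge flags can be reproduced with a short C sketch. `BenchStrengths` and `old_bench_strengths` are hypothetical helper names. The two expressions are copied from the removed lines, and `bitdepth_min_8` is assumed to be 0 for 8 bpc.

```c
#include <assert.h>

struct BenchStrengths { int pri, sec; };

/* Mirror the removed lines:
 *   pri_strength = (edges & 1) << bitdepth_min_8;
 *   sec_strength = (edges & 2) << bitdepth_min_8;
 */
static struct BenchStrengths old_bench_strengths(const int edges,
                                                 const int bitdepth_min_8) {
    assert(edges >= 0 && edges <= 0xf);
    struct BenchStrengths s;
    s.pri = (edges & 1) << bitdepth_min_8;
    s.sec = (edges & 2) << bitdepth_min_8;
    return s;
}
```

At 8 bpc this gives (pri, sec) = (1, 0) for edges 0x5, (0, 2) for 0xa and (1, 2) for 0xf, so the fully edged case was only ever benchmarked with both strengths non-zero.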

https://pastebin.com/BZ07s78Q

Edited by Victorien Le Couviour--Tuffet
