
aarch64: Improve performance of `subWxH_dct` kernels; remove the sub4x4 SVE kernel because NEON is now faster
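For context, a `subWxH_dct` kernel takes the difference of two WxH pixel blocks and transforms the residual. The scalar sketch below is written under the assumption that `sub4x4_dct` applies the H.264 4x4 forward core transform Y = Cf * D * Cf^T to the 4x4 residual D. The name `sub4x4_dct_ref`, the 8-bit pixels and the explicit stride parameters are hypothetical illustrations. This is not the NEON code from this MR or x264's own C implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (an assumption, not the MR's kernel):
 * residual D = cur - ref over a 4x4 block, then Y = Cf * D * Cf^T with
 *   Cf = [ 1  1  1  1 ]
 *        [ 2  1 -1 -2 ]
 *        [ 1 -1 -1  1 ]
 *        [ 1 -2  2 -1 ]
 * Output is row-major: dct[u*4 + v] is row u, column v of Y. */
static void sub4x4_dct_ref(int16_t dct[16],
                           const uint8_t *cur, int cur_stride,
                           const uint8_t *ref, int ref_stride)
{
    int d[4][4], tmp[4][4];

    assert(cur_stride >= 4 && ref_stride >= 4);

    /* Residual: per-pixel difference of the two 4x4 blocks. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = cur[y * cur_stride + x] - ref[y * ref_stride + x];

    /* Horizontal butterfly on each row: tmp = D * Cf^T. */
    for (int y = 0; y < 4; y++) {
        int s03 = d[y][0] + d[y][3], d03 = d[y][0] - d[y][3];
        int s12 = d[y][1] + d[y][2], d12 = d[y][1] - d[y][2];
        tmp[y][0] = s03 + s12;
        tmp[y][1] = 2 * d03 + d12;
        tmp[y][2] = s03 - s12;
        tmp[y][3] = d03 - 2 * d12;
    }

    /* Vertical butterfly on each column: Y = Cf * tmp. */
    for (int x = 0; x < 4; x++) {
        int s03 = tmp[0][x] + tmp[3][x], d03 = tmp[0][x] - tmp[3][x];
        int s12 = tmp[1][x] + tmp[2][x], d12 = tmp[1][x] - tmp[2][x];
        dct[0 * 4 + x] = (int16_t)(s03 + s12);
        dct[1 * 4 + x] = (int16_t)(2 * d03 + d12);
        dct[2 * 4 + x] = (int16_t)(s03 - s12);
        dct[3 * 4 + x] = (int16_t)(d03 - 2 * d12);
    }
}
```

The transform splits into two passes of sums and differences (s03, s12, d03, d12) with no multiplications beyond doubling. This is why it maps well onto SIMD adds, subtracts and shifts. For 8-bit input the largest coefficient magnitude is 255 * 6 * 6 = 9180, which fits in `int16_t`.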

BEFORE                    =>   AFTER                     = IMPROVEMENT
--------------------------------------------------------------------------
sub4x4_dct_c: 67          =>   sub4x4_dct_c: 66          =
sub4x4_dct_neon: 51       =>   sub4x4_dct_neon: 15       = 51/15 = 3.4x
sub4x4_dct_sve: 19        =>   sub4x4_dct_sve: 19        = redundant (NEON: 15), removed
sub8x8_dct_c: 321         =>   sub8x8_dct_c: 317         =
sub8x8_dct_neon: 69       =>   sub8x8_dct_neon: 63       = 69/63 = 1.10x
sub8x8_dct8_c: 540        =>   sub8x8_dct8_c: 534        =
sub8x8_dct8_neon: 110     =>   sub8x8_dct8_neon: 105     = 110/105 = 1.05x
sub8x8_dct_dc_c: 130      =>   sub8x8_dct_dc_c: 130      =
sub8x8_dct_dc_neon: 22    =>   sub8x8_dct_dc_neon: 18    = 22/18 = 1.22x
sub8x16_dct_dc_c: 283     =>   sub8x16_dct_dc_c: 280     =
sub8x16_dct_dc_neon: 51   =>   sub8x16_dct_dc_neon: 47   = 51/47 = 1.09x
sub16x16_dct_c: 1352      =>   sub16x16_dct_c: 1345      =
sub16x16_dct_neon: 318    =>   sub16x16_dct_neon: 283    = 318/283 = 1.12x
sub16x16_dct8_c: 2273     =>   sub16x16_dct8_c: 2279     =
sub16x16_dct8_neon: 499   =>   sub16x16_dct8_neon: 479   = 499/479 = 1.04x
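Each IMPROVEMENT entry is the BEFORE count divided by the AFTER count, rounded to two decimals. A minimal C sketch reproduces those ratios. The helper name `speedup_x100` is hypothetical, and the counts are treated as unitless benchmark figures. It uses integer arithmetic so the rounding is exact.

```c
#include <assert.h>

/* Hypothetical helper: speedup = before / after, rounded half up to
 * hundredths and returned as an integer (e.g. 340 means 3.40x).
 * Both counts are assumed to be positive benchmark figures. */
static int speedup_x100(int before, int after)
{
    assert(before > 0 && after > 0);
    return (before * 100 + after / 2) / after;
}
```

For example, `speedup_x100(318, 283)` yields 112, matching the 1.12x reported for `sub16x16_dct_neon`.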
Edited by Matthias Langer
