
ADS kernel implementation for SVE

Matthias Langer requested to merge nekobasu/x264:ads_for_sve into master

This is the first in a series of changes we developed earlier this year while optimizing x264 for NVIDIA Grace.

Contains:

  • Implementations of ADS kernels for SVE (any vector size)
  • Implementations of ADS kernels for SVE2 (any vector size)
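For readers unfamiliar with ADS: in x264 it is the pre-filter of the exhaustive motion search (reported as `esa ads` by checkasm). It uses precomputed block sums to reject candidate positions before any full SAD is computed, relying on the bound |sum(a) - sum(b)| <= SAD(a, b). The following is a minimal scalar sketch of the four-sum variant, assuming a signature modeled on x264's C fallback. The name `ads_x4_ref` and the exact parameter layout are assumptions and are not taken from this merge request.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar sketch of a 4-sum ADS test (hypothetical name ads_x4_ref).
 * enc_dc[0..3] : sums of four sub-blocks of the current block
 * sums         : precomputed sub-block sums of the reference frame, one per
 *                candidate x-position; in this sketch sums[i + 8] is the
 *                sub-block 8 pixels to the right, sums[i + delta] the one below
 * cost_mvx     : motion-vector cost per candidate position
 * Returns how many candidates survived; their indices are written to mvs[]. */
static int ads_x4_ref(const int enc_dc[4], const uint16_t *sums, int delta,
                      const uint16_t *cost_mvx, int16_t *mvs,
                      int width, int thresh)
{
    assert(width >= 0 && width <= INT16_MAX);
    int nmv = 0;
    for (int i = 0; i < width; i++) {
        /* |sum(a) - sum(b)| <= SAD(a, b), so this is a cheap lower bound. */
        int ads = abs(enc_dc[0] - sums[i])
                + abs(enc_dc[1] - sums[i + 8])
                + abs(enc_dc[2] - sums[i + delta])
                + abs(enc_dc[3] - sums[i + delta + 8])
                + cost_mvx[i];
        if (ads < thresh)          /* keep candidates that might still win */
            mvs[nmv++] = (int16_t)i;
    }
    return nmv;
}
```

Each iteration is independent except for the order in which surviving indices are appended, which is what makes the loop a natural fit for predicated vector code.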

checkasm8

x264: using random seed 3804690644
x264: ARMv8
 - intra pred :          [OK]
 - coeff_last :          [OK]
 - coeff_level_run :     [OK]
x264: NEON
 - pixel sad :           [OK]
 - pixel sad_aligned :   [OK]
 - pixel ssd :           [OK]
 - pixel satd :          [OK]
 - pixel sa8d :          [OK]
 - pixel sa8d_satd :     [OK]
 - pixel sad_x3 :        [OK]
 - pixel sad_x4 :        [OK]
 - pixel var :           [OK]
 - pixel var2 :          [OK]
 - pixel hadamard_ac :   [OK]
 - pixel vsad :          [OK]
 - pixel asd :           [OK]
 - intra satd_x3 :       [OK]
 - intra sad_x3 :        [OK]
 - ssd_nv12 :            [OK]
 - ssim :                [OK]
 - sub_dct4 :            [OK]
 - sub_dct8 :            [OK]
 - add_idct4 :           [OK]
 - add_idct8 :           [OK]
 - dct4x4dc :            [OK]
 - idct4x4dc :           [OK]
 - zigzag_interleave :   [OK]
 - zigzag_frame :        [OK]
 - zigzag_field :        [OK]
 - mc luma :             [OK]
 - mc chroma :           [OK]
 - mc wpredb :           [OK]
 - mc weight :           [OK]
 - mc offsetadd :        [OK]
 - mc offsetsub :        [OK]
 - store_interleave :    [OK]
 - plane_copy :          [OK]
 - hpel filter :         [OK]
 - lowres init :         [OK]
 - integral init :       [OK]
 - mbtree :              [OK]
 - memcpy aligned :      [OK]
 - memzero aligned :     [OK]
 - intra pred :          [OK]
 - deblock :             [OK]
 - quant :               [OK]
 - dequant :             [OK]
 - denoise dct :         [OK]
 - decimate_score :      [OK]
 - coeff_last :          [OK]
 - coeff_level_run :     [OK]
 - nal escape:           [OK]
x264: SVE (128 bits)
 - pixel ssd :           [OK]
 - pixel sa8d :          [OK]
 - pixel var :           [OK]
 - pixel hadamard_ac :   [OK]
 - esa ads:              [OK]
 - sub_dct4 :            [OK]
 - zigzag_interleave :   [OK]
 - mc wpredb :           [OK]
 - deblock :             [OK]
x264: SVE2 (128 bits)
 - add_idct4 :           [OK]
x264: All tests passed Yeah :)

checkasm10

I have no name!@a64c740923aa:/x264$ ./checkasm10
x264: using random seed 3721534389
x264: ARMv8
x264: NEON
 - pixel sad :           [OK]
 - pixel ssd :           [OK]
 - pixel satd :          [OK]
 - pixel sa8d :          [OK]
 - pixel sa8d_satd :     [OK]
 - pixel sad_x3 :        [OK]
 - pixel sad_x4 :        [OK]
 - pixel var :           [OK]
 - pixel var2 :          [OK]
 - pixel hadamard_ac :   [OK]
 - pixel vsad :          [OK]
 - pixel asd :           [OK]
 - ssd_nv12 :            [OK]
 - ssim :                [OK]
 - mc luma :             [OK]
 - mc chroma :           [OK]
 - mc wpredb :           [OK]
 - mc weight :           [OK]
 - mc offsetadd :        [OK]
 - mc offsetsub :        [OK]
 - store_interleave :    [OK]
 - plane_copy :          [OK]
 - hpel filter :         [OK]
 - lowres init :         [OK]
 - integral init :       [OK]
 - mbtree :              [OK]
 - memcpy aligned :      [OK]
 - memzero aligned :     [OK]
 - quant :               [OK]
 - dequant :             [OK]
 - denoise dct :         [OK]
 - decimate_score :      [OK]
 - coeff_last :          [OK]
 - coeff_level_run :     [OK]
 - nal escape:           [OK]
x264: SVE (128 bits)
 - pixel ssd :           [OK]
 - esa ads:              [OK]
x264: SVE2 (128 bits)
x264: All tests passed Yeah :)
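Both runs above report `SVE (128 bits)`, so they exercise the new kernels at one vector length only. The "any vector size" claim rests on the kernels never assuming a fixed lane count. One hedged way to picture what that requires is a plain-C model in which the lane count is a run-time parameter, checked against a scalar reference the way checkasm compares assembly against C. `ads_vla_model`, `check_lanes`, and the chosen sizes are hypothetical; this sketch does not reproduce the merge request's SVE code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Lower-bound cost of candidate i (same formula as a scalar ADS test). */
static int ads_cost(const int enc_dc[4], const uint16_t *sums, int delta,
                    const uint16_t *cost_mvx, int i)
{
    return abs(enc_dc[0] - sums[i]) + abs(enc_dc[1] - sums[i + 8])
         + abs(enc_dc[2] - sums[i + delta])
         + abs(enc_dc[3] - sums[i + delta + 8]) + cost_mvx[i];
}

/* Scalar reference: one candidate per iteration. */
static int ads_ref(const int enc_dc[4], const uint16_t *sums, int delta,
                   const uint16_t *cost_mvx, int16_t *mvs, int width, int thresh)
{
    int nmv = 0;
    for (int i = 0; i < width; i++)
        if (ads_cost(enc_dc, sums, delta, cost_mvx, i) < thresh)
            mvs[nmv++] = (int16_t)i;
    return nmv;
}

/* Hypothetical model of a vector-length-agnostic loop: `lanes` stands in for
 * the element count of one SVE vector, which is only known at run time. A
 * tail predicate disables lanes past `width`, then passing lanes are
 * appended in order. Illustration only, not the merge request's assembly. */
static int ads_vla_model(const int enc_dc[4], const uint16_t *sums, int delta,
                         const uint16_t *cost_mvx, int16_t *mvs, int width,
                         int thresh, int lanes)
{
    enum { MAX_LANES = 128 };        /* 2048-bit vector / 16-bit elements */
    assert(lanes >= 1 && lanes <= MAX_LANES);
    int nmv = 0;
    for (int base = 0; base < width; base += lanes) {
        unsigned char pass[MAX_LANES];
        for (int l = 0; l < lanes; l++) {
            int i = base + l;
            int active = i < width;  /* like a WHILELT-generated predicate */
            /* short-circuit: inactive lanes never read past the arrays */
            pass[l] = active &&
                      ads_cost(enc_dc, sums, delta, cost_mvx, i) < thresh;
        }
        for (int l = 0; l < lanes; l++)   /* ordered compaction */
            if (pass[l])
                mvs[nmv++] = (int16_t)(base + l);
    }
    return nmv;
}

/* checkasm-style check: identical random inputs, identical results expected
 * for every lane count. Returns 1 on match, 0 on mismatch. */
static int check_lanes(int lanes, unsigned seed)
{
    enum { WIDTH = 61, DELTA = 64 };  /* WIDTH deliberately not a multiple of 8 */
    int enc_dc[4];
    uint16_t sums[WIDTH + DELTA + 8];
    uint16_t cost[WIDTH];
    int16_t mvs_ref[WIDTH], mvs_vla[WIDTH];
    srand(seed);
    for (int k = 0; k < 4; k++)
        enc_dc[k] = rand() % 4096;
    for (size_t k = 0; k < sizeof sums / sizeof sums[0]; k++)
        sums[k] = (uint16_t)(rand() % 4096);
    for (int k = 0; k < WIDTH; k++)
        cost[k] = (uint16_t)(rand() % 256);
    int thresh = 6000;
    int n_ref = ads_ref(enc_dc, sums, DELTA, cost, mvs_ref, WIDTH, thresh);
    int n_vla = ads_vla_model(enc_dc, sums, DELTA, cost, mvs_vla, WIDTH,
                              thresh, lanes);
    return n_ref == n_vla &&
           memcmp(mvs_ref, mvs_vla, (size_t)n_ref * sizeof mvs_ref[0]) == 0;
}
```

`check_lanes` should return 1 for every lane count from 1 to 128, including counts that do not divide the width, because the tail predicate and ordered compaction keep the output identical to the scalar loop. On real hardware, Linux allows a process to reduce its SVE vector length with `prctl(PR_SVE_SET_VL, ...)`, which is one way to exercise shorter lengths than the CPU's maximum.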
Edited by Matthias Langer
