ADS kernel implementation for SVE
This is the first in a series of changes we developed earlier this year while optimizing x264 for NVIDIA Grace.
Contains:
- Implementations of ADS kernels for SVE (any vector size)
- Implementations of ADS kernels for SVE2 (any vector size)
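For reviewers less familiar with ADS (absolute difference of sums, used by the exhaustive motion search), the following is a minimal scalar sketch of what the single-DC variant computes. It is an illustration, not the code in this change: the function name `ads1_ref` and its parameter layout are assumptions, loosely modelled on the C reference in x264. A vector-length-agnostic SVE kernel would be expected to produce the same candidate list for any vector size.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for the 1-DC ADS kernel.
 * For each candidate position i, the cost is the absolute difference
 * between the encoded block's DC sum and the reference sum at i, plus
 * the motion-vector cost for that position. Positions whose cost is
 * below `thresh` are appended to `mvs`. Returns the candidate count. */
static int ads1_ref(const int enc_dc[1], const uint16_t *sums,
                    const uint16_t *cost_mvx, int16_t *mvs,
                    int width, int thresh)
{
    int nmv = 0;
    for (int i = 0; i < width; i++) {
        int ads = abs(enc_dc[0] - (int)sums[i]) + cost_mvx[i];
        if (ads < thresh)
            mvs[nmv++] = (int16_t)i;
    }
    return nmv;
}
```

The "esa ads" entries in the checkasm logs below are the tests that compare the optimized kernels against the C reference.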
checkasm8
x264: using random seed 3804690644
x264: ARMv8
- intra pred : [OK]
- coeff_last : [OK]
- coeff_level_run : [OK]
x264: NEON
- pixel sad : [OK]
- pixel sad_aligned : [OK]
- pixel ssd : [OK]
- pixel satd : [OK]
- pixel sa8d : [OK]
- pixel sa8d_satd : [OK]
- pixel sad_x3 : [OK]
- pixel sad_x4 : [OK]
- pixel var : [OK]
- pixel var2 : [OK]
- pixel hadamard_ac : [OK]
- pixel vsad : [OK]
- pixel asd : [OK]
- intra satd_x3 : [OK]
- intra sad_x3 : [OK]
- ssd_nv12 : [OK]
- ssim : [OK]
- sub_dct4 : [OK]
- sub_dct8 : [OK]
- add_idct4 : [OK]
- add_idct8 : [OK]
- dct4x4dc : [OK]
- idct4x4dc : [OK]
- zigzag_interleave : [OK]
- zigzag_frame : [OK]
- zigzag_field : [OK]
- mc luma : [OK]
- mc chroma : [OK]
- mc wpredb : [OK]
- mc weight : [OK]
- mc offsetadd : [OK]
- mc offsetsub : [OK]
- store_interleave : [OK]
- plane_copy : [OK]
- hpel filter : [OK]
- lowres init : [OK]
- integral init : [OK]
- mbtree : [OK]
- memcpy aligned : [OK]
- memzero aligned : [OK]
- intra pred : [OK]
- deblock : [OK]
- quant : [OK]
- dequant : [OK]
- denoise dct : [OK]
- decimate_score : [OK]
- coeff_last : [OK]
- coeff_level_run : [OK]
- nal escape: [OK]
x264: SVE (128 bits)
- pixel ssd : [OK]
- pixel sa8d : [OK]
- pixel var : [OK]
- pixel hadamard_ac : [OK]
- esa ads: [OK]
- sub_dct4 : [OK]
- zigzag_interleave : [OK]
- mc wpredb : [OK]
- deblock : [OK]
x264: SVE2 (128 bits)
- add_idct4 : [OK]
x264: All tests passed Yeah :)
checkasm10
I have no name!@a64c740923aa:/x264$ ./checkasm10
x264: using random seed 3721534389
x264: ARMv8
x264: NEON
- pixel sad : [OK]
- pixel ssd : [OK]
- pixel satd : [OK]
- pixel sa8d : [OK]
- pixel sa8d_satd : [OK]
- pixel sad_x3 : [OK]
- pixel sad_x4 : [OK]
- pixel var : [OK]
- pixel var2 : [OK]
- pixel hadamard_ac : [OK]
- pixel vsad : [OK]
- pixel asd : [OK]
- ssd_nv12 : [OK]
- ssim : [OK]
- mc luma : [OK]
- mc chroma : [OK]
- mc wpredb : [OK]
- mc weight : [OK]
- mc offsetadd : [OK]
- mc offsetsub : [OK]
- store_interleave : [OK]
- plane_copy : [OK]
- hpel filter : [OK]
- lowres init : [OK]
- integral init : [OK]
- mbtree : [OK]
- memcpy aligned : [OK]
- memzero aligned : [OK]
- quant : [OK]
- dequant : [OK]
- denoise dct : [OK]
- decimate_score : [OK]
- coeff_last : [OK]
- coeff_level_run : [OK]
- nal escape: [OK]
x264: SVE (128 bits)
- pixel ssd : [OK]
- esa ads: [OK]
x264: SVE2 (128 bits)
x264: All tests passed Yeah :)
Edited by Matthias Langer