loongarch: Improve the performance of pixel series functions
Encoding speed improves from 11.27fps to 20.50fps (about 1.8x), as
measured by building and running with the following commands:
    ./configure && make -j5
    ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
functions             performance (c)   performance (asm)
hadamard_ac_8x8       117               21
hadamard_ac_8x16      236               42
hadamard_ac_16x8      235               31
hadamard_ac_16x16     473               60
intra_sad_x3_4x4      50                21
intra_sad_x3_8x8      183               34
intra_sad_x3_8x8c     181               36
intra_sad_x3_16x16    643               68
intra_satd_x3_4x4     83                61
intra_satd_x3_8x8c    344               81
intra_satd_x3_16x16   1389              136
sa8d_8x8              97                19
sa8d_16x16            394               68
satd_4x4              24                8
satd_4x8              51                11
satd_4x16             103               24
satd_8x4              52                9
satd_8x8              108               12
satd_8x16             218               24
satd_16x8             218               19
satd_16x16            437               38
ssd_4x4               10                5
ssd_4x8               24                8
ssd_4x16              42                15
ssd_8x4               23                5
ssd_8x8               37                9
ssd_8x16              74                17
ssd_16x8              72                11
ssd_16x16             140               23
var2_8x8              91                37
var2_8x16             176               66
var_8x8               50                15
var_8x16              65                29
var_16x16             132               56
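
For readers unfamiliar with the metrics in the table, the following is a
minimal scalar sketch of what the ssd_4x4 and satd_4x4 entries compute.
It is not the x264 C code or the LoongArch assembly from this change;
the names ref_ssd_4x4 and ref_satd_4x4 are hypothetical, and halving the
SATD sum is a normalisation assumed here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of squared differences, 4x4 block. */
int ref_ssd_4x4(const uint8_t *a, int stride_a,
                const uint8_t *b, int stride_b)
{
    assert(stride_a >= 4 && stride_b >= 4);
    int sum = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int d = a[y * stride_a + x] - b[y * stride_b + x];
            sum += d * d;
        }
    }
    return sum;
}

/* Hypothetical scalar reference: sum of absolute values of the 4x4
 * Hadamard-transformed difference block, halved (assumed normalisation). */
int ref_satd_4x4(const uint8_t *a, int stride_a,
                 const uint8_t *b, int stride_b)
{
    assert(stride_a >= 4 && stride_b >= 4);
    int d[4][4], t[4][4];
    int sum = 0;

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal pass: 4-point Hadamard butterflies on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical pass on each column, accumulating absolute values. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum / 2;
}
```

Both functions read 16 pixels per block, so the per-pixel work is
independent within each pass. That independence is what makes this class
of function a natural fit for vectorised code.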
Signed-off-by: gxw <guxiwei-hf@loongson.cn>