Commits on Source (9)
-
LSX/LASX is the LOONGARCH 128-bit/256-bit SIMD Architecture.

Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Xiwei Gu <guxiwei-hf@loongson.cn>
1ecc51ee -
Common macros and functions for loongson optimization.

Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
25ffd616 -
Performance has improved from 4.76fps to 4.92fps.
Tested with following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions                   performance (c)   performance (asm)
deblock_luma[0]             79                39
deblock_luma[1]             91                18
deblock_luma_intra[0]       63                44
deblock_luma_intra[1]       71                18
deblock_strength            104               33

Signed-off-by: Hao Chen <chenhao@loongson.cn>
d7d283f6 -
Performance has improved from 4.92fps to 6.32fps.
Tested with following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions                   performance (c)   performance (asm)
sad_4x4                     13                3
sad_4x8                     26                7
sad_4x16                    57                13
sad_8x4                     24                3
sad_8x8                     54                8
sad_8x16                    108               13
sad_16x8                    95                8
sad_16x16                   189               13
sad_x3_4x4                  37                6
sad_x3_4x8                  71                13
sad_x3_8x4                  70                8
sad_x3_8x8                  162               14
sad_x3_8x16                 323               25
sad_x3_16x8                 279               15
sad_x3_16x16                555               27
sad_x4_4x4                  49                8
sad_x4_4x8                  95                17
sad_x4_8x4                  94                8
sad_x4_8x8                  214               16
sad_x4_8x16                 429               33
sad_x4_16x8                 372               18
sad_x4_16x16                740               34

Signed-off-by: wanglu <wanglu@loongson.cn>
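
For readers unfamiliar with the metric, the sad_WxH functions in this commit compute the sum of absolute differences between two W×H pixel blocks. The following is a minimal scalar sketch in C of the 4x4 case, assuming 8-bit pixels; the name `sad_4x4_ref` and its parameter names are hypothetical and this is not the code from the merge request.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not x264's implementation): sum of
 * absolute differences over a 4x4 block of 8-bit pixels. Each stride
 * is the distance in bytes between consecutive rows of its plane. */
static int sad_4x4_ref(const uint8_t *pix1, intptr_t stride1,
                       const uint8_t *pix2, intptr_t stride2)
{
    assert(pix1 != NULL && pix2 != NULL);
    int sum = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            sum += abs(pix1[x] - pix2[x]);   /* per-pixel |a - b| */
        pix1 += stride1;                     /* advance one row */
        pix2 += stride2;
    }
    return sum;
}
```

A 4x4 block of 8-bit pixels is 16 bytes, which is exactly one 128-bit vector, so a SIMD version can in principle handle the whole block with a handful of vector operations instead of 16 scalar ones.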
00b8e3b9 -
Performance has improved from 6.32fps to 6.34fps.
Tested with following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions                   performance (c)   performance (asm)
intra_predict_4x4_dc        3                 2
intra_predict_4x4_dc8       1                 1
intra_predict_4x4_dcl       2                 1
intra_predict_4x4_dct       2                 1
intra_predict_4x4_ddl       7                 2
intra_predict_4x4_h         2                 1
intra_predict_4x4_v         1                 1
intra_predict_8x8_dc        8                 2
intra_predict_8x8_dc8       1                 1
intra_predict_8x8_dcl       5                 2
intra_predict_8x8_dct       5                 2
intra_predict_8x8_ddl       27                3
intra_predict_8x8_ddr       26                3
intra_predict_8x8_h         4                 2
intra_predict_8x8_v         3                 1
intra_predict_8x8_vl        29                3
intra_predict_8x8_vr        31                4
intra_predict_8x8c_dc       8                 5
intra_predict_8x8c_dc8      1                 1
intra_predict_8x8c_dcl      5                 3
intra_predict_8x8c_dct      5                 3
intra_predict_8x8c_h        4                 2
intra_predict_8x8c_p        58                30
intra_predict_8x8c_v        4                 1
intra_predict_16x16_dc      32                8
intra_predict_16x16_dc8     9                 4
intra_predict_16x16_dcl     26                6
intra_predict_16x16_dct     26                6
intra_predict_16x16_h       23                7
intra_predict_16x16_p       182               44
intra_predict_16x16_v       22                4

Signed-off-by: Xiwei Gu <guxiwei-hf@loongson.cn>
d8ed272a -
Performance has improved from 6.34fps to 6.78fps.
Tested with following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions                   performance (c)   performance (asm)
coeff_last15                3                 2
coeff_last16                3                 1
coeff_last64                42                6
decimate_score15            8                 12
decimate_score16            8                 11
decimate_score64            61                43
dequant_4x4_cqm             16                5
dequant_4x4_dc_cqm          13                5
dequant_4x4_dc_flat         13                5
dequant_4x4_flat            16                5
dequant_8x8_cqm             71                9
dequant_8x8_flat            71                9

Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
65e7bac5 -
Yin Shiyou authored
Performance has improved from 6.78fps to 10.53fps.
Tested with following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions                   performance (c)   performance (asm)
avg_4x2                     16                5
avg_4x4                     30                6
avg_4x8                     63                10
avg_4x16                    124               19
avg_8x4                     60                6
avg_8x8                     119               10
avg_8x16                    233               19
avg_16x8                    229               21
avg_16x16                   451               41
get_ref_4x4                 30                9
get_ref_4x8                 52                11
get_ref_8x4                 45                9
get_ref_8x8                 80                11
get_ref_8x16                156               16
get_ref_12x10               137               13
get_ref_16x8                147               11
get_ref_16x16               282               16
get_ref_20x18               278               22
hpel_filter                 5163              686
lowres_init                 5440              286
mc_chroma_2x2               24                7
mc_chroma_2x4               42                10
mc_chroma_4x2               41                7
mc_chroma_4x4               75                10
mc_chroma_4x8               144               19
mc_chroma_8x4               137               15
mc_chroma_8x8               269               28
mc_luma_4x4                 30                10
mc_luma_4x8                 52                12
mc_luma_8x4                 44                10
mc_luma_8x8                 80                13
mc_luma_8x16                156               19
mc_luma_16x8                147               13
mc_luma_16x16               281               19
memcpy_aligned              14                9
memzero_aligned             24                4
offsetadd_w4                79                18
offsetadd_w8                142               18
offsetadd_w16               277               25
offsetadd_w20               1118              38
offsetsub_w4                75                18
offsetsub_w8                140               18
offsetsub_w16               265               25
offsetsub_w20               989               39
weight_w4                   111               19
weight_w8                   205               19
weight_w16                  396               29
weight_w20                  1143              45
deinterleave_chroma_fdec    76                9
deinterleave_chroma_fenc    86                9
plane_copy_deinterleave     733               90
plane_copy_interleave       791               245
store_interleave_chroma     82                12

Signed-off-by: Xiwei Gu <guxiwei-hf@loongson.cn>
981c8f25 -
Yin Shiyou authored
Performance has improved from 10.53fps to 11.27fps.
Tested with following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions                   performance (c)   performance (asm)
add4x4_idct                 34                9
add8x8_idct                 139               31
add8x8_idct8                269               39
add8x8_idct_dc              67                7
add16x16_idct               564               123
add16x16_idct_dc            260               22
dct4x4dc                    18                10
idct4x4dc                   16                9
sub4x4_dct                  25                7
sub8x8_dct                  101               12
sub8x8_dct8                 160               25
sub16x16_dct                403               52
sub16x16_dct8               646               68
zigzag_scan_4x4_frame       4                 1

Signed-off-by: zhoupeng <zhoupeng@loongson.cn>
fa7f1fce -
Yin Shiyou authored
Performance has improved from 11.27fps to 20.50fps by using the
following command:
./configure && make -j5
./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

functions                   performance (c)   performance (asm)
hadamard_ac_8x8             117               21
hadamard_ac_8x16            236               42
hadamard_ac_16x8            235               31
hadamard_ac_16x16           473               60
intra_sad_x3_4x4            50                21
intra_sad_x3_8x8            183               34
intra_sad_x3_8x8c           181               36
intra_sad_x3_16x16          643               68
intra_satd_x3_4x4           83                61
intra_satd_x3_8x8c          344               81
intra_satd_x3_16x16         1389              136
sa8d_8x8                    97                19
sa8d_16x16                  394               68
satd_4x4                    24                8
satd_4x8                    51                11
satd_4x16                   103               24
satd_8x4                    52                9
satd_8x8                    108               12
satd_8x16                   218               24
satd_16x8                   218               19
satd_16x16                  437               38
ssd_4x4                     10                5
ssd_4x8                     24                8
ssd_4x16                    42                15
ssd_8x4                     23                5
ssd_8x8                     37                9
ssd_8x16                    74                17
ssd_16x8                    72                11
ssd_16x16                   140               23
var2_8x8                    91                37
var2_8x16                   176               66
var_8x8                     50                15
var_8x16                    65                29
var_16x16                   132               56

Signed-off-by: Hecai Yuan <yuanhecai@loongson.cn>
5f84d403
Showing
- Makefile: 26 additions, 0 deletions
- common/cpu.c: 22 additions, 0 deletions
- common/dct.c: 42 additions, 1 deletion
- common/deblock.c: 21 additions, 0 deletions
- common/loongarch/dct-a.S: 2016 additions, 0 deletions
- common/loongarch/dct.h: 95 additions, 0 deletions
- common/loongarch/deblock-a.S: 1618 additions, 0 deletions
- common/loongarch/deblock.h: 54 additions, 0 deletions
- common/loongarch/loongson_asm.S: 712 additions, 0 deletions
- common/loongarch/loongson_util.S: 47 additions, 0 deletions
- common/loongarch/mc-a.S: 2702 additions, 0 deletions
- common/loongarch/mc-c.c: 406 additions, 0 deletions
- common/loongarch/mc.h: 196 additions, 0 deletions
- common/loongarch/pixel-a.S: 3542 additions, 0 deletions
- common/loongarch/pixel-c.c: 259 additions, 0 deletions
- common/loongarch/pixel.h: 335 additions, 0 deletions
- common/loongarch/predict-a.S: 1383 additions, 0 deletions
- common/loongarch/predict-c.c: 106 additions, 0 deletions
- common/loongarch/predict.h: 150 additions, 0 deletions
- common/loongarch/quant-a.S: 986 additions, 0 deletions
New files (0 → 100644):
- common/loongarch/dct-a.S
- common/loongarch/dct.h
- common/loongarch/deblock-a.S
- common/loongarch/deblock.h
- common/loongarch/loongson_asm.S
- common/loongarch/loongson_util.S
- common/loongarch/mc-a.S
- common/loongarch/mc-c.c
- common/loongarch/mc.h
- common/loongarch/pixel-a.S
- common/loongarch/pixel-c.c
- common/loongarch/pixel.h
- common/loongarch/predict-a.S
- common/loongarch/predict-c.c
- common/loongarch/predict.h
- common/loongarch/quant-a.S