loongarch: support LoongArch LSX and LASX optimization.

Closed · Lu Wang requested to merge wangluls/x264:LOONGARCH-V4 into master
  1. Mar 03, 2023
    • loongarch: Improve the performance of pixel series functions · 330b4a2d
      Hecai Yuan authored and Lu Wang committed
      
      Encoding speed improved from 11.27 fps to 20.50 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      hadamard_ac_8x8          117             21
      hadamard_ac_8x16         236             42
      hadamard_ac_16x8         235             31
      hadamard_ac_16x16        473             60
      intra_sad_x3_4x4         50              21
      intra_sad_x3_8x8         183             34
      intra_sad_x3_8x8c        181             36
      intra_sad_x3_16x16       643             68
      intra_satd_x3_4x4        83              61
      intra_satd_x3_8x8c       344             81
      intra_satd_x3_16x16      1389            136
      sa8d_8x8                 97              19
      sa8d_16x16               394             68
      satd_4x4                 24              8
      satd_4x8                 51              11
      satd_4x16                103             24
      satd_8x4                 52              9
      satd_8x8                 108             12
      satd_8x16                218             24
      satd_16x8                218             19
      satd_16x16               437             38
      ssd_4x4                  10              5
      ssd_4x8                  24              8
      ssd_4x16                 42              15
      ssd_8x4                  23              5
      ssd_8x8                  37              9
      ssd_8x16                 74              17
      ssd_16x8                 72              11
      ssd_16x16                140             23
      var2_8x8                 91              37
      var2_8x16                176             66
      var_8x8                  50              15
      var_8x16                 65              29
      var_16x16                132             56
      
      Signed-off-by: yuanhecai <yuanhecai@loongson.cn>
      Change-Id: Ice971fe384615aefbc5e4aca0c87f60499a7c7f7
    • loongarch: Improve the performance of dct series functions · 2413c413
      zhoupeng authored and Lu Wang committed
      
      Encoding speed improved from 10.53 fps to 11.27 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      add4x4_idct              34              9
      add8x8_idct              139             31
      add8x8_idct8             269             39
      add8x8_idct_dc           67              7
      add16x16_idct            564             123
      add16x16_idct_dc         260             22
      dct4x4dc                 18              10
      idct4x4dc                16              9
      sub4x4_dct               25              7
      sub8x8_dct               101             12
      sub8x8_dct8              160             25
      sub16x16_dct             403             52
      sub16x16_dct8            646             68
      zigzag_scan_4x4_frame    4               1
      
      Signed-off-by: zhoupeng <zhoupeng@loongson.cn>
      Change-Id: I706d8f313c7130a2bc2fa20886409c2ef9e0cd92
    • loongarch: Improve the performance of mc series functions · 4b9719d6
      guxiwei authored and Lu Wang committed
      
      Encoding speed improved from 6.78 fps to 10.53 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      avg_4x2                  16              5
      avg_4x4                  30              6
      avg_4x8                  63              10
      avg_4x16                 124             19
      avg_8x4                  60              6
      avg_8x8                  119             10
      avg_8x16                 233             19
      avg_16x8                 229             21
      avg_16x16                451             41
      get_ref_4x4              30              9
      get_ref_4x8              52              11
      get_ref_8x4              45              9
      get_ref_8x8              80              11
      get_ref_8x16             156             16
      get_ref_12x10            137             13
      get_ref_16x8             147             11
      get_ref_16x16            282             16
      get_ref_20x18            278             22
      hpel_filter              5163            686
      lowres_init              5440            286
      mc_chroma_2x2            24              7
      mc_chroma_2x4            42              10
      mc_chroma_4x2            41              7
      mc_chroma_4x4            75              10
      mc_chroma_4x8            144             19
      mc_chroma_8x4            137             15
      mc_chroma_8x8            269             28
      mc_luma_4x4              30              10
      mc_luma_4x8              52              12
      mc_luma_8x4              44              10
      mc_luma_8x8              80              13
      mc_luma_8x16             156             19
      mc_luma_16x8             147             13
      mc_luma_16x16            281             19
      memcpy_aligned           14              9
      memzero_aligned          24              4
      offsetadd_w4             79              18
      offsetadd_w8             142             18
      offsetadd_w16            277             25
      offsetadd_w20            1118            38
      offsetsub_w4             75              18
      offsetsub_w8             140             18
      offsetsub_w16            265             25
      offsetsub_w20            989             39
      weight_w4                111             19
      weight_w8                205             19
      weight_w16               396             29
      weight_w20               1143            45
      deinterleave_chroma_fdec 76              9
      deinterleave_chroma_fenc 86              9
      plane_copy_deinterleave  733             90
      plane_copy_interleave    791             245
      store_interleave_chroma  82              12
      
      Signed-off-by: gxw <guxiwei-hf@loongson.cn>
      Change-Id: I6cdccff3d2cf79f0fc4571a7e2ee3699d24d698f
    • loongarch: Improve the performance of quant series functions · ba09be07
      Yin Shiyou authored and Lu Wang committed
      
      Encoding speed improved from 6.34 fps to 6.78 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      coeff_last15             3               2
      coeff_last16             3               1
      coeff_last64             42              6
      decimate_score15         8               12
      decimate_score16         8               11
      decimate_score64         61              43
      dequant_4x4_cqm          16              5
      dequant_4x4_dc_cqm       13              5
      dequant_4x4_dc_flat      13              5
      dequant_4x4_flat         16              5
      dequant_8x8_cqm          71              9
      dequant_8x8_flat         71              9
      
      Signed-off-by: yinshiyou <yinshiyou-hf@loongson.cn>
      Change-Id: I599e4da3d930b96792e4a3b3576d841c1d80d5e9
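
      Reading the quant table above is easier with the C-to-asm ratio per function. The following is a minimal sketch, not part of the commit: it assumes the two columns are costs where lower is better (the commit message does not state the unit), and the helper names `ratios` and `slower_than_c` are made up for illustration. The rows are copied from the table.

```python
# Rows copied from the quant commit's table: (function, C figure, asm figure).
# Assumption: both figures are costs, so a smaller number is better.
# The commit message does not state the unit.
ROWS = [
    ("coeff_last15", 3, 2),
    ("coeff_last16", 3, 1),
    ("coeff_last64", 42, 6),
    ("decimate_score15", 8, 12),
    ("decimate_score16", 8, 11),
    ("decimate_score64", 61, 43),
    ("dequant_4x4_cqm", 16, 5),
    ("dequant_4x4_dc_cqm", 13, 5),
    ("dequant_4x4_dc_flat", 13, 5),
    ("dequant_4x4_flat", 16, 5),
    ("dequant_8x8_cqm", 71, 9),
    ("dequant_8x8_flat", 71, 9),
]


def ratios(rows):
    """Return {function: C figure / asm figure} (hypothetical helper)."""
    return {name: c / asm for name, c, asm in rows}


def slower_than_c(rows):
    """Return the functions whose asm figure is larger than the C figure."""
    return [name for name, c, asm in rows if asm > c]


if __name__ == "__main__":
    for name, ratio in ratios(ROWS).items():
        print(f"{name:<22} {ratio:5.2f}x")
    print("asm figure above C figure:", ", ".join(slower_than_c(ROWS)))
```

      Under that lower-is-better assumption, the two rows the script flags (decimate_score15 and decimate_score16) are the only ones in this table where the asm figure exceeds the C figure.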
    • loongarch: Improve the performance of predict series functions · f7e14b93
      Lu Wang authored
      
      Encoding speed improved from 6.32 fps to 6.34 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      intra_predict_4x4_dc     3               2
      intra_predict_4x4_dc8    1               1
      intra_predict_4x4_dcl    2               1
      intra_predict_4x4_dct    2               1
      intra_predict_4x4_ddl    7               2
      intra_predict_4x4_h      2               1
      intra_predict_4x4_v      1               1
      intra_predict_8x8_dc     8               2
      intra_predict_8x8_dc8    1               1
      intra_predict_8x8_dcl    5               2
      intra_predict_8x8_dct    5               2
      intra_predict_8x8_ddl    27              3
      intra_predict_8x8_ddr    26              3
      intra_predict_8x8_h      4               2
      intra_predict_8x8_v      3               1
      intra_predict_8x8_vl     29              3
      intra_predict_8x8_vr     31              4
      intra_predict_8x8c_dc    8               5
      intra_predict_8x8c_dc8   1               1
      intra_predict_8x8c_dcl   5               3
      intra_predict_8x8c_dct   5               3
      intra_predict_8x8c_h     4               2
      intra_predict_8x8c_p     58              30
      intra_predict_8x8c_v     4               1
      intra_predict_16x16_dc   32              8
      intra_predict_16x16_dc8  9               4
      intra_predict_16x16_dcl  26              6
      intra_predict_16x16_dct  26              6
      intra_predict_16x16_h    23              7
      intra_predict_16x16_p    182             44
      intra_predict_16x16_v    22              4
      
      Signed-off-by: wanglu <wanglu@loongson.cn>
      Change-Id: Iecbd15f3a355ad46692bf1a68d97e32683a3a6f0
    • loongarch: Improve the performance of sad/sad_x3/sad_x4 series functions · 39951e92
      Lu Wang authored
      
      Encoding speed improved from 4.92 fps to 6.32 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      sad_4x4                 13               3
      sad_4x8                 26               7
      sad_4x16                57               13
      sad_8x4                 24               3
      sad_8x8                 54               8
      sad_8x16                108              13
      sad_16x8                95               8
      sad_16x16               189              13
      sad_x3_4x4              37               6
      sad_x3_4x8              71               13
      sad_x3_8x4              70               8
      sad_x3_8x8              162              14
      sad_x3_8x16             323              25
      sad_x3_16x8             279              15
      sad_x3_16x16            555              27
      sad_x4_4x4              49               8
      sad_x4_4x8              95               17
      sad_x4_8x4              94               8
      sad_x4_8x8              214              16
      sad_x4_8x16             429              33
      sad_x4_16x8             372              18
      sad_x4_16x16            740              34
      
      Signed-off-by: wanglu <wanglu@loongson.cn>
      Change-Id: Ib9000f55d3175e0138baa1d7c6687b9ad7d809c1
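
      The fps figures quoted in the six Mar 03 commit messages chain together: each commit's starting figure is the previous commit's result. The sketch below is an illustration only, not part of any commit, and the names `STEPS`, `is_contiguous`, `overall_speedup` and `largest_step` are hypothetical. It checks that chaining and derives the per-commit and overall speedups from the quoted numbers.

```python
# (series, fps before, fps after), oldest commit first, as quoted in the
# commit messages on this page.
STEPS = [
    ("sad/sad_x3/sad_x4", 4.92, 6.32),
    ("predict", 6.32, 6.34),
    ("quant", 6.34, 6.78),
    ("mc", 6.78, 10.53),
    ("dct", 10.53, 11.27),
    ("pixel", 11.27, 20.50),
]


def is_contiguous(steps):
    """True if every commit starts from the fps the previous one ended at."""
    return all(prev[2] == cur[1] for prev, cur in zip(steps, steps[1:]))


def overall_speedup(steps):
    """Final fps divided by the fps before the first commit."""
    return steps[-1][2] / steps[0][1]


def largest_step(steps):
    """Name of the series with the highest after/before fps ratio."""
    return max(steps, key=lambda s: s[2] / s[1])[0]


if __name__ == "__main__":
    for name, before, after in STEPS:
        print(f"{name:<20} {before:5.2f} -> {after:5.2f} fps  ({after / before:.2f}x)")
    print(f"overall: {overall_speedup(STEPS):.2f}x, largest step: {largest_step(STEPS)}")
```

      These ratios describe only the single 1920x1080 test clip and the `--threads 4` command line quoted in the commit messages.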
  2. Feb 22, 2023