
Loongarch: multiple SIMD optimization functions are added

Merged Hecai Yuan requested to merge HecaiYuan/dav1d:master into master
  Sep 30, 2024
    • loongarch: minor improvement on decode_symbol_adapt · ed004fe9
      jinbo authored and Hecai Yuan committed
      Change-Id: I78fe788113ff2487ba1ce2e7d0c7d7c78c5a8c58
    • loongarch: rewrite optimization functions in loongarch/itx.S · 62a51df1
      Hecai Yuan authored and Hecai Yuan committed
      Change-Id: I1566e8145d36296f2c76107cf15fc2cc7ac0ecc7
    • LoongArch: Add save_tmvs_lsx · 757f294a
      guxiwei authored and Hecai Yuan committed
      The performance data is as follows:
      save_tmvs_c:        3938.6 ( 1.00x)
      save_tmvs_lsx:      1355.3 ( 2.91x)
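
The ratio in parentheses appears to be the C reference time divided by the optimized time (3938.6 / 1355.3 ≈ 2.91 for save_tmvs). A minimal C sketch of that arithmetic, assuming this reading of the output; the helper name `speedup` is hypothetical and is not part of dav1d or checkasm:

```c
#include <assert.h>

/* Hypothetical helper: speedup of an optimized routine relative to the
 * C reference, computed as reference time divided by optimized time.
 * Both arguments use the same (unspecified) timing unit as the report. */
static double speedup(double t_ref, double t_opt)
{
    assert(t_opt > 0.0); /* a zero or negative timing would be meaningless */
    return t_ref / t_opt;
}
```

Calling `speedup(3938.6, 1355.3)` with the save_tmvs figures gives roughly 2.91, matching the reported value.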
    • loongarch: refactor loopfilter · 3d96175d
      jinbo authored and Hecai Yuan committed
      bench performance before:
      lpf_h_sb_y_w16_8bpc_c:      117.0 ( 1.00x)
      lpf_h_sb_y_w16_8bpc_lsx:     33.9 ( 3.46x)
      lpf_v_sb_y_w16_8bpc_c:      132.1 ( 1.00x)
      lpf_v_sb_y_w16_8bpc_lsx:     59.7 ( 2.21x)
      
      bench performance after:
      lpf_h_sb_y_w16_8bpc_c:      114.9 ( 1.00x)
      lpf_h_sb_y_w16_8bpc_lsx:     32.0 ( 3.59x)
      lpf_v_sb_y_w16_8bpc_c:      132.5 ( 1.00x)
      lpf_v_sb_y_w16_8bpc_lsx:     28.1 ( 4.72x)
      
      Change-Id: Ie64e164a9416c438f6b3881ce18fb42e2ddd073d
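
The effect of the refactor itself can be read from the two LSX timings rather than from the ratios against C. A small C sketch, assuming lower timings are better; the helper name `time_reduction_pct` is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper: percentage by which a timing dropped between two
 * benchmark runs of the same function (positive means the new run is faster). */
static double time_reduction_pct(double t_before, double t_after)
{
    assert(t_before > 0.0); /* avoid dividing by zero */
    return 100.0 * (t_before - t_after) / t_before;
}
```

For lpf_v_sb_y_w16_8bpc_lsx, `time_reduction_pct(59.7, 28.1)` is about 53%, while the lpf_h variant (33.9 to 32.0) improves by under 6%.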
    • loongarch: add lasx implementation of sgr_3x3 for 8 bpc · 70582027
      Hecai Yuan authored and Hecai Yuan committed
      sgr_3x3_8bpc_c:                                   27233.1 ( 1.00x)
      sgr_3x3_8bpc_lsx:                                 12874.7 ( 2.12x)
      sgr_3x3_8bpc_lasx:                                10183.7 ( 2.67x)
      
      Change-Id: I2aa469e8560733d6191396186bf776a12ad6e4a3
    • loongarch: rewrite warp_8x8/8x8t_lsx for 8 bpc · 96d6e472
      Hecai Yuan authored and Hecai Yuan committed
      before:
      warp_8x8_8bpc_c:                                    109.8 ( 1.00x)
      warp_8x8_8bpc_lsx:                                   44.6 ( 2.46x)
      warp_8x8t_8bpc_c:                                    97.5 ( 1.00x)
      warp_8x8t_8bpc_lsx:                                  43.7 ( 2.23x)
      
      after:
      warp_8x8_8bpc_c:                                    109.8 ( 1.00x)
      warp_8x8_8bpc_lsx:                                   39.2 ( 2.80x)
      warp_8x8t_8bpc_c:                                    97.5 ( 1.00x)
      warp_8x8t_8bpc_lsx:                                  37.9 ( 2.57x)
      
      Change-Id: I11728c2c30821b8e2b1c85208710dfe5d1c1269c
    • loongarch: Refine prep_8tap_8bpc_lasx · b9e9a0ef
      jinbo authored and Hecai Yuan committed
      mct_8tap_regular_w8_h_8bpc_c:                  47.1 ( 1.00x)
      mct_8tap_regular_w8_h_8bpc_lsx:                 6.3 ( 7.46x)
      mct_8tap_regular_w8_h_8bpc_lasx:                4.4 (10.80x)
      mct_8tap_regular_w8_hv_8bpc_c:                118.9 ( 1.00x)
      mct_8tap_regular_w8_hv_8bpc_lsx:               19.2 ( 6.20x)
      mct_8tap_regular_w8_hv_8bpc_lasx:              13.7 ( 8.69x)
      mct_8tap_regular_w8_v_8bpc_c:                  60.3 ( 1.00x)
      mct_8tap_regular_w8_v_8bpc_lsx:                 5.4 (11.08x)
      mct_8tap_regular_w8_v_8bpc_lasx:                3.3 (18.33x)
      
      Change-Id: I1140f6ffbd738166f2581bc9111ebbdf6f9fa72c
    • loongarch: add lasx implementation of wiener filter for 8 bpc · af11a10a
      Hecai Yuan authored and Hecai Yuan committed
      wiener_5tap_8bpc_c:                               18382.0 ( 1.00x)
      wiener_5tap_8bpc_lsx:                              4166.9 ( 4.41x)
      wiener_5tap_8bpc_lasx:                             2832.2 ( 6.49x)
      wiener_7tap_8bpc_c:                               18339.6 ( 1.00x)
      wiener_7tap_8bpc_lsx:                              4168.3 ( 4.40x)
      wiener_7tap_8bpc_lasx:                             2832.5 ( 6.47x)
      
      Change-Id: I183a8cb008203fb61683b0543d9409d58d141a2e
    • Loongarch: Optimized load_tmvs_c function by LSX · 90a9549b
      zhoupeng authored and Hecai Yuan committed
      load_tmvs_c:     9702.0 ( 1.00x)
      load_tmvs_lsx:   7857.0 ( 1.23x)
    • Loongarch: Optimized ipred_z1 8bpc functions by LSX · 411fc219
      pengxu authored and Hecai Yuan committed
      intra_pred_z1_w4_8bpc_c:        16.5 ( 1.00x)
      intra_pred_z1_w4_8bpc_lsx:       7.1 ( 2.31x)
      intra_pred_z1_w8_8bpc_c:        31.9 ( 1.00x)
      intra_pred_z1_w8_8bpc_lsx:      10.0 ( 3.20x)
      intra_pred_z1_w16_8bpc_c:       80.1 ( 1.00x)
      intra_pred_z1_w16_8bpc_lsx:     20.2 ( 3.96x)
      intra_pred_z1_w32_8bpc_c:      185.8 ( 1.00x)
      intra_pred_z1_w32_8bpc_lsx:     40.8 ( 4.55x)
      intra_pred_z1_w64_8bpc_c:      511.1 ( 1.00x)
      intra_pred_z1_w64_8bpc_lsx:     99.0 ( 5.16x)
      
      Change-Id: Id7591e9b87e5b4d7fc3f438397e25dc6ca8e7f91
    • Loongarch: Optimized emu_edge_c function by LSX · 7c63bb1b
      zhoupeng authored and Hecai Yuan committed
      emu_edge_w4_8bpc_c:        9.0 ( 1.00x)
      emu_edge_w4_8bpc_lsx:      6.7 ( 1.34x)
      emu_edge_w8_8bpc_c:       12.9 ( 1.00x)
      emu_edge_w8_8bpc_lsx:      9.2 ( 1.40x)
      emu_edge_w16_8bpc_c:       20.0 ( 1.00x)
      emu_edge_w16_8bpc_lsx:     16.3 ( 1.23x)
      emu_edge_w32_8bpc_c:       44.6 ( 1.00x)
      emu_edge_w32_8bpc_lsx:     33.3 ( 1.34x)
      emu_edge_w64_8bpc_c:       79.9 ( 1.00x)
      emu_edge_w64_8bpc_lsx:     66.2 ( 1.21x)
      emu_edge_w128_8bpc_c:      193.9 ( 1.00x)
      emu_edge_w128_8bpc_lsx:    197.8 ( 0.98x)
      
      Change-Id: I180c94d311509740b03793419d5790a931532980
    • LoongArch64: Implement checked_call() · e3101ddc
      guxiwei authored and Hecai Yuan committed
      checkasm now calls the function under test, 'func_new', through
      the wrapper 'checked_call' instead of calling it directly.
      The wrapper checks that 'func_new' correctly saves and restores
      the static (callee-saved) registers: it writes dirty values to
      those registers before the call and, after 'func_new' returns,
      verifies that the values are unchanged.
      
      Change-Id: Ia9290b55ab0f2dd87801f6fd175813d3f717d851
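
The idea behind the wrapper can be illustrated in portable C. This is only a simulation under stated assumptions: the real 'checked_call' is written in assembly and operates on actual registers, whereas here a hypothetical array `static_regs` stands in for the nine general-purpose static registers $s0-$s8, and `checked_call_sim`, `good_fn` and `bad_fn` are invented names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the static registers $s0-$s8. Portable C cannot
 * address real registers, so this array merely simulates them. */
#define N_STATIC 9
static uint64_t static_regs[N_STATIC];

typedef void (*test_fn)(void);

/* Poison the simulated registers, call fn, then verify nothing changed.
 * Returns 1 if every register still holds its dirty value, 0 otherwise. */
static int checked_call_sim(test_fn fn)
{
    uint64_t dirty[N_STATIC];
    assert(fn != NULL);
    for (int i = 0; i < N_STATIC; i++)
        dirty[i] = 0xDEADBEEF00000000ULL | (uint64_t)i; /* recognizable pattern */
    memcpy(static_regs, dirty, sizeof(dirty));          /* write dirty values */
    fn();                                               /* function under test */
    return memcmp(static_regs, dirty, sizeof(dirty)) == 0;
}

/* Well-behaved callee: saves the register it uses and restores it on exit. */
static void good_fn(void)
{
    uint64_t saved = static_regs[0];
    static_regs[0] = 42;
    static_regs[0] = saved;
}

/* Buggy callee: clobbers a static register without restoring it. */
static void bad_fn(void)
{
    static_regs[3] = 0;
}
```

With these definitions, `checked_call_sim(good_fn)` returns 1 and `checked_call_sim(bad_fn)` returns 0, which is the kind of failure the wrapper is meant to surface in checkasm.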
    • Loongarch: Optimized ipred_filter 8bpc functions by LSX · 7f891597
      pengxu authored and Hecai Yuan committed
      intra_pred_filter_w4_8bpc_c:          17.9 ( 1.00x)
      intra_pred_filter_w4_8bpc_lsx:         8.9 ( 2.00x)
      intra_pred_filter_w8_8bpc_c:          55.3 ( 1.00x)
      intra_pred_filter_w8_8bpc_lsx:        23.8 ( 2.33x)
      intra_pred_filter_w16_8bpc_c:        109.4 ( 1.00x)
      intra_pred_filter_w16_8bpc_lsx:       49.1 ( 2.23x)
      intra_pred_filter_w32_8bpc_c:        270.2 ( 1.00x)
      intra_pred_filter_w32_8bpc_lsx:      126.1 ( 2.14x)
      
      Change-Id: Ic4c23cb1d54d5f8557c31cdfbbd54f8beaaa32c2
    • loongarch: Add some itx optimization functions for 8bpc · f398bf96
      Hecai Yuan authored and Hecai Yuan committed
      1. inv_txfm_add_dct_dct_32x16_8bpc_lsx
      2. inv_txfm_add_dct_dct_32x8_8bpc_lsx
      3. inv_txfm_add_dct_dct_64x32_8bpc_lsx
      4. inv_txfm_add_adst_flipadst_16x16_8bpc_lsx
      5. inv_txfm_add_flipadst_adst_16x16_8bpc_lsx
      6. inv_txfm_add_adst_adst_16x16_8bpc_lasx
      
      Relative speedup over C code:
      
      inv_txfm_add_32x16_dct_dct_0_8bpc_c:                 78.4 ( 1.00x)
      inv_txfm_add_32x16_dct_dct_0_8bpc_lsx:                5.7 (13.81x)
      inv_txfm_add_32x16_dct_dct_1_8bpc_c:                710.1 ( 1.00x)
      inv_txfm_add_32x16_dct_dct_1_8bpc_lsx:              102.9 ( 6.90x)
      inv_txfm_add_32x16_dct_dct_2_8bpc_c:                918.0 ( 1.00x)
      inv_txfm_add_32x16_dct_dct_2_8bpc_lsx:              103.2 ( 8.90x)
      inv_txfm_add_32x16_dct_dct_3_8bpc_c:                914.3 ( 1.00x)
      inv_txfm_add_32x16_dct_dct_3_8bpc_lsx:              103.2 ( 8.86x)
      inv_txfm_add_32x16_dct_dct_4_8bpc_c:                929.8 ( 1.00x)
      inv_txfm_add_32x16_dct_dct_4_8bpc_lsx:              102.9 ( 9.03x)
      
      inv_txfm_add_32x8_dct_dct_0_8bpc_c:                  39.6 ( 1.00x)
      inv_txfm_add_32x8_dct_dct_0_8bpc_lsx:                 3.0 (13.10x)
      inv_txfm_add_32x8_dct_dct_1_8bpc_c:                 431.6 ( 1.00x)
      inv_txfm_add_32x8_dct_dct_1_8bpc_lsx:                42.6 (10.13x)
      inv_txfm_add_32x8_dct_dct_2_8bpc_c:                 431.5 ( 1.00x)
      inv_txfm_add_32x8_dct_dct_2_8bpc_lsx:                42.6 (10.13x)
      inv_txfm_add_32x8_dct_dct_3_8bpc_c:                 432.0 ( 1.00x)
      inv_txfm_add_32x8_dct_dct_3_8bpc_lsx:                42.6 (10.14x)
      inv_txfm_add_32x8_dct_dct_4_8bpc_c:                 431.3 ( 1.00x)
      inv_txfm_add_32x8_dct_dct_4_8bpc_lsx:                42.6 (10.13x)
      
      inv_txfm_add_64x32_dct_dct_0_8bpc_c:                304.3 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_0_8bpc_lsx:               20.3 (15.01x)
      inv_txfm_add_64x32_dct_dct_1_8bpc_c:               2743.1 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_1_8bpc_lsx:              270.9 (10.13x)
      inv_txfm_add_64x32_dct_dct_2_8bpc_c:               3197.1 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_2_8bpc_lsx:              327.7 ( 9.76x)
      inv_txfm_add_64x32_dct_dct_3_8bpc_c:               3638.3 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_3_8bpc_lsx:              383.7 ( 9.48x)
      inv_txfm_add_64x32_dct_dct_4_8bpc_c:               4084.5 ( 1.00x)
      inv_txfm_add_64x32_dct_dct_4_8bpc_lsx:              441.7 ( 9.25x)
      
      inv_txfm_add_16x16_adst_flipadst_0_8bpc_c:          277.3 ( 1.00x)
      inv_txfm_add_16x16_adst_flipadst_0_8bpc_lsx:         58.7 ( 4.72x)
      inv_txfm_add_16x16_adst_flipadst_1_8bpc_c:          358.1 ( 1.00x)
      inv_txfm_add_16x16_adst_flipadst_1_8bpc_lsx:         58.7 ( 6.10x)
      inv_txfm_add_16x16_adst_flipadst_2_8bpc_c:          449.3 ( 1.00x)
      inv_txfm_add_16x16_adst_flipadst_2_8bpc_lsx:         58.7 ( 7.65x)
      
      inv_txfm_add_16x16_flipadst_adst_0_8bpc_c:          277.2 ( 1.00x)
      inv_txfm_add_16x16_flipadst_adst_0_8bpc_lsx:         58.7 ( 4.72x)
      inv_txfm_add_16x16_flipadst_adst_1_8bpc_c:          358.7 ( 1.00x)
      inv_txfm_add_16x16_flipadst_adst_1_8bpc_lsx:         58.7 ( 6.11x)
      inv_txfm_add_16x16_flipadst_adst_2_8bpc_c:          450.4 ( 1.00x)
      inv_txfm_add_16x16_flipadst_adst_2_8bpc_lsx:         58.7 ( 7.67x)
      
      inv_txfm_add_16x16_adst_adst_0_8bpc_c:              253.4 ( 1.00x)
      inv_txfm_add_16x16_adst_adst_0_8bpc_lasx:            23.1 (10.98x)
      inv_txfm_add_16x16_adst_adst_1_8bpc_c:              325.2 ( 1.00x)
      inv_txfm_add_16x16_adst_adst_1_8bpc_lasx:            23.1 (14.08x)
      inv_txfm_add_16x16_adst_adst_2_8bpc_c:              405.9 ( 1.00x)
      inv_txfm_add_16x16_adst_adst_2_8bpc_lasx:            23.1 (17.56x)
      
      Change-Id: Iaa5419a830c3308e2c4c9ac5b3699c3a971301ed
    • loongarch: add lsx implementations of the itx_8bpc.add_16x8 series of functions · 13a857d0
      Hecai Yuan authored and Hecai Yuan committed
      Relative speedup over C code:
      
      inv_txfm_add_16x8_adst_adst_0_8bpc_c:               127.7 ( 1.00x)
      inv_txfm_add_16x8_adst_adst_0_8bpc_lsx:              29.6 ( 4.32x)
      inv_txfm_add_16x8_adst_adst_1_8bpc_c:               206.6 ( 1.00x)
      inv_txfm_add_16x8_adst_adst_1_8bpc_lsx:              29.6 ( 6.98x)
      inv_txfm_add_16x8_adst_adst_2_8bpc_c:               206.6 ( 1.00x)
      inv_txfm_add_16x8_adst_adst_2_8bpc_lsx:              29.6 ( 6.99x)
      inv_txfm_add_16x8_adst_dct_0_8bpc_c:                126.7 ( 1.00x)
      inv_txfm_add_16x8_adst_dct_0_8bpc_lsx:               25.8 ( 4.91x)
      inv_txfm_add_16x8_adst_dct_1_8bpc_c:                205.1 ( 1.00x)
      inv_txfm_add_16x8_adst_dct_1_8bpc_lsx:               25.8 ( 7.94x)
      inv_txfm_add_16x8_adst_dct_2_8bpc_c:                205.2 ( 1.00x)
      inv_txfm_add_16x8_adst_dct_2_8bpc_lsx:               25.8 ( 7.94x)
      inv_txfm_add_16x8_adst_flipadst_0_8bpc_c:           128.3 ( 1.00x)
      inv_txfm_add_16x8_adst_flipadst_0_8bpc_lsx:          29.8 ( 4.30x)
      inv_txfm_add_16x8_adst_flipadst_1_8bpc_c:           207.2 ( 1.00x)
      inv_txfm_add_16x8_adst_flipadst_1_8bpc_lsx:          29.9 ( 6.94x)
      inv_txfm_add_16x8_adst_flipadst_2_8bpc_c:           207.1 ( 1.00x)
      inv_txfm_add_16x8_adst_flipadst_2_8bpc_lsx:          29.8 ( 6.94x)
      inv_txfm_add_16x8_adst_identity_0_8bpc_c:            78.3 ( 1.00x)
      inv_txfm_add_16x8_adst_identity_0_8bpc_lsx:          18.6 ( 4.21x)
      inv_txfm_add_16x8_adst_identity_1_8bpc_c:           157.1 ( 1.00x)
      inv_txfm_add_16x8_adst_identity_1_8bpc_lsx:          18.6 ( 8.45x)
      inv_txfm_add_16x8_adst_identity_2_8bpc_c:           157.2 ( 1.00x)
      inv_txfm_add_16x8_adst_identity_2_8bpc_lsx:          18.6 ( 8.46x)
      inv_txfm_add_16x8_dct_adst_0_8bpc_c:                127.4 ( 1.00x)
      inv_txfm_add_16x8_dct_adst_0_8bpc_lsx:               25.4 ( 5.02x)
      inv_txfm_add_16x8_dct_adst_1_8bpc_c:                201.2 ( 1.00x)
      inv_txfm_add_16x8_dct_adst_1_8bpc_lsx:               25.4 ( 7.93x)
      inv_txfm_add_16x8_dct_adst_2_8bpc_c:                201.2 ( 1.00x)
      inv_txfm_add_16x8_dct_adst_2_8bpc_lsx:               25.4 ( 7.93x)
      inv_txfm_add_16x8_dct_dct_0_8bpc_c:                  21.8 ( 1.00x)
      inv_txfm_add_16x8_dct_dct_0_8bpc_lsx:                 2.1 (10.52x)
      inv_txfm_add_16x8_dct_dct_1_8bpc_c:                 200.2 ( 1.00x)
      inv_txfm_add_16x8_dct_dct_1_8bpc_lsx:                21.6 ( 9.28x)
      inv_txfm_add_16x8_dct_dct_2_8bpc_c:                 200.2 ( 1.00x)
      inv_txfm_add_16x8_dct_dct_2_8bpc_lsx:                21.6 ( 9.28x)
      inv_txfm_add_16x8_dct_flipadst_0_8bpc_c:            127.2 ( 1.00x)
      inv_txfm_add_16x8_dct_flipadst_0_8bpc_lsx:           25.6 ( 4.96x)
      inv_txfm_add_16x8_dct_flipadst_1_8bpc_c:            201.2 ( 1.00x)
      inv_txfm_add_16x8_dct_flipadst_1_8bpc_lsx:           25.7 ( 7.84x)
      inv_txfm_add_16x8_dct_flipadst_2_8bpc_c:            201.7 ( 1.00x)
      inv_txfm_add_16x8_dct_flipadst_2_8bpc_lsx:           25.7 ( 7.86x)
      inv_txfm_add_16x8_dct_identity_0_8bpc_c:             77.3 ( 1.00x)
      inv_txfm_add_16x8_dct_identity_0_8bpc_lsx:           14.5 ( 5.35x)
      inv_txfm_add_16x8_dct_identity_1_8bpc_c:            151.2 ( 1.00x)
      inv_txfm_add_16x8_dct_identity_1_8bpc_lsx:           14.5 (10.46x)
      inv_txfm_add_16x8_dct_identity_2_8bpc_c:            151.5 ( 1.00x)
      inv_txfm_add_16x8_dct_identity_2_8bpc_lsx:           14.5 (10.48x)
      inv_txfm_add_16x8_flipadst_adst_0_8bpc_c:           128.5 ( 1.00x)
      inv_txfm_add_16x8_flipadst_adst_0_8bpc_lsx:          29.7 ( 4.32x)
      inv_txfm_add_16x8_flipadst_adst_1_8bpc_c:           207.3 ( 1.00x)
      inv_txfm_add_16x8_flipadst_adst_1_8bpc_lsx:          29.7 ( 6.97x)
      inv_txfm_add_16x8_flipadst_adst_2_8bpc_c:           207.4 ( 1.00x)
      inv_txfm_add_16x8_flipadst_adst_2_8bpc_lsx:          29.7 ( 6.98x)
      inv_txfm_add_16x8_flipadst_dct_0_8bpc_c:            126.8 ( 1.00x)
      inv_txfm_add_16x8_flipadst_dct_0_8bpc_lsx:           25.9 ( 4.90x)
      inv_txfm_add_16x8_flipadst_dct_1_8bpc_c:            204.8 ( 1.00x)
      inv_txfm_add_16x8_flipadst_dct_1_8bpc_lsx:           25.9 ( 7.92x)
      inv_txfm_add_16x8_flipadst_dct_2_8bpc_c:            205.4 ( 1.00x)
      inv_txfm_add_16x8_flipadst_dct_2_8bpc_lsx:           25.9 ( 7.94x)
      inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_c:       128.6 ( 1.00x)
      inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_lsx:      30.0 ( 4.29x)
      inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_c:       206.6 ( 1.00x)
      inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_lsx:      29.9 ( 6.90x)
      inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_c:       206.5 ( 1.00x)
      inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_lsx:      29.9 ( 6.90x)
      inv_txfm_add_16x8_flipadst_identity_0_8bpc_c:        77.8 ( 1.00x)
      inv_txfm_add_16x8_flipadst_identity_0_8bpc_lsx:      18.6 ( 4.18x)
      inv_txfm_add_16x8_flipadst_identity_1_8bpc_c:       156.3 ( 1.00x)
      inv_txfm_add_16x8_flipadst_identity_1_8bpc_lsx:      18.6 ( 8.40x)
      inv_txfm_add_16x8_flipadst_identity_2_8bpc_c:       156.6 ( 1.00x)
      inv_txfm_add_16x8_flipadst_identity_2_8bpc_lsx:      18.6 ( 8.42x)
      inv_txfm_add_16x8_identity_adst_0_8bpc_c:           120.7 ( 1.00x)
      inv_txfm_add_16x8_identity_adst_0_8bpc_lsx:          21.1 ( 5.71x)
      inv_txfm_add_16x8_identity_adst_1_8bpc_c:           120.8 ( 1.00x)
      inv_txfm_add_16x8_identity_adst_1_8bpc_lsx:          21.1 ( 5.71x)
      inv_txfm_add_16x8_identity_adst_2_8bpc_c:           145.5 ( 1.00x)
      inv_txfm_add_16x8_identity_adst_2_8bpc_lsx:          21.2 ( 6.88x)
      inv_txfm_add_16x8_identity_dct_0_8bpc_c:            119.1 ( 1.00x)
      inv_txfm_add_16x8_identity_dct_0_8bpc_lsx:           17.9 ( 6.67x)
      inv_txfm_add_16x8_identity_dct_1_8bpc_c:            119.1 ( 1.00x)
      inv_txfm_add_16x8_identity_dct_1_8bpc_lsx:           17.9 ( 6.67x)
      inv_txfm_add_16x8_identity_dct_2_8bpc_c:            143.8 ( 1.00x)
      inv_txfm_add_16x8_identity_dct_2_8bpc_lsx:           17.9 ( 8.06x)
      inv_txfm_add_16x8_identity_flipadst_0_8bpc_c:       120.7 ( 1.00x)
      inv_txfm_add_16x8_identity_flipadst_0_8bpc_lsx:      21.3 ( 5.66x)
      inv_txfm_add_16x8_identity_flipadst_1_8bpc_c:       120.4 ( 1.00x)
      inv_txfm_add_16x8_identity_flipadst_1_8bpc_lsx:      21.3 ( 5.65x)
      inv_txfm_add_16x8_identity_flipadst_2_8bpc_c:       144.9 ( 1.00x)
      inv_txfm_add_16x8_identity_flipadst_2_8bpc_lsx:      21.3 ( 6.80x)
      inv_txfm_add_16x8_identity_identity_0_8bpc_c:        70.2 ( 1.00x)
      inv_txfm_add_16x8_identity_identity_0_8bpc_lsx:       9.5 ( 7.38x)
      inv_txfm_add_16x8_identity_identity_1_8bpc_c:        95.6 ( 1.00x)
      inv_txfm_add_16x8_identity_identity_1_8bpc_lsx:       9.5 (10.06x)
      inv_txfm_add_16x8_identity_identity_2_8bpc_c:        95.6 ( 1.00x)
      inv_txfm_add_16x8_identity_identity_2_8bpc_lsx:       9.5 (10.06x)
      
      Change-Id: If1e274cab0e8441297a1eb44bd86be580f4c8f62
    • loongarch: opt inv_txfm_add_adst_dct/dct_dct/identity_identity_16x4_8bpc_lsx · 843f00e5
      Hecai Yuan authored and Hecai Yuan committed
      Relative speedup over C code:
      
      inv_txfm_add_16x4_adst_dct_0_8bpc_c:                 61.7 ( 1.00x)
      inv_txfm_add_16x4_adst_dct_0_8bpc_lsx:               17.8 ( 3.46x)
      inv_txfm_add_16x4_adst_dct_1_8bpc_c:                 96.2 ( 1.00x)
      inv_txfm_add_16x4_adst_dct_1_8bpc_lsx:               17.8 ( 5.39x)
      inv_txfm_add_16x4_adst_dct_2_8bpc_c:                 96.2 ( 1.00x)
      inv_txfm_add_16x4_adst_dct_2_8bpc_lsx:               17.8 ( 5.39x)
      inv_txfm_add_16x4_dct_dct_0_8bpc_c:                  10.8 ( 1.00x)
      inv_txfm_add_16x4_dct_dct_0_8bpc_lsx:                 0.9 (12.23x)
      inv_txfm_add_16x4_dct_dct_1_8bpc_c:                  94.5 ( 1.00x)
      inv_txfm_add_16x4_dct_dct_1_8bpc_lsx:                13.6 ( 6.94x)
      inv_txfm_add_16x4_dct_dct_2_8bpc_c:                  94.7 ( 1.00x)
      inv_txfm_add_16x4_dct_dct_2_8bpc_lsx:                13.6 ( 6.95x)
      inv_txfm_add_16x4_identity_identity_0_8bpc_c:        42.1 ( 1.00x)
      inv_txfm_add_16x4_identity_identity_0_8bpc_lsx:       5.1 ( 8.21x)
      inv_txfm_add_16x4_identity_identity_1_8bpc_c:        53.0 ( 1.00x)
      inv_txfm_add_16x4_identity_identity_1_8bpc_lsx:       5.1 (10.35x)
      inv_txfm_add_16x4_identity_identity_2_8bpc_c:        53.0 ( 1.00x)
      inv_txfm_add_16x4_identity_identity_2_8bpc_lsx:       5.1 (10.35x)
      
      Change-Id: I0be4f77e381da390e300070337fff404dcdcb862
    • Loongarch: Optimized cfl_pred_cfl, cfl_pred_cfl_128, cfl_pred_cfl_top and cfl_pred_cfl_left 8bpc functions by LSX · 083cf424
      pengxu authored and Hecai Yuan committed
      
      cfl_pred_cfl_128_w4_8bpc_c:         19.4 ( 1.00x)
      cfl_pred_cfl_128_w4_8bpc_lsx:        4.2 ( 4.63x)
      cfl_pred_cfl_128_w8_8bpc_c:         66.3 ( 1.00x)
      cfl_pred_cfl_128_w8_8bpc_lsx:        7.3 ( 9.11x)
      cfl_pred_cfl_128_w16_8bpc_c:       150.1 ( 1.00x)
      cfl_pred_cfl_128_w16_8bpc_lsx:      14.4 (10.45x)
      cfl_pred_cfl_128_w32_8bpc_c:       403.6 ( 1.00x)
      cfl_pred_cfl_128_w32_8bpc_lsx:      34.7 (11.65x)
      cfl_pred_cfl_left_w4_8bpc_c:        20.5 ( 1.00x)
      cfl_pred_cfl_left_w4_8bpc_lsx:       4.4 ( 4.63x)
      cfl_pred_cfl_left_w8_8bpc_c:        67.9 ( 1.00x)
      cfl_pred_cfl_left_w8_8bpc_lsx:       7.6 ( 8.94x)
      cfl_pred_cfl_left_w16_8bpc_c:      152.0 ( 1.00x)
      cfl_pred_cfl_left_w16_8bpc_lsx:     14.6 (10.38x)
      cfl_pred_cfl_left_w32_8bpc_c:      405.8 ( 1.00x)
      cfl_pred_cfl_left_w32_8bpc_lsx:     35.0 (11.58x)
      cfl_pred_cfl_top_w4_8bpc_c:         20.0 ( 1.00x)
      cfl_pred_cfl_top_w4_8bpc_lsx:        4.4 ( 4.51x)
      cfl_pred_cfl_top_w8_8bpc_c:         67.6 ( 1.00x)
      cfl_pred_cfl_top_w8_8bpc_lsx:        7.5 ( 8.99x)
      cfl_pred_cfl_top_w16_8bpc_c:       152.5 ( 1.00x)
      cfl_pred_cfl_top_w16_8bpc_lsx:      14.6 (10.41x)
      cfl_pred_cfl_top_w32_8bpc_c:       408.0 ( 1.00x)
      cfl_pred_cfl_top_w32_8bpc_lsx:      35.2 (11.58x)
      cfl_pred_cfl_w4_8bpc_c:             21.1 ( 1.00x)
      cfl_pred_cfl_w4_8bpc_lsx:            4.8 ( 4.43x)
      cfl_pred_cfl_w8_8bpc_c:             68.6 ( 1.00x)
      cfl_pred_cfl_w8_8bpc_lsx:            7.9 ( 8.73x)
      cfl_pred_cfl_w16_8bpc_c:           154.4 ( 1.00x)
      cfl_pred_cfl_w16_8bpc_lsx:          15.0 (10.29x)
      cfl_pred_cfl_w32_8bpc_c:           410.3 ( 1.00x)
      cfl_pred_cfl_w32_8bpc_lsx:          35.6 (11.54x)
      
      Change-Id: I4ec7cc71483298d28379bfbd824e97a0d74d0c23
    • Loongarch: Optimized pal_pred 8bpc functions by LSX · 3f6c845d
      pengxu authored and Hecai Yuan committed
      pal_pred_w4_8bpc_c:         3.0 ( 1.00x)
      pal_pred_w4_8bpc_lsx:       0.6 ( 5.46x)
      pal_pred_w8_8bpc_c:         8.8 ( 1.00x)
      pal_pred_w8_8bpc_lsx:       0.9 ( 9.49x)
      pal_pred_w16_8bpc_c:       26.0 ( 1.00x)
      pal_pred_w16_8bpc_lsx:      1.9 (13.70x)
      pal_pred_w32_8bpc_c:       60.6 ( 1.00x)
      pal_pred_w32_8bpc_lsx:      4.0 (15.10x)
      pal_pred_w64_8bpc_c:      146.9 ( 1.00x)
      pal_pred_w64_8bpc_lsx:      9.2 (15.97x)
      
      Change-Id: I5414f096a23b09c3a512e727b93fa22104d141f9
    • loongarch: Add prep_8tap_8bpc_lsx · b26f315d
      jinbo authored and Hecai Yuan committed
      mct_8tap_regular_w4_0_8bpc_c:                        3.7 ( 1.00x)
      mct_8tap_regular_w4_0_8bpc_lsx:                      0.9 ( 4.21x)
      mct_8tap_regular_w4_h_8bpc_c:                       15.7 ( 1.00x)
      mct_8tap_regular_w4_h_8bpc_lsx:                      1.7 ( 9.24x)
      mct_8tap_regular_w4_hv_8bpc_c:                      44.1 ( 1.00x)
      mct_8tap_regular_w4_hv_8bpc_lsx:                     6.3 ( 6.96x)
      mct_8tap_regular_w4_v_8bpc_c:                       19.8 ( 1.00x)
      mct_8tap_regular_w4_v_8bpc_lsx:                      2.4 ( 8.21x)
      mct_8tap_regular_w8_0_8bpc_c:                       10.5 ( 1.00x)
      mct_8tap_regular_w8_0_8bpc_lsx:                      1.3 ( 8.27x)
      mct_8tap_regular_w8_h_8bpc_c:                       47.2 ( 1.00x)
      mct_8tap_regular_w8_h_8bpc_lsx:                      6.2 ( 7.61x)
      mct_8tap_regular_w8_hv_8bpc_c:                     119.5 ( 1.00x)
      mct_8tap_regular_w8_hv_8bpc_lsx:                    18.9 ( 6.32x)
      mct_8tap_regular_w8_v_8bpc_c:                       60.5 ( 1.00x)
      mct_8tap_regular_w8_v_8bpc_lsx:                      5.4 (11.12x)
      mct_8tap_regular_w16_0_8bpc_c:                      28.8 ( 1.00x)
      mct_8tap_regular_w16_0_8bpc_lsx:                     2.8 (10.32x)
      mct_8tap_regular_w16_h_8bpc_c:                     151.9 ( 1.00x)
      mct_8tap_regular_w16_h_8bpc_lsx:                    19.8 ( 7.67x)
      mct_8tap_regular_w16_hv_8bpc_c:                    357.5 ( 1.00x)
      mct_8tap_regular_w16_hv_8bpc_lsx:                   57.6 ( 6.21x)
      mct_8tap_regular_w16_v_8bpc_c:                     195.6 ( 1.00x)
      mct_8tap_regular_w16_v_8bpc_lsx:                    16.9 (11.61x)
      mct_8tap_regular_w32_0_8bpc_c:                     104.6 ( 1.00x)
      mct_8tap_regular_w32_0_8bpc_lsx:                    11.6 ( 9.03x)
      mct_8tap_regular_w32_h_8bpc_c:                     596.3 ( 1.00x)
      mct_8tap_regular_w32_h_8bpc_lsx:                    77.8 ( 7.67x)
      mct_8tap_regular_w32_hv_8bpc_c:                   1329.0 ( 1.00x)
      mct_8tap_regular_w32_hv_8bpc_lsx:                  217.9 ( 6.10x)
      mct_8tap_regular_w32_v_8bpc_c:                     771.0 ( 1.00x)
      mct_8tap_regular_w32_v_8bpc_lsx:                    65.7 (11.73x)
      mct_8tap_regular_w64_0_8bpc_c:                     242.0 ( 1.00x)
      mct_8tap_regular_w64_0_8bpc_lsx:                    27.0 ( 8.95x)
      mct_8tap_regular_w64_h_8bpc_c:                    1455.9 ( 1.00x)
      mct_8tap_regular_w64_h_8bpc_lsx:                   186.9 ( 7.79x)
      mct_8tap_regular_w64_hv_8bpc_c:                   3221.7 ( 1.00x)
      mct_8tap_regular_w64_hv_8bpc_lsx:                  521.8 ( 6.17x)
      mct_8tap_regular_w64_v_8bpc_c:                    1836.1 ( 1.00x)
      mct_8tap_regular_w64_v_8bpc_lsx:                   158.2 (11.61x)
      mct_8tap_regular_w128_0_8bpc_c:                    629.0 ( 1.00x)
      mct_8tap_regular_w128_0_8bpc_lsx:                   66.3 ( 9.49x)
      mct_8tap_regular_w128_h_8bpc_c:                   3617.5 ( 1.00x)
      mct_8tap_regular_w128_h_8bpc_lsx:                  463.6 ( 7.80x)
      mct_8tap_regular_w128_hv_8bpc_c:                  7881.7 ( 1.00x)
      mct_8tap_regular_w128_hv_8bpc_lsx:                1290.3 ( 6.11x)
      mct_8tap_regular_w128_v_8bpc_c:                   4552.9 ( 1.00x)
      mct_8tap_regular_w128_v_8bpc_lsx:                  391.1 (11.64x)
      
      Change-Id: I8c6046e4bd6c1fb19d5712234abece0355fb77fa
    • Loongarch: Optimized blend_h_c function by LSX/LASX · ce45ebde
      zhoupeng authored and Hecai Yuan committed
      blend_h_w2_8bpc_c:                                   3.8 ( 1.00x)
      blend_h_w2_8bpc_lsx:                                 1.9 ( 1.98x)
      blend_h_w2_8bpc_lasx:                                1.9 ( 1.98x)
      blend_h_w4_8bpc_c:                                   6.4 ( 1.00x)
      blend_h_w4_8bpc_lsx:                                 1.8 ( 3.49x)
      blend_h_w4_8bpc_lasx:                                1.8 ( 3.49x)
      blend_h_w8_8bpc_c:                                  11.6 ( 1.00x)
      blend_h_w8_8bpc_lsx:                                 1.8 ( 6.45x)
      blend_h_w8_8bpc_lasx:                                1.8 ( 6.48x)
      blend_h_w16_8bpc_c:                                 21.5 ( 1.00x)
      blend_h_w16_8bpc_lsx:                                2.1 (10.47x)
      blend_h_w16_8bpc_lasx:                               2.1 (10.48x)
      blend_h_w32_8bpc_c:                                 41.9 ( 1.00x)
      blend_h_w32_8bpc_lsx:                                3.8 (11.08x)
      blend_h_w32_8bpc_lasx:                               3.9 (10.67x)
      blend_h_w64_8bpc_c:                                 82.0 ( 1.00x)
      blend_h_w64_8bpc_lsx:                                6.9 (11.89x)
      blend_h_w64_8bpc_lasx:                               4.6 (17.93x)
      blend_h_w128_8bpc_c:                               202.3 ( 1.00x)
      blend_h_w128_8bpc_lsx:                              16.4 (12.30x)
      blend_h_w128_8bpc_lasx:                             11.4 (17.77x)
      
      Change-Id: I6d6599ccbaba8a62a629c4a52254b2369dba60f6
    • Loongarch: Optimized blend_c/blend_v_c functions by LSX · 5319278d
      zhoupeng authored and Hecai Yuan committed
      blend_v_w2_8bpc_c:                                   5.7 ( 1.00x)
      blend_v_w2_8bpc_lsx:                                 3.6 ( 1.60x)
      blend_v_w4_8bpc_c:                                  22.8 ( 1.00x)
      blend_v_w4_8bpc_lsx:                                 7.1 ( 3.20x)
      blend_v_w8_8bpc_c:                                  40.2 ( 1.00x)
      blend_v_w8_8bpc_lsx:                                 7.1 ( 5.63x)
      blend_v_w16_8bpc_c:                                 74.6 ( 1.00x)
      blend_v_w16_8bpc_lsx:                                8.1 ( 9.26x)
      blend_v_w32_8bpc_c:                                144.0 ( 1.00x)
      blend_v_w32_8bpc_lsx:                               13.3 (10.83x)
      blend_w4_8bpc_c:                                     4.9 ( 1.00x)
      blend_w4_8bpc_lsx:                                   1.9 ( 2.49x)
      blend_w8_8bpc_c:                                    14.1 ( 1.00x)
      blend_w8_8bpc_lsx:                                   3.2 ( 4.37x)
      blend_w16_8bpc_c:                                   51.5 ( 1.00x)
      blend_w16_8bpc_lsx:                                  7.9 ( 6.51x)
      blend_w32_8bpc_c:                                  127.5 ( 1.00x)
      blend_w32_8bpc_lsx:                                 19.6 ( 6.52x)
      
      Change-Id: I95e2dbc1f0735688f5473687f1a7e8d37ffbe417
    • Loongarch: Optimized ipred_smooth, ipred_smooth_h and ipred_smooth_v 8bpc functions by LSX · 0b9c756f
      pengxu authored and Hecai Yuan committed
      intra_pred_smooth_h_w4_8bpc_c:         7.3 ( 1.00x)
      intra_pred_smooth_h_w4_8bpc_lsx:       3.1 ( 2.36x)
      intra_pred_smooth_h_w8_8bpc_c:        21.3 ( 1.00x)
      intra_pred_smooth_h_w8_8bpc_lsx:       4.5 ( 4.71x)
      intra_pred_smooth_h_w16_8bpc_c:       66.3 ( 1.00x)
      intra_pred_smooth_h_w16_8bpc_lsx:     13.4 ( 4.96x)
      intra_pred_smooth_h_w32_8bpc_c:      160.0 ( 1.00x)
      intra_pred_smooth_h_w32_8bpc_lsx:     29.3 ( 5.46x)
      intra_pred_smooth_h_w64_8bpc_c:      400.2 ( 1.00x)
      intra_pred_smooth_h_w64_8bpc_lsx:     68.3 ( 5.86x)
      intra_pred_smooth_v_w4_8bpc_c:         6.6 ( 1.00x)
      intra_pred_smooth_v_w4_8bpc_lsx:       3.1 ( 2.10x)
      intra_pred_smooth_v_w8_8bpc_c:        19.3 ( 1.00x)
      intra_pred_smooth_v_w8_8bpc_lsx:       4.9 ( 3.95x)
      intra_pred_smooth_v_w16_8bpc_c:       58.6 ( 1.00x)
      intra_pred_smooth_v_w16_8bpc_lsx:     24.0 ( 2.44x)
      intra_pred_smooth_v_w32_8bpc_c:      139.4 ( 1.00x)
      intra_pred_smooth_v_w32_8bpc_lsx:     27.0 ( 5.17x)
      intra_pred_smooth_v_w64_8bpc_c:      344.8 ( 1.00x)
      intra_pred_smooth_v_w64_8bpc_lsx:     70.8 ( 4.87x)
      intra_pred_smooth_w4_8bpc_c:          10.2 ( 1.00x)
      intra_pred_smooth_w4_8bpc_lsx:         7.9 ( 1.30x)
      intra_pred_smooth_w8_8bpc_c:          30.3 ( 1.00x)
      intra_pred_smooth_w8_8bpc_lsx:        20.0 ( 1.51x)
      intra_pred_smooth_w16_8bpc_c:         96.3 ( 1.00x)
      intra_pred_smooth_w16_8bpc_lsx:       58.3 ( 1.65x)
      intra_pred_smooth_w32_8bpc_c:        231.1 ( 1.00x)
      intra_pred_smooth_w32_8bpc_lsx:      134.3 ( 1.72x)
      intra_pred_smooth_w64_8bpc_c:        571.5 ( 1.00x)
      intra_pred_smooth_w64_8bpc_lsx:      326.5 ( 1.75x)
      
      Change-Id: I22b6c2dcf27c5393bba374b4fbe8879c0463f828
      0b9c756f
    • Loongarch: Optimized ipred_paeth 8bpc function by LSX · 7463c2af
      pengxu authored and Hecai Yuan committed
      intra_pred_paeth_w4_8bpc_c:          12.3 ( 1.00x)
      intra_pred_paeth_w4_8bpc_lsx:         3.9 ( 3.12x)
      intra_pred_paeth_w8_8bpc_c:          39.7 ( 1.00x)
      intra_pred_paeth_w8_8bpc_lsx:         6.4 ( 6.20x)
      intra_pred_paeth_w16_8bpc_c:        133.6 ( 1.00x)
      intra_pred_paeth_w16_8bpc_lsx:       17.0 ( 7.85x)
      intra_pred_paeth_w32_8bpc_c:        342.8 ( 1.00x)
      intra_pred_paeth_w32_8bpc_lsx:       52.7 ( 6.50x)
      intra_pred_paeth_w64_8bpc_c:        903.8 ( 1.00x)
      intra_pred_paeth_w64_8bpc_lsx:      107.3 ( 8.42x)
      
      Change-Id: I457bdb24fdd6b5400ec030bffbdd40c79d8165c1
      7463c2af
    • Loongarch: Optimized ipred_h and ipred_v 8bpc functions by LSX · 3e9d80d8
      pengxu authored and Hecai Yuan committed
      intra_pred_h_w4_8bpc_c:               4.3 ( 1.00x)
      intra_pred_h_w4_8bpc_lsx:             3.5 ( 1.21x)
      intra_pred_h_w8_8bpc_c:               5.7 ( 1.00x)
      intra_pred_h_w8_8bpc_lsx:             5.1 ( 1.11x)
      intra_pred_h_w16_8bpc_c:             13.2 ( 1.00x)
      intra_pred_h_w16_8bpc_lsx:            7.1 ( 1.86x)
      intra_pred_h_w32_8bpc_c:             12.4 ( 1.00x)
      intra_pred_h_w32_8bpc_lsx:            6.3 ( 1.96x)
      intra_pred_h_w64_8bpc_c:             25.9 ( 1.00x)
      intra_pred_h_w64_8bpc_lsx:            5.8 ( 4.44x)
      intra_pred_v_w4_8bpc_c:               4.6 ( 1.00x)
      intra_pred_v_w4_8bpc_lsx:             2.5 ( 1.85x)
      intra_pred_v_w8_8bpc_c:               6.9 ( 1.00x)
      intra_pred_v_w8_8bpc_lsx:             4.5 ( 1.53x)
      intra_pred_v_w16_8bpc_c:             13.3 ( 1.00x)
      intra_pred_v_w16_8bpc_lsx:            5.2 ( 2.56x)
      intra_pred_v_w32_8bpc_c:             16.1 ( 1.00x)
      intra_pred_v_w32_8bpc_lsx:            5.1 ( 3.13x)
      intra_pred_v_w64_8bpc_c:             21.7 ( 1.00x)
      intra_pred_v_w64_8bpc_lsx:            7.7 ( 2.80x)
      
      Change-Id: I51b3dd13877315b9c1c64590c19f1ad38bfc4bdf
      3e9d80d8
    • Loongarch: Optimized ipred_dc, ipred_dc_128, ipred_dc_left and ipred_dc_top 8bpc functions by LSX · 2a9cbcc2
      pengxu authored and Hecai Yuan committed
      intra_pred_dc_w4_8bpc_c:              2.1 ( 1.00x)
      intra_pred_dc_w4_8bpc_lsx:            1.3 ( 1.54x)
      intra_pred_dc_w8_8bpc_c:              3.6 ( 1.00x)
      intra_pred_dc_w8_8bpc_lsx:            3.7 ( 0.97x)
      intra_pred_dc_w16_8bpc_c:             6.9 ( 1.00x)
      intra_pred_dc_w16_8bpc_lsx:           7.8 ( 0.88x)
      intra_pred_dc_w32_8bpc_c:            14.1 ( 1.00x)
      intra_pred_dc_w32_8bpc_lsx:           7.1 ( 1.97x)
      intra_pred_dc_w64_8bpc_c:            25.3 ( 1.00x)
      intra_pred_dc_w64_8bpc_lsx:           7.4 ( 3.41x)
      intra_pred_dc_128_w4_8bpc_c:          0.6 ( 1.00x)
      intra_pred_dc_128_w4_8bpc_lsx:        0.8 ( 0.76x)
      intra_pred_dc_128_w8_8bpc_c:          1.4 ( 1.00x)
      intra_pred_dc_128_w8_8bpc_lsx:        3.2 ( 0.45x)
      intra_pred_dc_128_w16_8bpc_c:         3.4 ( 1.00x)
      intra_pred_dc_128_w16_8bpc_lsx:       7.3 ( 0.47x)
      intra_pred_dc_128_w32_8bpc_c:         8.8 ( 1.00x)
      intra_pred_dc_128_w32_8bpc_lsx:       6.4 ( 1.38x)
      intra_pred_dc_128_w64_8bpc_c:        17.0 ( 1.00x)
      intra_pred_dc_128_w64_8bpc_lsx:       6.2 ( 2.74x)
      intra_pred_dc_left_w4_8bpc_c:         1.1 ( 1.00x)
      intra_pred_dc_left_w4_8bpc_lsx:       1.1 ( 1.00x)
      intra_pred_dc_left_w8_8bpc_c:         2.1 ( 1.00x)
      intra_pred_dc_left_w8_8bpc_lsx:       3.4 ( 0.64x)
      intra_pred_dc_left_w16_8bpc_c:        4.6 ( 1.00x)
      intra_pred_dc_left_w16_8bpc_lsx:      7.5 ( 0.62x)
      intra_pred_dc_left_w32_8bpc_c:       10.3 ( 1.00x)
      intra_pred_dc_left_w32_8bpc_lsx:      7.8 ( 1.32x)
      intra_pred_dc_left_w64_8bpc_c:       18.7 ( 1.00x)
      intra_pred_dc_left_w64_8bpc_lsx:      6.6 ( 2.83x)
      intra_pred_dc_top_w4_8bpc_c:          0.9 ( 1.00x)
      intra_pred_dc_top_w4_8bpc_lsx:        0.8 ( 1.10x)
      intra_pred_dc_top_w8_8bpc_c:          1.9 ( 1.00x)
      intra_pred_dc_top_w8_8bpc_lsx:        2.8 ( 0.67x)
      intra_pred_dc_top_w16_8bpc_c:         4.2 ( 1.00x)
      intra_pred_dc_top_w16_8bpc_lsx:       5.5 ( 0.77x)
      intra_pred_dc_top_w32_8bpc_c:        10.4 ( 1.00x)
      intra_pred_dc_top_w32_8bpc_lsx:       6.7 ( 1.54x)
      intra_pred_dc_top_w64_8bpc_c:        19.9 ( 1.00x)
      intra_pred_dc_top_w64_8bpc_lsx:       6.9 ( 2.87x)
      
      Change-Id: Ib5349e2430302da0424a474ce0fedc457439c761
      2a9cbcc2
    • Loongarch: Optimized cdef_filter_block 4x4, 4x8, 8x8 8bpc functions by LSX · 62c47f35
      pengxu authored and Hecai Yuan committed
      cdef_filter_4x4_01_8bpc_c:      420.8 ( 1.00x)
      cdef_filter_4x4_01_8bpc_lsx:    117.2 ( 3.59x)
      cdef_filter_4x4_10_8bpc_c:      265.8 ( 1.00x)
      cdef_filter_4x4_10_8bpc_lsx:     98.9 ( 2.69x)
      cdef_filter_4x4_11_8bpc_c:     1036.2 ( 1.00x)
      cdef_filter_4x4_11_8bpc_lsx:    169.6 ( 6.11x)
      cdef_filter_4x8_01_8bpc_c:      802.6 ( 1.00x)
      cdef_filter_4x8_01_8bpc_lsx:    206.1 ( 3.89x)
      cdef_filter_4x8_10_8bpc_c:      489.1 ( 1.00x)
      cdef_filter_4x8_10_8bpc_lsx:    167.4 ( 2.92x)
      cdef_filter_4x8_11_8bpc_c:     2028.9 ( 1.00x)
      cdef_filter_4x8_11_8bpc_lsx:    309.4 ( 6.56x)
      cdef_filter_8x8_01_8bpc_c:     1562.2 ( 1.00x)
      cdef_filter_8x8_01_8bpc_lsx:    295.3 ( 5.29x)
      cdef_filter_8x8_10_8bpc_c:      949.4 ( 1.00x)
      cdef_filter_8x8_10_8bpc_lsx:    207.6 ( 4.57x)
      cdef_filter_8x8_11_8bpc_c:     4009.6 ( 1.00x)
      cdef_filter_8x8_11_8bpc_lsx:    466.8 ( 8.59x)
      
      Change-Id: I8cd43426a27055e18c44a7701fa50f8835c712be
      62c47f35
    • Refine mc_put_8tap · fa7b72d0
      jinbo authored and Hecai Yuan committed
      Performance speedup over lsx is around 68%~156%.
      
      Change-Id: I0b39cd0e05e3cbd84fded121d29a91ea2a620f03
      fa7b72d0
    • msac: Add msac_decode_bool_equi_lsx and msac_decode_hi_tok_lsx · 02309b9f
      guxiwei authored and Hecai Yuan committed
      The performance data is as follows:
      msac_decode_bool_equi_c:             0.4 ( 1.00x)
      msac_decode_bool_equi_lsx:           0.3 ( 1.07x)
      msac_decode_hi_tok_c:                1.8 ( 1.00x)
      msac_decode_hi_tok_lsx:              1.4 ( 1.27x)
      
      Change-Id: Ic2f2678cf699bb22c579424af71ae2603e228482
      02309b9f
    • Loongarch: Optimized cdef_find_dir_8bpc function by LSX · 2154425f
      pengxu authored and Hecai Yuan committed
      cdef_dir_8bpc_c:                 28.8 ( 1.00x)
      cdef_dir_8bpc_lsx:               19.1 ( 1.51x)
      
      Change-Id: Ic7c1f32c5b1733b011f4c448cffc93f745b564f5
      2154425f
    • loongarch: opt inv_txfm_add_identity_identity_8x32_8bpc_lsx · f6ffdc90
      Hecai Yuan authored and Hecai Yuan committed
      Relative speedup over C code:
      
      inv_txfm_add_8x32_identity_identity_0_8bpc_c:       126.1 ( 1.00x)
      inv_txfm_add_8x32_identity_identity_0_8bpc_lsx:       1.6 (78.59x)
      inv_txfm_add_8x32_identity_identity_1_8bpc_c:       136.9 ( 1.00x)
      inv_txfm_add_8x32_identity_identity_1_8bpc_lsx:       1.6 (85.31x)
      inv_txfm_add_8x32_identity_identity_2_8bpc_c:       148.0 ( 1.00x)
      inv_txfm_add_8x32_identity_identity_2_8bpc_lsx:       3.3 (45.47x)
      inv_txfm_add_8x32_identity_identity_3_8bpc_c:       159.4 ( 1.00x)
      inv_txfm_add_8x32_identity_identity_3_8bpc_lsx:       4.9 (32.78x)
      inv_txfm_add_8x32_identity_identity_4_8bpc_c:       170.2 ( 1.00x)
      inv_txfm_add_8x32_identity_identity_4_8bpc_lsx:       6.5 (26.17x)
      
      Change-Id: Iabda6efcd8a17d26a205f90757dfea85af48848f
      f6ffdc90
    • loongarch: Minor improvement on identity4*, identity8* and dct32* · 5de878a4
      Hecai Yuan authored and Hecai Yuan committed
      1. Remove the identity8 code in the 4x8/8x8/8x16 series
      2. Modify the dct_dct_8x32/32x32/64x64 functions
      3. Modify the identity4 code in the 4x4/4x8/8x4 series
      
      After these modifications, function performance improves by 20%.
      
      Change-Id: I1bc2e0fb25e508faf9fc220333460a99be3f5e49
      5de878a4
    • loongarch: add lsx implementation of itx_8bpc.add_8x16 series functions for 8 bpc · 2fc65660
      Hecai Yuan authored and Hecai Yuan committed
      Relative speedup over C code:
      
      inv_txfm_add_8x16_adst_adst_0_8bpc_c: 208.1
      inv_txfm_add_8x16_adst_adst_0_8bpc_lsx: 31.3
      inv_txfm_add_8x16_adst_adst_1_8bpc_c: 208.4
      inv_txfm_add_8x16_adst_adst_1_8bpc_lsx: 31.3
      inv_txfm_add_8x16_adst_adst_2_8bpc_c: 208.1
      inv_txfm_add_8x16_adst_adst_2_8bpc_lsx: 31.3
      inv_txfm_add_8x16_adst_dct_0_8bpc_c: 204.0
      inv_txfm_add_8x16_adst_dct_0_8bpc_lsx: 27.2
      inv_txfm_add_8x16_adst_dct_1_8bpc_c: 204.0
      inv_txfm_add_8x16_adst_dct_1_8bpc_lsx: 27.2
      inv_txfm_add_8x16_adst_dct_2_8bpc_c: 204.0
      inv_txfm_add_8x16_adst_dct_2_8bpc_lsx: 27.2
      inv_txfm_add_8x16_adst_flipadst_0_8bpc_c: 207.9
      inv_txfm_add_8x16_adst_flipadst_0_8bpc_lsx: 31.3
      inv_txfm_add_8x16_adst_flipadst_1_8bpc_c: 208.3
      inv_txfm_add_8x16_adst_flipadst_1_8bpc_lsx: 31.3
      inv_txfm_add_8x16_adst_flipadst_2_8bpc_c: 208.6
      inv_txfm_add_8x16_adst_flipadst_2_8bpc_lsx: 31.3
      inv_txfm_add_8x16_adst_identity_0_8bpc_c: 146.6
      inv_txfm_add_8x16_adst_identity_0_8bpc_lsx: 21.8
      inv_txfm_add_8x16_adst_identity_1_8bpc_c: 146.6
      inv_txfm_add_8x16_adst_identity_1_8bpc_lsx: 21.8
      inv_txfm_add_8x16_adst_identity_2_8bpc_c: 146.6
      inv_txfm_add_8x16_adst_identity_2_8bpc_lsx: 21.8
      inv_txfm_add_8x16_dct_adst_0_8bpc_c: 204.8
      inv_txfm_add_8x16_dct_adst_0_8bpc_lsx: 26.2
      inv_txfm_add_8x16_dct_adst_1_8bpc_c: 204.8
      inv_txfm_add_8x16_dct_adst_1_8bpc_lsx: 26.1
      inv_txfm_add_8x16_dct_adst_2_8bpc_c: 204.8
      inv_txfm_add_8x16_dct_adst_2_8bpc_lsx: 26.2
      inv_txfm_add_8x16_dct_dct_0_8bpc_c: 23.1
      inv_txfm_add_8x16_dct_dct_0_8bpc_lsx: 2.3
      inv_txfm_add_8x16_dct_dct_1_8bpc_c: 200.8
      inv_txfm_add_8x16_dct_dct_1_8bpc_lsx: 21.9
      inv_txfm_add_8x16_dct_dct_2_8bpc_c: 200.7
      inv_txfm_add_8x16_dct_dct_2_8bpc_lsx: 21.9
      inv_txfm_add_8x16_dct_flipadst_0_8bpc_c: 204.6
      inv_txfm_add_8x16_dct_flipadst_0_8bpc_lsx: 26.3
      inv_txfm_add_8x16_dct_flipadst_1_8bpc_c: 204.6
      inv_txfm_add_8x16_dct_flipadst_1_8bpc_lsx: 26.3
      inv_txfm_add_8x16_dct_flipadst_2_8bpc_c: 204.6
      inv_txfm_add_8x16_dct_flipadst_2_8bpc_lsx: 26.3
      inv_txfm_add_8x16_dct_identity_0_8bpc_c: 143.2
      inv_txfm_add_8x16_dct_identity_0_8bpc_lsx: 16.7
      inv_txfm_add_8x16_dct_identity_1_8bpc_c: 142.9
      inv_txfm_add_8x16_dct_identity_1_8bpc_lsx: 16.7
      inv_txfm_add_8x16_dct_identity_2_8bpc_c: 143.5
      inv_txfm_add_8x16_dct_identity_2_8bpc_lsx: 16.7
      inv_txfm_add_8x16_flipadst_adst_0_8bpc_c: 206.5
      inv_txfm_add_8x16_flipadst_adst_0_8bpc_lsx: 31.3
      inv_txfm_add_8x16_flipadst_adst_1_8bpc_c: 206.5
      inv_txfm_add_8x16_flipadst_adst_1_8bpc_lsx: 31.3
      inv_txfm_add_8x16_flipadst_adst_2_8bpc_c: 206.5
      inv_txfm_add_8x16_flipadst_adst_2_8bpc_lsx: 31.3
      inv_txfm_add_8x16_flipadst_dct_0_8bpc_c: 202.5
      inv_txfm_add_8x16_flipadst_dct_0_8bpc_lsx: 26.8
      inv_txfm_add_8x16_flipadst_dct_1_8bpc_c: 202.3
      inv_txfm_add_8x16_flipadst_dct_1_8bpc_lsx: 26.8
      inv_txfm_add_8x16_flipadst_dct_2_8bpc_c: 202.3
      inv_txfm_add_8x16_flipadst_dct_2_8bpc_lsx: 26.8
      inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_c: 206.3
      inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_lsx: 31.3
      inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_c: 206.3
      inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_lsx: 31.3
      inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_c: 206.3
      inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_lsx: 31.3
      inv_txfm_add_8x16_identity_adst_0_8bpc_c: 160.7
      inv_txfm_add_8x16_identity_adst_0_8bpc_lsx: 21.8
      inv_txfm_add_8x16_identity_adst_1_8bpc_c: 160.4
      inv_txfm_add_8x16_identity_adst_1_8bpc_lsx: 21.8
      inv_txfm_add_8x16_identity_adst_2_8bpc_c: 160.1
      inv_txfm_add_8x16_identity_adst_2_8bpc_lsx: 21.8
      inv_txfm_add_8x16_identity_dct_0_8bpc_c: 157.9
      inv_txfm_add_8x16_identity_dct_0_8bpc_lsx: 17.7
      inv_txfm_add_8x16_identity_dct_1_8bpc_c: 156.5
      inv_txfm_add_8x16_identity_dct_1_8bpc_lsx: 17.7
      inv_txfm_add_8x16_identity_dct_2_8bpc_c: 156.8
      inv_txfm_add_8x16_identity_dct_2_8bpc_lsx: 17.7
      inv_txfm_add_8x16_identity_flipadst_0_8bpc_c: 159.9
      inv_txfm_add_8x16_identity_flipadst_0_8bpc_lsx: 21.8
      inv_txfm_add_8x16_identity_flipadst_1_8bpc_c: 159.9
      inv_txfm_add_8x16_identity_flipadst_1_8bpc_lsx: 21.8
      inv_txfm_add_8x16_identity_flipadst_2_8bpc_c: 160.0
      inv_txfm_add_8x16_identity_flipadst_2_8bpc_lsx: 21.8
      inv_txfm_add_8x16_identity_identity_0_8bpc_c: 98.3
      inv_txfm_add_8x16_identity_identity_0_8bpc_lsx: 12.3
      inv_txfm_add_8x16_identity_identity_1_8bpc_c: 98.0
      inv_txfm_add_8x16_identity_identity_1_8bpc_lsx: 12.3
      inv_txfm_add_8x16_identity_identity_2_8bpc_c: 98.1
      inv_txfm_add_8x16_identity_identity_2_8bpc_lsx: 12.3
      
      Change-Id: Ida8d71e4eff782b9f81e0ad426eaa078b68528cf
      2fc65660
    • loongarch: add lsx implementation of itx_8bpc.add_4x16 series functions for 8 bpc · 643ae52b
      Hecai Yuan authored and Hecai Yuan committed
      Relative speedup over C code:
      
      inv_txfm_add_4x16_adst_adst_0_8bpc_c: 91.1
      inv_txfm_add_4x16_adst_adst_0_8bpc_lsx: 18.2
      inv_txfm_add_4x16_adst_adst_1_8bpc_c: 91.1
      inv_txfm_add_4x16_adst_adst_1_8bpc_lsx: 18.2
      inv_txfm_add_4x16_adst_adst_2_8bpc_c: 91.1
      inv_txfm_add_4x16_adst_adst_2_8bpc_lsx: 18.2
      inv_txfm_add_4x16_adst_dct_0_8bpc_c: 89.5
      inv_txfm_add_4x16_adst_dct_0_8bpc_lsx: 14.3
      inv_txfm_add_4x16_adst_dct_1_8bpc_c: 89.5
      inv_txfm_add_4x16_adst_dct_1_8bpc_lsx: 14.3
      inv_txfm_add_4x16_adst_dct_2_8bpc_c: 89.5
      inv_txfm_add_4x16_adst_dct_2_8bpc_lsx: 14.3
      inv_txfm_add_4x16_adst_flipadst_0_8bpc_c: 91.8
      inv_txfm_add_4x16_adst_flipadst_0_8bpc_lsx: 18.2
      inv_txfm_add_4x16_adst_flipadst_1_8bpc_c: 91.7
      inv_txfm_add_4x16_adst_flipadst_1_8bpc_lsx: 18.2
      inv_txfm_add_4x16_adst_flipadst_2_8bpc_c: 91.8
      inv_txfm_add_4x16_adst_flipadst_2_8bpc_lsx: 18.2
      inv_txfm_add_4x16_adst_identity_0_8bpc_c: 60.5
      inv_txfm_add_4x16_adst_identity_0_8bpc_lsx: 6.3
      inv_txfm_add_4x16_adst_identity_1_8bpc_c: 60.5
      inv_txfm_add_4x16_adst_identity_1_8bpc_lsx: 6.3
      inv_txfm_add_4x16_adst_identity_2_8bpc_c: 60.5
      inv_txfm_add_4x16_adst_identity_2_8bpc_lsx: 6.3
      inv_txfm_add_4x16_dct_adst_0_8bpc_c: 92.7
      inv_txfm_add_4x16_dct_adst_0_8bpc_lsx: 18.4
      inv_txfm_add_4x16_dct_adst_1_8bpc_c: 92.7
      inv_txfm_add_4x16_dct_adst_1_8bpc_lsx: 18.4
      inv_txfm_add_4x16_dct_adst_2_8bpc_c: 92.7
      inv_txfm_add_4x16_dct_adst_2_8bpc_lsx: 18.4
      inv_txfm_add_4x16_dct_dct_0_8bpc_c: 13.7
      inv_txfm_add_4x16_dct_dct_0_8bpc_lsx: 1.9
      inv_txfm_add_4x16_dct_dct_1_8bpc_c: 90.6
      inv_txfm_add_4x16_dct_dct_1_8bpc_lsx: 14.5
      inv_txfm_add_4x16_dct_dct_2_8bpc_c: 90.6
      inv_txfm_add_4x16_dct_dct_2_8bpc_lsx: 14.5
      inv_txfm_add_4x16_dct_flipadst_0_8bpc_c: 93.3
      inv_txfm_add_4x16_dct_flipadst_0_8bpc_lsx: 18.6
      inv_txfm_add_4x16_dct_flipadst_1_8bpc_c: 93.4
      inv_txfm_add_4x16_dct_flipadst_1_8bpc_lsx: 18.6
      inv_txfm_add_4x16_dct_flipadst_2_8bpc_c: 93.4
      inv_txfm_add_4x16_dct_flipadst_2_8bpc_lsx: 18.6
      inv_txfm_add_4x16_dct_identity_0_8bpc_c: 62.1
      inv_txfm_add_4x16_dct_identity_0_8bpc_lsx: 6.5
      inv_txfm_add_4x16_dct_identity_1_8bpc_c: 62.1
      inv_txfm_add_4x16_dct_identity_1_8bpc_lsx: 6.5
      inv_txfm_add_4x16_dct_identity_2_8bpc_c: 62.1
      inv_txfm_add_4x16_dct_identity_2_8bpc_lsx: 6.5
      inv_txfm_add_4x16_flipadst_adst_0_8bpc_c: 92.2
      inv_txfm_add_4x16_flipadst_adst_0_8bpc_lsx: 18.1
      inv_txfm_add_4x16_flipadst_adst_1_8bpc_c: 92.3
      inv_txfm_add_4x16_flipadst_adst_1_8bpc_lsx: 18.1
      inv_txfm_add_4x16_flipadst_adst_2_8bpc_c: 92.2
      inv_txfm_add_4x16_flipadst_adst_2_8bpc_lsx: 18.1
      inv_txfm_add_4x16_flipadst_dct_0_8bpc_c: 90.6
      inv_txfm_add_4x16_flipadst_dct_0_8bpc_lsx: 14.3
      inv_txfm_add_4x16_flipadst_dct_1_8bpc_c: 90.6
      inv_txfm_add_4x16_flipadst_dct_1_8bpc_lsx: 14.3
      inv_txfm_add_4x16_flipadst_dct_2_8bpc_c: 90.6
      inv_txfm_add_4x16_flipadst_dct_2_8bpc_lsx: 14.3
      inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_c: 92.9
      inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_lsx: 18.2
      inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_c: 92.9
      inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_lsx: 18.2
      inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_c: 92.9
      inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_lsx: 18.2
      inv_txfm_add_4x16_flipadst_identity_0_8bpc_c: 61.8
      inv_txfm_add_4x16_flipadst_identity_0_8bpc_lsx: 6.3
      inv_txfm_add_4x16_flipadst_identity_1_8bpc_c: 61.8
      inv_txfm_add_4x16_flipadst_identity_1_8bpc_lsx: 6.3
      inv_txfm_add_4x16_flipadst_identity_2_8bpc_c: 61.8
      inv_txfm_add_4x16_flipadst_identity_2_8bpc_lsx: 6.3
      inv_txfm_add_4x16_identity_adst_0_8bpc_c: 83.1
      inv_txfm_add_4x16_identity_adst_0_8bpc_lsx: 17.8
      inv_txfm_add_4x16_identity_adst_1_8bpc_c: 83.0
      inv_txfm_add_4x16_identity_adst_1_8bpc_lsx: 17.8
      inv_txfm_add_4x16_identity_adst_2_8bpc_c: 83.0
      inv_txfm_add_4x16_identity_adst_2_8bpc_lsx: 17.8
      inv_txfm_add_4x16_identity_dct_0_8bpc_c: 81.4
      inv_txfm_add_4x16_identity_dct_0_8bpc_lsx: 13.9
      inv_txfm_add_4x16_identity_dct_1_8bpc_c: 81.4
      inv_txfm_add_4x16_identity_dct_1_8bpc_lsx: 13.9
      inv_txfm_add_4x16_identity_dct_2_8bpc_c: 81.4
      inv_txfm_add_4x16_identity_dct_2_8bpc_lsx: 13.9
      inv_txfm_add_4x16_identity_flipadst_0_8bpc_c: 84.1
      inv_txfm_add_4x16_identity_flipadst_0_8bpc_lsx: 17.8
      inv_txfm_add_4x16_identity_flipadst_1_8bpc_c: 84.0
      inv_txfm_add_4x16_identity_flipadst_1_8bpc_lsx: 17.8
      inv_txfm_add_4x16_identity_flipadst_2_8bpc_c: 83.9
      inv_txfm_add_4x16_identity_flipadst_2_8bpc_lsx: 17.8
      inv_txfm_add_4x16_identity_identity_0_8bpc_c: 52.4
      inv_txfm_add_4x16_identity_identity_0_8bpc_lsx: 5.5
      inv_txfm_add_4x16_identity_identity_1_8bpc_c: 52.4
      inv_txfm_add_4x16_identity_identity_1_8bpc_lsx: 5.5
      inv_txfm_add_4x16_identity_identity_2_8bpc_c: 52.4
      inv_txfm_add_4x16_identity_identity_2_8bpc_lsx: 5.5
      
      Change-Id: I36322071eeea45df9289f2b1d533ee937904aec2
      643ae52b
    • loongarch: add lsx implementation of itx_8bpc.add_4x8 series functions for 8 bpc · d60d93a5
      Hecai Yuan authored and Hecai Yuan committed
      Relative speedup over C code:
      
      inv_txfm_add_4x8_adst_adst_0_8bpc_c: 43.8
      inv_txfm_add_4x8_adst_adst_0_8bpc_lsx: 8.6
      inv_txfm_add_4x8_adst_adst_1_8bpc_c: 43.8
      inv_txfm_add_4x8_adst_adst_1_8bpc_lsx: 8.6
      inv_txfm_add_4x8_adst_dct_0_8bpc_c: 43.0
      inv_txfm_add_4x8_adst_dct_0_8bpc_lsx: 6.5
      inv_txfm_add_4x8_adst_dct_1_8bpc_c: 43.0
      inv_txfm_add_4x8_adst_dct_1_8bpc_lsx: 6.5
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_c: 44.1
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_lsx: 8.8
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_c: 44.1
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_lsx: 8.8
      inv_txfm_add_4x8_adst_identity_0_8bpc_c: 31.3
      inv_txfm_add_4x8_adst_identity_0_8bpc_lsx: 2.9
      inv_txfm_add_4x8_adst_identity_1_8bpc_c: 31.3
      inv_txfm_add_4x8_adst_identity_1_8bpc_lsx: 2.9
      inv_txfm_add_4x8_dct_adst_0_8bpc_c: 46.3
      inv_txfm_add_4x8_dct_adst_0_8bpc_lsx: 8.8
      inv_txfm_add_4x8_dct_adst_1_8bpc_c: 46.3
      inv_txfm_add_4x8_dct_adst_1_8bpc_lsx: 8.8
      inv_txfm_add_4x8_dct_dct_0_8bpc_c: 7.3
      inv_txfm_add_4x8_dct_dct_0_8bpc_lsx: 1.5
      inv_txfm_add_4x8_dct_dct_1_8bpc_c: 45.7
      inv_txfm_add_4x8_dct_dct_1_8bpc_lsx: 6.7
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_c: 46.7
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_lsx: 8.8
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_c: 46.7
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_lsx: 8.8
      inv_txfm_add_4x8_dct_identity_0_8bpc_c: 33.8
      inv_txfm_add_4x8_dct_identity_0_8bpc_lsx: 2.9
      inv_txfm_add_4x8_dct_identity_1_8bpc_c: 33.8
      inv_txfm_add_4x8_dct_identity_1_8bpc_lsx: 2.9
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_c: 44.0
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_lsx: 8.6
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_c: 43.9
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_lsx: 8.6
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_c: 43.3
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_lsx: 6.5
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_c: 43.4
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_lsx: 6.5
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_c: 44.4
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_lsx: 8.8
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_c: 44.4
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_lsx: 8.8
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_c: 31.5
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_lsx: 2.9
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_c: 31.5
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_lsx: 2.9
      inv_txfm_add_4x8_identity_adst_0_8bpc_c: 38.9
      inv_txfm_add_4x8_identity_adst_0_8bpc_lsx: 8.2
      inv_txfm_add_4x8_identity_adst_1_8bpc_c: 38.9
      inv_txfm_add_4x8_identity_adst_1_8bpc_lsx: 8.2
      inv_txfm_add_4x8_identity_dct_0_8bpc_c: 38.1
      inv_txfm_add_4x8_identity_dct_0_8bpc_lsx: 6.1
      inv_txfm_add_4x8_identity_dct_1_8bpc_c: 38.1
      inv_txfm_add_4x8_identity_dct_1_8bpc_lsx: 6.1
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_c: 39.2
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_lsx: 8.3
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_c: 39.2
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_lsx: 8.3
      inv_txfm_add_4x8_identity_identity_0_8bpc_c: 26.4
      inv_txfm_add_4x8_identity_identity_0_8bpc_lsx: 2.4
      inv_txfm_add_4x8_identity_identity_1_8bpc_c: 26.4
      inv_txfm_add_4x8_identity_identity_1_8bpc_lsx: 2.4
      
      Change-Id: Ibbaeca98118774a261cf55afd581196d93ac2004
      d60d93a5
    • loongarch: Optimize one function of the itx_8bpc.add_16x32 series · 74e0eeb5
      Hecai Yuan authored and Hecai Yuan committed
      1. inv_txfm_add_dct_dct_16x32
      
      Relative speedup over C code:
      
      inv_txfm_add_16x32_dct_dct_0_8bpc_c: 63.4
      inv_txfm_add_16x32_dct_dct_0_8bpc_lsx: 3.3
      inv_txfm_add_16x32_dct_dct_1_8bpc_c: 687.0
      inv_txfm_add_16x32_dct_dct_1_8bpc_lsx: 55.7
      inv_txfm_add_16x32_dct_dct_2_8bpc_c: 686.4
      inv_txfm_add_16x32_dct_dct_2_8bpc_lsx: 55.6
      inv_txfm_add_16x32_dct_dct_3_8bpc_c: 686.4
      inv_txfm_add_16x32_dct_dct_3_8bpc_lsx: 55.5
      inv_txfm_add_16x32_dct_dct_4_8bpc_c: 686.4
      inv_txfm_add_16x32_dct_dct_4_8bpc_lsx: 55.6
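
The listing above gives raw timings only, without the "( N.NNx)" ratio column that other commits in this merge request include. As a rough sketch, the ratio can be derived by dividing the C timing by the LSX timing. The dictionary layout and the helper name `speedup` are illustrative assumptions, not part of dav1d or checkasm; the numbers are copied from the listing.

```python
# Timings copied from the listing above: (C, LSX) per dct_dct 16x32 variant.
# The dict layout and the helper name are illustrative, not dav1d/checkasm API.
timings = {
    "dct_dct_0": (63.4, 3.3),
    "dct_dct_1": (687.0, 55.7),
    "dct_dct_2": (686.4, 55.6),
    "dct_dct_3": (686.4, 55.5),
    "dct_dct_4": (686.4, 55.6),
}


def speedup(c_time, simd_time):
    """Return how many times faster the SIMD version is than the C reference."""
    return c_time / simd_time


ratios = {name: speedup(c, lsx) for name, (c, lsx) in timings.items()}

for name, ratio in ratios.items():
    # Same "N.NNx" style used by the other commit messages in this listing.
    print(f"inv_txfm_add_16x32_{name}_8bpc_lsx: {ratio:.2f}x")
```

The same division applies to the 8x16, 4x16 and 4x8 listings, which also report timings without ratios.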
      
      Change-Id: I9d22b8b3534b7ba17f6e85e42d08eb3165e2e8cb
      74e0eeb5