LoongArch: add multiple SIMD-optimized functions
- Sep 30, 2024
-
-
Change-Id: I78fe788113ff2487ba1ce2e7d0c7d7c78c5a8c58
-
Change-Id: I1566e8145d36296f2c76107cf15fc2cc7ac0ecc7
62a51df1 -
The performance data is as follows:
save_tmvs_c: 3938.6 ( 1.00x)
save_tmvs_lsx: 1355.3 ( 2.91x)
757f294a -
bench performance before:
lpf_h_sb_y_w16_8bpc_c: 117.0 ( 1.00x)
lpf_h_sb_y_w16_8bpc_lsx: 33.9 ( 3.46x)
lpf_v_sb_y_w16_8bpc_c: 132.1 ( 1.00x)
lpf_v_sb_y_w16_8bpc_lsx: 59.7 ( 2.21x)
bench performance after:
lpf_h_sb_y_w16_8bpc_c: 114.9 ( 1.00x)
lpf_h_sb_y_w16_8bpc_lsx: 32.0 ( 3.59x)
lpf_v_sb_y_w16_8bpc_c: 132.5 ( 1.00x)
lpf_v_sb_y_w16_8bpc_lsx: 28.1 ( 4.72x)
Change-Id: Ie64e164a9416c438f6b3881ce18fb42e2ddd073d
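The before/after figures above can be compared directly, since checkasm timings are lower-is-better. The following is a minimal sketch, not part of the commit; the helper name `relative_gain` is hypothetical, and the timings are copied from the LSX rows above.

```python
def relative_gain(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`.

    Both arguments are timings where a lower value is better.
    """
    if before <= 0 or after <= 0:
        raise ValueError("timings must be positive")
    return before / after


# LSX timings copied from the benchmark above (before, after).
lpf_h_lsx = (33.9, 32.0)
lpf_v_lsx = (59.7, 28.1)

print(f"lpf_h_sb_y_w16_8bpc_lsx: {relative_gain(*lpf_h_lsx):.2f}x faster")
print(f"lpf_v_sb_y_w16_8bpc_lsx: {relative_gain(*lpf_v_lsx):.2f}x faster")
```

Because the published timings are rounded to one decimal place, ratios derived this way are approximate.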
3d96175d -
sgr_3x3_8bpc_c: 27233.1 ( 1.00x)
sgr_3x3_8bpc_lsx: 12874.7 ( 2.12x)
sgr_3x3_8bpc_lasx: 10183.7 ( 2.67x)
Change-Id: I2aa469e8560733d6191396186bf776a12ad6e4a3
70582027 -
before:
warp_8x8_8bpc_c: 109.8 ( 1.00x)
warp_8x8_8bpc_lsx: 44.6 ( 2.46x)
warp_8x8t_8bpc_c: 97.5 ( 1.00x)
warp_8x8t_8bpc_lsx: 43.7 ( 2.23x)
after:
warp_8x8_8bpc_c: 109.8 ( 1.00x)
warp_8x8_8bpc_lsx: 39.2 ( 2.80x)
warp_8x8t_8bpc_c: 97.5 ( 1.00x)
warp_8x8t_8bpc_lsx: 37.9 ( 2.57x)
Change-Id: I11728c2c30821b8e2b1c85208710dfe5d1c1269c
96d6e472 -
mct_8tap_regular_w8_h_8bpc_c: 47.1 ( 1.00x)
mct_8tap_regular_w8_h_8bpc_lsx: 6.3 ( 7.46x)
mct_8tap_regular_w8_h_8bpc_lasx: 4.4 (10.80x)
mct_8tap_regular_w8_hv_8bpc_c: 118.9 ( 1.00x)
mct_8tap_regular_w8_hv_8bpc_lsx: 19.2 ( 6.20x)
mct_8tap_regular_w8_hv_8bpc_lasx: 13.7 ( 8.69x)
mct_8tap_regular_w8_v_8bpc_c: 60.3 ( 1.00x)
mct_8tap_regular_w8_v_8bpc_lsx: 5.4 (11.08x)
mct_8tap_regular_w8_v_8bpc_lasx: 3.3 (18.33x)
Change-Id: I1140f6ffbd738166f2581bc9111ebbdf6f9fa72c
b9e9a0ef -
wiener_5tap_8bpc_c: 18382.0 ( 1.00x)
wiener_5tap_8bpc_lsx: 4166.9 ( 4.41x)
wiener_5tap_8bpc_lasx: 2832.2 ( 6.49x)
wiener_7tap_8bpc_c: 18339.6 ( 1.00x)
wiener_7tap_8bpc_lsx: 4168.3 ( 4.40x)
wiener_7tap_8bpc_lasx: 2832.5 ( 6.47x)
Change-Id: I183a8cb008203fb61683b0543d9409d58d141a2e
af11a10a -
load_tmvs_c: 9702.0 ( 1.00x)
load_tmvs_lsx: 7857.0 ( 1.23x)
90a9549b -
intra_pred_z1_w4_8bpc_c: 16.5 ( 1.00x)
intra_pred_z1_w4_8bpc_lsx: 7.1 ( 2.31x)
intra_pred_z1_w8_8bpc_c: 31.9 ( 1.00x)
intra_pred_z1_w8_8bpc_lsx: 10.0 ( 3.20x)
intra_pred_z1_w16_8bpc_c: 80.1 ( 1.00x)
intra_pred_z1_w16_8bpc_lsx: 20.2 ( 3.96x)
intra_pred_z1_w32_8bpc_c: 185.8 ( 1.00x)
intra_pred_z1_w32_8bpc_lsx: 40.8 ( 4.55x)
intra_pred_z1_w64_8bpc_c: 511.1 ( 1.00x)
intra_pred_z1_w64_8bpc_lsx: 99.0 ( 5.16x)
Change-Id: Id7591e9b87e5b4d7fc3f438397e25dc6ca8e7f91
411fc219 -
emu_edge_w4_8bpc_c: 9.0 ( 1.00x)
emu_edge_w4_8bpc_lsx: 6.7 ( 1.34x)
emu_edge_w8_8bpc_c: 12.9 ( 1.00x)
emu_edge_w8_8bpc_lsx: 9.2 ( 1.40x)
emu_edge_w16_8bpc_c: 20.0 ( 1.00x)
emu_edge_w16_8bpc_lsx: 16.3 ( 1.23x)
emu_edge_w32_8bpc_c: 44.6 ( 1.00x)
emu_edge_w32_8bpc_lsx: 33.3 ( 1.34x)
emu_edge_w64_8bpc_c: 79.9 ( 1.00x)
emu_edge_w64_8bpc_lsx: 66.2 ( 1.21x)
emu_edge_w128_8bpc_c: 193.9 ( 1.00x)
emu_edge_w128_8bpc_lsx: 197.8 ( 0.98x)
Change-Id: I180c94d311509740b03793419d5790a931532980
7c63bb1b -
checkasm now calls the function under test, 'func_new', through the wrapper 'checked_call' instead of calling it directly. The wrapper verifies that 'func_new' correctly saves and restores the static (callee-saved) registers: it writes dirty values to the static registers, calls 'func_new', and then checks that those values are unchanged.
Change-Id: Ia9290b55ab0f2dd87801f6fd175813d3f717d851
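The idea behind the wrapper can be modeled in a few lines. This is only a conceptual sketch under stated assumptions, not the checkasm implementation (which has to be written in assembly to touch real registers): the register file is simulated with a dictionary, the register names are an illustrative subset, and `well_behaved` and `buggy` are hypothetical functions under test.

```python
# Illustrative subset of static (callee-saved) register names; an assumption.
CALLEE_SAVED = ("s0", "s1", "s2", "s3")


def checked_call(func, regs, *args):
    """Call func(regs, *args) and verify the static registers are preserved."""
    # Write distinct "dirty" sentinel values before the call.
    sentinels = {name: 0xDEAD0000 + i for i, name in enumerate(CALLEE_SAVED)}
    regs.update(sentinels)

    result = func(regs, *args)

    # After the call, every sentinel must still be in place.
    clobbered = [n for n in CALLEE_SAVED if regs[n] != sentinels[n]]
    if clobbered:
        raise AssertionError("clobbered registers: " + ", ".join(clobbered))
    return result


def well_behaved(regs, x):
    saved = regs["s0"]   # save the register before using it as scratch
    regs["s0"] = x * 2
    out = regs["s0"] + 1
    regs["s0"] = saved   # restore it before returning
    return out


def buggy(regs, x):
    regs["s1"] = x       # overwrites a static register and never restores it
    return x


print(checked_call(well_behaved, {}, 20))
try:
    checked_call(buggy, {}, 1)
except AssertionError as err:
    print("detected:", err)
```

Using a distinct sentinel per register lets the check report which register was clobbered rather than only that something changed.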
e3101ddc -
intra_pred_filter_w4_8bpc_c: 17.9 ( 1.00x)
intra_pred_filter_w4_8bpc_lsx: 8.9 ( 2.00x)
intra_pred_filter_w8_8bpc_c: 55.3 ( 1.00x)
intra_pred_filter_w8_8bpc_lsx: 23.8 ( 2.33x)
intra_pred_filter_w16_8bpc_c: 109.4 ( 1.00x)
intra_pred_filter_w16_8bpc_lsx: 49.1 ( 2.23x)
intra_pred_filter_w32_8bpc_c: 270.2 ( 1.00x)
intra_pred_filter_w32_8bpc_lsx: 126.1 ( 2.14x)
Change-Id: Ic4c23cb1d54d5f8557c31cdfbbd54f8beaaa32c2
7f891597 -
1. inv_txfm_add_dct_dct_32x16_8bpc_lsx 2. inv_txfm_add_dct_dct_32x8_8bpc_lsx 3. inv_txfm_add_dct_dct_64x32_8bpc_lsx 4. inv_txfm_add_adst_flipadst_16x16_8bpc_lsx 5. inv_txfm_add_flipadst_adst_16x16_8bpc_lsx 6. inv_txfm_add_adst_adst_16x16_8bpc_lasx Relative speedup over C code: inv_txfm_add_32x16_dct_dct_0_8bpc_c: 78.4 ( 1.00x) inv_txfm_add_32x16_dct_dct_0_8bpc_lsx: 5.7 (13.81x) inv_txfm_add_32x16_dct_dct_1_8bpc_c: 710.1 ( 1.00x) inv_txfm_add_32x16_dct_dct_1_8bpc_lsx: 102.9 ( 6.90x) inv_txfm_add_32x16_dct_dct_2_8bpc_c: 918.0 ( 1.00x) inv_txfm_add_32x16_dct_dct_2_8bpc_lsx: 103.2 ( 8.90x) inv_txfm_add_32x16_dct_dct_3_8bpc_c: 914.3 ( 1.00x) inv_txfm_add_32x16_dct_dct_3_8bpc_lsx: 103.2 ( 8.86x) inv_txfm_add_32x16_dct_dct_4_8bpc_c: 929.8 ( 1.00x) inv_txfm_add_32x16_dct_dct_4_8bpc_lsx: 102.9 ( 9.03x) inv_txfm_add_32x8_dct_dct_0_8bpc_c: 39.6 ( 1.00x) inv_txfm_add_32x8_dct_dct_0_8bpc_lsx: 3.0 (13.10x) inv_txfm_add_32x8_dct_dct_1_8bpc_c: 431.6 ( 1.00x) inv_txfm_add_32x8_dct_dct_1_8bpc_lsx: 42.6 (10.13x) inv_txfm_add_32x8_dct_dct_2_8bpc_c: 431.5 ( 1.00x) inv_txfm_add_32x8_dct_dct_2_8bpc_lsx: 42.6 (10.13x) inv_txfm_add_32x8_dct_dct_3_8bpc_c: 432.0 ( 1.00x) inv_txfm_add_32x8_dct_dct_3_8bpc_lsx: 42.6 (10.14x) inv_txfm_add_32x8_dct_dct_4_8bpc_c: 431.3 ( 1.00x) inv_txfm_add_32x8_dct_dct_4_8bpc_lsx: 42.6 (10.13x) inv_txfm_add_64x32_dct_dct_0_8bpc_c: 304.3 ( 1.00x) inv_txfm_add_64x32_dct_dct_0_8bpc_lsx: 20.3 (15.01x) inv_txfm_add_64x32_dct_dct_1_8bpc_c: 2743.1 ( 1.00x) inv_txfm_add_64x32_dct_dct_1_8bpc_lsx: 270.9 (10.13x) inv_txfm_add_64x32_dct_dct_2_8bpc_c: 3197.1 ( 1.00x) inv_txfm_add_64x32_dct_dct_2_8bpc_lsx: 327.7 ( 9.76x) inv_txfm_add_64x32_dct_dct_3_8bpc_c: 3638.3 ( 1.00x) inv_txfm_add_64x32_dct_dct_3_8bpc_lsx: 383.7 ( 9.48x) inv_txfm_add_64x32_dct_dct_4_8bpc_c: 4084.5 ( 1.00x) inv_txfm_add_64x32_dct_dct_4_8bpc_lsx: 441.7 ( 9.25x) inv_txfm_add_16x16_adst_flipadst_0_8bpc_c: 277.3 ( 1.00x) inv_txfm_add_16x16_adst_flipadst_0_8bpc_lsx: 58.7 ( 4.72x) 
inv_txfm_add_16x16_adst_flipadst_1_8bpc_c: 358.1 ( 1.00x) inv_txfm_add_16x16_adst_flipadst_1_8bpc_lsx: 58.7 ( 6.10x) inv_txfm_add_16x16_adst_flipadst_2_8bpc_c: 449.3 ( 1.00x) inv_txfm_add_16x16_adst_flipadst_2_8bpc_lsx: 58.7 ( 7.65x) inv_txfm_add_16x16_flipadst_adst_0_8bpc_c: 277.2 ( 1.00x) inv_txfm_add_16x16_flipadst_adst_0_8bpc_lsx: 58.7 ( 4.72x) inv_txfm_add_16x16_flipadst_adst_1_8bpc_c: 358.7 ( 1.00x) inv_txfm_add_16x16_flipadst_adst_1_8bpc_lsx: 58.7 ( 6.11x) inv_txfm_add_16x16_flipadst_adst_2_8bpc_c: 450.4 ( 1.00x) inv_txfm_add_16x16_flipadst_adst_2_8bpc_lsx: 58.7 ( 7.67x) inv_txfm_add_16x16_adst_adst_0_8bpc_c: 253.4 ( 1.00x) inv_txfm_add_16x16_adst_adst_0_8bpc_lasx: 23.1 (10.98x) inv_txfm_add_16x16_adst_adst_1_8bpc_c: 325.2 ( 1.00x) inv_txfm_add_16x16_adst_adst_1_8bpc_lasx: 23.1 (14.08x) inv_txfm_add_16x16_adst_adst_2_8bpc_c: 405.9 ( 1.00x) inv_txfm_add_16x16_adst_adst_2_8bpc_lasx: 23.1 (17.56x) Change-Id: Iaa5419a830c3308e2c4c9ac5b3699c3a971301ed
f398bf96 -
Relative speedup over C code: inv_txfm_add_16x8_adst_adst_0_8bpc_c: 127.7 ( 1.00x) inv_txfm_add_16x8_adst_adst_0_8bpc_lsx: 29.6 ( 4.32x) inv_txfm_add_16x8_adst_adst_1_8bpc_c: 206.6 ( 1.00x) inv_txfm_add_16x8_adst_adst_1_8bpc_lsx: 29.6 ( 6.98x) inv_txfm_add_16x8_adst_adst_2_8bpc_c: 206.6 ( 1.00x) inv_txfm_add_16x8_adst_adst_2_8bpc_lsx: 29.6 ( 6.99x) inv_txfm_add_16x8_adst_dct_0_8bpc_c: 126.7 ( 1.00x) inv_txfm_add_16x8_adst_dct_0_8bpc_lsx: 25.8 ( 4.91x) inv_txfm_add_16x8_adst_dct_1_8bpc_c: 205.1 ( 1.00x) inv_txfm_add_16x8_adst_dct_1_8bpc_lsx: 25.8 ( 7.94x) inv_txfm_add_16x8_adst_dct_2_8bpc_c: 205.2 ( 1.00x) inv_txfm_add_16x8_adst_dct_2_8bpc_lsx: 25.8 ( 7.94x) inv_txfm_add_16x8_adst_flipadst_0_8bpc_c: 128.3 ( 1.00x) inv_txfm_add_16x8_adst_flipadst_0_8bpc_lsx: 29.8 ( 4.30x) inv_txfm_add_16x8_adst_flipadst_1_8bpc_c: 207.2 ( 1.00x) inv_txfm_add_16x8_adst_flipadst_1_8bpc_lsx: 29.9 ( 6.94x) inv_txfm_add_16x8_adst_flipadst_2_8bpc_c: 207.1 ( 1.00x) inv_txfm_add_16x8_adst_flipadst_2_8bpc_lsx: 29.8 ( 6.94x) inv_txfm_add_16x8_adst_identity_0_8bpc_c: 78.3 ( 1.00x) inv_txfm_add_16x8_adst_identity_0_8bpc_lsx: 18.6 ( 4.21x) inv_txfm_add_16x8_adst_identity_1_8bpc_c: 157.1 ( 1.00x) inv_txfm_add_16x8_adst_identity_1_8bpc_lsx: 18.6 ( 8.45x) inv_txfm_add_16x8_adst_identity_2_8bpc_c: 157.2 ( 1.00x) inv_txfm_add_16x8_adst_identity_2_8bpc_lsx: 18.6 ( 8.46x) inv_txfm_add_16x8_dct_adst_0_8bpc_c: 127.4 ( 1.00x) inv_txfm_add_16x8_dct_adst_0_8bpc_lsx: 25.4 ( 5.02x) inv_txfm_add_16x8_dct_adst_1_8bpc_c: 201.2 ( 1.00x) inv_txfm_add_16x8_dct_adst_1_8bpc_lsx: 25.4 ( 7.93x) inv_txfm_add_16x8_dct_adst_2_8bpc_c: 201.2 ( 1.00x) inv_txfm_add_16x8_dct_adst_2_8bpc_lsx: 25.4 ( 7.93x) inv_txfm_add_16x8_dct_dct_0_8bpc_c: 21.8 ( 1.00x) inv_txfm_add_16x8_dct_dct_0_8bpc_lsx: 2.1 (10.52x) inv_txfm_add_16x8_dct_dct_1_8bpc_c: 200.2 ( 1.00x) inv_txfm_add_16x8_dct_dct_1_8bpc_lsx: 21.6 ( 9.28x) inv_txfm_add_16x8_dct_dct_2_8bpc_c: 200.2 ( 1.00x) inv_txfm_add_16x8_dct_dct_2_8bpc_lsx: 21.6 ( 9.28x) 
inv_txfm_add_16x8_dct_flipadst_0_8bpc_c: 127.2 ( 1.00x) inv_txfm_add_16x8_dct_flipadst_0_8bpc_lsx: 25.6 ( 4.96x) inv_txfm_add_16x8_dct_flipadst_1_8bpc_c: 201.2 ( 1.00x) inv_txfm_add_16x8_dct_flipadst_1_8bpc_lsx: 25.7 ( 7.84x) inv_txfm_add_16x8_dct_flipadst_2_8bpc_c: 201.7 ( 1.00x) inv_txfm_add_16x8_dct_flipadst_2_8bpc_lsx: 25.7 ( 7.86x) inv_txfm_add_16x8_dct_identity_0_8bpc_c: 77.3 ( 1.00x) inv_txfm_add_16x8_dct_identity_0_8bpc_lsx: 14.5 ( 5.35x) inv_txfm_add_16x8_dct_identity_1_8bpc_c: 151.2 ( 1.00x) inv_txfm_add_16x8_dct_identity_1_8bpc_lsx: 14.5 (10.46x) inv_txfm_add_16x8_dct_identity_2_8bpc_c: 151.5 ( 1.00x) inv_txfm_add_16x8_dct_identity_2_8bpc_lsx: 14.5 (10.48x) inv_txfm_add_16x8_flipadst_adst_0_8bpc_c: 128.5 ( 1.00x) inv_txfm_add_16x8_flipadst_adst_0_8bpc_lsx: 29.7 ( 4.32x) inv_txfm_add_16x8_flipadst_adst_1_8bpc_c: 207.3 ( 1.00x) inv_txfm_add_16x8_flipadst_adst_1_8bpc_lsx: 29.7 ( 6.97x) inv_txfm_add_16x8_flipadst_adst_2_8bpc_c: 207.4 ( 1.00x) inv_txfm_add_16x8_flipadst_adst_2_8bpc_lsx: 29.7 ( 6.98x) inv_txfm_add_16x8_flipadst_dct_0_8bpc_c: 126.8 ( 1.00x) inv_txfm_add_16x8_flipadst_dct_0_8bpc_lsx: 25.9 ( 4.90x) inv_txfm_add_16x8_flipadst_dct_1_8bpc_c: 204.8 ( 1.00x) inv_txfm_add_16x8_flipadst_dct_1_8bpc_lsx: 25.9 ( 7.92x) inv_txfm_add_16x8_flipadst_dct_2_8bpc_c: 205.4 ( 1.00x) inv_txfm_add_16x8_flipadst_dct_2_8bpc_lsx: 25.9 ( 7.94x) inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_c: 128.6 ( 1.00x) inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_lsx: 30.0 ( 4.29x) inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_c: 206.6 ( 1.00x) inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_lsx: 29.9 ( 6.90x) inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_c: 206.5 ( 1.00x) inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_lsx: 29.9 ( 6.90x) inv_txfm_add_16x8_flipadst_identity_0_8bpc_c: 77.8 ( 1.00x) inv_txfm_add_16x8_flipadst_identity_0_8bpc_lsx: 18.6 ( 4.18x) inv_txfm_add_16x8_flipadst_identity_1_8bpc_c: 156.3 ( 1.00x) inv_txfm_add_16x8_flipadst_identity_1_8bpc_lsx: 18.6 ( 8.40x) 
inv_txfm_add_16x8_flipadst_identity_2_8bpc_c: 156.6 ( 1.00x) inv_txfm_add_16x8_flipadst_identity_2_8bpc_lsx: 18.6 ( 8.42x) inv_txfm_add_16x8_identity_adst_0_8bpc_c: 120.7 ( 1.00x) inv_txfm_add_16x8_identity_adst_0_8bpc_lsx: 21.1 ( 5.71x) inv_txfm_add_16x8_identity_adst_1_8bpc_c: 120.8 ( 1.00x) inv_txfm_add_16x8_identity_adst_1_8bpc_lsx: 21.1 ( 5.71x) inv_txfm_add_16x8_identity_adst_2_8bpc_c: 145.5 ( 1.00x) inv_txfm_add_16x8_identity_adst_2_8bpc_lsx: 21.2 ( 6.88x) inv_txfm_add_16x8_identity_dct_0_8bpc_c: 119.1 ( 1.00x) inv_txfm_add_16x8_identity_dct_0_8bpc_lsx: 17.9 ( 6.67x) inv_txfm_add_16x8_identity_dct_1_8bpc_c: 119.1 ( 1.00x) inv_txfm_add_16x8_identity_dct_1_8bpc_lsx: 17.9 ( 6.67x) inv_txfm_add_16x8_identity_dct_2_8bpc_c: 143.8 ( 1.00x) inv_txfm_add_16x8_identity_dct_2_8bpc_lsx: 17.9 ( 8.06x) inv_txfm_add_16x8_identity_flipadst_0_8bpc_c: 120.7 ( 1.00x) inv_txfm_add_16x8_identity_flipadst_0_8bpc_lsx: 21.3 ( 5.66x) inv_txfm_add_16x8_identity_flipadst_1_8bpc_c: 120.4 ( 1.00x) inv_txfm_add_16x8_identity_flipadst_1_8bpc_lsx: 21.3 ( 5.65x) inv_txfm_add_16x8_identity_flipadst_2_8bpc_c: 144.9 ( 1.00x) inv_txfm_add_16x8_identity_flipadst_2_8bpc_lsx: 21.3 ( 6.80x) inv_txfm_add_16x8_identity_identity_0_8bpc_c: 70.2 ( 1.00x) inv_txfm_add_16x8_identity_identity_0_8bpc_lsx: 9.5 ( 7.38x) inv_txfm_add_16x8_identity_identity_1_8bpc_c: 95.6 ( 1.00x) inv_txfm_add_16x8_identity_identity_1_8bpc_lsx: 9.5 (10.06x) inv_txfm_add_16x8_identity_identity_2_8bpc_c: 95.6 ( 1.00x) inv_txfm_add_16x8_identity_identity_2_8bpc_lsx: 9.5 (10.06x) Change-Id: If1e274cab0e8441297a1eb44bd86be580f4c8f62
13a857d0 -
Relative speedup over C code: inv_txfm_add_16x4_adst_dct_0_8bpc_c: 61.7 ( 1.00x) inv_txfm_add_16x4_adst_dct_0_8bpc_lsx: 17.8 ( 3.46x) inv_txfm_add_16x4_adst_dct_1_8bpc_c: 96.2 ( 1.00x) inv_txfm_add_16x4_adst_dct_1_8bpc_lsx: 17.8 ( 5.39x) inv_txfm_add_16x4_adst_dct_2_8bpc_c: 96.2 ( 1.00x) inv_txfm_add_16x4_adst_dct_2_8bpc_lsx: 17.8 ( 5.39x) inv_txfm_add_16x4_dct_dct_0_8bpc_c: 10.8 ( 1.00x) inv_txfm_add_16x4_dct_dct_0_8bpc_lsx: 0.9 (12.23x) inv_txfm_add_16x4_dct_dct_1_8bpc_c: 94.5 ( 1.00x) inv_txfm_add_16x4_dct_dct_1_8bpc_lsx: 13.6 ( 6.94x) inv_txfm_add_16x4_dct_dct_2_8bpc_c: 94.7 ( 1.00x) inv_txfm_add_16x4_dct_dct_2_8bpc_lsx: 13.6 ( 6.95x) inv_txfm_add_16x4_identity_identity_0_8bpc_c: 42.1 ( 1.00x) inv_txfm_add_16x4_identity_identity_0_8bpc_lsx: 5.1 ( 8.21x) inv_txfm_add_16x4_identity_identity_1_8bpc_c: 53.0 ( 1.00x) inv_txfm_add_16x4_identity_identity_1_8bpc_lsx: 5.1 (10.35x) inv_txfm_add_16x4_identity_identity_2_8bpc_c: 53.0 ( 1.00x) inv_txfm_add_16x4_identity_identity_2_8bpc_lsx: 5.1 (10.35x) Change-Id: I0be4f77e381da390e300070337fff404dcdcb862
843f00e5 -
Loongarch: Optimized cfl_pred_cfl, cfl_pred_cfl_128, cfl_pred_cfl_top and cfl_pred_cfl_left 8bpc functions by LSX cfl_pred_cfl_128_w4_8bpc_c: 19.4 ( 1.00x) cfl_pred_cfl_128_w4_8bpc_lsx: 4.2 ( 4.63x) cfl_pred_cfl_128_w8_8bpc_c: 66.3 ( 1.00x) cfl_pred_cfl_128_w8_8bpc_lsx: 7.3 ( 9.11x) cfl_pred_cfl_128_w16_8bpc_c: 150.1 ( 1.00x) cfl_pred_cfl_128_w16_8bpc_lsx: 14.4 (10.45x) cfl_pred_cfl_128_w32_8bpc_c: 403.6 ( 1.00x) cfl_pred_cfl_128_w32_8bpc_lsx: 34.7 (11.65x) cfl_pred_cfl_left_w4_8bpc_c: 20.5 ( 1.00x) cfl_pred_cfl_left_w4_8bpc_lsx: 4.4 ( 4.63x) cfl_pred_cfl_left_w8_8bpc_c: 67.9 ( 1.00x) cfl_pred_cfl_left_w8_8bpc_lsx: 7.6 ( 8.94x) cfl_pred_cfl_left_w16_8bpc_c: 152.0 ( 1.00x) cfl_pred_cfl_left_w16_8bpc_lsx: 14.6 (10.38x) cfl_pred_cfl_left_w32_8bpc_c: 405.8 ( 1.00x) cfl_pred_cfl_left_w32_8bpc_lsx: 35.0 (11.58x) cfl_pred_cfl_top_w4_8bpc_c: 20.0 ( 1.00x) cfl_pred_cfl_top_w4_8bpc_lsx: 4.4 ( 4.51x) cfl_pred_cfl_top_w8_8bpc_c: 67.6 ( 1.00x) cfl_pred_cfl_top_w8_8bpc_lsx: 7.5 ( 8.99x) cfl_pred_cfl_top_w16_8bpc_c: 152.5 ( 1.00x) cfl_pred_cfl_top_w16_8bpc_lsx: 14.6 (10.41x) cfl_pred_cfl_top_w32_8bpc_c: 408.0 ( 1.00x) cfl_pred_cfl_top_w32_8bpc_lsx: 35.2 (11.58x) cfl_pred_cfl_w4_8bpc_c: 21.1 ( 1.00x) cfl_pred_cfl_w4_8bpc_lsx: 4.8 ( 4.43x) cfl_pred_cfl_w8_8bpc_c: 68.6 ( 1.00x) cfl_pred_cfl_w8_8bpc_lsx: 7.9 ( 8.73x) cfl_pred_cfl_w16_8bpc_c: 154.4 ( 1.00x) cfl_pred_cfl_w16_8bpc_lsx: 15.0 (10.29x) cfl_pred_cfl_w32_8bpc_c: 410.3 ( 1.00x) cfl_pred_cfl_w32_8bpc_lsx: 35.6 (11.54x) Change-Id: I4ec7cc71483298d28379bfbd824e97a0d74d0c23
083cf424 -
pal_pred_w4_8bpc_c: 3.0 ( 1.00x)
pal_pred_w4_8bpc_lsx: 0.6 ( 5.46x)
pal_pred_w8_8bpc_c: 8.8 ( 1.00x)
pal_pred_w8_8bpc_lsx: 0.9 ( 9.49x)
pal_pred_w16_8bpc_c: 26.0 ( 1.00x)
pal_pred_w16_8bpc_lsx: 1.9 (13.70x)
pal_pred_w32_8bpc_c: 60.6 ( 1.00x)
pal_pred_w32_8bpc_lsx: 4.0 (15.10x)
pal_pred_w64_8bpc_c: 146.9 ( 1.00x)
pal_pred_w64_8bpc_lsx: 9.2 (15.97x)
Change-Id: I5414f096a23b09c3a512e727b93fa22104d141f9
3f6c845d -
mct_8tap_regular_w4_0_8bpc_c: 3.7 ( 1.00x) mct_8tap_regular_w4_0_8bpc_lsx: 0.9 ( 4.21x) mct_8tap_regular_w4_h_8bpc_c: 15.7 ( 1.00x) mct_8tap_regular_w4_h_8bpc_lsx: 1.7 ( 9.24x) mct_8tap_regular_w4_hv_8bpc_c: 44.1 ( 1.00x) mct_8tap_regular_w4_hv_8bpc_lsx: 6.3 ( 6.96x) mct_8tap_regular_w4_v_8bpc_c: 19.8 ( 1.00x) mct_8tap_regular_w4_v_8bpc_lsx: 2.4 ( 8.21x) mct_8tap_regular_w8_0_8bpc_c: 10.5 ( 1.00x) mct_8tap_regular_w8_0_8bpc_lsx: 1.3 ( 8.27x) mct_8tap_regular_w8_h_8bpc_c: 47.2 ( 1.00x) mct_8tap_regular_w8_h_8bpc_lsx: 6.2 ( 7.61x) mct_8tap_regular_w8_hv_8bpc_c: 119.5 ( 1.00x) mct_8tap_regular_w8_hv_8bpc_lsx: 18.9 ( 6.32x) mct_8tap_regular_w8_v_8bpc_c: 60.5 ( 1.00x) mct_8tap_regular_w8_v_8bpc_lsx: 5.4 (11.12x) mct_8tap_regular_w16_0_8bpc_c: 28.8 ( 1.00x) mct_8tap_regular_w16_0_8bpc_lsx: 2.8 (10.32x) mct_8tap_regular_w16_h_8bpc_c: 151.9 ( 1.00x) mct_8tap_regular_w16_h_8bpc_lsx: 19.8 ( 7.67x) mct_8tap_regular_w16_hv_8bpc_c: 357.5 ( 1.00x) mct_8tap_regular_w16_hv_8bpc_lsx: 57.6 ( 6.21x) mct_8tap_regular_w16_v_8bpc_c: 195.6 ( 1.00x) mct_8tap_regular_w16_v_8bpc_lsx: 16.9 (11.61x) mct_8tap_regular_w32_0_8bpc_c: 104.6 ( 1.00x) mct_8tap_regular_w32_0_8bpc_lsx: 11.6 ( 9.03x) mct_8tap_regular_w32_h_8bpc_c: 596.3 ( 1.00x) mct_8tap_regular_w32_h_8bpc_lsx: 77.8 ( 7.67x) mct_8tap_regular_w32_hv_8bpc_c: 1329.0 ( 1.00x) mct_8tap_regular_w32_hv_8bpc_lsx: 217.9 ( 6.10x) mct_8tap_regular_w32_v_8bpc_c: 771.0 ( 1.00x) mct_8tap_regular_w32_v_8bpc_lsx: 65.7 (11.73x) mct_8tap_regular_w64_0_8bpc_c: 242.0 ( 1.00x) mct_8tap_regular_w64_0_8bpc_lsx: 27.0 ( 8.95x) mct_8tap_regular_w64_h_8bpc_c: 1455.9 ( 1.00x) mct_8tap_regular_w64_h_8bpc_lsx: 186.9 ( 7.79x) mct_8tap_regular_w64_hv_8bpc_c: 3221.7 ( 1.00x) mct_8tap_regular_w64_hv_8bpc_lsx: 521.8 ( 6.17x) mct_8tap_regular_w64_v_8bpc_c: 1836.1 ( 1.00x) mct_8tap_regular_w64_v_8bpc_lsx: 158.2 (11.61x) mct_8tap_regular_w128_0_8bpc_c: 629.0 ( 1.00x) mct_8tap_regular_w128_0_8bpc_lsx: 66.3 ( 9.49x) mct_8tap_regular_w128_h_8bpc_c: 3617.5 ( 1.00x) 
mct_8tap_regular_w128_h_8bpc_lsx: 463.6 ( 7.80x) mct_8tap_regular_w128_hv_8bpc_c: 7881.7 ( 1.00x) mct_8tap_regular_w128_hv_8bpc_lsx: 1290.3 ( 6.11x) mct_8tap_regular_w128_v_8bpc_c: 4552.9 ( 1.00x) mct_8tap_regular_w128_v_8bpc_lsx: 391.1 (11.64x) Change-Id: I8c6046e4bd6c1fb19d5712234abece0355fb77fa
b26f315d -
blend_h_w2_8bpc_c: 3.8 ( 1.00x) blend_h_w2_8bpc_lsx: 1.9 ( 1.98x) blend_h_w2_8bpc_lasx: 1.9 ( 1.98x) blend_h_w4_8bpc_c: 6.4 ( 1.00x) blend_h_w4_8bpc_lsx: 1.8 ( 3.49x) blend_h_w4_8bpc_lasx: 1.8 ( 3.49x) blend_h_w8_8bpc_c: 11.6 ( 1.00x) blend_h_w8_8bpc_lsx: 1.8 ( 6.45x) blend_h_w8_8bpc_lasx: 1.8 ( 6.48x) blend_h_w16_8bpc_c: 21.5 ( 1.00x) blend_h_w16_8bpc_lsx: 2.1 (10.47x) blend_h_w16_8bpc_lasx: 2.1 (10.48x) blend_h_w32_8bpc_c: 41.9 ( 1.00x) blend_h_w32_8bpc_lsx: 3.8 (11.08x) blend_h_w32_8bpc_lasx: 3.9 (10.67x) blend_h_w64_8bpc_c: 82.0 ( 1.00x) blend_h_w64_8bpc_lsx: 6.9 (11.89x) blend_h_w64_8bpc_lasx: 4.6 (17.93x) blend_h_w128_8bpc_c: 202.3 ( 1.00x) blend_h_w128_8bpc_lsx: 16.4 (12.30x) blend_h_w128_8bpc_lasx: 11.4 (17.77x) Change-Id: I6d6599ccbaba8a62a629c4a52254b2369dba60f6
ce45ebde -
blend_v_w2_8bpc_c: 5.7 ( 1.00x) blend_v_w2_8bpc_lsx: 3.6 ( 1.60x) blend_v_w4_8bpc_c: 22.8 ( 1.00x) blend_v_w4_8bpc_lsx: 7.1 ( 3.20x) blend_v_w8_8bpc_c: 40.2 ( 1.00x) blend_v_w8_8bpc_lsx: 7.1 ( 5.63x) blend_v_w16_8bpc_c: 74.6 ( 1.00x) blend_v_w16_8bpc_lsx: 8.1 ( 9.26x) blend_v_w32_8bpc_c: 144.0 ( 1.00x) blend_v_w32_8bpc_lsx: 13.3 (10.83x) blend_w4_8bpc_c: 4.9 ( 1.00x) blend_w4_8bpc_lsx: 1.9 ( 2.49x) blend_w8_8bpc_c: 14.1 ( 1.00x) blend_w8_8bpc_lsx: 3.2 ( 4.37x) blend_w16_8bpc_c: 51.5 ( 1.00x) blend_w16_8bpc_lsx: 7.9 ( 6.51x) blend_w32_8bpc_c: 127.5 ( 1.00x) blend_w32_8bpc_lsx: 19.6 ( 6.52x) Change-Id: I95e2dbc1f0735688f5473687f1a7e8d37ffbe417
5319278d -
intra_pred_smooth_h_w4_8bpc_c: 7.3 ( 1.00x) intra_pred_smooth_h_w4_8bpc_lsx: 3.1 ( 2.36x) intra_pred_smooth_h_w8_8bpc_c: 21.3 ( 1.00x) intra_pred_smooth_h_w8_8bpc_lsx: 4.5 ( 4.71x) intra_pred_smooth_h_w16_8bpc_c: 66.3 ( 1.00x) intra_pred_smooth_h_w16_8bpc_lsx: 13.4 ( 4.96x) intra_pred_smooth_h_w32_8bpc_c: 160.0 ( 1.00x) intra_pred_smooth_h_w32_8bpc_lsx: 29.3 ( 5.46x) intra_pred_smooth_h_w64_8bpc_c: 400.2 ( 1.00x) intra_pred_smooth_h_w64_8bpc_lsx: 68.3 ( 5.86x) intra_pred_smooth_v_w4_8bpc_c: 6.6 ( 1.00x) intra_pred_smooth_v_w4_8bpc_lsx: 3.1 ( 2.10x) intra_pred_smooth_v_w8_8bpc_c: 19.3 ( 1.00x) intra_pred_smooth_v_w8_8bpc_lsx: 4.9 ( 3.95x) intra_pred_smooth_v_w16_8bpc_c: 58.6 ( 1.00x) intra_pred_smooth_v_w16_8bpc_lsx: 24.0 ( 2.44x) intra_pred_smooth_v_w32_8bpc_c: 139.4 ( 1.00x) intra_pred_smooth_v_w32_8bpc_lsx: 27.0 ( 5.17x) intra_pred_smooth_v_w64_8bpc_c: 344.8 ( 1.00x) intra_pred_smooth_v_w64_8bpc_lsx: 70.8 ( 4.87x) intra_pred_smooth_w4_8bpc_c: 10.2 ( 1.00x) intra_pred_smooth_w4_8bpc_lsx: 7.9 ( 1.30x) intra_pred_smooth_w8_8bpc_c: 30.3 ( 1.00x) intra_pred_smooth_w8_8bpc_lsx: 20.0 ( 1.51x) intra_pred_smooth_w16_8bpc_c: 96.3 ( 1.00x) intra_pred_smooth_w16_8bpc_lsx: 58.3 ( 1.65x) intra_pred_smooth_w32_8bpc_c: 231.1 ( 1.00x) intra_pred_smooth_w32_8bpc_lsx: 134.3 ( 1.72x) intra_pred_smooth_w64_8bpc_c: 571.5 ( 1.00x) intra_pred_smooth_w64_8bpc_lsx: 326.5 ( 1.75x) Change-Id: I22b6c2dcf27c5393bba374b4fbe8879c0463f828
0b9c756f -
intra_pred_paeth_w4_8bpc_c: 12.3 ( 1.00x)
intra_pred_paeth_w4_8bpc_lsx: 3.9 ( 3.12x)
intra_pred_paeth_w8_8bpc_c: 39.7 ( 1.00x)
intra_pred_paeth_w8_8bpc_lsx: 6.4 ( 6.20x)
intra_pred_paeth_w16_8bpc_c: 133.6 ( 1.00x)
intra_pred_paeth_w16_8bpc_lsx: 17.0 ( 7.85x)
intra_pred_paeth_w32_8bpc_c: 342.8 ( 1.00x)
intra_pred_paeth_w32_8bpc_lsx: 52.7 ( 6.50x)
intra_pred_paeth_w64_8bpc_c: 903.8 ( 1.00x)
intra_pred_paeth_w64_8bpc_lsx: 107.3 ( 8.42x)
Change-Id: I457bdb24fdd6b5400ec030bffbdd40c79d8165c1
7463c2af -
intra_pred_h_w4_8bpc_c: 4.3 ( 1.00x) intra_pred_h_w4_8bpc_lsx: 3.5 ( 1.21x) intra_pred_h_w8_8bpc_c: 5.7 ( 1.00x) intra_pred_h_w8_8bpc_lsx: 5.1 ( 1.11x) intra_pred_h_w16_8bpc_c: 13.2 ( 1.00x) intra_pred_h_w16_8bpc_lsx: 7.1 ( 1.86x) intra_pred_h_w32_8bpc_c: 12.4 ( 1.00x) intra_pred_h_w32_8bpc_lsx: 6.3 ( 1.96x) intra_pred_h_w64_8bpc_c: 25.9 ( 1.00x) intra_pred_h_w64_8bpc_lsx: 5.8 ( 4.44x) intra_pred_v_w4_8bpc_c: 4.6 ( 1.00x) intra_pred_v_w4_8bpc_lsx: 2.5 ( 1.85x) intra_pred_v_w8_8bpc_c: 6.9 ( 1.00x) intra_pred_v_w8_8bpc_lsx: 4.5 ( 1.53x) intra_pred_v_w16_8bpc_c: 13.3 ( 1.00x) intra_pred_v_w16_8bpc_lsx: 5.2 ( 2.56x) intra_pred_v_w32_8bpc_c: 16.1 ( 1.00x) intra_pred_v_w32_8bpc_lsx: 5.1 ( 3.13x) intra_pred_v_w64_8bpc_c: 21.7 ( 1.00x) intra_pred_v_w64_8bpc_lsx: 7.7 ( 2.80x) Change-Id: I51b3dd13877315b9c1c64590c19f1ad38bfc4bdf
3e9d80d8 -
intra_pred_dc_w4_8bpc_c: 2.1 ( 1.00x) intra_pred_dc_w4_8bpc_lsx: 1.3 ( 1.54x) intra_pred_dc_w8_8bpc_c: 3.6 ( 1.00x) intra_pred_dc_w8_8bpc_lsx: 3.7 ( 0.97x) intra_pred_dc_w16_8bpc_c: 6.9 ( 1.00x) intra_pred_dc_w16_8bpc_lsx: 7.8 ( 0.88x) intra_pred_dc_w32_8bpc_c: 14.1 ( 1.00x) intra_pred_dc_w32_8bpc_lsx: 7.1 ( 1.97x) intra_pred_dc_w64_8bpc_c: 25.3 ( 1.00x) intra_pred_dc_w64_8bpc_lsx: 7.4 ( 3.41x) intra_pred_dc_128_w4_8bpc_c: 0.6 ( 1.00x) intra_pred_dc_128_w4_8bpc_lsx: 0.8 ( 0.76x) intra_pred_dc_128_w8_8bpc_c: 1.4 ( 1.00x) intra_pred_dc_128_w8_8bpc_lsx: 3.2 ( 0.45x) intra_pred_dc_128_w16_8bpc_c: 3.4 ( 1.00x) intra_pred_dc_128_w16_8bpc_lsx: 7.3 ( 0.47x) intra_pred_dc_128_w32_8bpc_c: 8.8 ( 1.00x) intra_pred_dc_128_w32_8bpc_lsx: 6.4 ( 1.38x) intra_pred_dc_128_w64_8bpc_c: 17.0 ( 1.00x) intra_pred_dc_128_w64_8bpc_lsx: 6.2 ( 2.74x) intra_pred_dc_left_w4_8bpc_c: 1.1 ( 1.00x) intra_pred_dc_left_w4_8bpc_lsx: 1.1 ( 1.00x) intra_pred_dc_left_w8_8bpc_c: 2.1 ( 1.00x) intra_pred_dc_left_w8_8bpc_lsx: 3.4 ( 0.64x) intra_pred_dc_left_w16_8bpc_c: 4.6 ( 1.00x) intra_pred_dc_left_w16_8bpc_lsx: 7.5 ( 0.62x) intra_pred_dc_left_w32_8bpc_c: 10.3 ( 1.00x) intra_pred_dc_left_w32_8bpc_lsx: 7.8 ( 1.32x) intra_pred_dc_left_w64_8bpc_c: 18.7 ( 1.00x) intra_pred_dc_left_w64_8bpc_lsx: 6.6 ( 2.83x) intra_pred_dc_top_w4_8bpc_c: 0.9 ( 1.00x) intra_pred_dc_top_w4_8bpc_lsx: 0.8 ( 1.10x) intra_pred_dc_top_w8_8bpc_c: 1.9 ( 1.00x) intra_pred_dc_top_w8_8bpc_lsx: 2.8 ( 0.67x) intra_pred_dc_top_w16_8bpc_c: 4.2 ( 1.00x) intra_pred_dc_top_w16_8bpc_lsx: 5.5 ( 0.77x) intra_pred_dc_top_w32_8bpc_c: 10.4 ( 1.00x) intra_pred_dc_top_w32_8bpc_lsx: 6.7 ( 1.54x) intra_pred_dc_top_w64_8bpc_c: 19.9 ( 1.00x) intra_pred_dc_top_w64_8bpc_lsx: 6.9 ( 2.87x) Change-Id: Ib5349e2430302da0424a474ce0fedc457439c761
2a9cbcc2 -
cdef_filter_4x4_01_8bpc_c: 420.8 ( 1.00x)
cdef_filter_4x4_01_8bpc_lsx: 117.2 ( 3.59x)
cdef_filter_4x4_10_8bpc_c: 265.8 ( 1.00x)
cdef_filter_4x4_10_8bpc_lsx: 98.9 ( 2.69x)
cdef_filter_4x4_11_8bpc_c: 1036.2 ( 1.00x)
cdef_filter_4x4_11_8bpc_lsx: 169.6 ( 6.11x)
cdef_filter_4x8_01_8bpc_c: 802.6 ( 1.00x)
cdef_filter_4x8_01_8bpc_lsx: 206.1 ( 3.89x)
cdef_filter_4x8_10_8bpc_c: 489.1 ( 1.00x)
cdef_filter_4x8_10_8bpc_lsx: 167.4 ( 2.92x)
cdef_filter_4x8_11_8bpc_c: 2028.9 ( 1.00x)
cdef_filter_4x8_11_8bpc_lsx: 309.4 ( 6.56x)
cdef_filter_8x8_01_8bpc_c: 1562.2 ( 1.00x)
cdef_filter_8x8_01_8bpc_lsx: 295.3 ( 5.29x)
cdef_filter_8x8_10_8bpc_c: 949.4 ( 1.00x)
cdef_filter_8x8_10_8bpc_lsx: 207.6 ( 4.57x)
cdef_filter_8x8_11_8bpc_c: 4009.6 ( 1.00x)
cdef_filter_8x8_11_8bpc_lsx: 466.8 ( 8.59x)
Change-Id: I8cd43426a27055e18c44a7701fa50f8835c712be
62c47f35 -
The speedup over the LSX implementation is roughly 68% to 156%.
Change-Id: I0b39cd0e05e3cbd84fded121d29a91ea2a620f03
fa7b72d0 -
The performance data is as follows:
msac_decode_bool_equi_c: 0.4 ( 1.00x)
msac_decode_bool_equi_lsx: 0.3 ( 1.07x)
msac_decode_hi_tok_c: 1.8 ( 1.00x)
msac_decode_hi_tok_lsx: 1.4 ( 1.27x)
Change-Id: Ic2f2678cf699bb22c579424af71ae2603e228482
02309b9f -
cdef_dir_8bpc_c: 28.8 ( 1.00x)
cdef_dir_8bpc_lsx: 19.1 ( 1.51x)
Change-Id: Ic7c1f32c5b1733b011f4c448cffc93f745b564f5
2154425f -
Relative speedup over C code:
inv_txfm_add_8x32_identity_identity_0_8bpc_c: 126.1 ( 1.00x)
inv_txfm_add_8x32_identity_identity_0_8bpc_lsx: 1.6 (78.59x)
inv_txfm_add_8x32_identity_identity_1_8bpc_c: 136.9 ( 1.00x)
inv_txfm_add_8x32_identity_identity_1_8bpc_lsx: 1.6 (85.31x)
inv_txfm_add_8x32_identity_identity_2_8bpc_c: 148.0 ( 1.00x)
inv_txfm_add_8x32_identity_identity_2_8bpc_lsx: 3.3 (45.47x)
inv_txfm_add_8x32_identity_identity_3_8bpc_c: 159.4 ( 1.00x)
inv_txfm_add_8x32_identity_identity_3_8bpc_lsx: 4.9 (32.78x)
inv_txfm_add_8x32_identity_identity_4_8bpc_c: 170.2 ( 1.00x)
inv_txfm_add_8x32_identity_identity_4_8bpc_lsx: 6.5 (26.17x)
Change-Id: Iabda6efcd8a17d26a205f90757dfea85af48848f
f6ffdc90 -
1. Remove the identity8 code in the 4x8/8x8/8x16 series.
2. Modify the code of the dct_dct_8x32/32x32/64x64 functions.
3. Modify the identity4 code in the 4x4/4x8/8x4 series.
After these modifications, function performance improved by 20%.
Change-Id: I1bc2e0fb25e508faf9fc220333460a99be3f5e49
5de878a4 -
Relative speedup over C code: inv_txfm_add_8x16_adst_adst_0_8bpc_c: 208.1 inv_txfm_add_8x16_adst_adst_0_8bpc_lsx: 31.3 inv_txfm_add_8x16_adst_adst_1_8bpc_c: 208.4 inv_txfm_add_8x16_adst_adst_1_8bpc_lsx: 31.3 inv_txfm_add_8x16_adst_adst_2_8bpc_c: 208.1 inv_txfm_add_8x16_adst_adst_2_8bpc_lsx: 31.3 inv_txfm_add_8x16_adst_dct_0_8bpc_c: 204.0 inv_txfm_add_8x16_adst_dct_0_8bpc_lsx: 27.2 inv_txfm_add_8x16_adst_dct_1_8bpc_c: 204.0 inv_txfm_add_8x16_adst_dct_1_8bpc_lsx: 27.2 inv_txfm_add_8x16_adst_dct_2_8bpc_c: 204.0 inv_txfm_add_8x16_adst_dct_2_8bpc_lsx: 27.2 inv_txfm_add_8x16_adst_flipadst_0_8bpc_c: 207.9 inv_txfm_add_8x16_adst_flipadst_0_8bpc_lsx: 31.3 inv_txfm_add_8x16_adst_flipadst_1_8bpc_c: 208.3 inv_txfm_add_8x16_adst_flipadst_1_8bpc_lsx: 31.3 inv_txfm_add_8x16_adst_flipadst_2_8bpc_c: 208.6 inv_txfm_add_8x16_adst_flipadst_2_8bpc_lsx: 31.3 inv_txfm_add_8x16_adst_identity_0_8bpc_c: 146.6 inv_txfm_add_8x16_adst_identity_0_8bpc_lsx: 21.8 inv_txfm_add_8x16_adst_identity_1_8bpc_c: 146.6 inv_txfm_add_8x16_adst_identity_1_8bpc_lsx: 21.8 inv_txfm_add_8x16_adst_identity_2_8bpc_c: 146.6 inv_txfm_add_8x16_adst_identity_2_8bpc_lsx: 21.8 inv_txfm_add_8x16_dct_adst_0_8bpc_c: 204.8 inv_txfm_add_8x16_dct_adst_0_8bpc_lsx: 26.2 inv_txfm_add_8x16_dct_adst_1_8bpc_c: 204.8 inv_txfm_add_8x16_dct_adst_1_8bpc_lsx: 26.1 inv_txfm_add_8x16_dct_adst_2_8bpc_c: 204.8 inv_txfm_add_8x16_dct_adst_2_8bpc_lsx: 26.2 inv_txfm_add_8x16_dct_dct_0_8bpc_c: 23.1 inv_txfm_add_8x16_dct_dct_0_8bpc_lsx: 2.3 inv_txfm_add_8x16_dct_dct_1_8bpc_c: 200.8 inv_txfm_add_8x16_dct_dct_1_8bpc_lsx: 21.9 inv_txfm_add_8x16_dct_dct_2_8bpc_c: 200.7 inv_txfm_add_8x16_dct_dct_2_8bpc_lsx: 21.9 inv_txfm_add_8x16_dct_flipadst_0_8bpc_c: 204.6 inv_txfm_add_8x16_dct_flipadst_0_8bpc_lsx: 26.3 inv_txfm_add_8x16_dct_flipadst_1_8bpc_c: 204.6 inv_txfm_add_8x16_dct_flipadst_1_8bpc_lsx: 26.3 inv_txfm_add_8x16_dct_flipadst_2_8bpc_c: 204.6 inv_txfm_add_8x16_dct_flipadst_2_8bpc_lsx: 26.3 inv_txfm_add_8x16_dct_identity_0_8bpc_c: 143.2 
inv_txfm_add_8x16_dct_identity_0_8bpc_lsx: 16.7 inv_txfm_add_8x16_dct_identity_1_8bpc_c: 142.9 inv_txfm_add_8x16_dct_identity_1_8bpc_lsx: 16.7 inv_txfm_add_8x16_dct_identity_2_8bpc_c: 143.5 inv_txfm_add_8x16_dct_identity_2_8bpc_lsx: 16.7 inv_txfm_add_8x16_flipadst_adst_0_8bpc_c: 206.5 inv_txfm_add_8x16_flipadst_adst_0_8bpc_lsx: 31.3 inv_txfm_add_8x16_flipadst_adst_1_8bpc_c: 206.5 inv_txfm_add_8x16_flipadst_adst_1_8bpc_lsx: 31.3 inv_txfm_add_8x16_flipadst_adst_2_8bpc_c: 206.5 inv_txfm_add_8x16_flipadst_adst_2_8bpc_lsx: 31.3 inv_txfm_add_8x16_flipadst_dct_0_8bpc_c: 202.5 inv_txfm_add_8x16_flipadst_dct_0_8bpc_lsx: 26.8 inv_txfm_add_8x16_flipadst_dct_1_8bpc_c: 202.3 inv_txfm_add_8x16_flipadst_dct_1_8bpc_lsx: 26.8 inv_txfm_add_8x16_flipadst_dct_2_8bpc_c: 202.3 inv_txfm_add_8x16_flipadst_dct_2_8bpc_lsx: 26.8 inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_c: 206.3 inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_lsx: 31.3 inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_c: 206.3 inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_lsx: 31.3 inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_c: 206.3 inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_lsx: 31.3 inv_txfm_add_8x16_identity_adst_0_8bpc_c: 160.7 inv_txfm_add_8x16_identity_adst_0_8bpc_lsx: 21.8 inv_txfm_add_8x16_identity_adst_1_8bpc_c: 160.4 inv_txfm_add_8x16_identity_adst_1_8bpc_lsx: 21.8 inv_txfm_add_8x16_identity_adst_2_8bpc_c: 160.1 inv_txfm_add_8x16_identity_adst_2_8bpc_lsx: 21.8 inv_txfm_add_8x16_identity_dct_0_8bpc_c: 157.9 inv_txfm_add_8x16_identity_dct_0_8bpc_lsx: 17.7 inv_txfm_add_8x16_identity_dct_1_8bpc_c: 156.5 inv_txfm_add_8x16_identity_dct_1_8bpc_lsx: 17.7 inv_txfm_add_8x16_identity_dct_2_8bpc_c: 156.8 inv_txfm_add_8x16_identity_dct_2_8bpc_lsx: 17.7 inv_txfm_add_8x16_identity_flipadst_0_8bpc_c: 159.9 inv_txfm_add_8x16_identity_flipadst_0_8bpc_lsx: 21.8 inv_txfm_add_8x16_identity_flipadst_1_8bpc_c: 159.9 inv_txfm_add_8x16_identity_flipadst_1_8bpc_lsx: 21.8 inv_txfm_add_8x16_identity_flipadst_2_8bpc_c: 160.0 
inv_txfm_add_8x16_identity_flipadst_2_8bpc_lsx: 21.8 inv_txfm_add_8x16_identity_identity_0_8bpc_c: 98.3 inv_txfm_add_8x16_identity_identity_0_8bpc_lsx: 12.3 inv_txfm_add_8x16_identity_identity_1_8bpc_c: 98.0 inv_txfm_add_8x16_identity_identity_1_8bpc_lsx: 12.3 inv_txfm_add_8x16_identity_identity_2_8bpc_c: 98.1 inv_txfm_add_8x16_identity_identity_2_8bpc_lsx: 12.3 Change-Id: Ida8d71e4eff782b9f81e0ad426eaa078b68528cf
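The 8x16 listing above gives raw timings but omits the ratio column that the other entries carry. A minimal sketch of how that column could be recomputed from the text follows, assuming each SIMD entry is preceded by its matching `_c` baseline; the `speedups` helper and the regular expression are illustrative, and only two pairs from the listing are used as sample input.

```python
import re

# Matches entries such as "name_8bpc_lsx: 31.3" and splits off the ISA suffix.
ENTRY = re.compile(r"(\S+)_(c|lsx|lasx):\s+([0-9.]+)")


def speedups(text):
    """Return (function, isa, ratio) for each SIMD entry relative to its C entry."""
    baseline = {}
    results = []
    for name, isa, value in ENTRY.findall(text):
        timing = float(value)
        if isa == "c":
            baseline[name] = timing          # remember the C reference timing
        elif name in baseline:
            results.append((name, isa, baseline[name] / timing))
    return results


# Two timing pairs copied from the listing above.
sample = (
    "inv_txfm_add_8x16_adst_adst_0_8bpc_c: 208.1 "
    "inv_txfm_add_8x16_adst_adst_0_8bpc_lsx: 31.3 "
    "inv_txfm_add_8x16_dct_dct_0_8bpc_c: 23.1 "
    "inv_txfm_add_8x16_dct_dct_0_8bpc_lsx: 2.3"
)

for name, isa, ratio in speedups(sample):
    print(f"{name}_{isa}: ({ratio:5.2f}x)")
```

Since the timings are already rounded to one decimal place, the recomputed ratios may differ slightly from what checkasm would have printed.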
2fc65660 -
Relative speedup over C code:
inv_txfm_add_4x16_adst_adst_0_8bpc_c: 91.1
inv_txfm_add_4x16_adst_adst_0_8bpc_lsx: 18.2
inv_txfm_add_4x16_adst_adst_1_8bpc_c: 91.1
inv_txfm_add_4x16_adst_adst_1_8bpc_lsx: 18.2
inv_txfm_add_4x16_adst_adst_2_8bpc_c: 91.1
inv_txfm_add_4x16_adst_adst_2_8bpc_lsx: 18.2
inv_txfm_add_4x16_adst_dct_0_8bpc_c: 89.5
inv_txfm_add_4x16_adst_dct_0_8bpc_lsx: 14.3
inv_txfm_add_4x16_adst_dct_1_8bpc_c: 89.5
inv_txfm_add_4x16_adst_dct_1_8bpc_lsx: 14.3
inv_txfm_add_4x16_adst_dct_2_8bpc_c: 89.5
inv_txfm_add_4x16_adst_dct_2_8bpc_lsx: 14.3
inv_txfm_add_4x16_adst_flipadst_0_8bpc_c: 91.8
inv_txfm_add_4x16_adst_flipadst_0_8bpc_lsx: 18.2
inv_txfm_add_4x16_adst_flipadst_1_8bpc_c: 91.7
inv_txfm_add_4x16_adst_flipadst_1_8bpc_lsx: 18.2
inv_txfm_add_4x16_adst_flipadst_2_8bpc_c: 91.8
inv_txfm_add_4x16_adst_flipadst_2_8bpc_lsx: 18.2
inv_txfm_add_4x16_adst_identity_0_8bpc_c: 60.5
inv_txfm_add_4x16_adst_identity_0_8bpc_lsx: 6.3
inv_txfm_add_4x16_adst_identity_1_8bpc_c: 60.5
inv_txfm_add_4x16_adst_identity_1_8bpc_lsx: 6.3
inv_txfm_add_4x16_adst_identity_2_8bpc_c: 60.5
inv_txfm_add_4x16_adst_identity_2_8bpc_lsx: 6.3
inv_txfm_add_4x16_dct_adst_0_8bpc_c: 92.7
inv_txfm_add_4x16_dct_adst_0_8bpc_lsx: 18.4
inv_txfm_add_4x16_dct_adst_1_8bpc_c: 92.7
inv_txfm_add_4x16_dct_adst_1_8bpc_lsx: 18.4
inv_txfm_add_4x16_dct_adst_2_8bpc_c: 92.7
inv_txfm_add_4x16_dct_adst_2_8bpc_lsx: 18.4
inv_txfm_add_4x16_dct_dct_0_8bpc_c: 13.7
inv_txfm_add_4x16_dct_dct_0_8bpc_lsx: 1.9
inv_txfm_add_4x16_dct_dct_1_8bpc_c: 90.6
inv_txfm_add_4x16_dct_dct_1_8bpc_lsx: 14.5
inv_txfm_add_4x16_dct_dct_2_8bpc_c: 90.6
inv_txfm_add_4x16_dct_dct_2_8bpc_lsx: 14.5
inv_txfm_add_4x16_dct_flipadst_0_8bpc_c: 93.3
inv_txfm_add_4x16_dct_flipadst_0_8bpc_lsx: 18.6
inv_txfm_add_4x16_dct_flipadst_1_8bpc_c: 93.4
inv_txfm_add_4x16_dct_flipadst_1_8bpc_lsx: 18.6
inv_txfm_add_4x16_dct_flipadst_2_8bpc_c: 93.4
inv_txfm_add_4x16_dct_flipadst_2_8bpc_lsx: 18.6
inv_txfm_add_4x16_dct_identity_0_8bpc_c: 62.1
inv_txfm_add_4x16_dct_identity_0_8bpc_lsx: 6.5
inv_txfm_add_4x16_dct_identity_1_8bpc_c: 62.1
inv_txfm_add_4x16_dct_identity_1_8bpc_lsx: 6.5
inv_txfm_add_4x16_dct_identity_2_8bpc_c: 62.1
inv_txfm_add_4x16_dct_identity_2_8bpc_lsx: 6.5
inv_txfm_add_4x16_flipadst_adst_0_8bpc_c: 92.2
inv_txfm_add_4x16_flipadst_adst_0_8bpc_lsx: 18.1
inv_txfm_add_4x16_flipadst_adst_1_8bpc_c: 92.3
inv_txfm_add_4x16_flipadst_adst_1_8bpc_lsx: 18.1
inv_txfm_add_4x16_flipadst_adst_2_8bpc_c: 92.2
inv_txfm_add_4x16_flipadst_adst_2_8bpc_lsx: 18.1
inv_txfm_add_4x16_flipadst_dct_0_8bpc_c: 90.6
inv_txfm_add_4x16_flipadst_dct_0_8bpc_lsx: 14.3
inv_txfm_add_4x16_flipadst_dct_1_8bpc_c: 90.6
inv_txfm_add_4x16_flipadst_dct_1_8bpc_lsx: 14.3
inv_txfm_add_4x16_flipadst_dct_2_8bpc_c: 90.6
inv_txfm_add_4x16_flipadst_dct_2_8bpc_lsx: 14.3
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_c: 92.9
inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_lsx: 18.2
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_c: 92.9
inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_lsx: 18.2
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_c: 92.9
inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_lsx: 18.2
inv_txfm_add_4x16_flipadst_identity_0_8bpc_c: 61.8
inv_txfm_add_4x16_flipadst_identity_0_8bpc_lsx: 6.3
inv_txfm_add_4x16_flipadst_identity_1_8bpc_c: 61.8
inv_txfm_add_4x16_flipadst_identity_1_8bpc_lsx: 6.3
inv_txfm_add_4x16_flipadst_identity_2_8bpc_c: 61.8
inv_txfm_add_4x16_flipadst_identity_2_8bpc_lsx: 6.3
inv_txfm_add_4x16_identity_adst_0_8bpc_c: 83.1
inv_txfm_add_4x16_identity_adst_0_8bpc_lsx: 17.8
inv_txfm_add_4x16_identity_adst_1_8bpc_c: 83.0
inv_txfm_add_4x16_identity_adst_1_8bpc_lsx: 17.8
inv_txfm_add_4x16_identity_adst_2_8bpc_c: 83.0
inv_txfm_add_4x16_identity_adst_2_8bpc_lsx: 17.8
inv_txfm_add_4x16_identity_dct_0_8bpc_c: 81.4
inv_txfm_add_4x16_identity_dct_0_8bpc_lsx: 13.9
inv_txfm_add_4x16_identity_dct_1_8bpc_c: 81.4
inv_txfm_add_4x16_identity_dct_1_8bpc_lsx: 13.9
inv_txfm_add_4x16_identity_dct_2_8bpc_c: 81.4
inv_txfm_add_4x16_identity_dct_2_8bpc_lsx: 13.9
inv_txfm_add_4x16_identity_flipadst_0_8bpc_c: 84.1
inv_txfm_add_4x16_identity_flipadst_0_8bpc_lsx: 17.8
inv_txfm_add_4x16_identity_flipadst_1_8bpc_c: 84.0
inv_txfm_add_4x16_identity_flipadst_1_8bpc_lsx: 17.8
inv_txfm_add_4x16_identity_flipadst_2_8bpc_c: 83.9
inv_txfm_add_4x16_identity_flipadst_2_8bpc_lsx: 17.8
inv_txfm_add_4x16_identity_identity_0_8bpc_c: 52.4
inv_txfm_add_4x16_identity_identity_0_8bpc_lsx: 5.5
inv_txfm_add_4x16_identity_identity_1_8bpc_c: 52.4
inv_txfm_add_4x16_identity_identity_1_8bpc_lsx: 5.5
inv_txfm_add_4x16_identity_identity_2_8bpc_c: 52.4
inv_txfm_add_4x16_identity_identity_2_8bpc_lsx: 5.5

Change-Id: I36322071eeea45df9289f2b1d533ee937904aec2
643ae52b -
Relative speedup over C code:
inv_txfm_add_4x8_adst_adst_0_8bpc_c: 43.8
inv_txfm_add_4x8_adst_adst_0_8bpc_lsx: 8.6
inv_txfm_add_4x8_adst_adst_1_8bpc_c: 43.8
inv_txfm_add_4x8_adst_adst_1_8bpc_lsx: 8.6
inv_txfm_add_4x8_adst_dct_0_8bpc_c: 43.0
inv_txfm_add_4x8_adst_dct_0_8bpc_lsx: 6.5
inv_txfm_add_4x8_adst_dct_1_8bpc_c: 43.0
inv_txfm_add_4x8_adst_dct_1_8bpc_lsx: 6.5
inv_txfm_add_4x8_adst_flipadst_0_8bpc_c: 44.1
inv_txfm_add_4x8_adst_flipadst_0_8bpc_lsx: 8.8
inv_txfm_add_4x8_adst_flipadst_1_8bpc_c: 44.1
inv_txfm_add_4x8_adst_flipadst_1_8bpc_lsx: 8.8
inv_txfm_add_4x8_adst_identity_0_8bpc_c: 31.3
inv_txfm_add_4x8_adst_identity_0_8bpc_lsx: 2.9
inv_txfm_add_4x8_adst_identity_1_8bpc_c: 31.3
inv_txfm_add_4x8_adst_identity_1_8bpc_lsx: 2.9
inv_txfm_add_4x8_dct_adst_0_8bpc_c: 46.3
inv_txfm_add_4x8_dct_adst_0_8bpc_lsx: 8.8
inv_txfm_add_4x8_dct_adst_1_8bpc_c: 46.3
inv_txfm_add_4x8_dct_adst_1_8bpc_lsx: 8.8
inv_txfm_add_4x8_dct_dct_0_8bpc_c: 7.3
inv_txfm_add_4x8_dct_dct_0_8bpc_lsx: 1.5
inv_txfm_add_4x8_dct_dct_1_8bpc_c: 45.7
inv_txfm_add_4x8_dct_dct_1_8bpc_lsx: 6.7
inv_txfm_add_4x8_dct_flipadst_0_8bpc_c: 46.7
inv_txfm_add_4x8_dct_flipadst_0_8bpc_lsx: 8.8
inv_txfm_add_4x8_dct_flipadst_1_8bpc_c: 46.7
inv_txfm_add_4x8_dct_flipadst_1_8bpc_lsx: 8.8
inv_txfm_add_4x8_dct_identity_0_8bpc_c: 33.8
inv_txfm_add_4x8_dct_identity_0_8bpc_lsx: 2.9
inv_txfm_add_4x8_dct_identity_1_8bpc_c: 33.8
inv_txfm_add_4x8_dct_identity_1_8bpc_lsx: 2.9
inv_txfm_add_4x8_flipadst_adst_0_8bpc_c: 44.0
inv_txfm_add_4x8_flipadst_adst_0_8bpc_lsx: 8.6
inv_txfm_add_4x8_flipadst_adst_1_8bpc_c: 43.9
inv_txfm_add_4x8_flipadst_adst_1_8bpc_lsx: 8.6
inv_txfm_add_4x8_flipadst_dct_0_8bpc_c: 43.3
inv_txfm_add_4x8_flipadst_dct_0_8bpc_lsx: 6.5
inv_txfm_add_4x8_flipadst_dct_1_8bpc_c: 43.4
inv_txfm_add_4x8_flipadst_dct_1_8bpc_lsx: 6.5
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_c: 44.4
inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_lsx: 8.8
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_c: 44.4
inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_lsx: 8.8
inv_txfm_add_4x8_flipadst_identity_0_8bpc_c: 31.5
inv_txfm_add_4x8_flipadst_identity_0_8bpc_lsx: 2.9
inv_txfm_add_4x8_flipadst_identity_1_8bpc_c: 31.5
inv_txfm_add_4x8_flipadst_identity_1_8bpc_lsx: 2.9
inv_txfm_add_4x8_identity_adst_0_8bpc_c: 38.9
inv_txfm_add_4x8_identity_adst_0_8bpc_lsx: 8.2
inv_txfm_add_4x8_identity_adst_1_8bpc_c: 38.9
inv_txfm_add_4x8_identity_adst_1_8bpc_lsx: 8.2
inv_txfm_add_4x8_identity_dct_0_8bpc_c: 38.1
inv_txfm_add_4x8_identity_dct_0_8bpc_lsx: 6.1
inv_txfm_add_4x8_identity_dct_1_8bpc_c: 38.1
inv_txfm_add_4x8_identity_dct_1_8bpc_lsx: 6.1
inv_txfm_add_4x8_identity_flipadst_0_8bpc_c: 39.2
inv_txfm_add_4x8_identity_flipadst_0_8bpc_lsx: 8.3
inv_txfm_add_4x8_identity_flipadst_1_8bpc_c: 39.2
inv_txfm_add_4x8_identity_flipadst_1_8bpc_lsx: 8.3
inv_txfm_add_4x8_identity_identity_0_8bpc_c: 26.4
inv_txfm_add_4x8_identity_identity_0_8bpc_lsx: 2.4
inv_txfm_add_4x8_identity_identity_1_8bpc_c: 26.4
inv_txfm_add_4x8_identity_identity_1_8bpc_lsx: 2.4

Change-Id: Ibbaeca98118774a261cf55afd581196d93ac2004
d60d93a5 -
1. inv_txfm_add_dct_dct_16x32

Relative speedup over C code:
inv_txfm_add_16x32_dct_dct_0_8bpc_c: 63.4
inv_txfm_add_16x32_dct_dct_0_8bpc_lsx: 3.3
inv_txfm_add_16x32_dct_dct_1_8bpc_c: 687.0
inv_txfm_add_16x32_dct_dct_1_8bpc_lsx: 55.7
inv_txfm_add_16x32_dct_dct_2_8bpc_c: 686.4
inv_txfm_add_16x32_dct_dct_2_8bpc_lsx: 55.6
inv_txfm_add_16x32_dct_dct_3_8bpc_c: 686.4
inv_txfm_add_16x32_dct_dct_3_8bpc_lsx: 55.5
inv_txfm_add_16x32_dct_dct_4_8bpc_c: 686.4
inv_txfm_add_16x32_dct_dct_4_8bpc_lsx: 55.6

Change-Id: I9d22b8b3534b7ba17f6e85e42d08eb3165e2e8cb
74e0eeb5 -