    loongarch: Add some optimized itx functions for 8bpc · f398bf96
    Hecai Yuan authored and committed
    1. inv_txfm_add_dct_dct_32x16_8bpc_lsx
    2. inv_txfm_add_dct_dct_32x8_8bpc_lsx
    3. inv_txfm_add_dct_dct_64x32_8bpc_lsx
    4. inv_txfm_add_adst_flipadst_16x16_8bpc_lsx
    5. inv_txfm_add_flipadst_adst_16x16_8bpc_lsx
    6. inv_txfm_add_adst_adst_16x16_8bpc_lasx
    
    Relative speedup over C code:
    
    inv_txfm_add_32x16_dct_dct_0_8bpc_c:                 78.4 ( 1.00x)
    inv_txfm_add_32x16_dct_dct_0_8bpc_lsx:                5.7 (13.81x)
    inv_txfm_add_32x16_dct_dct_1_8bpc_c:                710.1 ( 1.00x)
    inv_txfm_add_32x16_dct_dct_1_8bpc_lsx:              102.9 ( 6.90x)
    inv_txfm_add_32x16_dct_dct_2_8bpc_c:                918.0 ( 1.00x)
    inv_txfm_add_32x16_dct_dct_2_8bpc_lsx:              103.2 ( 8.90x)
    inv_txfm_add_32x16_dct_dct_3_8bpc_c:                914.3 ( 1.00x)
    inv_txfm_add_32x16_dct_dct_3_8bpc_lsx:              103.2 ( 8.86x)
    inv_txfm_add_32x16_dct_dct_4_8bpc_c:                929.8 ( 1.00x)
    inv_txfm_add_32x16_dct_dct_4_8bpc_lsx:              102.9 ( 9.03x)
    
    inv_txfm_add_32x8_dct_dct_0_8bpc_c:                  39.6 ( 1.00x)
    inv_txfm_add_32x8_dct_dct_0_8bpc_lsx:                 3.0 (13.10x)
    inv_txfm_add_32x8_dct_dct_1_8bpc_c:                 431.6 ( 1.00x)
    inv_txfm_add_32x8_dct_dct_1_8bpc_lsx:                42.6 (10.13x)
    inv_txfm_add_32x8_dct_dct_2_8bpc_c:                 431.5 ( 1.00x)
    inv_txfm_add_32x8_dct_dct_2_8bpc_lsx:                42.6 (10.13x)
    inv_txfm_add_32x8_dct_dct_3_8bpc_c:                 432.0 ( 1.00x)
    inv_txfm_add_32x8_dct_dct_3_8bpc_lsx:                42.6 (10.14x)
    inv_txfm_add_32x8_dct_dct_4_8bpc_c:                 431.3 ( 1.00x)
    inv_txfm_add_32x8_dct_dct_4_8bpc_lsx:                42.6 (10.13x)
    
    inv_txfm_add_64x32_dct_dct_0_8bpc_c:                304.3 ( 1.00x)
    inv_txfm_add_64x32_dct_dct_0_8bpc_lsx:               20.3 (15.01x)
    inv_txfm_add_64x32_dct_dct_1_8bpc_c:               2743.1 ( 1.00x)
    inv_txfm_add_64x32_dct_dct_1_8bpc_lsx:              270.9 (10.13x)
    inv_txfm_add_64x32_dct_dct_2_8bpc_c:               3197.1 ( 1.00x)
    inv_txfm_add_64x32_dct_dct_2_8bpc_lsx:              327.7 ( 9.76x)
    inv_txfm_add_64x32_dct_dct_3_8bpc_c:               3638.3 ( 1.00x)
    inv_txfm_add_64x32_dct_dct_3_8bpc_lsx:              383.7 ( 9.48x)
    inv_txfm_add_64x32_dct_dct_4_8bpc_c:               4084.5 ( 1.00x)
    inv_txfm_add_64x32_dct_dct_4_8bpc_lsx:              441.7 ( 9.25x)
    
    inv_txfm_add_16x16_adst_flipadst_0_8bpc_c:          277.3 ( 1.00x)
    inv_txfm_add_16x16_adst_flipadst_0_8bpc_lsx:         58.7 ( 4.72x)
    inv_txfm_add_16x16_adst_flipadst_1_8bpc_c:          358.1 ( 1.00x)
    inv_txfm_add_16x16_adst_flipadst_1_8bpc_lsx:         58.7 ( 6.10x)
    inv_txfm_add_16x16_adst_flipadst_2_8bpc_c:          449.3 ( 1.00x)
    inv_txfm_add_16x16_adst_flipadst_2_8bpc_lsx:         58.7 ( 7.65x)
    
    inv_txfm_add_16x16_flipadst_adst_0_8bpc_c:          277.2 ( 1.00x)
    inv_txfm_add_16x16_flipadst_adst_0_8bpc_lsx:         58.7 ( 4.72x)
    inv_txfm_add_16x16_flipadst_adst_1_8bpc_c:          358.7 ( 1.00x)
    inv_txfm_add_16x16_flipadst_adst_1_8bpc_lsx:         58.7 ( 6.11x)
    inv_txfm_add_16x16_flipadst_adst_2_8bpc_c:          450.4 ( 1.00x)
    inv_txfm_add_16x16_flipadst_adst_2_8bpc_lsx:         58.7 ( 7.67x)
    
    inv_txfm_add_16x16_adst_adst_0_8bpc_c:              253.4 ( 1.00x)
    inv_txfm_add_16x16_adst_adst_0_8bpc_lasx:            23.1 (10.98x)
    inv_txfm_add_16x16_adst_adst_1_8bpc_c:              325.2 ( 1.00x)
    inv_txfm_add_16x16_adst_adst_1_8bpc_lasx:            23.1 (14.08x)
    inv_txfm_add_16x16_adst_adst_2_8bpc_c:              405.9 ( 1.00x)
    inv_txfm_add_16x16_adst_adst_2_8bpc_lasx:            23.1 (17.56x)
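
The speedup column above is the C timing divided by the SIMD timing for the same function. The following is a minimal sketch for recomputing it from lines in this format. The helper names `parse_results` and `speedups` are hypothetical and are not part of dav1d or checkasm. Recomputed ratios may differ from the printed ones in the last digit, presumably because the printed timings are already rounded.

```python
import re

# Matches one result line of the form "<name>_<variant>: <time> (<ratio>x)".
LINE_RE = re.compile(r"^\s*(\S+)_(c|lsx|lasx):\s+([\d.]+)\s+\(\s*([\d.]+)x\)\s*$")


def parse_results(text):
    """Return {base_name: {variant: time}} for every line that matches."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            base, variant, time = m.group(1), m.group(2), float(m.group(3))
            results.setdefault(base, {})[variant] = time
    return results


def speedups(results):
    """Return {(base_name, variant): c_time / variant_time} for SIMD variants."""
    out = {}
    for base, times in results.items():
        c_time = times.get("c")
        if c_time is None:
            continue  # no C baseline to compare against
        for variant, t in times.items():
            if variant != "c" and t > 0:
                out[(base, variant)] = c_time / t
    return out


# Sample lines copied from the results above.
SAMPLE = """\
inv_txfm_add_32x16_dct_dct_1_8bpc_c:                710.1 ( 1.00x)
inv_txfm_add_32x16_dct_dct_1_8bpc_lsx:              102.9 ( 6.90x)
inv_txfm_add_16x16_adst_adst_2_8bpc_c:              405.9 ( 1.00x)
inv_txfm_add_16x16_adst_adst_2_8bpc_lasx:            23.1 (17.56x)
"""

if __name__ == "__main__":
    for (name, variant), ratio in sorted(speedups(parse_results(SAMPLE)).items()):
        print(f"{name} [{variant}]: {ratio:.2f}x")
```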
    
    Change-Id: Iaa5419a830c3308e2c4c9ac5b3699c3a971301ed