    arm64: itx: Add NEON optimized inverse transforms · ef1ea008
    Martin Storsjö authored and Jean-Baptiste Kempf committed
    The speedup for most non-dc-only dct functions is around 9-12x
    over the C code generated by GCC 7.3.
    
    Relative speedups vs C for a few functions:
    
                                                  Cortex A53    A72    A73
    inv_txfm_add_4x4_dct_dct_0_8bpc_neon:               3.90   4.16   5.65
    inv_txfm_add_4x4_dct_dct_1_8bpc_neon:               7.20   8.05  11.19
    inv_txfm_add_8x8_dct_dct_0_8bpc_neon:               5.09   6.73   6.45
    inv_txfm_add_8x8_dct_dct_1_8bpc_neon:              12.18  10.80  13.05
    inv_txfm_add_16x16_dct_dct_0_8bpc_neon:             7.31   9.35  11.17
    inv_txfm_add_16x16_dct_dct_1_8bpc_neon:            14.36  13.06  15.93
    inv_txfm_add_16x16_dct_dct_2_8bpc_neon:            11.00  10.09  12.05
    inv_txfm_add_32x32_dct_dct_0_8bpc_neon:             4.41   5.40   5.77
    inv_txfm_add_32x32_dct_dct_1_8bpc_neon:            13.84  13.81  18.04
    inv_txfm_add_32x32_dct_dct_2_8bpc_neon:            11.75  11.87  15.22
    inv_txfm_add_32x32_dct_dct_3_8bpc_neon:            10.20  10.40  13.13
    inv_txfm_add_32x32_dct_dct_4_8bpc_neon:             9.01   9.21  11.56
    inv_txfm_add_64x64_dct_dct_0_8bpc_neon:             3.84   4.82   5.28
    inv_txfm_add_64x64_dct_dct_1_8bpc_neon:            14.40  12.69  16.71
    inv_txfm_add_64x64_dct_dct_4_8bpc_neon:            10.91   9.63  12.67
    
    Some of the special-cased identity_identity transforms for 32x32
    give insane speedups over the generic C code:
    
    inv_txfm_add_32x32_identity_identity_0_8bpc_neon: 225.26 238.11 247.07
    inv_txfm_add_32x32_identity_identity_1_8bpc_neon: 225.33 238.53 247.69
    inv_txfm_add_32x32_identity_identity_2_8bpc_neon:  59.60  61.94  64.63
    inv_txfm_add_32x32_identity_identity_3_8bpc_neon:  26.98  27.99  29.21
    inv_txfm_add_32x32_identity_identity_4_8bpc_neon:  15.08  15.93  16.56