    arm64: itx16: Use usqadd to avoid separate clamping of negative values · 6f9f3391
    Martin Storsjö authored
    Before:                                Cortex A53     A72      A73
    inv_txfm_add_4x4_dct_dct_0_10bpc_neon:       40.7    23.0     24.0
    inv_txfm_add_4x4_dct_dct_1_10bpc_neon:      116.0    71.5     78.2
    inv_txfm_add_8x8_dct_dct_0_10bpc_neon:       85.7    50.7     53.8
    inv_txfm_add_8x8_dct_dct_1_10bpc_neon:      287.0   203.5    215.2
    inv_txfm_add_16x16_dct_dct_0_10bpc_neon:    255.7   129.1    140.4
    inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   1401.4  1026.7   1039.2
    inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   1913.2  1407.3   1479.6
    After:
    inv_txfm_add_4x4_dct_dct_0_10bpc_neon:       38.7    21.5     22.2
    inv_txfm_add_4x4_dct_dct_1_10bpc_neon:      116.0    71.3     77.2
    inv_txfm_add_8x8_dct_dct_0_10bpc_neon:       76.7    44.7     43.5
    inv_txfm_add_8x8_dct_dct_1_10bpc_neon:      278.0   203.0    203.9
    inv_txfm_add_16x16_dct_dct_0_10bpc_neon:    236.9   106.2    116.2
    inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   1368.7   999.7   1008.4
    inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   1880.5  1381.2   1459.4
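    The gain comes from the saturation behaviour of USQADD (unsigned saturating accumulate of signed value). Adding a signed residual to an unsigned pixel saturates to the unsigned range, so a negative sum already comes out as 0 and only the upper clamp against the maximum pixel value remains. The C sketch below is a scalar model of one 16-bit lane, not dav1d's actual code. The function names and the `bitdepth_max` parameter are hypothetical, and the two-step variant only illustrates the general add-then-clamp-at-both-ends pattern, not the exact instruction sequence this commit replaced.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of AArch64 USQADD: add a signed value
   to an unsigned accumulator and saturate to the range [0, UINT16_MAX]. */
static uint16_t usqadd_u16(uint16_t acc, int16_t val)
{
    int32_t sum = (int32_t)acc + (int32_t)val;
    if (sum < 0)
        return 0;
    if (sum > UINT16_MAX)
        return UINT16_MAX;
    return (uint16_t)sum;
}

/* Hypothetical two-step reference: plain add, then clamp against 0 and
   against the maximum pixel value as separate operations. */
static uint16_t add_clamp_two_step(uint16_t px, int16_t res, int bitdepth_max)
{
    int32_t sum = (int32_t)px + res;
    if (sum < 0)
        sum = 0;            /* separate clamp of negative values */
    if (sum > bitdepth_max)
        sum = bitdepth_max; /* clamp to the pixel maximum */
    return (uint16_t)sum;
}

/* USQADD-style variant: negative sums already saturate to 0, so only the
   upper clamp (an unsigned min) is left. */
static uint16_t add_clamp_usqadd(uint16_t px, int16_t res, int bitdepth_max)
{
    uint16_t sum = usqadd_u16(px, res);
    return sum < bitdepth_max ? sum : (uint16_t)bitdepth_max;
}

/* Returns 1 if both variants agree for every pixel in [0, bitdepth_max]
   and every int16_t residual, 0 otherwise. */
static int agree_for_bitdepth(int bitdepth_max)
{
    for (int px = 0; px <= bitdepth_max; px++)
        for (int res = INT16_MIN; res <= INT16_MAX; res++)
            if (add_clamp_two_step((uint16_t)px, (int16_t)res, bitdepth_max) !=
                add_clamp_usqadd((uint16_t)px, (int16_t)res, bitdepth_max))
                return 0;
    return 1;
}
```

    agree_for_bitdepth(1023) exhaustively compares both variants over every 10-bit pixel value and every 16-bit residual. They agree because min(clamp(s, 0, 65535), max) equals clamp(s, 0, max) whenever max <= 65535, which is what lets the separate clamp against 0 be dropped.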