arm64: itx16: Use usqadd to avoid separate clamping of negative values

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-usq into master

Before:                                Cortex A53     A72      A73
inv_txfm_add_4x4_dct_dct_0_10bpc_neon:       40.7    23.0     24.0
inv_txfm_add_4x4_dct_dct_1_10bpc_neon:      116.0    71.5     78.2
inv_txfm_add_8x8_dct_dct_0_10bpc_neon:       85.7    50.7     53.8
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:      287.0   203.5    215.2
inv_txfm_add_16x16_dct_dct_0_10bpc_neon:    255.7   129.1    140.4
inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   1401.4  1026.7   1039.2
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   1913.2  1407.3   1479.6
After:
inv_txfm_add_4x4_dct_dct_0_10bpc_neon:       38.7    21.5     22.2
inv_txfm_add_4x4_dct_dct_1_10bpc_neon:      116.0    71.3     77.2
inv_txfm_add_8x8_dct_dct_0_10bpc_neon:       76.7    44.7     43.5
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:      278.0   203.0    203.9
inv_txfm_add_16x16_dct_dct_0_10bpc_neon:    236.9   106.2    116.2
inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   1368.7   999.7   1008.4
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   1880.5  1381.2   1459.4
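
For readers unfamiliar with the instruction named in the title, here is a minimal scalar sketch in C of what a single 16-bit lane of `usqadd` computes, and why it can make a separate clamp of negative values unnecessary. It assumes the reconstruction step adds a signed residual to an unsigned pixel and clamps the result to `[0, pixel_max]`. All function names here (`usqadd_u16`, `add_clamp_separate`, `add_clamp_usqadd`, `check_equivalence`) are hypothetical and for illustration only. This is not the assembly from the patch, and the exact instruction sequences before and after the change are not shown on this page.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of USQADD: an unsigned accumulator plus a
 * signed addend, with the result saturated to the unsigned range [0, 65535]. */
static uint16_t usqadd_u16(uint16_t acc, int16_t addend)
{
    int32_t sum = (int32_t)acc + (int32_t)addend;
    if (sum < 0)
        return 0;
    if (sum > UINT16_MAX)
        return UINT16_MAX;
    return (uint16_t)sum;
}

/* Assumed "before" shape: plain add, then two explicit clamps
 * (one against 0, one against the bit-depth maximum). */
static uint16_t add_clamp_separate(uint16_t pixel, int16_t residual,
                                   uint16_t pixel_max)
{
    int32_t sum = (int32_t)pixel + (int32_t)residual;
    if (sum < 0)
        sum = 0;            /* separate clamp of negative values */
    if (sum > pixel_max)
        sum = pixel_max;    /* clamp to the bit-depth maximum */
    return (uint16_t)sum;
}

/* Assumed "after" shape: the saturating add already floors the sum at 0,
 * so only the upper clamp remains. */
static uint16_t add_clamp_usqadd(uint16_t pixel, int16_t residual,
                                 uint16_t pixel_max)
{
    uint16_t sum = usqadd_u16(pixel, residual);
    return sum < pixel_max ? sum : pixel_max;
}

/* Exhaustively checks that both shapes agree for every valid pixel value
 * and every possible 16-bit residual. */
static void check_equivalence(uint16_t pixel_max)
{
    for (int pixel = 0; pixel <= pixel_max; pixel++) {
        for (int residual = INT16_MIN; residual <= INT16_MAX; residual++) {
            assert(add_clamp_separate((uint16_t)pixel, (int16_t)residual, pixel_max) ==
                   add_clamp_usqadd((uint16_t)pixel, (int16_t)residual, pixel_max));
        }
    }
}
```

Calling `check_equivalence(1023)` exercises the 10 bpc case that the benchmarks above measure. In this model the saving is one clamp per output vector, which is consistent with the larger relative gains appearing in the `_0` (DC-only) variants, where the add-and-store step makes up more of the total work.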