
x86: itx: Port 10-bit 4x4 transforms to SSE4

Matthias Dressel requested to merge another/dav1d:itxfm-4x4 into master
                                                 64-bit  32-bit
inv_txfm_add_4x4_adst_adst_0_10bpc_c:            257.0   346.3
inv_txfm_add_4x4_adst_adst_0_10bpc_sse4:          47.1    51.7
inv_txfm_add_4x4_adst_adst_0_10bpc_avx2:          57.4
inv_txfm_add_4x4_adst_adst_1_10bpc_c:            259.8   345.6
inv_txfm_add_4x4_adst_adst_1_10bpc_sse4:          47.1    52.0
inv_txfm_add_4x4_adst_adst_1_10bpc_avx2:          56.9
inv_txfm_add_4x4_adst_dct_0_10bpc_c:             284.6   369.9
inv_txfm_add_4x4_adst_dct_0_10bpc_sse4:           42.2    46.0
inv_txfm_add_4x4_adst_dct_0_10bpc_avx2:           51.9
inv_txfm_add_4x4_adst_dct_1_10bpc_c:             285.2   369.8
inv_txfm_add_4x4_adst_dct_1_10bpc_sse4:           42.4    45.9
inv_txfm_add_4x4_adst_dct_1_10bpc_avx2:           51.9
inv_txfm_add_4x4_adst_flipadst_0_10bpc_c:        262.9   345.0
inv_txfm_add_4x4_adst_flipadst_0_10bpc_sse4:      46.8    50.1
inv_txfm_add_4x4_adst_flipadst_0_10bpc_avx2:      57.0
inv_txfm_add_4x4_adst_flipadst_1_10bpc_c:        262.1   345.6
inv_txfm_add_4x4_adst_flipadst_1_10bpc_sse4:      46.8    50.3
inv_txfm_add_4x4_adst_flipadst_1_10bpc_avx2:      57.1
inv_txfm_add_4x4_adst_identity_0_10bpc_c:        225.6   302.9
inv_txfm_add_4x4_adst_identity_0_10bpc_sse4:      38.0    42.3
inv_txfm_add_4x4_adst_identity_0_10bpc_avx2:      41.4
inv_txfm_add_4x4_adst_identity_1_10bpc_c:        225.7   303.1
inv_txfm_add_4x4_adst_identity_1_10bpc_sse4:      37.8    42.3
inv_txfm_add_4x4_adst_identity_1_10bpc_avx2:      41.4
inv_txfm_add_4x4_dct_adst_0_10bpc_c:             274.6   378.0
inv_txfm_add_4x4_dct_adst_0_10bpc_sse4:           44.8    48.5
inv_txfm_add_4x4_dct_adst_0_10bpc_avx2:           50.7
inv_txfm_add_4x4_dct_adst_1_10bpc_c:             274.0   377.4
inv_txfm_add_4x4_dct_adst_1_10bpc_sse4:           44.6    48.6
inv_txfm_add_4x4_dct_adst_1_10bpc_avx2:           51.0
inv_txfm_add_4x4_dct_dct_0_10bpc_c:               39.2    50.6
inv_txfm_add_4x4_dct_dct_0_10bpc_sse4:            29.1    33.8
inv_txfm_add_4x4_dct_dct_0_10bpc_avx2:            29.3
inv_txfm_add_4x4_dct_dct_1_10bpc_c:              300.6   399.0
inv_txfm_add_4x4_dct_dct_1_10bpc_sse4:            39.7    44.3
inv_txfm_add_4x4_dct_dct_1_10bpc_avx2:            48.6
inv_txfm_add_4x4_dct_flipadst_0_10bpc_c:         278.6   377.8
inv_txfm_add_4x4_dct_flipadst_0_10bpc_sse4:       45.3    49.6
inv_txfm_add_4x4_dct_flipadst_0_10bpc_avx2:       50.2
inv_txfm_add_4x4_dct_flipadst_1_10bpc_c:         277.1   378.3
inv_txfm_add_4x4_dct_flipadst_1_10bpc_sse4:       45.0    49.7
inv_txfm_add_4x4_dct_flipadst_1_10bpc_avx2:       50.2
inv_txfm_add_4x4_dct_identity_0_10bpc_c:         246.9   335.8
inv_txfm_add_4x4_dct_identity_0_10bpc_sse4:       37.1    41.7
inv_txfm_add_4x4_dct_identity_0_10bpc_avx2:       37.4
inv_txfm_add_4x4_dct_identity_1_10bpc_c:         247.2   336.2
inv_txfm_add_4x4_dct_identity_1_10bpc_sse4:       37.1    41.6
inv_txfm_add_4x4_dct_identity_1_10bpc_avx2:       37.3
inv_txfm_add_4x4_flipadst_adst_0_10bpc_c:        259.4   351.7
inv_txfm_add_4x4_flipadst_adst_0_10bpc_sse4:      47.1    51.8
inv_txfm_add_4x4_flipadst_adst_0_10bpc_avx2:      57.9
inv_txfm_add_4x4_flipadst_adst_1_10bpc_c:        258.7   350.8
inv_txfm_add_4x4_flipadst_adst_1_10bpc_sse4:      47.1    51.8
inv_txfm_add_4x4_flipadst_adst_1_10bpc_avx2:      57.4
inv_txfm_add_4x4_flipadst_dct_0_10bpc_c:         282.3   375.4
inv_txfm_add_4x4_flipadst_dct_0_10bpc_sse4:       42.2    45.8
inv_txfm_add_4x4_flipadst_dct_0_10bpc_avx2:       52.5
inv_txfm_add_4x4_flipadst_dct_1_10bpc_c:         283.0   375.8
inv_txfm_add_4x4_flipadst_dct_1_10bpc_sse4:       42.5    45.9
inv_txfm_add_4x4_flipadst_dct_1_10bpc_avx2:       52.4
inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_c:    258.8   356.1
inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_sse4:  47.3    50.1
inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_avx2:  57.4
inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_c:    259.0   355.3
inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_sse4:  47.8    50.2
inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_avx2:  57.4
inv_txfm_add_4x4_flipadst_identity_0_10bpc_c:    228.6   309.4
inv_txfm_add_4x4_flipadst_identity_0_10bpc_sse4:  37.8    42.0
inv_txfm_add_4x4_flipadst_identity_0_10bpc_avx2:  41.4
inv_txfm_add_4x4_flipadst_identity_1_10bpc_c:    229.1   309.6
inv_txfm_add_4x4_flipadst_identity_1_10bpc_sse4:  37.9    42.2
inv_txfm_add_4x4_flipadst_identity_1_10bpc_avx2:  41.3
inv_txfm_add_4x4_identity_adst_0_10bpc_c:        200.8   275.8
inv_txfm_add_4x4_identity_adst_0_10bpc_sse4:      39.0    43.9
inv_txfm_add_4x4_identity_adst_0_10bpc_avx2:      47.4
inv_txfm_add_4x4_identity_adst_1_10bpc_c:        200.8   276.5
inv_txfm_add_4x4_identity_adst_1_10bpc_sse4:      39.0    44.0
inv_txfm_add_4x4_identity_adst_1_10bpc_avx2:      47.2
inv_txfm_add_4x4_identity_dct_0_10bpc_c:         226.4   300.3
inv_txfm_add_4x4_identity_dct_0_10bpc_sse4:       36.9    41.7
inv_txfm_add_4x4_identity_dct_0_10bpc_avx2:       42.8
inv_txfm_add_4x4_identity_dct_1_10bpc_c:         229.0   300.6
inv_txfm_add_4x4_identity_dct_1_10bpc_sse4:       36.8    41.6
inv_txfm_add_4x4_identity_dct_1_10bpc_avx2:       42.7
inv_txfm_add_4x4_identity_flipadst_0_10bpc_c:    202.6   278.9
inv_txfm_add_4x4_identity_flipadst_0_10bpc_sse4:  39.2    43.7
inv_txfm_add_4x4_identity_flipadst_0_10bpc_avx2:  47.1
inv_txfm_add_4x4_identity_flipadst_1_10bpc_c:    202.6   279.3
inv_txfm_add_4x4_identity_flipadst_1_10bpc_sse4:  39.2    43.8
inv_txfm_add_4x4_identity_flipadst_1_10bpc_avx2:  47.0
inv_txfm_add_4x4_identity_identity_0_10bpc_c:    168.7   235.9
inv_txfm_add_4x4_identity_identity_0_10bpc_sse4:  31.7    37.6
inv_txfm_add_4x4_identity_identity_0_10bpc_avx2:  33.9
inv_txfm_add_4x4_identity_identity_1_10bpc_c:    169.1   235.7
inv_txfm_add_4x4_identity_identity_1_10bpc_sse4:  31.7    37.4
inv_txfm_add_4x4_identity_identity_1_10bpc_avx2:  33.8
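The sketch below turns a few of the 64-bit timings into speedup ratios, which makes the table easier to read. It computes C time divided by SSE4 time, and AVX2 time divided by SSE4 time. The choice of rows and the names `TIMINGS_64`, `speedup` and `summarize` are illustrative and not part of the merge request. The numbers are copied verbatim from the table above. Ratios are unitless, so the sketch makes no assumption about the benchmark's time unit.

```python
# A subset of the 64-bit column of the benchmark table, copied verbatim.
# Keys are the transform pair plus the numeric suffix from the function name.
TIMINGS_64 = {
    "adst_adst_0": {"c": 257.0, "sse4": 47.1, "avx2": 57.4},
    "dct_dct_0": {"c": 39.2, "sse4": 29.1, "avx2": 29.3},
    "dct_dct_1": {"c": 300.6, "sse4": 39.7, "avx2": 48.6},
    "identity_identity_0": {"c": 168.7, "sse4": 31.7, "avx2": 33.9},
}


def speedup(reference: float, optimized: float) -> float:
    """Return how many times faster `optimized` is than `reference`."""
    return reference / optimized


def summarize(timings):
    """Return (name, SSE4-vs-C ratio, SSE4-vs-AVX2 ratio) for each row."""
    rows = []
    for name, t in timings.items():
        rows.append((name, speedup(t["c"], t["sse4"]), speedup(t["avx2"], t["sse4"])))
    return rows


if __name__ == "__main__":
    for name, vs_c, vs_avx2 in summarize(TIMINGS_64):
        print(f"{name:22s} SSE4 vs C: {vs_c:5.2f}x  SSE4 vs AVX2: {vs_avx2:4.2f}x")
```

For example, `speedup(257.0, 47.1)` is about 5.46. For every row in this subset, the SSE4-vs-AVX2 ratio is above 1, which means the SSE4 timing is lower than the AVX2 timing at 4x4.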