x86: add AVX512-IceLake implementation of HBD 64x32 DCT^2
1 unresolved thread
inv_txfm_add_64x32_dct_dct_0_10bpc_c: 1760.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_0_10bpc_sse4: 271.1 ( 6.49x)
inv_txfm_add_64x32_dct_dct_0_10bpc_avx2: 121.3 (14.52x)
inv_txfm_add_64x32_dct_dct_0_10bpc_avx512icl: 116.3 (15.14x)
inv_txfm_add_64x32_dct_dct_1_10bpc_c: 66507.4 ( 1.00x)
inv_txfm_add_64x32_dct_dct_1_10bpc_sse4: 3712.4 (17.91x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx2: 1830.5 (36.33x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx512icl: 805.4 (82.58x)
inv_txfm_add_64x32_dct_dct_2_10bpc_c: 66491.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_2_10bpc_sse4: 5325.3 (12.49x)
inv_txfm_add_64x32_dct_dct_2_10bpc_avx2: 2578.5 (25.79x)
inv_txfm_add_64x32_dct_dct_2_10bpc_avx512icl: 1394.5 (47.68x)
inv_txfm_add_64x32_dct_dct_3_10bpc_c: 66490.2 ( 1.00x)
inv_txfm_add_64x32_dct_dct_3_10bpc_sse4: 6418.5 (10.36x)
inv_txfm_add_64x32_dct_dct_3_10bpc_avx2: 3305.6 (20.11x)
inv_txfm_add_64x32_dct_dct_3_10bpc_avx512icl: 2571.5 (25.86x)
inv_txfm_add_64x32_dct_dct_4_10bpc_c: 66508.6 ( 1.00x)
inv_txfm_add_64x32_dct_dct_4_10bpc_sse4: 8671.2 ( 7.67x)
inv_txfm_add_64x32_dct_dct_4_10bpc_avx2: 4054.2 (16.40x)
inv_txfm_add_64x32_dct_dct_4_10bpc_avx512icl: 2691.6 (24.71x)
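The speedup column above is relative to the C reference only. To compare the new AVX512-IceLake path directly against AVX2, the timings can be divided pairwise. The following is a minimal sketch, not part of this merge request: it assumes the `name_isa: time (ratio)` line layout shown above, and the helper names `parse` and `speedup` are hypothetical. Only the `_1_` lines are copied in for brevity.

```python
import re

# Lines copied verbatim from the results above (the "_1_" variant only).
CHECKASM_OUTPUT = """\
inv_txfm_add_64x32_dct_dct_1_10bpc_c: 66507.4 ( 1.00x)
inv_txfm_add_64x32_dct_dct_1_10bpc_sse4: 3712.4 (17.91x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx2: 1830.5 (36.33x)
inv_txfm_add_64x32_dct_dct_1_10bpc_avx512icl: 805.4 (82.58x)
"""

# Assumed layout: <function name>_<isa suffix>: <time> (<ratio>x)
LINE_RE = re.compile(
    r"^(?P<name>\w+)_(?P<isa>c|sse4|avx2|avx512icl):\s+(?P<time>[\d.]+)"
)


def parse(text):
    """Return a dict mapping the ISA suffix to its reported time."""
    timings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            timings[match.group("isa")] = float(match.group("time"))
    return timings


def speedup(timings, baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return timings[baseline] / timings[candidate]


if __name__ == "__main__":
    timings = parse(CHECKASM_OUTPUT)
    print(f"avx512icl vs avx2: {speedup(timings, 'avx2', 'avx512icl'):.2f}x")
    print(f"avx512icl vs c:    {speedup(timings, 'c', 'avx512icl'):.2f}x")
```

For the `_1_` variant this gives roughly 2.27x over AVX2, and the ratio against C reproduces the 82.58x reported above.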
requested review from @gramner
assigned to @rbultje
mentioned in issue #316
    call m(inv_txfm_add_dct_dct_32x32_10bpc).pass2_fast2_start
    mov                 r7d, 16*4
    mov                  r4, dstq
    pxor                m12, m12
    call m(inv_txfm_add_dct_dct_32x32_10bpc).pass2_end
    lea                dstq, [r4+64]
    mova                 m0, [rsp+16*mmsize]
    mova                 m1, [rsp+17*mmsize]
    mova                 m2, [rsp+18*mmsize]
    mova                 m3, [rsp+19*mmsize]
    mova                 m4, [rsp+20*mmsize]
    mova                 m5, [rsp+21*mmsize]
    mova                 m6, [rsp+22*mmsize]
    mova                 m7, [rsp+23*mmsize]
    lea                  r5, [o_base]
    vpbroadcastd        m13, [o(pd_2048)]
- Resolved by Ronald S. Bultje
added 1 commit
- 68d7a76d - x86: add AVX512-IceLake implementation of HBD 64x32 DCT^2
changed milestone to %1.2.0
added performance x86 labels