
riscv64/mc: Add 8bpc RVV blend{,_h/v} functions

Nathan E. Egge requested to merge unlord/dav1d:blend into master

Kendryte K230

blend_w4_8bpc_c:          204.8 ( 1.00x)
blend_w4_8bpc_rvv:         59.8 ( 3.42x)
blend_w8_8bpc_c:          608.9 ( 1.00x)
blend_w8_8bpc_rvv:         87.2 ( 6.98x)
blend_w16_8bpc_c:        2362.4 ( 1.00x)
blend_w16_8bpc_rvv:       225.2 (10.49x)
blend_w32_8bpc_c:        5990.4 ( 1.00x)
blend_w32_8bpc_rvv:       518.3 (11.56x)

blend_h_w2_8bpc_c:        165.9 ( 1.00x)
blend_h_w2_8bpc_rvv:       83.8 ( 1.98x)
blend_h_w4_8bpc_c:        295.2 ( 1.00x)
blend_h_w4_8bpc_rvv:       83.8 ( 3.52x)
blend_h_w8_8bpc_c:        557.9 ( 1.00x)
blend_h_w8_8bpc_rvv:       92.5 ( 6.03x)
blend_h_w16_8bpc_c:      1078.8 ( 1.00x)
blend_h_w16_8bpc_rvv:     117.3 ( 9.19x)
blend_h_w32_8bpc_c:      2117.8 ( 1.00x)
blend_h_w32_8bpc_rvv:     200.5 (10.57x)
blend_h_w64_8bpc_c:      4194.7 ( 1.00x)
blend_h_w64_8bpc_rvv:     363.2 (11.55x)
blend_h_w128_8bpc_c:    10271.4 ( 1.00x)
blend_h_w128_8bpc_rvv:    844.5 (12.16x)

blend_v_w2_8bpc_c:        221.4 ( 1.00x)
blend_v_w2_8bpc_rvv:      147.7 ( 1.50x)
blend_v_w4_8bpc_c:        945.3 ( 1.00x)
blend_v_w4_8bpc_rvv:      243.3 ( 3.89x)
blend_v_w8_8bpc_c:       1786.9 ( 1.00x)
blend_v_w8_8bpc_rvv:      256.1 ( 6.98x)
blend_v_w16_8bpc_c:      3472.1 ( 1.00x)
blend_v_w16_8bpc_rvv:     351.1 ( 9.89x)
blend_v_w32_8bpc_c:      6832.1 ( 1.00x)
blend_v_w32_8bpc_rvv:     635.4 (10.75x)

SpacemiT K1

blend_w4_8bpc_c:          201.6 ( 1.00x)
blend_w4_8bpc_rvv:         58.0 ( 3.48x)
blend_w8_8bpc_c:          595.1 ( 1.00x)
blend_w8_8bpc_rvv:         82.1 ( 7.25x)
blend_w16_8bpc_c:        2308.8 ( 1.00x)
blend_w16_8bpc_rvv:       189.0 (12.22x)
blend_w32_8bpc_c:        5853.1 ( 1.00x)
blend_w32_8bpc_rvv:       339.5 (17.24x)

blend_h_w2_8bpc_c:        162.5 ( 1.00x)
blend_h_w2_8bpc_rvv:       83.9 ( 1.94x)
blend_h_w4_8bpc_c:        288.6 ( 1.00x)
blend_h_w4_8bpc_rvv:       83.7 ( 3.45x)
blend_h_w8_8bpc_c:        544.7 ( 1.00x)
blend_h_w8_8bpc_rvv:       84.0 ( 6.48x)
blend_h_w16_8bpc_c:      1052.8 ( 1.00x)
blend_h_w16_8bpc_rvv:     102.9 (10.23x)
blend_h_w32_8bpc_c:      2068.0 ( 1.00x)
blend_h_w32_8bpc_rvv:     131.4 (15.73x)
blend_h_w64_8bpc_c:      4093.7 ( 1.00x)
blend_h_w64_8bpc_rvv:     220.3 (18.58x)
blend_h_w128_8bpc_c:    10023.1 ( 1.00x)
blend_h_w128_8bpc_rvv:    467.3 (21.45x)

blend_v_w2_8bpc_c:        218.0 ( 1.00x)
blend_v_w2_8bpc_rvv:      144.3 ( 1.51x)
blend_v_w4_8bpc_c:        921.7 ( 1.00x)
blend_v_w4_8bpc_rvv:      237.1 ( 3.89x)
blend_v_w8_8bpc_c:       1739.8 ( 1.00x)
blend_v_w8_8bpc_rvv:      237.4 ( 7.33x)
blend_v_w16_8bpc_c:      3376.6 ( 1.00x)
blend_v_w16_8bpc_rvv:     296.3 (11.40x)
blend_v_w32_8bpc_c:      6647.2 ( 1.00x)
blend_v_w32_8bpc_rvv:     408.1 (16.29x)
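
For context on what is being benchmarked, here is a minimal scalar sketch of the 8bpc blend operation, assuming the usual 6-bit weighted average with rounding. The function name `blend_8bpc_sketch` and its exact signature are illustrative and not taken from this merge request. As far as I understand, the `_h` and `_v` variants differ mainly in taking one weight per row or per column from a fixed table rather than from a per-pixel mask buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Weighted average of two pixels with a weight m in [0, 64].
 * The +32 rounds to nearest before the shift by 6. */
static inline uint8_t blend_px(int a, int b, int m)
{
    return (uint8_t)((a * (64 - m) + b * m + 32) >> 6);
}

/* Hypothetical scalar reference: blends tmp into dst in place.
 * Assumes tmp and mask are packed with a stride of w bytes,
 * while dst uses its own stride. */
static void blend_8bpc_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *tmp, int w, int h,
                              const uint8_t *mask)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = blend_px(dst[x], tmp[x], mask[x]);
        dst  += dst_stride;
        tmp  += w;
        mask += w;
    }
}
```

Each output pixel depends only on the inputs at the same position, so the inner loop maps directly onto vector loads, widening multiplies and a narrowing shift. That fits the pattern in the numbers above, where the speedup grows with block width.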
Edited by Nathan E. Egge
