
riscv64/mc: Add w_mask functions

Sungjoon Moon requested to merge OctopusET/dav1d:rvv_w_mask into master

Closes: !1801 (closed), !1805 (closed), !1806 (closed), !1797

Merges the work of !1801 (closed), !1805 (closed), and !1806 (closed).

Benchmarks:

Kendryte K230:
w_mask_420_w4_8bpc_c:        845.1 ( 1.00x)
w_mask_420_w4_8bpc_rvv:      313.1 ( 2.70x)
w_mask_420_w8_8bpc_c:       2589.9 ( 1.00x)
w_mask_420_w8_8bpc_rvv:      549.3 ( 4.72x)
w_mask_420_w16_8bpc_c:      8389.9 ( 1.00x)
w_mask_420_w16_8bpc_rvv:    1250.4 ( 6.71x)
w_mask_420_w32_8bpc_c:     33485.7 ( 1.00x)
w_mask_420_w32_8bpc_rvv:    4276.9 ( 7.83x)
w_mask_420_w64_8bpc_c:     81934.2 ( 1.00x)
w_mask_420_w64_8bpc_rvv:   11243.9 ( 7.29x)
w_mask_420_w128_8bpc_c:   205865.8 ( 1.00x)
w_mask_420_w128_8bpc_rvv:  28098.0 ( 7.33x)
w_mask_422_w4_8bpc_c:        838.6 ( 1.00x)
w_mask_422_w4_8bpc_rvv:      315.9 ( 2.65x)
w_mask_422_w8_8bpc_c:       2576.4 ( 1.00x)
w_mask_422_w8_8bpc_rvv:      564.2 ( 4.57x)
w_mask_422_w16_8bpc_c:      8378.7 ( 1.00x)
w_mask_422_w16_8bpc_rvv:    1305.4 ( 6.42x)
w_mask_422_w32_8bpc_c:     33512.4 ( 1.00x)
w_mask_422_w32_8bpc_rvv:    4487.6 ( 7.47x)
w_mask_422_w64_8bpc_c:     82489.8 ( 1.00x)
w_mask_422_w64_8bpc_rvv:   11895.3 ( 6.93x)
w_mask_422_w128_8bpc_c:   207116.2 ( 1.00x)
w_mask_422_w128_8bpc_rvv:  29541.4 ( 7.01x)
w_mask_444_w4_8bpc_c:        822.7 ( 1.00x)
w_mask_444_w4_8bpc_rvv:      265.3 ( 3.10x)
w_mask_444_w8_8bpc_c:       2542.5 ( 1.00x)
w_mask_444_w8_8bpc_rvv:      429.2 ( 5.92x)
w_mask_444_w16_8bpc_c:      8290.8 ( 1.00x)
w_mask_444_w16_8bpc_rvv:     965.7 ( 8.59x)
w_mask_444_w32_8bpc_c:     33229.6 ( 1.00x)
w_mask_444_w32_8bpc_rvv:    3289.2 (10.10x)
w_mask_444_w64_8bpc_c:     81404.6 ( 1.00x)
w_mask_444_w64_8bpc_rvv:    9126.6 ( 8.92x)
w_mask_444_w128_8bpc_c:   204438.4 ( 1.00x)
w_mask_444_w128_8bpc_rvv:  22424.9 ( 9.12x)

SpacemiT K1:
w_mask_420_w4_8bpc_c:        747.9 ( 1.00x)
w_mask_420_w4_8bpc_rvv:      290.4 ( 2.58x)
w_mask_420_w8_8bpc_c:       2312.3 ( 1.00x)
w_mask_420_w8_8bpc_rvv:      478.9 ( 4.83x)
w_mask_420_w16_8bpc_c:      7509.3 ( 1.00x)
w_mask_420_w16_8bpc_rvv:     885.2 ( 8.48x)
w_mask_420_w32_8bpc_c:     30087.8 ( 1.00x)
w_mask_420_w32_8bpc_rvv:    2595.6 (11.59x)
w_mask_420_w64_8bpc_c:     72313.0 ( 1.00x)
w_mask_420_w64_8bpc_rvv:    6020.9 (12.01x)
w_mask_420_w128_8bpc_c:   179297.0 ( 1.00x)
w_mask_420_w128_8bpc_rvv:  15659.1 (11.45x)
w_mask_422_w4_8bpc_c:        735.0 ( 1.00x)
w_mask_422_w4_8bpc_rvv:      299.0 ( 2.46x)
w_mask_422_w8_8bpc_c:       2285.6 ( 1.00x)
w_mask_422_w8_8bpc_rvv:      488.5 ( 4.68x)
w_mask_422_w16_8bpc_c:      7459.3 ( 1.00x)
w_mask_422_w16_8bpc_rvv:     946.3 ( 7.88x)
w_mask_422_w32_8bpc_c:     29996.7 ( 1.00x)
w_mask_422_w32_8bpc_rvv:    2812.7 (10.66x)
w_mask_422_w64_8bpc_c:     71809.4 ( 1.00x)
w_mask_422_w64_8bpc_rvv:    6253.7 (11.48x)
w_mask_422_w128_8bpc_c:   178081.9 ( 1.00x)
w_mask_422_w128_8bpc_rvv:  16087.8 (11.07x)
w_mask_444_w4_8bpc_c:        726.2 ( 1.00x)
w_mask_444_w4_8bpc_rvv:      255.9 ( 2.84x)
w_mask_444_w8_8bpc_c:       2250.7 ( 1.00x)
w_mask_444_w8_8bpc_rvv:      403.9 ( 5.57x)
w_mask_444_w16_8bpc_c:      7341.4 ( 1.00x)
w_mask_444_w16_8bpc_rvv:     744.7 ( 9.86x)
w_mask_444_w32_8bpc_c:     29658.4 ( 1.00x)
w_mask_444_w32_8bpc_rvv:    2295.9 (12.92x)
w_mask_444_w64_8bpc_c:     70695.9 ( 1.00x)
w_mask_444_w64_8bpc_rvv:    4879.0 (14.49x)
w_mask_444_w128_8bpc_c:   175483.6 ( 1.00x)
w_mask_444_w128_8bpc_rvv:  13021.9 (13.48x)
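
The factor in parentheses on each `_rvv` line appears to be the C time divided by the RVV time for the same function. A minimal sketch for recomputing it from lines in this format follows; `LINE_RE` and `speedups` are hypothetical helper names, not part of dav1d or checkasm. Since the printed timings are rounded, a recomputed ratio may differ from the printed one in the last digit. Because both boards list the same function names, it should be run on one board's results at a time.

```python
import re

# Matches lines such as "w_mask_420_w4_8bpc_c:  845.1 ( 1.00x)".
# Assumption: every line ends with a parenthesized "<factor>x".
LINE_RE = re.compile(r"^(\w+)_(c|rvv):\s+([\d.]+)\s+\(\s*([\d.]+)x\)$")


def speedups(report: str) -> dict[str, float]:
    """Return {function name: C time / RVV time} for each c/rvv pair."""
    c_times: dict[str, float] = {}
    result: dict[str, float] = {}
    for line in report.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headers and blank lines
        name, impl, time = m.group(1), m.group(2), float(m.group(3))
        if impl == "c":
            c_times[name] = time
        elif name in c_times:
            result[name] = c_times[name] / time
    return result


# Two pairs copied from the first set of results above.
sample = """\
w_mask_420_w4_8bpc_c:        845.1 ( 1.00x)
w_mask_420_w4_8bpc_rvv:      313.1 ( 2.70x)
w_mask_444_w128_8bpc_c:   204438.4 ( 1.00x)
w_mask_444_w128_8bpc_rvv:  22424.9 ( 9.12x)
"""

for name, ratio in speedups(sample).items():
    print(f"{name}: {ratio:.2f}x")
```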
Edited by Sungjoon Moon
