riscv64/mc: warp_8x8 and warp_8x8t 8bpc (!1707) · VideoLAN / dav1d

Bogdan Gligorijević requested to merge BogdanW3/dav1d:warp_8x8 into master Aug 26, 2024

Benchmarks:
- Kendryte K230:
warp_8x8_8bpc_c:      4549.7 ( 1.00x)
warp_8x8_8bpc_rvv:    2504.7 ( 1.82x)
warp_8x8t_8bpc_c:     4414.7 ( 1.00x)
warp_8x8t_8bpc_rvv:   2465.7 ( 1.79x)

- Banana Pi BPI-F3:
warp_8x8_8bpc_c:      4431.2 ( 1.00x)
warp_8x8_8bpc_rvv:    3297.4 ( 1.34x)
warp_8x8t_8bpc_c:     4299.3 ( 1.00x)
warp_8x8t_8bpc_rvv:   3255.7 ( 1.32x)
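The parenthesized figures are speedups relative to the C reference, i.e. C time divided by RVV time for the same function on the same board. The sketch below recomputes them from the listed timings. It assumes the numbers come from dav1d's checkasm benchmark mode and are in whatever timer units it reports; only the ratio matters here.

```python
# Speedup = C reference time / RVV time (lower time is better).
# Timings copied from the benchmark list; units are assumed to be checkasm's.
BENCH = {
    "Kendryte K230": {
        "warp_8x8": (4549.7, 2504.7),
        "warp_8x8t": (4414.7, 2465.7),
    },
    "Banana Pi BPI-F3": {
        "warp_8x8": (4431.2, 3297.4),
        "warp_8x8t": (4299.3, 3255.7),
    },
}


def speedup(c_time: float, rvv_time: float) -> float:
    """Return how many times faster the RVV version is than the C version."""
    return c_time / rvv_time


for board, funcs in BENCH.items():
    for name, (c_time, rvv_time) in funcs.items():
        print(f"{board:18} {name:10} {speedup(c_time, rvv_time):.2f}x")
```

Rounded to two decimals this reproduces 1.82x and 1.79x on the K230 and 1.34x and 1.32x on the BPI-F3.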

Because this implementation relies on segmented indexed loads, the speedup on current hardware is modest: both boards tested appear to impose a performance penalty on these loads. Newer hardware implementations may benefit more.
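For readers unfamiliar with the instruction class, here is a minimal scalar model of what an RVV unordered segmented indexed load (`vluxseg<nf>ei<eew>.v`) does with 8-bit elements. Each lane computes its own address from a per-lane byte offset and fetches `nf` consecutive bytes, which land in `nf` separate destination registers at that lane's position. Because every lane can point anywhere in memory, the hardware effectively performs one independent small access per lane, which is a plausible reason implementations handle these loads slowly. The use case shown (each lane picking its own 8-tap filter row, as a per-pixel warp filter might) and the names `seg_indexed_load`, `table` and `taps` are hypothetical illustrations; this is not the MR's assembly.

```python
from typing import List, Sequence


def seg_indexed_load(memory: bytes, base: int, byte_offsets: Sequence[int],
                     nf: int) -> List[List[int]]:
    """Scalar model of vluxseg<nf>ei<eew>.v with SEW = 8 (hypothetical helper).

    For each active element i, the segment starts at base + byte_offsets[i];
    its nf consecutive bytes go to fields 0..nf-1, all at element position i.
    """
    fields = [[0] * len(byte_offsets) for _ in range(nf)]
    for i, off in enumerate(byte_offsets):
        addr = base + off
        for f in range(nf):
            fields[f][i] = memory[addr + f]
    return fields


# Hypothetical table: 4 filter rows of 8 taps each, tap t of row r = 10*r + t.
table = bytes(10 * r + t for r in range(4) for t in range(8))
# Each lane selects its own row (byte offset = row * 8).
taps = seg_indexed_load(table, 0, [2 * 8, 0 * 8, 3 * 8, 1 * 8], nf=8)
print(taps[0])  # [20, 0, 30, 10]
print(taps[7])  # [27, 7, 37, 17]
```

After the load, field register `f` holds tap `f` for every lane, so a vertical multiply-accumulate across the eight fields applies a different filter per lane without any transposition.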

