AVX512 SIMD
See Wikipedia. The target will be Intel Ice Lake (F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES). Ice Lake instances (m6i) are available on Amazon EC2.
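As a rough illustration of what targeting Ice Lake means in practice, the sketch below checks whether a CPU and OS expose every extension listed above. The names `has_icl_avx512` and `cpu_has_icl_avx512` are hypothetical and are not dav1d's actual API. The bit positions follow Intel's CPUID leaf 7 and XCR0 definitions. The runtime probe assumes GCC or Clang's `<cpuid.h>` on x86.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not dav1d's API. */

/* CPUID.(EAX=7,ECX=0):EBX feature bits */
#define EBX7_AVX512F     (1u << 16)
#define EBX7_AVX512DQ    (1u << 17)
#define EBX7_AVX512IFMA  (1u << 21)
#define EBX7_AVX512CD    (1u << 28)
#define EBX7_AVX512BW    (1u << 30)
#define EBX7_AVX512VL    (1u << 31)

/* CPUID.(EAX=7,ECX=0):ECX feature bits */
#define ECX7_AVX512VBMI      (1u << 1)
#define ECX7_AVX512VBMI2     (1u << 6)
#define ECX7_GFNI            (1u << 8)
#define ECX7_VAES            (1u << 9)
#define ECX7_VPCLMULQDQ      (1u << 10)
#define ECX7_AVX512VNNI      (1u << 11)
#define ECX7_AVX512BITALG    (1u << 12)
#define ECX7_AVX512VPOPCNTDQ (1u << 14)

/* XCR0 state the OS must enable: SSE(1), AVX(2), opmask(5), ZMM_Hi256(6), Hi16_ZMM(7) */
#define XCR0_AVX512_STATE 0xE6u

/* Pure check on raw register values, so it can be tested without the hardware. */
static bool has_icl_avx512(uint32_t ebx7, uint32_t ecx7, uint64_t xcr0)
{
    const uint32_t need_ebx = EBX7_AVX512F | EBX7_AVX512DQ | EBX7_AVX512IFMA |
                              EBX7_AVX512CD | EBX7_AVX512BW | EBX7_AVX512VL;
    const uint32_t need_ecx = ECX7_AVX512VBMI | ECX7_AVX512VBMI2 | ECX7_GFNI |
                              ECX7_VAES | ECX7_VPCLMULQDQ | ECX7_AVX512VNNI |
                              ECX7_AVX512BITALG | ECX7_AVX512VPOPCNTDQ;
    return (ebx7 & need_ebx) == need_ebx &&
           (ecx7 & need_ecx) == need_ecx &&
           (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Probes the running CPU/OS. Returns false if OSXSAVE or leaf 7 is unavailable. */
bool cpu_has_icl_avx512(void)
{
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d) || !(c & (1u << 27))) /* OSXSAVE */
        return false;
    if (!__get_cpuid_count(7, 0, &a, &b, &c, &d))
        return false;
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0)); /* read XCR0 */
    return has_icl_avx512(b, c, ((uint64_t)hi << 32) | lo);
}
#endif
```

The XCR0 check matters because a CPU can report AVX-512 support while the OS has not enabled saving of the opmask and ZMM state. In that case, executing AVX-512 instructions would fault.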
8-bit:
- mc
  - avg/mask/w_avg (!921 (merged))
  - w_mask (!921 (merged))
  - blend{,_h/v} (!1301 (merged))
  - warp8x8{,t} (!1301 (merged))
  - emu_edge
  - 8tap put (!1301 (merged))
  - 8tap prep (!1301 (merged))
  - bilinear put (!1301 (merged))
  - bilinear prep
- intra_pred
  - h/v/dc/dc_128 (!1301 (merged))
  - paeth (!1301 (merged))
  - smooth{,_h/v} (!1301 (merged))
  - z1 (!1562 (merged))
  - z2 (!1570 (merged))
  - z3 (!1566 (merged))
  - filter (!1301 (merged))
  - cfl_ac
    - 4:2:0
    - 4:4:4
    - 4:2:2
  - cfl_pred
  - pal_pred (!1301 (merged))
- itx (!1301 (merged))
- deblock
- CDEF
  - dir
  - filter (!905 (merged), !932 (merged))
- loop restoration (!1301 (merged))
- SVC/super_res
  - mc.scaled_put/prep
  - mc.resize (!1355 (merged), @psilokos)
- grain (!1374 (merged))
  - generate_grain_y
  - generate_grain_uv_420/422/444
  - fgy_32x32xn
  - fguv_32x32xn_420/422/444
10/12-bit:
- mc (!1314 (merged))
  - avg/mask/w_avg
  - w_mask
  - blend{,_h/v}
  - warp8x8{,t}
  - emu_edge
  - 8tap put
  - 8tap prep
  - bilinear put
  - bilinear prep
- intra_pred
  - h/v/dc/dc_128
  - paeth (!1363 (merged))
  - smooth{,_h/v} (!1363 (merged))
  - z1 (!1572 (merged))
  - z2 (!1590 (merged))
  - z3 (!1580 (merged))
  - filter (!1363 (merged))
  - cfl_ac
    - 4:2:0
    - 4:4:4
    - 4:2:2
  - cfl_pred
  - pal_pred (!1363 (merged))
- itx
  - 10-bit
    - 8x8, 8x16, 16x8, 16x16 (!1454 (merged), @gramner)
    - 8x32, 32x8 (!1466 (merged), @gramner)
    - 16x32, 32x16, 32x32 (!1475 (merged), @gramner)
    - 16x64 (!1503 (merged), @rbultje), 64x16 (!1509 (merged)), 32x64 (!1504 (merged)), 64x32 (!1510 (merged)), 64x64 (!1512 (merged))
  - 12-bit
- deblock (!1427 (merged), @gramner)
- CDEF
  - dir
  - filter (!1421 (merged))
- loop restoration
  - wiener (!1320 (merged))
  - SGR
    - 10-bit (!1327 (merged))
    - 12-bit
- SVC/super_res
  - mc.scaled_put/prep
  - mc.resize (!1355 (merged), @psilokos)
- grain (@gramner, !1396 (merged))
  - generate_grain_y
  - generate_grain_uv_420/422/444
  - fgy_32x32xn
  - fguv_32x32xn_420/422/444