Memory usage reductions 2.0

Henrik Gramner requested to merge gramner/dav1d:memory_reduction2 into master

Decoding Chimera 1080p 8bpc with --threads 16 --framedelay 4, before (first table) and after (second table). The peak size of the palette data is halved:

 Type                    Allocs    Reuses    Share    Peak size
---------------------------------------------------------------------
 Palette data                16         0    14.9%     17 694 720
---------------------------------------------------------------------
                           9101     49096             118 416 488

 Type                    Allocs    Reuses    Share    Peak size
---------------------------------------------------------------------
 Palette data                16         0     8.1%      8 847 360
---------------------------------------------------------------------
                           9101     49096             109 569 256

Checkasm numbers for the new pal_idx_finish function on x86-64:

pal_idx_finish_w4_c:              41.8 ( 1.00x)
pal_idx_finish_w4_ssse3:           9.1 ( 4.62x)
pal_idx_finish_w4_avx2:            9.5 ( 4.38x)
pal_idx_finish_w4_avx512icl:       9.4 ( 4.44x)

pal_idx_finish_w8_c:              85.6 ( 1.00x)
pal_idx_finish_w8_ssse3:          11.5 ( 7.44x)
pal_idx_finish_w8_avx2:           11.3 ( 7.57x)
pal_idx_finish_w8_avx512icl:      11.0 ( 7.79x)

pal_idx_finish_w16_c:            162.5 ( 1.00x)
pal_idx_finish_w16_ssse3:         29.3 ( 5.54x)
pal_idx_finish_w16_avx2:          17.9 ( 9.08x)
pal_idx_finish_w16_avx512icl:     16.4 ( 9.90x)

pal_idx_finish_w32_c:            202.8 ( 1.00x)
pal_idx_finish_w32_ssse3:         61.1 ( 3.32x)
pal_idx_finish_w32_avx2:          36.9 ( 5.49x)
pal_idx_finish_w32_avx512icl:     20.4 ( 9.94x)

pal_idx_finish_w64_c:            336.0 ( 1.00x)
pal_idx_finish_w64_ssse3:        120.2 ( 2.80x)
pal_idx_finish_w64_avx2:          82.1 ( 4.09x)
pal_idx_finish_w64_avx512icl:     42.1 ( 7.97x)
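
The description above does not spell out how the palette data is stored. The halved peak size would be consistent with keeping two palette indices per byte, which is possible because an AV1 palette has at most 8 entries, so each index fits in 4 bits. The following is only a minimal sketch under that assumption; `pack_pal_idx` and `unpack_pal_idx` are hypothetical names for illustration, not the signature or behavior of dav1d's `pal_idx_finish`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (assumption, not dav1d's API): packs n palette
 * indices from src into dst, two per byte, with the even-numbered index
 * in the low nibble. Each index is assumed to fit in 4 bits and n is
 * assumed to be even, so dst needs n / 2 bytes. */
static void pack_pal_idx(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        const unsigned lo = src[2 * i] & 0x0f;
        const unsigned hi = src[2 * i + 1] & 0x0f;
        dst[i] = (uint8_t)(lo | (hi << 4));
    }
}

/* Hypothetical helper: reads index i back out of a packed buffer. */
static unsigned unpack_pal_idx(const uint8_t *packed, size_t i)
{
    return (packed[i / 2] >> ((i & 1) * 4)) & 0x0f;
}
```

With this layout, a row of 8 indices occupies 4 bytes instead of 8, and a reader extracts index `i` with one shift and one mask.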
