mem: fix unaligned allocation using malloc() for 64-bit platforms (!1794) · VideoLAN / dav1d

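On 64-bit processors, it is better to align all allocated memory blocks to a wider boundary than malloc() provides. malloc() only guarantees alignment suitable for any fundamental type (alignof(max_align_t), typically 16 bytes on x86-64).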
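A minimal sketch of an aligned allocation helper, assuming C11 aligned_alloc() is available. alloc_aligned_block() and is_aligned() are hypothetical names used only for illustration; this is not the code from this merge request or dav1d's own allocation helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not the code from this MR): allocate `size` bytes
 * aligned to `align` bytes with C11 aligned_alloc(). Strict C11 requires
 * `size` to be a multiple of `align`, so the size is rounded up first. */
static void *alloc_aligned_block(size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;                 /* alignment must be a power of two */
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded < size)
        return NULL;                 /* size overflowed while rounding up */
    if (rounded == 0)
        rounded = align;             /* avoid a zero-sized request */
    return aligned_alloc(align, rounded);
}

/* Returns 1 if `ptr` lies on an `align`-byte boundary, 0 otherwise. */
static int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

A caller would write void *buf = alloc_aligned_block(1000, 64); and later free(buf); memory obtained from aligned_alloc() is released with plain free(), so call sites do not need a separate deallocator.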

I cannot rule out that ARM, with its NEON instructions, needs a different value, but on x64 the compiler often takes advantage of 32-byte and 64-byte alignment with AVX instructions, while SSE instructions work with 16-byte alignment.

It is therefore advisable to also test on older PCs where AVX is not available; 16-byte alignment may perform better there.
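One way to act on this without hard-coding a single value is to pick the alignment at runtime from the CPU features. The sketch below assumes GCC or Clang on x86 (__builtin_cpu_supports()); pick_simd_alignment() is a hypothetical function, and the mapping AVX-512 to 64, AVX to 32, otherwise 16 is an assumption to be benchmarked, not a measured result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: choose an allocation alignment from the widest SIMD
 * extension the running CPU reports. The values are assumptions that should
 * be benchmarked per platform, as suggested above. */
static size_t pick_simd_alignment(void)
{
    size_t align = 16;               /* SSE / NEON: 128-bit registers */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx512f"))
        align = 64;                  /* AVX-512: 512-bit ZMM registers */
    else if (__builtin_cpu_supports("avx"))
        align = 32;                  /* AVX/AVX2: 256-bit YMM registers */
#endif
    /* aligned_alloc() and posix_memalign() require a power of two. */
    assert((align & (align - 1)) == 0);
    return align;
}
```

On non-x86 targets the sketch simply falls back to 16 bytes, which matches the 128-bit NEON register width but has not been tested here.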

The compiler does not change allocation alignment on its own, since doing so could alter the application's behavior.
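Conversely, inside a function the compiler can only rely on the alignment that the type and allocator guarantee, so it cannot assume a pointer is 32-byte aligned. A sketch, assuming GCC or Clang, of how code can promise wider alignment with __builtin_assume_aligned(); sum_u8_aligned32() is a hypothetical example function, not part of dav1d.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum a byte buffer that the caller guarantees is
 * 32-byte aligned. __builtin_assume_aligned() tells GCC/Clang it may rely on
 * that alignment when vectorizing; a false promise is undefined behaviour,
 * so the assert checks it in debug builds. */
static int64_t sum_u8_aligned32(const uint8_t *src, size_t n)
{
    assert(((uintptr_t)src & 31) == 0);
    const uint8_t *p = __builtin_assume_aligned(src, 32);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

Without such a guarantee the compiler may have to use unaligned loads or a scalar prologue to reach an aligned address, which is the kind of overhead aligned allocations aim to avoid.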

I tested with the latest GCC (gcc version 14.2.0 (Debian 14.2.0-19)) and the -O3 optimization flag.

1080p: ~1.3% faster (mean 24.515 s → 24.207 s)
4K: ~1% faster (mean 32.224 s → 31.880 s)

I only tested on a common x64 machine, so please test other architectures in the same way.

Benchmarks

Master

debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
  Time (mean ± σ):     24.515 s ±  0.154 s    [User: 190.221 s, System: 0.671 s]
  Range (min … max):   24.301 s … 24.821 s    10 runs

debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/summer_nature_4k.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/summer_nature_4k.ivf -o /dev/null --threads 8
  Time (mean ± σ):     32.224 s ±  0.173 s    [User: 256.601 s, System: 0.819 s]
  Range (min … max):   32.000 s … 32.546 s    10 runs

PR

debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
  Time (mean ± σ):     24.207 s ±  0.155 s    [User: 187.911 s, System: 0.717 s]
  Range (min … max):   23.949 s … 24.394 s    10 runs
 
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/summer_nature_4k.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/summer_nature_4k.ivf -o /dev/null --threads 8
  Time (mean ± σ):     31.880 s ±  0.070 s    [User: 253.965 s, System: 0.775 s]
  Range (min … max):   31.790 s … 31.999 s    10 runs
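The relative speedups can be recomputed from the hyperfine mean wall-clock times above. speedup_percent() is a hypothetical helper for this arithmetic only.

```c
#include <assert.h>

/* Relative speedup in percent of the PR over master, from mean wall-clock
 * times in seconds (positive means the PR is faster). */
static double speedup_percent(double master_s, double pr_s)
{
    assert(master_s > 0.0);
    return (master_s - pr_s) / master_s * 100.0;
}

/* speedup_percent(24.515, 24.207) is about 1.26 (1080p)
 * speedup_percent(32.224, 31.880) is about 1.07 (4K)   */
```

Note that the 1080p min/max ranges of master and PR overlap, so the difference is of the same order as a few standard deviations of the measurements.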

References

https://fylux.github.io/2017/07/11/Memory_Alignment/

https://stackoverflow.com/questions/32139051/what-are-benefits-of-allocating-a-page-aligned-memory-chunk

https://www.reddit.com/r/C_Programming/comments/1b14jht/when_should_someone_use_aligned_alloc_over_malloc/

Edited Jun 04, 2025 by Herman Semenoff
