mem: fix unaligned allocation using malloc() for 64-bit platforms
For 64-bit processors, it is better to align all allocated memory blocks.
I do not rule out that ARM, with its NEON instructions, may need a different value, but on x64 the compiler can often take advantage of 32-byte or 64-byte alignment when it emits AVX instructions.
SSE works with 16-byte alignment.
So it is advisable to test on older PCs where AVX is not available; 16-byte alignment may work better there.
The compiler does not make this change on its own, because it could alter the application's behavior.
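To illustrate the idea, here is a minimal sketch of an aligned allocation in C. It is not the code from this patch: the names sketch_alloc_aligned, sketch_is_aligned and SKETCH_ALIGN are hypothetical, the 64-byte value is only an assumption for x64, and posix_memalign() is used as one portable POSIX way to get an aligned block.

```c
#define _POSIX_C_SOURCE 200112L /* exposes posix_memalign() under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical value: 64 bytes covers 16-byte (SSE) and 32-byte (AVX)
 * requirements. Other architectures may want a different number. */
#define SKETCH_ALIGN 64

/* Hypothetical helper: returns a SKETCH_ALIGN-aligned block, or NULL on failure.
 * The block is released with plain free(). */
static void *sketch_alloc_aligned(size_t size) {
    void *ptr = NULL;
    /* posix_memalign() needs a power-of-two alignment that is also
     * a multiple of sizeof(void *); it returns 0 on success. */
    if (posix_memalign(&ptr, SKETCH_ALIGN, size) != 0)
        return NULL;
    return ptr;
}

/* Returns 1 if ptr is a multiple of align (align must be a power of two). */
static int sketch_is_aligned(const void *ptr, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

A plain malloc() only guarantees alignment suitable for the fundamental types (typically 16 bytes on x64 Linux), so a check like sketch_is_aligned(p, 32) can fail for a malloc()'d pointer while it always holds for a block from the helper above.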
I tested with the latest GCC (gcc version 14.2.0 (Debian 14.2.0-19)) and the -O3 optimization flag.
- 1080p: ~1.3% faster (mean time 24.515 s to 24.207 s)
- 4K: ~1% faster (mean time 32.224 s to 31.880 s)
Please take the time to test other architectures in the same way; I only tested on a common x64 machine.
Benchmarks
Master
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
Time (mean ± σ): 24.515 s ± 0.154 s [User: 190.221 s, System: 0.671 s]
Range (min … max): 24.301 s … 24.821 s 10 runs
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/summer_nature_4k.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/summer_nature_4k.ivf -o /dev/null --threads 8
Time (mean ± σ): 32.224 s ± 0.173 s [User: 256.601 s, System: 0.819 s]
Range (min … max): 32.000 s … 32.546 s 10 runs
PR
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
Time (mean ± σ): 24.207 s ± 0.155 s [User: 187.911 s, System: 0.717 s]
Range (min … max): 23.949 s … 24.394 s 10 runs
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/summer_nature_4k.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/summer_nature_4k.ivf -o /dev/null --threads 8
Time (mean ± σ): 31.880 s ± 0.070 s [User: 253.965 s, System: 0.775 s]
Range (min … max): 31.790 s … 31.999 s 10 runs
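For reference, the relative gain can be recomputed from the mean wall-clock times reported by hyperfine above. The helper name speedup_percent is hypothetical; it only shows the arithmetic.

```c
#include <assert.h>

/* Hypothetical helper: percentage reduction in run time,
 * given the mean time before and after the change (in seconds). */
static double speedup_percent(double before_s, double after_s) {
    assert(before_s > 0.0);
    return (before_s - after_s) / before_s * 100.0;
}
```

With the means above, speedup_percent(24.515, 24.207) is about 1.26 for the 1080p clip and speedup_percent(32.224, 31.880) is about 1.07 for the 4K clip. Both differences are larger than the reported standard deviations (0.07 s to 0.17 s), but they come from a single machine.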
Edited by Herman Semenoff