refmvs/recon_tmpl: optimize branch prediction for 64-bit CPUs, simplify loops, replace macros
Built with Debian clang version 19.1.7 (3), Release configuration, -O3.
4K has not been tested yet. Same hardware: server motherboard with 2x E5-2699.
Shall we break away even further from rav1d? These changes should be quick to review and merge.
-
1080p decoding is up to ~1% faster on my hardware (about 0.4% in the runs below).
If anyone has time, please test on your own hardware.
master:
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
Time (mean ± σ): 24.641 s ± 0.149 s [User: 191.796 s, System: 0.730 s]
Range (min … max): 24.437 s … 24.980 s 10 runs
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 72"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 72
Time (mean ± σ): 40.976 s ± 0.219 s [User: 491.213 s, System: 11.112 s]
Range (min … max): 40.610 s … 41.293 s 10 runs
PR:
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
Time (mean ± σ): 24.537 s ± 0.111 s [User: 190.968 s, System: 0.718 s]
Range (min … max): 24.322 s … 24.748 s 10 runs
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 72"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 72
Time (mean ± σ): 40.812 s ± 0.131 s [User: 489.487 s, System: 11.118 s]
Range (min … max): 40.664 s … 41.026 s 10 runs
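
For anyone comparing their own numbers, here is a small sketch of how the relative change can be computed from the hyperfine mean times above. The helper name speedup_percent is only illustrative and is not part of dav1d or hyperfine; the four values are copied from the master and PR runs in this description.

```python
def speedup_percent(baseline_s: float, candidate_s: float) -> float:
    """Reduction in mean wall time, as a percentage of the baseline."""
    return (baseline_s - candidate_s) / baseline_s * 100.0


# Mean times in seconds, copied from the hyperfine output above:
# thread count -> (master, PR)
runs = {
    8: (24.641, 24.537),
    72: (40.976, 40.812),
}

for threads, (master_s, pr_s) in runs.items():
    gain = speedup_percent(master_s, pr_s)
    print(f"--threads {threads}: {gain:.2f}% faster")

# prints:
# --threads 8: 0.42% faster
# --threads 72: 0.40% faster
```

Both differences are smaller than the per-run standard deviation reported for master (0.149 s and 0.219 s), so results from more runs or other machines would help confirm the gain.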