refmvs/recon_tmpl: optimize branch prediction for 64-bit CPUs, simplify loops, replace macros

Herman Semenoff requested to merge GermanAizek/dav1d:optimize-refmvs-find into master

Built with Debian clang version 19.1.7 (3), Release configuration, and the -O3 flag.

4K has not been tested yet. The hardware is the same server motherboard with 2x E5-2699 CPUs.

Shall we pull even further ahead of rav1d? These changes should be quick to review and merge.

  • 1080p: about 0.4% lower mean wall time on my hardware (24.641 s to 24.537 s at 8 threads, 40.976 s to 40.812 s at 72 threads)

If anyone is willing, please test this on your own hardware.

master:

debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
  Time (mean ± σ):     24.641 s ±  0.149 s    [User: 191.796 s, System: 0.730 s]
  Range (min … max):   24.437 s … 24.980 s    10 runs
  
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 72"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 72
  Time (mean ± σ):     40.976 s ±  0.219 s    [User: 491.213 s, System: 11.112 s]
  Range (min … max):   40.610 s … 41.293 s    10 runs

PR:

debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
  Time (mean ± σ):     24.537 s ±  0.111 s    [User: 190.968 s, System: 0.718 s]
  Range (min … max):   24.322 s … 24.748 s    10 runs
 
debian@debian-lenovo:~/GIT/dav1d/buildDir$ hyperfine --warmup 2 "tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 72"
Benchmark 1: tools/dav1d -q -i ~/GIT/dav1d/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 72
  Time (mean ± σ):     40.812 s ±  0.131 s    [User: 489.487 s, System: 11.118 s]
  Range (min … max):   40.664 s … 41.026 s    10 runs
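
To put the runs above in relative terms, here is a minimal sketch that computes the change in mean wall time from the hyperfine figures. The numbers are copied from the output above; the `results` layout and the `speedup_percent` helper are illustrative names, not part of dav1d or hyperfine.

```python
# Mean wall time and standard deviation (seconds), copied from the
# hyperfine output above. Keys are the --threads values used.
results = {
    8: {"master": (24.641, 0.149), "pr": (24.537, 0.111)},
    72: {"master": (40.976, 0.219), "pr": (40.812, 0.131)},
}


def speedup_percent(master_mean, pr_mean):
    """Reduction in mean wall time, as a percentage of the master time."""
    return (master_mean - pr_mean) / master_mean * 100.0


for threads, r in results.items():
    (m_mean, m_sd), (p_mean, p_sd) = r["master"], r["pr"]
    delta = m_mean - p_mean
    print(
        f"{threads:>2} threads: {delta:.3f} s faster "
        f"({speedup_percent(m_mean, p_mean):.2f}%), "
        f"sigma master={m_sd:.3f} s, PR={p_sd:.3f} s"
    )
```

Printing the standard deviations next to each delta makes it easy to see how the gain compares with run-to-run noise when others repeat the test on their own hardware.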
