Commits on Source (6)
-
This makes `#include <dav1d/dav1d.h>` work correctly because we now point to the parent include directory, the same as in a normal installation. It also fixes a conflict when including "version.h", which may already exist in the parent project or another subproject, by being more specific about the headers. Normally this works, but when dav1d is built as a subproject, version.h is generated in the build directory, so it is no longer prioritized when included from dav1d.h, and another header with the same name may be picked up instead.
7629402b -
d2687884
-
The reduction parts of the horizontal HBD MC filters use SRSHL+SQXTUN+SRSHL instruction sequences. In the horizontal case this can be rewritten using a single SQSHRUN instruction with an additional rounding value (34 for 10-bit and 40 for 12-bit).

Relative runtime of micro benchmarks after this patch on some Cortex CPU cores:

    regular:     X1      A78     A76     A55
    mc  w2:    0.847x  0.864x  0.822x  0.859x
    mc  w4:    0.889x  0.994x  0.868x  0.917x
    mc  w8:    0.857x  0.911x  0.915x  0.978x
    mc w16:    0.890x  0.982x  0.868x  0.974x
    mc w32:    0.904x  0.991x  0.873x  0.967x
    mc w64:    0.919x  1.003x  0.860x  0.970x
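As a rough illustration of where the rounding values 34 and 40 can come from, here is a scalar C sketch, not the NEON code itself. It assumes the old sequence is a rounding shift by `6 - intermediate_bits`, a saturation to unsigned 16 bits, and a second rounding shift by `intermediate_bits`, with `intermediate_bits = 14 - bitdepth`. The function names and the tested input range are hypothetical; the range is chosen so the intermediate value never hits the upper saturation bound.

```c
#include <stdint.h>

/* Floor (arithmetic) shift right, written out so it does not depend on
 * implementation-defined behaviour for negative values. */
static int32_t asr_floor(int32_t v, int n) {
    return v >= 0 ? v >> n : -((-v + (1 << n) - 1) >> n);
}

/* Saturate to the unsigned 16-bit range. */
static int32_t clamp_u16(int32_t v) {
    return v < 0 ? 0 : v > 65535 ? 65535 : v;
}

/* Scalar model of the assumed old sequence:
 * rounding shift, saturate to u16, rounding shift. */
static int32_t reduce_two_stage(int32_t sum, int bitdepth) {
    const int intermediate_bits = 14 - bitdepth; /* assumption: 4 or 2 */
    const int sh1 = 6 - intermediate_bits;
    const int sh2 = intermediate_bits;
    int32_t v = asr_floor(sum + (1 << (sh1 - 1)), sh1);
    v = clamp_u16(v);
    return asr_floor(v + (1 << (sh2 - 1)), sh2);
}

/* Scalar model of the single-step form: add the combined rounding
 * value, shift right by 6 without further rounding, then saturate. */
static int32_t reduce_single(int32_t sum, int bitdepth) {
    const int bias = bitdepth == 10 ? 34 : 40;
    return clamp_u16(asr_floor(sum + bias, 6));
}

/* Returns 1 if both models agree for every sum in [-limit, limit]. */
static int reductions_match(int bitdepth, int32_t limit) {
    for (int32_t s = -limit; s <= limit; s++)
        if (reduce_two_stage(s, bitdepth) != reduce_single(s, bitdepth))
            return 0;
    return 1;
}
```

Under these assumptions the combined bias is the first rounding constant plus the second one scaled by the first shift: 2 + 8·4 = 34 for 10-bit and 8 + 2·16 = 40 for 12-bit.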
109b2427 -
The 6-tap horizontal subpel filters can be further improved by some pointer arithmetic and by saving some instructions (EXTs) in their data rearrangement code.

Relative runtime of micro benchmarks after this patch on some Cortex CPU cores:

    regular:     X1      A78     A76     A55
    mc  w8:    0.915x  0.937x  0.900x  0.982x
    mc w16:    0.917x  0.947x  0.911x  0.971x
    mc w32:    0.914x  0.938x  0.873x  0.961x
    mc w64:    0.918x  0.932x  0.882x  0.964x
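For background only, the sketch below shows in scalar C why a 6-tap path can work from an adjusted source pointer at all. It assumes the 6-tap filters are stored as 8 coefficients whose outer taps are zero. It does not reproduce the NEON data rearrangement or the EXT savings of this commit; the coefficient values and function names are illustrative, not taken from dav1d.

```c
#include <stdint.h>

/* Illustrative coefficients only (not copied from dav1d): an 8-entry
 * filter whose outer taps are zero, i.e. effectively a 6-tap filter. */
static const int8_t example_filter[8] = { 0, 2, -12, 116, 26, -6, 2, 0 };

/* Reference: full 8-tap sum over src[x-3 .. x+4]. */
static int filter_8tap(const uint8_t *src, int x, const int8_t *f) {
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += f[i] * src[x - 3 + i];
    return sum;
}

/* 6-tap variant: advance the source and coefficient pointers by one
 * and skip the two zero taps. */
static int filter_6tap(const uint8_t *src, int x, const int8_t *f) {
    const uint8_t *s = src + x - 2; /* one element past the 8-tap start */
    const int8_t *c = f + 1;
    int sum = 0;
    for (int i = 0; i < 6; i++)
        sum += c[i] * s[i];
    return sum;
}

/* Returns 1 if both variants agree on every valid position of a
 * synthetic test row. */
static int variants_match(void) {
    uint8_t row[64];
    for (int i = 0; i < 64; i++)
        row[i] = (uint8_t)(i * 37 + 11);
    for (int x = 3; x < 64 - 4; x++)
        if (filter_8tap(row, x, example_filter) !=
            filter_6tap(row, x, example_filter))
            return 0;
    return 1;
}
```

The 6-tap form reads two fewer source elements per output, which gives the assembly more freedom in how it loads and rearranges its input windows.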
93339ce8 -
The horizontal parts of the 6-tap HV subpel filters can be further improved by some pointer arithmetic and by saving some instructions (EXTs) in their data rearrangement code.

Relative runtime of micro benchmarks after this patch on Cortex CPU cores:

    HBD mct hv      X1      A78     A76     A72     A55
    regular  w8:  0.952x  0.989x  0.924x  0.973x  0.976x
    regular w16:  0.961x  0.993x  0.928x  0.952x  0.971x
    regular w32:  0.964x  0.996x  0.930x  0.973x  0.972x
    regular w64:  0.963x  0.997x  0.930x  0.969x  0.974x
2d808de1 -
The 6-tap horizontal filters and the horizontal parts of the 6-tap HV subpel filters can be further improved by some pointer arithmetic and by saving some instructions (EXTs) in their data rearrangement code.

Relative runtime of micro benchmarks after this patch on Cortex CPU cores:

    SBD mct h       X1      A78     A76     A72     A55
    regular  w8:  0.878x  0.894x  0.990x  0.923x  0.944x
    regular w16:  0.962x  0.931x  0.943x  0.949x  0.949x
    regular w32:  0.937x  0.937x  0.972x  0.938x  0.947x
    regular w64:  0.920x  0.965x  0.992x  0.936x  0.944x

    SBD mct hv      X1      A78     A76     A72     A55
    regular  w8:  0.931x  0.970x  0.951x  0.950x  0.971x
    regular w16:  0.940x  0.971x  0.941x  0.952x  0.967x
    regular w32:  0.943x  0.972x  0.946x  0.961x  0.974x
    regular w64:  0.943x  0.973x  0.952x  0.944x  0.975x
a992a9be
Showing
- include/dav1d/dav1d.h 4 additions, 4 deletions
- meson.build 6 additions, 0 deletions
- src/arm/64/mc.S 49 additions, 45 deletions
- src/arm/64/mc16.S 115 additions, 70 deletions
- src/mem.c 2 additions, 16 deletions
- src/mem.h 26 additions, 18 deletions
- src/meson.build 1 addition, 1 deletion