AArch64: Optimize the init of DotProd+ 2D subpel filters

Side note: if the output here after shrn is a 64-bit vector, we technically should have used mov .8b before. But that probably wouldn't have made any difference performance-wise anyway.

True, thanks! I will be more aware of these! (Some CPUs, however, can only do zero-latency move/copy on full registers.)

LGTM, nice!

approved this merge request

requested review from @mstorsjo

changed milestone to %1.4.2

added 4 commits

85c1a213...2d2c6c65 - 3 commits from branch videolan:master
a6d57b11 - AArch64: Optimize the init of DotProd+ 2D subpel filters


enabled an automatic merge when the pipeline for a6d57b11 succeeds

added ARM performance labels

merged

         add             \src, \src, #2
         bl              L(\type\()_hv_filter4_\isa)
         mov             v16.16b, v22.16b
         shrn            v16.4h, v22.4s, #2
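
To make the side note about vector widths concrete, here is a minimal scalar sketch in C of what the two instructions do to a 128-bit register. It is not part of the patch: the `vreg128` type and the `model_*` helpers are hypothetical names introduced only for illustration. It assumes the architectural behaviour that `shrn` with a `.4h` destination writes the low 64 bits and zeroes the upper 64 bits, and that `mov` with `.8b` operands does the same for a plain copy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for one 128-bit SIMD register. */
typedef union {
    uint8_t  b[16];
    uint16_t h[8];
    uint32_t s[4];
} vreg128;

/* Models `shrn Vd.4h, Vn.4s, #shift`: each 32-bit lane is shifted right
 * (truncating) and narrowed to 16 bits. The four results land in the low
 * 64 bits of the destination; the upper 64 bits are zeroed. The kept bits
 * are the same whether the lanes are viewed as signed or unsigned. */
static vreg128 model_shrn_4h(vreg128 vn, unsigned shift)
{
    vreg128 vd;
    memset(&vd, 0, sizeof vd);
    for (int i = 0; i < 4; i++)
        vd.h[i] = (uint16_t)(vn.s[i] >> shift);
    return vd;
}

/* Models `mov Vd.8b, Vn.8b`: copies the low 64 bits, zeroes the rest. */
static vreg128 model_mov_8b(vreg128 vn)
{
    vreg128 vd;
    memset(&vd, 0, sizeof vd);
    memcpy(vd.b, vn.b, 8);
    return vd;
}
```

In this model, a 64-bit copy of a register produced by `model_shrn_4h` is bit-identical to a full 128-bit copy, which is the reason a `.8b` move would have been sufficient. Whether the narrower move is any faster depends on the CPU, as noted in the reply above.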
