Commit 79db1624 authored by Kyle Siefring, committed by Martin Storsjö

AArch64: New method for calculating sgr table

For the 3x3 part, double the width of the vertical loop. This gives the
new sgr calculation more independent work per iteration, which helps
hide its latency.

Initial (master):   Cortex A53        A55        A72        A73        A76   Apple M1
sgr_3x3_8bpc_neon:    387702.8   383154.2   295742.4   302100.1   185420.7      472.2
sgr_5x5_8bpc_neon:    261725.1   256919.8   194205.1   197585.6   128311.3      332.9
sgr_mix_8bpc_neon:    628085.0   593664.2   453551.8   450553.8   281956.0      711.2

Current:            Cortex A53        A55        A72        A73        A76   Apple M1
sgr_3x3_8bpc_neon:    368331.4   363949.7   275499.0   272056.3   169614.4      432.7
sgr_5x5_8bpc_neon:    257866.7   255265.5   195962.5   199557.8   120481.3      319.2
sgr_mix_8bpc_neon:    598234.1   572896.4   418500.4   438910.7   258977.7      659.3
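
The relative change per core can be read off the two tables with a short
script. This is only a sketch: it assumes the figures are benchmark timings
where lower is better, and the names `RESULTS` and `percent_reduction` are
hypothetical helpers, not part of the codebase.

```python
# Benchmark figures copied from the tables above: (initial, current) per core.
# Assumption: these are timings, so a lower value means faster code.
CORES = ["Cortex A53", "A55", "A72", "A73", "A76", "Apple M1"]

RESULTS = {
    "sgr_3x3_8bpc_neon": (
        [387702.8, 383154.2, 295742.4, 302100.1, 185420.7, 472.2],
        [368331.4, 363949.7, 275499.0, 272056.3, 169614.4, 432.7],
    ),
    "sgr_5x5_8bpc_neon": (
        [261725.1, 256919.8, 194205.1, 197585.6, 128311.3, 332.9],
        [257866.7, 255265.5, 195962.5, 199557.8, 120481.3, 319.2],
    ),
    "sgr_mix_8bpc_neon": (
        [628085.0, 593664.2, 453551.8, 450553.8, 281956.0, 711.2],
        [598234.1, 572896.4, 418500.4, 438910.7, 258977.7, 659.3],
    ),
}


def percent_reduction(initial, current):
    """Return the reduction from initial to current as a percentage.

    Positive means the current figure is lower (faster under the
    lower-is-better assumption); negative means it is higher.
    """
    return (initial - current) / initial * 100.0


if __name__ == "__main__":
    for name, (initial, current) in RESULTS.items():
        for core, before, after in zip(CORES, initial, current):
            print(f"{name} {core}: {percent_reduction(before, after):+.2f}%")
```

Under that assumption, a negative value marks a case where the current
figure is higher than the initial one, as with sgr_5x5 on the A72 and A73.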

Include a minor improvement that gets rid of a dup instruction.
parent ec5c3052
Pipeline #510867 passed with stages in 50 minutes and 33 seconds