AArch64: New method for calculating sgr table (79db1624) · Commits · VideoLAN / dav1d · GitLab


Commit 79db1624 authored 7 months ago by Kyle Siefring Committed by Martin Storsjö 6 months ago

AArch64: New method for calculating sgr table

For the 3x3 part, double the width of the vertical loop. This is done to
provide more latency in the new sgr calculation.

Initial (master):   Cortex A53         A55         A72         A73         A76    Apple M1
sgr_3x3_8bpc_neon:    387702.8    383154.2    295742.4    302100.1    185420.7       472.2
sgr_5x5_8bpc_neon:    261725.1    256919.8    194205.1    197585.6    128311.3       332.9
sgr_mix_8bpc_neon:    628085.0    593664.2    453551.8    450553.8    281956.0       711.2

Current:
sgr_3x3_8bpc_neon:    368331.4    363949.7    275499.0    272056.3    169614.4       432.7
sgr_5x5_8bpc_neon:    257866.7    255265.5    195962.5    199557.8    120481.3       319.2
sgr_mix_8bpc_neon:    598234.1    572896.4    418500.4    438910.7    258977.7       659.3
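
To make the two benchmark tables easier to compare, the relative change per core can be computed as in the sketch below. The numbers are copied from the tables above. The names `CORES`, `INITIAL`, `CURRENT`, `percent_reduction` and `reductions` are made up for this illustration, and the sketch assumes that lower figures are better (the commit message does not state the unit).

```python
# Benchmark figures copied from the commit message; lower is assumed to be better.
CORES = ["Cortex A53", "A55", "A72", "A73", "A76", "Apple M1"]

INITIAL = {
    "sgr_3x3_8bpc_neon": [387702.8, 383154.2, 295742.4, 302100.1, 185420.7, 472.2],
    "sgr_5x5_8bpc_neon": [261725.1, 256919.8, 194205.1, 197585.6, 128311.3, 332.9],
    "sgr_mix_8bpc_neon": [628085.0, 593664.2, 453551.8, 450553.8, 281956.0, 711.2],
}

CURRENT = {
    "sgr_3x3_8bpc_neon": [368331.4, 363949.7, 275499.0, 272056.3, 169614.4, 432.7],
    "sgr_5x5_8bpc_neon": [257866.7, 255265.5, 195962.5, 199557.8, 120481.3, 319.2],
    "sgr_mix_8bpc_neon": [598234.1, 572896.4, 418500.4, 438910.7, 258977.7, 659.3],
}


def percent_reduction(before, after):
    """Return the reduction from `before` to `after` in percent (negative = slower)."""
    return (before - after) / before * 100.0


def reductions():
    """Map each function name to its per-core percentage reduction."""
    return {
        name: [percent_reduction(b, a) for b, a in zip(INITIAL[name], CURRENT[name])]
        for name in INITIAL
    }


if __name__ == "__main__":
    print(f"{'':<20}" + "".join(f"{core:>12}" for core in CORES))
    for name, values in reductions().items():
        print(f"{name + ':':<20}" + "".join(f"{v:>11.2f}%" for v in values))
```

Under that assumption, positive values indicate a speedup. The 3x3 and mix rows come out positive on every listed core, while the 5x5 row is slightly negative on the A72 and A73.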

Include a minor improvement that gets rid of a dup instruction.

parent ec5c3052

Pipeline #510867 passed with stages in 50 minutes and 33 seconds


Showing with 179 additions and 112 deletions

Martin Storsjö @mstorsjo
mentioned in merge request !1756
· 4 months ago

Martin Storsjö @mstorsjo
mentioned in commit 30c3dd8e
· 4 months ago
