Commit 345127a7 authored by Martin Storsjö

arm: itx: Add clipping to row_clip_min/max in the 10 bpc codepaths

This fixes conformance with the argon test samples, in particular
with these samples:
    profile0_core/streams/test10100_579_8614.obu
    profile0_core/streams/test10218_6914.obu
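The clipping added here can be illustrated with a scalar C sketch. It is not the NEON assembly from the patch, and it does not show where in the transform the clip is applied. It assumes that intermediate row-transform values must fit in a signed (bitdepth + 8)-bit integer, which for 10 bpc gives row_clip_min/max of -131072 and 131071. The function names `clip_i32`, `row_clip_min`, `row_clip_max` and `clip_row` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp v to the inclusive range [lo, hi]. */
static int32_t clip_i32(int32_t v, int32_t lo, int32_t hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Lower bound for intermediate row-transform values, ASSUMING they must fit
 * in a signed (bitdepth + 8)-bit integer. bitdepth_max is (1 << bitdepth) - 1,
 * so (bitdepth_max + 1) * 128 == 1 << (bitdepth + 7).
 * In two's complement this is the same value as (~bitdepth_max) << 7. */
int32_t row_clip_min(int32_t bitdepth_max) {
    assert(bitdepth_max == 255 || bitdepth_max == 1023 || bitdepth_max == 4095);
    return -(bitdepth_max + 1) * 128;
}

/* Upper bound: the bitwise complement of the lower bound, i.e. -min - 1. */
int32_t row_clip_max(int32_t bitdepth_max) {
    return -row_clip_min(bitdepth_max) - 1;
}

/* Hypothetical helper: clamp one row of n intermediate values in place. */
void clip_row(int32_t *row, size_t n, int32_t bitdepth_max) {
    const int32_t lo = row_clip_min(bitdepth_max);
    const int32_t hi = row_clip_max(bitdepth_max);
    for (size_t i = 0; i < n; i++)
        row[i] = clip_i32(row[i], lo, hi);
}
```

For example, with bitdepth_max = 1023, clip_row turns {200000, -200000, 5} into {131071, -131072, 5}. With bitdepth_max = 255, the same formula yields exactly the int16_t range.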

This causes a notable slowdown in these transforms; some
examples:

Before:                                 Cortex A53       A72       A73    Apple M1
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:       365.7     290.2     299.8    0.3
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    1865.2    1384.1    1457.5    2.6
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   33976.3   26817.0   24864.2   40.4
After:
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:       397.7     322.2     335.1    0.4
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    2121.9    1336.7    1664.6    2.6
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   38569.4   27622.6   28176.0   51.0

Thus, for the transforms alone, this makes them around 10-13% slower
in most of the measured cases (the Apple M1 measurements are too
noisy to be conclusive here).
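The relative slowdowns can be recomputed from the table above. The C sketch below only does arithmetic on the published numbers and is not part of the patch. The names `slowdown_pct` and `print_slowdowns` are made up for illustration, and the Apple M1 column is left out because it is described as too noisy.

```c
#include <stdio.h>

/* Timings copied from the Before/After table in this commit message. */
static const char *const kNames[3] = {
    "inv_txfm_add_8x8_dct_dct_1_10bpc_neon",
    "inv_txfm_add_16x16_dct_dct_2_10bpc_neon",
    "inv_txfm_add_64x64_dct_dct_4_10bpc_neon",
};
static const char *const kCores[3] = {"Cortex A53", "A72", "A73"};
static const double kBefore[3][3] = {
    {365.7, 290.2, 299.8},
    {1865.2, 1384.1, 1457.5},
    {33976.3, 26817.0, 24864.2},
};
static const double kAfter[3][3] = {
    {397.7, 322.2, 335.1},
    {2121.9, 1336.7, 1664.6},
    {38569.4, 27622.6, 28176.0},
};

/* Relative change in percent; positive means slower after the patch. */
double slowdown_pct(double before, double after) {
    return (after - before) / before * 100.0;
}

/* Print the change for every function/core pair in the table. */
void print_slowdowns(void) {
    for (int f = 0; f < 3; f++)
        for (int c = 0; c < 3; c++)
            printf("%-42s %-10s %+6.1f%%\n", kNames[f], kCores[c],
                   slowdown_pct(kBefore[f][c], kAfter[f][c]));
}
```

Calling print_slowdowns() from main shows that most entries land roughly in the 11-14% range. The 8x8 case on Cortex A53 comes out at about 8.8%, and the 16x16 case on A72 actually went down by about 3.4%.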

Measured on actual full decoding, it makes decoding of 10 bpc
Chimera roughly 1% slower on an Apple M1, which is close to
measurement noise anyway.
parent 9c74a9b0