Commits on Source (33)
Jean-Baptiste Kempf authored 07dab8cb

Ronald S. Bultje authored a1647a59
checkasm timings (lower is better):
Before: gen_grain_uv_ar2_8bpc_420_avx2: 29176.2
After:  gen_grain_uv_ar2_8bpc_420_avx2: 26794.0

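As a quick worked check on the two gen_grain_uv_ar2 figures above (plain arithmetic on the quoted numbers, not part of the original commit message), the change removes roughly 8% of the measured time:

```latex
\frac{29176.2 - 26794.0}{29176.2} = \frac{2382.2}{29176.2} \approx 0.0816,
\qquad
\frac{29176.2}{26794.0} \approx 1.089
```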
Martin Storsjö authored c02ec6cf
Otherwise the macro would interfere with local labels 1 and 2 in the context where the macro is expanded.

Martin Storsjö authored 9a100261
The code is a fairly exact 1:1 port of the ARM64 code, but it operates on 8 pixels at a time instead of 16.

Relative speedup over C code according to checkasm:
Cortex                         A7     A8     A9    A53    A72    A73
lpf_h_sb_uv_w4_8bpc_neon:    1.36   1.40   1.25   1.71   1.55   1.59
lpf_h_sb_uv_w6_8bpc_neon:    2.18   2.11   1.74   2.65   2.32   2.34
lpf_h_sb_y_w4_8bpc_neon:     1.48   1.43   1.20   1.91   1.49   1.64
lpf_h_sb_y_w8_8bpc_neon:     2.34   2.05   1.78   2.84   2.35   2.69
lpf_h_sb_y_w16_8bpc_neon:    2.13   1.83   1.63   2.51   2.10   2.35
lpf_v_sb_uv_w4_8bpc_neon:    1.69   1.66   1.60   2.16   2.24   2.24
lpf_v_sb_uv_w6_8bpc_neon:    2.68   2.43   2.22   3.53   3.44   3.35
lpf_v_sb_y_w4_8bpc_neon:     1.74   1.74   1.43   2.34   2.14   2.18
lpf_v_sb_y_w8_8bpc_neon:     2.92   2.47   2.19   3.55   3.22   3.54
lpf_v_sb_y_w16_8bpc_neon:    2.62   2.19   1.98   3.25   2.80   3.10

Comparison to the original ARM64 assembly (checkasm timings, lower is better):
ARM64:                         A53     A72     A73
lpf_h_sb_uv_w4_8bpc_neon:    702.5   518.2   529.1
lpf_h_sb_uv_w6_8bpc_neon:   1007.3   672.6   736.6
lpf_h_sb_y_w4_8bpc_neon:    1652.8  1261.2  1276.5
lpf_h_sb_y_w8_8bpc_neon:    2144.7  1559.8  1638.7
lpf_h_sb_y_w16_8bpc_neon:   2318.3  1757.2  1792.8
lpf_v_sb_uv_w4_8bpc_neon:    447.1   302.0   292.4
lpf_v_sb_uv_w6_8bpc_neon:    600.0   397.7   406.9
lpf_v_sb_y_w4_8bpc_neon:    1212.6   840.1   818.4
lpf_v_sb_y_w8_8bpc_neon:    1623.3  1167.4  1156.7
lpf_v_sb_y_w16_8bpc_neon:   1694.9  1237.9  1182.3

ARM32:                         A53     A72     A73
lpf_h_sb_uv_w4_8bpc_neon:    821.2   501.1   500.8
lpf_h_sb_uv_w6_8bpc_neon:   1232.0   715.7   746.6
lpf_h_sb_y_w4_8bpc_neon:    2208.1  1373.2  1414.7
lpf_h_sb_y_w8_8bpc_neon:    3138.3  1843.1  1915.2
lpf_h_sb_y_w16_8bpc_neon:   3293.1  1842.5  1975.9
lpf_v_sb_uv_w4_8bpc_neon:    619.9   326.7   324.9
lpf_v_sb_uv_w6_8bpc_neon:    855.9   446.7   468.2
lpf_v_sb_y_w4_8bpc_neon:    1737.6   935.5  1007.0
lpf_v_sb_y_w8_8bpc_neon:    2346.7  1232.8  1298.3
lpf_v_sb_y_w16_8bpc_neon:   2353.4  1283.4  1379.9

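To read the ARM64/ARM32 comparison, it helps to turn each pair of timings into a ratio. The sketch below is plain arithmetic on a few Cortex-A53 rows copied from the tables above; the struct and function names are made up for illustration and are not part of dav1d or checkasm.

```c
#include <stddef.h>
#include <stdio.h>

/* Cortex-A53 checkasm timings copied from the commit message above
 * (lower is better). */
struct lpf_timing {
    const char *name;
    double arm64;
    double arm32;
};

static const struct lpf_timing a53_timings[] = {
    { "lpf_h_sb_uv_w4_8bpc_neon",   702.5,  821.2 },
    { "lpf_h_sb_y_w16_8bpc_neon",  2318.3, 3293.1 },
    { "lpf_v_sb_uv_w4_8bpc_neon",   447.1,  619.9 },
    { "lpf_v_sb_y_w16_8bpc_neon",  1694.9, 2353.4 },
};

/* Factor by which the ARM32 port is slower than the ARM64 code
 * (a value above 1.0 means the ARM32 version takes longer). */
static double arm32_slowdown(const struct lpf_timing *t)
{
    return t->arm32 / t->arm64;
}

/* Print one line per function with its slowdown factor. */
static void print_a53_slowdowns(void)
{
    size_t n = sizeof(a53_timings) / sizeof(a53_timings[0]);
    for (size_t i = 0; i < n; i++)
        printf("%-26s %.2fx\n", a53_timings[i].name,
               arm32_slowdown(&a53_timings[i]));
}
```

For the first row this gives about 1.17x on the A53. The same rows on the A72 and A73 are much closer to 1.0 (some are even below it), as the tables show.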
Martin Storsjö authored abd07c67

Martin Storsjö authored 3069ab94
This doesn't measurably change performance, but eases potential future maintenance of the code.

Martin Storsjö authored 564482b6
This removes one redundant instruction for loop filters smaller than 16.

Martin Storsjö authored dcbbf775
This was requested in the review of the arm32 version of the same.

91d324eb
                                  A73 Earlier  A73 Now  A53 Earlier  A53 Now
intra_pred_dc_top_w64_8bpc_neon:        344.4    344.6        253.4    252.3

5dc8503f
Enforces software engineering best practices.

Ronald S. Bultje authored fc968cc9

Ronald S. Bultje authored 564d3d91
Fixes #309.

7f30c67f

Ronald S. Bultje authored 4bf52cb5

Ronald S. Bultje authored eb4a8f6d
Fixes #304.

Ronald S. Bultje authored 46d092ae
This allows auto-detection between section5 and annexb files, which share the same extension.

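This log does not show dav1d's actual detection logic. The sketch below is a hypothetical heuristic based on the AV1 specification's framing rules. A Section 5 (low-overhead) stream should begin with a temporal delimiter OBU that has obu_has_size_field set and a zero payload size (header byte 0x12, then 0x00). An Annex B stream instead begins with nested leb128 size fields (temporal_unit_size, frame_unit_size, obu_length) followed by an OBU header. All names here are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum bitstream_format { FORMAT_UNKNOWN, FORMAT_SECTION5, FORMAT_ANNEXB };

#define OBU_TEMPORAL_DELIMITER 2

/* Decode an unsigned leb128() value (at most 8 bytes, per the AV1 spec).
 * Returns the number of bytes consumed, or 0 if the input is truncated. */
static size_t read_leb128(const uint8_t *buf, size_t len, uint64_t *value)
{
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 8; i++) {
        v |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if (!(buf[i] & 0x80)) {
            *value = v;
            return i + 1;
        }
    }
    return 0;
}

/* obu_type is bits 6..3 of the OBU header byte. */
static int obu_type(uint8_t header)
{
    return (header >> 3) & 0x0f;
}

static enum bitstream_format probe_format(const uint8_t *buf, size_t len)
{
    if (!buf || len < 2)
        return FORMAT_UNKNOWN;

    /* Section 5: temporal delimiter with obu_has_size_field (bit 1) set
     * and a zero payload size. */
    if (obu_type(buf[0]) == OBU_TEMPORAL_DELIMITER && (buf[0] & 0x02) &&
        buf[1] == 0)
        return FORMAT_SECTION5;

    /* Annex B: require temporal_unit_size > frame_unit_size >= obu_length,
     * followed by the header of a temporal delimiter OBU. */
    uint64_t tu_size, fu_size, obu_len;
    size_t pos = 0, n;
    if (!(n = read_leb128(buf, len, &tu_size)))
        return FORMAT_UNKNOWN;
    pos += n;
    if (!(n = read_leb128(buf + pos, len - pos, &fu_size)) || fu_size >= tu_size)
        return FORMAT_UNKNOWN;
    pos += n;
    if (!(n = read_leb128(buf + pos, len - pos, &obu_len)) ||
        obu_len == 0 || obu_len > fu_size)
        return FORMAT_UNKNOWN;
    pos += n;
    if (pos >= len || obu_type(buf[pos]) != OBU_TEMPORAL_DELIMITER)
        return FORMAT_UNKNOWN;
    return FORMAT_ANNEXB;
}
```

A heuristic like this can misclassify unusual inputs. A decoder could instead fall back to trying each parser in turn.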
Martin Storsjö authored 4d9c990e
Should fix failures of the 'section5' sample on 32-bit systems.

Ronald S. Bultje authored 35d3d2b6
Prevents the following compiler warning:
../src/decode.c:1979:32: warning: implicit conversion loses integer precision: 'const ptrdiff_t' (aka 'const long') to 'int' [-Wshorten-64-to-32]
            const int stride = f->cur.stride[!!p];
                      ~~~~~~   ^~~~~~~~~~~~~~~~~
1 warning generated.

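One plausible way to silence this warning at the quoted line is to declare the local as `const ptrdiff_t stride = f->cur.stride[!!p];` instead of `int`. The commit text does not show the actual change. The self-contained sketch below shows the general pattern with a hypothetical plane-copy helper that keeps strides as ptrdiff_t throughout:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies a width x height block between two planes.
 * Strides stay ptrdiff_t end to end, matching the picture stride type in
 * the warning above, so no 64-to-32-bit narrowing happens. Indexing from
 * the base pointer avoids stepping past the buffer after the last row. */
static void copy_plane(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int width, int height)
{
    for (int y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, (size_t)width);
}
```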
Ronald S. Bultje authored c99c27ea

Martin Storsjö authored 52c7427e
This fixes these warnings with MSVC:
warning C4267: '+=': conversion from 'size_t' to 'int', possible loss of data

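C4267 is what 64-bit MSVC reports when a size_t value, such as the result of strlen(), is added into an int. The commit does not show which code it touched. As a hypothetical illustration, the two usual fixes are to keep the accumulator as size_t, or to narrow explicitly after a range check:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example. Writing `int total; total += strlen(s);` triggers
 * C4267 on 64-bit MSVC, because strlen() returns the 64-bit size_t. */

/* Fix 1: accumulate in size_t, the type of the values being added. */
static size_t total_length(const char *const *strs, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(strs[i]);
    return total;
}

/* Fix 2: where an int is genuinely required, check the range and make
 * the narrowing explicit. Returns -1 if the sum does not fit in an int. */
static int total_length_int(const char *const *strs, size_t count)
{
    size_t total = total_length(strs, count);
    return total > INT_MAX ? -1 : (int)total;
}
```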
59a28b19
The latter is marked as obsolete by POSIX.

5e8eccf2
This enables releasing stable versions on the snap store.

b9a43c60
Applying a non-zero offset to a NULL pointer is undefined behavior.

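A common way to avoid this is to do pointer arithmetic only when the pointer is non-NULL. The hypothetical byte cursor below shows the pattern. In C, even `NULL + 0` is not formally defined, because pointer arithmetic requires a pointer into an array object, so the sketch skips the addition entirely for a NULL pointer. The type and function names are illustrative and do not come from dav1d.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read cursor over a byte buffer; data may be NULL when
 * size is 0 (e.g. an empty input). */
typedef struct {
    const uint8_t *data;
    size_t size;
} byte_cursor;

/* Skip up to n bytes. The pointer is only advanced when it is non-NULL,
 * so the code never computes NULL + n. */
static void cursor_skip(byte_cursor *c, size_t n)
{
    if (n > c->size)
        n = c->size;
    if (c->data)
        c->data += n;
    c->size -= n;
}
```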
eaa3be9a

cf4b381a

a52459b3
tests/checkasm/checkasm.c:55:5: warning: implicit declaration of function 'gettimeofday' is invalid in C99 [-Wimplicit-function-declaration]
    gettimeofday(&tv, NULL);
    ^

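This warning means that no prototype for gettimeofday() was in scope. On POSIX systems the function is declared in <sys/time.h>, so including that header is the usual fix. The commit itself may have done something different. Below is a minimal sketch, assuming a POSIX system and the compiler's default feature-test macros, with a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h> /* declares gettimeofday(); omitting it causes the warning */

/* Hypothetical helper: wall-clock time in microseconds since the epoch. */
static uint64_t wallclock_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}
```

POSIX.1-2008 marks gettimeofday() as obsolescent in favor of clock_gettime(), which is one reason newer code tends to use the latter.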
f59b5713
Only a number that changes on every run is required.

246c2803

Henrik Gramner authored fc6c2578

2e5e05b7
Also prefer clock_gettime over mach_absolute_time on Darwin. clock_gettime is only available in macOS 10.12 (Darwin 16) and later. Hopefully fixes #283.

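The approach described above (prefer clock_gettime, keep mach_absolute_time for older macOS) can be sketched as a small timer helper. This is a minimal sketch, not dav1d's checkasm code. The function name is hypothetical, and the compile-time check on MAC_OS_X_VERSION_MIN_REQUIRED from <AvailabilityMacros.h> is one assumed way to detect the deployment target.

```c
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime() in strict C modes */
#endif
#include <stdint.h>
#include <time.h>
#ifdef __APPLE__
#include <AvailabilityMacros.h>
#include <mach/mach_time.h>
#endif

/* Monotonic time in nanoseconds. Prefer clock_gettime(); fall back to
 * mach_absolute_time() only when targeting macOS releases older than
 * 10.12, where clock_gettime() does not exist. */
static uint64_t monotonic_ns(void)
{
#if defined(__APPLE__) && defined(MAC_OS_X_VERSION_MIN_REQUIRED) && \
    MAC_OS_X_VERSION_MIN_REQUIRED < 101200
    static mach_timebase_info_data_t tb;
    if (!tb.denom)
        mach_timebase_info(&tb); /* ticks-to-nanoseconds ratio */
    return mach_absolute_time() * tb.numer / tb.denom;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

A compile-time check keeps the fallback out of builds that target 10.12 or later. A runtime availability check would be the alternative when one binary must run on both old and new releases.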
82eda83a
Memory addresses separated by certain power-of-two offsets map to the same set of cache lines. Using such offsets as strides causes excessive cache evictions and therefore more cache misses. Avoid this by adding a small amount of padding when allocating picture buffers whose stride is a multiple of 1024 bytes. The value 1024 is somewhat arbitrary, since the offsets that actually alias depend on the hardware implementation.

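The padding rule described above can be sketched as a stride computation. Only the 1024-byte threshold comes from the commit message. The 64-byte alignment and 64-byte padding amount, and the function name, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants: the commit states only the 1024 threshold.
 * The 64-byte alignment and 64-byte padding are assumptions. */
#define STRIDE_ALIGN     64
#define ALIAS_PERIOD   1024
#define ALIAS_PADDING    64

/* Round a row size up to STRIDE_ALIGN bytes, then bump the stride by
 * ALIAS_PADDING when it is an exact multiple of ALIAS_PERIOD, so rows
 * no longer start at addresses that map to the same cache sets. */
static ptrdiff_t padded_stride(ptrdiff_t row_bytes)
{
    assert(row_bytes > 0);
    ptrdiff_t stride = (row_bytes + STRIDE_ALIGN - 1) & ~(ptrdiff_t)(STRIDE_ALIGN - 1);
    if (stride % ALIAS_PERIOD == 0)
        stride += ALIAS_PADDING;
    return stride;
}
```

For example, a 2048-byte row gets a 2112-byte stride, while a 1920-byte row keeps its 1920-byte stride. Keeping the padding a multiple of the alignment preserves the alignment of every row.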
Martin Storsjö authored 162ba33d

Jean-Baptiste Kempf authored
src/arm/32/ipred.S (new file, mode 0 → 100644)
src/arm/32/loopfilter.S (new file, mode 0 → 100644)