- 10 Apr, 2019 1 commit
-
-
Xuefeng Jiang authored
intra_pred_paeth_w4_8bpc_c: 561.6
intra_pred_paeth_w4_8bpc_ssse3: 49.2
intra_pred_paeth_w8_8bpc_c: 1475.8
intra_pred_paeth_w8_8bpc_ssse3: 103.0
intra_pred_paeth_w16_8bpc_c: 4697.8
intra_pred_paeth_w16_8bpc_ssse3: 279.0
intra_pred_paeth_w32_8bpc_c: 13245.1
intra_pred_paeth_w32_8bpc_ssse3: 614.7
intra_pred_paeth_w64_8bpc_c: 32638.9
intra_pred_paeth_w64_8bpc_ssse3: 1477.6
-
- 08 Apr, 2019 1 commit
-
-
Martin Storsjö authored
This eases disambiguating these functions when looking at perf profiles.
-
- 07 Apr, 2019 1 commit
-
-
Martin Storsjö authored
The width register has been set to clz(w)-24, not the other way around. And the 32 bit prep function has got the h parameter in r4, not in r5.
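The register value named in this message can be illustrated in C. This is only a minimal sketch, assuming a 32-bit `unsigned` and power-of-two block widths; `width_to_index` is a hypothetical helper name, and `__builtin_clz` is a GCC/Clang builtin rather than anything taken from the commit.

```c
#include <assert.h>

/* Hypothetical helper mirroring the clz(w) - 24 computation from the
 * commit message. Assumes unsigned is 32 bits wide. __builtin_clz is
 * undefined for 0, hence the assert. */
static int width_to_index(unsigned w)
{
    assert(w != 0);
    return __builtin_clz(w) - 24;
}
```

Under those assumptions, a width of 128 gives 0, 64 gives 1, and so on up to 5 for a width of 4, so wider blocks map to smaller values.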
-
- 04 Apr, 2019 2 commits
-
-
Martin Storsjö authored
For cases with indented, nested .if/.macro in asm.S, indent those by 4 chars. Some initial assembly files were indented to 4/16 columns, while all the actual implementation files, starting with src/arm/64/mc.S, have used 8/24 for indentation.
-
Xuefeng Jiang authored
cfl_ac_444_w4_8bpc_c: 978.2
cfl_ac_444_w4_8bpc_ssse3: 110.4
cfl_ac_444_w8_8bpc_c: 2312.3
cfl_ac_444_w8_8bpc_ssse3: 197.5
cfl_ac_444_w16_8bpc_c: 4081.1
cfl_ac_444_w16_8bpc_ssse3: 274.1
cfl_ac_444_w32_8bpc_c: 9544.3
cfl_ac_444_w32_8bpc_ssse3: 617.1
-
- 28 Mar, 2019 5 commits
-
-
Henrik Gramner authored
-
Victorien Le Couviour--Tuffet authored
Port of 65ee1233 for AVX-2 from Kyle Siefring to SSE41, and optimize SSSE3.
---------------------
x86_64:
------------------------------------------
before: cdef_dir_8bpc_ssse3: 110.3
after: cdef_dir_8bpc_ssse3: 105.9
new: cdef_dir_8bpc_sse4: 96.4
------------------------------------------
---------------------
x86_32:
------------------------------------------
before: cdef_dir_8bpc_ssse3: 120.6
after: cdef_dir_8bpc_ssse3: 110.7
new: cdef_dir_8bpc_sse4: 106.5
------------------------------------------
-
Victorien Le Couviour--Tuffet authored
Port of c204da0f for AVX-2 from Kyle Siefring.
---------------------
x86_64:
------------------------------------------
before: cdef_filter_4x4_8bpc_ssse3: 141.7
after: cdef_filter_4x4_8bpc_ssse3: 131.6
before: cdef_filter_4x4_8bpc_sse4: 128.3
after: cdef_filter_4x4_8bpc_sse4: 119.0
------------------------------------------
before: cdef_filter_4x8_8bpc_ssse3: 253.4
after: cdef_filter_4x8_8bpc_ssse3: 236.1
before: cdef_filter_4x8_8bpc_sse4: 228.5
after: cdef_filter_4x8_8bpc_sse4: 213.2
------------------------------------------
before: cdef_filter_8x8_8bpc_ssse3: 429.6
after: cdef_filter_8x8_8bpc_ssse3: 386.9
before: cdef_filter_8x8_8bpc_sse4: 379.9
after: cdef_filter_8x8_8bpc_sse4: 335.9
------------------------------------------
---------------------
x86_32:
------------------------------------------
before: cdef_filter_4x4_8bpc_ssse3: 184.3
after: cdef_filter_4x4_8bpc_ssse3: 163.3
before: cdef_filter_4x4_8bpc_sse4: 168.9
after: cdef_filter_4x4_8bpc_sse4: 146.1
------------------------------------------
before: cdef_filter_4x8_8bpc_ssse3: 335.3
after: cdef_filter_4x8_8bpc_ssse3: 280.7
before: cdef_filter_4x8_8bpc_sse4: 305.1
after: cdef_filter_4x8_8bpc_sse4: 257.9
------------------------------------------
before: cdef_filter_8x8_8bpc_ssse3: 579.1
after: cdef_filter_8x8_8bpc_ssse3: 500.5
before: cdef_filter_8x8_8bpc_sse4: 517.0
after: cdef_filter_8x8_8bpc_sse4: 455.8
------------------------------------------
-
Victorien Le Couviour--Tuffet authored
Port of dc2ae517 for AVX-2 from Kyle Siefring.
---------------------
x86_64:
------------------------------------------
cdef_filter_4x4_8bpc_ssse3: 141.7
cdef_filter_4x4_8bpc_sse4: 128.3
------------------------------------------
cdef_filter_4x8_8bpc_ssse3: 253.4
cdef_filter_4x8_8bpc_sse4: 228.5
------------------------------------------
cdef_filter_8x8_8bpc_ssse3: 429.6
cdef_filter_8x8_8bpc_sse4: 379.9
------------------------------------------
---------------------
x86_32:
------------------------------------------
cdef_filter_4x4_8bpc_ssse3: 184.3
cdef_filter_4x4_8bpc_sse4: 168.9
------------------------------------------
cdef_filter_4x8_8bpc_ssse3: 335.3
cdef_filter_4x8_8bpc_sse4: 305.1
------------------------------------------
cdef_filter_8x8_8bpc_ssse3: 579.1
cdef_filter_8x8_8bpc_sse4: 517.0
------------------------------------------
-
Victorien Le Couviour--Tuffet authored
-
- 27 Mar, 2019 1 commit
-
-
Liwei Wang authored
Cycle times:
inv_txfm_add_16x32_dct_dct_0_8bpc_c: 2464.6
inv_txfm_add_16x32_dct_dct_0_8bpc_ssse3: 121.6
inv_txfm_add_16x32_dct_dct_1_8bpc_c: 24751.6
inv_txfm_add_16x32_dct_dct_1_8bpc_ssse3: 1101.9
inv_txfm_add_16x32_dct_dct_2_8bpc_c: 24377.0
inv_txfm_add_16x32_dct_dct_2_8bpc_ssse3: 1117.2
inv_txfm_add_16x32_dct_dct_3_8bpc_c: 24155.6
inv_txfm_add_16x32_dct_dct_3_8bpc_ssse3: 2349.3
inv_txfm_add_16x32_dct_dct_4_8bpc_c: 24175.6
inv_txfm_add_16x32_dct_dct_4_8bpc_ssse3: 1642.0
inv_txfm_add_16x32_identity_identity_0_8bpc_c: 10304.7
inv_txfm_add_16x32_identity_identity_0_8bpc_ssse3: 137.7
inv_txfm_add_16x32_identity_identity_1_8bpc_c: 10341.6
inv_txfm_add_16x32_identity_identity_1_8bpc_ssse3: 137.9
inv_txfm_add_16x32_identity_identity_2_8bpc_c: 10299.9
inv_txfm_add_16x32_identity_identity_2_8bpc_ssse3: 253.9
inv_txfm_add_16x32_identity_identity_3_8bpc_c: 10331.4
inv_txfm_add_16x32_identity_identity_3_8bpc_ssse3: 369.7
inv_txfm_add_16x32_identity_identity_4_8bpc_c: 10360.4
inv_txfm_add_16x32_identity_identity_4_8bpc_ssse3: 484.0
inv_txfm_add_32x16_dct_dct_0_8bpc_c: 2288.4
inv_txfm_add_32x16_dct_dct_0_8bpc_ssse3: 142.3
inv_txfm_add_32x16_dct_dct_1_8bpc_c: 23819.9
inv_txfm_add_32x16_dct_dct_1_8bpc_ssse3: 1740.1
inv_txfm_add_32x16_dct_dct_2_8bpc_c: 23755.8
inv_txfm_add_32x16_dct_dct_2_8bpc_ssse3: 1641.4
inv_txfm_add_32x16_dct_dct_3_8bpc_c: 23839.9
inv_txfm_add_32x16_dct_dct_3_8bpc_ssse3: 1559.0
inv_txfm_add_32x16_dct_dct_4_8bpc_c: 23757.7
inv_txfm_add_32x16_dct_dct_4_8bpc_ssse3: 1579.0
inv_txfm_add_32x16_identity_identity_0_8bpc_c: 10381.7
inv_txfm_add_32x16_identity_identity_0_8bpc_ssse3: 126.3
inv_txfm_add_32x16_identity_identity_1_8bpc_c: 10402.5
inv_txfm_add_32x16_identity_identity_1_8bpc_ssse3: 126.5
inv_txfm_add_32x16_identity_identity_2_8bpc_c: 10429.2
inv_txfm_add_32x16_identity_identity_2_8bpc_ssse3: 244.9
inv_txfm_add_32x16_identity_identity_3_8bpc_c: 10382.0
inv_txfm_add_32x16_identity_identity_3_8bpc_ssse3: 491.0
inv_txfm_add_32x16_identity_identity_4_8bpc_c: 10381.0
inv_txfm_add_32x16_identity_identity_4_8bpc_ssse3: 468.0
inv_txfm_add_32x32_dct_dct_0_8bpc_c: 4168.2
inv_txfm_add_32x32_dct_dct_0_8bpc_ssse3: 204.0
inv_txfm_add_32x32_dct_dct_1_8bpc_c: 46306.2
inv_txfm_add_32x32_dct_dct_1_8bpc_ssse3: 2216.0
inv_txfm_add_32x32_dct_dct_2_8bpc_c: 46300.2
inv_txfm_add_32x32_dct_dct_2_8bpc_ssse3: 2194.2
inv_txfm_add_32x32_dct_dct_3_8bpc_c: 46350.1
inv_txfm_add_32x32_dct_dct_3_8bpc_ssse3: 3484.4
inv_txfm_add_32x32_dct_dct_4_8bpc_c: 46318.1
inv_txfm_add_32x32_dct_dct_4_8bpc_ssse3: 3440.9
inv_txfm_add_32x32_identity_identity_0_8bpc_c: 14663.1
inv_txfm_add_32x32_identity_identity_0_8bpc_ssse3: 179.0
inv_txfm_add_32x32_identity_identity_1_8bpc_c: 14737.0
inv_txfm_add_32x32_identity_identity_1_8bpc_ssse3: 179.2
inv_txfm_add_32x32_identity_identity_2_8bpc_c: 14640.4
inv_txfm_add_32x32_identity_identity_2_8bpc_ssse3: 179.1
inv_txfm_add_32x32_identity_identity_3_8bpc_c: 14638.5
inv_txfm_add_32x32_identity_identity_3_8bpc_ssse3: 663.8
inv_txfm_add_32x32_identity_identity_4_8bpc_c: 14635.6
inv_txfm_add_32x32_identity_identity_4_8bpc_ssse3: 663.9
-
- 26 Mar, 2019 1 commit
-
-
Henrik Gramner authored
-
- 24 Mar, 2019 2 commits
-
-
Martin Storsjö authored
As meson still doesn't allow specifying different cflags between static and dynamic libraries, this still includes the dllexport in the static library when built with default_library=both, but it at least is avoided in static-only builds, and avoids defining these symbols as dllexport in the callers' translation units.
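The export-macro arrangement this message describes might look roughly like the following. This is a sketch under stated assumptions, not the project's actual header: `MYLIB_API`, `MYLIB_BUILDING_DLL`, and `mylib_answer` are hypothetical names chosen for illustration, and the build system is assumed to define `MYLIB_BUILDING_DLL` only when compiling the shared library.

```c
/* Hypothetical names: MYLIB_API and MYLIB_BUILDING_DLL are illustrative
 * and not taken from the source. */
#if defined(_WIN32) && defined(MYLIB_BUILDING_DLL)
#define MYLIB_API __declspec(dllexport) /* only while building the DLL */
#else
#define MYLIB_API /* callers and static-only builds see no dllexport */
#endif

/* Public declaration as it would appear in an installed header. */
MYLIB_API int mylib_answer(void);

/* Definition inside the library. */
MYLIB_API int mylib_answer(void)
{
    return 42;
}
```

Because a caller never defines the building macro, the declaration it sees carries no `dllexport` attribute, which matches the stated goal of keeping that attribute out of the callers' translation units.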
-
Henrik Gramner authored
The second shift is constant.
-
- 20 Mar, 2019 1 commit
-
-
Henrik Gramner authored
-
- 19 Mar, 2019 1 commit
-
-
Liwei Wang authored
Cycle times:
inv_txfm_add_8x32_dct_dct_0_8bpc_c: 1164.7
inv_txfm_add_8x32_dct_dct_0_8bpc_ssse3: 79.5
inv_txfm_add_8x32_dct_dct_1_8bpc_c: 11291.6
inv_txfm_add_8x32_dct_dct_1_8bpc_ssse3: 508.5
inv_txfm_add_8x32_dct_dct_2_8bpc_c: 10720.4
inv_txfm_add_8x32_dct_dct_2_8bpc_ssse3: 507.9
inv_txfm_add_8x32_dct_dct_3_8bpc_c: 12351.5
inv_txfm_add_8x32_dct_dct_3_8bpc_ssse3: 687.2
inv_txfm_add_8x32_dct_dct_4_8bpc_c: 10402.3
inv_txfm_add_8x32_dct_dct_4_8bpc_ssse3: 687.9
inv_txfm_add_8x32_identity_identity_0_8bpc_c: 3485.0
inv_txfm_add_8x32_identity_identity_0_8bpc_ssse3: 97.7
inv_txfm_add_8x32_identity_identity_1_8bpc_c: 3495.7
inv_txfm_add_8x32_identity_identity_1_8bpc_ssse3: 97.7
inv_txfm_add_8x32_identity_identity_2_8bpc_c: 3503.7
inv_txfm_add_8x32_identity_identity_2_8bpc_ssse3: 97.8
inv_txfm_add_8x32_identity_identity_3_8bpc_c: 3489.5
inv_txfm_add_8x32_identity_identity_3_8bpc_ssse3: 184.4
inv_txfm_add_8x32_identity_identity_4_8bpc_c: 3498.1
inv_txfm_add_8x32_identity_identity_4_8bpc_ssse3: 182.8
inv_txfm_add_32x8_dct_dct_0_8bpc_c: 1220.4
inv_txfm_add_32x8_dct_dct_0_8bpc_ssse3: 65.6
inv_txfm_add_32x8_dct_dct_1_8bpc_c: 11120.7
inv_txfm_add_32x8_dct_dct_1_8bpc_ssse3: 623.8
inv_txfm_add_32x8_dct_dct_2_8bpc_c: 12236.3
inv_txfm_add_32x8_dct_dct_2_8bpc_ssse3: 624.7
inv_txfm_add_32x8_dct_dct_3_8bpc_c: 10866.3
inv_txfm_add_32x8_dct_dct_3_8bpc_ssse3: 694.1
inv_txfm_add_32x8_dct_dct_4_8bpc_c: 10322.8
inv_txfm_add_32x8_dct_dct_4_8bpc_ssse3: 692.5
inv_txfm_add_32x8_identity_identity_0_8bpc_c: 3368.1
inv_txfm_add_32x8_identity_identity_0_8bpc_ssse3: 98.6
inv_txfm_add_32x8_identity_identity_1_8bpc_c: 3381.1
inv_txfm_add_32x8_identity_identity_1_8bpc_ssse3: 98.3
inv_txfm_add_32x8_identity_identity_2_8bpc_c: 3376.6
inv_txfm_add_32x8_identity_identity_2_8bpc_ssse3: 98.3
inv_txfm_add_32x8_identity_identity_3_8bpc_c: 3364.3
inv_txfm_add_32x8_identity_identity_3_8bpc_ssse3: 182.2
inv_txfm_add_32x8_identity_identity_4_8bpc_c: 3390.0
inv_txfm_add_32x8_identity_identity_4_8bpc_ssse3: 182.2
-
- 18 Mar, 2019 1 commit
-
-
Xuefeng Jiang authored
cfl_ac_420_w4_8bpc_c: 1621.0
cfl_ac_420_w4_8bpc_ssse3: 92.5
cfl_ac_420_w8_8bpc_c: 3344.1
cfl_ac_420_w8_8bpc_ssse3: 115.4
cfl_ac_420_w16_8bpc_c: 6024.9
cfl_ac_420_w16_8bpc_ssse3: 187.8
cfl_ac_422_w4_8bpc_c: 1762.5
cfl_ac_422_w4_8bpc_ssse3: 81.4
cfl_ac_422_w8_8bpc_c: 4941.2
cfl_ac_422_w8_8bpc_ssse3: 166.5
cfl_ac_422_w16_8bpc_c: 8261.8
cfl_ac_422_w16_8bpc_ssse3: 272.3
-
- 16 Mar, 2019 2 commits
-
-
James Almer authored
This check was already done in dav1d_parse_obus(), so it's added as an assert here for extra precaution.
-
James Almer authored
Its previous contents don't need to be preserved.
-
- 14 Mar, 2019 2 commits
-
-
Janne Grunau authored
-
Janne Grunau authored
Fixes tests on big endian architectures.
-
- 13 Mar, 2019 1 commit
-
-
Jean-Baptiste Kempf authored
-
- 12 Mar, 2019 1 commit
-
-
James Almer authored
And the API version as the file version.
-
- 11 Mar, 2019 5 commits
-
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
This optimization is so small that 10 runs with a fixed seed were needed to get relevant numbers. This was done for the 3x3 case only.
before: mean=113265.42 stddev=954.392
after: mean=112654.71 stddev=884.833
-
Victorien Le Couviour--Tuffet authored
This optimization is so tiny we can't even see it in checkasm. The only actual difference being the removal of a memory load, it has to be better.
-
Jean-Baptiste Kempf authored
-
- 09 Mar, 2019 2 commits
-
-
Janne Grunau authored
Refs #241, Closes #255.
-
Jean-Baptiste Kempf authored
-
- 08 Mar, 2019 2 commits
-
-
Janne Grunau authored
Increments the soname revision number for this behavior change. Removes the DAV1D_VERSION and DAV1D_VERSION_INT defines and dav1d_version_vcs() and dav1d_version_int(). Also cleans up the version usage in dav1d CLI. Refs #241, #255.
-
Victorien Le Couviour--Tuffet authored
---------------------
x86_64:
------------------------------------------
cdef_dir_8bpc_c: 1023.1
cdef_dir_8bpc_ssse3: 110.3
cdef_dir_8bpc_avx2: 71.1
------------------------------------------
---------------------
x86_32:
------------------------------------------
cdef_dir_8bpc_c: 1074.8
cdef_dir_8bpc_ssse3: 120.6
------------------------------------------
Thanks to Ronald for the AVX2 XMM version which was a very good starting point.
-
- 06 Mar, 2019 3 commits
-
-
Martin Storsjö authored
-
Kyle Siefring authored
Before:
cdef_filter_8x8_8bpc_avx2: 252.3
cdef_filter_4x8_8bpc_avx2: 182.1
cdef_filter_4x4_8bpc_avx2: 105.7
After:
cdef_filter_8x8_8bpc_avx2: 235.5
cdef_filter_4x8_8bpc_avx2: 174.8
cdef_filter_4x4_8bpc_avx2: 101.8
-
Martin Storsjö authored
-
- 05 Mar, 2019 4 commits
-
-
Martin Storsjö authored
This might have said pri_taps[k]/sec_taps[k] at some earlier time.
-
Martin Storsjö authored
Pad with a value which works both as a large unsigned value and a negative signed value. This allows doing the max operation using signed max, avoiding the conditional altogether. Based on the same idea for x86 by Kyle Siefring.
Before:                Cortex A53     A72     A73
cdef_filter_4x4_8bpc_neon:  645.5   401.9   422.5
cdef_filter_4x8_8bpc_neon: 1193.7   756.6   782.4
cdef_filter_8x8_8bpc_neon: 2162.4  1361.9  1375.6
After:
cdef_filter_4x4_8bpc_neon:  596.3   377.8   384.8
cdef_filter_4x8_8bpc_neon: 1097.4   705.5   707.1
cdef_filter_8x8_8bpc_neon: 1967.4  1232.3  1239.9
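The padding idea in this message can be sketched in scalar C. This is an illustration under assumptions, not the NEON implementation: the 16-bit pattern `0x8000` is assumed as the padding value (the commit does not state it), real pixel values are assumed to stay below it, and `PAD_VALUE`, `as_signed16`, and `minmax_taps` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed padding value: 32768 when read as unsigned 16-bit,
 * -32768 when read as signed 16-bit. */
#define PAD_VALUE 0x8000u

/* Reinterpret a 16-bit pattern as a signed value, portably. */
static int as_signed16(uint16_t v)
{
    return v >= 0x8000u ? (int)v - 0x10000 : (int)v;
}

/* Track min and max over n taps with no test for padding entries.
 * The compares below stand in for SIMD min/max instructions. */
static void minmax_taps(const uint16_t *taps, int n, uint16_t px,
                        uint16_t *out_min, uint16_t *out_max)
{
    assert(px < PAD_VALUE);
    uint16_t lo = px;
    int hi = px;
    for (int i = 0; i < n; i++) {
        /* Unsigned min: padding is large, so it never wins. */
        if (taps[i] < lo)
            lo = taps[i];
        /* Signed max: padding is negative, so it never wins. */
        int s = as_signed16(taps[i]);
        if (s > hi)
            hi = s;
    }
    *out_min = lo;
    *out_max = (uint16_t)hi;
}
```

With taps `{10, PAD_VALUE, 200, PAD_VALUE}` and a center pixel of 50, the padded entries are ignored by both operations and the result is a minimum of 10 and a maximum of 200.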
-
Martin Storsjö authored
Before:                Cortex A53     A72     A73
cdef_filter_4x4_8bpc_neon:  677.4   433.9   452.9
cdef_filter_4x8_8bpc_neon: 1255.0   815.2   841.8
cdef_filter_8x8_8bpc_neon: 2278.5  1440.0  1505.0
After:
cdef_filter_4x4_8bpc_neon:  645.5   401.9   422.5
cdef_filter_4x8_8bpc_neon: 1193.7   756.6   782.4
cdef_filter_8x8_8bpc_neon: 2162.4  1361.9  1375.6
-
Kyle Siefring authored
Before:
```
cdef_filter_8x8_8bpc_avx2: 275.5
cdef_filter_4x8_8bpc_avx2: 193.3
cdef_filter_4x4_8bpc_avx2: 113.5
```
After:
```
cdef_filter_8x8_8bpc_avx2: 252.3
cdef_filter_4x8_8bpc_avx2: 182.1
cdef_filter_4x4_8bpc_avx2: 105.7
```
-
- 04 Mar, 2019 1 commit
-
-
Jean-Baptiste Kempf authored
-