- Jun 08, 2025
-
-
hpel_filter_c: 47995
hpel_filter_neon: 9670
hpel_filter_i8mm: 9643

previously:
hpel_filter_neon: 10222

In the Neon implementation, replaced SSHR+SUB+ADD with a single SSRA.
-
- Jun 06, 2025
-
-
The _wstati64 call succeeds on named pipes, so to check correctly you must first check the result of WaitNamedPipeW.
-
Martin Storsjö authored
The performance counters themselves are accessible, but the PMNC (control register), which we try to read to check whether the performance counters are accessible, is not readable. This causes illegal instructions in cpu_enable_armv7_counter. As an alternative, we could also modify cpu_fast_neon_mrc_test to not inspect the PMNC at all (skip calling cpu_enable_armv7_counter) and just assume that the counters are available, in high resolution mode. However, just not calling this codepath is the simplest option, as Windows on 32 bit ARM isn't very relevant these days.
-
- May 27, 2025
-
-
- May 21, 2025
-
-
This updates to the version from commit 7380ac24e1cd23a5e6d76c6af083d8fc5ab9e943 from https://github.com/ffmpeg/gas-preprocessor. The previous version was from 2017, from commit ee12830747ff0b97ec6b41f4263fec63d1711365. This includes support for assembling aarch64 code with register ranges, such as {v0.8b-v3.8b} with armasm64 (rewritten into an explicit list of registers), and fixes deprecated Perl syntax broken by more modern versions of Perl.
-
- May 18, 2025
-
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
libpostproc has been removed from the ffmpeg repository.
-
- Apr 04, 2025
-
-
Martin Storsjö authored
-
- Mar 12, 2025
-
-
Konstantinos Margaritis authored
Provide implementations for functions using the instructions SDOT/UDOT in the DotProd Armv8 extension.

Functions implemented: sad_16x8, sad_16x16, sad_x3_16x8_neon, sad_x3_16x16_neon, sad_x4_16x8_neon, sad_x4_16x16_neon, ssd_8x4, ssd_8x8, ssd_8x16, ssd_16x8, ssd_16x16, pixel_vsad

Performance improvement against Neon ranges from 5% to 188%.

Following is the output of ./checkasm8 --bench (run on a Graviton4 system):

sad_16x8_c: 1323
sad_16x8_neon: 224
sad_16x8_dotprod: 211
sad_16x16_c: 2619
sad_16x16_neon: 365
sad_16x16_dotprod: 320
sad_x3_16x8_c: 3836
sad_x3_16x8_neon: 403
sad_x3_16x8_dotprod: 317
sad_x3_16x16_c: 7725
sad_x3_16x16_neon: 714
sad_x3_16x16_dotprod: 532
sad_x4_16x8_c: 5080
sad_x4_16x8_neon: 438
sad_x4_16x8_dotprod: 375
sad_x4_16x16_c: 10260
sad_x4_16x16_neon: 794
sad_x4_16x16_dotprod: 655
ssd_8x4_c: 381
ssd_8x4_neon: 157
ssd_8x4_dotprod: 115
ssd_8x4_sve: 150
ssd_8x8_c: 695
ssd_8x8_neon: 238
ssd_8x8_dotprod: 161
ssd_8x8_sve: 228
ssd_8x16_c: 1335
ssd_8x16_neon: 388
ssd_8x16_dotprod: 267
ssd_16x8_c: 1342
ssd_16x8_neon: 285
ssd_16x8_dotprod: 166
ssd_16x16_c: 2623
ssd_16x16_neon: 503
ssd_16x16_dotprod: 277
vsad_c: 2786
vsad_neon: 311
vsad_dotprod: 235
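For readers unfamiliar with the trick, here is a scalar C sketch of one common way a dot-product instruction can compute a SAD: take the absolute difference of the pixels, then use UDOT against a vector of ones so that each 32-bit lane accumulates four byte differences at once. This is only a model of the idea, and it is an assumption that the commit uses this exact scheme. The names demo_udot_lane and demo_sad_16xh are hypothetical and do not exist in x264.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of UDOT:
 * acc += a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
static uint32_t demo_udot_lane(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* SAD of a 16-pixel-wide block with h rows.
 * Hypothetical helper for illustration, not x264 code. */
static uint32_t demo_sad_16xh(const uint8_t *p1, ptrdiff_t stride1,
                              const uint8_t *p2, ptrdiff_t stride2, int h)
{
    static const uint8_t ones[4] = { 1, 1, 1, 1 };
    uint32_t acc[4] = { 0, 0, 0, 0 }; /* models the four 32-bit lanes of one vector */

    assert(h > 0);
    for (int y = 0; y < h; y++) {
        for (int lane = 0; lane < 4; lane++) {
            uint8_t diff[4];
            /* Absolute difference of four pixel pairs (what UABD does per byte). */
            for (int i = 0; i < 4; i++) {
                int a = p1[lane * 4 + i], b = p2[lane * 4 + i];
                diff[i] = (uint8_t)(a > b ? a - b : b - a);
            }
            /* Dot product with ones sums the four differences into the lane. */
            acc[lane] = demo_udot_lane(acc[lane], diff, ones);
        }
        p1 += stride1;
        p2 += stride2;
    }
    /* Final horizontal add across the lanes. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

The appeal of this formulation is that the widening from 8-bit differences to 32-bit sums happens inside a single instruction, instead of through a chain of widening adds.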
-
Martin Storsjö authored
-
Martin Storsjö authored
Also add code for detecting them on Linux.
-
Martin Storsjö authored
-
Martin Storsjö authored
By using .arch_extension (if supported) to enable the relevant extensions, we can also disable them afterwards, so we can e.g. cleanly enable one extension only for one subsection of a file. This also makes it easier to enable various combinations of supported architecture extensions.
-
Martin Storsjö authored
This hasn't been needed for SVE/SVE2, as all toolchains have supported just enabling it via ".arch armv8.2-a+sve". For other arch extensions, like dotprod/i8mm, there are more combinations of toolchain bugs in slightly older toolchains; try to detect what is supported. Additionally, when involving more than one architecture extension, we may want to enable/disable individual extensions one at a time, without needing to specify the full list in one single .arch statement. This is a preparatory commit for adding support for the dotprod/i8mm extensions. We intentionally don't add AS_ARCH_LEVEL to the CONFIG_HAVE list, as this define isn't prefixed with "HAVE_", and we don't use the define except in the case where we actually do set it. (It's not a regular 0/1 define like the others.)
-
Martin Storsjö authored
This requires adding the "-c" flag to ASFLAGS before doing the check. This also makes sure to validate the gas-preprocessor is functional for MSVC configurations, by testing whether the "cmeq" instruction can be assembled at this point.
-
Martin Storsjö authored
This is more correct than using cc_check; we're going to assemble standalone external assembly - thus check for whether we can build it in that form, not using inline assembly. This allows sharing checks with the MSVC codepath (where inline assembly isn't supported, and where assembly is built using a tool different from the regular compiler).
-
- Mar 11, 2025
-
-
Martin Storsjö authored
This updates the dependency information on each successive recompile. When building with MSVC, dependency information is generated with a separate command just like before, but done together with compiling each object file. (This is quite similar to how ffmpeg does the same.) This avoids the serial dependency generation step. In slow environments (in particular if using MSVC) it could take a notable amount of time; this can now all be done in parallel. In one example, this reduces the time for a full build from clean with MSVC (wrapped in wine) from 23 seconds down to 9 seconds, thanks to parallelism. (For non-parallel builds, it doesn't make much of a difference.)
-
- Mar 04, 2025
-
-
Martin Storsjö authored
Previously, MSVC would warn that the .S source is unrecognized, and the script would only produce a dependency on the main source file itself.
-
- Jan 03, 2025
-
-
Anton Mitrofanov authored
-
- Dec 29, 2024
-
-
Brad Smith authored
https://android.googlesource.com/platform/bionic/+/72e6fd42421dca80fb2776a9185c186d4a04e5f7 Android has had sched_getaffinity since Android 3.0. Builds need to use _GNU_SOURCE.
-
Martin Storsjö authored
-
Brad Smith authored
Use __sync_fetch_and_add() wherever detected instead of being limited to just X86.
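As a reminder of what this builtin provides, here is a minimal sketch. __sync_fetch_and_add() is a GCC/Clang builtin that atomically adds to a value and returns the value it held before the addition. The demo_ref_t type and demo_ref_add() helper are hypothetical and only illustrate the call; they are not x264's actual code.

```c
#include <assert.h>

/* Hypothetical reference counter, used only to illustrate the builtin. */
typedef struct {
    int refcount;
} demo_ref_t;

/* Atomically adds delta to the counter and returns the previous value. */
static int demo_ref_add(demo_ref_t *r, int delta)
{
    assert(r);
    return __sync_fetch_and_add(&r->refcount, delta);
}
```

Because the builtin is provided by the compiler rather than tied to one CPU family, detecting it at configure time is enough to use it on any architecture the compiler supports it for.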
-
Use of hw.ncpu has long been deprecated.
-
- Nov 04, 2024
-
-
Brad Smith authored
-
- Oct 27, 2024
-
-
Brad Smith authored
Make use of _SC_NPROCESSORS_ONLN if it exists and fall back to _SC_NPROCESSORS_CONF for really old operating systems. This adds support for retrieving the number of CPUs on a few OSes such as NetBSD, DragonFly and a few others.
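The fallback order described above could look roughly like the following sketch. The function name demo_cpu_count() and its fallback parameter are hypothetical; the sketch only assumes the standard sysconf() call and the two _SC_ names from the commit message.

```c
#include <assert.h>
#include <unistd.h>

/* Returns the number of CPUs, or `fallback` if it cannot be determined.
 * Hypothetical helper for illustration, not x264's actual function. */
static int demo_cpu_count(int fallback)
{
    long n = -1;

    assert(fallback >= 1);
#if defined(_SC_NPROCESSORS_ONLN)
    /* Preferred: processors currently online. */
    n = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(_SC_NPROCESSORS_CONF)
    /* Older systems: processors configured, which may include offline ones. */
    n = sysconf(_SC_NPROCESSORS_CONF);
#endif
    return n > 0 ? (int)n : fallback;
}
```

Calling demo_cpu_count(1) yields at least 1 on any system, since a failed or missing query falls through to the caller-supplied default.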
-
- Oct 26, 2024
-
-
- Oct 22, 2024
-
-
Anton Mitrofanov authored
fseeko() is not available before API 24 with _FILE_OFFSET_BITS=64. x264.c: x264cli.h must be first as it contains _FILE_OFFSET_BITS define.
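The ordering requirement comes from how feature macros work: _FILE_OFFSET_BITS only affects system headers included after it is defined. The sketch below illustrates that ordering in a single file; it assumes a platform where fseeko() is available, and demo_file_size() is a hypothetical helper, not a function from x264.

```c
/* Must be defined before any system header is included; in x264 this
 * define lives in x264cli.h, which is why that header has to come first. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Compile-time check (C11) that the define took effect. */
_Static_assert(sizeof(off_t) == 8, "off_t is not 64-bit");

/* Stores the size of an open file in *size; returns 0 on success, -1 on error.
 * Hypothetical helper for illustration. */
static int demo_file_size(FILE *f, off_t *size)
{
    assert(f && size);
    if (fseeko(f, 0, SEEK_END) != 0)
        return -1;
    *size = ftello(f);
    return *size < 0 ? -1 : 0;
}
```

If a system header were included above the define, off_t could silently stay 32-bit on some 32-bit targets, which is the kind of mismatch the static assertion is meant to catch.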
-
- Oct 20, 2024
-
-
Brad Smith authored
-
- Oct 17, 2024
-
-
Brad Smith authored
-
- Oct 07, 2024
-
-
Brad Smith authored
-
- Sep 17, 2024
-
-
Martin Storsjö authored
This is mostly supported in armasm64 since MSVC 2022 17.10.
-
- May 13, 2024
-
-
Henrik Gramner authored
PLT/GOT indirections are required in some cases. Most commonly when calling functions from other shared libraries, but also in some scenarios when calling functions with default symbol visibility even within the same component on certain elf64 platforms. On elf64 we can simply use PLT relocations for all calls to external functions. Since the linker is able to eliminate unnecessary PLT indirections with the final output binary being identical to non-PLT relocations there isn't really any downside to doing so. This mimics what regular compilers normally do for calls to external functions. On elf32 with PIC we can use a function pointer from the GOT when calling external functions, similar to what regular compilers do when using -fno-plt. Since this both introduces overhead and clobbers one register, which could potentially have been used for custom calling conventions when calling other asm functions within the same library, it's only performed for functions declared using 'cextern_naked'.
-
- Mar 21, 2024
-
- Mar 14, 2024
-
-
Prior to this change, the scenario where the number of XMM registers spilled depends on whether a branch is taken was complicated to handle well. There were essentially three options:
1) Always spill the largest number of XMM registers. Results in unnecessary spills.
2) Do the spilling after the branch. Results in code duplication for the shared subset of spills.
3) Do the spilling manually. Optimal, but overly complex and vexing.
This adds an additional optional argument to the WIN64_SPILL_XMM and WIN64_PUSH_XMM macros to make it possible to allocate space for a certain number of registers but initially only push a subset of those, with the option of pushing additional registers later.
-
Allows the use of multiple independent stack allocations within a function without having to manually fiddle with stack offsets.
-
-
- Mar 12, 2024
-
-
Anton Mitrofanov authored
Use correct return type for pixel_sad_x3/x4 functions. Bug report by Dominik 'Rathann' Mierzejewski.
-