Commits · master · VideoLAN / x264

Jan 03, 2025
- Bump dates to 2025 · 373697b4
  Anton Mitrofanov authored 2 weeks ago
  
  373697b4
Dec 29, 2024
- Use sched_getaffinity on Android · 52f7694d
  Brad Smith authored 2 months ago
```
https://android.googlesource.com/platform/bionic/+/72e6fd42421dca80fb2776a9185c186d4a04e5f7

Android has had sched_getaffinity since Android 3.0. Builds need
to use _GNU_SOURCE.
```
  52f7694d
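A minimal sketch of the affinity-based CPU count that the message above refers to, assuming a Linux or Android libc where `<sched.h>` exposes `sched_getaffinity()` and `CPU_COUNT` once `_GNU_SOURCE` is defined. The helper name `example_affinity_cpu_count` is hypothetical and is not taken from x264.

```c
/* _GNU_SOURCE must be defined before the first libc header is included,
 * otherwise <sched.h> hides sched_getaffinity() and the CPU_* macros. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: number of CPUs the calling thread may run on. */
static int example_affinity_cpu_count(void)
{
#ifdef CPU_COUNT
    cpu_set_t set;
    CPU_ZERO(&set);
    /* A pid of 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        int count = CPU_COUNT(&set);
        if (count > 0)
            return count;
    }
#endif
    /* Fallback when the affinity API is hidden, missing, or fails. */
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? (int)online : 1;
}
```

The `#ifdef CPU_COUNT` guard keeps the sketch compiling on systems where the affinity macros are not visible; in that case only the `sysconf()` fallback runs.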
- ci: Test compiling for Android · 450946f9
  Martin Storsjö authored 3 weeks ago
  
  450946f9
- Enable use of __sync_fetch_and_add() wherever detected instead of just X86 · a64111b1
  Brad Smith authored 3 weeks ago
```
Use __sync_fetch_and_add() wherever detected instead of being limited to
just X86.
```
  a64111b1
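For reference, `__sync_fetch_and_add()` is a GCC/Clang builtin that atomically adds to a value and returns what was stored before the add. The sketch below shows that contract only; the wrapper name `example_fetch_add` is hypothetical and says nothing about how x264 uses the builtin.

```c
#include <stdint.h>

/* Hypothetical wrapper: atomically add `delta` to *counter and
 * return the value that *counter held before the addition. */
static int32_t example_fetch_add(volatile int32_t *counter, int32_t delta)
{
    return __sync_fetch_and_add(counter, delta);
}
```

Because the previous value is returned, concurrent callers each observe a distinct result, which is what makes the builtin usable for reference counts and ticket-style indices.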
- Use sysctlbyname(3) hw.logicalcpu on macOS · 938601b9
  Brad Smith authored 2 months ago and Anton Mitrofanov committed 3 weeks ago
```
Use of hw.ncpu has long been deprecated.
```
  938601b9
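A hedged sketch of querying `hw.logicalcpu` through `sysctlbyname(3)`. The helper name `example_macos_logical_cpus` is hypothetical, and the convention of returning 0 on non-Apple platforms or on failure is an assumption made for illustration.

```c
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: logical CPU count on macOS, or 0 if unavailable. */
static int example_macos_logical_cpus(void)
{
#ifdef __APPLE__
    int ncpu = 0;
    size_t len = sizeof(ncpu);
    /* sysctlbyname() returns 0 on success and fills in ncpu. */
    if (sysctlbyname("hw.logicalcpu", &ncpu, &len, NULL, 0) == 0 && ncpu > 0)
        return ncpu;
#endif
    return 0;
}
```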
Nov 04, 2024
- aarch64: defines involving bit shifts should be unsigned · 023112c6
  Brad Smith authored 2 months ago
  
  023112c6
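The commit title does not say which defines were changed, so the following is only a generic illustration of the issue, using a hypothetical flag name. Shifting a signed `1` into bit 31 overflows `int`, which is undefined behaviour in C; an unsigned literal avoids that.

```c
#include <stdint.h>

/* Problematic form: (1 << 31) shifts a signed int into its sign bit. */
/* #define EXAMPLE_FLAG_TOP (1 << 31) */

/* Well-defined form: the literal is unsigned, so bit 31 is representable. */
#define EXAMPLE_FLAG_TOP (1U << 31)

/* Hypothetical helper: test whether the top flag is set. */
static int example_has_top_flag(uint32_t flags)
{
    return (flags & EXAMPLE_FLAG_TOP) != 0;
}
```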
Oct 27, 2024

Make use of sysconf(3) _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF · da14df55

Brad Smith authored 3 months ago

Make use of _SC_NPROCESSORS_ONLN if it exists and fall back to
_SC_NPROCESSORS_CONF for really old operating systems. This adds
support for retrieving the number of CPUs on several OSes, such as
NetBSD and DragonFly, among others.

da14df55
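The fallback described above can be sketched as follows. This is an illustration under the assumption that both names are visible as preprocessor macros when supported; the helper name `example_sysconf_cpu_count` is hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: prefer the online CPU count, fall back to the
 * configured count, and never report fewer than one CPU. */
static int example_sysconf_cpu_count(void)
{
    long count = -1;
#if defined(_SC_NPROCESSORS_ONLN)
    count = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(_SC_NPROCESSORS_CONF)
    count = sysconf(_SC_NPROCESSORS_CONF);
#endif
    return count > 0 ? (int)count : 1;
}
```

`_SC_NPROCESSORS_ONLN` counts CPUs currently online, while `_SC_NPROCESSORS_CONF` counts configured CPUs, so the two can differ on systems with hot-pluggable or disabled cores.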

Oct 26, 2024
- Use getauxval() on Linux and elf_aux_info() on FreeBSD/OpenBSD on arm/ppc · b1d2de88
  Brad Smith authored 3 months ago and Anton Mitrofanov committed 2 months ago
  
  b1d2de88
Oct 22, 2024

Fix build with Android NDK and API < 24 for 32-bit targets · 3a21e97b

Anton Mitrofanov authored 3 months ago

fseeko() is not available before API 24 with _FILE_OFFSET_BITS=64.
x264.c: x264cli.h must be first as it contains _FILE_OFFSET_BITS define.

3a21e97b
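The include-order point in the message above can be illustrated with a small sketch. It assumes a POSIX libc that provides `fseeko()` and `ftello()`; the helper name `example_seek_to` is hypothetical, and the sketch does not reproduce the actual fix for API levels below 24.

```c
/* _FILE_OFFSET_BITS must be defined before any libc header is included,
 * otherwise off_t and fseeko() keep their default width on 32-bit targets. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: seek to an absolute offset and report the
 * resulting position, or (off_t)-1 if the seek fails. */
static off_t example_seek_to(FILE *f, off_t offset)
{
    if (fseeko(f, offset, SEEK_SET) != 0)
        return (off_t)-1;
    return ftello(f);
}
```

If a header that pulls in libc definitions appears before the define, the define silently has no effect, which is why the header carrying it has to come first.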

Oct 20, 2024
- configure: Add DragonFly support · 80c1c47c
  Brad Smith authored 8 months ago
  
  80c1c47c
Oct 17, 2024
- Provide x264_getauxval() wrapper for getauxval() and elf_aux_info() · 1243d9ff
  Brad Smith authored 3 months ago
  
  1243d9ff
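A minimal sketch of what such a wrapper could look like, assuming `getauxval()` on Linux and `elf_aux_info()` on FreeBSD/OpenBSD, both declared in `<sys/auxv.h>`. The name `example_getauxval` and the choice to return 0 on failure or on unsupported platforms are assumptions for illustration, not the actual x264 implementation.

```c
#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__)
#include <sys/auxv.h>
#endif

/* Hypothetical wrapper: fetch one auxiliary vector entry, or 0 if the
 * entry is missing or the platform has no supported API. */
static unsigned long example_getauxval(unsigned long type)
{
#if defined(__linux__)
    /* getauxval() returns 0 when the entry is not present. */
    return getauxval(type);
#elif defined(__FreeBSD__) || defined(__OpenBSD__)
    unsigned long value = 0;
    /* elf_aux_info() returns 0 on success and fills in value. */
    if (elf_aux_info((int)type, &value, sizeof(value)) != 0)
        return 0;
    return value;
#else
    (void)type;
    return 0;
#endif
}
```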
Oct 07, 2024
- aarch64: Use elf_aux_info() for CPU feature detection on FreeBSD/OpenBSD · 3a8b5be2
  Brad Smith authored 3 months ago
  
  3a8b5be2
Sep 17, 2024
- configure: Check for SVE support in MS armasm64 via as_check · c24e06c2
  Martin Storsjö authored 10 months ago
```
This is mostly supported in armasm64 since MSVC 2022 17.10.
```
  c24e06c2
May 13, 2024

x86inc: Improve ELF PIC support for external function calls · 4613ac3c

Henrik Gramner authored 8 months ago

PLT/GOT indirections are required in some cases. Most commonly when
calling functions from other shared libraries, but also in some
scenarios when calling functions with default symbol visibility
even within the same component on certain elf64 platforms.

On elf64 we can simply use PLT relocations for all calls to external
functions. Since the linker is able to eliminate unnecessary PLT
indirections with the final output binary being identical to non-PLT
relocations there isn't really any downside to doing so. This mimics
what regular compilers normally do for calls to external functions.

On elf32 with PIC we can use a function pointer from the GOT when
calling external functions, similar to what regular compilers do when
using -fno-plt. Since this both introduces overhead and clobbers one
register, which could potentially have been used for custom calling
conventions when calling other asm functions within the same library,
it's only performed for functions declared using 'cextern_naked'.

4613ac3c

Mar 21, 2024
- loongarch: Enhance ultrafast encoding performance · 7ed753b1
  guxiwei authored 10 months ago and guxiwei committed 10 months ago
```
Using the following command, ultrafast encoding
has improved from 182fps to 189fps:
./x264 --preset ultrafast -o out.mkv yuv_1920x1080.yuv
```
  7ed753b1
- loongarch: Fixed pixel_sa8d_16x16_lasx · 16262286
  guxiwei authored 10 months ago and guxiwei committed 10 months ago
```
Save and restore FPR
```
  16262286
- loongarch: Add checkasm_call · 5a61afdb
  guxiwei authored 10 months ago and guxiwei committed 10 months ago
  
  5a61afdb
- loongarch: Update loongson_asm.S version to 0.4.0 · 982d3240
  guxiwei authored 10 months ago and guxiwei committed 10 months ago
  
  982d3240
Mar 14, 2024

x86inc: Improve XMM-spilling functionality on 64-bit Windows · 585e0199

Henrik Gramner authored 10 months ago and Henrik Gramner committed 10 months ago

Prior to this change, dealing with the scenario where the number of
XMM registers spilled depends on whether a branch is taken was
complicated to handle well. There were essentially three options:

1) Always spill the largest number of XMM registers. Results in
   unnecessary spills.

2) Do the spilling after the branch. Results in code duplication
   for the shared subset of spills.

3) Do the spilling manually. Optimal, but overly complex and vexing.

This adds an additional optional argument to the WIN64_SPILL_XMM
and WIN64_PUSH_XMM macros to make it possible to allocate space
for a certain number of registers but initially only push a subset
of those, with the option of pushing additional registers later.

585e0199

- x86inc: Restore the stack state between stack allocations · 4df71a75
  Henrik Gramner authored 10 months ago and Henrik Gramner committed 10 months ago
```
Allows the use of multiple independent stack allocations within
a function without having to manually fiddle with stack offsets.
```
  4df71a75
- x86inc: Fix warnings with old nasm versions · 3d8aff7e
  Henrik Gramner authored 11 months ago and Henrik Gramner committed 10 months ago
  
  3d8aff7e

Mar 12, 2024

ppc: Fix incompatible pointer type errors · de1bea53

Anton Mitrofanov authored 10 months ago

Use correct return type for pixel_sad_x3/x4 functions.
Bug report by Dominik 'Rathann' Mierzejewski.

de1bea53

Feb 28, 2024

aarch64: Use regular hwcaps flags instead of HWCAP_CPUID for CPU feature detection on Linux · be4f0200

Martin Storsjö authored 10 months ago and Anton Mitrofanov committed 10 months ago

This makes the code much simpler (especially for adding support
for other instruction set extensions), avoids needing inline
assembly for this feature, and generally is more of the canonical
way to do this.

The CPU feature detection was added in
9c3c7168, using HWCAP_CPUID.

The argument for using that was that HWCAP_CPUID was added much
earlier in the kernel (in Linux v4.11), while the HWCAP flags for
individual features always come later. This allows detecting support
for new CPU extensions before the kernel exposes information about
them via hwcap flags.

However, in practice there's probably little advantage in this.
E.g. HWCAP_SVE was added in Linux v4.15, and HWCAP2_SVE2 was added in
v5.10 - later than HWCAP_CPUID, but there are probably very few
practical cases where one would run a kernel older than that on a CPU
that supports those instructions.

Additionally, we provide our own definitions of the flag values to
check (as they are fixed constants anyway), with names not conflicting
with the ones from system headers. This reduces the number of ifdefs
needed, and allows detecting those features even if building with
userland headers that are lacking the definitions of those flags.

Also, slightly older versions of QEMU, e.g. 6.2 in Ubuntu 22.04,
do expose support for these features via HWCAP flags, but the
emulated cpuid registers are missing the bits for exposing e.g. SVE2.
(This issue is fixed in later versions of QEMU though.)

Also drop the ifdef check for whether AT_HWCAP is defined; it was
added to glibc in 1997. AT_HWCAP2 was added in 2013, in glibc 2.18,
which also precedes when aarch64 was commonly used anyway, so
don't guard the use of that with an ifdef.

be4f0200
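The approach described above, checking hwcap bits against locally defined constants, can be sketched as follows. The `EXAMPLE_*` names are hypothetical, and the bit positions (`1 << 22` for SVE in `AT_HWCAP`, `1 << 1` for SVE2 in `AT_HWCAP2`) are stated here as my understanding of the Linux arm64 ABI rather than taken from the commit.

```c
#if defined(__linux__)
#include <sys/auxv.h>
#endif

/* Local copies of the kernel flag values, named so that they cannot
 * clash with HWCAP_* definitions from system headers (assumed values). */
#define EXAMPLE_HWCAP_SVE   (1UL << 22)
#define EXAMPLE_HWCAP2_SVE2 (1UL << 1)

/* Hypothetical feature bits reported to the rest of the program. */
#define EXAMPLE_CPU_SVE  0x1u
#define EXAMPLE_CPU_SVE2 0x2u

/* Translate raw hwcap words into the example feature mask. */
static unsigned example_decode_hwcaps(unsigned long hwcap, unsigned long hwcap2)
{
    unsigned flags = 0;
    if (hwcap & EXAMPLE_HWCAP_SVE)
        flags |= EXAMPLE_CPU_SVE;
    if (hwcap2 & EXAMPLE_HWCAP2_SVE2)
        flags |= EXAMPLE_CPU_SVE2;
    return flags;
}

/* Query the kernel on aarch64 Linux; report no features elsewhere. */
static unsigned example_detect_cpu(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return example_decode_hwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return 0;
#endif
}
```

Keeping the decoding separate from the `getauxval()` call means the mapping can be checked on any host, without needing SVE hardware or inline assembly.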

- CI: Switch 32/64-bit windows builds to LLVM · 7241d020
  Anton Mitrofanov authored 10 months ago
```
Use same Docker images as VLC for contrib compilation.
```
  7241d020
- CI: Add config.log to job artifacts · ea08f586
  Anton Mitrofanov authored 10 months ago
  
  ea08f586

Feb 19, 2024

x86inc: Add support for ELF CET properties · 12426f5f

Henrik Gramner authored 11 months ago

Automatically flag x86-64 asm object files as SHSTK-compatible.

Shadow Stack (SHSTK) is a part of Control-flow Enforcement Technology
(CET) which is a feature aimed at defending against ROP attacks by
verifying that 'call' and 'ret' instructions are correctly matched.

For well-written code this works transparently without any code changes,
as return addresses popped from the shadow stack should match return
addresses popped from the normal stack for performance reasons anyway.

12426f5f

- x86inc.asm: Add the crc32 SSE4.2 GPR instruction · 6fc4480c
  Henrik Gramner authored 11 months ago
  
  6fc4480c
- x86inc: Add a cpu flag for the Ice Lake AVX-512 subset · 87476b4c
  Henrik Gramner authored 11 months ago
  
  87476b4c
- x86inc: Add CLMUL cpu flag · a6b56179
  Henrik Gramner authored 11 months ago
```
Also make the GFNI cpu flag imply the presence of both AESNI and CLMUL.
```
  a6b56179

x86inc: Add template defines for EVEX broadcasts · 5207a74e

Henrik Gramner authored 11 months ago

Broadcasting a memory operand is a binary flag, you either broadcast
or you don't, and there's only a single possible element size for
any given instruction.

The instruction syntax however requires the broadcast semantics
to be explicitly defined, which is an issue when using macros to
template code for multiple register widths.

Add some helper defines to alleviate the issue.

5207a74e

- x86inc: Properly sort instructions in alphabetical order · 436be41f
  Henrik Gramner authored 11 months ago
  
  436be41f

Jan 13, 2024
- Bump dates to 2024 · 4815ccad
  Anton Mitrofanov authored 1 year ago
  
  4815ccad
Nov 23, 2023

Improve pixel-a.S Performance by Using SVE/SVE2 · c1c9931d

David Chen authored 1 year ago

Improve the performance of NEON functions of aarch64/pixel-a.S
by using the SVE/SVE2 instruction set. Below, the specific functions
are listed together with the improved performance results.

Command executed: ./checkasm8 --bench=ssd
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
ssd_4x4_c: 235
ssd_4x4_neon: 226
ssd_4x4_sve: 151
ssd_4x8_c: 409
ssd_4x8_neon: 363
ssd_4x8_sve: 201
ssd_4x16_c: 781
ssd_4x16_neon: 653
ssd_4x16_sve: 313
ssd_8x4_c: 402
ssd_8x4_neon: 192
ssd_8x4_sve: 192
ssd_8x8_c: 728
ssd_8x8_neon: 275
ssd_8x8_sve: 275

Command executed: ./checkasm10 --bench=ssd
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
ssd_4x4_c: 256
ssd_4x4_neon: 226
ssd_4x4_sve: 153
ssd_4x8_c: 460
ssd_4x8_neon: 369
ssd_4x8_sve: 215
ssd_4x16_c: 852
ssd_4x16_neon: 651
ssd_4x16_sve: 340

Command executed: ./checkasm8 --bench=ssd
Testbed: AWS Graviton3
Results:
ssd_4x4_c: 295
ssd_4x4_neon: 288
ssd_4x4_sve: 228
ssd_4x8_c: 454
ssd_4x8_neon: 431
ssd_4x8_sve: 294
ssd_4x16_c: 779
ssd_4x16_neon: 631
ssd_4x16_sve: 438
ssd_8x4_c: 463
ssd_8x4_neon: 247
ssd_8x4_sve: 246
ssd_8x8_c: 781
ssd_8x8_neon: 413
ssd_8x8_sve: 353

Command executed: ./checkasm10 --bench=ssd
Testbed: AWS Graviton3
Results:
ssd_4x4_c: 322
ssd_4x4_neon: 335
ssd_4x4_sve: 240
ssd_4x8_c: 522
ssd_4x8_neon: 448
ssd_4x8_sve: 294
ssd_4x16_c: 832
ssd_4x16_neon: 603
ssd_4x16_sve: 440

Command executed: ./checkasm8 --bench=sa8d
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
sa8d_8x8_c: 2103
sa8d_8x8_neon: 619
sa8d_8x8_sve: 617

Command executed: ./checkasm8 --bench=sa8d
Testbed: AWS Graviton3
Results:
sa8d_8x8_c: 2021
sa8d_8x8_neon: 597
sa8d_8x8_sve: 580

Command executed: ./checkasm8 --bench=var
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
var_8x8_c: 595
var_8x8_neon: 262
var_8x8_sve: 262
var_8x16_c: 1193
var_8x16_neon: 435
var_8x16_sve: 419

Command executed: ./checkasm8 --bench=var
Testbed: AWS Graviton3
Results:
var_8x8_c: 616
var_8x8_neon: 229
var_8x8_sve: 222
var_8x16_c: 1207
var_8x16_neon: 399
var_8x16_sve: 389

Command executed: ./checkasm8 --bench=hadamard_ac
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
hadamard_ac_8x8_c: 2330
hadamard_ac_8x8_neon: 635
hadamard_ac_8x8_sve: 635
hadamard_ac_8x16_c: 4500
hadamard_ac_8x16_neon: 1152
hadamard_ac_8x16_sve: 1151
hadamard_ac_16x8_c: 4499
hadamard_ac_16x8_neon: 1151
hadamard_ac_16x8_sve: 1150
hadamard_ac_16x16_c: 8812
hadamard_ac_16x16_neon: 2187
hadamard_ac_16x16_sve: 2186

Command executed: ./checkasm8 --bench=hadamard_ac
Testbed: AWS Graviton3
Results:
hadamard_ac_8x8_c: 2266
hadamard_ac_8x8_neon: 517
hadamard_ac_8x8_sve: 513
hadamard_ac_8x16_c: 4444
hadamard_ac_8x16_neon: 867
hadamard_ac_8x16_sve: 849
hadamard_ac_16x8_c: 4443
hadamard_ac_16x8_neon: 880
hadamard_ac_16x8_sve: 868
hadamard_ac_16x16_c: 8595
hadamard_ac_16x16_neon: 1656
hadamard_ac_16x16_sve: 1622

c1c9931d

Create Common NEON pixel-a Macros and Constants · 0ac52d29

David Chen authored 1 year ago

Place NEON pixel-a macros and constants that are intended
to be used by SVE/SVE2 functions as well in a common file.

0ac52d29

Improve mc-a.S Performance by Using SVE/SVE2 · 06dcf3f9

David Chen authored 1 year ago

Improve the performance of NEON functions of aarch64/mc-a.S
by using the SVE/SVE2 instruction set. Below, the specific functions
are listed together with the improved performance results.

Command executed: ./checkasm8 --bench=avg
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
avg_4x2_c: 274
avg_4x2_neon: 215
avg_4x2_sve: 171
avg_4x4_c: 461
avg_4x4_neon: 343
avg_4x4_sve: 225
avg_4x8_c: 806
avg_4x8_neon: 619
avg_4x8_sve: 334
avg_4x16_c: 1523
avg_4x16_neon: 1168
avg_4x16_sve: 558

Command executed: ./checkasm8 --bench=avg
Testbed: AWS Graviton3
Results:
avg_4x2_c: 267
avg_4x2_neon: 213
avg_4x2_sve: 167
avg_4x4_c: 467
avg_4x4_neon: 350
avg_4x4_sve: 221
avg_4x8_c: 784
avg_4x8_neon: 624
avg_4x8_sve: 302
avg_4x16_c: 1445
avg_4x16_neon: 1182
avg_4x16_sve: 485

06dcf3f9
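To read the figures above as speedups, one can divide the NEON figure by the SVE figure, assuming the checkasm numbers are timings where lower is better. The helper below is a hypothetical illustration; for example, `avg_4x16` on the Yitian 710 testbed goes from 1168 to 558, a ratio of roughly 2.09.

```c
/* Hypothetical helper: ratio of baseline timing to optimized timing.
 * A result above 1.0 means the optimized version is faster. */
static double example_speedup(double baseline, double optimized)
{
    return baseline / optimized;
}
```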

Create Common NEON mc-a Macros and Functions · 21a788f1

David Chen authored 1 year ago

Place NEON mc-a macros and functions that are intended
to be used by SVE/SVE2 functions as well in a common file.

21a788f1

Nov 20, 2023

Improve deblock-a.S Performance by Using SVE/SVE2 · 5ad5e5d8

David Chen authored 1 year ago

Improve the performance of NEON functions of aarch64/deblock-a.S
by using the SVE/SVE2 instruction set. Below, the specific functions
are listed together with the improved performance results.

Command executed: ./checkasm8 --bench=deblock
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
deblock_chroma[1]_c: 735
deblock_chroma[1]_neon: 427
deblock_chroma[1]_sve: 353

Command executed: ./checkasm8 --bench=deblock
Testbed: AWS Graviton3
Results:
deblock_chroma[1]_c: 719
deblock_chroma[1]_neon: 442
deblock_chroma[1]_sve: 345

5ad5e5d8

Create Common NEON deblock-a Macros · 37949a99

David Chen authored 1 year ago

Place NEON deblock-a macros that are intended to be
used by SVE/SVE2 functions as well in a common file.

37949a99

Improve dct-a.S Performance by Using SVE/SVE2 · 5c382660

David Chen authored 1 year ago

Improve the performance of NEON functions of aarch64/dct-a.S
by using the SVE/SVE2 instruction set. Below, the specific functions
are listed together with the improved performance results.

Command executed: ./checkasm8 --bench=sub
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
sub4x4_dct_c: 528
sub4x4_dct_neon: 322
sub4x4_dct_sve: 247

Command executed: ./checkasm8 --bench=sub
Testbed: AWS Graviton3
Results:
sub4x4_dct_c: 562
sub4x4_dct_neon: 376
sub4x4_dct_sve: 255

Command executed: ./checkasm8 --bench=add
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
add4x4_idct_c: 698
add4x4_idct_neon: 386
add4x4_idct_sve2: 345

Command executed: ./checkasm8 --bench=zigzag
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
zigzag_interleave_8x8_cavlc_frame_c: 582
zigzag_interleave_8x8_cavlc_frame_neon: 273
zigzag_interleave_8x8_cavlc_frame_sve: 257

Command executed: ./checkasm8 --bench=zigzag
Testbed: AWS Graviton3
Results:
zigzag_interleave_8x8_cavlc_frame_c: 587
zigzag_interleave_8x8_cavlc_frame_neon: 257
zigzag_interleave_8x8_cavlc_frame_sve: 249

5c382660

Nov 18, 2023

Create Common NEON dct-a Macros · b6190c6f

David Chen authored 1 year ago

Place NEON dct-a macros that are intended to be
used by SVE/SVE2 functions as well in a common file.

b6190c6f