- Jun 10, 2020
Anton Mitrofanov authored
checkasm10 with seed=511142008 failed on win32 gcc builds.
- Apr 09, 2020
Anton Mitrofanov authored
Anton Mitrofanov authored
- Feb 29, 2020
Anton Mitrofanov authored
- Nov 05, 2019
Anton Mitrofanov authored
- Jul 17, 2019
Simplifies a lot of code and avoids having to export public asm functions. Note that the force_align_arg_pointer function attribute is broken in clang versions prior to 6.0.1, which may result in crashes, so make sure to use either a newer clang version or a different compiler.
Anton Mitrofanov authored
- Mar 06, 2019
Allows for automatic command line completion for both options and values. Options such as --input-csp and --input-fmt will dynamically retrieve supported values from libavformat when compiled with lavf support. Execute 'source tools/bash-autocomplete.sh' in bash to enable.
- Mar 03, 2019
Henrik Gramner authored
- Aug 06, 2018
- Jun 02, 2018
Henrik Gramner authored
Clang emits aligned AVX stores for things like zeroing stack-allocated variables when using -mavx, even with -fno-tree-vectorize set, which can result in crashes if this occurs before we've realigned the stack. Previously we only ensured that the stack was realigned before calling assembly functions that access stack-allocated buffers, but this is not sufficient. Fix the issue by changing the stack realignment to instead occur immediately in all CLI, API and thread entry points.
- Jan 18, 2018
- Jan 17, 2018
Henrik Gramner authored
- Dec 24, 2017
This version supports converting aarch64 assembly for MS armasm64.exe.
Takes advantage of opmasks to avoid having to use scalar code for the tail. Also make some slight improvements to the checkasm test.
Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI option to set the bit depth at runtime.

Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an incorrect value, it's preferable to induce a linking failure: if an application relies on this symbol, this will make it more obvious where the problem is.

Add Makefile rules that compile modules with different bit depths. Assembly on x86 is prefixed with the 'private_prefix' define, while all other archs modify their function prefix internally.

Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64 assembly, PowerPC assembly, and MIPS assembly.

The depth and cache CLI filters heavily depend on the bit depth, so they need to be duplicated for each value. This means having to rename these filters and adjust the callers to use the right version.

Unfortunately the threaded input CLI module inherits a common.h dependency (input/frame -> common/threadpool -> common/frame -> common/common) which is extremely complicated to address in a sensible way. Instead, duplicate the module and select the appropriate one at run time.

Each bit depth needs different checkasm compilation rules, so split the main checkasm target into two executables.
- Jun 24, 2017
Henrik Gramner authored
Henrik Gramner authored
Uses gathers and scatters in combination with conflict detection to vectorize the scalar part. Also improve the checkasm test to try different mb_y values and check for out-of-bounds writes.
- Jun 14, 2017
These levels were added in the 2016-10 revision of the H.264 specification and improve support for content with high resolutions and/or high frame rates. Level 6.2 supports 8K resolution at 120 fps. Also shrink the x264_levels array by using smaller data types.
- May 23, 2017
Prior to this, the loop never ran at all; the condition had been the same since it was introduced in 5b0cb86f. This issue was pointed out by a clang warning.
- May 21, 2017
Henrik Gramner authored
Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.
Henrik Gramner authored
The functions are only ever called with pointers to fenc and fdec and the strides are always constant so there's no point in having them as parameters. Cover both the U and V planes in a single function call. This is more efficient with SIMD, especially with the wider vectors provided by AVX2 and AVX-512, even when accounting for losing the possibility of early termination. Drop the MMX and XOP implementations, update the rest of the x86 assembly to match the new behavior. Also enable high bit-depth in the AVX2 version. Comment out the ARM, AARCH64, and MIPS MSA assembly for now.
Henrik Gramner authored
Also drop the MMX version and make some slight improvements to the SSE2, SSSE3, AVX, and AVX2 versions.
Henrik Gramner authored
Henrik Gramner authored
Reorder some elements in the x264_t.mb.pic struct to reduce the amount of padding required. Also drop the MMX implementation in favor of SSE.
Henrik Gramner authored
Reorder some elements in the x264_mb_analysis_list_t struct to reduce the amount of padding required. Also drop the MMX implementation in favor of SSE.
Henrik Gramner authored
Henrik Gramner authored
Also make the AVX and AVX2 implementations slightly faster.
Henrik Gramner authored
Henrik Gramner authored
The vperm* instructions ignore unused index bits, so we can pack the permutation indices together to save cache and just use a shift to get the right values.
Henrik Gramner authored
Henrik Gramner authored
YMM and ZMM registers on x86 are turned off to save power when they haven't been used for some period of time. When they are used again there is a "warmup" period during which performance is reduced and inconsistent, which is problematic when trying to benchmark individual functions. Periodically issue "dummy" instructions that use those registers to prevent them from being powered down. The end result is more consistent benchmark results.
Henrik Gramner authored
AVX-512 consists of a plethora of different extensions, but in order to keep things a bit more manageable we group together the following extensions under a single baseline cpu flag which should cover SKL-X and future CPUs:

* AVX-512 Foundation (F)
* AVX-512 Conflict Detection Instructions (CD)
* AVX-512 Byte and Word Instructions (BW)
* AVX-512 Doubleword and Quadword Instructions (DQ)
* AVX-512 Vector Length Extensions (VL)

On x86-64, AVX-512 provides 16 additional vector registers; prefer using those over existing ones, since it allows us to avoid using `vzeroupper` unless more than 16 vector registers are required. They also happen to be volatile on Windows, which means that we don't need to save and restore existing xmm register contents unless more than 22 vector registers are required.

Also take the opportunity to drop X264_CPU_CMOV and X264_CPU_SLOW_CTZ while we're breaking API by messing with the cpu flags, since they weren't really used for anything.

Big thanks to Intel for their support.
Henrik Gramner authored
Simplifies writing assembly code that depends on available instructions.

* LZCNT implies SSE2
* BMI1 implies AVX+LZCNT
* AVX2 implies BMI2

Skip printing LZCNT under CPU capabilities when BMI1 or BMI2 is available, and don't print FMA4 when FMA3 is available.
Henrik Gramner authored
Packed YUV is arguably more common than planar YUV when dealing with raw 4:2:2 content. We can utilize the existing plane_copy_deinterleave() functions with some additional minor constraints (we cannot assume any particular alignment or overread the input buffer). Enables assembly optimizations on x86.
Set up the right gas-preprocessor as the assembler frontend in these cases, using armasm as the actual assembler. Don't try to add the -mcpu/-mfpu options in this case. Check whether the compiler actually supports inline assembly, and check for the ARMv7 features in a different way for the MSVC compiler.