Commits · 71ed44c7 · VideoLAN / x264

Dec 24, 2017

Unify 8-bit and 10-bit CLI and libraries · 71ed44c7

Vittorio Giovara authored 8 years ago and

Anton Mitrofanov committed 7 years ago

Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
option to set the bit depth at runtime.

Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
incorrect value, it's preferable to induce a linking failure. If an application
relies on this symbol, the failure makes it more obvious where the problem is.

Add Makefile rules that compile modules with different bit depths. Assembly
on x86 is prefixed with the 'private_prefix' define, while all other archs
modify their function prefix internally.

Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
assembly, PowerPC assembly, and MIPS assembly.
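
As a rough illustration of what "templatizing" by bit depth means, the sketch below instantiates one function body for two pixel types and selects between them at run time. It uses a macro as the template, which is a stand-in technique chosen to keep the example in one file; the commit itself describes Makefile rules that compile modules once per bit depth with prefixed symbols. All names here (`DEFINE_PIXEL_SUM`, `demo_8_pixel_sum`, `demo_10_pixel_sum`, `demo_pixel_sum`) are hypothetical and not part of x264.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical template: one function body, instantiated per pixel type.
 * The prefix plays the role of the per-bit-depth symbol prefix. */
#define DEFINE_PIXEL_SUM(prefix, pixel_t)                          \
    static uint32_t prefix##_pixel_sum(const pixel_t *src, int n)  \
    {                                                              \
        uint32_t sum = 0;                                          \
        for (int i = 0; i < n; i++)                                \
            sum += src[i];                                         \
        return sum;                                                \
    }

DEFINE_PIXEL_SUM(demo_8, uint8_t)   /* 8-bit samples stored in bytes   */
DEFINE_PIXEL_SUM(demo_10, uint16_t) /* 10-bit samples stored in 16 bits */

/* Run-time selection of the instantiation that matches the bit depth. */
static uint32_t demo_pixel_sum(const void *src, int n, int bit_depth)
{
    assert(bit_depth == 8 || bit_depth == 10);
    return bit_depth == 8 ? demo_8_pixel_sum(src, n)
                          : demo_10_pixel_sum(src, n);
}
```

A caller would pass the configured depth, for example `demo_pixel_sum(buf, n, 10)` for a buffer of 16-bit samples.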

The depth and cache CLI filters heavily depend on bit depth size, so they
need to be duplicated for each value. This means having to rename these
filters, and adjust the callers to use the right version.

Unfortunately, the threaded input CLI module inherits a common.h dependency
(input/frame -> common/threadpool -> common/frame -> common/common) which
is extremely complicated to address in a sensible way. Instead, duplicate
the module and select the appropriate one at run time.

Each bitdepth needs different checkasm compilation rules, so split the main
checkasm target into two executables.

71ed44c7

Change default QP parameters initialization · 2451a728

Vittorio Giovara authored 8 years ago and

Anton Mitrofanov committed 7 years ago

qp is modified to require a valid value before use, while qp_max is set
to the maximum allowable value (and clipped later on).

This is needed so that param functions do not depend on bit depth size.

2451a728

aarch64: Set the function symbol prefix in a single location · 7839a9e1
Vittorio Giovara authored 8 years ago and Anton Mitrofanov committed 7 years ago

7839a9e1
arm: Set the function symbol prefix in a single location · 498cca0b
Vittorio Giovara authored 8 years ago and Anton Mitrofanov committed 7 years ago

498cca0b
Drop the x264 prefix from static functions and variables · 8f2437d3
Vittorio Giovara authored 8 years ago and Anton Mitrofanov committed 7 years ago

8f2437d3
configure: Check for strtok_r compiler support · 4e2ed408
Anton Mitrofanov authored 7 years ago

4e2ed408
cabac: Make the cabac_contexts array static · d1eebb29
Henrik Gramner authored 7 years ago and Anton Mitrofanov committed 7 years ago
```
Also drop the x264 prefix from all static cabac arrays.
```
d1eebb29
x86: AVX-512 pixel_satd_x3 and pixel_satd_x4 · 3f9f6554
Henrik Gramner authored 7 years ago and Anton Mitrofanov committed 7 years ago

3f9f6554

x86: Shrink the x86-64 cabac coeff_last tables · dd399ab8

Henrik Gramner authored 7 years ago and

Anton Mitrofanov committed 7 years ago

Use dword instead of qword entries. Cuts the size of the tables in half
which allows each table to fit inside a single cache line.

When PIC is disabled dwords are enough to store absolute addresses.

When PIC is enabled we can store dword offsets relative to the start of
the table and simply add the address of the table to the offset in order
to calculate the full address. This approach also has the advantage of
eliminating a whole bunch of run-time .data relocations.
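
The offset-table idea can be sketched in C as follows. This is only an analogy under stated assumptions: the real tables live in x86 assembly and hold code addresses, whereas this example stores 32-bit offsets into a small data blob and adds the base address at lookup time. The names `demo_blob`, `demo_offsets`, and `demo_entry` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three NUL-terminated entries packed into one read-only blob. */
static const char demo_blob[] = "zero\0one\0two";

/* 32-bit offsets relative to the start of the blob. Unlike a table of
 * pointers, this table contains no addresses, so it needs no load-time
 * relocations and each entry is 4 bytes instead of 8 on 64-bit targets. */
static const uint32_t demo_offsets[] = { 0, 5, 9 };

#define DEMO_COUNT (sizeof(demo_offsets) / sizeof(demo_offsets[0]))

/* Full address = base address + stored offset. */
static const char *demo_entry(size_t i)
{
    assert(i < DEMO_COUNT);
    return demo_blob + demo_offsets[i];
}
```

The cost is one extra addition per lookup, which is the trade-off the commit message describes for PIC builds.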

dd399ab8

x86inc: Support creating global symbols from local labels · d463a92e
Henrik Gramner authored 7 years ago and Anton Mitrofanov committed 7 years ago
```
On ELF platforms such symbols need to be flagged as functions with the
correct visibility to please certain linkers in some scenarios.
```
d463a92e

x86inc: Use .rdata instead of .rodata on Windows · 67b5c961

Henrik Gramner authored 7 years ago and

Anton Mitrofanov committed 7 years ago

The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.

67b5c961

x86inc: Set the correct cpuflag for AES-NI instructions · f15d3665
Henrik Gramner authored 7 years ago and Anton Mitrofanov committed 7 years ago

f15d3665

x86inc: Enable AVX emulation for floating-point pseudo-instructions · 1ae63361

Henrik Gramner authored 7 years ago and

Anton Mitrofanov committed 7 years ago

There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 require the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.

1ae63361

configure: Increase x86 stack alignment on clang · 1e27313c
Henrik Gramner authored 7 years ago and Anton Mitrofanov committed 7 years ago

1e27313c
x86: Fix stack alignment for x264_cabac_encode_ue_bypass call · e9a5903e
Anton Mitrofanov authored 7 years ago
```
Fix MSVS fprofiled build for win64
```
e9a5903e
mips: Fix incorrect pointers to msa optimized functions · 45e6eb60
Anton Mitrofanov authored 7 years ago

45e6eb60

Aug 11, 2017

Fix cpu capabilities listing on older x86 operating systems · 09705c0b

Henrik Gramner authored 7 years ago

Some cpuflags would previously be displayed incorrectly when running older
operating systems without AVX support on modern CPUs.

09705c0b

Jun 26, 2017
- x86: AVX-512 pixel_avg_weight_w8 · ba24899b
  Henrik Gramner authored 7 years ago
  
  ba24899b
- x86: AVX-512 pixel_avg_weight_w16 · d3214e6b
  Henrik Gramner authored 7 years ago
  
  d3214e6b
- x86: AVX-512 sub8x16_dct_dc · 1d9dee2e
  Henrik Gramner authored 7 years ago
  
  1d9dee2e
Jun 24, 2017
- x86: AVX-512 sub8x8_dct_dc · f6727954
  Henrik Gramner authored 7 years ago
  
  f6727954
- x86: AVX-512 add8x8_idct · 0af1c6d0
  Henrik Gramner authored 7 years ago
  
  0af1c6d0
- x86: AVX-512 sub16x16_dct · 90340852
  Henrik Gramner authored 7 years ago
  
  90340852
- x86: AVX-512 sub8x8_dct · 774c6c76
  Henrik Gramner authored 7 years ago
  
  774c6c76
- x86: AVX-512 sub4x4_dct · 2d653411
  Henrik Gramner authored 7 years ago
  
  2d653411
- x86: AVX-512 mbtree_propagate_list · 07483f72
  Henrik Gramner authored 7 years ago
```
Uses gathers and scatters in combination with conflict detections to
vectorize the scalar part.

Also improve the checkasm test to try different mb_y values and check
for out-of-bounds writes.
```
  07483f72
- x86inc: Add aesni cpuflag define · 1a88481b
  James Darnley authored 7 years ago and Henrik Gramner committed 7 years ago
```
Upstreaming this from FFmpeg. Unused in x264.
```
  1a88481b
Jun 14, 2017

aarch64: Update the var2 functions to the new signature · 98e9543b

Martin Storsjö authored 7 years ago and

Anton Mitrofanov committed 7 years ago

The existing functions could easily be used by just calling them
twice - this would give the following cycle numbers from checkasm:

var2_8x8_c:      4110
var2_8x8_neon:   1505
var2_8x16_c:     8019
var2_8x16_neon:  2545

However, by merging both passes into the same function, we get the
following speedup:
var2_8x8_neon:   1205
var2_8x16_neon:  2327
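
The difference between "calling twice" and "merging both passes" can be sketched in plain C. This is an illustrative sketch, not the NEON code from the commit: it assumes the two blocks (for example the two chroma planes) sit side by side in each row, 8 pixels apart, and the names `demo_var2_plane` and `demo_var2_merged` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pass: SSD of an 8xH block, and SSD minus the squared mean term. */
static int demo_var2_plane(const uint8_t *a, ptrdiff_t sa,
                           const uint8_t *b, ptrdiff_t sb, int h, int *ssd)
{
    assert(h == 8 || h == 16);
    int sum = 0, sqr = 0;
    for (int y = 0; y < h; y++, a += sa, b += sb)
        for (int x = 0; x < 8; x++) {
            int d = a[x] - b[x];
            sum += d;
            sqr += d * d;
        }
    *ssd = sqr;
    return sqr - (int)(((int64_t)sum * sum) >> (h == 8 ? 6 : 7));
}

/* Merged: both 8xH blocks are handled in the same row loop, so each row
 * is visited once and the loop overhead is paid once. */
static int demo_var2_merged(const uint8_t *a, ptrdiff_t sa,
                            const uint8_t *b, ptrdiff_t sb, int h, int ssd[2])
{
    assert(h == 8 || h == 16);
    int sum_u = 0, sum_v = 0, sqr_u = 0, sqr_v = 0;
    for (int y = 0; y < h; y++, a += sa, b += sb)
        for (int x = 0; x < 8; x++) {
            int du = a[x] - b[x];         /* first block  */
            int dv = a[x + 8] - b[x + 8]; /* second block */
            sum_u += du; sqr_u += du * du;
            sum_v += dv; sqr_v += dv * dv;
        }
    int shift = h == 8 ? 6 : 7;
    ssd[0] = sqr_u;
    ssd[1] = sqr_v;
    return sqr_u - (int)(((int64_t)sum_u * sum_u) >> shift)
         + sqr_v - (int)(((int64_t)sum_v * sum_v) >> shift);
}
```

Both variants return the same value; the merged form only changes how the work is scheduled, which is where the cycle savings reported above would come from.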

98e9543b

arm: Update the var2 functions to the new signature · 824802ad

Martin Storsjö authored 7 years ago and

Anton Mitrofanov committed 7 years ago

The existing functions could easily be used by just calling them
twice - this would give the following cycle numbers from checkasm:

             Cortex A7     A8     A9   A53
var2_8x8_c:       7302   5342   5050  4400
var2_8x8_neon:    2645   1612   1932  1715
var2_8x16_c:     14300  10528  10020  8637
var2_8x16_neon:   5127   2695   3217  2651

However, by merging both passes into the same function, we get the
following speedup:
var2_8x8_neon:    2312   1190   1389  1300
var2_8x16_neon:   4862   2130   2293  2422

824802ad

Add support for levels 6, 6.1, and 6.2 · 6f8aa71c

Henrik Gramner authored 8 years ago and

Anton Mitrofanov committed 7 years ago

These levels were added in the 2016-10 revision of the H.264 specification and
improve support for content with high resolutions and/or high frame rates.

Level 6.2 supports 8K resolution at 120 fps.

Also shrink the x264_levels array by using smaller data types.
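
The claim that level 6.2 covers 8K at 120 fps can be checked with a little macroblock arithmetic. The sketch below assumes the MaxMBPS and MaxFS limits for levels 6, 6.1, and 6.2 match Table A-1 of the H.264 specification (4177920, 8355840, and 16711680 macroblocks per second, with a frame size limit of 139264 macroblocks), and it ignores the other level constraints such as bitrate and DPB size. `demo_level_t`, `demo_levels`, and `demo_level_fits` are hypothetical names, not the layout of x264_levels.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      level_idc; /* 60 = level 6, 61 = level 6.1, 62 = level 6.2 */
    uint32_t max_mbps;  /* macroblocks per second (assumed from Table A-1) */
    uint32_t max_fs;    /* macroblocks per frame  (assumed from Table A-1) */
} demo_level_t;

static const demo_level_t demo_levels[] = {
    { 60,  4177920, 139264 },
    { 61,  8355840, 139264 },
    { 62, 16711680, 139264 },
};

/* Checks only the frame-size and macroblock-rate limits. */
static int demo_level_fits(const demo_level_t *l, int width, int height, int fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    uint32_t mbs = (uint32_t)((width + 15) / 16) * (uint32_t)((height + 15) / 16);
    return mbs <= l->max_fs && (uint64_t)mbs * (uint64_t)fps <= l->max_mbps;
}
```

For 7680x4320 this gives 480 x 270 = 129600 macroblocks per frame, and 129600 x 120 = 15552000 macroblocks per second, which is within the level 6.2 limit but above the level 6.1 limit.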

6f8aa71c

Use a larger integer type for the slice_table array · 2baa28c8
Henrik Gramner authored 8 years ago and Anton Mitrofanov committed 7 years ago
```
Makes it possible to use slicing with resolutions larger than 2^24 pixels.
```
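
The 2^24 figure is consistent with a 16-bit index over 16x16 macroblocks: 65536 macroblocks x 256 pixels = 2^24 pixels. The sketch below works through that arithmetic. It assumes the previous element type was 16 bits wide and that entries hold a macroblock index, neither of which is stated in the commit message; `demo_mb_count` and `demo_fits_in_u16` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16x16 macroblocks needed to cover a frame. */
static uint32_t demo_mb_count(int width, int height)
{
    assert(width > 0 && height > 0);
    return (uint32_t)((width + 15) / 16) * (uint32_t)((height + 15) / 16);
}

/* Whether a macroblock index is representable in a 16-bit entry. */
static int demo_fits_in_u16(uint32_t mb_index)
{
    return mb_index <= UINT16_MAX;
}
```

A 4096x4096 frame has exactly 65536 macroblocks, so its last index (65535) still fits, while a 7680x4320 frame has 129600 macroblocks and does not.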
2baa28c8

analyse: Reduce the size of the cost_mv arrays · c9d2c1c8

Henrik Gramner authored 8 years ago and

Anton Mitrofanov committed 7 years ago

Use a dynamic size depending on the MV range. Reduces memory consumption by
up to a few megabytes.

Drop a related old miscompilation check since it may otherwise cause an
out-of-bounds memory access.

Also remove an unused extern variable declaration.

c9d2c1c8

Fix CABAC+8x8dct in 4:4:4 · d46a5a46
Anton Mitrofanov authored 7 years ago
```
Use the correct ctxIdxInc calculation for coded_block_flag.
```
d46a5a46

Fix 8x8dct in lossless encoding · 79b36f27

Anton Mitrofanov authored 7 years ago

Change V and H intra prediction in lossless (TransformBypassModeFlag == 1)
macroblocks to correctly adhere to the specification. Affects lossless
encoding with 8x8dct or mix of lossless with normal macroblocks.

8x8dct has been disabled in lossless mode for some time due to
being out-of-spec, but this will allow us to re-enable it.

79b36f27

mbtree: Fix buffer overflow · 68a55021

Anton Mitrofanov authored 7 years ago

Could occur on the 1st pass in combination with --fake-interlaced and
some input heights due to allocating a too small buffer.

68a55021

May 23, 2017
- x86: Avoid self-relative expressions on macho64 · df79067c
  Henrik Gramner authored 7 years ago
```
Functions that use self-relative expressions in the form of [foo-$$]
appear to cause issues on 64-bit Mach-O systems when assembled with nasm.
Temporarily disable those functions on macho64 until we've figured out
the root cause.
```
  df79067c
- configure: Don't try to detect clang by $CC · f1ac7122
  Anton Mitrofanov authored 7 years ago and Henrik Gramner committed 7 years ago
```
Only check whether the -Werror=unknown-warning-option option is supported before adding it.
```
  f1ac7122
- checkasm: Use the right variable in a loop condition · b4d811df
  Martin Storsjö authored 7 years ago and Henrik Gramner committed 7 years ago
```
Prior to this change, the loop did not run at all. The condition has been
the same since it was introduced in 5b0cb86f.

This issue was pointed out by a clang warning.
```
  b4d811df
- x86: Fix linking with 8-bit depth shared libx264 · a3d24462
  Anton Mitrofanov authored 7 years ago and Henrik Gramner committed 7 years ago
  
  a3d24462
May 21, 2017
- x86: Only enable AVX-512 in 8-bit mode · d1fe6fd1
  Henrik Gramner authored 7 years ago
  
  d1fe6fd1