  1. Jun 10, 2020
  2. Apr 09, 2020
  3. Feb 29, 2020
  4. Nov 05, 2019
  5. Jul 17, 2019
  6. Mar 06, 2019
  7. Mar 03, 2019
  8. Aug 06, 2018
  9. Jun 02, 2018
    • Fix clang stack alignment issues · 7737e6ad
      Henrik Gramner authored
      Clang emits aligned AVX stores for things like zeroing stack-allocated
      variables when using -mavx, even with -fno-tree-vectorize set, which can
      result in crashes if this occurs before we've realigned the stack.
      
      Previously we only ensured that the stack was realigned before calling
      assembly functions that access stack-allocated buffers, but this is
      not sufficient. Fix the issue by performing the stack realignment
      immediately in all CLI, API, and thread entry points instead.
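      The general technique can be sketched with a compiler attribute. This is a minimal illustration, not x264's actual code: `force_align_arg_pointer` is real GCC/Clang syntax, but `api_entry_point` is a hypothetical entry point.

```c
#include <assert.h>

/* Sketch: realign the stack on entry so compiler-emitted aligned AVX
 * stores inside the function cannot fault. The attribute is real
 * GCC/Clang syntax; the function itself is a hypothetical API entry
 * point, not x264's actual code. */
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

REALIGN_STACK int api_entry_point(int x)
{
    /* With the attribute, this stack-allocated buffer is 16-byte
     * aligned even when the caller only guaranteed 4-byte alignment,
     * so aligned stores emitted for the zero-initialization are safe. */
    int buf[4] = {0};
    return buf[0] + x;
}
```

      Applying this at every CLI, API, and thread entry point (rather than before individual assembly calls) matches the approach the commit message describes.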
  10. Jan 18, 2018
  11. Jan 17, 2018
  12. Dec 24, 2017
  13. Jun 24, 2017
  14. Jun 14, 2017
    • Add support for levels 6, 6.1, and 6.2 · 6f8aa71c
      Henrik Gramner authored and Anton Mitrofanov committed
      These levels were added in the 2016-10 revision of the H.264 specification and
      improve support for content with high resolutions and/or high frame rates.
      
      Level 6.2 supports 8K resolution at 120 fps.
      
      Also shrink the x264_levels array by using smaller data types.
  15. May 23, 2017
  16. May 21, 2017
    • x86: AVX-512 pixel_sad · 993eb207
      Henrik Gramner authored
      Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.
    • Rework pixel_var2 · 92c074e2
      Henrik Gramner authored
      The functions are only ever called with pointers to fenc and fdec, and the
      strides are always constant, so there's no point in having them as parameters.
      
      Cover both the U and V planes in a single function call. This is more
      efficient with SIMD, especially with the wider vectors provided by AVX2 and
      AVX-512, even when accounting for losing the possibility of early termination.
      
      Drop the MMX and XOP implementations and update the rest of the x86 assembly
      to match the new behavior. Also enable high bit-depth in the AVX2 version.
      
      Comment out the ARM, AARCH64, and MIPS MSA assembly for now.
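      A scalar sketch of the reworked shape: one call computes the variance of the fenc−fdec residual over both chroma planes with fixed strides. The strides, plane offset, and return convention here are illustrative assumptions, not x264's exact definitions.

```c
#include <stdint.h>

/* Illustrative fixed strides (assumptions, not necessarily x264's). */
#define FENC_STRIDE 16
#define FDEC_STRIDE 32

/* Variance of the 8x8 residual between source (fenc) and reconstruction
 * (fdec), summed over two planes assumed to sit 8 columns apart.
 * var = sum of squares - (sum)^2 / N for each plane. */
uint32_t var2_8x8x2(const uint8_t *fenc, const uint8_t *fdec)
{
    uint32_t var = 0;
    for (int p = 0; p < 2; p++) { /* U plane, then V plane */
        int32_t sum = 0, sqr = 0;
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++) {
                int d = fenc[y*FENC_STRIDE + x] - fdec[y*FDEC_STRIDE + x];
                sum += d;
                sqr += d * d;
            }
        var += (uint32_t)sqr - (uint32_t)(sum * sum) / 64;
        fenc += 8; /* advance to the second plane */
        fdec += 8;
    }
    return var;
}
```

      Handling both planes in one pass is what makes the wide AVX2/AVX-512 vectors pay off, at the cost of the early-termination opportunity the commit mentions.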
    • x86: AVX-512 deblock_strength · 2eceefe8
      Henrik Gramner authored
      Also drop the MMX version and make some slight improvements to the SSE2,
      SSSE3, AVX, and AVX2 versions.
    • 3081ffa1
    • x86: AVX-512 memzero_aligned · 95dc64c4
      Henrik Gramner authored
      Reorder some elements in the x264_t.mb.pic struct to reduce the amount
      of padding required.
      
      Also drop the MMX implementation in favor of SSE.
    • x86: AVX and AVX-512 memcpy_aligned · c0cd7650
      Henrik Gramner authored
      Reorder some elements in the x264_mb_analysis_list_t struct to reduce the
      amount of padding required.
      
      Also drop the MMX implementation in favor of SSE.
    • x86: AVX-512 dequant_4x4 · 74f7802b
      Henrik Gramner authored
    • x86: AVX-512 mbtree_propagate_cost · 3451ba3a
      Henrik Gramner authored
      Also make the AVX and AVX2 implementations slightly faster.
    • x86: AVX-512 coeff_last · 75f6f9b2
      Henrik Gramner authored
    • x86: AVX-512 zigzag_scan_8x8_frame · 724a5772
      Henrik Gramner authored
      The vperm* instructions ignore unused index bits, so we can pack the
      permutation indices together to save cache and simply use a shift to get
      the right values.
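      The trick can be shown with scalar C. A vpermd-style permute only consumes the low bits of each index element, so two permutations can share one packed constant, with the second recovered by a shift. The index widths and the two example permutations below are illustrative, not x264's actual zigzag constants.

```c
#include <stdint.h>

/* Scalar model of a vpermd-style permute: only the low 3 bits of each
 * index element are consumed; higher bits are ignored. */
void permute8(uint32_t dst[8], const uint32_t src[8], const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = src[idx[i] & 7];
}

/* Each byte packs two 3-bit indices: permutation A in bits 0-2 and
 * permutation B in bits 3-5 (here: A = identity, B = reverse). */
const uint8_t packed_idx[8] = {
    0 | 7<<3, 1 | 6<<3, 2 | 5<<3, 3 | 4<<3,
    4 | 3<<3, 5 | 2<<3, 6 | 1<<3, 7 | 0<<3
};

/* A shift exposes permutation B from the same constant. */
void permute8_shifted(uint32_t dst[8], const uint32_t src[8])
{
    uint8_t idx[8];
    for (int i = 0; i < 8; i++)
        idx[i] = packed_idx[i] >> 3;
    permute8(dst, src, idx);
}
```

      One cache line of constants thus serves two permutation passes.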
    • x86: AVX-512 zigzag_scan_4x4_frame · 2b2f0395
      Henrik Gramner authored
    • checkasm: x86: More accurate ymm/zmm measurements · 1878c7f2
      Henrik Gramner authored
      YMM and ZMM registers on x86 are turned off to save power when they haven't
      been used for some period of time. When they are used again there is a
      "warmup" period during which performance is reduced and inconsistent,
      which is problematic when trying to benchmark individual functions.
      
      Periodically issue "dummy" instructions that use those registers to
      prevent them from being powered down. The end result is more consistent
      benchmark results.
    • x86: AVX-512 support · 472ce364
      Henrik Gramner authored
      AVX-512 consists of a plethora of different extensions, but in order to keep
      things a bit more manageable we group the following extensions together
      under a single baseline cpu flag, which should cover SKL-X and future CPUs:
       * AVX-512 Foundation (F)
       * AVX-512 Conflict Detection Instructions (CD)
       * AVX-512 Byte and Word Instructions (BW)
       * AVX-512 Doubleword and Quadword Instructions (DQ)
       * AVX-512 Vector Length Extensions (VL)
      
      On x86-64, AVX-512 provides 16 additional vector registers; prefer using
      those over the existing ones, since doing so allows us to avoid using
      `vzeroupper` unless more than 16 vector registers are required. They also
      happen to be volatile on Windows, which means that we don't need to save
      and restore existing xmm register contents unless more than 22 vector
      registers are required.
      
      Also take the opportunity to drop X264_CPU_CMOV and X264_CPU_SLOW_CTZ,
      since they weren't really used for anything, while we're already breaking
      the API by messing with the cpu flags.
      
      Big thanks to Intel for their support.
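      The grouping amounts to requiring all five extensions before advertising the single combined flag. A sketch of that check, with hypothetical bit positions (not real CPUID feature bits or x264's X264_CPU_* values):

```c
#include <stdint.h>

/* Hypothetical feature bits for the five grouped extensions. */
#define HAS_F  (1u << 0) /* AVX-512 Foundation */
#define HAS_CD (1u << 1) /* Conflict Detection */
#define HAS_BW (1u << 2) /* Byte and Word */
#define HAS_DQ (1u << 3) /* Doubleword and Quadword */
#define HAS_VL (1u << 4) /* Vector Length Extensions */

#define AVX512_REQUIRED (HAS_F | HAS_CD | HAS_BW | HAS_DQ | HAS_VL)

/* The combined baseline flag is advertised only when every grouped
 * extension is present. */
int has_avx512_baseline(uint32_t features)
{
    return (features & AVX512_REQUIRED) == AVX512_REQUIRED;
}
```

      A CPU missing even one of the five (e.g. an early Xeon Phi lacking BW/DQ/VL) would therefore not get the flag.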
    • x86: Add some additional cpuflag relations · 8c297425
      Henrik Gramner authored
      Simplifies writing assembly code that depends on available instructions.
      
      LZCNT implies SSE2
      BMI1 implies AVX+LZCNT
      AVX2 implies BMI2
      
      Skip printing LZCNT under CPU capabilities when BMI1 or BMI2 is available,
      and don't print FMA4 when FMA3 is available.
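      The listed relations can be expanded mechanically over a flag word. The bit values below are hypothetical (not x264's actual constants), and only the implications named in the commit are applied:

```c
#include <stdint.h>

/* Hypothetical cpuflag bits, not x264's actual X264_CPU_* values. */
#define CPU_SSE2  (1u << 0)
#define CPU_AVX   (1u << 1)
#define CPU_LZCNT (1u << 2)
#define CPU_BMI1  (1u << 3)
#define CPU_BMI2  (1u << 4)
#define CPU_AVX2  (1u << 5)

/* Apply the implications from the commit, ordered so that flags added
 * by one rule feed the next (BMI1 adds LZCNT before LZCNT adds SSE2). */
uint32_t expand_cpuflags(uint32_t f)
{
    if (f & CPU_AVX2)  f |= CPU_BMI2;            /* AVX2 implies BMI2 */
    if (f & CPU_BMI1)  f |= CPU_AVX | CPU_LZCNT; /* BMI1 implies AVX+LZCNT */
    if (f & CPU_LZCNT) f |= CPU_SSE2;            /* LZCNT implies SSE2 */
    return f;
}
```

      With these relations in place, assembly gated on e.g. BMI1 can freely assume AVX and LZCNT are available, which is the simplification the commit describes.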
    • Support YUYV and UYVY packed 4:2:2 raw input · dcf40697
      Henrik Gramner authored
      Packed YUV is arguably more common than planar YUV when dealing with raw
      4:2:2 content.
      
      We can utilize the existing plane_copy_deinterleave() functions with some
      additional minor constraints (we can neither assume any particular
      alignment nor overread the input buffer).
      
      Enables assembly optimizations on x86.
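      For reference, deinterleaving one row of packed YUYV into planes looks like this in scalar C. The function name and signature are illustrative, not x264's plane_copy_deinterleave() API; UYVY would simply swap the luma and chroma byte offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Split one row of packed YUYV 4:2:2 (byte order Y0 U0 Y1 V0 per pair
 * of pixels) into separate Y, U, and V planes. width must be even.
 * Illustrative sketch, not x264's actual API. */
void yuyv_to_planes(uint8_t *y, uint8_t *u, uint8_t *v,
                    const uint8_t *src, size_t width)
{
    for (size_t x = 0; x < width / 2; x++) {
        y[2*x]     = src[4*x + 0]; /* Y0 */
        u[x]       = src[4*x + 1]; /* U shared by both pixels */
        y[2*x + 1] = src[4*x + 2]; /* Y1 */
        v[x]       = src[4*x + 3]; /* V shared by both pixels */
    }
}
```

      Note the loop reads exactly 2*width bytes per row, honoring the no-overread constraint mentioned above.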
    • configure: Support targeting ARM with MSVC tools · a52d41c4
      Martin Storsjö authored and Henrik Gramner committed
      Set up the right gas-preprocessor as the assembler frontend in these
      cases, using armasm as the actual assembler.
      
      Don't try to add the -mcpu -mfpu options in this case.
      
      Check whether the compiler actually supports inline assembly.
      
      Check for the ARMv7 features in a different way for the MSVC compiler.