1. 06 Aug, 2018 2 commits
  2. 02 Jun, 2018 1 commit
    • Fix clang stack alignment issues · 7737e6ad
      Henrik Gramner authored
      Clang emits aligned AVX stores for things like zeroing stack-allocated
      variables when using -mavx even with -fno-tree-vectorize set which can
      result in crashes if this occurs before we've realigned the stack.
      Previously we only ensured that the stack was realigned before calling
      assembly functions that access stack-allocated buffers but this is
      not sufficient. Fix the issue by changing the stack realignment to
      instead occur immediately in all CLI, API and thread entry points.
  3. 17 Jan, 2018 1 commit
  4. 24 Dec, 2017 4 commits
    • Unify 8-bit and 10-bit CLI and libraries · 71ed44c7
      Vittorio Giovara authored
      Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
      option to set the bit depth at runtime.
      Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
      incorrect value, it's preferable to induce a linking failure. If applications
      rely on this symbol, this will make it more obvious where the problem is.
      Add Makefile rules that compile modules with different bit depths. Assembly
      on x86 is prefixed with the 'private_prefix' define, while all other archs
      modify their function prefix internally.
      Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
      assembly, PowerPC assembly, and MIPS assembly.
      The depth and cache CLI filters heavily depend on bit depth size, so they
      need to be duplicated for each value. This means having to rename these
      filters, and adjust the callers to use the right version.
      Unfortunately the threaded input CLI module inherits a common.h dependency
      (input/frame -> common/threadpool -> common/frame -> common/common) which
      is extremely complicated to address in a sensible way. Instead duplicate
      the module and select the appropriate one at run time.
      Each bitdepth needs different checkasm compilation rules, so split the main
      checkasm target into two executables.
    • Change default QP parameters initialization · 2451a728
      Vittorio Giovara authored
      qp is modified to require a valid value before use, while qp_max is set
      to the maximum allowable value (and clipped later on).
      This is needed so that param functions do not depend on bit depth size.
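The "set to maximum, clip later" idea can be sketched roughly as follows. This is an illustration only, not x264's code: the function and constant names are hypothetical, and the bound used is the H.264 spec range of 51 + 6 * (bit_depth - 8); x264's internal upper bound may differ.

```python
# Sketch only: all names here are hypothetical, not x264 internals.
QP_MAX_UNBOUNDED = 2**31 - 1  # stand-in for "maximum allowable value"


def spec_qp_max(bit_depth: int) -> int:
    """Largest QP in the H.264 spec range for a given bit depth."""
    return 51 + 6 * (bit_depth - 8)


def resolve_qp_max(requested_qp_max: int, bit_depth: int) -> int:
    """Clip a depth-agnostic default once the bit depth is finally known."""
    return min(requested_qp_max, spec_qp_max(bit_depth))
```

With this split, the parameter defaults never need to know the bit depth; only the later clipping step does.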
  5. 14 Jun, 2017 1 commit
    • Add support for levels 6, 6.1, and 6.2 · 6f8aa71c
      Henrik Gramner authored
      These levels were added in the 2016-10 revision of the H.264 specification and
      improve support for content with high resolutions and/or high frame rates.
      Level 6.2 supports 8K resolution at 120 fps.
      Also shrink the x264_levels array by using smaller data types.
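The 8K at 120 fps claim can be checked with a little macroblock arithmetic. The sketch below is not taken from x264: the helper names are hypothetical, and the limits are quoted from memory of Table A-1 of the H.264 specification, so they should be verified against the spec. Real level conformance also involves bitrate, DPB and other limits that are ignored here.

```python
# Level limits as recalled from H.264 Table A-1 (assumption, verify before use):
# level -> (MaxMBPS in macroblocks per second, MaxFS in macroblocks per frame)
LEVEL_LIMITS = {
    "6": (4_177_920, 139_264),
    "6.1": (8_355_840, 139_264),
    "6.2": (16_711_680, 139_264),
}


def macroblocks(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a frame."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def fits_level(level: str, width: int, height: int, fps: int) -> bool:
    """Check only the frame-size and macroblock-rate limits of a level."""
    max_mbps, max_fs = LEVEL_LIMITS[level]
    frame_mbs = macroblocks(width, height)
    return frame_mbs <= max_fs and frame_mbs * fps <= max_mbps
```

For example, `fits_level("6.2", 7680, 4320, 120)` passes both checks, while the same content does not fit the macroblock rate of level 6.1.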
  6. 21 May, 2017 1 commit
    • Support YUYV and UYVY packed 4:2:2 raw input · dcf40697
      Henrik Gramner authored
      Packed YUV is arguably more common than planar YUV when dealing with raw
      4:2:2 content.
      We can utilize the existing plane_copy_deinterleave() functions with some
      additional minor constraints (we cannot assume any particular alignment
      or overread the input buffer).
      Enables assembly optimizations on x86.
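For readers unfamiliar with the packed layouts, here is a minimal sketch of what deinterleaving one row means. It is an illustration in Python, not x264's plane_copy_deinterleave() code; the function name and signature are hypothetical.

```python
def deinterleave_yuyv(packed: bytes, uyvy: bool = False):
    """Split one packed 4:2:2 row into separate Y, U and V byte strings.

    YUYV stores each pixel pair as Y0 U Y1 V; UYVY stores it as U Y0 V Y1.
    """
    if len(packed) % 4:
        raise ValueError("row must hold a whole number of pixel pairs")
    luma_offset = 1 if uyvy else 0
    chroma_offset = 1 - luma_offset
    y = packed[luma_offset::2]          # every second byte is luma
    u = packed[chroma_offset::4]        # one U sample per pixel pair
    v = packed[chroma_offset + 2::4]    # one V sample per pixel pair
    return y, u, v
```

The slicing never reads past the end of the row, which mirrors the constraint mentioned above that the input buffer must not be overread.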
  7. 21 Jan, 2017 1 commit
  8. 01 Dec, 2016 1 commit
    • Cosmetics · b2b39dae
      Anton Mitrofanov authored
      Also make x264_weighted_reference_duplicate() static.
  9. 20 Sep, 2016 1 commit
    • Adjust --preset slow · 4e5adb87
      Henrik Gramner authored
       * Swap --me umh for --trellis 2. They have a similar effect on performance
         but the latter gives slightly better results in most cases.
       * Change --b-adapt from 2 to 1. Negligible difference in quality since the
         b-adapt 1 improvements, but it's significantly faster.
      Also remove a redundant assignment from veryfast (--me hex is set by default).
  10. 20 Apr, 2016 1 commit
  11. 16 Jan, 2016 1 commit
  12. 18 Aug, 2015 1 commit
  13. 26 Jul, 2015 1 commit
  14. 25 Jul, 2015 1 commit
    • NV21 input support · 627f891c
      Xiaolei Yu authored
      Eliminates an extra copy when encoding Android camera preview images.
      Checkasm test by Janne Grunau.
      ARM assembly with improvements from Janne Grunau.
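NV21 differs from NV12 only in the order of the interleaved chroma bytes (V,U instead of U,V). A small sketch of the swap follows; it assumes nothing about how x264 implements it, and the helper names are hypothetical.

```python
def split_nv21_chroma(vu_plane: bytes):
    """Return (u, v) from an NV21 chroma plane, which stores V,U pairs."""
    if len(vu_plane) % 2:
        raise ValueError("chroma plane must hold whole V,U pairs")
    v = vu_plane[0::2]
    u = vu_plane[1::2]
    return u, v


def nv21_to_nv12_chroma(vu_plane: bytes) -> bytes:
    """Swap each V,U pair into the U,V order used by NV12."""
    u, v = split_nv21_chroma(vu_plane)
    out = bytearray(len(vu_plane))
    out[0::2] = u
    out[1::2] = v
    return bytes(out)
```

The luma plane is identical in both formats, so only the chroma plane needs this treatment.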
  15. 24 Jul, 2015 2 commits
  16. 16 Jul, 2015 1 commit
  17. 23 Feb, 2015 1 commit
  18. 29 Sep, 2014 1 commit
  19. 21 Jan, 2014 2 commits
  20. 08 Jan, 2014 1 commit
  21. 30 Oct, 2013 2 commits
    • Remove --visualize option. · 95d196ef
      Anton Mitrofanov authored
      It probably wasn't used or maintained for the last few years.
    • Add --filler option · c084f6c0
      Fiona Glaser authored
      Allows generation of hard-CBR streams without using NAL HRD.
      Useful if you want to be able to reconfigure the bitrate (which you can't do
      with NAL HRD on).
  22. 23 Aug, 2013 4 commits
    • Windows Unicode support · fa3cac51
      Henrik Gramner authored
      Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8.
      This patch does the following in order to handle things like Unicode filenames:
      * Keep strings internally as UTF-8.
      * Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
      * Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
      * Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
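The conversion at the two boundaries can be illustrated as follows. This is a conceptual Python sketch with hypothetical helper names; the actual patch is C and would use the Windows API for these conversions, which is not shown here.

```python
def utf16_args_to_utf8(wide_args):
    """Convert UTF-16LE encoded arguments (as Windows supplies them) to UTF-8."""
    return [arg.decode("utf-16-le").encode("utf-8") for arg in wide_args]


def utf8_to_utf16(text_utf8: bytes) -> bytes:
    """Convert an internal UTF-8 string to UTF-16LE before a wide API call."""
    return text_utf8.decode("utf-8").encode("utf-16-le")
```

Everything between those two boundaries only ever sees UTF-8, which is the point of the first bullet above.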
    • AVC-Intra support · 9b94896b
      Kieran Kunhya authored
      This format has been reverse engineered and x264's output has almost exactly
      the same bitstream as Panasonic cameras and encoders produce. It therefore does
      not comply with SMPTE RP2027 since Panasonic themselves do not comply with
      their own specification. It has been tested in Avid, Premiere, Edius and
      Parts of this patch were written by Fiona Glaser and some reverse
      engineering was done by Joseph Artsimovich.
    • Transparent hugepage support · fa1e2b74
      Henrik Gramner authored
      Combine frame and mb data mallocs into a single large malloc.
      Additionally, on Linux systems with hugepage support, ask for hugepages on
      large mallocs.
      This gives a small performance improvement (~0.2-0.9%) on systems without
      hugepage support, as well as a small memory footprint reduction.
      On recent Linux kernels with hugepage support enabled (set to madvise or
      always), it improves performance up to 4% at the cost of about 7-12% more
      memory usage on typical settings.
      It may help even more on Haswell and other recent CPUs with improved 2MB page
      support in hardware.
  23. 03 Jul, 2013 1 commit
  24. 20 May, 2013 1 commit
  25. 23 Apr, 2013 4 commits
    • x86: more AVX2 framework, AVX2 functions, plus some existing asm tweaks · 0ea5be85
      Fiona Glaser authored
      AVX2 functions:
      zigzag interleave
    • OpenCL lookahead · f49a1b2e
      Steve Borho authored
      OpenCL support is compiled in by default, but must be enabled at runtime by an
      --opencl command line flag. Compiling OpenCL support requires perl. To avoid
      the perl requirement use: configure --disable-opencl.
      When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU
      device.  Lowres intra cost prediction, lowres motion search (including subpel)
      and bidir cost predictions are all done on the GPU.  MB-tree and final slice
      decisions are still done by the CPU.  Presets which do not use a threaded
      lookahead will not use OpenCL at all (superfast, ultrafast).
      Because of data dependencies, the GPU must use an iterative motion search which
      performs more total work than the CPU would do, so this is not work efficient
      or power efficient. But if there are GPU cycles to spare, it can often
      speed up the encode. Output quality when OpenCL lookahead is enabled is often
      very slightly worse than with the CPU lookahead (because of the same data
      x264 must co...
    • Add slices-max feature · 732e4f7e
      Fiona Glaser authored
      The H.264 spec technically has limits on the number of slices per frame. x264
      normally ignores this, since most use-cases that require large numbers of
      slices prefer it to. However, certain decoders may break with extremely large
      numbers of slices, as can occur with some slice-max-size/mbs settings.
      When set, x264 will refuse to create any slices beyond the maximum number,
      even if slice-max-size/mbs requires otherwise.
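The capping behaviour described above can be sketched like this. It is a simplified model rather than x264's slicing logic: it only considers a macroblock limit per slice, and the function and parameter names are hypothetical.

```python
def slice_sizes(total_mbs: int, slice_max_mbs: int, slices_max: int = 0):
    """Return the macroblock count of each slice; slices_max == 0 means no cap."""
    sizes = []
    remaining = total_mbs
    while remaining > 0:
        # Once the last permitted slice is reached, it absorbs everything left,
        # even if that exceeds slice_max_mbs.
        last_allowed = slices_max > 0 and len(sizes) == slices_max - 1
        size = remaining if last_allowed else min(slice_max_mbs, remaining)
        sizes.append(size)
        remaining -= size
    return sizes
```

Without a cap, 100 macroblocks at 30 per slice give four slices; with a cap of two, the second slice grows to 70 macroblocks.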
    • Add slice-min-mbs feature · fdfffa30
      Fiona Glaser authored
      Works in conjunction with slice-max-mbs and/or slice-max-size to avoid overly
      small slices.
      Useful with certain decoders that barf on extremely small slices.
      If slice-min-mbs would be violated as a result of slice-max-size, x264 will
      exceed slice-max-size and print a warning.
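A rough model of the interaction between the two limits is sketched below. It is not x264's algorithm: per-macroblock byte costs are given up front, the names are hypothetical, and the sketch does not try to keep the final slice of a frame above the minimum.

```python
def split_by_size(mb_bytes, slice_max_size: int, slice_min_mbs: int = 1):
    """Group per-macroblock byte costs into slices.

    Returns (slices, oversized), where oversized lists the indices of slices
    that exceeded slice_max_size in order to honour slice_min_mbs.
    """
    slices, oversized = [], []
    current, current_bytes = [], 0
    for cost in mb_bytes:
        over = current_bytes + cost > slice_max_size
        if current and over:
            if len(current) >= slice_min_mbs:
                # Large enough already: close the slice and start a new one.
                slices.append(current)
                current, current_bytes = [], 0
            elif not oversized or oversized[-1] != len(slices):
                # Too small to close: exceed the size limit and record a warning.
                oversized.append(len(slices))
        current.append(cost)
        current_bytes += cost
    if current:
        slices.append(current)
    return slices, oversized
```

With four 40-byte macroblocks and a 100-byte limit, a minimum of three macroblocks forces the first slice to 120 bytes and flags it, instead of closing it after two.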
  26. 26 Feb, 2013 1 commit
    • x86: detect Bobcat, improve Atom optimizations, reorganize flags · 5d60b9c9
      Fiona Glaser authored
      The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
      and apply the appropriate flags.
      It also has an extremely slow palignr instruction; create a flag for this to
      avoid massive penalties on palignr-heavy functions.
      Improve Atom function selection and document exactly what the SLOW_ATOM flag
      Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
      optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on
      Atom along with other SIMD multiplies.
      Drop TBM detection; it'll probably never be useful for x264.
      Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).
      Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.
  27. 25 Feb, 2013 1 commit