1. 06 Mar, 2019 1 commit
  2. 17 Jan, 2018 1 commit
  3. 24 Dec, 2017 1 commit
      Unify 8-bit and 10-bit CLI and libraries · 71ed44c7
      Vittorio Giovara authored
      Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
      option to set the bit depth at runtime.
      
      Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
      incorrect value, it's preferable to induce a linking failure. If an application
      relies on this symbol, this makes it more obvious where the problem is.
      
      Add Makefile rules that compile modules with different bit depths. Assembly
      on x86 is prefixed with the 'private_prefix' define, while all other archs
      modify their function prefix internally.
      
      Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
      assembly, PowerPC assembly, and MIPS assembly.
      
      The depth and cache CLI filters heavily depend on bit depth size, so they
      need to be duplicated for each value. This means having to rename these
      filters, and adjust the callers to use the right version.
      
      Unfortunately the threaded input CLI module inherits a common.h dependency
      (input/frame -> common/threadpool -> common/frame -> common/common) which
      is extremely complicated to address in a sensible way. Instead duplicate
      the module and select the appropriate one at run time.
      
      Each bitdepth needs different checkasm compilation rules, so split the main
      checkasm target into two executables.
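      
      As a rough illustration of the templatizing idea, the sketch below
      generates one function per bit depth from a single definition and picks
      one at run time. It is not x264 code: the commit compiles the same
      source files several times with different defines, whereas this sketch
      uses one macro in one file, and the names DEFINE_SUM_PIXELS and
      sum_pixels are hypothetical.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical "template": expands to one function per bit depth.
       * The depth becomes part of the function name, much like a prefix. */
      #define DEFINE_SUM_PIXELS( depth, pixel_t ) \
      static uint64_t sum_pixels_##depth( const void *src, size_t n ) \
      { \
          const pixel_t *p = src; \
          uint64_t sum = 0; \
          for( size_t i = 0; i < n; i++ ) \
              sum += p[i]; \
          return sum; \
      }

      DEFINE_SUM_PIXELS( 8, uint8_t )   /* 8-bit samples stored in bytes */
      DEFINE_SUM_PIXELS( 10, uint16_t ) /* 10-bit samples stored in 16-bit words */

      /* Run-time selection, analogous to choosing a code path from a
       * bit depth parameter instead of a compile-time constant. */
      static uint64_t sum_pixels( const void *src, size_t n, int bit_depth )
      {
          return bit_depth == 8 ? sum_pixels_8( src, n ) : sum_pixels_10( src, n );
      }
      ```

      Callers only see sum_pixels(); the depth-specific variants stay private
      to the file.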
  4. 21 May, 2017 1 commit
      Rework pixel_var2 · 92c074e2
      Henrik Gramner authored
      The functions are only ever called with pointers to fenc and fdec and the
      strides are always constant so there's no point in having them as parameters.
      
      Cover both the U and V planes in a single function call. This is more
      efficient with SIMD, especially with the wider vectors provided by AVX2 and
      AVX-512, even when accounting for losing the possibility of early termination.
      
      Drop the MMX and XOP implementations, update the rest of the x86 assembly
      to match the new behavior. Also enable high bit-depth in the AVX2 version.
      
      Comment out the ARM, AARCH64, and MIPS MSA assembly for now.
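      
      A minimal C sketch of the reworked interface, under stated assumptions:
      the strides are compile-time constants (16 for fenc and 32 for fdec are
      assumed here), the V plane is assumed to sit half a stride to the right
      of the U plane in both buffers, and the function name var2_8x8_sketch
      is hypothetical. It handles U and V in one call and has no early exit.

      ```c
      #include <stdint.h>

      #define FENC_STRIDE 16 /* assumed constant stride of the source buffer */
      #define FDEC_STRIDE 32 /* assumed constant stride of the reconstruction buffer */

      /* Returns the summed variance-like score of the U and V residuals and
       * stores the per-plane sum of squared differences in ssd[0] and ssd[1]. */
      static int var2_8x8_sketch( const uint8_t *fenc, const uint8_t *fdec, int ssd[2] )
      {
          int sum_u = 0, sum_v = 0, sqr_u = 0, sqr_v = 0;
          for( int y = 0; y < 8; y++ )
          {
              for( int x = 0; x < 8; x++ )
              {
                  int diff_u = fenc[x] - fdec[x];
                  int diff_v = fenc[x + FENC_STRIDE/2] - fdec[x + FDEC_STRIDE/2];
                  sum_u += diff_u;
                  sum_v += diff_v;
                  sqr_u += diff_u * diff_u;
                  sqr_v += diff_v * diff_v;
              }
              fenc += FENC_STRIDE;
              fdec += FDEC_STRIDE;
          }
          ssd[0] = sqr_u;
          ssd[1] = sqr_v;
          /* 64 samples per plane, so dividing sum^2 by 64 is a shift by 6. */
          return sqr_u - (int)( (int64_t)sum_u * sum_u >> 6 )
               + sqr_v - (int)( (int64_t)sum_v * sum_v >> 6 );
      }
      ```

      A residual that is a constant offset scores zero here while its SSD is
      nonzero, which is the property a DC-only heuristic can exploit.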
  5. 21 Jan, 2017 1 commit
  6. 01 Dec, 2016 1 commit
      Cosmetics · b2b39dae
      Anton Mitrofanov authored
      Also make x264_weighted_reference_duplicate() static.
  7. 16 Jan, 2016 1 commit
  8. 23 Feb, 2015 1 commit
  9. 08 Jan, 2014 1 commit
  10. 26 Feb, 2013 1 commit
  11. 09 Jan, 2013 1 commit
  12. 23 Apr, 2012 1 commit
  13. 06 Mar, 2012 1 commit
      Fix incorrect zero-extension assumptions in x86_64 asm · 3131a19c
      Henrik Gramner authored
      Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero.
      This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI.
      As a result, it breaks with some other compilers, like Clang, that exploit the lack of such a guarantee when optimizing.
      Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary.
      Also add a checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
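      
      The failure mode can be modelled in plain C by treating a 64-bit integer
      as a register whose low half holds an "int" argument and whose high half
      holds whatever the caller left there. This is only a model of the
      problem, not x264 code, and both function names are hypothetical.

      ```c
      #include <stdint.h>

      /* Buggy pattern: use the whole 64-bit register as the value, trusting
       * the upper 32 bits to be zero. */
      static int64_t use_full_register( uint64_t reg )
      {
          return (int64_t)reg;
      }

      /* Fixed pattern: look only at the low 32 bits and sign-extend them,
       * which is what movsxd does. Declaring the parameter as intptr_t has a
       * similar effect by making the caller pass a full-width value.
       * (The unsigned-to-signed conversion assumes two's complement.) */
      static int64_t use_low_32_bits( uint64_t reg )
      {
          return (int64_t)(int32_t)(uint32_t)reg;
      }
      ```

      With garbage in the upper half, only the second function recovers the
      intended value, and it also handles negative values such as a negative
      stride.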
  14. 04 Feb, 2012 1 commit
  15. 22 Oct, 2011 2 commits
  16. 21 Sep, 2011 2 commits
      4:2:2 encoding support · 5b0cb86f
      Henrik Gramner authored
      SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9) · 3d82e875
      Loren Merritt authored
      i4x4 analysis cycles (per partition):
      penryn     sandybridge
      184-> 75   157-> 54     preset=superfast (sad)
      281->165   225->124     preset=faster    (satd with early termination)
      332->165   263->124     preset=medium
      379->165   297->124     preset=slower    (satd without early termination)
      
      This is the first code in x264 that intentionally produces different behavior
      on different cpus: satd_x9 is implemented only on ssse3+ and checks all intra
      directions, whereas the old code (on fast presets) may early terminate after
      checking only some of them. There is no systematic difference on slow presets,
      though they still occasionally disagree about tiebreaks.
      
      For ease of debugging, add an option "--cpu-independent" to disable satd_x9
      and any analogous future code.
  17. 24 Aug, 2011 1 commit
  18. 10 Jul, 2011 1 commit
  19. 12 May, 2011 3 commits
  20. 24 Mar, 2011 1 commit
  21. 25 Jan, 2011 1 commit
  22. 31 Oct, 2010 1 commit
  23. 18 Sep, 2010 1 commit
      Update source file headers · 213a99d0
      Fiona Glaser authored
      Update dates, improve file descriptions, make things more consistent.
      Also add information about commercial licensing.
  24. 15 Jul, 2010 1 commit
      Convert x264 to use NV12 pixel format internally · 387828ed
      Loren Merritt authored
      ~1% faster overall on Conroe, mostly due to improved cache locality.
      Also allows improved SIMD on some chroma functions (e.g. deblock).
      This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12.
      This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications.
      
      Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written.
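      
      For readers unfamiliar with the layout: NV12 stores chroma as a single
      plane of interleaved U,V pairs, whereas YV12 keeps U and V in separate
      planes. The sketch below shows the interleaving step for one row of
      chroma samples; the function name is hypothetical and this is not the
      conversion routine used by x264.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Interleave n U samples and n V samples into 2*n bytes of NV12-style
       * chroma: U0 V0 U1 V1 ... */
      static void interleave_chroma_sketch( uint8_t *dst_uv, const uint8_t *src_u,
                                            const uint8_t *src_v, size_t n )
      {
          for( size_t i = 0; i < n; i++ )
          {
              dst_uv[2*i]     = src_u[i];
              dst_uv[2*i + 1] = src_v[i];
          }
      }
      ```

      Because U and V for the same position end up adjacent in memory, code
      that processes both planes touches one contiguous region instead of two.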
  25. 02 Jun, 2010 1 commit
  26. 17 Nov, 2009 1 commit
      Faster weightp analysis · 63f71477
      Fiona Glaser authored
      Modify pixel_var slightly to return the necessary information and use it for weight analysis instead of sad/ssd.
      Various minor cosmetics.
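      
      One way a variance function can hand back "the necessary information" is
      to return both the pixel sum and the sum of squares, so the caller can
      derive mean and variance itself. The sketch below packs the two into one
      64-bit value; the packing layout and both function names are assumptions
      for illustration, not taken from this commit.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Accumulate sum and sum of squares over a 16x16 block of 8-bit pixels
       * and pack them as: low 32 bits = sum, high 32 bits = sum of squares. */
      static uint64_t var_16x16_sketch( const uint8_t *pix, ptrdiff_t stride )
      {
          uint32_t sum = 0, sqr = 0;
          for( int y = 0; y < 16; y++ )
          {
              for( int x = 0; x < 16; x++ )
              {
                  sum += pix[x];
                  sqr += (uint32_t)pix[x] * pix[x];
              }
              pix += stride;
          }
          return sum + ((uint64_t)sqr << 32);
      }

      /* Recover N * variance (N = 256 samples) from the packed value:
       * sqr - sum^2 / 256. */
      static uint32_t variance_from_packed( uint64_t packed )
      {
          uint32_t sum = (uint32_t)packed;
          uint32_t sqr = (uint32_t)(packed >> 32);
          return sqr - (uint32_t)( (uint64_t)sum * sum >> 8 );
      }
      ```

      The sum alone gives the block mean, which is the kind of statistic a
      weight estimate can be built from without a separate sad/ssd pass.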
  27. 03 Jul, 2009 1 commit
      Early termination for chroma encoding · 205a032c
      Fiona Glaser authored
      Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only.
      This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only.
      Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8.  mmx/sse2/ssse3 versions of each.
      Early termination is disabled at very low QPs because it is not useful there.
      Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2.
      Increase is greater with lower bitrates.
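      
      To illustrate why a DC-only block is cheap to handle: before scaling,
      the DC coefficient of the H.264 4x4 core transform is simply the sum of
      the 16 residual samples, so the DC terms of an 8x8 chroma block can be
      obtained without running the full transform. The sketch below computes
      only those four sums; it is not the dct_dc_8x8 function added by this
      commit, and its name and signature are hypothetical.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* For each of the four 4x4 sub-blocks of an 8x8 block, store the sum of
       * (source - prediction) in dc[0..3], ordered left to right, top to bottom. */
      static void dc_sums_8x8_sketch( const uint8_t *src, ptrdiff_t src_stride,
                                      const uint8_t *pred, ptrdiff_t pred_stride,
                                      int dc[4] )
      {
          for( int i = 0; i < 4; i++ )
              dc[i] = 0;
          for( int y = 0; y < 8; y++ )
              for( int x = 0; x < 8; x++ )
              {
                  int block = (y / 4) * 2 + (x / 4);
                  dc[block] += src[y * src_stride + x] - pred[y * pred_stride + x];
              }
      }
      ```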
  28. 31 Mar, 2009 1 commit
  29. 30 Mar, 2009 2 commits
  30. 31 Dec, 2008 1 commit
  31. 24 Dec, 2008 1 commit
      Optimize variance asm + minor changes · 9fe6e5e6
      Fiona Glaser authored
      Remove SAD argument from var; it is no longer needed.
      Speed up var asm a bit by eliminating psadbw and instead using HADDW at the end.
      Eliminate all remaining warnings on gcc 3.4 on cygwin.
      Port another minor optimization from lavc (pskip).
  32. 15 Sep, 2008 1 commit
  33. 05 Sep, 2008 2 commits
  34. 16 Aug, 2008 1 commit