1. 21 Jul, 2018 1 commit
  2. 17 Jan, 2018 1 commit
  3. 24 Dec, 2017 2 commits
    • Unify 8-bit and 10-bit CLI and libraries · 71ed44c7
      Vittorio Giovara authored and Anton Mitrofanov committed
      Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
      option to set the bit depth at runtime.
      
      Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
      incorrect value, it's preferable to induce a linking failure. If an application
      relies on this symbol, this will make it more obvious where the problem is.
      
      Add Makefile rules that compile modules with different bit depths. Assembly
      on x86 is prefixed with the 'private_prefix' define, while all other archs
      modify their function prefix internally.
      
      Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
      assembly, PowerPC assembly, and MIPS assembly.
      
      The depth and cache CLI filters heavily depend on bit depth size, so they
      need to be duplicated for each value. This means having to rename these
      filters, and adjust the callers to use the right version.
      
      Unfortunately the threaded input CLI module inherits a common.h dependency
      (input/frame -> common/threadpool -> common/frame -> common/common) which
      is extremely complicated to address in a sensible way. Instead duplicate
      the module and select the appropriate one at run time.
      
      Each bitdepth needs different checkasm compilation rules, so split the main
      checkasm target into two executables.
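The per-bit-depth compilation described in this commit can be pictured with token pasting. The following is only a minimal sketch of the idea, not x264's actual macros: `DEPTH_NAME`, `DEFINE_CLIP_PIXEL` and `clip_pixel` are hypothetical names, and the real build compiles the same sources once per bit depth through the Makefile rather than instantiating both copies in one file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: paste a bit-depth suffix onto a symbol name. */
#define PASTE3(a, b, c) a##b##c
#define DEPTH_NAME(name, depth) PASTE3(name, _, depth)

/* Instantiate one copy of a depth-dependent function per bit depth. */
#define DEFINE_CLIP_PIXEL(depth, pixel_t)                     \
    static pixel_t DEPTH_NAME(clip_pixel, depth)(int v)       \
    {                                                         \
        const int max = (1 << (depth)) - 1;                   \
        return (pixel_t)(v < 0 ? 0 : v > max ? max : v);      \
    }

DEFINE_CLIP_PIXEL(8, uint8_t)   /* defines clip_pixel_8  */
DEFINE_CLIP_PIXEL(10, uint16_t) /* defines clip_pixel_10 */

/* Run-time selection of the right copy, as a bit depth set at runtime
 * would require. */
static int clip_pixel(int bit_depth, int v)
{
    assert(bit_depth == 8 || bit_depth == 10);
    return bit_depth == 8 ? clip_pixel_8(v) : clip_pixel_10(v);
}
```

With this sketch, `clip_pixel(8, 300)` saturates at the 8-bit maximum while `clip_pixel(10, 300)` passes the value through, which is why depth-dependent code has to exist once per depth.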
    • Drop the x264 prefix from static functions and variables · 8f2437d3
      Vittorio Giovara authored and Anton Mitrofanov committed
  4. 21 May, 2017 3 commits
  5. 21 Jan, 2017 1 commit
  6. 12 Apr, 2016 2 commits
  7. 11 Apr, 2016 1 commit
    • Use the correct default B-ref placement with B-pyramid · fd2c3247
      Anton Mitrofanov authored
      The cost analyse functions expect the B-ref in a sequence of an even
      number of B-frames to be located towards the beginning, while the actual
      placement was towards the end.
      
      Change the placement to be consistent with the analyse expectations, e.g.
      PbbBbP -> PbBbbP.
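The placement change can be reproduced with a small index calculation. This is a sketch under assumptions, not the code from the commit: `bref_index` and `gop_pattern` are hypothetical helpers, and the sketch marks a B-ref in every run regardless of whether B-pyramid would use one for very short runs.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based index of the referenced B-frame within a run of `bframes`
 * consecutive B-frames. For even run lengths there are two middle
 * positions; `towards_start` selects the earlier one. */
static int bref_index(int bframes, int towards_start)
{
    assert(bframes >= 1);
    return towards_start ? (bframes - 1) / 2 : bframes / 2;
}

/* Write a pattern such as "PbBbbP" into out.
 * out must have room for bframes + 3 bytes. */
static void gop_pattern(int bframes, int towards_start, char *out)
{
    int bref = bref_index(bframes, towards_start);
    size_t n = 0;
    out[n++] = 'P';
    for (int i = 0; i < bframes; i++)
        out[n++] = (i == bref) ? 'B' : 'b';
    out[n++] = 'P';
    out[n] = '\0';
}
```

For four B-frames, `towards_start = 0` gives the old pattern PbbBbP and `towards_start = 1` gives the new PbBbbP. For odd run lengths both choices pick the same middle frame.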
  8. 16 Jan, 2016 1 commit
  9. 11 Oct, 2015 1 commit
  10. 25 Jul, 2015 1 commit
  11. 16 Jul, 2015 1 commit
  12. 23 Feb, 2015 1 commit
  13. 03 Sep, 2014 1 commit
  14. 13 Mar, 2014 1 commit
    • Macroblock tree overhaul/optimization · b3fb7184
      Fiona Glaser authored
      Move the second core part of macroblock tree into an assembly function;
      SIMD-optimize roughly half of it (for x86). Roughly 25-65% faster mbtree,
      depending on content.
      
      Slightly change how mbtree handles the tradeoff between range and precision
      for propagation.
      
      Overall a slight (but mostly negligible) effect on SSIM and ~2% faster.
  15. 12 Mar, 2014 1 commit
    • x86: Minor mbtree_propagate_cost improvements · f032147c
      Henrik Gramner authored
      Reduce the number of registers used from 7 to 6.
      Reduce the number of vector registers used by the AVX2 implementation from 8 to 7.
      Multiply fps_factor by 1/256 once per frame instead of once per macroblock row.
      Use mova instead of movu for dst since it's guaranteed to be aligned.
      Some cosmetics.
  16. 08 Jan, 2014 1 commit
  17. 23 Apr, 2013 2 commits
    • OpenCL lookahead · f49a1b2e
      Steve Borho authored
      OpenCL support is compiled in by default, but must be enabled at runtime by an
      --opencl command line flag. Compiling OpenCL support requires perl. To avoid
      the perl requirement use: configure --disable-opencl.
      
      When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU
      device.  Lowres intra cost prediction, lowres motion search (including subpel)
      and bidir cost predictions are all done on the GPU.  MB-tree and final slice
      decisions are still done by the CPU.  Presets which do not use a threaded
      lookahead will not use OpenCL at all (superfast, ultrafast).
      
      Because of data dependencies, the GPU must use an iterative motion search which
      performs more total work than the CPU would do, so this is not work efficient
      or power efficient. But if there are spare GPU cycles, it can often speed up
      the encode. Output quality with OpenCL lookahead enabled is often very
      slightly worse than with the CPU lookahead (because of the same data
      dependencies).
      
      x264 must compile its OpenCL kernels for your device before running them, and in
      order to avoid doing this every run it caches the compiled kernel binary in a
      file named x264_lookahead.clbin (--opencl-clbin FNAME to override).  The cache
      file will be ignored if the device, driver, or OpenCL source are changed.
      
      x264 will use the first GPU device which supports the cl_image features
      required by its kernels. Most modern discrete GPUs and all AMD
      integrated GPUs will work.  Intel integrated GPUs (up to IvyBridge) do not
      support those necessary features. Use --opencl-device N to specify a number of
      capable GPUs to skip during device detection.
      
      Switchable graphics environments (e.g. AMD Enduro) are currently not supported,
      as some have bugs in their OpenCL drivers that cause output to be silently
      incorrect.
      
      Developed by MulticoreWare with support from AMD and Telestream.
    • weightp: improve scale/offset search, chroma · 2d0c47a5
      Fiona Glaser authored
      Rescale the scale factor if the offset clips. This makes weightp more effective
      in fades to/from white (and any other situation that requires big offsets).
      
      Search more than 1 scale factor and more than 1 offset, depending on --subme.
      
      Try to find the optimal chroma denominator instead of hardcoding it.
      
      Overall improvement: a few percent in fade-heavy clips, such as a sample from
      Avatar: TLA.
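The "rescale the scale factor if the offset clips" step can be illustrated as follows. This is a minimal sketch, not the commit's code: `weight_t`, `clip3` and `fit_weight` are hypothetical names, and the offset range [-128, 127] and scale range [0, 127] are assumed 8-bit limits. The model is that a prediction is roughly `ref * scale / 2^denom + offset`.

```c
#include <assert.h>

typedef struct { int scale; int offset; } weight_t; /* hypothetical */

static int clip3(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Pick scale/offset so that ref_mean * scale / 2^denom + offset is close
 * to fenc_mean. guess_scale is an initial scale in 2^denom fixed point. */
static weight_t fit_weight(float fenc_mean, float ref_mean,
                           int guess_scale, int denom)
{
    assert(ref_mean > 0.0f && denom >= 0 && denom <= 7);
    weight_t w;
    float one = (float)(1 << denom);
    float want = fenc_mean - ref_mean * guess_scale / one;
    int unclipped = (int)(want + (want < 0 ? -0.5f : 0.5f));

    w.offset = clip3(unclipped, -128, 127); /* assumed 8-bit offset range */
    w.scale = guess_scale;
    if (w.offset != unclipped) {
        /* The offset saturated: let the scale absorb what is left. */
        int rescaled = (int)(one * (fenc_mean - w.offset) / ref_mean + 0.5f);
        w.scale = clip3(rescaled, 0, 127);  /* assumed scale range */
    }
    return w;
}
```

For a fade towards white with `ref_mean = 100`, `fenc_mean = 250`, `denom = 6` and an initial scale of 64 (1.0), the wanted offset of 150 clips to 127, so the scale is raised to 79 to cover the remaining difference.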
  18. 26 Feb, 2013 1 commit
  19. 25 Feb, 2013 2 commits
  20. 09 Jan, 2013 1 commit
  21. 08 Jan, 2013 1 commit
  22. 07 Nov, 2012 1 commit
  23. 18 May, 2012 1 commit
    • Threaded lookahead · df700eae
      Fiona Glaser authored
      Split each lookahead frame analysis call into multiple threads.  Has a small
      impact on quality, but does not seem to be consistently any worse.
      
      This helps alleviate bottlenecks with many cores and frame threads. In many
      cases, this massively increases performance on many-core systems.  For example,
      over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system.
      Realtime 1080p30 at --preset slow should now be feasible on real systems.
      
      For sliced-threads, this patch should be faster regardless of settings (~10%).
      
      By default, lookahead threads are 1/6 of regular threads.  This isn't exacting,
      but it seems to work well for all presets on real systems.  With sliced-threads,
      it's the same as the number of encoding threads.
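The default described in the last paragraph of this commit can be written as a one-line rule. This is a sketch only: `default_lookahead_threads` is a hypothetical name, and the lower bound of one thread is an assumption, since the commit message only states the 1/6 ratio and the sliced-threads case.

```c
#include <assert.h>

/* Default number of lookahead threads for a given number of encoding
 * threads. With sliced threads, the lookahead uses the same count. */
static int default_lookahead_threads(int threads, int sliced_threads)
{
    assert(threads >= 1);
    if (sliced_threads)
        return threads;
    int n = threads / 6;
    return n < 1 ? 1 : n; /* assumption: never fewer than one thread */
}
```

Under these assumptions, 12 frame threads yield 2 lookahead threads, while 8 sliced threads yield 8.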
  24. 23 Apr, 2012 1 commit
  25. 27 Mar, 2012 1 commit
  26. 07 Mar, 2012 1 commit
    • Add a small per-MB cost penalty for lowres · 48e8e52e
      Anton Mitrofanov authored
      Helps avoid VBV predictors going nuts with very low-cost MBs.
      One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but (before this patch), no cost penalty gets factored in for this, because anything times zero is zero.
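The zero-cost case can be shown with a few lines of arithmetic. This is a sketch under assumptions: `MB_PENALTY`, its value, and the 8.8 fixed-point scaling in `aq_adjusted_cost` are hypothetical and only illustrate why a multiplicative adjustment needs a non-zero base cost.

```c
#include <assert.h>

enum { MB_PENALTY = 4 }; /* hypothetical value, for illustration only */

/* Scale a lowres macroblock cost by an adaptive-quantization factor.
 * aq_scale_q8 is in 8.8 fixed point (256 == 1.0). */
static int aq_adjusted_cost(int mb_cost, int aq_scale_q8, int use_penalty)
{
    assert(mb_cost >= 0 && aq_scale_q8 >= 0);
    if (use_penalty)
        mb_cost += MB_PENALTY; /* keeps zero-cost MBs from staying at zero */
    return (mb_cost * aq_scale_q8 + 128) >> 8;
}
```

Without the penalty, a zero-cost macroblock stays at zero however large the factor is. With it, the same macroblock scaled by 2.0 (512) contributes a cost of 8, so the QP change becomes visible to whatever consumes the cost.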
  27. 06 Mar, 2012 1 commit
    • Fix incorrect zero-extension assumptions in x86_64 asm · 3131a19c
      Henrik Gramner authored
      Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero.
      This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI.
      As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations.
      Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary.
      Also add a checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
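The difference between the two widening behaviours can be seen in plain C. This is a minimal sketch with hypothetical helper names. It only shows the negative-value case, whereas the ABI issue described above is broader: the upper 32 bits of a register holding an `int` are simply unspecified.

```c
#include <assert.h>
#include <stdint.h>

/* What the old asm effectively assumed: the upper 32 bits are zero. */
static int64_t widen_zero_extend(int32_t v)
{
    return (int64_t)(uint32_t)v;
}

/* What a signed 32-bit value needs when used as a 64-bit quantity
 * (the effect of movsxd on x86_64). */
static int64_t widen_sign_extend(int32_t v)
{
    return (int64_t)v;
}

/* Taking the stride as intptr_t makes the caller's compiler do the
 * widening, so the callee can use the full register directly. */
static const uint8_t *row_ptr(const uint8_t *base, intptr_t stride, int row)
{
    assert(base);
    return base + stride * row;
}
```

For `v = -1`, zero extension yields 4294967295 while sign extension yields -1, which is why a negative stride read as a 64-bit register without proper extension would address the wrong memory.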
  28. 04 Feb, 2012 1 commit
  29. 11 Nov, 2011 1 commit
  30. 22 Oct, 2011 1 commit
  31. 15 Oct, 2011 1 commit
  32. 21 Sep, 2011 1 commit
  33. 24 Aug, 2011 2 commits