1. 24 Dec, 2017 2 commits
    • Cosmetics · b00bcafe
      Anton Mitrofanov authored
    • Unify 8-bit and 10-bit CLI and libraries · 71ed44c7
      Vittorio Giovara authored
      Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
      option to set the bit depth at runtime.
      
      Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
      incorrect value, it's preferable to induce a linking failure: if an application
      relies on this symbol, the failure makes it more obvious where the problem is.
      
      Add Makefile rules that compile modules with different bit depths. Assembly
      on x86 is prefixed with the 'private_prefix' define, while all other archs
      modify their function prefix internally.
      
      Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
      assembly, PowerPC assembly, and MIPS assembly.
      
      The depth and cache CLI filters heavily depend on bit depth size, so they
      need to be duplicated for each value. This means having to rename these
      filters, and adjust the callers to use the right version.
      
      Unfortunately the threaded input CLI module inherits a common.h dependency
      (input/frame -> common/threadpool -> common/frame -> common/common) which
      is extremely complicated to address in a sensible way. Instead duplicate
      the module and select the appropriate one at run time.
      
      Each bitdepth needs different checkasm compilation rules, so split the main
      checkasm target into two executables.
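      As an illustration of the templating idea only: the sketch below stamps out one function body per pixel type and picks the variant from a bit depth known at run time. The names pixel_avg_8, pixel_avg_10 and pixel_avg_dispatch are hypothetical, and token pasting inside a single file stands in for the per-depth Makefile builds described above; it is not how x264's sources are actually organized.

```c
#include <stdint.h>

/* Hypothetical "template": one function body instantiated per pixel type.
 * Token pasting here stands in for compiling a module once per bit depth. */
#define DEFINE_PIXEL_AVG(depth, pixel_t)                                  \
    static void pixel_avg_##depth(pixel_t *dst, const pixel_t *a,        \
                                  const pixel_t *b, int n)               \
    {                                                                    \
        for (int i = 0; i < n; i++)                                      \
            dst[i] = (pixel_t)((a[i] + b[i] + 1) >> 1);                  \
    }

DEFINE_PIXEL_AVG(8, uint8_t)    /* defines pixel_avg_8  */
DEFINE_PIXEL_AVG(10, uint16_t)  /* defines pixel_avg_10 */

/* Hypothetical runtime dispatch on a bit depth chosen at run time.
 * Buffers are passed as void* so one call site serves both pixel sizes.
 * Returns 0 on success, -1 for an unsupported depth. */
int pixel_avg_dispatch(int bitdepth, void *dst, const void *a,
                       const void *b, int n)
{
    switch (bitdepth) {
    case 8:  pixel_avg_8(dst, a, b, n);  return 0;
    case 10: pixel_avg_10(dst, a, b, n); return 0;
    default: return -1;
    }
}
```

      Keeping a single dispatch point means callers only need the bit depth value, which mirrors the intent of selecting behavior from 'i_bitdepth' rather than from a compile-time constant.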
  2. 21 May, 2017 6 commits
  3. 21 Jan, 2017 1 commit
  4. 12 Apr, 2016 1 commit
    • x86: SSE2/AVX idct_dequant_2x4_(dc|dconly) · 23d1d8e8
      Henrik Gramner authored
      Only used in 4:2:2. Both 8-bit and high bit-depth implemented.
      
      Approximate performance improvement compared to C on Ivy Bridge:
      
                               x86-32  x86-64
      idct_dequant_2x4_dc      2.1x    1.7x
      idct_dequant_2x4_dconly  2.7x    2.0x
      
      Helps more on 32-bit due to the C versions being register starved.
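      For background on the transform these functions fuse with dequantization: in 4:2:2 the chroma DC coefficients form a 2-wide by 4-tall array, transformed with a 4-point Hadamard vertically and a 2-point Hadamard horizontally. The C sketch below shows only that transform, with dequantization, clipping and the dc/dconly split omitted. The matrix is assumed to be the symmetric 4-point Hadamard used for 4:2:2 chroma DC in H.264, and the function name is hypothetical.

```c
/* Symmetric 4-point Hadamard matrix, assumed to match the H.264 4:2:2
 * chroma DC transform. */
static const int H4[4][4] = {
    { 1,  1,  1,  1 },
    { 1,  1, -1, -1 },
    { 1, -1, -1,  1 },
    { 1, -1,  1, -1 },
};

/* in/out: 8 DC values, 4 rows x 2 columns, row-major (index 2*row + col).
 * Computes out = H4 * in * H2 with H2 = {{1,1},{1,-1}}; no scaling. */
static void chroma422_dc_hadamard(const int in[8], int out[8])
{
    int tmp[8];
    for (int col = 0; col < 2; col++)        /* vertical 4-point pass */
        for (int row = 0; row < 4; row++) {
            int s = 0;
            for (int k = 0; k < 4; k++)
                s += H4[row][k] * in[2 * k + col];
            tmp[2 * row + col] = s;
        }
    for (int row = 0; row < 4; row++) {      /* horizontal 2-point pass */
        int a = tmp[2 * row], b = tmp[2 * row + 1];
        out[2 * row]     = a + b;
        out[2 * row + 1] = a - b;
    }
}
```

      Because both matrices are symmetric Hadamard matrices (H4·H4 = 4I, H2·H2 = 2I), applying the transform twice returns the input scaled by 8, which makes a convenient self-check.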
  5. 16 Jan, 2016 1 commit
  6. 23 Feb, 2015 1 commit
  7. 16 Sep, 2014 1 commit
  8. 26 Aug, 2014 1 commit
  9. 08 Jan, 2014 1 commit
  10. 20 May, 2013 1 commit
  11. 23 Apr, 2013 2 commits
  12. 26 Feb, 2013 1 commit
    • quant_4x4x4: quant one 8x8 block at a time · 993c81e9
      Fiona Glaser authored
      This reduces overhead and lets us use less branchy code for zigzag, dequant,
      decimate, and so on.
      Reorganize and optimize a lot of macroblock_encode using this new function.
      ~1-2% faster overall.
      
      Includes NEON and x86 versions of the new function.
      Using larger merged functions like this will also make wider SIMD, like
      AVX2, more effective.
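      The commit does not show the new function, so the following is only a scalar C sketch of the idea in its title: quantize the four 4x4 blocks of one 8x8 region in a single call and return a bitmask with bit i set when block i keeps a nonzero level. The deadzone formula ((|coef| + bias) * mf) >> 16, the mask convention, and all names and types are assumptions made for illustration.

```c
#include <stdint.h>

typedef int32_t coef_t;  /* hypothetical coefficient type */

/* Quantize one 4x4 block in place (assumed deadzone formula);
 * returns 1 if any quantized level is nonzero, else 0. */
static int quant_4x4_sketch(coef_t dct[16], const uint16_t mf[16],
                            const uint16_t bias[16])
{
    int nz = 0;
    for (int i = 0; i < 16; i++) {
        int64_t c = dct[i];
        int64_t mag = c > 0 ? c : -c;
        int64_t level = ((mag + bias[i]) * mf[i]) >> 16;
        dct[i] = (coef_t)(c > 0 ? level : -level);
        nz |= (level != 0);
    }
    return nz;
}

/* Quantize the four 4x4 blocks of one 8x8 region; bit i of the result
 * is set when block i still has a nonzero coefficient. */
static int quant_4x4x4_sketch(coef_t dct[4][16], const uint16_t mf[16],
                              const uint16_t bias[16])
{
    int mask = 0;
    for (int i = 0; i < 4; i++)
        mask |= quant_4x4_sketch(dct[i], mf, bias) << i;
    return mask;
}
```

      A caller can then branch once per 8x8 on the mask, for example skipping dequant and zigzag for blocks whose bit is clear, instead of testing each 4x4 separately.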
  13. 09 Jan, 2013 1 commit
  14. 04 Feb, 2012 1 commit
  15. 15 Jan, 2012 1 commit
    • CABAC trellis opts part 4: x86_64 asm · 7d804baf
      Loren Merritt authored
      Another 20% faster.
      18k->12k codesize.
      
      This patch series may have a large impact on encoding speed.
      For example, 24% faster at --preset slower --crf 23 with 720p parkjoy.
      Overall speed increase is proportional to the cost of trellis (which is proportional to bitrate, and much higher with --trellis 2).
  16. 22 Oct, 2011 2 commits
  17. 21 Sep, 2011 1 commit
  18. 09 Aug, 2011 1 commit
  19. 24 Mar, 2011 1 commit
  20. 18 Feb, 2011 1 commit
  21. 25 Jan, 2011 2 commits
    • Initial AVX support · 68cda11b
      Fiona Glaser authored
      Automatically handle 3-operand instructions and abstraction between SSE and AVX.
      Implement one function with this (denoise_dct) as an initial test.
      x264 can't make much use of the 256-bit support of AVX (as it's float-only), but 3-operand could give some small benefits.
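      For reference, denoise_dct is a DCT-domain noise reducer. A minimal scalar C sketch of that kind of operation, assuming it accumulates each coefficient's magnitude into a per-position sum and then shrinks the magnitude toward zero by a per-position offset; the names and types here are hypothetical, and this is not the assembly the commit adds.

```c
#include <stdint.h>

typedef int16_t dctcoef_t;  /* hypothetical coefficient type */

static void denoise_dct_sketch(dctcoef_t *dct, uint32_t *sum,
                               const uint16_t *offset, int size)
{
    for (int i = 0; i < size; i++) {
        int level = dct[i];
        int sign = level >> 31;          /* 0 or -1; assumes arithmetic shift */
        level = (level + sign) ^ sign;   /* branch-free |level| */
        sum[i] += level;                 /* statistics for adapting offsets */
        level -= offset[i];              /* shrink magnitude toward zero */
        dct[i] = (dctcoef_t)(level < 0 ? 0 : (level ^ sign) - sign);
    }
}
```

      The branch-free abs/sign handling is the sort of pattern that maps onto packed SIMD instructions, where a 3-operand form can save register copies.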
    • Bump dates to 2011 · ee9bc136
      Sean McGovern authored
  22. 14 Dec, 2010 1 commit
  23. 19 Nov, 2010 2 commits
  24. 18 Sep, 2010 1 commit
    • Update source file headers · 213a99d0
      Fiona Glaser authored
      Update dates, improve file descriptions, make things more consistent.
      Also add information about commercial licensing.
  25. 26 May, 2010 1 commit
    • Detect Atom CPU, enable appropriate asm functions · 57729402
      Fiona Glaser authored
      I'm not going to actually optimize for this pile of garbage unless someone pays me.
      But it can't hurt to at least enable the correct functions based on benchmarks.
      
      Also save some cache on Intel CPUs that don't need the decimate LUT due to having fast bsr/bsf.
  26. 12 Oct, 2009 1 commit
    • change all dct arrays to 1d. · 1fbba0ca
      Loren Merritt authored
      the C standard doesn't allow you to iterate 1-dimensionally over 2d arrays, and nothing other than the dsp functions themselves cares about the 2dness of dct.
      this fixes a miscompilation in x264_mb_optimize_chroma_dc.
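      To make the 2-D versus 1-D point concrete: given int16_t dct[4][4], walking a pointer from &dct[0][0] past index 3 leaves the first row array, which the C standard treats as undefined behavior even though the memory is contiguous, so an optimizer is free to assume it never happens. Storing the block as a flat 16-element array makes such loops well-defined. A minimal sketch, with a hypothetical row-major layout (index 4*y + x) and hypothetical helper names:

```c
#include <stdint.h>

typedef int16_t coef_t;  /* hypothetical coefficient type */

/* 2-D access on a 1-D block: element (x, y) at index 4*y + x (assumed layout). */
static inline coef_t coef_at(const coef_t dct[16], int x, int y)
{
    return dct[4 * y + x];
}

/* Flat iteration over all 16 entries: well-defined, since dct really is
 * a single 16-element array rather than four 4-element rows. */
static int sum_abs_coefs(const coef_t dct[16])
{
    int s = 0;
    for (int i = 0; i < 16; i++)
        s += dct[i] < 0 ? -dct[i] : dct[i];
    return s;
}
```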
  27. 30 Jan, 2009 1 commit
    • Massive overhaul of nnz/cbp calculation · e394bd60
      Fiona Glaser authored
      Modify quantization to also calculate array_non_zero.
      PPC assembly changes by gpoirior.
      New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero.
      Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
      Also add new i16x16 DC-only iDCT with asm.
      Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
      Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
      Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25.
      Overall performance increase 0-6% depending on encoding settings.
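      Why a DC-only inverse transform is worth a dedicated function: in the H.264 4x4 inverse core transform, a block whose only nonzero coefficient is the DC produces the same value at every position, so the whole transform collapses to one rounding step. The C sketch below compares the full transform with that shortcut. The function names are hypothetical, and it assumes >> on negative ints is an arithmetic shift, as on mainstream compilers.

```c
/* H.264-style 4x4 inverse core transform (rows, then columns) with the
 * final (x + 32) >> 6 rounding; d and res are row-major 4x4 arrays. */
static void idct4x4_residual(const int d[16], int res[16])
{
    int tmp[16];
    for (int i = 0; i < 4; i++) {            /* horizontal pass on row i */
        const int *r = d + 4 * i;
        int a = r[0] + r[2], b = r[0] - r[2];
        int c = (r[1] >> 1) - r[3], e = r[1] + (r[3] >> 1);
        tmp[4 * i + 0] = a + e; tmp[4 * i + 1] = b + c;
        tmp[4 * i + 2] = b - c; tmp[4 * i + 3] = a - e;
    }
    for (int j = 0; j < 4; j++) {            /* vertical pass on column j */
        int a = tmp[j] + tmp[8 + j], b = tmp[j] - tmp[8 + j];
        int c = (tmp[4 + j] >> 1) - tmp[12 + j];
        int e = tmp[4 + j] + (tmp[12 + j] >> 1);
        res[j]      = (a + e + 32) >> 6;
        res[4 + j]  = (b + c + 32) >> 6;
        res[8 + j]  = (b - c + 32) >> 6;
        res[12 + j] = (a - e + 32) >> 6;
    }
}

/* DC-only shortcut: with only d[0] nonzero, every residual is the same. */
static int idct4x4_dc_only(int dc)
{
    return (dc + 32) >> 6;
}
```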
  28. 31 Dec, 2008 1 commit
  29. 30 Dec, 2008 1 commit
  30. 28 Nov, 2008 1 commit
    • Significantly faster CABAC and CAVLC residual coding and bit cost calculation · c1d73389
      Fiona Glaser authored
      Early-terminate in residual writing using stored nnz counts
      To allow the above, store nnz counts for luma and chroma DC
      Add assembly functions to find the last nonzero coefficient in a block
      Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC
      Note that this changes output slightly with CABAC RDO, because it requires always storing correct nnz values during RDO, which previously wasn't done in cases where it wasn't useful.
      CAVLC output should be equivalent.
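      A sketch of finding the last nonzero coefficient without a scalar backward scan: build a bitmask of nonzero positions and take the index of its highest set bit. It relies on the GCC/Clang builtins __builtin_clz and __builtin_popcount; the function names are hypothetical and this is not x264's assembly.

```c
#include <stdint.h>

/* Bit i set when coefficient i of a 16-entry block is nonzero. */
static uint32_t nonzero_mask16(const int16_t *dct)
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (uint32_t)(dct[i] != 0) << i;
    return mask;
}

/* Index of the last nonzero coefficient, or -1 for an all-zero block
 * (__builtin_clz is undefined for 0, hence the check). */
static int coeff_last16_sketch(const int16_t *dct)
{
    uint32_t mask = nonzero_mask16(dct);
    return mask ? 31 - __builtin_clz(mask) : -1;
}

/* Number of nonzero coefficients in the block. */
static int coeff_count16_sketch(const int16_t *dct)
{
    return __builtin_popcount(nonzero_mask16(dct));
}
```

      Residual coding can then stop at the returned index instead of scanning trailing zeros, and the popcount gives the block's nonzero count.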