1. 06 Mar, 2019 1 commit
  2. 17 Jan, 2018 1 commit
  3. 24 Dec, 2017 5 commits
      aarch64: Use ldurb/sturb for loads/stores with negative offsets · 99ca611d
      Martin Storsjö authored
      The assembler (both gas and clang/llvm) automatically fixes this;
      armasm64 doesn't. We can fix it in gas-preprocessor, but we should
      also be using the right instruction form.
      99ca611d
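To illustrate why the instruction form matters, here is a minimal sketch of the immediate-offset ranges involved. It assumes a plain `[base, #offset]` byte access without writeback; the enum and function names are hypothetical and not part of x264 or gas-preprocessor. The ranges used are the architectural ones: `ldrb`/`strb` take an unsigned 12-bit immediate, while `ldurb`/`sturb` take a signed 9-bit unscaled immediate.

```c
#include <stddef.h>

/* Hypothetical labels for the two immediate-offset encodings of a byte load. */
typedef enum
{
    SKETCH_FORM_LDRB,   /* unsigned 12-bit immediate: 0..4095 */
    SKETCH_FORM_LDURB,  /* signed 9-bit unscaled immediate: -256..255 */
    SKETCH_FORM_NONE    /* offset needs a register or a separate add/sub */
} sketch_form_t;

/* Pick the encoding an assembler would need for "load byte at [base, #offset]". */
static sketch_form_t sketch_byte_load_form( int offset )
{
    if( offset >= 0 && offset <= 4095 )
        return SKETCH_FORM_LDRB;
    if( offset >= -256 && offset < 0 )
        return SKETCH_FORM_LDURB;
    return SKETCH_FORM_NONE;
}
```

An assembler that does not substitute the unscaled form on its own would reject `ldrb` with a negative offset, which is the case the commit addresses by writing `ldurb`/`sturb` explicitly.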
      aarch64: Don't .set a symbol named st2 · 12ca9a69
      Martin Storsjö authored
      This confuses gas-preprocessor, which tries to replace actual
      st2 instructions by the integer 1 or 2.
      12ca9a69
      Unify 8-bit and 10-bit CLI and libraries · 71ed44c7
      Vittorio Giovara authored
      Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
      option to set the bit depth at runtime.
      
      Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
      incorrect value, it's preferable to induce a linking failure. If an application
      relies on this symbol, this will make it more obvious where the problem is.
      
      Add Makefile rules that compile modules with different bit depths. Assembly
      on x86 is prefixed with the 'private_prefix' define, while all other archs
      modify their function prefix internally.
      
      Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
      assembly, PowerPC assembly, and MIPS assembly.
      
      The depth and cache CLI filters heavily depend on bit depth size, so they
      need to be duplicated for each value. This means having to rename these
      filters, and adjust the callers to use the right version.
      
      Unfortunately the threaded input CLI module inherits a common.h dependency
      (input/frame -> common/threadpool -> common/frame -> common/common) which
      is extremely complicated to address in a sensible way. Instead, duplicate
      the module and select the appropriate one at run time.
      
      Each bitdepth needs different checkasm compilation rules, so split the main
      checkasm target into two executables.
      71ed44c7
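The idea of building the same code once per bit depth and choosing between the copies at run time can be sketched as follows. This is only an illustration under stated assumptions: the `sketch_` names, the macro, and the struct are hypothetical, and the real build compiles whole modules per depth via Makefile rules rather than expanding a macro. Only the `i_bitdepth` field name is taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical template: one function body, instantiated once per bit depth
 * with a depth-specific name prefix. */
#define SKETCH_DEFINE_PIXEL_MAX( depth )              \
    static int sketch_##depth##_pixel_max( void )     \
    {                                                 \
        return (1 << (depth)) - 1;                    \
    }

SKETCH_DEFINE_PIXEL_MAX( 8 )   /* defines sketch_8_pixel_max()  */
SKETCH_DEFINE_PIXEL_MAX( 10 )  /* defines sketch_10_pixel_max() */

/* Stand-in for a parameter struct carrying the requested depth. */
typedef struct
{
    int i_bitdepth;
} sketch_param_t;

typedef int (*sketch_pixel_max_fn)( void );

/* Run-time selection of the per-depth copy; NULL for unsupported depths. */
static sketch_pixel_max_fn sketch_select_pixel_max( const sketch_param_t *param )
{
    switch( param->i_bitdepth )
    {
        case 8:  return sketch_8_pixel_max;
        case 10: return sketch_10_pixel_max;
        default: return NULL;
    }
}
```

A caller would fill in `i_bitdepth`, fetch the matching function pointer once, and use it from then on, so no code path depends on a compile-time or global depth value.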
  4. 14 Jun, 2017 1 commit
      aarch64: Update the var2 functions to the new signature · 98e9543b
      Martin Storsjö authored
      The existing functions could easily be used by just calling them
      twice - this would give the following cycle numbers from checkasm:
      
      var2_8x8_c:      4110
      var2_8x8_neon:   1505
      var2_8x16_c:     8019
      var2_8x16_neon:  2545
      
      However, by merging both passes into the same function, we get the
      following speedup:
      var2_8x8_neon:   1205
      var2_8x16_neon:  2327
      98e9543b
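The merged approach can be sketched in C as a single loop that handles both chroma planes. This is a sketch under assumptions, not the actual x264 code: the signature, the stride constants, and the layout (V stored half a stride after U in the same row) are illustrative, and all `sketch_`/`SKETCH_` names are hypothetical.

```c
#include <stdint.h>

/* Illustrative strides; U occupies the first half of each row, V the second. */
#define SKETCH_FENC_STRIDE 16
#define SKETCH_FDEC_STRIDE 32

/* Compute the variance-style score of the U and V residuals in one pass.
 * ssd[0]/ssd[1] receive the per-plane sums of squared differences. */
static int sketch_var2_8xh( const uint8_t *fenc, const uint8_t *fdec,
                            int height, int ssd[2] )
{
    int sum_u = 0, sum_v = 0, sqr_u = 0, sqr_v = 0;
    for( int y = 0; y < height; y++ )
    {
        for( int x = 0; x < 8; x++ )
        {
            int diff_u = fenc[x] - fdec[x];
            int diff_v = fenc[x + SKETCH_FENC_STRIDE/2] - fdec[x + SKETCH_FDEC_STRIDE/2];
            sum_u += diff_u;
            sum_v += diff_v;
            sqr_u += diff_u * diff_u;
            sqr_v += diff_v * diff_v;
        }
        fenc += SKETCH_FENC_STRIDE;
        fdec += SKETCH_FDEC_STRIDE;
    }
    ssd[0] = sqr_u;
    ssd[1] = sqr_v;

    int n = 8 * height; /* samples per plane */
    int var_u = sqr_u - (int)( (int64_t)sum_u * sum_u / n );
    int var_v = sqr_v - (int)( (int64_t)sum_v * sum_v / n );
    return var_u + var_v;
}
```

Handling both planes in one call shares the loop overhead and row pointer updates, which is the kind of saving the cycle counts above reflect for the NEON version.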
  5. 21 Jan, 2017 2 commits
  6. 01 Dec, 2016 1 commit
      Cosmetics · b2b39dae
      Anton Mitrofanov authored
      Also make x264_weighted_reference_duplicate() static.
      b2b39dae
  7. 21 Nov, 2016 1 commit
  8. 20 Sep, 2016 1 commit
  9. 17 Sep, 2016 1 commit
  10. 13 Jun, 2016 1 commit
      aarch64: Add asm for mbtree fixed point conversion · b6f189eb
      Janne Grunau authored
      pack is ~7 times faster and unpack is ~9 times faster on a Cortex-A53
      compared to gcc-5.3.
      
      mbtree_fix8_pack_c: 41534
      mbtree_fix8_pack_neon: 5766
      mbtree_fix8_unpack_c: 44102
      mbtree_fix8_unpack_neon: 4868
      b6f189eb
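For reference, the scalar operation being accelerated can be sketched as below. This assumes an 8.8 fixed-point format (8 fractional bits in a 16-bit container), as the "fix8" name suggests; any byte-order handling is left out, inputs are assumed to fit the 16-bit range, and the `sketch_` names are hypothetical.

```c
#include <stdint.h>

/* Pack floats into 16-bit values with 8 fractional bits.
 * Assumes src[i] * 256 fits in int16_t; out-of-range input is not handled. */
static void sketch_fix8_pack( uint16_t *dst, const float *src, int count )
{
    for( int i = 0; i < count; i++ )
        dst[i] = (uint16_t)(int16_t)( src[i] * 256.0f );
}

/* Unpack 16-bit fixed-point values back to floats.
 * The uint16_t is reinterpreted as signed before scaling. */
static void sketch_fix8_unpack( float *dst, const uint16_t *src, int count )
{
    for( int i = 0; i < count; i++ )
        dst[i] = (int16_t)src[i] * ( 1.0f / 256.0f );
}
```

Each element is independent, so the loop maps directly onto SIMD convert-and-scale instructions, which fits the large speedups reported above.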
  11. 16 Jan, 2016 1 commit
  12. 11 Oct, 2015 9 commits
  13. 03 Sep, 2015 1 commit
  14. 27 Aug, 2015 1 commit
      aarch64: Fix integral_init4/8h_neon · 5c4728d8
      Martin Storsjö authored
      The stride is the number of uint16_t elements and thus needs
      to be shifted.
      
      This issue had slipped unnoticed since checkasm didn't actually
      verify the output of these functions.
      5c4728d8
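The stride issue can be made concrete with a C sketch that does the address arithmetic in bytes, the way assembly does. The function shape is an assumption modelled on a horizontal 4-wide integral sum, and the `sketch_` name is hypothetical; the point is only that a stride given in `uint16_t` elements must be shifted left by 1 before it is used as a byte offset.

```c
#include <stdint.h>
#include <stddef.h>

/* Add a running 4-pixel horizontal sum to the row above.
 * stride is the row length in uint16_t elements, not bytes. */
static void sketch_integral_init4h( uint16_t *sum, const uint8_t *pix, ptrdiff_t stride )
{
    /* Byte-addressed code must scale the element stride by sizeof(uint16_t). */
    ptrdiff_t stride_bytes = stride << 1;
    const uint16_t *above = (const uint16_t *)( (const char *)sum - stride_bytes );

    int v = pix[0] + pix[1] + pix[2] + pix[3];
    for( ptrdiff_t x = 0; x < stride - 4; x++ )
    {
        sum[x] = (uint16_t)( v + above[x] );
        v += pix[x + 4] - pix[x]; /* slide the 4-pixel window by one */
    }
}
```

Using the unshifted stride as a byte offset would read from the middle of the previous row instead of the matching column, which only an output-verifying test would catch.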
  15. 23 Feb, 2015 1 commit
  16. 16 Dec, 2014 12 commits