1. 19 May, 2019 3 commits
    • ci: Add full testdata tests on aarch64 · a690e548
      Martin Storsjö authored
      The armv7 runner doesn't seem to cope well with the testdata though.
    • 7d5f0d0c
      Henrik Gramner authored
    • arm: mc: Fix 8tap_v w8 with OBMC 3/4 heights · bf920fba
      Martin Storsjö authored
      Also make sure that the w4 case can exit after processing 12 pixels,
      where it is convenient.
      
      This gives a small slowdown for in-order cores like A7, A8, A53, but
      actually seems to give a small speedup for out-of-order cores like
      A9, A72 and A73.
      
      AArch64:
      Before:                      Cortex A53     A72     A73
      mc_8tap_regular_w8_v_8bpc_neon:   223.8   247.3   228.5
      After:
      mc_8tap_regular_w8_v_8bpc_neon:   232.5   243.9   223.4
      
      AArch32:
      Before:                       Cortex A7      A8      A9     A53     A72     A73
      mc_8tap_regular_w8_v_8bpc_neon:   550.2   470.7   520.5   257.0   256.4   248.2
      After:
      mc_8tap_regular_w8_v_8bpc_neon:   554.3   474.2   511.6   267.5   252.6   246.8
  2. 18 May, 2019 1 commit
    • Optimize obmc blend · f64fdae5
      Henrik Gramner authored
      The last 1/4 of the mask is always zero, so we can skip some
      calculations that don't change the output.
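The idea in the obmc blend commit above can be sketched in C. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and it assumes 8-bit pixels blended with 6-bit weights (0..64) where the last quarter of the mask is zero, as the commit message states. With a weight of zero the blend returns the destination pixel unchanged, so those positions can be skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one row in place: dst = (dst * (64 - m) + src * m + 32) >> 6.
 * Assumes every mask weight m is in the range 0..64. */
void blend_row_full(uint8_t *dst, const uint8_t *src,
                    const uint8_t *mask, size_t w)
{
    for (size_t x = 0; x < w; x++)
        dst[x] = (uint8_t)((dst[x] * (64 - mask[x]) + src[x] * mask[x] + 32) >> 6);
}

/* Same result when the last quarter of the mask is zero: with m == 0 the
 * formula reduces to (dst * 64 + 32) >> 6 == dst, so only the first
 * three quarters of the row need to be processed. */
void blend_row_skip_tail(uint8_t *dst, const uint8_t *src,
                         const uint8_t *mask, size_t w)
{
    blend_row_full(dst, src, mask, w - (w >> 2));
}
```

Both functions produce identical output for any mask whose final quarter is zero; the second simply does a quarter less work per row.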
  3. 17 May, 2019 3 commits
  4. 16 May, 2019 2 commits
  5. 15 May, 2019 2 commits
    • arm64: msac: Add handwritten versions of msac_decode_bool functions · 2e8a3a21
      Martin Storsjö authored
      GCC                     Cortex A53   A72   A73
      msac_decode_bool_c:           29.9  17.9  23.2
      msac_decode_bool_neon:        27.4  15.3  20.4
      msac_decode_bool_adapt_c:     49.2  26.8  31.0
      msac_decode_bool_adapt_neon:  38.2  22.2  25.4
      msac_decode_bool_equi_c:      26.6  16.8  19.4
      msac_decode_bool_equi_neon:   23.9  13.7  15.7
      
      Clang                   Cortex A53   A72   A73
      msac_decode_bool_c:           28.0  16.4  23.1
      msac_decode_bool_neon:        26.9  14.6  21.0
      msac_decode_bool_adapt_c:     46.8  25.1  31.4
      msac_decode_bool_adapt_neon:  36.2  19.0  26.2
      msac_decode_bool_equi_c:      23.7  13.4  18.8
      msac_decode_bool_equi_neon:   23.7  11.3  14.2
      
      This is as fast as, or faster than, what either GCC or Clang
      produces.
    • arm64: msac: Fix a typo in a comment · 84f938ec
      Martin Storsjö authored
  6. 14 May, 2019 5 commits
  7. 12 May, 2019 2 commits
  8. 11 May, 2019 2 commits
  9. 09 May, 2019 7 commits
  10. 08 May, 2019 4 commits
  11. 07 May, 2019 2 commits
  12. 06 May, 2019 1 commit
  13. 05 May, 2019 1 commit
  14. 04 May, 2019 3 commits
    • Control the stack size of spawned threads · 2756254f
      Henrik Gramner authored
      On some systems (e.g. Google Fuchsia) the default stack size of new
      threads is insufficient, resulting in crashes.
      
      On other systems the default stack size is unnecessarily large,
      which can waste a lot of virtual memory.
      
      By setting it to a sufficiently large fixed value we can ensure that
      we don't run out of stack space while keeping down memory usage.
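On POSIX systems, the approach described in the stack size commit above could look roughly like the following sketch. The helper names and the 1 MiB value are assumptions made for illustration; the commit message does not state which size or which code path the project uses.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fixed stack size; the value actually chosen by the commit
 * is not given in its message. */
#define WORKER_STACK_SIZE ((size_t)1024 * 1024)

/* Example worker: sets the int pointed to by arg to 1. */
void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Spawn a thread with an explicit stack size instead of the platform
 * default. Returns 0 on success or a pthread error number on failure. */
int spawn_with_stack_size(pthread_t *thread, void *(*fn)(void *),
                          void *arg, size_t stack_size)
{
    pthread_attr_t attr;
    int ret = pthread_attr_init(&attr);
    if (ret)
        return ret;
    ret = pthread_attr_setstacksize(&attr, stack_size);
    if (!ret)
        ret = pthread_create(thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return ret;
}
```

A caller would pass `WORKER_STACK_SIZE` for every thread it creates and then `pthread_join` as usual. Note that `pthread_attr_setstacksize` fails if the requested size is below the platform minimum (`PTHREAD_STACK_MIN`).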
    • arm64: msac: Implement NEON msac_decode_symbol_adapt · 1d5c1a49
      Martin Storsjö authored
                                   Cortex A53    A72    A73
      msac_decode_symbol_adapt4_c:      107.6   57.1   67.8
      msac_decode_symbol_adapt4_neon:    70.4   56.4   55.1
      msac_decode_symbol_adapt8_c:      157.1   74.5   90.3
      msac_decode_symbol_adapt8_neon:    75.6   57.2   56.9
      msac_decode_symbol_adapt16_c:     257.4  106.6  135.9
      msac_decode_symbol_adapt16_neon:  101.8   62.0   65.2
    • itx_tmpl: Fix the assert in inv_txfm_add_c · 058ca08d
      Martin Storsjö authored
      The previous form of the assert was automatically true for any
      value of w and h.
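The commit message above does not show the old or new assert, so the following is only a hypothetical illustration of how a range check can end up true for every input. The function names and the 4..64 bounds are assumptions: joining the lower and upper bound with `||` instead of `&&` makes the condition a tautology, because every integer satisfies at least one of the two comparisons.

```c
#include <assert.h>

/* Hypothetical buggy check: any w is either >= 4 or <= 64 (or both),
 * so this returns 1 for every w and h. */
int size_ok_buggy(int w, int h)
{
    return (w >= 4 || w <= 64) && (h >= 4 || h <= 64);
}

/* Intended check: both bounds must hold at the same time. */
int size_ok_fixed(int w, int h)
{
    return w >= 4 && w <= 64 && h >= 4 && h <= 64;
}

/* An assert built on the fixed predicate can actually fire. */
void check_size(int w, int h)
{
    assert(size_ok_fixed(w, h));
}
```

An assert that can never fail gives no protection, which is why a fix of this kind matters even though it changes no release-build behavior.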
  15. 29 Apr, 2019 1 commit
  16. 24 Apr, 2019 1 commit