1. 21 Jun, 2019 1 commit
  2. 20 Jun, 2019 1 commit
  3. 19 Jun, 2019 1 commit
  4. 14 Jun, 2019 1 commit
    • arm:mc: NEON implementation of blend, blend_h and blend_v function · a1e3f358
      B Krishnan Iyer authored
      	                A73	A53
      
      blend_h_w2_8bpc_c:	149.3	246.8
      blend_h_w2_8bpc_neon:	74.6	137
      blend_h_w4_8bpc_c:	251.6	409.8
      blend_h_w4_8bpc_neon:	66	146.6
      blend_h_w8_8bpc_c:	446.6	844.1
      blend_h_w8_8bpc_neon:	68.6	131.2
      blend_h_w16_8bpc_c:	830	1513
      blend_h_w16_8bpc_neon:	85.9	192
      blend_h_w32_8bpc_c:	1605.2	2847.8
      blend_h_w32_8bpc_neon:	149.8	357.6
      blend_h_w64_8bpc_c:	3304.8	5515.5
      blend_h_w64_8bpc_neon:	262.8	629.5
      blend_h_w128_8bpc_c:	7895.1	13260.6
      blend_h_w128_8bpc_neon:	577	1402
      blend_v_w2_8bpc_c:	241.2	410.8
      blend_v_w2_8bpc_neon:	122.1	196.8
      blend_v_w4_8bpc_c:	874.4	1418.2
      blend_v_w4_8bpc_neon:	248.5	375.9
      blend_v_w8_8bpc_c:	1550.5	2514.7
      blend_v_w8_8bpc_neon:	210.8	376
      blend_v_w16_8bpc_c:	2925.3	5086
      blend_v_w16_8bpc_neon:	253.4	608.3
      blend_v_w32_8bpc_c:	5686.7	9470.5
      blend_v_w32_8bpc_neon:	348.2	994.8
      blend_w4_8bpc_c:	201.5	309.3
      blend_w4_8bpc_neon:	38.6	99.2
      blend_w8_8bpc_c:	531.3	944.8
      blend_w8_8bpc_neon:	55.1	125.8
      blend_w16_8bpc_c:	1992.8	3349.8
      blend_w16_8bpc_neon:	150.1	344
      blend_w32_8bpc_c:	4982	8165.9
      blend_w32_8bpc_neon:	360.4	910.9
      a1e3f358
  5. 10 Jun, 2019 5 commits
  6. 09 Jun, 2019 2 commits
  7. 07 Jun, 2019 1 commit
  8. 06 Jun, 2019 1 commit
  9. 05 Jun, 2019 2 commits
  10. 04 Jun, 2019 1 commit
    • meson: Fix nasm detection · 098a565c
      Marvin Scholz authored
      nasm -v can actually fail, for example on macOS, where nasm could be a
      stub executable that forwards commands to the real nasm and fails if
      the real nasm is not installed.
      This would lead to a confusing error message caused by an out-of-bounds
      array access. To avoid that, explicitly check the exit code.
      098a565c
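      The fix described above amounts to checking the exit code before indexing into the split version output. The following is a minimal Python sketch of that idea, not the actual meson code from the commit. The function names, the error messages, and the assumed output shape ("NASM version X.Y.Z compiled on ...") are illustrative assumptions.

      ```python
      import subprocess


      def parse_nasm_version(returncode, stdout):
          """Return the version field of `nasm -v` output, or raise a clear error.

          Hypothetical helper: names and messages are illustrative only.
          """
          # Check the exit code first. A stub that cannot find the real nasm
          # exits non-zero and may print nothing useful on stdout.
          if returncode != 0:
              raise RuntimeError("failed running nasm to obtain its version")

          fields = stdout.strip().split()
          # Assumed output shape: "NASM version X.Y.Z compiled on ...".
          # Guard the length so an unexpected string cannot cause an
          # out-of-bounds access.
          if len(fields) < 3 or fields[1].lower() != "version":
              raise RuntimeError("unexpected nasm version string: %r" % stdout)
          return fields[2]


      def detect_nasm_version(nasm="nasm"):
          """Run `nasm -v` and parse the result (hypothetical wrapper)."""
          result = subprocess.run([nasm, "-v"], capture_output=True, text=True)
          return parse_nasm_version(result.returncode, result.stdout)
      ```

      Keeping the parsing separate from the process call makes the failure path easy to exercise without a broken nasm installation.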
  11. 01 Jun, 2019 1 commit
  12. 31 May, 2019 1 commit
  13. 24 May, 2019 2 commits
  14. 23 May, 2019 1 commit
  15. 21 May, 2019 5 commits
  16. 19 May, 2019 3 commits
    • ci: Add full testdata tests on aarch64 · a690e548
      Martin Storsjö authored
      The armv7 runner doesn't seem to cope well with the testdata though.
      a690e548
    • Henrik Gramner's avatar
      7d5f0d0c
    • arm: mc: Fix 8tap_v w8 with OBMC 3/4 heights · bf920fba
      Martin Storsjö authored
      Also make sure that the w4 case can exit after processing 12 pixels,
      where it is convenient.
      
      This gives a small slowdown for in-order cores like A7, A8, A53, but
      actually seems to give a small speedup for out-of-order cores like
      A9, A72 and A73.
      
      AArch64:
      Before:                      Cortex A53     A72     A73
      mc_8tap_regular_w8_v_8bpc_neon:   223.8   247.3   228.5
      After:
      mc_8tap_regular_w8_v_8bpc_neon:   232.5   243.9   223.4
      
      AArch32:
      Before:                       Cortex A7      A8      A9     A53     A72     A73
      mc_8tap_regular_w8_v_8bpc_neon:   550.2   470.7   520.5   257.0   256.4   248.2
      After:
      mc_8tap_regular_w8_v_8bpc_neon:   554.3   474.2   511.6   267.5   252.6   246.8
      bf920fba
  17. 18 May, 2019 1 commit
    • Optimize obmc blend · f64fdae5
      Henrik Gramner authored
      The last 1/4 of the mask is always zero, so we can skip some
      calculations that don't change the output.
      f64fdae5
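      To illustrate why skipping the zero part of the mask is safe, here is a small Python sketch. It assumes the usual 6-bit blend formula, (a * (64 - m) + b * m + 32) >> 6, under which a mask value of 0 returns the destination pixel unchanged. The mask values and helper names below are hypothetical and are not dav1d's actual OBMC table or code.

      ```python
      def blend_px(a, b, m):
          """Blend two pixels with a 6-bit weight m (0 keeps a, 64 takes b)."""
          return (a * (64 - m) + b * m + 32) >> 6


      def blend_row(dst, tmp, mask, n):
          """Blend the first n pixels of tmp into a copy of dst."""
          out = list(dst)
          for x in range(n):
              out[x] = blend_px(dst[x], tmp[x], mask[x])
          return out


      # Hypothetical 8-entry mask whose last quarter is zero.
      MASK = [36, 42, 48, 53, 57, 61, 0, 0]

      dst = [10, 20, 30, 40, 50, 60, 70, 80]
      tmp = [200] * 8

      full = blend_row(dst, tmp, MASK, 8)           # process every pixel
      short = blend_row(dst, tmp, MASK, 8 * 3 // 4)  # skip the last quarter

      # Both variants produce the same row, so the last quarter can be skipped.
      assert full == short
      ```

      Since (64 * a + 32) >> 6 equals a for any non-negative integer a, the skipped pixels would have been written back with their original values.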
  18. 17 May, 2019 3 commits
  19. 16 May, 2019 2 commits
  20. 15 May, 2019 2 commits
    • arm64: msac: Add handwritten versions of msac_decode_bool functions · 2e8a3a21
      Martin Storsjö authored
      GCC                     Cortex A53   A72   A73
      msac_decode_bool_c:           29.9  17.9  23.2
      msac_decode_bool_neon:        27.4  15.3  20.4
      msac_decode_bool_adapt_c:     49.2  26.8  31.0
      msac_decode_bool_adapt_neon:  38.2  22.2  25.4
      msac_decode_bool_equi_c:      26.6  16.8  19.4
      msac_decode_bool_equi_neon:   23.9  13.7  15.7
      
      Clang                   Cortex A53   A72   A73
      msac_decode_bool_c:           28.0  16.4  23.1
      msac_decode_bool_neon:        26.9  14.6  21.0
      msac_decode_bool_adapt_c:     46.8  25.1  31.4
      msac_decode_bool_adapt_neon:  36.2  19.0  26.2
      msac_decode_bool_equi_c:      23.7  13.4  18.8
      msac_decode_bool_equi_neon:   23.7  11.3  14.2
      
      This is as fast as, or faster than, what either GCC or Clang
      produces.
      2e8a3a21
    • arm64: msac: Fix a typo in a comment · 84f938ec
      Martin Storsjö authored
      84f938ec
  21. 14 May, 2019 3 commits