1. 29 Jun, 2019 2 commits
  2. 27 Jun, 2019 2 commits
  3. 26 Jun, 2019 3 commits
    • arm64: itx: Add NEON optimized inverse transforms · ef1ea008
      Martin Storsjö authored
      The speedup for most non-dc-only dct functions is around 9-12x
      over the C code generated by GCC 7.3.
      
      Relative speedups vs C for a few functions:
      
                                                    Cortex A53    A72    A73
      inv_txfm_add_4x4_dct_dct_0_8bpc_neon:               3.90   4.16   5.65
      inv_txfm_add_4x4_dct_dct_1_8bpc_neon:               7.20   8.05  11.19
      inv_txfm_add_8x8_dct_dct_0_8bpc_neon:               5.09   6.73   6.45
      inv_txfm_add_8x8_dct_dct_1_8bpc_neon:              12.18  10.80  13.05
      inv_txfm_add_16x16_dct_dct_0_8bpc_neon:             7.31   9.35  11.17
      inv_txfm_add_16x16_dct_dct_1_8bpc_neon:            14.36  13.06  15.93
      inv_txfm_add_16x16_dct_dct_2_8bpc_neon:            11.00  10.09  12.05
      inv_txfm_add_32x32_dct_dct_0_8bpc_neon:             4.41   5.40   5.77
      inv_txfm_add_32x32_dct_dct_1_8bpc_neon:            13.84  13.81  18.04
      inv_txfm_add_32x32_dct_dct_2_8bpc_neon:            11.75  11.87  15.22
      inv_txfm_add_32x32_dct_dct_3_8bpc_neon:            10.20  10.40  13.13
      inv_txfm_add_32x32_dct_dct_4_8bpc_neon:             9.01   9.21  11.56
      inv_txfm_add_64x64_dct_dct_0_8bpc_neon:             3.84   4.82   5.28
      inv_txfm_add_64x64_dct_dct_1_8bpc_neon:            14.40  12.69  16.71
      inv_txfm_add_64x64_dct_dct_4_8bpc_neon:            10.91   9.63  12.67
      
      Some of the special-cased identity_identity transforms for 32x32
      give insane speedups over the generic C code:
      
      inv_txfm_add_32x32_identity_identity_0_8bpc_neon: 225.26 238.11 247.07
      inv_txfm_add_32x32_identity_identity_1_8bpc_neon: 225.33 238.53 247.69
      inv_txfm_add_32x32_identity_identity_2_8bpc_neon:  59.60  61.94  64.63
      inv_txfm_add_32x32_identity_identity_3_8bpc_neon:  26.98  27.99  29.21
      inv_txfm_add_32x32_identity_identity_4_8bpc_neon:  15.08  15.93  16.56
    • tools: Use DAV1D_ERR for strerror calls · e0346114
      Marvin Scholz authored
    • Marvin Scholz's avatar
      04dc8a4d
  4. 24 Jun, 2019 2 commits
  5. 21 Jun, 2019 1 commit
  6. 20 Jun, 2019 1 commit
  7. 19 Jun, 2019 1 commit
  8. 14 Jun, 2019 1 commit
    • arm:mc: NEON implementation of blend, blend_h and blend_v function · a1e3f358
      B Krishnan Iyer authored
                                   A73      A53
      blend_h_w2_8bpc_c:         149.3    246.8
      blend_h_w2_8bpc_neon:       74.6    137
      blend_h_w4_8bpc_c:         251.6    409.8
      blend_h_w4_8bpc_neon:       66      146.6
      blend_h_w8_8bpc_c:         446.6    844.1
      blend_h_w8_8bpc_neon:       68.6    131.2
      blend_h_w16_8bpc_c:        830     1513
      blend_h_w16_8bpc_neon:      85.9    192
      blend_h_w32_8bpc_c:       1605.2   2847.8
      blend_h_w32_8bpc_neon:     149.8    357.6
      blend_h_w64_8bpc_c:       3304.8   5515.5
      blend_h_w64_8bpc_neon:     262.8    629.5
      blend_h_w128_8bpc_c:      7895.1  13260.6
      blend_h_w128_8bpc_neon:    577     1402
      blend_v_w2_8bpc_c:         241.2    410.8
      blend_v_w2_8bpc_neon:      122.1    196.8
      blend_v_w4_8bpc_c:         874.4   1418.2
      blend_v_w4_8bpc_neon:      248.5    375.9
      blend_v_w8_8bpc_c:        1550.5   2514.7
      blend_v_w8_8bpc_neon:      210.8    376
      blend_v_w16_8bpc_c:       2925.3   5086
      blend_v_w16_8bpc_neon:     253.4    608.3
      blend_v_w32_8bpc_c:       5686.7   9470.5
      blend_v_w32_8bpc_neon:     348.2    994.8
      blend_w4_8bpc_c:           201.5    309.3
      blend_w4_8bpc_neon:         38.6     99.2
      blend_w8_8bpc_c:           531.3    944.8
      blend_w8_8bpc_neon:         55.1    125.8
      blend_w16_8bpc_c:         1992.8   3349.8
      blend_w16_8bpc_neon:       150.1    344
      blend_w32_8bpc_c:         4982     8165.9
      blend_w32_8bpc_neon:       360.4    910.9
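
The table above lists raw timings for the C and NEON versions rather than ratios. A minimal sketch of how a relative speedup could be derived from a C/NEON pair follows. It assumes that both numbers in a pair use the same unit and that lower is better, which the commit message does not state. The `speedup` helper and the `timings` dictionary are illustrative names, and only two rows of the table are copied in.

```python
def speedup(c_time, neon_time):
    """Ratio of the C timing to the NEON timing (assumes lower is better)."""
    return c_time / neon_time


# (C, NEON) pairs copied from two rows of the table above, per core.
timings = {
    "blend_w32_8bpc": {"A73": (4982.0, 360.4), "A53": (8165.9, 910.9)},
    "blend_h_w128_8bpc": {"A73": (7895.1, 577.0), "A53": (13260.6, 1402.0)},
}

if __name__ == "__main__":
    for name, per_core in timings.items():
        for core, (c_time, neon_time) in per_core.items():
            print(f"{name} on {core}: {speedup(c_time, neon_time):.2f}x")
```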
  9. 10 Jun, 2019 5 commits
  10. 09 Jun, 2019 2 commits
  11. 07 Jun, 2019 1 commit
  12. 06 Jun, 2019 1 commit
  13. 05 Jun, 2019 2 commits
  14. 04 Jun, 2019 1 commit
    • meson: Fix nasm detection · 098a565c
      Marvin Scholz authored
      nasm -v can fail, for example on macOS, where nasm may be a stub
      executable that forwards commands to the real nasm and fails if the
      real nasm is not installed.
      This would lead to a confusing error message caused by an out-of-bounds
      array access. To avoid that, explicitly check the exit code.
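
The actual fix lives in the Meson build files, which are not shown here. As a rough illustration of the same idea in Python, the sketch below checks the exit code before indexing into the split output. The function names are hypothetical, and the assumed output shape ("NASM version X.Y.Z compiled on ...") is an assumption about what nasm -v prints, not something the commit message states.

```python
import subprocess


def parse_nasm_version(returncode, stdout):
    """Return the version token from `nasm -v` output, or None on failure."""
    # Check the exit code first: a failing stub gives no usable output.
    if returncode != 0:
        return None
    fields = stdout.split()
    # Assumed shape: "NASM version 2.14.02 compiled on ...".
    # Guard the index so short output cannot cause an out-of-bounds access.
    if len(fields) < 3:
        return None
    return fields[2]


def detect_nasm(executable="nasm"):
    """Run `<executable> -v` and return its version string, or None."""
    try:
        result = subprocess.run(
            [executable, "-v"], capture_output=True, text=True
        )
    except OSError:
        # The executable is missing or cannot be started at all.
        return None
    return parse_nasm_version(result.returncode, result.stdout)
```

Keeping the parsing separate from the process call makes the failure paths easy to exercise without nasm installed.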
  15. 01 Jun, 2019 1 commit
  16. 31 May, 2019 1 commit
  17. 24 May, 2019 2 commits
  18. 23 May, 2019 1 commit
  19. 21 May, 2019 5 commits
  20. 19 May, 2019 3 commits
    • ci: Add full testdata tests on aarch64 · a690e548
      Martin Storsjö authored
      The armv7 runner doesn't seem to cope well with the testdata though.
    • Henrik Gramner's avatar
      7d5f0d0c
    • arm: mc: Fix 8tap_v w8 with OBMC 3/4 heights · bf920fba
      Martin Storsjö authored
      Also make sure that the w4 case can exit after processing 12 pixels,
      where it is convenient.
      
      This gives a small slowdown for in-order cores like A7, A8, A53, but
      actually seems to give a small speedup for out-of-order cores like
      A9, A72 and A73.
      
      AArch64:
      Before:                      Cortex A53     A72     A73
      mc_8tap_regular_w8_v_8bpc_neon:   223.8   247.3   228.5
      After:
      mc_8tap_regular_w8_v_8bpc_neon:   232.5   243.9   223.4
      
      AArch32:
      Before:                       Cortex A7      A8      A9     A53     A72     A73
      mc_8tap_regular_w8_v_8bpc_neon:   550.2   470.7   520.5   257.0   256.4   248.2
      After:
      mc_8tap_regular_w8_v_8bpc_neon:   554.3   474.2   511.6   267.5   252.6   246.8
  21. 18 May, 2019 1 commit
    • Optimize obmc blend · f64fdae5
      Henrik Gramner authored
      The last 1/4 of the mask is always zero, so we can skip some
      calculations that don't change the output.
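
The reasoning above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the commit: it assumes a 6-bit blend of the form (a * (64 - m) + b * m + 32) >> 6 with one mask weight per row, and the mask values are made up. The only property taken from the commit message is that the last quarter of the mask is zero.

```python
def blend_px(a, b, m):
    """Assumed 6-bit blend: a mask weight of 0 leaves `a` unchanged."""
    return (a * (64 - m) + b * m + 32) >> 6


def blend_rows_full(dst, tmp, mask):
    """Blend every row, including rows whose mask weight is zero."""
    return [
        [blend_px(a, b, m) for a, b in zip(drow, trow)]
        for drow, trow, m in zip(dst, tmp, mask)
    ]


def blend_rows_skip(dst, tmp, mask):
    """Blend only the first 3/4 of the rows and copy the rest as-is."""
    active = (len(dst) * 3) >> 2
    out = [
        [blend_px(a, b, m) for a, b in zip(drow, trow)]
        for drow, trow, m in zip(dst[:active], tmp[:active], mask[:active])
    ]
    # Rows in the zero-mask quarter would come out identical to dst.
    out.extend(list(row) for row in dst[active:])
    return out


if __name__ == "__main__":
    # Hypothetical 8-row mask whose last quarter (2 rows) is zero.
    mask = [45, 39, 34, 28, 22, 15, 0, 0]
    dst = [[10 * r + c for c in range(4)] for r in range(8)]
    tmp = [[255 - v for v in row] for row in dst]
    print(blend_rows_skip(dst, tmp, mask) == blend_rows_full(dst, tmp, mask))
```

With a zero weight the blend reduces to (64 * a + 32) >> 6, which equals a for non-negative integers, so skipping those rows cannot change the result.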
  22. 17 May, 2019 1 commit