1. 02 Jan, 2020 2 commits
  2. 01 Jan, 2020 3 commits
  3. 31 Dec, 2019 2 commits
  4. 29 Dec, 2019 2 commits
  5. 28 Dec, 2019 3 commits
  6. 24 Dec, 2019 1 commit
  7. 18 Dec, 2019 1 commit
    • Don't assume dlsym exists on linux · 14d586ac
      Martin Storsjö authored
      After checking if -ldl exists, use it for checking for the dlsym
      function.
      
      This fixes building in environments where the dlsym function is
      unavailable. (My testcase is NDK builds with -static, where dlsym
      isn't available for static linking, only if linking dynamically.)
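A minimal sketch of how calling code can cope with the situation described above, where the build system only reports dlsym as usable after probing for it. The macro name `HAVE_DLSYM` and the helper `lookup_optional_symbol` are assumptions made for illustration and are not taken from the commit itself.

```c
#include <stddef.h>

/* HAVE_DLSYM is assumed to be set to 1 by the build system only when the
 * dlsym check (performed with -ldl, if that library exists) succeeded. */
#if defined(HAVE_DLSYM) && HAVE_DLSYM
#include <dlfcn.h>
#endif

/* Hypothetical helper: returns the address of an optional symbol, or NULL
 * when dlsym is unavailable (e.g. static NDK builds) or the symbol is absent. */
static void *lookup_optional_symbol(const char *name) {
#if defined(HAVE_DLSYM) && HAVE_DLSYM
    /* dlopen(NULL, ...) yields a handle for the main program. */
    void *const self = dlopen(NULL, RTLD_LAZY);
    return self ? dlsym(self, name) : NULL;
#else
    (void)name; /* no dynamic lookup possible in this configuration */
    return NULL;
#endif
}
```

Callers then treat a NULL result as "feature not available" and fall back to a default code path, so the same source builds in both static and dynamic configurations.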
  8. 17 Dec, 2019 1 commit
  9. 14 Dec, 2019 4 commits
  10. 13 Dec, 2019 1 commit
  11. 05 Dec, 2019 3 commits
  12. 30 Nov, 2019 1 commit
  13. 27 Nov, 2019 1 commit
    • Avoid excessive L2 collisions with certain frame widths · 82eda83a
      Henrik Gramner authored
      Memory addresses with certain power-of-two offsets will map to the
      same set of cache lines. Using such offsets as strides will cause
      excessive cache evictions resulting in more cache misses.
      
      Avoid this by adding a small padding when the stride is a multiple
      of 1024 (somewhat arbitrarily chosen as the specific number depends
      on the hardware implementation) when allocating picture buffers.
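The padding rule described above can be sketched as follows. The commit message only states the multiple-of-1024 condition and a "small padding"; the 128-pixel width alignment, the 64-byte pad and the function name `padded_stride` are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: compute a picture-buffer stride in bytes.
 * Assumptions: widths are rounded up to a multiple of 128 pixels, and the
 * padding added to break power-of-two aliasing is 64 bytes. */
static ptrdiff_t padded_stride(const int width, const int bytes_per_pixel) {
    const int aligned_w = (width + 127) & ~127;
    ptrdiff_t stride = (ptrdiff_t)aligned_w * bytes_per_pixel;
    /* A stride that is a multiple of 1024 maps successive rows onto the
     * same cache sets, so nudge it off that boundary. */
    if (!(stride & 1023))
        stride += 64;
    return stride;
}
```

With these assumptions, a width of 1920 keeps its stride of 1920 bytes at one byte per pixel, while a width of 2048 is padded to 2112 bytes.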
  14. 26 Nov, 2019 1 commit
  15. 23 Nov, 2019 1 commit
  16. 21 Nov, 2019 2 commits
    • Fix stride type · 35d3d2b6
      Ronald S. Bultje authored
      Prevents the following compiler warning:
      
      ../src/decode.c:1979:32: warning: implicit conversion loses integer precision: 'const ptrdiff_t' (aka 'const long') to 'int' [-Wshorten-64-to-32]
                  const int stride = f->cur.stride[!!p];
                            ~~~~~~   ^~~~~~~~~~~~~~~~~~
      1 warning generated.
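A minimal sketch of the kind of change that silences this warning: keep the local variable in the same type as the stored stride instead of narrowing it to int. The struct `picture_planes` and the helper `row_ptr` are hypothetical stand-ins, not the decoder's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a picture with separate luma/chroma strides. */
struct picture_planes {
    ptrdiff_t stride[2]; /* [0] = luma, [1] = chroma */
};

/* Returns a pointer to row y of the given plane (0 = Y, 1 = U, 2 = V). */
static const uint8_t *row_ptr(const struct picture_planes *const pic,
                              const uint8_t *const base,
                              const int plane, const int y)
{
    /* ptrdiff_t matches the stored type, so no 64-to-32 bit truncation
     * occurs and negative strides keep working. */
    const ptrdiff_t stride = pic->stride[!!plane];
    return base + y * stride;
}
```

Using `ptrdiff_t` here also keeps the row offset computation in pointer-sized arithmetic, which matters once `y * stride` no longer fits in an int.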
    • Make OBU_* types public · 4bf52cb5
      Ronald S. Bultje authored
  17. 17 Nov, 2019 1 commit
  18. 16 Nov, 2019 2 commits
  19. 15 Nov, 2019 1 commit
  20. 12 Nov, 2019 5 commits
    • arm: 64: loopfilter: Avoid nested ifdefs where easily possible · dcbbf775
      Martin Storsjö authored
      This was requested in the review of the arm32 version of the same.
    • arm: 64: loopfilter: Fix a typo in a macro parameter condition · 564482b6
      Martin Storsjö authored
      This removes one redundant instruction for loop filters smaller
      than 16.
    • arm64: loopfilter: Reorder instructions and tweak register use to match the arm32 port · 3069ab94
      Martin Storsjö authored
      This doesn't change performance measurably, but eases potential
      future maintenance of the code.
    • abd07c67
      Martin Storsjö authored
    • arm: 32: Port the arm64 NEON loopfilter to arm32 · 9a100261
      Martin Storsjö authored
      The code is a fairly exact 1:1 port of the ARM64 code, but operating
      on 8 pixels at a time, instead of 16.
      
      Relative speedup over C code according to checkasm:
                             Cortex A7     A8     A9    A53    A72    A73
      lpf_h_sb_uv_w4_8bpc_neon:   1.36   1.40   1.25   1.71   1.55   1.59
      lpf_h_sb_uv_w6_8bpc_neon:   2.18   2.11   1.74   2.65   2.32   2.34
      lpf_h_sb_y_w4_8bpc_neon:    1.48   1.43   1.20   1.91   1.49   1.64
      lpf_h_sb_y_w8_8bpc_neon:    2.34   2.05   1.78   2.84   2.35   2.69
      lpf_h_sb_y_w16_8bpc_neon:   2.13   1.83   1.63   2.51   2.10   2.35
      lpf_v_sb_uv_w4_8bpc_neon:   1.69   1.66   1.60   2.16   2.24   2.24
      lpf_v_sb_uv_w6_8bpc_neon:   2.68   2.43   2.22   3.53   3.44   3.35
      lpf_v_sb_y_w4_8bpc_neon:    1.74   1.74   1.43   2.34   2.14   2.18
      lpf_v_sb_y_w8_8bpc_neon:    2.92   2.47   2.19   3.55   3.22   3.54
      lpf_v_sb_y_w16_8bpc_neon:   2.62   2.19   1.98   3.25   2.80   3.10
      
      Comparison to the original ARM64 assembly:
      ARM64:                        A53     A72     A73
      lpf_h_sb_uv_w4_8bpc_neon:   702.5   518.2   529.1
      lpf_h_sb_uv_w6_8bpc_neon:  1007.3   672.6   736.6
      lpf_h_sb_y_w4_8bpc_neon:   1652.8  1261.2  1276.5
      lpf_h_sb_y_w8_8bpc_neon:   2144.7  1559.8  1638.7
      lpf_h_sb_y_w16_8bpc_neon:  2318.3  1757.2  1792.8
      lpf_v_sb_uv_w4_8bpc_neon:   447.1   302.0   292.4
      lpf_v_sb_uv_w6_8bpc_neon:   600.0   397.7   406.9
      lpf_v_sb_y_w4_8bpc_neon:   1212.6   840.1   818.4
      lpf_v_sb_y_w8_8bpc_neon:   1623.3  1167.4  1156.7
      lpf_v_sb_y_w16_8bpc_neon:  1694.9  1237.9  1182.3
      ARM32:
      lpf_h_sb_uv_w4_8bpc_neon:   821.2   501.1   500.8
      lpf_h_sb_uv_w6_8bpc_neon:  1232.0   715.7   746.6
      lpf_h_sb_y_w4_8bpc_neon:   2208.1  1373.2  1414.7
      lpf_h_sb_y_w8_8bpc_neon:   3138.3  1843.1  1915.2
      lpf_h_sb_y_w16_8bpc_neon:  3293.1  1842.5  1975.9
      lpf_v_sb_uv_w4_8bpc_neon:   619.9   326.7   324.9
      lpf_v_sb_uv_w6_8bpc_neon:   855.9   446.7   468.2
      lpf_v_sb_y_w4_8bpc_neon:   1737.6   935.5  1007.0
      lpf_v_sb_y_w8_8bpc_neon:   2346.7  1232.8  1298.3
      lpf_v_sb_y_w16_8bpc_neon:  2353.4  1283.4  1379.9
  21. 10 Nov, 2019 1 commit
  22. 01 Nov, 2019 1 commit