  1. Feb 08, 2021
    • tools: add optional xxh3 based muxer · e6168525
      Janne Grunau authored and Jean-Baptiste Kempf committed
      The required 'xxhash.h' header can either be in the system include
      directory or be copied to 'tools/output'.
      
      The xxh3_128bits based muxer shows no significant slowdown compared to
      the null muxer. Decoding times for Chimera-AV1-8bit-1920x1080-6736kbps.ivf
      with 4 frame and 4 tile threads on a Core i7-8550U (turbo boost disabled):
      
      null:  72.5 s
      md5:   99.8 s
      xxh3:  73.8 s
      
      Decoding Chimera-AV1-10bit-1920x1080-6191kbps.ivf with 6 frame and 4 tile
      threads on an M1 Mac mini:
      
      null:  27.8 s
      md5:  105.9 s
      xxh3:  28.3 s
    • cli: Fix md5 verification for short values · 061ac9ae
      Matthias Dressel authored
      Verification should not succeed if the given string is too short to be a
      real hash.
      
      Fixes videolan/dav1d#361
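
The length check described above can be illustrated with a minimal sketch. This is not dav1d's actual code: `md5_matches` and `hex_nibble` are hypothetical helpers, and the sketch only assumes that the expected hash arrives as a hex string and the computed digest as 16 raw bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert one hex digit to its value, or -1 if invalid. */
static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper, not dav1d's implementation: compare a 16-byte MD5
 * digest against a user-supplied hex string. Returns 1 on match, else 0. */
static int md5_matches(const uint8_t digest[16], const char *hex) {
    /* Reject anything that is not exactly 32 characters long, so a short
     * prefix of the real hash can never pass verification. */
    if (!hex || strlen(hex) != 32) return 0;
    for (int i = 0; i < 16; i++) {
        const int hi = hex_nibble(hex[2 * i]);
        const int lo = hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return 0;          /* non-hex character */
        if (digest[i] != (uint8_t)((hi << 4) | lo)) return 0;
    }
    return 1;
}
```

Checking the length before comparing any bytes is what keeps a truncated string from being accepted as a match.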
  2. Feb 06, 2021
  3. Feb 05, 2021
    • Fix potential deadlock · 8b1a96e4
      Victorien Le Couviour--Tuffet authored and Jean-Baptiste Kempf committed
      If the postfilter tasks allocation fails, a deadlock would occur.
    • arm64: warped motion: Various optimizations · a3b8157e
      Kyle Siefring authored and Martin Storsjö committed
      - Reorder loads of filters to benefit in-order cores.
      - Use full 128-bit vectors to transpose 8x8 bytes. zip1 is called in the
         first stage, which will hurt performance on some older big cores.
      - Rework horz stage for 8-bit mode:
          * Use smull instead of mul
          * Replace existing narrow and long instructions
          * Replace mov after calling with right shift
      
      Before:            Cortex A55     A53     A72     A73
      warp_8x8_8bpc_neon:    1683.2  1860.6  1065.0  1102.6
      warp_8x8t_8bpc_neon:   1673.2  1846.4  1057.0  1098.4
      warp_8x8_16bpc_neon:   1870.7  2031.7  1147.3  1220.7
      warp_8x8t_16bpc_neon:  1848.0  2006.2  1121.6  1188.0
      After:
      warp_8x8_8bpc_neon:    1267.2  1446.2   807.0   871.5
      warp_8x8t_8bpc_neon:   1245.4  1422.0   810.2   868.4
      warp_8x8_16bpc_neon:   1769.8  1929.3  1132.0  1238.2
      warp_8x8t_16bpc_neon:  1747.3  1904.1  1101.5  1207.9
      
    • arm64: loopfilter: Avoid leaving 8-bits · 833382b3
      Kyle Siefring authored and Martin Storsjö committed
      Avoid moving between 8 and 16-bit vectors where possible.
  4. Feb 04, 2021
  5. Feb 02, 2021
  6. Feb 01, 2021
  7. Jan 28, 2021
  8. Jan 25, 2021
  9. Jan 21, 2021
  10. Jan 20, 2021
  11. Jan 18, 2021
  12. Jan 15, 2021
  13. Jan 11, 2021
    • Round and clip with one step, mc_8tap_regular_h_c · b12229cc
      Nathan E. Egge authored and Jean-Baptiste Kempf committed
      Relative speed-ups compared with gcc-9.2.0:
      
                                        Before     After
      mc_8tap_regular_w2_h_16bpc_c:      276.6     219.9
      mc_8tap_regular_w4_h_16bpc_c:      489.5     374.5
      mc_8tap_regular_w8_h_16bpc_c:      897.7     686.8
      mc_8tap_regular_w16_h_16bpc_c:    2573.7    2314.2
      mc_8tap_regular_w32_h_16bpc_c:    7647.3    7012.4
      mc_8tap_regular_w64_h_16bpc_c:   28163.8   25057.4
      mc_8tap_regular_w128_h_16bpc_c:  77678.4   73570.0
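
The commit message does not spell out the change itself. As a hedged illustration of the general idea in the title, two consecutive rounding shifts can be folded into a single add and shift. The function names, the shift amounts `p` and `q`, and the pixel maximum below are assumptions for this sketch, not values taken from the dav1d source.

```c
#include <stdint.h>

/* Clamp v to the pixel range [0, max]. */
static int clip_pixel(int v, int max) {
    return v < 0 ? 0 : v > max ? max : v;
}

/* Two-step variant: round away p bits, then round away q bits, then clip. */
static int round_two_step(int sum, int p, int q, int max) {
    const int t = (sum + ((1 << p) >> 1)) >> p;
    return clip_pixel((t + ((1 << q) >> 1)) >> q, max);
}

/* One-step variant: fold both rounding offsets into one constant and shift
 * once by p + q. With floor-style (arithmetic) right shifts this gives the
 * same result as the two-step variant. Note that right-shifting a negative
 * int is implementation-defined in C; this sketch assumes an arithmetic
 * shift, as common compilers provide. */
static int round_one_step(int sum, int p, int q, int max) {
    const int rnd = ((1 << p) >> 1) + (((1 << q) >> 1) << p);
    return clip_pixel((sum + rnd) >> (p + q), max);
}
```

The one-step form saves an add and a shift per output pixel, which is consistent with the kind of gain the table above reports, though the actual source of the measured speed-up is not stated in the commit message.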
    • Rework the usage of noskip_mask · 0bd57c6b
      Kyle Siefring authored and Jean-Baptiste Kempf committed
      Remove half of the masks since they are only used for cdef on an 8x8
      level of granularity.
      
      Load the mask and combine the 16-bit sections into the 32-bit sections
      outside of the inner cdef loop. This should save some registers.
      
      Results in mild performance improvements.