  1. Feb 09, 2021
    • arm64: looprestoration: Rewrite the wiener functions · 2e73051c
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      Make them operate in a more cache-friendly manner, interleaving
      horizontal and vertical filtering (reducing the amount of stack
      used from 51 KB to 4 KB), similar to what was done for x86 in
      78d27b7d.
      
      This also adds separate 5tap versions of the filters and unrolls
      the vertical filter a bit more (which could perhaps have been done
      without the rewrite).
      
      This does, however, increase the compiled code size by around
      3.5 KB.
      
      Before:                Cortex A53       A72       A73
      wiener_5tap_8bpc_neon:   136855.6   91446.2   87363.6
      wiener_7tap_8bpc_neon:   136861.6   91454.9   87374.5
      wiener_5tap_10bpc_neon:  167685.3  114720.3  116522.1
      wiener_5tap_12bpc_neon:  167677.5  114724.7  116511.9
      wiener_7tap_10bpc_neon:  167681.6  114738.5  116567.0
      wiener_7tap_12bpc_neon:  167673.8  114720.8  116515.4
      After:
      wiener_5tap_8bpc_neon:    87102.1   60460.6   66803.8
      wiener_7tap_8bpc_neon:   110831.7   78489.0   82015.9
      wiener_5tap_10bpc_neon:  109999.2   90259.0   89238.0
      wiener_5tap_12bpc_neon:  109978.3   90255.7   89220.7
      wiener_7tap_10bpc_neon:  137877.6  107578.5  103435.6
      wiener_7tap_12bpc_neon:  137868.8  107568.9  103390.4
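The interleaving idea in the wiener rewrite above can be sketched in plain C. This is only an illustration under stated assumptions, not the NEON assembly from the commit: `filter_interleaved`, `filter_h`, `clampi`, `TAPS`, and `MAX_W` are hypothetical names, edges are handled by pixel replication, and `dst` and `src` are assumed not to overlap. Each source row is filtered horizontally just before the vertical pass first needs it, so only `TAPS` intermediate rows are kept rather than a full intermediate frame.

```c
#include <stdint.h>

#define TAPS  5   /* illustrative tap count */
#define MAX_W 64  /* illustrative maximum row width */

static int clampi(const int v, const int lo, const int hi) {
    return v < lo ? lo : v > hi ? hi : v;
}

/* Horizontally filter one source row into `mid`, replicating edge pixels. */
static void filter_h(int32_t *const mid, const uint8_t *const src,
                     const int w, const int16_t fh[TAPS])
{
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += fh[k] * src[clampi(x + k - TAPS / 2, 0, w - 1)];
        mid[x] = sum;
    }
}

/* Separable filter with interleaved passes (hypothetical sketch).
 * Returns 0 on success, -1 if the arguments do not fit this sketch. */
int filter_interleaved(uint8_t *const dst, const uint8_t *const src,
                       const int w, const int h, const int stride,
                       const int16_t fh[TAPS], const int16_t fv[TAPS],
                       const int shift)
{
    int32_t ring[TAPS][MAX_W];  /* filtered row r lives in slot r % TAPS */
    int next = 0;               /* next source row to filter horizontally */

    if (w < 1 || w > MAX_W || h < 1 || shift < 1) return -1;

    for (int y = 0; y < h; y++) {
        /* Bring every row up to y + TAPS/2 (clamped) into the ring. */
        const int last = clampi(y + TAPS / 2, 0, h - 1);
        for (; next <= last; next++)
            filter_h(ring[next % TAPS], src + next * stride, w, fh);

        /* Vertical pass over the TAPS rows centred on y. */
        for (int x = 0; x < w; x++) {
            int32_t sum = 0;
            for (int k = 0; k < TAPS; k++) {
                const int r = clampi(y + k - TAPS / 2, 0, h - 1);
                sum += fv[k] * ring[r % TAPS][x];
            }
            sum += 1 << (shift - 1);   /* rounding offset */
            if (sum < 0) sum = 0;      /* negative results saturate to 0 */
            dst[y * stride + x] = (uint8_t)clampi(sum >> shift, 0, 255);
        }
    }
    return 0;
}
```

With taps that sum to 16 in each direction, a `shift` of 8 restores the original scale. The intermediate storage is `TAPS * MAX_W` values regardless of the image height, which is the property the commit message describes as reduced stack usage.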
    • arm64: mc: Improve first tap for inorder cores · 4e869495
      Kyle Siefring authored and Martin Storsjö committed
      Change the order of the multiply-accumulates to allow in-order cores
      to forward the results.
  2. Feb 08, 2021
    • arm32: mc: Optimize warp by doing horz filtering in 8 bit · 0477fcf1
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      Additionally, reschedule the load instructions to reduce stalls
      on in-order cores.
      
      This applies the changes from a3b8157e
      to the arm32 version.
      
      Before:             Cortex A7      A8      A9     A53     A72     A73
      warp_8x8_8bpc_neon:    3659.3  1746.0  1931.9  2128.8  1173.7  1188.9
      warp_8x8t_8bpc_neon:   3650.8  1724.6  1919.8  2105.0  1147.7  1206.9
      warp_8x8_16bpc_neon:   4039.4  2111.9  2337.1  2462.5  1334.6  1396.5
      warp_8x8t_16bpc_neon:  3973.9  2137.1  2299.6  2413.2  1282.8  1369.6
      After:
      warp_8x8_8bpc_neon:    2920.8  1269.8  1410.3  1767.3   860.2  1004.8
      warp_8x8t_8bpc_neon:   2904.9  1283.9  1397.5  1743.7   863.6  1024.7
      warp_8x8_16bpc_neon:   3895.5  2060.7  2339.8  2376.6  1331.1  1394.0
      warp_8x8t_16bpc_neon:  3822.7  2026.7  2298.7  2325.4  1278.1  1360.8
    • build: Fix ninja warning message on Windows · 69268d3a
      Henrik Gramner authored and committed
      We currently run 'git describe --match' to obtain the current version,
      but meson doesn't properly quote/escape the pattern string on Windows.
      
      As a result, "fatal: Not a valid object name .ninja_log" is printed
      when compiling on Windows systems. Compilation still works, but the
      warning is annoying and misleading.
      
      Currently we don't actually need the pattern matching functionality
      (which is why things still work), so simply remove it as a workaround.
    • xxhash: Add a cast to silence a warning when built with MSVC · 95884615
      Martin Storsjö authored
      This silences the following warning:
      tools/output/xxhash.c(127): warning C4244: '=': conversion from 'unsigned long' to 'unsigned char', possible loss of data
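For illustration, a narrowing store of the kind that can trigger C4244, and the explicit cast that silences it, might look like the following. This is a sketch only: `put_le32` is a hypothetical helper, not the code at `tools/output/xxhash.c(127)`.

```c
#include <stdint.h>

/* Hypothetical helper: store the low 32 bits of `v` as little-endian bytes.
 * Assigning an `unsigned long` expression to a `uint8_t` without the cast
 * is the sort of implicit narrowing that MSVC reports as C4244. */
void put_le32(uint8_t *const dst, const unsigned long v) {
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)((v >> (8 * i)) & 0xff); /* narrowing is intended */
}
```

The cast does not change the stored value, since the mask already limits it to 8 bits; it only makes the intended truncation explicit to the compiler.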
    • lf_mask: Align an array that is accessed via aliasing structures · 0a577fd2
      Martin Storsjö authored
      This fixes bus errors due to missing alignment, when built with GCC 9
      for arm32 with -mfpu=neon.
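The general problem can be sketched as follows, using hypothetical types that are not taken from dav1d (`alias32`, `mask_ctx`, and `clear_row` are assumptions made for this example). A byte array that is also written through a wider aliasing type needs the alignment of that wider type. Otherwise the compiler may emit a store that assumes alignment the array does not have.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aliasing type: four bytes viewed as one 32-bit word. */
typedef union {
    uint8_t  u8[4];
    uint32_t u32;
} alias32;

typedef struct {
    uint8_t flag;
    /* Without alignas, `levels` could start at offset 1, and a 32-bit
     * store through alias32 would then be misaligned. */
    alignas(alias32) uint8_t levels[8][4];
} mask_ctx;

_Static_assert(offsetof(mask_ctx, levels) % alignof(alias32) == 0,
               "levels must be aligned for alias32 access");

/* Clear one 4-byte row with a single store through the aliasing union. */
void clear_row(mask_ctx *const ctx, const int row) {
    *(alias32 *)ctx->levels[row] = (alias32){ .u32 = 0 };
}
```

Each row is exactly 4 bytes, so aligning the start of the array keeps every row aligned as well.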
    • tools: add optional xxh3 based muxer · e6168525
      Janne Grunau authored and Jean-Baptiste Kempf committed
      The required 'xxhash.h' header can either be in the system include
      directory or be copied to 'tools/output'.
      
      The xxh3_128bits based muxer shows no significant slowdown compared to
      the null muxer. Decoding times for Chimera-AV1-8bit-1920x1080-6736kbps.ivf
      with 4 frame and 4 tile threads on a Core i7-8550U (turbo boost disabled):
      
      null:  72.5 s
      md5:   99.8 s
      xxh3:  73.8 s
      
      Decoding Chimera-AV1-10bit-1920x1080-6191kbps.ivf with 6 frame and 4 tile
      threads on an M1 Mac mini:
      
      null:  27.8 s
      md5:  105.9 s
      xxh3:  28.3 s
    • cli: Fix md5 verification for short values · 061ac9ae
      Matthias Dressel authored
      Verification should not succeed if the given string is too short to be a
      real hash.
      
      Fixes videolan/dav1d#361
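A minimal sketch of the kind of check this fix describes, assuming a comparison helper that takes the computed digest and the user-supplied hex string. `md5_hex_matches` and `hex_nibble` are hypothetical names, not dav1d's actual functions. The point is that the length is validated before any bytes are compared, so a truncated string can never match.

```c
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_nibble(const char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: returns 1 only if `hex` is exactly 32 hex digits
 * encoding the 16 bytes in `digest`, and 0 otherwise. */
int md5_hex_matches(const uint8_t digest[16], const char *const hex) {
    if (strlen(hex) != 32) return 0;  /* too short or too long: no match */
    for (int i = 0; i < 16; i++) {
        const int hi = hex_nibble(hex[2 * i]);
        const int lo = hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return 0;
        if (digest[i] != (uint8_t)((hi << 4) | lo)) return 0;
    }
    return 1;
}
```

A comparison that stops at the end of the supplied string would accept any prefix of the real hash, including the empty string, which is the failure mode the commit message describes.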
  3. Feb 06, 2021
  4. Feb 05, 2021
    • Fix potential deadlock · 8b1a96e4
      Victorien Le Couviour--Tuffet authored and Jean-Baptiste Kempf committed
      If allocation of the postfilter tasks failed, a deadlock would occur.
    • arm64: warped motion: Various optimizations · a3b8157e
      Kyle Siefring authored and Martin Storsjö committed
      - Reorder loads of filters to benefit in-order cores.
      - Use full 128-bit vectors to transpose 8x8 bytes. zip1 is called in the
        first stage, which will hurt performance on some older big cores.
      - Rework horz stage for 8 bit mode:
          * Use smull instead of mul
          * Replace existing narrow and long instructions
          * Replace mov after calling with right shift
      
      Before:            Cortex A55    A53     A72     A73
      warp_8x8_8bpc_neon:    1683.2  1860.6  1065.0  1102.6
      warp_8x8t_8bpc_neon:   1673.2  1846.4  1057.0  1098.4
      warp_8x8_16bpc_neon:   1870.7  2031.7  1147.3  1220.7
      warp_8x8t_16bpc_neon:  1848.0  2006.2  1121.6  1188.0
      After:
      warp_8x8_8bpc_neon:    1267.2  1446.2   807.0   871.5
      warp_8x8t_8bpc_neon:   1245.4  1422.0   810.2   868.4
      warp_8x8_16bpc_neon:   1769.8  1929.3  1132.0  1238.2
      warp_8x8t_16bpc_neon:  1747.3  1904.1  1101.5  1207.9
      
      Cortex-A55
      Before:
      warp_8x8_8bpc_neon:   1683.2
      warp_8x8t_8bpc_neon:  1673.2
      warp_8x8_16bpc_neon:  1870.7
      warp_8x8t_16bpc_neon: 1848.0
      After:
      warp_8x8_8bpc_neon:   1267.2
      warp_8x8t_8bpc_neon:  1245.4
      warp_8x8_16bpc_neon:  1769.8
      warp_8x8t_16bpc_neon: 1747.3
    • arm64: loopfilter: Avoid leaving 8-bits · 833382b3
      Kyle Siefring authored and Martin Storsjö committed
      Avoid moving between 8 and 16-bit vectors where possible.
  5. Feb 04, 2021
  6. Feb 02, 2021
  7. Feb 01, 2021
  8. Jan 28, 2021
  9. Jan 25, 2021
  10. Jan 21, 2021
  11. Jan 20, 2021