  1. Mar 07, 2020
  2. Mar 06, 2020
    • Konstantin Pavlov's avatar
      CI: add examples job build · 55439739
      Konstantin Pavlov authored
      This currently does not check the vulkan/placebo codepath since needed
      packages are not yet in Debian unstable.
      55439739
    • Konstantin Pavlov's avatar
      examples: fail when SDL is not found · e36ebb6f
      Konstantin Pavlov authored
      Now when -Denable_examples=true is requested, meson will fail as
      expected if there is no SDL available.
      e36ebb6f
    • Konstantin Pavlov's avatar
      CI: Add documentation CI job · b8200c13
      Konstantin Pavlov authored
      This requires a docker image with doxygen & dot installed, so bump it as
      well.
      
      Fixes #334.
      b8200c13
    • Konstantin Pavlov's avatar
      CI: Deduplicate and template jobs · bf60f0ab
      Konstantin Pavlov authored
      This makes it much easier to introduce new jobs without copying walls of
      text over and over.  No functional changes.
      
      Changes are:
       - move docker images to common templates to make them easier to bump
       - replace "debian" tag with "docker" to choose runners
       - align meson parameters
       - use variables sections where applicable
       - move test data cache to before_script
      bf60f0ab
    • Jan Beich's avatar
      examples: chase cacc8e35 · e04227c5
      Jan Beich authored
      ../examples/dav1dplay.c:1030:5: warning: implicit declaration of function 'init_demuxers' is invalid in C99 [-Wimplicit-function-declaration]
          init_demuxers();
          ^
      /usr/bin/ld.bfd: examples/c590b3c@@dav1dplay@exe/dav1dplay.c.o: in function `decoder_thread_main':
      dav1dplay.c:(.text+0x1243): undefined reference to `init_demuxers'
      cc: error: linker command failed with exit code 1 (use -v to see invocation)
      e04227c5
  3. Mar 05, 2020
    • Jean-Baptiste Kempf's avatar
      Update NEWS for 0.6.0 · efd9e551
      Jean-Baptiste Kempf authored
      0.6.0
      efd9e551
    • Martin Storsjö's avatar
      arm64: mc: NEON implementation of w_mask for 16 bpc · c8aaddea
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      Checkasm numbers:          Cortex A53       A72       A73
      w_mask_420_w4_16bpc_neon:       173.6     123.5     120.3
      w_mask_420_w8_16bpc_neon:       484.2     344.1     329.5
      w_mask_420_w16_16bpc_neon:     1411.2    1027.4    1035.1
      w_mask_420_w32_16bpc_neon:     5561.5    4093.2    3980.1
      w_mask_420_w64_16bpc_neon:    13809.6    9856.5    9581.0
      w_mask_420_w128_16bpc_neon:   35614.7   25553.8   24284.4
      w_mask_422_w4_16bpc_neon:       159.4     112.2     114.2
      w_mask_422_w8_16bpc_neon:       453.4     326.1     326.7
      w_mask_422_w16_16bpc_neon:     1394.6    1062.3    1050.2
      w_mask_422_w32_16bpc_neon:     5485.8    4219.6    4027.3
      w_mask_422_w64_16bpc_neon:    13701.2   10079.6    9692.6
      w_mask_422_w128_16bpc_neon:   35455.3   25892.5   24625.9
      w_mask_444_w4_16bpc_neon:       153.0     112.3     112.7
      w_mask_444_w8_16bpc_neon:       437.2     331.8     325.8
      w_mask_444_w16_16bpc_neon:     1395.1    1069.1    1041.7
      w_mask_444_w32_16bpc_neon:     5370.1    4213.5    4138.1
      w_mask_444_w64_16bpc_neon:    13482.6   10190.5   10004.6
      w_mask_444_w128_16bpc_neon:   35583.7   26911.2   25638.8
      
      Corresponding numbers for 8 bpc for comparison:
      
      w_mask_420_w4_8bpc_neon:        126.6      79.1      87.7
      w_mask_420_w8_8bpc_neon:        343.9     195.0     211.5
      w_mask_420_w16_8bpc_neon:       886.3     540.3     577.7
      w_mask_420_w32_8bpc_neon:      3558.6    2152.4    2216.7
      w_mask_420_w64_8bpc_neon:      8894.9    5161.2    5297.0
      w_mask_420_w128_8bpc_neon:    22520.1   13514.5   13887.2
      w_mask_422_w4_8bpc_neon:        112.9      68.2      77.0
      w_mask_422_w8_8bpc_neon:        314.4     175.5     208.7
      w_mask_422_w16_8bpc_neon:       835.5     565.0     608.3
      w_mask_422_w32_8bpc_neon:      3381.3    2231.8    2287.6
      w_mask_422_w64_8bpc_neon:      8499.4    5343.6    5460.8
      w_mask_422_w128_8bpc_neon:    21823.3   14206.5   14249.1
      w_mask_444_w4_8bpc_neon:        104.6      65.8      72.7
      w_mask_444_w8_8bpc_neon:        290.4     173.7     196.6
      w_mask_444_w16_8bpc_neon:       831.4     586.7     591.7
      w_mask_444_w32_8bpc_neon:      3320.8    2300.6    2251.0
      w_mask_444_w64_8bpc_neon:      8300.0    5480.5    5346.8
      w_mask_444_w128_8bpc_neon:    21633.8   15981.3   14384.8
      c8aaddea
    • Janne Grunau's avatar
      CI: run a selection of jobs on a node with avx2 · bce8fae9
      Janne Grunau authored
      Switches build-debian (for AVX2 checkasm coverage), test-win64 and
      test-debian-unaligned-stack (for testing asm '%if's) to a runner with AVX2.
      Refs #330, #333
      bce8fae9
  4. Mar 04, 2020
    • Henrik Gramner's avatar
      x86: Fix crash in AVX2 cdef_filter with <32-byte stack alignment · 3a6a55d8
      Henrik Gramner authored and Henrik Gramner committed
      3a6a55d8
    • Martin Storsjö's avatar
      arm64: mc: NEON implementation of blend for 16bpc · fb348f64
      Martin Storsjö authored
      Checkasm numbers:     Cortex A53     A72     A73
      blend_h_w2_16bpc_neon:     109.3    83.1    56.7
      blend_h_w4_16bpc_neon:     114.1    61.4    62.3
      blend_h_w8_16bpc_neon:     133.3    80.8    81.1
      blend_h_w16_16bpc_neon:    215.6   132.7   149.5
      blend_h_w32_16bpc_neon:    390.4   254.2   235.8
      blend_h_w64_16bpc_neon:    719.1   456.3   453.8
      blend_h_w128_16bpc_neon:  1646.1  1112.3  1065.9
      blend_v_w2_16bpc_neon:     185.9   175.9   180.0
      blend_v_w4_16bpc_neon:     338.0   183.4   232.1
      blend_v_w8_16bpc_neon:     426.5   213.8   250.6
      blend_v_w16_16bpc_neon:    678.2   357.8   382.6
      blend_v_w32_16bpc_neon:   1098.3   686.2   695.6
      blend_w4_16bpc_neon:        75.7    31.5    32.0
      blend_w8_16bpc_neon:       134.0    75.0    75.8
      blend_w16_16bpc_neon:      467.9   267.3   310.0
      blend_w32_16bpc_neon:     1201.9   658.7   779.7
      
      Corresponding numbers for 8bpc for comparison:
      blend_h_w2_8bpc_neon:      104.1    55.9    60.8
      blend_h_w4_8bpc_neon:      108.9    58.7    48.2
      blend_h_w8_8bpc_neon:       99.3    64.4    67.4
      blend_h_w16_8bpc_neon:     145.2    93.4    85.1
      blend_h_w32_8bpc_neon:     262.2   157.5   148.6
      blend_h_w64_8bpc_neon:     466.7   278.9   256.6
      blend_h_w128_8bpc_neon:   1054.2   624.7   571.0
      blend_v_w2_8bpc_neon:      170.5   106.6   113.4
      blend_v_w4_8bpc_neon:      333.0   189.9   225.9
      blend_v_w8_8bpc_neon:      314.9   199.0   203.5
      blend_v_w16_8bpc_neon:     476.9   300.8   241.1
      blend_v_w32_8bpc_neon:     766.9   430.4   415.1
      blend_w4_8bpc_neon:         66.7    35.4    26.0
      blend_w8_8bpc_neon:        110.7    47.9    48.1
      blend_w16_8bpc_neon:       299.4   161.8   162.3
      blend_w32_8bpc_neon:       725.8   417.0   432.8
      fb348f64
    • Martin Storsjö's avatar
      arm: mc: Optimize blend_v · 52e9b435
      Martin Storsjö authored
      Use a post-increment with a register on the last increment, avoiding
      a separate increment. Avoid processing the last 8 pixels in the w32
      case when we only output 24 pixels.
      
      Before:
      ARM32                Cortex A7      A8      A9     A53     A72     A73
      blend_v_w4_8bpc_neon:    450.4   574.7   538.7   374.6   199.3   260.5
      blend_v_w8_8bpc_neon:    559.6   351.3   552.5   357.6   214.8   204.3
      blend_v_w16_8bpc_neon:   926.3   511.6   787.9   593.0   271.0   246.8
      blend_v_w32_8bpc_neon:  1482.5   917.0  1149.5   991.9   354.0   368.9
      ARM64
      blend_v_w4_8bpc_neon:                            351.1   200.0   224.1
      blend_v_w8_8bpc_neon:                            333.0   212.4   203.8
      blend_v_w16_8bpc_neon:                           495.2   302.0   247.0
      blend_v_w32_8bpc_neon:                           840.0   557.8   514.0
      
      After:
      ARM32
      blend_v_w4_8bpc_neon:    435.5   575.8   537.6   356.2   198.3   259.5
      blend_v_w8_8bpc_neon:    545.2   347.9   553.5   339.1   207.8   204.2
      blend_v_w16_8bpc_neon:   913.7   511.0   788.1   573.7   275.4   243.3
      blend_v_w32_8bpc_neon:  1445.3   951.2  1079.1   920.4   352.2   361.6
      ARM64
      blend_v_w4_8bpc_neon:                            333.0   191.3   225.9
      blend_v_w8_8bpc_neon:                            314.9   199.3   203.5
      blend_v_w16_8bpc_neon:                           476.9   301.3   241.1
      blend_v_w32_8bpc_neon:                           766.9   432.8   416.9
      52e9b435
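
The "24 of 32 pixels" remark in the commit above can be illustrated with a scalar sketch. This is not the dav1d source: the function names, the 0..64 mask weighting and the general "3/4 of the width" rule are assumptions made for illustration; the commit message itself only confirms that the w32 case outputs 24 pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one pixel with a weight m in [0, 64] (assumed weighting). */
static inline uint8_t blend_px(uint8_t a, uint8_t b, uint8_t m)
{
    return (uint8_t)((a * (64 - m) + b * m + 32) >> 6);
}

/* Hypothetical scalar blend_v: only the left 3/4 of each row is written,
 * so for w = 32 the last 8 pixels of every row are left untouched. */
static void blend_v_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                           const uint8_t *tmp, int w, int h,
                           const uint8_t *mask)
{
    assert(w >= 2 && h >= 1);
    const int out_w = (w * 3) >> 2;   /* 24 when w == 32 */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < out_w; x++)
            dst[x] = blend_px(dst[x], tmp[x], mask[x]);
        dst += dst_stride;
        tmp += w;                     /* tmp rows are assumed tightly packed */
    }
}
```

A SIMD version that works on 16 or 32 pixels at a time has to handle this 3/4 boundary specially, and skipping the loads and arithmetic for the untouched tail is the kind of saving the commit describes.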
    • Martin Storsjö's avatar
      arm64: mc: Fix indentation · 48ffb05e
      Martin Storsjö authored
      48ffb05e
    • Martin Storsjö's avatar
      arm64: mc: Use more intuitive lane specifications for loads/stores · 83c62716
      Martin Storsjö authored
      For loads where we load/store a full or half register (instead of
      a lanewise load/store), the lane specification in itself doesn't
      matter, only its size.
      
      This doesn't change the generated code, but makes it more readable.
      83c62716
  5. Mar 03, 2020
  6. Mar 02, 2020
    • Martin Storsjö's avatar
      arm64: loopfilter: NEON implementation of loopfilter for 16 bpc · 360243c2
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      Checkasm runtimes:      Cortex A53     A72     A73
      lpf_h_sb_uv_w4_16bpc_neon:   919.0   795.0   714.9
      lpf_h_sb_uv_w6_16bpc_neon:  1267.7  1116.2  1081.9
      lpf_h_sb_y_w4_16bpc_neon:   1500.2  1543.9  1778.5
      lpf_h_sb_y_w8_16bpc_neon:   2216.1  2183.0  2568.1
      lpf_h_sb_y_w16_16bpc_neon:  2641.8  2630.4  2639.4
      lpf_v_sb_uv_w4_16bpc_neon:   836.5   572.7   667.3
      lpf_v_sb_uv_w6_16bpc_neon:  1130.8   709.1   955.5
      lpf_v_sb_y_w4_16bpc_neon:   1271.6  1434.4  1272.1
      lpf_v_sb_y_w8_16bpc_neon:   1818.0  1759.1  1664.6
      lpf_v_sb_y_w16_16bpc_neon:  1998.6  2115.8  1586.6
      
      Corresponding numbers for 8 bpc for comparison:
      lpf_h_sb_uv_w4_8bpc_neon:    799.4   632.8   695.4
      lpf_h_sb_uv_w6_8bpc_neon:   1067.3   613.6   767.5
      lpf_h_sb_y_w4_8bpc_neon:    1490.5  1179.1  1018.9
      lpf_h_sb_y_w8_8bpc_neon:    1892.9  1382.0  1172.0
      lpf_h_sb_y_w16_8bpc_neon:   2117.4  1625.4  1739.0
      lpf_v_sb_uv_w4_8bpc_neon:    447.1   447.7   446.0
      lpf_v_sb_uv_w6_8bpc_neon:    522.1   529.0   513.1
      lpf_v_sb_y_w4_8bpc_neon:    1043.7   785.0   775.9
      lpf_v_sb_y_w8_8bpc_neon:    1500.4  1115.9   881.2
      lpf_v_sb_y_w16_8bpc_neon:   1493.5  1371.4  1248.5
      360243c2
    • Martin Storsjö's avatar
      arm: loopfilter: Prepare for 16 bpc · ebbf91f4
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      ebbf91f4
    • Martin Storsjö's avatar
      arm: loopfilter: Fix a comment · ac492552
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      ac492552
  7. Feb 25, 2020
  8. Feb 24, 2020
  9. Feb 21, 2020
  10. Feb 20, 2020
  11. Feb 18, 2020
  12. Feb 17, 2020
    • Martin Storsjö's avatar
      arm: cdef: Do an 8 bit implementation for cases with all edges present · b33f46e8
      Martin Storsjö authored
      This increases the code size by around 3 KB on arm64.
      
      Before:
      ARM32:                    Cortex A7      A8      A9     A53     A72     A73
      cdef_filter_4x4_8bpc_neon:    807.1   517.0   617.7   506.6   429.9   357.8
      cdef_filter_4x8_8bpc_neon:   1407.9   899.3  1054.6   862.3   726.5   628.1
      cdef_filter_8x8_8bpc_neon:   2394.9  1456.8  1676.8  1461.2  1084.4  1101.2
      ARM64:
      cdef_filter_4x4_8bpc_neon:                            460.7   301.8   308.0
      cdef_filter_4x8_8bpc_neon:                            831.6   547.0   555.2
      cdef_filter_8x8_8bpc_neon:                           1454.6   935.6   960.4
      
      After:
      ARM32:
      cdef_filter_4x4_8bpc_neon:    669.3   541.3   524.4   424.9   322.7   298.1
      cdef_filter_4x8_8bpc_neon:   1159.1   922.9   881.1   709.2   538.3   514.1
      cdef_filter_8x8_8bpc_neon:   1888.8  1285.4  1358.5  1152.9   839.3   871.2
      ARM64:
      cdef_filter_4x4_8bpc_neon:                            383.6   262.1   259.9
      cdef_filter_4x8_8bpc_neon:                            684.9   472.2   464.7
      cdef_filter_8x8_8bpc_neon:                           1160.0   756.8   788.0
      
      (The checkasm benchmark averages three different cases; the fully
      edged case is one of those three, while it's the most common case
      in actual video. The difference is much bigger if only benchmarking
      that particular case.)
      
      This apparently makes the code slightly slower for the w=4
      cases on Cortex A8, while it is a significant speedup on all other cores.
      b33f46e8
    • Martin Storsjö's avatar
      arm32: cdef: Fix a typo for consistency · aff9a210
      Martin Storsjö authored
      The signedness of elements doesn't matter for vsub; match the vsub.i16
      next to it.
      aff9a210
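
The claim that element signedness does not matter for a plain (non-saturating, non-widening) subtraction can be checked in scalar C. The helpers below are hypothetical and only model one 16-bit lane; they are not NEON intrinsics and not part of dav1d.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract two 16-bit lanes as unsigned values; the result wraps modulo 2^16. */
static uint16_t sub_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a - b);
}

/* Subtract the same bit patterns interpreted as signed lanes and return the
 * raw result bits. The uint16_t -> int16_t conversion assumes the usual
 * two's complement behaviour. */
static uint16_t sub_s16_bits(uint16_t a, uint16_t b)
{
    const int32_t d = (int32_t)(int16_t)a - (int32_t)(int16_t)b;
    return (uint16_t)d;
}

/* Compare both interpretations over a strided sample of all input pairs. */
static void check_sub_lanes(void)
{
    for (uint32_t a = 0; a < 65536; a += 257)
        for (uint32_t b = 0; b < 65536; b += 251)
            assert(sub_u16((uint16_t)a, (uint16_t)b) ==
                   sub_s16_bits((uint16_t)a, (uint16_t)b));
}
```

Because both versions yield identical bits, the choice of type suffix on such a subtraction is purely a readability and consistency matter, which is what the commit addresses.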
  13. Feb 16, 2020
    • Henrik Gramner's avatar
      cli: Implement line buffering in print_stats() · 09d90658
      Henrik Gramner authored
      Console output is incredibly slow on Windows, which is aggravated by
      the lack of line buffering. As a result, a significant percentage of
      overall runtime is actually spent displaying the decoding progress.
      
      Doing the line buffering manually alleviates most of the issue.
      09d90658
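
A minimal sketch of manual line buffering, assuming a progress line is assembled in a local buffer and handed to the stream in one call. The function names, the message layout and the parameters are hypothetical and do not reproduce the actual print_stats() in dav1d.

```c
#include <assert.h>
#include <stdio.h>

/* Format one progress line into buf. Returns the number of characters
 * stored, excluding the terminating NUL; the line is truncated to fit. */
static int format_stats(char *buf, size_t size,
                        unsigned n, unsigned total, double fps)
{
    assert(size > 0);
    int len = snprintf(buf, size, "\rDecoded %u/%u frames (%.1f%%) - %.2f fps",
                       n, total, total ? 100.0 * n / total : 0.0, fps);
    if (len < 0) {                 /* encoding error: emit nothing */
        buf[0] = '\0';
        return 0;
    }
    if ((size_t)len >= size)       /* output was truncated */
        len = (int)size - 1;
    return len;
}

/* Pass the whole line to the stream in a single call instead of several
 * small fprintf() calls, each of which may hit a slow, unbuffered console. */
static void print_stats_sketch(FILE *out,
                               unsigned n, unsigned total, double fps)
{
    char buf[128];
    const int len = format_stats(buf, sizeof(buf), n, total, fps);
    fwrite(buf, 1, (size_t)len, out);
    fflush(out);
}
```

Calling `print_stats_sketch(stderr, 5, 10, 25.0)` writes one complete line per update, so the cost of slow console output is paid once per line rather than once per formatted field.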