1. 25 Mar, 2020 7 commits
    • x86: add AVX2 SIMD for ipred.cfl_ac[444] · 7f2833a9
      Ronald S. Bultje authored
      cfl_ac_444_w4_8bpc_c: 499.1
      cfl_ac_444_w4_8bpc_ssse3: 24.3
      cfl_ac_444_w4_8bpc_avx2: 28.9
      cfl_ac_444_w8_8bpc_c: 1240.2
      cfl_ac_444_w8_8bpc_ssse3: 47.4
      cfl_ac_444_w8_8bpc_avx2: 34.9
      cfl_ac_444_w16_8bpc_c: 1785.7
      cfl_ac_444_w16_8bpc_ssse3: 86.7
      cfl_ac_444_w16_8bpc_avx2: 54.6
      cfl_ac_444_w32_8bpc_c: 4343.5
      cfl_ac_444_w32_8bpc_ssse3: 236.5
      cfl_ac_444_w32_8bpc_avx2: 113.6
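
      The figures above can be turned into relative speedups with a few lines of Python. This is only a sketch: it assumes lower checkasm numbers are better, and the `speedup` helper and `cfl_ac_444` table are illustrative names, not part of dav1d.

```python
# Checkasm figures copied from the commit message above, keyed by block width.
# Tuple order: (C reference, SSSE3, AVX2). Assumption: lower is better.
cfl_ac_444 = {
    4: (499.1, 24.3, 28.9),
    8: (1240.2, 47.4, 34.9),
    16: (1785.7, 86.7, 54.6),
    32: (4343.5, 236.5, 113.6),
}


def speedup(baseline, optimized):
    """Return how many times faster `optimized` is than `baseline`."""
    return baseline / optimized


for width, (c, ssse3, avx2) in sorted(cfl_ac_444.items()):
    print(
        f"w{width}: AVX2 vs C {speedup(c, avx2):.1f}x, "
        f"AVX2 vs SSSE3 {speedup(ssse3, avx2):.2f}x"
    )
```

      A ratio below 1 in the second column means the AVX2 version is slower than SSSE3 for that width, which is the case for w4 in the numbers above.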
    • checkasm: add proper restrictions for h/w_pad in ipred.cfl_ac[444/422] · a02ed9c6
      Ronald S. Bultje authored
      h_pad and w_pad are always even when ss_ver=0 or ss_hor=0, respectively.
      This means certain special cases don't need to be implemented in SIMD
      while still guaranteeing correct decoding, and thus we don't want to
      test for these special cases in the checkasm test either.
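
      A minimal sketch of how a test could enumerate only the padding values that can occur. The function name, the assumption that padding is counted in units of 4 pixels, and the upper bound `(size >> 2) - 1` are illustrative assumptions, not taken from the checkasm source.

```python
def valid_pads(w, h, ss_hor, ss_ver):
    """List the (w_pad, h_pad) pairs a test needs to cover.

    Assumption: without subsampling in a direction (ss_* == 0) the padding
    in that direction is always even, so the loop steps by 2; with
    subsampling (ss_* == 1) every value can occur and the step is 1.
    """
    w_step = 2 - ss_hor
    h_step = 2 - ss_ver
    # Hypothetical upper bounds: at least one 4-pixel unit is never padded.
    max_w_pad = max((w >> 2) - 1, 0)
    max_h_pad = max((h >> 2) - 1, 0)
    return [
        (w_pad, h_pad)
        for w_pad in range(0, max_w_pad + 1, w_step)
        for h_pad in range(0, max_h_pad + 1, h_step)
    ]


# 4:4:4 (no subsampling): only even paddings in both directions.
pads_444 = valid_pads(16, 16, ss_hor=0, ss_ver=0)
# 4:2:2 (horizontal subsampling only): w_pad may be odd, h_pad stays even.
pads_422 = valid_pads(16, 16, ss_hor=1, ss_ver=0)
```

      With this shape, the odd-padding cases are never generated for the unsubsampled direction, so the SIMD code does not need to handle them.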
    • x86: add SSSE3 SIMD for generate_grain_uv_{422,444} · 6b85daf0
      Ronald S. Bultje authored
      gen_grain_uv_ar0_8bpc_420_c: 72275.4
      gen_grain_uv_ar0_8bpc_420_ssse3: 7274.8
      gen_grain_uv_ar0_8bpc_422_c: 111742.9
      gen_grain_uv_ar0_8bpc_422_ssse3: 13724.8
      gen_grain_uv_ar0_8bpc_444_c: 205688.5
      gen_grain_uv_ar0_8bpc_444_ssse3: 26218.3
      gen_grain_uv_ar1_8bpc_420_c: 100682.5
      gen_grain_uv_ar1_8bpc_420_ssse3: 20168.4
      gen_grain_uv_ar1_8bpc_422_c: 167931.4
      gen_grain_uv_ar1_8bpc_422_ssse3: 39524.7
      gen_grain_uv_ar1_8bpc_444_c: 323812.2
      gen_grain_uv_ar1_8bpc_444_ssse3: 77930.3
      gen_grain_uv_ar2_8bpc_420_c: 159545.7
      gen_grain_uv_ar2_8bpc_420_ssse3: 25849.7
      gen_grain_uv_ar2_8bpc_422_c: 295959.9
      gen_grain_uv_ar2_8bpc_422_ssse3: 49286.6
      gen_grain_uv_ar2_8bpc_444_c: 571862.2
      gen_grain_uv_ar2_8bpc_444_ssse3: 98814.2
      gen_grain_uv_ar3_8bpc_420_c: 243445.9
      gen_grain_uv_ar3_8bpc_420_ssse3: 28806.2
      gen_grain_uv_ar3_8bpc_422_c: 458189.9
      gen_grain_uv_ar3_8bpc_422_ssse3: 56629.9
      gen_grain_uv_ar3_8bpc_444_c: 883627.3
      gen_grain_uv_ar3_8bpc_444_ssse3: 114761.2
      
      Also contains slight fixes to generate_grain_uv.ar0 to not pack before
      adding the current grain value. Fixes overflows in e.g. seed=1115072968.
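
      The following sketch illustrates one way that packing too early can overflow. The commit message does not say which instructions were involved, so this is an assumption: the intermediate is narrowed to signed 8-bit with saturation and then added with a wrapping 8-bit add. All function names are hypothetical.

```python
GRAIN_MIN, GRAIN_MAX = -128, 127  # signed 8-bit range


def clamp8(x):
    """Saturate to signed 8-bit, as a saturating pack would."""
    return max(GRAIN_MIN, min(GRAIN_MAX, x))


def wrap8(x):
    """Wrap to signed 8-bit, as a non-saturating byte add would."""
    return ((x + 128) & 0xFF) - 128


def pack_then_add(luma_term, grain):
    # Narrow the intermediate first, then add in 8-bit: the sum can wrap.
    return wrap8(clamp8(luma_term) + grain)


def add_then_pack(luma_term, grain):
    # Add at full width first, then saturate once at the end.
    return clamp8(luma_term + grain)


# Both orders agree while the sum stays within range ...
in_range = (pack_then_add(50, 20), add_then_pack(50, 20))
# ... but differ once the sum exceeds the 8-bit range.
overflowing = (pack_then_add(100, 100), add_then_pack(100, 100))
```

      Under these assumptions, adding before packing keeps the result saturated at the range limit instead of wrapping to a value of the opposite sign.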
    • x86: add AVX2 SIMD for generate_grain_uv_{422,444} · dab82163
      Ronald S. Bultje authored
      gen_grain_uv_ar0_8bpc_420_c: 72275.4
      gen_grain_uv_ar0_8bpc_420_avx2: 7253.4
      gen_grain_uv_ar0_8bpc_422_c: 111742.9
      gen_grain_uv_ar0_8bpc_422_avx2: 13704.1
      gen_grain_uv_ar0_8bpc_444_c: 205688.5
      gen_grain_uv_ar0_8bpc_444_avx2: 25007.5
      gen_grain_uv_ar1_8bpc_420_c: 100682.5
      gen_grain_uv_ar1_8bpc_420_avx2: 18434.4
      gen_grain_uv_ar1_8bpc_422_c: 167931.4
      gen_grain_uv_ar1_8bpc_422_avx2: 37817.9
      gen_grain_uv_ar1_8bpc_444_c: 323812.2
      gen_grain_uv_ar1_8bpc_444_avx2: 74049.6
      gen_grain_uv_ar2_8bpc_420_c: 159545.7
      gen_grain_uv_ar2_8bpc_420_avx2: 23994.0
      gen_grain_uv_ar2_8bpc_422_c: 295959.9
      gen_grain_uv_ar2_8bpc_422_avx2: 48103.5
      gen_grain_uv_ar2_8bpc_444_c: 571862.2
      gen_grain_uv_ar2_8bpc_444_avx2: 93044.6
      gen_grain_uv_ar3_8bpc_420_c: 243445.9
      gen_grain_uv_ar3_8bpc_420_avx2: 27698.3
      gen_grain_uv_ar3_8bpc_422_c: 458189.9
      gen_grain_uv_ar3_8bpc_422_avx2: 54183.1
      gen_grain_uv_ar3_8bpc_444_c: 883627.3
      gen_grain_uv_ar3_8bpc_444_avx2: 103296.7
      
      Also contains slight fixes to generate_grain_uv.ar0 to not pack before
      adding the current grain value. Fixes overflows in e.g. seed=1115072968.
    • Clean up dav1d_ref_create · bf8d6400
      Luc Trudeau authored
    • const correctness in thread_task · 1aaa5836
      Luc Trudeau authored
    • Make insert_border src pointer const · 1d3f0266
      Luc Trudeau authored
  2. 24 Mar, 2020 6 commits
  3. 22 Mar, 2020 2 commits
  4. 21 Mar, 2020 6 commits
  5. 15 Mar, 2020 1 commit
  6. 08 Mar, 2020 2 commits
  7. 07 Mar, 2020 4 commits
  8. 06 Mar, 2020 6 commits
    • CI: add examples job build · 55439739
      Konstantin Pavlov authored
      This currently does not check the vulkan/placebo codepath since needed
      packages are not yet in Debian unstable.
    • examples: fail when SDL is not found · e36ebb6f
      Konstantin Pavlov authored
      Now when -Denable_examples=true is requested, meson will fail as
      expected if there is no SDL available.
    • CI: Add documentation CI job · b8200c13
      Konstantin Pavlov authored
      This requires a docker image with doxygen & dot installed, so bump it as
      well.
      
      Fixes #334.
    • CI: Deduplicate and template jobs · bf60f0ab
      Konstantin Pavlov authored
      This makes it much easier to introduce new jobs without copying walls of
      text over and over.  No functional changes.
      
      Changes are:
       - move docker images to common templates to make them easier to bump
       - replace "debian" tag with "docker" to choose runners
       - align meson parameters
       - use variables sections where applicable
       - move test data cache to before_script
    • Konstantin Pavlov's avatar
    • examples: chase cacc8e35 · e04227c5
      Jan Beich authored
      ../examples/dav1dplay.c:1030:5: warning: implicit declaration of function 'init_demuxers' is invalid in C99 [-Wimplicit-function-declaration]
          init_demuxers();
          ^
      /usr/bin/ld.bfd: examples/c590b3c@@dav1dplay@exe/dav1dplay.c.o: in function `decoder_thread_main':
      dav1dplay.c:(.text+0x1243): undefined reference to `init_demuxers'
      cc: error: linker command failed with exit code 1 (use -v to see invocation)
  9. 05 Mar, 2020 3 commits
    • Update NEWS for 0.6.0 · efd9e551
      Jean-Baptiste Kempf authored
    • arm64: mc: NEON implementation of w_mask for 16 bpc · c8aaddea
      Martin Storsjö authored
      Checkasm numbers:          Cortex A53       A72       A73
      w_mask_420_w4_16bpc_neon:       173.6     123.5     120.3
      w_mask_420_w8_16bpc_neon:       484.2     344.1     329.5
      w_mask_420_w16_16bpc_neon:     1411.2    1027.4    1035.1
      w_mask_420_w32_16bpc_neon:     5561.5    4093.2    3980.1
      w_mask_420_w64_16bpc_neon:    13809.6    9856.5    9581.0
      w_mask_420_w128_16bpc_neon:   35614.7   25553.8   24284.4
      w_mask_422_w4_16bpc_neon:       159.4     112.2     114.2
      w_mask_422_w8_16bpc_neon:       453.4     326.1     326.7
      w_mask_422_w16_16bpc_neon:     1394.6    1062.3    1050.2
      w_mask_422_w32_16bpc_neon:     5485.8    4219.6    4027.3
      w_mask_422_w64_16bpc_neon:    13701.2   10079.6    9692.6
      w_mask_422_w128_16bpc_neon:   35455.3   25892.5   24625.9
      w_mask_444_w4_16bpc_neon:       153.0     112.3     112.7
      w_mask_444_w8_16bpc_neon:       437.2     331.8     325.8
      w_mask_444_w16_16bpc_neon:     1395.1    1069.1    1041.7
      w_mask_444_w32_16bpc_neon:     5370.1    4213.5    4138.1
      w_mask_444_w64_16bpc_neon:    13482.6   10190.5   10004.6
      w_mask_444_w128_16bpc_neon:   35583.7   26911.2   25638.8
      
      Corresponding numbers for 8 bpc for comparison:
      
      w_mask_420_w4_8bpc_neon:        126.6      79.1      87.7
      w_mask_420_w8_8bpc_neon:        343.9     195.0     211.5
      w_mask_420_w16_8bpc_neon:       886.3     540.3     577.7
      w_mask_420_w32_8bpc_neon:      3558.6    2152.4    2216.7
      w_mask_420_w64_8bpc_neon:      8894.9    5161.2    5297.0
      w_mask_420_w128_8bpc_neon:    22520.1   13514.5   13887.2
      w_mask_422_w4_8bpc_neon:        112.9      68.2      77.0
      w_mask_422_w8_8bpc_neon:        314.4     175.5     208.7
      w_mask_422_w16_8bpc_neon:       835.5     565.0     608.3
      w_mask_422_w32_8bpc_neon:      3381.3    2231.8    2287.6
      w_mask_422_w64_8bpc_neon:      8499.4    5343.6    5460.8
      w_mask_422_w128_8bpc_neon:    21823.3   14206.5   14249.1
      w_mask_444_w4_8bpc_neon:        104.6      65.8      72.7
      w_mask_444_w8_8bpc_neon:        290.4     173.7     196.6
      w_mask_444_w16_8bpc_neon:       831.4     586.7     591.7
      w_mask_444_w32_8bpc_neon:      3320.8    2300.6    2251.0
      w_mask_444_w64_8bpc_neon:      8300.0    5480.5    5346.8
      w_mask_444_w128_8bpc_neon:    21633.8   15981.3   14384.8
    • CI: run a selection of jobs on a node with avx2 · bce8fae9
      Janne Grunau authored
      Switches build-debian (for avx2 checkasm coverage), test-win64 and
      test-debian-unaligned-stack (for testing asm '%if's) to the avx2 node.
      Refs #330, #333
  10. 04 Mar, 2020 3 commits
    • Henrik Gramner's avatar
    • arm64: mc: NEON implementation of blend for 16bpc · fb348f64
      Martin Storsjö authored
      Checkasm numbers:     Cortex A53     A72     A73
      blend_h_w2_16bpc_neon:     109.3    83.1    56.7
      blend_h_w4_16bpc_neon:     114.1    61.4    62.3
      blend_h_w8_16bpc_neon:     133.3    80.8    81.1
      blend_h_w16_16bpc_neon:    215.6   132.7   149.5
      blend_h_w32_16bpc_neon:    390.4   254.2   235.8
      blend_h_w64_16bpc_neon:    719.1   456.3   453.8
      blend_h_w128_16bpc_neon:  1646.1  1112.3  1065.9
      blend_v_w2_16bpc_neon:     185.9   175.9   180.0
      blend_v_w4_16bpc_neon:     338.0   183.4   232.1
      blend_v_w8_16bpc_neon:     426.5   213.8   250.6
      blend_v_w16_16bpc_neon:    678.2   357.8   382.6
      blend_v_w32_16bpc_neon:   1098.3   686.2   695.6
      blend_w4_16bpc_neon:        75.7    31.5    32.0
      blend_w8_16bpc_neon:       134.0    75.0    75.8
      blend_w16_16bpc_neon:      467.9   267.3   310.0
      blend_w32_16bpc_neon:     1201.9   658.7   779.7
      
      Corresponding numbers for 8bpc for comparison:
      blend_h_w2_8bpc_neon:      104.1    55.9    60.8
      blend_h_w4_8bpc_neon:      108.9    58.7    48.2
      blend_h_w8_8bpc_neon:       99.3    64.4    67.4
      blend_h_w16_8bpc_neon:     145.2    93.4    85.1
      blend_h_w32_8bpc_neon:     262.2   157.5   148.6
      blend_h_w64_8bpc_neon:     466.7   278.9   256.6
      blend_h_w128_8bpc_neon:   1054.2   624.7   571.0
      blend_v_w2_8bpc_neon:      170.5   106.6   113.4
      blend_v_w4_8bpc_neon:      333.0   189.9   225.9
      blend_v_w8_8bpc_neon:      314.9   199.0   203.5
      blend_v_w16_8bpc_neon:     476.9   300.8   241.1
      blend_v_w32_8bpc_neon:     766.9   430.4   415.1
      blend_w4_8bpc_neon:         66.7    35.4    26.0
      blend_w8_8bpc_neon:        110.7    47.9    48.1
      blend_w16_8bpc_neon:       299.4   161.8   162.3
      blend_w32_8bpc_neon:       725.8   417.0   432.8
    • arm: mc: Optimize blend_v · 52e9b435
      Martin Storsjö authored
      Use a post-increment with a register on the last increment, avoiding
      a separate increment. Avoid processing the last 8 pixels in the w32
      case when we only output 24 pixels.
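
      The w32 observation can be illustrated with a scalar sketch. It assumes that blend_v writes only the first 3/4 of each row (24 of 32 pixels, as stated above) and uses a 6-bit mask blend with rounding. The function names and the mask values used below are hypothetical, not the real dav1d tables.

```python
def blend_px(a, b, m):
    """Blend two pixels with a 6-bit mask weight m in [0, 64], with rounding."""
    return (a * (64 - m) + b * m + 32) >> 6


def blend_v_row(dst, tmp, mask):
    """Blend one row, touching only the first 3/4 of the pixels.

    Assumption: the remaining quarter of the row is left unchanged, so
    computing blended values for it would be wasted work.
    """
    w = len(dst)
    out_w = (w * 3) >> 2  # 24 for w == 32
    out = list(dst)
    for x in range(out_w):
        out[x] = blend_px(dst[x], tmp[x], mask[x])
    return out


# Hypothetical inputs: a flat row blended half-and-half with another flat row.
row = blend_v_row([100] * 32, [200] * 32, [32] * 32)
```

      In a vectorised version that handles 8 pixels per step, this corresponds to three steps per w32 row instead of four.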
      
      Before:
      ARM32                Cortex A7      A8      A9     A53     A72     A73
      blend_v_w4_8bpc_neon:    450.4   574.7   538.7   374.6   199.3   260.5
      blend_v_w8_8bpc_neon:    559.6   351.3   552.5   357.6   214.8   204.3
      blend_v_w16_8bpc_neon:   926.3   511.6   787.9   593.0   271.0   246.8
      blend_v_w32_8bpc_neon:  1482.5   917.0  1149.5   991.9   354.0   368.9
      ARM64
      blend_v_w4_8bpc_neon:                            351.1   200.0   224.1
      blend_v_w8_8bpc_neon:                            333.0   212.4   203.8
      blend_v_w16_8bpc_neon:                           495.2   302.0   247.0
      blend_v_w32_8bpc_neon:                           840.0   557.8   514.0
      
      After:
      ARM32
      blend_v_w4_8bpc_neon:    435.5   575.8   537.6   356.2   198.3   259.5
      blend_v_w8_8bpc_neon:    545.2   347.9   553.5   339.1   207.8   204.2
      blend_v_w16_8bpc_neon:   913.7   511.0   788.1   573.7   275.4   243.3
      blend_v_w32_8bpc_neon:  1445.3   951.2  1079.1   920.4   352.2   361.6
      ARM64
      blend_v_w4_8bpc_neon:                            333.0   191.3   225.9
      blend_v_w8_8bpc_neon:                            314.9   199.3   203.5
      blend_v_w16_8bpc_neon:                           476.9   301.3   241.1
      blend_v_w32_8bpc_neon:                           766.9   432.8   416.9