1. 23 Nov, 2020 1 commit
  2. 22 Nov, 2020 1 commit
    • Add more buffer pools · 236e1122
      Henrik Gramner authored
      Add buffer pools for miscellaneous smaller buffers that are
      repeatedly being freed and reallocated.
      
      Also improve dav1d_ref_create() by consolidating two separate
      memory allocations into a single one.
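The consolidation of two allocations into one can be illustrated with a minimal sketch. It assumes a reference-counted header placed in front of its payload; `ExampleRef`, `example_ref_create()` and `example_ref_dec()` are hypothetical names, not dav1d's actual `Dav1dRef` API, and the counter here is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer (illustration only). */
typedef struct ExampleRef {
    void *data;   /* points into the same allocation as this header */
    size_t size;  /* payload size in bytes */
    int ref_cnt;  /* plain int: this sketch is not thread-safe */
} ExampleRef;

/* Allocate header and payload with a single malloc() call. */
static ExampleRef *example_ref_create(size_t size) {
    /* Round the header size up so the payload stays suitably aligned. */
    const size_t align = _Alignof(max_align_t);
    const size_t hdr = (sizeof(ExampleRef) + align - 1) & ~(align - 1);
    if (size > SIZE_MAX - hdr) return NULL; /* overflow guard */

    uint8_t *const mem = malloc(hdr + size);
    if (!mem) return NULL;

    ExampleRef *const ref = (ExampleRef *)mem;
    ref->data = mem + hdr;
    ref->size = size;
    ref->ref_cnt = 1;
    return ref;
}

/* Drop one reference; the last one frees header and payload together. */
static void example_ref_dec(ExampleRef *ref) {
    assert(ref->ref_cnt > 0);
    if (--ref->ref_cnt == 0) free(ref);
}
```

With this layout, creating a reference costs one allocator call instead of two, and releasing it costs one `free()`.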
  3. 20 Nov, 2020 5 commits
    • arm32: mc: NEON implementation of warp8x8 for 16 bpc · dc98fff8
      Martin Storsjö authored
      Checkasm benchmarks:
                          Cortex A7      A8     A53     A72     A73
      warp_8x8_16bpc_neon:   4062.6  2109.4  2462.0  1338.9  1391.1
      warp_8x8t_16bpc_neon:  3996.3  2102.4  2412.0  1273.8  1368.9
      
      Corresponding numbers for arm64, for comparison:
                                         Cortex A53     A72     A73
      warp_8x8_16bpc_neon:                   2037.0  1148.8  1222.0
      warp_8x8t_16bpc_neon:                  2008.0  1120.4  1200.9
    • arm32: cdef: Add NEON implementations of CDEF for 16 bpc · 018e64e7
      Martin Storsjö authored
      Use a shared template file for assembly functions that can be
      templated into 8 and 16 bpc forms, just like in the arm64 version.
      
      Checkasm benchmarks:
                                Cortex A7      A8     A53     A72     A73
      cdef_dir_16bpc_neon:          975.9   853.2   555.2   378.7   386.9
      cdef_filter_4x4_16bpc_neon:   746.9   521.7   481.2   333.0   340.8
      cdef_filter_4x8_16bpc_neon:  1300.0   885.5   816.3   582.7   599.5
      cdef_filter_8x8_16bpc_neon:  2282.5  1415.0  1417.6  1059.0  1076.3
      
      Corresponding numbers for arm64, for comparison:
                                               Cortex A53     A72     A73
      cdef_dir_16bpc_neon:                          418.0   306.7   310.7
      cdef_filter_4x4_16bpc_neon:                   453.4   282.9   297.4
      cdef_filter_4x8_16bpc_neon:                   807.5   514.2   533.8
      cdef_filter_8x8_16bpc_neon:                  1425.2   924.4   942.0
    • arm64: cdef: Fix a comment typo · c48ea15f
      Martin Storsjö authored
    • Update THANKS.md · ba875b96
      Matthias Dressel authored
  4. 18 Nov, 2020 1 commit
  5. 17 Nov, 2020 1 commit
  6. 16 Nov, 2020 3 commits
    • Add a picture buffer pool · 9057d286
      Henrik Gramner authored and Jean-Baptiste Kempf committed
      Reuse buffers allocated for picture data instead of constantly
      freeing and allocating new ones.
      
      The impact of this can vary significantly between systems. It is
      particularly beneficial on Windows, where it can result in an
      overall performance increase of up to 10% in some cases.
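The reuse pattern described in this commit can be sketched as a free list of equally sized buffers. This is a simplified, single-threaded illustration under assumed names (`ExamplePool`, `example_pool_get()`, `example_pool_put()`); it is not dav1d's pool implementation, which would also need locking and alignment handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free-list node, stored inside the returned buffer itself. */
typedef struct ExamplePoolBuf {
    struct ExamplePoolBuf *next;
} ExamplePoolBuf;

/* Hypothetical pool of equally sized buffers (not thread-safe). */
typedef struct ExamplePool {
    ExamplePoolBuf *free_list;
    size_t buf_size;
} ExamplePool;

static void example_pool_init(ExamplePool *pool, size_t buf_size) {
    pool->free_list = NULL;
    /* Each buffer must be able to hold the free-list link. */
    pool->buf_size = buf_size < sizeof(ExamplePoolBuf)
                   ? sizeof(ExamplePoolBuf) : buf_size;
}

/* Hand out a cached buffer if one exists, otherwise allocate a new one. */
static void *example_pool_get(ExamplePool *pool) {
    ExamplePoolBuf *const buf = pool->free_list;
    if (buf) {
        pool->free_list = buf->next;
        return buf;
    }
    return malloc(pool->buf_size);
}

/* Return a buffer to the pool instead of freeing it. */
static void example_pool_put(ExamplePool *pool, void *ptr) {
    assert(ptr != NULL);
    ExamplePoolBuf *const buf = ptr;
    buf->next = pool->free_list;
    pool->free_list = buf;
}

/* Release every cached buffer back to the system allocator. */
static void example_pool_destroy(ExamplePool *pool) {
    while (pool->free_list) {
        ExamplePoolBuf *const buf = pool->free_list;
        pool->free_list = buf->next;
        free(buf);
    }
}
```

After the first few frames, `example_pool_get()` is mostly served from the free list, so the cost of the system allocator is paid only while the pool warms up.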
    • meson: Handle the b_lto option as a string option for newer meson versions · 920079ed
      Martin Storsjö authored
      Since 3e6fbde94c1cb8d4e01b7daf0282c315ff0e6c7d in meson (past the
      0.56 release), the b_lto option was changed from a bool to a
      tristate option (false/true/thin).
      
      One could just compare the b_lto option against 'false', but that
      causes warnings on older meson versions (on all existing releases).
    • use less memory in SGR C code · bcebc7bd
      Luc Trudeau authored
  7. 07 Nov, 2020 1 commit
    • Fix variable name · ffd052bd
      oddstone authored
      The first index to task_idx_to_sby_and_tile_idx is task_idx, not tile_idx.
  8. 21 Oct, 2020 1 commit
  9. 02 Oct, 2020 1 commit
  10. 01 Oct, 2020 1 commit
  11. 27 Sep, 2020 2 commits
  12. 24 Sep, 2020 7 commits
  13. 20 Sep, 2020 4 commits
  14. 17 Sep, 2020 1 commit
  15. 15 Sep, 2020 1 commit
    • Ban op->idc that may drop all layer-specific OBUs · 50e876c6
      Wan-Teh Chang authored
      If c->operating_point_idc is nonzero and either bits 0-7 or bits 8-11 in
      it are all 0s, it will cause dav1d_parse_obus() to drop all
      layer-specific OBUs. Prohibit any op->idc with such properties because
      it could be selected as c->operating_point_idc.
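The condition described in this commit can be written as a small bit test. The sketch below assumes a 12-bit idc value in which bits 0-7 form one layer mask and bits 8-11 the other (temporal and spatial layers respectively in the AV1 specification); `example_idc_drops_all_layers()` is a hypothetical helper, not a dav1d function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: returns true when a nonzero idc has either its
 * low 8 bits or its bits 8-11 all zero, i.e. the case the commit bans. */
static bool example_idc_drops_all_layers(unsigned idc) {
    assert(idc <= 0xfff); /* operating point idc is a 12-bit field */
    if (!idc) return false; /* zero means no layer-based filtering */

    const unsigned low_mask  = idc & 0xff;       /* bits 0-7  */
    const unsigned high_mask = (idc >> 8) & 0xf; /* bits 8-11 */
    return !low_mask || !high_mask;
}
```

For example, `0x100` and `0x001` are rejected because one of the two masks is empty, while `0x101` selects one layer in each mask and passes.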
  16. 06 Sep, 2020 1 commit
  17. 03 Sep, 2020 6 commits
    • arm32: mc: NEON implementation of put/prep 8tap/bilin for 16 bpc · 856662b4
      Martin Storsjö authored
      Examples of checkasm benchmarks:
                                        Cortex A7      A8      A9     A53     A72     A73
      mc_8tap_regular_w8_0_16bpc_neon:      158.7   106.2   167.0   127.9    55.0    77.2
      mc_8tap_regular_w8_h_16bpc_neon:     1000.8   557.5   749.2   609.2   401.4   485.4
      mc_8tap_regular_w8_hv_16bpc_neon:    2278.9  1255.4  1352.5  1277.2   867.8   915.9
      mc_8tap_regular_w8_v_16bpc_neon:     1060.0   393.6   485.5   448.3   298.0   298.2
      mc_bilinear_w8_0_16bpc_neon:          159.7    96.6   161.1   123.7    55.4    74.7
      mc_bilinear_w8_h_16bpc_neon:          342.3   250.8   352.9   239.0   158.4   165.1
      mc_bilinear_w8_hv_16bpc_neon:         587.7   373.8   469.0   339.8   244.4   247.5
      mc_bilinear_w8_v_16bpc_neon:          285.8   189.3   284.9   180.4   103.4   100.9
      mct_8tap_regular_w8_0_16bpc_neon:     233.0   136.6   229.3   169.3    86.2    98.3
      mct_8tap_regular_w8_h_16bpc_neon:    1106.8   588.3   817.9   654.1   406.4   489.8
      mct_8tap_regular_w8_hv_16bpc_neon:   2473.3  1326.3  1428.2  1373.7   903.3   951.1
      mct_8tap_regular_w8_v_16bpc_neon:    1266.0   474.1   581.3   505.9   382.0   373.4
      mct_bilinear_w8_0_16bpc_neon:         232.9   126.2   225.0   166.3    86.2    91.7
      mct_bilinear_w8_h_16bpc_neon:         380.6   270.6   386.0   259.7   154.1   151.9
      mct_bilinear_w8_hv_16bpc_neon:        631.4   409.2   509.4   372.1   243.1   244.1
      mct_bilinear_w8_v_16bpc_neon:         349.5   233.5   347.9   212.4   138.7   138.4
      
      For comparison, the corresponding numbers for the existing arm64
      implementation:
      
                                                               Cortex A53     A72     A73
      mc_8tap_regular_w8_0_16bpc_neon:                               94.1    48.9    62.3
      mc_8tap_regular_w8_h_16bpc_neon:                              570.4   388.1   467.3
      mc_8tap_regular_w8_hv_16bpc_neon:                            1035.8   775.0   891.2
      mc_8tap_regular_w8_v_16bpc_neon:                              399.8   284.5   278.2
      mc_bilinear_w8_0_16bpc_neon:                                   90.0    44.3    57.4
      mc_bilinear_w8_h_16bpc_neon:                                  191.7   158.7   156.4
      mc_bilinear_w8_hv_16bpc_neon:                                 295.6   235.0   244.9
      mc_bilinear_w8_v_16bpc_neon:                                  147.2    99.0    88.8
      mct_8tap_regular_w8_0_16bpc_neon:                             139.4    78.4    84.9
      mct_8tap_regular_w8_h_16bpc_neon:                             612.3   395.9   478.6
      mct_8tap_regular_w8_hv_16bpc_neon:                           1113.0   804.3   963.5
      mct_8tap_regular_w8_v_16bpc_neon:                             462.1   370.8   353.3
      mct_bilinear_w8_0_16bpc_neon:                                 135.6    77.0    80.5
      mct_bilinear_w8_h_16bpc_neon:                                 210.8   159.2   141.7
      mct_bilinear_w8_hv_16bpc_neon:                                325.7   238.4   227.3
      mct_bilinear_w8_v_16bpc_neon:                                 180.7   136.7   129.5
    • arm64: mc: Apply tuning from w4/w8 case to w2 case in 16 bpc 8tap_hv · 4ae3f5f7
      Martin Storsjö authored
      Narrowing the intermediates from the horizontal pass is beneficial
      (on most cores, but a small slowdown on A53) here as well. This
      increases consistency in the code between the cases.
      
      (The corresponding change in the upcoming arm32 version is beneficial
      on all tested cores except A53: it helps, on some cores a lot, on
      A7, A8, A9, A72 and A73, and is only marginally slower on A53.)
      
      Before:                        Cortex A53     A72     A73
      mc_8tap_regular_w2_hv_16bpc_neon:   457.7   301.0   317.1
      After:
      mc_8tap_regular_w2_hv_16bpc_neon:   472.0   276.0   284.3
    • arm: mc: Avoid an unnecessary mov in 8tap_hv w2 · 65a1aafd
      Martin Storsjö authored
      This matches how the same logic is written for w4 and above.
    • arm32: mc: Use narrower vext.8 in 8tap_w4_h · ea7e13e7
      Martin Storsjö authored
      The previous form was a leftover from how it had to be written on
      aarch64.
    • arm64: mc: Use more descriptive element specifiers for loads/stores in 16 bpc put_neon · 13fad75d
      Martin Storsjö authored
      For loads of a half/full register, the actual size of the elements
      doesn't matter, but it makes the code more readable and understandable.
  18. 01 Sep, 2020 1 commit
    • cli: Use proper integer math in Y4M PAR calculations · 3bfe8c7c
      Henrik Gramner authored
      The previous floating-point implementation produced results that were
      sometimes slightly off due to rounding errors.
      
      For example, a frame size of 432x240 with a render size of 176x240
      previously resulted in a PAR of 98:240 instead of the correct 11:27.
      
      Also reduce fractions to produce more readable numbers.
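The integer approach can be sketched as follows. This assumes the pixel aspect ratio is computed as (render_w * frame_h) : (render_h * frame_w) and then reduced by the greatest common divisor; `example_gcd()` and `example_par()` are hypothetical names, not the dav1d CLI code.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm on 64-bit values. */
static uint64_t example_gcd(uint64_t a, uint64_t b) {
    while (b) {
        const uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: exact, reduced PAR from frame and render sizes.
 * Assumes PAR = (render_w * frame_h) : (render_h * frame_w). */
static void example_par(unsigned frame_w, unsigned frame_h,
                        unsigned render_w, unsigned render_h,
                        uint64_t *num, uint64_t *den)
{
    assert(frame_w && frame_h && render_w && render_h);
    /* Widen before multiplying so the products cannot overflow. */
    const uint64_t n = (uint64_t)render_w * frame_h;
    const uint64_t d = (uint64_t)render_h * frame_w;
    const uint64_t g = example_gcd(n, d);
    *num = n / g;
    *den = d / g;
}
```

For the case in the commit message, a 432x240 frame with a 176x240 render size gives 42240:103680, which reduces to 11:27 with no rounding involved.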
  19. 30 Aug, 2020 1 commit