  1. 16 Apr, 2021 1 commit
    • dav1d: add event flags to the decoding process · a98f5e60
      James Almer authored
      And a function to fetch them. Should be useful to signal changes in the
      bitstream the user may want to know about.
      
      Starting with two flags, DAV1D_EVENT_FLAG_NEW_SEQUENCE and
      DAV1D_EVENT_FLAG_NEW_OP_PARAMS_INFO, which signal the presence of an updated
      sequence header in the last returned (or to be returned) picture.
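The flag mechanism described in this commit can be pictured as a bitmask that the decoder accumulates and the caller drains. The following is a minimal sketch, not dav1d's implementation: `DecoderSketch`, the `SKETCH_EVENT_*` names, the bit values, and the clear-on-fetch behaviour are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two flags named in the commit message.
 * The bit positions are an assumption, not copied from dav1d's headers. */
enum SketchEventFlags {
    SKETCH_EVENT_NEW_SEQUENCE       = 1 << 0,
    SKETCH_EVENT_NEW_OP_PARAMS_INFO = 1 << 1,
};

/* Hypothetical decoder state: only the pending event bits are modelled. */
typedef struct {
    unsigned pending_flags;
} DecoderSketch;

/* Called by the (imaginary) decoder when it notices a bitstream change. */
void sketch_raise_event(DecoderSketch *dec, unsigned flags) {
    assert(dec != NULL);
    dec->pending_flags |= flags;
}

/* Returns the flags accumulated so far and clears them, so that each
 * event is reported to the caller once (assumed behaviour). */
unsigned sketch_fetch_event_flags(DecoderSketch *dec) {
    assert(dec != NULL);
    unsigned flags = dec->pending_flags;
    dec->pending_flags = 0;
    return flags;
}
```

A caller would fetch the flags after receiving a picture and test individual bits, for example `flags & SKETCH_EVENT_NEW_SEQUENCE`, before reconfiguring anything that depends on the sequence header.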
  2. 14 Apr, 2021 5 commits
  3. 12 Apr, 2021 1 commit
  4. 16 Mar, 2021 1 commit
  5. 15 Mar, 2021 1 commit
  6. 07 Mar, 2021 1 commit
  7. 21 Feb, 2021 1 commit
  8. 19 Feb, 2021 9 commits
  9. 17 Feb, 2021 3 commits
  10. 16 Feb, 2021 2 commits
  11. 15 Feb, 2021 5 commits
  12. 13 Feb, 2021 1 commit
  13. 12 Feb, 2021 3 commits
  14. 11 Feb, 2021 5 commits
    • Set thread names on Haiku · b44ec453
      Emmanuel Gil Peyrot authored
    • x86: Rewrite SGR AVX2 asm · fe2bb774
      Henrik Gramner authored
      The previous implementation did multiple passes in the horizontal
      and vertical directions, with the intermediate values being stored
      in buffers on the stack. This caused bad cache thrashing.
      
      By interleaving all the different passes in combination with a
      ring buffer for storing only a few rows at a time, the performance
      is improved by a significant amount.
      
      Also slightly speed up neighbor calculations by packing the a and b
      values into a single 32-bit unsigned integer which allows calculations
      on both values simultaneously.
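The packing trick mentioned in the last paragraph of this commit message can be illustrated in scalar C. This is only a sketch: the 16/16 bit split and the helper names are assumptions, and the actual AVX2 code may lay out the two values differently.

```c
#include <assert.h>
#include <stdint.h>

/* Packs two small unsigned values into one 32-bit word: a in the high
 * half, b in the low half. The 16/16 split is an assumption. */
uint32_t pack_ab(uint32_t a, uint32_t b) {
    assert(a <= 0xFFFFu && b <= 0xFFFFu);
    return (a << 16) | b;
}

uint32_t unpack_a(uint32_t packed) { return packed >> 16; }
uint32_t unpack_b(uint32_t packed) { return packed & 0xFFFFu; }

/* Sums three packed neighbours. Each addition updates the a and b fields
 * at the same time. The result is only meaningful while the b sum stays
 * below 2^16, otherwise it carries into the a field. */
uint32_t sum3_packed(uint32_t p0, uint32_t p1, uint32_t p2) {
    return p0 + p1 + p2;
}
```

For example, summing `pack_ab(1, 2)`, `pack_ab(10, 20)` and `pack_ab(100, 200)` yields a word whose fields unpack to 111 and 222, using two additions instead of four.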
    • Add minor SGR optimizations · c290c02e
      Henrik Gramner authored
      Split the 5x5, 3x3, and mix cases into separate functions.
      
      Shrink some tables.
      
      Move some scalar calculations out of the DSP function.
      
      Make Wiener and SGR share the same function prototype to
      eliminate a branch in lr_stripe().
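The effect of giving two filters the same prototype can be sketched as follows. The stub filters, the table, and `restore_row_sketch` are hypothetical and do not reflect the real signature or the body of lr_stripe(); the point is only that a shared prototype lets the caller dispatch through a function pointer instead of branching on the filter type.

```c
#include <assert.h>
#include <stddef.h>

/* One prototype shared by every filter (hypothetical signature). */
typedef void (*filter_fn)(int *dst, const int *src, size_t n);

/* Trivial stand-ins for the two filter families. */
void wiener_stub(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = src[i] + 1;
}

void sgr_stub(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = src[i] + 2;
}

enum { FILTER_WIENER_STUB, FILTER_SGR_STUB, FILTER_COUNT };

static const filter_fn filter_table[FILTER_COUNT] = {
    wiener_stub,
    sgr_stub,
};

/* The caller needs no if/else on the filter type: the type selects a
 * table entry and the call itself is identical for both filters. */
void restore_row_sketch(int type, int *dst, const int *src, size_t n) {
    assert(type >= 0 && type < FILTER_COUNT);
    filter_table[type](dst, src, n);
}
```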
    • x86inc: Add stack probing on Windows · c36b191a
      Henrik Gramner authored
      Large stack allocations on Windows need to use stack probing in order
      to guarantee that all stack memory is committed before accessing it.
      This is done by ensuring that the guard page(s) at the end of the
      currently committed pages are touched prior to any pages beyond that.
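The access order that stack probing enforces can be modelled on an ordinary buffer. This sketch is not the x86inc macro: `probe_downward` is a hypothetical helper, the page size is passed in by the caller (4096 bytes is a common value but an assumption here), and a real probe operates on the stack pointer in assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Reads one byte in every page of the region [top - size, top), starting
 * at the highest address and moving downward, which is the direction in
 * which the stack grows on x86. Visiting pages in this order means each
 * guard page is touched before any page beyond it.
 * Returns the number of probes performed. */
size_t probe_downward(volatile unsigned char *top, size_t size,
                      size_t page_size) {
    assert(page_size > 0);
    size_t probes = 0;
    size_t offset = 0;
    while (offset < size) {
        offset += page_size;
        if (offset > size)
            offset = size;              /* final, partial page */
        (void)top[-(ptrdiff_t)offset];  /* volatile read = the probe */
        probes++;
    }
    return probes;
}
```

With a 10000-byte region and 4096-byte pages, the helper probes at offsets 4096, 8192 and 10000 below `top`, so no access ever skips more than one page ahead of the previous one.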
    • dav1dplay: Add -lm for llround() support · 58cb4cf0
      Emmanuel Gil Peyrot authored
      Neither --buildtype=plain nor --buildtype=debug set -ffast-math, so
      llround() is kept as a function call and isn’t optimised out into
      cvttsd2siq (on amd64), thus requiring the math lib to be linked.
      
      Note that even with -ffast-math, it isn’t guaranteed that a call to
      llround() will always be omitted (I have reproduced this on PowerPC), so
      this fix is correct even if we ever decide to enable -ffast-math in
      other build types.
  15. 10 Feb, 2021 1 commit
    • arm64: itx16: Use usqadd to avoid separate clamping of negative values · 6f9f3391
      Martin Storsjö authored
      Before:                                Cortex A53     A72      A73
      inv_txfm_add_4x4_dct_dct_0_10bpc_neon:       40.7    23.0     24.0
      inv_txfm_add_4x4_dct_dct_1_10bpc_neon:      116.0    71.5     78.2
      inv_txfm_add_8x8_dct_dct_0_10bpc_neon:       85.7    50.7     53.8
      inv_txfm_add_8x8_dct_dct_1_10bpc_neon:      287.0   203.5    215.2
      inv_txfm_add_16x16_dct_dct_0_10bpc_neon:    255.7   129.1    140.4
      inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   1401.4  1026.7   1039.2
      inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   1913.2  1407.3   1479.6
      After:
      inv_txfm_add_4x4_dct_dct_0_10bpc_neon:       38.7    21.5     22.2
      inv_txfm_add_4x4_dct_dct_1_10bpc_neon:      116.0    71.3     77.2
      inv_txfm_add_8x8_dct_dct_0_10bpc_neon:       76.7    44.7     43.5
      inv_txfm_add_8x8_dct_dct_1_10bpc_neon:      278.0   203.0    203.9
      inv_txfm_add_16x16_dct_dct_0_10bpc_neon:    236.9   106.2    116.2
      inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   1368.7   999.7   1008.4
      inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   1880.5  1381.2   1459.4
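The idea behind the change can be shown with a scalar model of one 16-bit lane. USQADD adds a signed value to an unsigned accumulator and saturates the result to the unsigned range, so a negative sum becomes 0 without an extra instruction. The helper names below are hypothetical, and the real code operates on NEON vectors rather than single values.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit USQADD lane: unsigned accumulator plus
 * signed addend, saturated to [0, 65535]. */
uint16_t usqadd16(uint16_t acc, int16_t addend) {
    int32_t sum = (int32_t)acc + (int32_t)addend;
    if (sum < 0) return 0;
    if (sum > UINT16_MAX) return UINT16_MAX;
    return (uint16_t)sum;
}

/* Adds a residual to a pixel and clamps to [0, (1 << bitdepth) - 1].
 * The saturating add already handles the lower bound, so only the upper
 * clamp against the pixel maximum remains. */
uint16_t add_residual_sketch(uint16_t pixel, int16_t residual, int bitdepth) {
    assert(bitdepth >= 8 && bitdepth <= 12);
    uint16_t pixel_max = (uint16_t)((1u << bitdepth) - 1u);
    uint16_t sum = usqadd16(pixel, residual);
    return sum < pixel_max ? sum : pixel_max;
}
```

For 10-bit content, a pixel of 5 with a residual of -20 gives 0 directly from the saturating add, while 1000 plus 100 is limited to 1023 by the remaining upper clamp.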