  1. Feb 13, 2023
  2. Feb 10, 2023
      drain: Properly fix a desync between next and first · 9b4b2448
      Victorien Le Couviour--Tuffet authored
      The code in dav1d_drain_picture could result in a desync between
      c->task_thread.first (oldest submitted frame) and c->frame_thread.next (first
      frame to retrieve and/or next submit location).
      As we loop through drain, we always increment next, but first only if the
      frame has data. If the frame is visible, we return. The problem arises when
      we encounter one or more invisible frames and the following entries haven't
      been fed yet: we then keep looping, increasing next but not first, as these
      entries have no data.
      
      We should always return once we have encountered data (a visible or
      invisible decoded frame). For visible frames, the code already returns. For
      invisible frames, we can store a boolean indicating that we drained at least
      one frame; whenever we reach an empty entry after that, we return (all
      subsequent entries are guaranteed to be empty anyway), incrementing neither
      next nor first. This has the effect of inserting the next frame at the first
      free spot (which is much better than the weird skips it's doing now).
      
      So basically, c->frame_thread.next could skip some (empty) entries.
      Now it's contiguous.
      
      Fixes #416.
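
The drain rule described in this commit message can be sketched as a small single-threaded model. This is only an illustration under stated assumptions, not dav1d's code: `Ring`, `Slot`, `N_FC`, `advance()` and `drain()` are hypothetical names, and the locking and atomics used by the real decoder are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define N_FC 4  /* hypothetical number of frame slots */

typedef struct {
    bool has_data;  /* a decoded frame is waiting in this slot */
    bool visible;   /* the frame is meant to be returned to the caller */
} Slot;

typedef struct {
    Slot slots[N_FC];
    unsigned first;  /* oldest submitted frame */
    unsigned next;   /* next slot to retrieve from and/or submit to */
} Ring;

static unsigned advance(unsigned i) { return i + 1 == N_FC ? 0 : i + 1; }

/* Returns the slot index of a visible frame, or -1 if none was produced. */
static int drain(Ring *r)
{
    assert(r->first < N_FC && r->next < N_FC);
    bool drained = false;
    for (unsigned count = 0; count < N_FC; count++) {
        const unsigned cur = r->next;
        Slot *const s = &r->slots[cur];
        if (!s->has_data) {
            if (drained)
                return -1;           /* stop here: next and first stay put */
            r->next = advance(cur);  /* nothing drained yet: keep searching */
            continue;
        }
        s->has_data = false;
        r->first = advance(r->first);
        r->next = advance(cur);
        if (s->visible)
            return (int)cur;
        drained = true;              /* invisible frame consumed */
    }
    return -1;
}
```

With an invisible frame in slot 0 and the remaining slots empty, a call to `drain()` leaves both `first` and `next` at 1, so the next submitted frame lands in the first free slot.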
  3. Feb 09, 2023
      Revert "Fix mismatch between first and next in drain" · 3f19ece6
      Victorien Le Couviour--Tuffet authored
      This reverts commit a51b6ce4.
      
      We can't increment first when no data is there, otherwise we might do it
      while the first frame was not yet decoded, messing up ordering. Imagine
      having a framedelay of 8 and a file with 7 frames. We feed 7 frames into 8
      slots; next now points to [7] (an empty entry), and we start draining
      because of EOF. next needs to be incremented until it reaches the first
      frame ([0]) so that it can be output, and only then should first be
      incremented too.
      
      Fixes #418.
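
The 7-frames-in-8-slots scenario from this message can be traced with a tiny model. It is a sketch under assumptions: `Queue`, `N_SLOTS` and `drain_one()` are hypothetical names, every frame is treated as visible, and threading is ignored.

```c
#include <assert.h>
#include <stdbool.h>

enum { N_SLOTS = 8 };  /* hypothetical frame delay */

typedef struct {
    bool has_data[N_SLOTS];
    unsigned first;  /* oldest submitted frame */
    unsigned next;   /* next slot to retrieve from and/or submit to */
} Queue;

/* Walk forward from next; first only moves once a slot with data is found.
 * Returns the index of the frame that was output, or -1 if all are empty. */
static int drain_one(Queue *q)
{
    assert(q->first < N_SLOTS && q->next < N_SLOTS);
    for (unsigned i = 0; i < N_SLOTS; i++) {
        const unsigned cur = q->next;
        q->next = (cur + 1) % N_SLOTS;
        if (q->has_data[cur]) {
            q->has_data[cur] = false;
            q->first = (q->first + 1) % N_SLOTS;
            return (int)cur;
        }
        /* Empty slot: next has advanced, first is left untouched. */
    }
    return -1;
}
```

Starting with slots 0 to 6 filled, `first` at 0 and `next` at 7, the first call steps over the empty slot 7, outputs frame 0, and only then moves `first` to 1.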
  4. Feb 03, 2023
  5. Jan 31, 2023
      checkasm: Add an --affinity= option for selecting a CPU core · 77b39555
      Martin Storsjö authored
      Add an option for selecting the core where the single thread of
      checkasm runs. This allows benchmarking on specific CPU cores on
      heterogeneous CPUs, like ARM big.LITTLE configurations.
      
      On Linux, one can easily wrap an invocation of checkasm with
      "taskset -c <n> [...]" - so this option isn't very essential
      there - however it is quite useful on Windows.
      
      On Windows, it is somewhat possible to do the same by launching
      the tool with "start /B /affinity <hexmask> [...]", but that
      doesn't work well with scripting ("start" returns before the
      command has finished running, and it's not obvious how to
      invoke "start" from within WSL).
      
      Using "taskset" to launch processes on specific cores within WSL
      on Windows doesn't work - regardless of the Linux level affinity,
      the process ends up running on the performance cores anyway.
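
One way such an option could be wired up is sketched below. This is not checkasm's implementation: `parse_affinity_arg()` and `pin_to_core()` are hypothetical helpers, the 64-core limit is an assumption that keeps the mask in one machine word, and only the OS calls `SetThreadAffinityMask()` (Windows) and `sched_setaffinity()` (Linux) are real APIs.

```c
#define _GNU_SOURCE  /* needed by glibc for CPU_SET and sched_setaffinity */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Parse "--affinity=<n>". Returns the core index, or -1 if arg is not a
 * well-formed affinity option. */
static long parse_affinity_arg(const char *arg)
{
    static const char prefix[] = "--affinity=";
    const size_t len = sizeof(prefix) - 1;
    if (strncmp(arg, prefix, len) != 0)
        return -1;
    char *end;
    const unsigned long n = strtoul(arg + len, &end, 10);
    if (end == arg + len || *end != '\0' || n > 63)
        return -1;
    return (long)n;
}

/* Pin the calling thread to a single core. Returns 0 on success. */
static int pin_to_core(unsigned core)
{
    assert(core < 64);
#ifdef _WIN32
    return SetThreadAffinityMask(GetCurrentThread(),
                                 (DWORD_PTR)1 << core) ? 0 : -1;
#elif defined(__linux__) && defined(CPU_SET)
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = caller */
#else
    (void)core;
    return -1;  /* no affinity support in this sketch */
#endif
}
```

A caller would pass each command-line argument to `parse_affinity_arg()` and, on a non-negative result, call `pin_to_core()` before starting the benchmark loop.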
      arm64: ipred: 8 bpc NEON implementation of the Z3 function · 99956c73
      Martin Storsjö authored
      The implementation is a hybrid between two approaches. One is generic
      (but non-ideal) and is used for cases with large max_base_y; it fills two
      pixel columns at a time, i.e. it loops over pixels first vertically,
      then horizontally, which is a non-optimal access order.
      
      For cases with smaller max_base_y, it does two rows at a time, essentially
      doing gathers with the TBX instruction.
      
      Relative speedup over the C code:
      
                               Cortex A53    A55    A72    A73    A76   Apple M1
      intra_pred_z3_w4_8bpc_neon:    3.32   2.89   2.78   3.52   2.52   9.67
      intra_pred_z3_w8_8bpc_neon:    6.24   5.55   4.76   5.60   4.11   6.40
      intra_pred_z3_w16_8bpc_neon:   7.64   7.07   4.37   6.23   4.18   8.60
      intra_pred_z3_w32_8bpc_neon:   7.51   7.21   4.34   5.92   4.27   7.88
      intra_pred_z3_w64_8bpc_neon:   6.82   6.25   4.08   5.83   3.52   7.31
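
For orientation, the column-by-column traversal mentioned in this message can be pictured with a simplified scalar loop. This is a sketch under assumptions, not dav1d's C reference: `z3_sketch()` is a hypothetical name, `left[]` is indexed downwards from 0, edge filtering and upsampling are omitted, and the `0x3E` fraction mask (even steps of 1/64 pixel) is an assumption about the interpolation precision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* left[i]    : i-th pixel down the left edge
 * dy         : per-column step along the edge, in 1/64 pixel units
 * max_base_y : last usable index into left[]; positions at or beyond it
 *              are filled with left[max_base_y] */
static void z3_sketch(uint8_t *dst, ptrdiff_t stride, const uint8_t *left,
                      int width, int height, int dy, int max_base_y)
{
    assert(width > 0 && height > 0 && dy > 0 && max_base_y >= 0);
    /* Outer loop over columns, inner loop over rows: each column is
     * filled top to bottom before moving one pixel to the right. */
    for (int x = 0, ypos = dy; x < width; x++, ypos += dy) {
        const int frac = ypos & 0x3E;
        int base = ypos >> 6;
        for (int y = 0; y < height; y++, base++) {
            if (base < max_base_y) {
                /* Linear interpolation between two neighbouring edge pixels */
                const int v = left[base] * (64 - frac) + left[base + 1] * frac;
                dst[y * stride + x] = (uint8_t)((v + 32) >> 6);
            } else {
                dst[y * stride + x] = left[max_base_y];
            }
        }
    }
}
```

Writing one column at a time touches a different row of `dst` on every store, which is why this order is awkward for a row-major picture buffer.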
  6. Jan 27, 2023
      arm64: ipred: 8 bpc NEON implementation of the Z1 function · fd4f348e
      Martin Storsjö authored
      Relative speedup over the C code:
      
                               Cortex A53    A55    A72    A73    A76  Apple M1
      intra_pred_z1_w4_8bpc_neon:    4.09   3.15   3.63   4.16   3.27  13.00
      intra_pred_z1_w8_8bpc_neon:    6.93   5.66   5.57   6.76   5.51   5.50
      intra_pred_z1_w16_8bpc_neon:   7.81   6.85   6.24   7.78   6.59   9.00
      intra_pred_z1_w32_8bpc_neon:  10.56   9.95   8.72  10.95   8.28  13.33
      intra_pred_z1_w64_8bpc_neon:  11.00  11.38   9.11  11.62   8.65  14.61
      
      (The speedup numbers for M1 are kinda noisy due to the very coarse
      granularity of the timer used there.)
      checkasm: ipred: Iterate 5 times for each Z1/Z2/Z3 function · 2e990b37
      Martin Storsjö authored
      These functions contain a number of different codepaths; try to
      make sure that we hit most codepaths for each size combination.
      
      This gives better test coverage in one single run of checkasm, and
      should also give a better averaged runtime in benchmarks.
  7. Jan 26, 2023
  8. Jan 12, 2023
  9. Dec 14, 2022
  10. Dec 13, 2022
  11. Dec 09, 2022
  12. Dec 04, 2022
  13. Nov 21, 2022
  14. Nov 10, 2022
  15. Oct 30, 2022
  16. Oct 27, 2022
  17. Oct 26, 2022
  18. Oct 20, 2022
      threading: Fix a race around frame completion (frame-mt) · 3e7886db
      Victorien Le Couviour--Tuffet authored
      If the first frame to decode completes while an async reset request on
      that same frame is pending, that request becomes stale. The processing
      of such a stale request is likely to result in a hang.
      
      One reason this happens is the skip condition at the beginning of
      reset_task_cur().
      => Consume the async request before that check.
      
      Another reason is several threads producing async reset requests in
      parallel: an async request for the first frame could cascade through the
      other threads (other frames) during completion of that frame, meaning
      it is not caught by the last synchronous reset_task_cur() call made after
      signaling the main thread and before releasing the lock.
      => To solve this, we need to add protections at the racy locations:
      after we increase first, before returning from reset_task_cur_async(),
      and after consuming the async request.
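
The "consume the request before the skip check" idea can be illustrated with C11 atomics. This is a sketch under assumptions rather than dav1d's scheduler code: `Sched` and `consume_reset()` are hypothetical, frame indices are modelled as a non-wrapping counter, and `UINT_MAX` stands for "no request pending".

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_uint first;          /* index of the oldest frame still in flight */
    atomic_uint pending_reset;  /* frame index of an async request, or UINT_MAX */
} Sched;

/* Returns the frame index to reset from, or UINT_MAX if there is nothing
 * to do. nothing_to_do models the early-out (skip) condition. */
static unsigned consume_reset(Sched *s, int nothing_to_do)
{
    assert(s != NULL);
    const unsigned first = atomic_load(&s->first);
    /* Take the request first, so that the early return below can never
     * leave a request behind to be processed once it has gone stale. */
    unsigned req = atomic_exchange(&s->pending_reset, UINT_MAX);
    if (req != UINT_MAX && req < first)
        req = UINT_MAX;  /* stale: that frame has already completed */
    if (nothing_to_do)
        return UINT_MAX;
    return req;
}
```

Because the exchange happens unconditionally, a request for an already completed frame is discarded on the spot instead of lingering until a later call.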
  19. Oct 10, 2022
      Handle host_machine.system() 'ios' and 'tvos' the same way as 'darwin' · 5b07b425
      Sebastian Dröge authored
      Despite not being documented in Meson's list of canonical system names,
      Meson does accept 'ios', mostly as a synonym for 'darwin'.
      
      Using 'ios' instead of 'darwin' allows distinguishing between the
      two in the cases where that is necessary. Therefore, within dav1d, allow
      using the 'ios' name as an alias for 'darwin' as the system name, to allow
      using cross files that make this distinction.
      
      Meson itself also allows 'tvos' in addition to 'ios' in the internal
      `is_darwin()` function; as such, all 3 are handled the same here.
  20. Sep 30, 2022
  21. Sep 28, 2022
  22. Sep 26, 2022