1. 15 Sep, 2022 9 commits
    • Don't use gas-preprocessor with clang-cl for arm targets · cc9651f5
      Martin Storsjö authored
      Since meson 0.58.0 (released in May 2021), meson accepts adding '.S'
      assembly files as source files to the clang-cl compiler.
      If using an older version of meson, keep using gas-preprocessor
      just like for MSVC builds.
    • Fix checking the reference dimensions for the projection process · d4a2b75d
      David Conrad authored
      Section 7.9.2 returns 0 "If RefMiRows[ srcIdx ] is not equal to MiRows,
      RefMiCols[ srcIdx ] is not equal to MiCols"
      dav1d was comparing pixel width/height rather than block width/height,
      so change the check to conform to the spec.
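      The difference between the two comparisons can be sketched as follows. The
      MiCols/MiRows derivation is the one the AV1 spec uses for frame size in 4x4
      units; the helper names are hypothetical and this is not dav1d's code.

```c
#include <stdbool.h>

/* Frame size in 4x4 block units, derived from a pixel dimension the way
 * AV1 derives MiCols/MiRows: 2 * ((size + 7) >> 3). */
static int mi_units(int pixels) {
    return 2 * ((pixels + 7) >> 3);
}

/* Hypothetical helper: true if a reference frame passes the block-unit
 * comparison quoted from section 7.9.2. */
static bool ref_dims_match(int ref_w, int ref_h, int cur_w, int cur_h) {
    return mi_units(ref_w) == mi_units(cur_w) &&
           mi_units(ref_h) == mi_units(cur_h);
}
```

      Two frames that are 1918 and 1920 pixels wide differ in pixel width but
      share the same MiCols, so a pixel comparison rejects a reference that the
      block-unit comparison accepts.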
    • Fix calculation of OBMC lap dimensions · eb25f00c
      David Conrad authored
      Individual OBMC lapped predictions have a max width of 64 pixels
      for the top laps and a max height of 64 pixels for the left laps.
      This comes from the overlapped motion compensation process:
      step4 = Clip3( 2, 16, Num_4x4_Blocks_Wide[ candSz ] )
      dav1d wasn't clipping this as needed, which means that with scaled MC, the
      interpolation of the 2nd half of a 128 block was incorrect, since mx/my
      for subpel filter selection need to be reset at the 64 pixel boundary
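      A small sketch of the clipping, assuming the candidate width is given in
      pixels. Clip3 follows the spec's definition; obmc_step_px is a hypothetical
      name used only for illustration.

```c
/* Clip3(lo, hi, x) as defined in the AV1 spec. */
static int clip3(int lo, int hi, int x) {
    return x < lo ? lo : x > hi ? hi : x;
}

/* Hypothetical helper: width in pixels of one top lapped prediction for a
 * candidate block that is cand_w_px pixels wide.
 * step4 = Clip3(2, 16, Num_4x4_Blocks_Wide[candSz]), in 4x4 units. */
static int obmc_step_px(int cand_w_px) {
    int n4 = cand_w_px >> 2;       /* width in 4x4 blocks */
    return clip3(2, 16, n4) * 4;   /* back to pixels */
}
```

      For a 128-pixel candidate this gives 64, so the block is covered by two
      separate 64-pixel predictions rather than one 128-pixel prediction.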
    • Support film grain application whose only effect is clipping to video range · 10f5ce54
      David Conrad authored
      This is the parameter combination:
      num_y_points == 0 && num_cb_points == 0 && num_cr_points == 0 &&
      chroma_scaling_from_luma == 1 && clip_to_restricted_range == 1
      Film grain application has two effects: adding noise, and optionally
      clipping to video range
      For luma, the spec skips film grain application if there's no noise
      (num_y_points == 0), but for chroma, it's only skipped if there's no
      chroma noise *and* chroma_scaling_from_luma is false
      This means it's possible for there to be no noise (num_*_points = 0), but
      if clip_to_restricted_range is true then chroma pixels can be clipped to
      video range, if chroma_scaling_from_luma is true. Luma pixels, however,
      aren't clipped to video range unless there's noise to apply.
      dav1d currently skips applying film grain entirely if there is no noise,
      regardless of the secondary clipping.
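      The per-plane conditions described above can be sketched like this. The
      struct and function names are hypothetical; only the field names come from
      the parameter combination quoted in this message.

```c
#include <stdbool.h>

/* Hypothetical subset of the film grain parameters. */
typedef struct {
    int num_y_points, num_cb_points, num_cr_points;
    bool chroma_scaling_from_luma;
    bool clip_to_restricted_range;
} GrainParams;

/* Luma is processed only if there is luma noise to add. */
static bool luma_is_processed(const GrainParams *p) {
    return p->num_y_points > 0;
}

/* A chroma plane is skipped only if it has no noise points *and*
 * chroma_scaling_from_luma is false. */
static bool chroma_is_processed(const GrainParams *p, int num_points) {
    return num_points > 0 || p->chroma_scaling_from_luma;
}

/* True for the case in the title: chroma is processed without any noise,
 * so the only visible effect is clipping to video range. */
static bool chroma_clip_only(const GrainParams *p, int num_points) {
    return num_points == 0 && chroma_is_processed(p, num_points) &&
           p->clip_to_restricted_range;
}
```

      A decoder that returns early whenever all three num_*_points are zero
      never reaches the chroma_clip_only case.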
    • Ignore T.35 metadata if the OBU contains no payload · 673ee248
      David Conrad authored
      The syntax of itu_t_t35_payload_bytes is not defined in the AV1
      specification, but it does state that decoders should ignore the
      entire OBU if they do not understand it.
    • Fix chroma deblock filter size calculation for lossless · 2152826b
      David Conrad authored
      In section 5.11.34 txSz is always defined to TX_4X4 if Lossless is true
      Chroma deblock filter size calculation needs to use this overridden txSz
      when lossless is enabled
    • Fix rounding in the calculation of initialSubpelX · e202fa08
      David Conrad authored
      The spec divides err by two, rounding to 0, instead of >>1,
      which rounds towards negative infinity
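      The two forms only differ for negative odd values. A minimal illustration
      (function names are made up for this example):

```c
/* Division by two truncates toward zero, as the spec requires here. */
static int half_toward_zero(int err) {
    return err / 2;
}

/* Right shift of a negative value is implementation-defined in C, but on
 * the usual two's complement targets it is an arithmetic shift that rounds
 * toward negative infinity. */
static int half_toward_neg_inf(int err) {
    return err >> 1;
}
```

      With err = -3 the first form gives -1 and, on those targets, the second
      gives -2.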
    • Fix overflow when saturating dequantized coefficients clipped to 0 · ee98592b
      David Conrad authored
      It's possible to encode a large coefficient that becomes 0 after
      the clipping in dequant (Abs( dq ) & 0xFFFFFF), e.g. 0x1000000
      After that &0xFFFFFF, coeffs are saturated in the range of
      [-(1 << (bitdepth+7)), 1 << (bitdepth+7))
      dav1d implements this saturation via umin(dq - sign, cf_max), then applies
      the sign afterwards via xor. However, for dq = 0 and sign = 1, this step
      evaluates to umin(UINT_MAX, cf_max) == cf_max instead of the expected 0.
      So instead, do unsigned saturate as umin(dq, cf_max + sign),
      then apply sign via (sign ? -dq : dq)
      On arm this is the same number of instructions, since cneg exists and is used
      On x86 this requires an additional instruction, but this isn't a
      latency-critical path
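      A scalar sketch of both variants, written from the description above and
      not copied from dav1d. It assumes a two's complement target, where xor
      with -1 maps v to -v - 1.

```c
#include <stdint.h>

static unsigned umin(unsigned a, unsigned b) {
    return a < b ? a : b;
}

/* Old approach: saturate (dq - sign), then apply the sign via xor.
 * For dq == 0 and sign == 1, dq - sign wraps around to UINT_MAX. */
static int32_t saturate_old(unsigned dq, unsigned sign, unsigned cf_max) {
    unsigned v = umin(dq - sign, cf_max);
    return (int32_t)v ^ -(int32_t)sign;
}

/* New approach: saturate dq against (cf_max + sign), then negate. */
static int32_t saturate_new(unsigned dq, unsigned sign, unsigned cf_max) {
    unsigned v = umin(dq, cf_max + sign);
    return sign ? -(int32_t)v : (int32_t)v;
}
```

      With cf_max = 32767 (8-bit), saturate_old(0, 1, 32767) yields -32768 while
      saturate_new(0, 1, 32767) yields 0. Both agree for nonzero dq.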
    • Fix overflow in 8-bit NEON ADST · 1bdb776c
      David Conrad authored
      In 8-bit adst, it's possible that the final Round2(x[0], 12) can exceed
      16-bits signed
      Specifically, in Inverse ADST4 process, the precision requirement is:
      "It is a requirement of bitstream conformance that all values stored in the
      s and x arrays by this process are representable by a signed integer using
      r + 12 bits of precision."
      For 8 bits, r is 16 for both row and column, so x[] can be 28-bit signed.
      For values [134215680, 134217727] (within 2047 of the maximum 28-bit value),
      the final Round2(x[0], 12) evaluates to 32768, exceeding 16-bits signed.
      So switch to using sqrshrn, which saturates to 16-bits signed
      This is a continuation of: Commit b53ff29d
      arm: itx: Do clipping in all narrowing downshifts
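      The overflow can be reproduced with a scalar model. Round2 follows the
      spec's definition for non-negative inputs; sat_s16 is a hypothetical
      stand-in for the saturation that sqrshrn performs on each lane.

```c
#include <stdint.h>

/* Round2(x, n) for the non-negative values used in this example. */
static int32_t round2(int32_t x, int n) {
    return (x + (1 << (n - 1))) >> n;
}

/* Saturate to the signed 16-bit range. */
static int16_t sat_s16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Rounded shift right followed by a saturating narrow. */
static int16_t round2_sat_s16(int32_t x, int n) {
    return sat_s16(round2(x, n));
}
```

      round2(134215680, 12) is 32768, one past INT16_MAX, and
      round2_sat_s16(134215680, 12) clamps it to 32767.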
  2. 14 Sep, 2022 1 commit
    • tools: Allocate the priv structs with proper alignment · 08c70801
      Martin Storsjö authored
      Previously, they could be allocated with any random alignment
      matching the end of the MuxerContext/DemuxerContext. The
      priv structs themselves can have members that require specific
      alignment, or at least the default alignment of malloc()/calloc()
      (which is sufficient for native types such as uint64_t and
      This fixes crashes in some arm builds, where GCC (correctly) wants
      to use 64 bit aligned stores to write to MD5Context.
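      One common way to place a second struct after a context struct in a single
      allocation is to round the offset up to malloc()'s alignment. The sketch
      below assumes C11 and uses hypothetical helper names; the actual fix in
      the tools may be structured differently.

```c
#include <stddef.h>

/* Round size up to a multiple of align, which must be a power of two. */
static size_t align_up(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: byte offset at which a priv struct can start after
 * a context of ctx_size bytes, using the strictest fundamental alignment
 * (the alignment malloc()/calloc() guarantee). */
static size_t priv_offset(size_t ctx_size) {
    return align_up(ctx_size, _Alignof(max_align_t));
}
```

      The allocation size then becomes priv_offset(sizeof(ctx)) plus the size of
      the priv struct, and the priv pointer is the base pointer plus that offset.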
  3. 12 Sep, 2022 1 commit
  4. 10 Sep, 2022 1 commit
  5. 09 Sep, 2022 6 commits
  6. 08 Sep, 2022 2 commits
  7. 07 Sep, 2022 2 commits
  8. 02 Sep, 2022 4 commits
  9. 30 Aug, 2022 1 commit
  10. 19 Aug, 2022 1 commit
  11. 25 Jul, 2022 1 commit
    • Adjust inlining attributes on some functions · a029d689
      Henrik Gramner authored
      The code size increase of inlining every call to certain functions
      isn't a worthwhile trade-off, and most compilers actually end up
      overriding those particular inlining hints anyway.
      In some cases it's also better to split the function into separate
      luma and chroma functions.
  12. 19 Jul, 2022 1 commit
  13. 13 Jul, 2022 1 commit
  14. 11 Jul, 2022 1 commit
    • Don't trash the return stack buffer in the NEON loop filter · d503bb0c
      David Conrad authored
      The NEON loop filter's innermost asm function can return to a different
      location than the address that called it. This messes up the return stack
      predictor, causing returns to be mispredicted
      Instead, rework the function to always return to the address that calls it,
      and instead return the information needed for the caller to short-circuit
      storing pixels
  15. 06 Jul, 2022 3 commits
    • CI: Removed snap package generation · 79bc755d
      Konstantin Pavlov authored and Henrik Gramner committed
      The snapcraft version we use is no longer compatible with the authentication
      schemes the snap store uses.  This could be fixed by updating the snapcraft
      inside the docker image, but Ubuntu no longer ships an up to date
      snapcraft version in their own repositories.  The other way to install
      snapcraft is to manually fetch the project and core snaps just like we
      do in https://code.videolan.org/videolan/docker-images/-/blob/master/vlc-ubuntu-focal/Dockerfile,
      but that currently fails on Jammy due to conflict in Python versions
      between what is shipped in Jammy and inside snapcraft project.
      All in all, snapcraft seems to be abandoned for our CI
      use-case, and the usefulness of dav1d snap is disputable, so just drop
      it altogether.  Packaging is still available in package/snap/ for the
      brave souls who want to build it on their own.
    • Eliminate unused C DSP functions at compile time · bd046635
      Henrik Gramner authored
      When compiling with asm enabled there's no point in compiling
      C versions of DSP functions that have asm implementations using
      instruction sets that the compiler can unconditionally use.
      E.g. when compiling with -mssse3 we can remove the C version
      of all functions with SSSE3 implementations.
      This is accomplished using the compiler's dead code elimination.
      It can be configured using the new 'trim_dsp' meson option, which
      by default is enabled when compiling in release mode.
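      The idea can be sketched as below. All names are hypothetical and the
      "SSSE3" function is a plain C stand-in; this only illustrates how a
      compile-time constant lets the compiler discard the C fallback, and does
      not reproduce dav1d's macros.

```c
enum { FLAG_SSSE3 = 1 << 0 };

/* Flags for instruction sets the compiler may use unconditionally,
 * e.g. because the build uses -mssse3. */
#if defined(__SSSE3__)
#define COMPILE_TIME_FLAGS FLAG_SSSE3
#else
#define COMPILE_TIME_FLAGS 0
#endif

/* Inlined so the compile-time flags are visible at every call site. */
static inline unsigned get_cpu_flags(unsigned runtime_flags) {
    return runtime_flags | COMPILE_TIME_FLAGS;
}

typedef int (*dsp_fn)(int);

static int scale_c(int x)     { return x * 2; } /* C fallback */
static int scale_ssse3(int x) { return x * 2; } /* stand-in for an asm version */

static dsp_fn dsp_init(unsigned runtime_flags) {
    /* When COMPILE_TIME_FLAGS contains FLAG_SSSE3 this test is always true,
     * the fallback below is unreachable, and scale_c can be eliminated. */
    if (get_cpu_flags(runtime_flags) & FLAG_SSSE3)
        return scale_ssse3;
    return scale_c;
}
```

      Inlining the flag query matters here: if it were an opaque call, the
      compiler could not prove that the fallback path is dead.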
    • cpu: Inline dav1d_get_cpu_flags() · 820bf515
      Henrik Gramner authored
  16. 22 Jun, 2022 1 commit
  17. 20 Jun, 2022 3 commits
    • checkasm: Speed up signal handling · 0421f787
      Henrik Gramner authored
      Enabling/disabling signal handlers is very slow and requires a syscall.
      A better approach is to keep the signal handlers enabled all the time,
      and use a simple flag variable to determine if a given signal should
      be handled or passed on to the default signal handler.
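      A minimal sketch of the flag-based approach using only standard C signal
      functions. The names are hypothetical and this is not checkasm's code: a
      real harness would typically install handlers with sigaction() on POSIX
      and jump back out of the faulting code instead of returning from the
      handler.

```c
#include <signal.h>

static volatile sig_atomic_t catch_enabled; /* set while code under test runs */
static volatile sig_atomic_t last_caught;   /* last signal the harness handled */

static void on_signal(int sig) {
    if (catch_enabled) {
        /* Handle it ourselves; here we only record the signal number. */
        last_caught = sig;
    } else {
        /* Not ours: restore the default action and deliver it again. */
        signal(sig, SIG_DFL);
        raise(sig);
    }
}

/* Called once at startup instead of around every test. */
static void install_handlers(void) {
    signal(SIGILL, on_signal);
    signal(SIGFPE, on_signal);
    signal(SIGSEGV, on_signal);
}
```

      Toggling catch_enabled is a plain store, so enabling and disabling
      handling around each test no longer costs a syscall.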
    • checkasm: Improve seed generation on Windows · fa68b036
      Henrik Gramner authored
      GetTickCount() increases at a very low frequency, >10ms per tick.
      When running multiple loops of checkasm instances in parallel,
      different instances regularly end up using identical seeds.
      Prefer the use of QueryPerformanceCounter() instead, which ticks at
      a significantly higher rate, which in turn increases randomness.
  18. 14 Jun, 2022 1 commit