SSSE3 CDEF

Merged: Victorien Le Couviour--Tuffet requested to merge psilokos/dav1d:ssse3_cdef into master

Merged by Jean-Baptiste Kempf on Feb 26, 2019, 10:28am UTC

Merge details

  • Changes merged into master with 791ec219.
  • Deleted the source branch.
  • Auto-merge enabled

Pipeline #5058 passed

Pipeline passed for 791ec219 on master

Activity

  • added 5 commits

    • dce4e788 - 1 commit from branch videolan:master
    • aaf43c71 - cdef: improve AVX2 cdef_filter macro consistency
    • dd368077 - cdef: remove redundant AVX2 code
    • fe7429c1 - cdef: fix potential crash in AVX2
    • e61d2fa1 - cdef: add SSSE3 implementation

  • added 2 commits

  • added 1 commit

    • 4c0e91e9 - wip_check_if_checkasm_is_too_narrow

  • added 6 commits

    • e2f1d869 - checkasm: widen cdef filter strength max value
    • 73fea97a - cdef: improve AVX2 cdef_filter macro consistency
    • 364769f5 - cdef: remove redundant AVX2 code
    • 0988f07e - cdef: fix potential crash in AVX2
    • 110732b1 - cdef: add SSSE3 implementation
    • 6d6f6b47 - TODO

  • added 5 commits

    • 2f485fc6 - checkasm: widen cdef filter strength max value
    • 30339e8d - cdef: improve AVX2 cdef_filter macro consistency
    • 8a6fe183 - cdef: remove redundant AVX2 code
    • d29387ed - cdef: fix potential crash in AVX2
    • 82834f34 - cdef: add SSSE3 implementation

  • Victorien Le Couviour--Tuffet unmarked as a Work In Progress

  • Henrik Gramner
  • added 2 commits

    • 30357edd - cdef: add SSSE3 implementation
    • 60a7dbe7 - cdef: add AVX2 4x8 filter

  • Victorien Le Couviour--Tuffet marked as a Work In Progress

  • @psilokos I see a few AVX2 commits. Do you expect significant changes in AVX2 performance?

    Edited by Ewout ter Hoeven
  • added 2 commits

    • 60238f5a - cdef: add AVX2 4x8 filter
    • 76302262 - cdef: add SSSE3 implementation

  • @EwoutH Not for what was already there, but I added the YUV 4:2:2 case, so yes, you can benchmark that one :)

  • @psilokos Then I will go on a great quest for 4:2:2 samples :)
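The 4:2:2 case and the "add AVX2 4x8 filter" commit fit together. As we understand AV1, CDEF filters 8x8 blocks in the luma plane, and each chroma block shrinks by the layout's subsampling factors: 4:2:0 halves both dimensions (4x4 chroma blocks), while 4:2:2 halves only the width (4x8 chroma blocks). The Python sketch below shows that arithmetic; the function and dictionary names are hypothetical, and the layouts are plain strings rather than dav1d's own enum.

```python
# Chroma subsampling shifts (ss_x, ss_y) per layout; a shift of 1 halves
# that dimension of the chroma planes relative to luma.
SUBSAMPLING = {
    "4:2:0": (1, 1),
    "4:2:2": (1, 0),
    "4:4:4": (0, 0),
}

CDEF_LUMA_BLOCK = 8  # CDEF works on 8x8 blocks in the luma plane


def cdef_block_size(layout: str, plane: str) -> tuple[int, int]:
    """Return the (width, height) filtered per 8x8 luma block in one plane."""
    if plane == "luma":
        return (CDEF_LUMA_BLOCK, CDEF_LUMA_BLOCK)
    ss_x, ss_y = SUBSAMPLING[layout]
    return (CDEF_LUMA_BLOCK >> ss_x, CDEF_LUMA_BLOCK >> ss_y)


for layout in SUBSAMPLING:
    w, h = cdef_block_size(layout, "chroma")
    print(f"{layout} chroma: {w}x{h}")
```

This prints 4x4 for 4:2:0, 4x8 for 4:2:2 and 8x8 for 4:4:4, which is why 4:2:2 content needs a dedicated 4x8 filter.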

  • added 1 commit

    • e88293fb - cdef: add SSSE3 implementation

  • Victorien Le Couviour--Tuffet unmarked as a Work In Progress

  • Victorien Le Couviour--Tuffet resolved all discussions

  • Jean-Baptiste Kempf changed milestone to %0.2.0

  • Doesn't the "fix potential crash in AVX2" only apply to 4x8? In that case it should be squashed into the commit which adds the 4x8 version.

  • Intel Xeon W3680 (Westmere), 6 cores at 3.33 GHz (12 logical cores)
    Chimera 8-bit 1080p

    frame threads  tile threads  stock 1ba8423a  patched e88293fb  diff
    1              1             13m21.751s      9m32.778s         40% faster
    6              1             4m7.455s        2m48.903s         46% faster
    6              2             3m1.221s        1m47.225s         69% faster

    Intel Xeon E5-2699 v4 (Broadwell), 22 cores at 2.2 GHz (44 logical cores)
    Liquid-cooled, with a sustained turbo of 2.8 GHz
    Chimera 8-bit 1080p
    Both builds: cross GCC 8.2, i686

    frame threads  tile threads  stock 1ba8423a  patched e88293fb  diff
    1              1             16m22.632s      10m9.794s         61% faster
    11             1             5m5.261s        2m39.771s         91% faster
    22             1             4m46.58s        2m29.10s          92% faster
    11             2             3m59.2s         1m28.825s         169% faster
    22             2             3m44.466s       1m23.607s         168% faster
    Edited by Bradley Sepos
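The diff column appears to be the stock time divided by the patched time, minus one, expressed as a percentage. The Python sketch below recomputes it from the `time`-style durations in the Westmere table; the helper names are hypothetical and not part of dav1d or its tooling.

```python
def to_seconds(duration: str) -> float:
    """Convert a duration such as '13m21.751s' (minutes, seconds) to seconds."""
    minutes, seconds = duration.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def percent_faster(stock: str, patched: str) -> float:
    """Speedup as a percentage: (stock time / patched time - 1) * 100."""
    return (to_seconds(stock) / to_seconds(patched) - 1) * 100


# (frame threads, tile threads, stock, patched) rows from the Westmere table.
westmere = [
    (1, 1, "13m21.751s", "9m32.778s"),
    (6, 1, "4m7.455s", "2m48.903s"),
    (6, 2, "3m1.221s", "1m47.225s"),
]

for frames, tiles, stock, patched in westmere:
    print(f"{frames} x {tiles}: {percent_faster(stock, patched):.1f}% faster")
```

The posted column agrees with these values to within one percentage point. For example, the 6 x 1 row works out to about 46.5% and was posted as 46%.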
  • Ronald S. Bultje mentioned in issue #78

  • added 17 commits

    • e88293fb...9cb94d29 - 12 commits from branch videolan:master
    • 6e774b23 - checkasm: widen cdef filter strength max value
    • 18ea9ba0 - cdef: improve AVX2 cdef_filter macro consistency
    • 6b62d725 - cdef: remove redundant AVX2 code
    • 018b8db0 - cdef: add AVX2 4x8 filter
    • 1e2fc370 - cdef: add SSSE3 implementation

  • added 6 commits

    • 3b19f5b6 - checkasm: decrease cdef filter min damping value
    • ee3c5aa5 - cdef: improve AVX2 cdef_filter macro consistency
    • f9f50601 - cdef: remove redundant AVX2 code
    • 50310d80 - cdef: optimize 4 by X filters for HAVE_RIGHT=0
    • 040598ba - cdef: add AVX2 4x8 filter
    • 2f3ce00c - cdef: add SSSE3 implementation

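The checkasm commits in this MR raise the maximum filter strength and lower the minimum damping value that the test feeds the CDEF filter. As we read the AV1 specification, both values meet in CDEF's constrain step, which shrinks each neighbour difference toward zero: the strength acts as the threshold, and damping sets how quickly larger differences (likely real edges) stop contributing. Below is a minimal Python sketch of that step, following the spec formula rather than the SSSE3/AVX2 code in this MR; the function names are our own.

```python
def floor_log2(x: int) -> int:
    """Index of the highest set bit; x must be positive."""
    return x.bit_length() - 1


def constrain(diff: int, threshold: int, damping: int) -> int:
    """Sketch of AV1 CDEF constrain(): attenuate a pixel difference.

    Small differences pass through unchanged; the contribution drops to
    zero once abs(diff) >> shift reaches the threshold.
    """
    if threshold == 0:
        return 0
    # Higher damping means a larger shift, i.e. gentler attenuation.
    shift = max(0, damping - floor_log2(threshold))
    magnitude = min(abs(diff), max(0, threshold - (abs(diff) >> shift)))
    return magnitude if diff >= 0 else -magnitude


# Same differences and threshold, two different damping values.
for damping in (3, 6):
    print(damping, [constrain(d, 4, damping) for d in (2, 8, 20)])
```

With threshold 4, damping 3 keeps only the smallest difference ([2, 0, 0]), while damping 6 lets larger ones contribute ([2, 4, 3]). Extreme strength and damping values therefore exercise different shift and clamping paths, which is presumably why the tested ranges matter.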

  • Henrik Gramner
  • Victorien Le Couviour--Tuffet