
x86: Add cdef_filter SSE optimizations

Merged Henrik Gramner requested to merge gramner/dav1d:x86_cdef_sse_optimizations into master

                             32-bit          64-bit
                             old    new      old    new

cdef_filter_4x4_8bpc_sse2:   205.8  130.5    189.1  128.5
cdef_filter_4x4_8bpc_ssse3:  163.3  103.7    142.5  103.3
cdef_filter_4x4_8bpc_sse4:   150.3   99.5    130.6   98.8

cdef_filter_4x8_8bpc_sse2:   377.2  222.8    336.7  222.1
cdef_filter_4x8_8bpc_ssse3:  291.6  171.4    245.7  164.6
cdef_filter_4x8_8bpc_sse4:   264.7  163.2    218.7  157.2

cdef_filter_8x8_8bpc_sse2:   668.5  369.9    567.4  365.0
cdef_filter_8x8_8bpc_ssse3:  509.5  271.8    399.6  250.6
cdef_filter_8x8_8bpc_sse4:   461.6  258.5    341.0  234.3

Most of the performance gain comes from having separate code paths for !pri_strength and !sec_strength, but there are also various small optimizations throughout.
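
To illustrate why the separate code paths help, here is a minimal C sketch. It is not the assembly from this merge request: the 1-D row filter, the tap weights, the fixed shift, the rounding, and all function names are hypothetical simplifications. The only property it relies on is that a CDEF-style constrain() returns 0 when its strength is 0, so the taps belonging to a zero strength can be skipped without changing the result.

```c
#include <stdlib.h>

/* Illustrative toy only, NOT dav1d's code: a 1-D stand-in for the idea of
 * dispatching to specialized paths when one of the strengths is zero. */

/* CDEF-style constrain: clamps |diff| by a limit derived from the strength.
 * With threshold == 0 the limit is 0, so the result is always 0. */
static int constrain(int diff, int threshold, int shift) {
    const int adiff = abs(diff);
    int lim = threshold - (adiff >> shift);
    if (lim < 0) lim = 0;
    const int mag = adiff < lim ? adiff : lim;
    return diff < 0 ? -mag : mag;
}

/* Hypothetical tap layout: primary taps at +-1, secondary taps at +-2. */
static int pri_sum(const int *p, int pri) {
    return 2 * (constrain(p[-1] - p[0], pri, 1) +
                constrain(p[ 1] - p[0], pri, 1));
}

static int sec_sum(const int *p, int sec) {
    return constrain(p[-2] - p[0], sec, 1) +
           constrain(p[ 2] - p[0], sec, 1);
}

/* Generic path: always evaluates both sets of taps. */
static void filter_row_generic(int *dst, const int *src, int n,
                               int pri, int sec) {
    for (int i = 0; i < n; i++)
        dst[i] = src[i] + (pri_sum(&src[i], pri) + sec_sum(&src[i], sec)) / 8;
}

/* Specialized path for sec == 0: secondary taps are never touched. */
static void filter_row_pri_only(int *dst, const int *src, int n, int pri) {
    for (int i = 0; i < n; i++)
        dst[i] = src[i] + pri_sum(&src[i], pri) / 8;
}

/* Specialized path for pri == 0: primary taps are never touched. */
static void filter_row_sec_only(int *dst, const int *src, int n, int sec) {
    for (int i = 0; i < n; i++)
        dst[i] = src[i] + sec_sum(&src[i], sec) / 8;
}

/* Dispatch once, outside the pixel loop. src must have two valid
 * samples of padding on each side of the n filtered samples. */
static void filter_row(int *dst, const int *src, int n, int pri, int sec) {
    if (!sec)
        filter_row_pri_only(dst, src, n, pri);
    else if (!pri)
        filter_row_sec_only(dst, src, n, sec);
    else
        filter_row_generic(dst, src, n, pri, sec);
}
```

Because the branch is taken once per call rather than once per pixel, the specialized loops do strictly less work whenever a strength is zero, while filter_row() returns the same values as filter_row_generic() for any combination of strengths.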

The 32-bit PIC handling is also cleaned up and simplified.
