    x86: cdef_filter: use 8-bit arithmetic for SSE · 75e88fab
    Victorien Le Couviour--Tuffet authored
    Port of c204da0f for AVX-2
    from Kyle Siefring.
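
The commit itself is x86 assembly and its diff is not reproduced here, so the following is only a sketch of why 8-bit arithmetic can suffice for the 8bpc case. It assumes the CDEF constrain step as described in the AV1 specification (the magnitude of a pixel difference is limited by a threshold that shrinks as the difference grows). The function names `constrain_wide`, `sat_sub_u8` and `constrain_u8` are hypothetical and do not come from dav1d.

```python
def constrain_wide(diff, threshold, shift):
    """Reference form using ordinary signed integers."""
    mag = abs(diff)
    limited = min(mag, max(0, threshold - (mag >> shift)))
    return limited if diff >= 0 else -limited


def sat_sub_u8(a, b):
    """Unsigned 8-bit saturating subtract: results below 0 clamp to 0."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    return max(0, a - b)


def constrain_u8(px, p, threshold, shift):
    """Same result, with every intermediate magnitude fitting in one byte.

    px is the centre pixel and p a neighbouring pixel, both 8-bit.
    """
    # One of the two saturating differences is always 0, so OR yields |p - px|.
    mag = sat_sub_u8(p, px) | sat_sub_u8(px, p)
    # The saturating subtract replaces max(0, threshold - (mag >> shift)).
    limited = min(mag, sat_sub_u8(threshold, mag >> shift))
    # The sign is recovered from a comparison instead of a signed difference.
    return limited if p >= px else -limited


print(constrain_u8(100, 102, 4, 1))  # neighbour slightly brighter than centre
print(constrain_u8(102, 100, 4, 1))  # neighbour slightly darker than centre
```

Because nothing here needs more than a byte per pixel, a 128-bit register can in principle carry 16 pixels per operation instead of 8 with 16-bit lanes, which is consistent with the gains listed in the benchmarks.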
    
    ---------------------
    x86_64:
    ------------------------------------------
    before: cdef_filter_4x4_8bpc_ssse3: 141.7
     after: cdef_filter_4x4_8bpc_ssse3: 131.6
    before: cdef_filter_4x4_8bpc_sse4: 128.3
     after: cdef_filter_4x4_8bpc_sse4: 119.0
    ------------------------------------------
    before: cdef_filter_4x8_8bpc_ssse3: 253.4
     after: cdef_filter_4x8_8bpc_ssse3: 236.1
    before: cdef_filter_4x8_8bpc_sse4: 228.5
     after: cdef_filter_4x8_8bpc_sse4: 213.2
    ------------------------------------------
    before: cdef_filter_8x8_8bpc_ssse3: 429.6
     after: cdef_filter_8x8_8bpc_ssse3: 386.9
    before: cdef_filter_8x8_8bpc_sse4: 379.9
     after: cdef_filter_8x8_8bpc_sse4: 335.9
    ------------------------------------------
    
    ---------------------
    x86_32:
    ------------------------------------------
    before: cdef_filter_4x4_8bpc_ssse3: 184.3
     after: cdef_filter_4x4_8bpc_ssse3: 163.3
    before: cdef_filter_4x4_8bpc_sse4: 168.9
     after: cdef_filter_4x4_8bpc_sse4: 146.1
    ------------------------------------------
    before: cdef_filter_4x8_8bpc_ssse3: 335.3
     after: cdef_filter_4x8_8bpc_ssse3: 280.7
    before: cdef_filter_4x8_8bpc_sse4: 305.1
     after: cdef_filter_4x8_8bpc_sse4: 257.9
    ------------------------------------------
    before: cdef_filter_8x8_8bpc_ssse3: 579.1
     after: cdef_filter_8x8_8bpc_ssse3: 500.5
    before: cdef_filter_8x8_8bpc_sse4: 517.0
     after: cdef_filter_8x8_8bpc_sse4: 455.8
    ------------------------------------------
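
To make the before/after figures easier to compare, the relative reduction for each entry can be computed as below. This is a small sketch: the numbers are copied from the two listings above, the commit message gives no unit for them, and the helper name `reduction_pct` is hypothetical.

```python
# (before, after) pairs copied from the x86_64 and x86_32 listings above.
timings = {
    ("x86_64", "4x4", "ssse3"): (141.7, 131.6),
    ("x86_64", "4x4", "sse4"): (128.3, 119.0),
    ("x86_64", "4x8", "ssse3"): (253.4, 236.1),
    ("x86_64", "4x8", "sse4"): (228.5, 213.2),
    ("x86_64", "8x8", "ssse3"): (429.6, 386.9),
    ("x86_64", "8x8", "sse4"): (379.9, 335.9),
    ("x86_32", "4x4", "ssse3"): (184.3, 163.3),
    ("x86_32", "4x4", "sse4"): (168.9, 146.1),
    ("x86_32", "4x8", "ssse3"): (335.3, 280.7),
    ("x86_32", "4x8", "sse4"): (305.1, 257.9),
    ("x86_32", "8x8", "ssse3"): (579.1, 500.5),
    ("x86_32", "8x8", "sse4"): (517.0, 455.8),
}


def reduction_pct(before, after):
    """Percentage by which the measurement dropped relative to 'before'."""
    return 100.0 * (before - after) / before


for (arch, block, isa), (before, after) in timings.items():
    pct = reduction_pct(before, after)
    print(f"{arch} cdef_filter_{block}_8bpc_{isa}: {pct:.1f}% lower")
```

Every entry improves, and the x86_32 entries show larger relative reductions than their x86_64 counterparts.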
cdef_sse.asm 40 KB