1. 07 May, 2019 2 commits
  2. 06 May, 2019 1 commit
  3. 05 May, 2019 1 commit
  4. 04 May, 2019 3 commits
    • Control the stack size of spawned threads · 2756254f
      Henrik Gramner authored
      On some systems (e.g. Google Fuchsia) the default stack size of new
      threads is insufficient, resulting in crashes.
      
      On other systems the default stack size is unnecessarily large,
      which can waste a lot of virtual memory.
      
      By setting it to a sufficiently large fixed value we can ensure that
      we don't run out of stack space while keeping down memory usage.
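
A minimal sketch of the idea, assuming a POSIX threads platform. The helper names and the 1 MiB value are hypothetical and chosen only for illustration; the commit message does not state the size that was actually picked.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fixed stack size; not taken from the commit. */
#define WORKER_STACK_SIZE ((size_t)1024 * 1024)

/* Example worker: sets the int pointed to by arg to 1. */
static void *mark_done(void *arg) {
    *(int *)arg = 1;
    return NULL;
}

/* Create a thread with an explicit stack size instead of relying on
 * the platform default. Returns 0 on success or a pthread error code. */
static int spawn_with_stack_size(pthread_t *thread,
                                 void *(*fn)(void *), void *arg) {
    pthread_attr_t attr;
    int ret = pthread_attr_init(&attr);
    if (ret) return ret;
    ret = pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);
    if (!ret) ret = pthread_create(thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return ret;
}
```

A caller would pass its own worker function in place of mark_done and join the thread as usual with pthread_join.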
    • arm64: msac: Implement NEON msac_decode_symbol_adapt · 1d5c1a49
      Martin Storsjö authored
                                   Cortex A53    A72    A73
      msac_decode_symbol_adapt4_c:      107.6   57.1   67.8
      msac_decode_symbol_adapt4_neon:    70.4   56.4   55.1
      msac_decode_symbol_adapt8_c:      157.1   74.5   90.3
      msac_decode_symbol_adapt8_neon:    75.6   57.2   56.9
      msac_decode_symbol_adapt16_c:     257.4  106.6  135.9
      msac_decode_symbol_adapt16_neon:  101.8   62.0   65.2
    • itx_tmpl: Fix the assert in inv_txfm_add_c · 058ca08d
      Martin Storsjö authored
      The previous form of the assert was automatically true for any
      value of w and h.
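
The commit message does not quote the expressions involved, so the following is only a hypothetical illustration of how a range check can end up vacuous. The 4..64 bounds and both function names are assumptions.

```c
#include <stdbool.h>

/* Vacuous check: every integer is either >= 4 or <= 64, so each
 * parenthesized term is true for any input. */
static bool size_check_vacuous(int w, int h) {
    return (h >= 4 || h <= 64) && (w >= 4 || w <= 64);
}

/* Range check that can actually fail: both bounds must hold. */
static bool size_check_strict(int w, int h) {
    return h >= 4 && h <= 64 && w >= 4 && w <= 64;
}
```

With w = 1000 and h = -5 the first form still returns true, while the second rejects the input.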
  5. 29 Apr, 2019 1 commit
  6. 24 Apr, 2019 2 commits
  7. 22 Apr, 2019 1 commit
    • Fix crash in SSSE3 inverse transform · f8cac8c5
      Henrik Gramner authored
      The 32x32 identity_identity transform would corrupt the stack, including
      the return address, when compiling with a 16-byte stack alignment on
      non-Windows systems.
  8. 19 Apr, 2019 1 commit
  9. 18 Apr, 2019 1 commit
    • Add SSSE3 implementation for the {16, 32, 64}x64 and 64x{16, 32} blocks in itx · 589e96a1
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x64_dct_dct_0_8bpc_c: 3973.5
      inv_txfm_add_16x64_dct_dct_0_8bpc_ssse3: 185.7
      inv_txfm_add_16x64_dct_dct_1_8bpc_c: 37869.1
      inv_txfm_add_16x64_dct_dct_1_8bpc_ssse3: 2103.1
      inv_txfm_add_16x64_dct_dct_2_8bpc_c: 37822.9
      inv_txfm_add_16x64_dct_dct_2_8bpc_ssse3: 2099.1
      inv_txfm_add_16x64_dct_dct_3_8bpc_c: 37871.7
      inv_txfm_add_16x64_dct_dct_3_8bpc_ssse3: 2663.5
      inv_txfm_add_16x64_dct_dct_4_8bpc_c: 38002.9
      inv_txfm_add_16x64_dct_dct_4_8bpc_ssse3: 2589.7
      inv_txfm_add_32x64_dct_dct_0_8bpc_c: 8319.2
      inv_txfm_add_32x64_dct_dct_0_8bpc_ssse3: 376.9
      inv_txfm_add_32x64_dct_dct_1_8bpc_c: 85956.8
      inv_txfm_add_32x64_dct_dct_1_8bpc_ssse3: 4298.1
      inv_txfm_add_32x64_dct_dct_2_8bpc_c: 89906.2
      inv_txfm_add_32x64_dct_dct_2_8bpc_ssse3: 4291.3
      inv_txfm_add_32x64_dct_dct_3_8bpc_c: 83710.9
      inv_txfm_add_32x64_dct_dct_3_8bpc_ssse3: 5589.5
      inv_txfm_add_32x64_dct_dct_4_8bpc_c: 87733.5
      inv_txfm_add_32x64_dct_dct_4_8bpc_ssse3: 5658.4
      inv_txfm_add_64x16_dct_dct_0_8bpc_c: 3895.9
      inv_txfm_add_64x16_dct_dct_0_8bpc_ssse3: 179.5
      inv_txfm_add_64x16_dct_dct_1_8bpc_c: 51375.2
      inv_txfm_add_64x16_dct_dct_1_8bpc_ssse3: 3859.2
      inv_txfm_add_64x16_dct_dct_2_8bpc_c: 52562.9
      inv_txfm_add_64x16_dct_dct_2_8bpc_ssse3: 4044.1
      inv_txfm_add_64x16_dct_dct_3_8bpc_c: 51347.0
      inv_txfm_add_64x16_dct_dct_3_8bpc_ssse3: 5259.5
      inv_txfm_add_64x16_dct_dct_4_8bpc_c: 49642.2
      inv_txfm_add_64x16_dct_dct_4_8bpc_ssse3: 4008.4
      inv_txfm_add_64x32_dct_dct_0_8bpc_c: 7196.4
      inv_txfm_add_64x32_dct_dct_0_8bpc_ssse3: 355.8
      inv_txfm_add_64x32_dct_dct_1_8bpc_c: 106588.4
      inv_txfm_add_64x32_dct_dct_1_8bpc_ssse3: 4965.3
      inv_txfm_add_64x32_dct_dct_2_8bpc_c: 106230.7
      inv_txfm_add_64x32_dct_dct_2_8bpc_ssse3: 4772.0
      inv_txfm_add_64x32_dct_dct_3_8bpc_c: 107427.0
      inv_txfm_add_64x32_dct_dct_3_8bpc_ssse3: 7146.9
      inv_txfm_add_64x32_dct_dct_4_8bpc_c: 111785.7
      inv_txfm_add_64x32_dct_dct_4_8bpc_ssse3: 7156.2
      inv_txfm_add_64x64_dct_dct_0_8bpc_c: 14512.4
      inv_txfm_add_64x64_dct_dct_0_8bpc_ssse3: 674.2
      inv_txfm_add_64x64_dct_dct_1_8bpc_c: 173246.3
      inv_txfm_add_64x64_dct_dct_1_8bpc_ssse3: 8790.8
      inv_txfm_add_64x64_dct_dct_2_8bpc_c: 174264.6
      inv_txfm_add_64x64_dct_dct_2_8bpc_ssse3: 8767.6
      inv_txfm_add_64x64_dct_dct_3_8bpc_c: 170047.3
      inv_txfm_add_64x64_dct_dct_3_8bpc_ssse3: 10784.9
      inv_txfm_add_64x64_dct_dct_4_8bpc_c: 170182.2
      inv_txfm_add_64x64_dct_dct_4_8bpc_ssse3: 10795.6
  10. 17 Apr, 2019 1 commit
    • Over-allocate level array by 3 bytes · 36e1490b
      Ronald S. Bultje authored
      This is a workaround so that the AVX2 implementation of deblock can
      index the levels array starting from the level type, which causes it
      to over-read by up to 3 bytes. This is intended to fix #269.
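
A sketch of why 3 bytes of padding are enough, assuming each entry holds four one-byte level types and the SIMD code performs a 4-byte load starting at the selected type. The layout, names, and helper functions are assumptions for illustration, not the library's actual code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LEVEL_TYPES   4  /* assumed: four 1-byte level types per entry */
#define OVER_READ_PAD 3  /* padding from the commit title */

/* Allocate n entries plus padding so a 4-byte load that starts at the
 * last type of the last entry stays inside the allocation. */
static uint8_t *alloc_levels(size_t n) {
    return calloc(n * LEVEL_TYPES + OVER_READ_PAD, 1);
}

/* Mimics a 32-bit load starting at entry idx, level type `type`.
 * For type == 3 this reads 3 bytes past the end of the entry. */
static uint32_t load4_from_type(const uint8_t *levels, size_t idx,
                                unsigned type) {
    uint32_t v;
    memcpy(&v, levels + idx * LEVEL_TYPES + type, sizeof(v));
    return v;
}
```

For the last entry with type 3, the load starts at byte 4n - 1 and ends at byte 4n + 2, which is the final byte of the padded buffer.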
  11. 16 Apr, 2019 3 commits
    • arm64: loopfilter: Implement NEON loop filters · 0282f6f3
      Martin Storsjö authored
      The exact relative speedup compared to C code is a bit vague and hard
      to measure, depending on exactly how many filtered blocks are skipped,
      as the NEON version always filters 16 pixels at a time, while the
      C code can skip processing individual 4-pixel blocks.
      
      Additionally, the checkasm benchmarking code runs the same function
      repeatedly on the same buffer, which can make the filter take
      different codepaths on each run, as the function updates the buffer
      which will be used as input for the next run.
      
      If the checkasm test data is tweaked to try to avoid skipped blocks,
      the relative speedup compared to C is between 2x and 5x, while
      it is around 1x to 4x with the current checkasm test as is.
      
      Benchmark numbers from a tweaked checkasm that avoids skipped
      blocks:
      
                              Cortex A53     A72     A73
      lpf_h_sb_uv_w4_8bpc_c:      2954.7  1399.3  1655.3
      lpf_h_sb_uv_w4_8bpc_neon:    895.5   650.8   692.0
      lpf_h_sb_uv_w6_8bpc_c:      3879.2  1917.2  2257.7
      lpf_h_sb_uv_w6_8bpc_neon:   1125.6   759.5   838.4
      lpf_h_sb_y_w4_8bpc_c:       6711.0  3275.5  3913.7
      lpf_h_sb_y_w4_8bpc_neon:    1744.0  1342.1  1351.5
      lpf_h_sb_y_w8_8bpc_c:      10695.7  6155.8  6638.9
      lpf_h_sb_y_w8_8bpc_neon:    2146.5  1560.4  1609.1
      lpf_h_sb_y_w16_8bpc_c:     11355.8  6292.0  6995.9
      lpf_h_sb_y_w16_8bpc_neon:   2475.4  1949.6  1968.4
      lpf_v_sb_uv_w4_8bpc_c:      2639.7  1204.8  1425.9
      lpf_v_sb_uv_w4_8bpc_neon:    510.7   351.4   334.7
      lpf_v_sb_uv_w6_8bpc_c:      3468.3  1757.1  2021.5
      lpf_v_sb_uv_w6_8bpc_neon:    625.0   415.0   397.8
      lpf_v_sb_y_w4_8bpc_c:       5428.7  2731.7  3068.5
      lpf_v_sb_y_w4_8bpc_neon:    1172.6   792.1   768.0
      lpf_v_sb_y_w8_8bpc_c:       8946.1  4412.8  5121.0
      lpf_v_sb_y_w8_8bpc_neon:    1565.5  1063.6  1062.7
      lpf_v_sb_y_w16_8bpc_c:      8978.9  4411.7  5112.0
      lpf_v_sb_y_w16_8bpc_neon:   1775.0  1288.1  1236.7
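
The point above about benchmarking an in-place filter on the same buffer can be illustrated with a toy example. This is not the AV1 loop filter; the function, its threshold rule, and the data are hypothetical and only show how a second run on already-filtered data can take a different code path.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy in-place edge filter: at every 4-pixel block edge, average the
 * two pixels next to the edge if the step across it is nonzero and
 * below `thresh`; otherwise skip the edge.
 * Returns the number of edges that were actually filtered. */
static int toy_filter(unsigned char *p, size_t n, int thresh) {
    int filtered = 0;
    for (size_t i = 4; i < n; i += 4) {
        int d = abs((int)p[i] - (int)p[i - 1]);
        if (d == 0 || d >= thresh)
            continue; /* skipped edge: the cheap path */
        unsigned char avg = (unsigned char)((p[i] + p[i - 1] + 1) >> 1);
        p[i - 1] = avg;
        p[i] = avg;
        filtered++;
    }
    return filtered;
}
```

On the buffer {10, 10, 10, 10, 14, 14, 14, 14} with a threshold of 8, the first call filters one edge and the second call filters none, because the first call already smoothed it. Restoring the buffer from a pristine copy before each timed run is one way to keep the measured work constant.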
    • arm64: looprestoration: Add a NEON implementation of SGR · 204bf211
      Martin Storsjö authored
      Relative speedup vs (autovectorized) C code:
                            Cortex A53    A72    A73
      selfguided_3x3_8bpc_neon:   2.91   2.12   2.68
      selfguided_5x5_8bpc_neon:   3.18   2.65   3.39
      selfguided_mix_8bpc_neon:   3.04   2.29   2.98
      
      The relative speedup vs non-vectorized C code is around 2.6-4.6x.
    • msac: Add a cast to indicate intended narrowing from size_t to unsigned · 003fa104
      Martin Storsjö authored
      This fixes this compiler warning with MSVC:
      ../src/msac.c(148): warning C4267: '+=': conversion from 'size_t' to 'unsigned int', possible loss of data
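
A minimal sketch of the kind of change that silences C4267, using a hypothetical helper rather than the actual code at line 148 of src/msac.c. The cast states that the narrowing is intended; it does not make it safe, so the caller must know the value fits in an unsigned.

```c
#include <stddef.h>

/* Hypothetical helper: add a size_t quantity to an unsigned total.
 * Without the cast, MSVC warns about the implicit size_t -> unsigned
 * conversion in the compound assignment. */
static unsigned add_consumed(unsigned total, size_t consumed) {
    total += (unsigned)consumed; /* explicit, intended narrowing */
    return total;
}
```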
  12. 15 Apr, 2019 1 commit
  13. 10 Apr, 2019 1 commit
    • Add SSSE3 implementation for ipred_paeth · 44d0de41
      Xuefeng Jiang authored
      intra_pred_paeth_w4_8bpc_c: 561.6
      intra_pred_paeth_w4_8bpc_ssse3: 49.2
      intra_pred_paeth_w8_8bpc_c: 1475.8
      intra_pred_paeth_w8_8bpc_ssse3: 103.0
      intra_pred_paeth_w16_8bpc_c: 4697.8
      intra_pred_paeth_w16_8bpc_ssse3: 279.0
      intra_pred_paeth_w32_8bpc_c: 13245.1
      intra_pred_paeth_w32_8bpc_ssse3: 614.7
      intra_pred_paeth_w64_8bpc_c: 32638.9
      intra_pred_paeth_w64_8bpc_ssse3: 1477.6
  14. 08 Apr, 2019 1 commit
  15. 07 Apr, 2019 1 commit
    • arm: Fix typos in comments · 556780b7
      Martin Storsjö authored
      The width register has been set to clz(w)-24, not the other way
      around. And the 32 bit prep function has got the h parameter in
      r4, not in r5.
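
As a worked example of the clz(w)-24 expression, the sketch below evaluates it in C for power-of-two widths. It assumes a 32-bit unsigned and uses __builtin_clz, a GCC/Clang builtin; the function name and the 4..128 width range are assumptions for illustration.

```c
/* Maps a power-of-two width to a small index:
 * 128 -> 0, 64 -> 1, 32 -> 2, 16 -> 3, 8 -> 4, 4 -> 5.
 * Assumes unsigned is 32 bits wide and w is nonzero. */
static int width_index(unsigned w) {
    return __builtin_clz(w) - 24;
}
```

Larger widths map to smaller indices, since a wider value has fewer leading zero bits.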
  16. 04 Apr, 2019 2 commits
    • arm: Consistently use 8/24 columns indentation for assembly · 5d888dde
      Martin Storsjö authored
      For cases with indented, nested .if/.macro in asm.S, indent those
      by 4 chars.
      
      Some initial assembly files were indented to 4/16 columns, while all
      the actual implementation files, starting with src/arm/64/mc.S, have
      used 8/24 for indentation.
    • Add SSSE3 implementation for ipred_cfl_ac_444 · 0d936a1a
      Xuefeng Jiang authored
      cfl_ac_444_w4_8bpc_c: 978.2
      cfl_ac_444_w4_8bpc_ssse3: 110.4
      cfl_ac_444_w8_8bpc_c: 2312.3
      cfl_ac_444_w8_8bpc_ssse3: 197.5
      cfl_ac_444_w16_8bpc_c: 4081.1
      cfl_ac_444_w16_8bpc_ssse3: 274.1
      cfl_ac_444_w32_8bpc_c: 9544.3
      cfl_ac_444_w32_8bpc_ssse3: 617.1
  17. 28 Mar, 2019 5 commits
    • CI: Check for newline at end of file · abb972a5
      Henrik Gramner authored
    • x86: cdef_dir: optimize best cost finding for SSE · 91568b2a
      Victorien Le Couviour--Tuffet authored
      Port of 65ee1233 for AVX-2 from Kyle Siefring to SSE41, and optimize SSSE3.
      
      ---------------------
      x86_64:
      ------------------------------------------
      before: cdef_dir_8bpc_ssse3: 110.3
       after: cdef_dir_8bpc_ssse3: 105.9
         new: cdef_dir_8bpc_sse4:   96.4
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      before: cdef_dir_8bpc_ssse3: 120.6
       after: cdef_dir_8bpc_ssse3: 110.7
         new: cdef_dir_8bpc_sse4:  106.5
      ------------------------------------------
    • x86: cdef_filter: use 8-bit arithmetic for SSE · 75e88fab
      Victorien Le Couviour--Tuffet authored
      Port of c204da0f for AVX-2 from Kyle Siefring.
      
      ---------------------
      x86_64:
      ------------------------------------------
      before: cdef_filter_4x4_8bpc_ssse3: 141.7
       after: cdef_filter_4x4_8bpc_ssse3: 131.6
      before: cdef_filter_4x4_8bpc_sse4: 128.3
       after: cdef_filter_4x4_8bpc_sse4: 119.0
      ------------------------------------------
      before: cdef_filter_4x8_8bpc_ssse3: 253.4
       after: cdef_filter_4x8_8bpc_ssse3: 236.1
      before: cdef_filter_4x8_8bpc_sse4: 228.5
       after: cdef_filter_4x8_8bpc_sse4: 213.2
      ------------------------------------------
      before: cdef_filter_8x8_8bpc_ssse3: 429.6
       after: cdef_filter_8x8_8bpc_ssse3: 386.9
      before: cdef_filter_8x8_8bpc_sse4: 379.9
       after: cdef_filter_8x8_8bpc_sse4: 335.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      before: cdef_filter_4x4_8bpc_ssse3: 184.3
       after: cdef_filter_4x4_8bpc_ssse3: 163.3
      before: cdef_filter_4x4_8bpc_sse4: 168.9
       after: cdef_filter_4x4_8bpc_sse4: 146.1
      ------------------------------------------
      before: cdef_filter_4x8_8bpc_ssse3: 335.3
       after: cdef_filter_4x8_8bpc_ssse3: 280.7
      before: cdef_filter_4x8_8bpc_sse4: 305.1
       after: cdef_filter_4x8_8bpc_sse4: 257.9
      ------------------------------------------
      before: cdef_filter_8x8_8bpc_ssse3: 579.1
       after: cdef_filter_8x8_8bpc_ssse3: 500.5
      before: cdef_filter_8x8_8bpc_sse4: 517.0
       after: cdef_filter_8x8_8bpc_sse4: 455.8
      ------------------------------------------
    • x86: cdef_filter: use a better constant for SSE4 · 22c3594d
      Victorien Le Couviour--Tuffet authored
      Port of dc2ae517 for AVX-2 from Kyle Siefring.
      
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 141.7
      cdef_filter_4x4_8bpc_sse4: 128.3
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 253.4
      cdef_filter_4x8_8bpc_sse4: 228.5
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 429.6
      cdef_filter_8x8_8bpc_sse4: 379.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 184.3
      cdef_filter_4x4_8bpc_sse4: 168.9
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 335.3
      cdef_filter_4x8_8bpc_sse4: 305.1
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 579.1
      cdef_filter_8x8_8bpc_sse4: 517.0
      ------------------------------------------
  18. 27 Mar, 2019 1 commit
    • Add SSSE3 implementation for the 16x32, 32x16 and 32x32 blocks in itx · bd12b1ec
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x32_dct_dct_0_8bpc_c: 2464.6
      inv_txfm_add_16x32_dct_dct_0_8bpc_ssse3: 121.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_c: 24751.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_ssse3: 1101.9
      inv_txfm_add_16x32_dct_dct_2_8bpc_c: 24377.0
      inv_txfm_add_16x32_dct_dct_2_8bpc_ssse3: 1117.2
      inv_txfm_add_16x32_dct_dct_3_8bpc_c: 24155.6
      inv_txfm_add_16x32_dct_dct_3_8bpc_ssse3: 2349.3
      inv_txfm_add_16x32_dct_dct_4_8bpc_c: 24175.6
      inv_txfm_add_16x32_dct_dct_4_8bpc_ssse3: 1642.0
      inv_txfm_add_16x32_identity_identity_0_8bpc_c: 10304.7
      inv_txfm_add_16x32_identity_identity_0_8bpc_ssse3: 137.7
      inv_txfm_add_16x32_identity_identity_1_8bpc_c: 10341.6
      inv_txfm_add_16x32_identity_identity_1_8bpc_ssse3: 137.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_c: 10299.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_ssse3: 253.9
      inv_txfm_add_16x32_identity_identity_3_8bpc_c: 10331.4
      inv_txfm_add_16x32_identity_identity_3_8bpc_ssse3: 369.7
      inv_txfm_add_16x32_identity_identity_4_8bpc_c: 10360.4
      inv_txfm_add_16x32_identity_identity_4_8bpc_ssse3: 484.0
      inv_txfm_add_32x16_dct_dct_0_8bpc_c: 2288.4
      inv_txfm_add_32x16_dct_dct_0_8bpc_ssse3: 142.3
      inv_txfm_add_32x16_dct_dct_1_8bpc_c: 23819.9
      inv_txfm_add_32x16_dct_dct_1_8bpc_ssse3: 1740.1
      inv_txfm_add_32x16_dct_dct_2_8bpc_c: 23755.8
      inv_txfm_add_32x16_dct_dct_2_8bpc_ssse3: 1641.4
      inv_txfm_add_32x16_dct_dct_3_8bpc_c: 23839.9
      inv_txfm_add_32x16_dct_dct_3_8bpc_ssse3: 1559.0
      inv_txfm_add_32x16_dct_dct_4_8bpc_c: 23757.7
      inv_txfm_add_32x16_dct_dct_4_8bpc_ssse3: 1579.0
      inv_txfm_add_32x16_identity_identity_0_8bpc_c: 10381.7
      inv_txfm_add_32x16_identity_identity_0_8bpc_ssse3: 126.3
      inv_txfm_add_32x16_identity_identity_1_8bpc_c: 10402.5
      inv_txfm_add_32x16_identity_identity_1_8bpc_ssse3: 126.5
      inv_txfm_add_32x16_identity_identity_2_8bpc_c: 10429.2
      inv_txfm_add_32x16_identity_identity_2_8bpc_ssse3: 244.9
      inv_txfm_add_32x16_identity_identity_3_8bpc_c: 10382.0
      inv_txfm_add_32x16_identity_identity_3_8bpc_ssse3: 491.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_c: 10381.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_ssse3: 468.0
      inv_txfm_add_32x32_dct_dct_0_8bpc_c: 4168.2
      inv_txfm_add_32x32_dct_dct_0_8bpc_ssse3: 204.0
      inv_txfm_add_32x32_dct_dct_1_8bpc_c: 46306.2
      inv_txfm_add_32x32_dct_dct_1_8bpc_ssse3: 2216.0
      inv_txfm_add_32x32_dct_dct_2_8bpc_c: 46300.2
      inv_txfm_add_32x32_dct_dct_2_8bpc_ssse3: 2194.2
      inv_txfm_add_32x32_dct_dct_3_8bpc_c: 46350.1
      inv_txfm_add_32x32_dct_dct_3_8bpc_ssse3: 3484.4
      inv_txfm_add_32x32_dct_dct_4_8bpc_c: 46318.1
      inv_txfm_add_32x32_dct_dct_4_8bpc_ssse3: 3440.9
      inv_txfm_add_32x32_identity_identity_0_8bpc_c: 14663.1
      inv_txfm_add_32x32_identity_identity_0_8bpc_ssse3: 179.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_c: 14737.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_ssse3: 179.2
      inv_txfm_add_32x32_identity_identity_2_8bpc_c: 14640.4
      inv_txfm_add_32x32_identity_identity_2_8bpc_ssse3: 179.1
      inv_txfm_add_32x32_identity_identity_3_8bpc_c: 14638.5
      inv_txfm_add_32x32_identity_identity_3_8bpc_ssse3: 663.8
      inv_txfm_add_32x32_identity_identity_4_8bpc_c: 14635.6
      inv_txfm_add_32x32_identity_identity_4_8bpc_ssse3: 663.9
  19. 26 Mar, 2019 1 commit
  20. 24 Mar, 2019 2 commits
  21. 20 Mar, 2019 1 commit
  22. 19 Mar, 2019 1 commit
    • Add SSSE3 implementation for the 8x32 and 32x8 blocks in itx · 585ac462
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_8x32_dct_dct_0_8bpc_c: 1164.7
      inv_txfm_add_8x32_dct_dct_0_8bpc_ssse3: 79.5
      inv_txfm_add_8x32_dct_dct_1_8bpc_c: 11291.6
      inv_txfm_add_8x32_dct_dct_1_8bpc_ssse3: 508.5
      inv_txfm_add_8x32_dct_dct_2_8bpc_c: 10720.4
      inv_txfm_add_8x32_dct_dct_2_8bpc_ssse3: 507.9
      inv_txfm_add_8x32_dct_dct_3_8bpc_c: 12351.5
      inv_txfm_add_8x32_dct_dct_3_8bpc_ssse3: 687.2
      inv_txfm_add_8x32_dct_dct_4_8bpc_c: 10402.3
      inv_txfm_add_8x32_dct_dct_4_8bpc_ssse3: 687.9
      inv_txfm_add_8x32_identity_identity_0_8bpc_c: 3485.0
      inv_txfm_add_8x32_identity_identity_0_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_c: 3495.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_c: 3503.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_ssse3: 97.8
      inv_txfm_add_8x32_identity_identity_3_8bpc_c: 3489.5
      inv_txfm_add_8x32_identity_identity_3_8bpc_ssse3: 184.4
      inv_txfm_add_8x32_identity_identity_4_8bpc_c: 3498.1
      inv_txfm_add_8x32_identity_identity_4_8bpc_ssse3: 182.8
      inv_txfm_add_32x8_dct_dct_0_8bpc_c: 1220.4
      inv_txfm_add_32x8_dct_dct_0_8bpc_ssse3: 65.6
      inv_txfm_add_32x8_dct_dct_1_8bpc_c: 11120.7
      inv_txfm_add_32x8_dct_dct_1_8bpc_ssse3: 623.8
      inv_txfm_add_32x8_dct_dct_2_8bpc_c: 12236.3
      inv_txfm_add_32x8_dct_dct_2_8bpc_ssse3: 624.7
      inv_txfm_add_32x8_dct_dct_3_8bpc_c: 10866.3
      inv_txfm_add_32x8_dct_dct_3_8bpc_ssse3: 694.1
      inv_txfm_add_32x8_dct_dct_4_8bpc_c: 10322.8
      inv_txfm_add_32x8_dct_dct_4_8bpc_ssse3: 692.5
      inv_txfm_add_32x8_identity_identity_0_8bpc_c: 3368.1
      inv_txfm_add_32x8_identity_identity_0_8bpc_ssse3: 98.6
      inv_txfm_add_32x8_identity_identity_1_8bpc_c: 3381.1
      inv_txfm_add_32x8_identity_identity_1_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_2_8bpc_c: 3376.6
      inv_txfm_add_32x8_identity_identity_2_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_c: 3364.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_ssse3: 182.2
      inv_txfm_add_32x8_identity_identity_4_8bpc_c: 3390.0
      inv_txfm_add_32x8_identity_identity_4_8bpc_ssse3: 182.2
  23. 18 Mar, 2019 1 commit
    • Add SSSE3 implementation for ipred_cfl_ac_420 and ipred_cfl_ac_422 · 5d944dc6
      Xuefeng Jiang authored
      cfl_ac_420_w4_8bpc_c: 1621.0
      cfl_ac_420_w4_8bpc_ssse3: 92.5
      cfl_ac_420_w8_8bpc_c: 3344.1
      cfl_ac_420_w8_8bpc_ssse3: 115.4
      cfl_ac_420_w16_8bpc_c: 6024.9
      cfl_ac_420_w16_8bpc_ssse3: 187.8
      cfl_ac_422_w4_8bpc_c: 1762.5
      cfl_ac_422_w4_8bpc_ssse3: 81.4
      cfl_ac_422_w8_8bpc_c: 4941.2
      cfl_ac_422_w8_8bpc_ssse3: 166.5
      cfl_ac_422_w16_8bpc_c: 8261.8
      cfl_ac_422_w16_8bpc_ssse3: 272.3
  24. 16 Mar, 2019 2 commits
  25. 14 Mar, 2019 2 commits
  26. 13 Mar, 2019 1 commit