  1. May 07, 2019
  2. May 06, 2019
  3. May 05, 2019
  4. May 04, 2019
    • Control the stack size of spawned threads · 2756254f
      Henrik Gramner authored and Henrik Gramner committed
      On some systems (e.g. Google Fuchsia) the default stack size of new
      threads is insufficient, resulting in crashes.
      
      On other systems the default stack size is unnecessarily large,
      which can waste a lot of virtual memory.
      
      By setting it to a sufficiently large fixed value we can ensure that
      we don't run out of stack space while keeping down memory usage.
      2756254f
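      A minimal sketch of the idea, assuming a POSIX threads platform. The
      names sketch_worker and sketch_spawn_with_stack and the 1 MiB value
      are hypothetical and chosen only for illustration; the commit message
      does not state the size that was actually picked.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fixed stack size chosen for this sketch only. */
#define SKETCH_STACK_SIZE ((size_t)1024 * 1024)

/* Thread body: sets the flag it is given so the caller can tell it ran. */
static void *sketch_worker(void *arg) {
    *(int *)arg = 1;
    return NULL;
}

/* Spawns one thread with an explicit stack size and joins it.
 * Returns 0 if the thread was created, ran and was joined; otherwise
 * a pthread error code, or -1 if the worker did not set its flag. */
static int sketch_spawn_with_stack(size_t stack_size) {
    pthread_attr_t attr;
    pthread_t thread;
    int ran = 0;

    int err = pthread_attr_init(&attr);
    if (err) return err;

    /* Request a fixed stack size instead of the platform default. */
    err = pthread_attr_setstacksize(&attr, stack_size);
    if (!err) err = pthread_create(&thread, &attr, sketch_worker, &ran);
    pthread_attr_destroy(&attr);
    if (err) return err;

    err = pthread_join(thread, NULL);
    if (err) return err;
    return ran ? 0 : -1;
}
```

      Calling sketch_spawn_with_stack(SKETCH_STACK_SIZE) should return 0 on
      systems that accept the requested size.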
    • arm64: msac: Implement NEON msac_decode_symbol_adapt · 1d5c1a49
      Martin Storsjö authored and Jean-Baptiste Kempf committed
                                   Cortex A53    A72    A73
      msac_decode_symbol_adapt4_c:      107.6   57.1   67.8
      msac_decode_symbol_adapt4_neon:    70.4   56.4   55.1
      msac_decode_symbol_adapt8_c:      157.1   74.5   90.3
      msac_decode_symbol_adapt8_neon:    75.6   57.2   56.9
      msac_decode_symbol_adapt16_c:     257.4  106.6  135.9
      msac_decode_symbol_adapt16_neon:  101.8   62.0   65.2
      1d5c1a49
    • itx_tmpl: Fix the assert in inv_txfm_add_c · 058ca08d
      Martin Storsjö authored
      The previous form of the assert was automatically true for any
      value of w and h.
      058ca08d
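      The commit message does not quote the assert itself. As a hypothetical
      illustration of how a range check can be automatically true, the
      sketch below joins the two bounds of each dimension with || instead of
      &&. The function names and the [4, 64] range are assumptions made for
      this example.

```c
#include <stdbool.h>

/* Hypothetical buggy form: every integer is either >= 4 or <= 64,
 * so this expression is true for any w and h. */
static bool sketch_check_always_true(int w, int h) {
    return (w >= 4 || w <= 64) && (h >= 4 || h <= 64);
}

/* Intended form: both dimensions must lie within [4, 64]. */
static bool sketch_check_range(int w, int h) {
    return w >= 4 && w <= 64 && h >= 4 && h <= 64;
}
```

      With w = 0 and h = 1000, the first check still passes while the second
      one rejects the values.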
  5. Apr 29, 2019
  6. Apr 24, 2019
  7. Apr 22, 2019
  8. Apr 19, 2019
  9. Apr 18, 2019
    • Add SSSE3 implementation for the {16, 32, 64}x64 and 64x{16, 32} blocks in itx · 589e96a1
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x64_dct_dct_0_8bpc_c: 3973.5
      inv_txfm_add_16x64_dct_dct_0_8bpc_ssse3: 185.7
      inv_txfm_add_16x64_dct_dct_1_8bpc_c: 37869.1
      inv_txfm_add_16x64_dct_dct_1_8bpc_ssse3: 2103.1
      inv_txfm_add_16x64_dct_dct_2_8bpc_c: 37822.9
      inv_txfm_add_16x64_dct_dct_2_8bpc_ssse3: 2099.1
      inv_txfm_add_16x64_dct_dct_3_8bpc_c: 37871.7
      inv_txfm_add_16x64_dct_dct_3_8bpc_ssse3: 2663.5
      inv_txfm_add_16x64_dct_dct_4_8bpc_c: 38002.9
      inv_txfm_add_16x64_dct_dct_4_8bpc_ssse3: 2589.7
      inv_txfm_add_32x64_dct_dct_0_8bpc_c: 8319.2
      inv_txfm_add_32x64_dct_dct_0_8bpc_ssse3: 376.9
      inv_txfm_add_32x64_dct_dct_1_8bpc_c: 85956.8
      inv_txfm_add_32x64_dct_dct_1_8bpc_ssse3: 4298.1
      inv_txfm_add_32x64_dct_dct_2_8bpc_c: 89906.2
      inv_txfm_add_32x64_dct_dct_2_8bpc_ssse3: 4291.3
      inv_txfm_add_32x64_dct_dct_3_8bpc_c: 83710.9
      inv_txfm_add_32x64_dct_dct_3_8bpc_ssse3: 5589.5
      inv_txfm_add_32x64_dct_dct_4_8bpc_c: 87733.5
      inv_txfm_add_32x64_dct_dct_4_8bpc_ssse3: 5658.4
      inv_txfm_add_64x16_dct_dct_0_8bpc_c: 3895.9
      inv_txfm_add_64x16_dct_dct_0_8bpc_ssse3: 179.5
      inv_txfm_add_64x16_dct_dct_1_8bpc_c: 51375.2
      inv_txfm_add_64x16_dct_dct_1_8bpc_ssse3: 3859.2
      inv_txfm_add_64x16_dct_dct_2_8bpc_c: 52562.9
      inv_txfm_add_64x16_dct_dct_2_8bpc_ssse3: 4044.1
      inv_txfm_add_64x16_dct_dct_3_8bpc_c: 51347.0
      inv_txfm_add_64x16_dct_dct_3_8bpc_ssse3: 5259.5
      inv_txfm_add_64x16_dct_dct_4_8bpc_c: 49642.2
      inv_txfm_add_64x16_dct_dct_4_8bpc_ssse3: 4008.4
      inv_txfm_add_64x32_dct_dct_0_8bpc_c: 7196.4
      inv_txfm_add_64x32_dct_dct_0_8bpc_ssse3: 355.8
      inv_txfm_add_64x32_dct_dct_1_8bpc_c: 106588.4
      inv_txfm_add_64x32_dct_dct_1_8bpc_ssse3: 4965.3
      inv_txfm_add_64x32_dct_dct_2_8bpc_c: 106230.7
      inv_txfm_add_64x32_dct_dct_2_8bpc_ssse3: 4772.0
      inv_txfm_add_64x32_dct_dct_3_8bpc_c: 107427.0
      inv_txfm_add_64x32_dct_dct_3_8bpc_ssse3: 7146.9
      inv_txfm_add_64x32_dct_dct_4_8bpc_c: 111785.7
      inv_txfm_add_64x32_dct_dct_4_8bpc_ssse3: 7156.2
      inv_txfm_add_64x64_dct_dct_0_8bpc_c: 14512.4
      inv_txfm_add_64x64_dct_dct_0_8bpc_ssse3: 674.2
      inv_txfm_add_64x64_dct_dct_1_8bpc_c: 173246.3
      inv_txfm_add_64x64_dct_dct_1_8bpc_ssse3: 8790.8
      inv_txfm_add_64x64_dct_dct_2_8bpc_c: 174264.6
      inv_txfm_add_64x64_dct_dct_2_8bpc_ssse3: 8767.6
      inv_txfm_add_64x64_dct_dct_3_8bpc_c: 170047.3
      inv_txfm_add_64x64_dct_dct_3_8bpc_ssse3: 10784.9
      inv_txfm_add_64x64_dct_dct_4_8bpc_c: 170182.2
      inv_txfm_add_64x64_dct_dct_4_8bpc_ssse3: 10795.6
      589e96a1
  10. Apr 17, 2019
    • Over-allocate level array by 3 bytes · 36e1490b
      Ronald S. Bultje authored
      This is a workaround so that the AVX2 implementation of deblock can
      index the levels array starting from the level type, which causes it
      to over-read by up to 3 bytes. This is intended to fix #269.
      36e1490b
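      A rough sketch of why 3 bytes of padding are enough, assuming that
      each entry of the levels array holds 4 bytes and that the reader loads
      4 bytes starting at a type offset of 0 to 3 within an entry. The
      layout and the names below are assumptions made for illustration, not
      the actual data structures.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ENTRY_BYTES 4 /* assumed bytes per level entry */
#define SKETCH_OVERREAD    3 /* worst case: 4-byte load at offset 3 of the last entry */

/* Allocates n_entries entries plus zeroed padding for the over-read. */
static uint8_t *sketch_alloc_levels(size_t n_entries) {
    return calloc(n_entries * SKETCH_ENTRY_BYTES + SKETCH_OVERREAD, 1);
}

/* Copies 4 bytes starting at byte `type` (0..3) of entry `entry`.
 * For the last entry this touches up to 3 bytes past the nominal end,
 * which is only valid because of the padding above. */
static void sketch_load4(uint8_t out[4], const uint8_t *levels,
                         size_t entry, unsigned type) {
    memcpy(out, levels + entry * SKETCH_ENTRY_BYTES + type, 4);
}
```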
  11. Apr 16, 2019
    • arm64: loopfilter: Implement NEON loop filters · 0282f6f3
      Martin Storsjö authored
      The exact relative speedup compared to C code is a bit vague and hard
      to measure, depending on exactly how many filtered blocks are skipped,
      as the NEON version always filters 16 pixels at a time, while the
      C code can skip processing individual 4 pixel blocks.
      
      Additionally, the checkasm benchmarking code runs the same function
      repeatedly on the same buffer, which can make the filter take
      different codepaths on each run, as the function updates the buffer
      which will be used as input for the next run.
      
      When the checkasm test data is tweaked to avoid skipped blocks,
      the relative speedup compared to C is between 2x and 5x, while
      it is around 1x to 4x with the current checkasm test as is.
      
      Benchmark numbers from a tweaked checkasm that avoids skipped
      blocks:
      
                              Cortex A53     A72     A73
      lpf_h_sb_uv_w4_8bpc_c:      2954.7  1399.3  1655.3
      lpf_h_sb_uv_w4_8bpc_neon:    895.5   650.8   692.0
      lpf_h_sb_uv_w6_8bpc_c:      3879.2  1917.2  2257.7
      lpf_h_sb_uv_w6_8bpc_neon:   1125.6   759.5   838.4
      lpf_h_sb_y_w4_8bpc_c:       6711.0  3275.5  3913.7
      lpf_h_sb_y_w4_8bpc_neon:    1744.0  1342.1  1351.5
      lpf_h_sb_y_w8_8bpc_c:      10695.7  6155.8  6638.9
      lpf_h_sb_y_w8_8bpc_neon:    2146.5  1560.4  1609.1
      lpf_h_sb_y_w16_8bpc_c:     11355.8  6292.0  6995.9
      lpf_h_sb_y_w16_8bpc_neon:   2475.4  1949.6  1968.4
      lpf_v_sb_uv_w4_8bpc_c:      2639.7  1204.8  1425.9
      lpf_v_sb_uv_w4_8bpc_neon:    510.7   351.4   334.7
      lpf_v_sb_uv_w6_8bpc_c:      3468.3  1757.1  2021.5
      lpf_v_sb_uv_w6_8bpc_neon:    625.0   415.0   397.8
      lpf_v_sb_y_w4_8bpc_c:       5428.7  2731.7  3068.5
      lpf_v_sb_y_w4_8bpc_neon:    1172.6   792.1   768.0
      lpf_v_sb_y_w8_8bpc_c:       8946.1  4412.8  5121.0
      lpf_v_sb_y_w8_8bpc_neon:    1565.5  1063.6  1062.7
      lpf_v_sb_y_w16_8bpc_c:      8978.9  4411.7  5112.0
      lpf_v_sb_y_w16_8bpc_neon:   1775.0  1288.1  1236.7
      0282f6f3
    • arm64: looprestoration: Add a NEON implementation of SGR · 204bf211
      Martin Storsjö authored and Jean-Baptiste Kempf committed
      Relative speedup vs (autovectorized) C code:
                            Cortex A53    A72    A73
      selfguided_3x3_8bpc_neon:   2.91   2.12   2.68
      selfguided_5x5_8bpc_neon:   3.18   2.65   3.39
      selfguided_mix_8bpc_neon:   3.04   2.29   2.98
      
      The relative speedup vs non-vectorized C code is around 2.6-4.6x.
      204bf211
    • msac: Add a cast to indicate intended narrowing from size_t to unsigned · 003fa104
      Martin Storsjö authored
      This fixes this compiler warning with MSVC:
      ../src/msac.c(148): warning C4267: '+=': conversion from 'size_t' to 'unsigned int', possible loss of data
      003fa104
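      A small sketch of the pattern, with a hypothetical helper name. The
      actual statement at line 148 of src/msac.c is not shown in the commit
      message, so this only illustrates an explicit cast on a += from
      size_t to unsigned.

```c
#include <stddef.h>

/* Adds a byte count held in a size_t to an unsigned counter.
 * The explicit cast marks the narrowing as intended. */
static unsigned sketch_add_bytes(unsigned pos, size_t n) {
    pos += (unsigned)n;
    return pos;
}
```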
  12. Apr 15, 2019
  13. Apr 10, 2019
    • Add SSSE3 implementation for ipred_paeth · 44d0de41
      Xuefeng Jiang authored and Henrik Gramner committed
      intra_pred_paeth_w4_8bpc_c: 561.6
      intra_pred_paeth_w4_8bpc_ssse3: 49.2
      intra_pred_paeth_w8_8bpc_c: 1475.8
      intra_pred_paeth_w8_8bpc_ssse3: 103.0
      intra_pred_paeth_w16_8bpc_c: 4697.8
      intra_pred_paeth_w16_8bpc_ssse3: 279.0
      intra_pred_paeth_w32_8bpc_c: 13245.1
      intra_pred_paeth_w32_8bpc_ssse3: 614.7
      intra_pred_paeth_w64_8bpc_c: 32638.9
      intra_pred_paeth_w64_8bpc_ssse3: 1477.6
      44d0de41
  14. Apr 08, 2019
  15. Apr 07, 2019
    • arm: Fix typos in comments · 556780b7
      Martin Storsjö authored
      The width register has been set to clz(w)-24, not the other way
      around. And the 32 bit prep function has got the h parameter in
      r4, not in r5.
      556780b7
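      As a worked example of clz(w)-24, assuming 32-bit values and
      power-of-two widths no larger than 128: the expression maps 128 to 0,
      64 to 1, and so on down to 4, which maps to 5. The helper names below
      are hypothetical.

```c
#include <stdint.h>

/* Portable count of leading zero bits in a nonzero 32-bit value. */
static int sketch_clz32(uint32_t x) {
    int n = 0;
    for (uint32_t bit = UINT32_C(0x80000000); bit && !(x & bit); bit >>= 1)
        n++;
    return n;
}

/* Index derived from the width as clz(w) - 24. */
static int sketch_width_index(uint32_t w) {
    return sketch_clz32(w) - 24;
}
```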
  16. Apr 04, 2019
  17. Mar 28, 2019
    • CI: Check for newline at end of file · abb972a5
      Henrik Gramner authored and Henrik Gramner committed
      abb972a5
    • x86: cdef_dir: optimize best cost finding for SSE · 91568b2a
      Victorien Le Couviour--Tuffet authored
      Port of 65ee1233 for AVX-2 from Kyle Siefring to SSE4.1,
      and optimize SSSE3.
      
      ---------------------
      x86_64:
      ------------------------------------------
      before: cdef_dir_8bpc_ssse3: 110.3
       after: cdef_dir_8bpc_ssse3: 105.9
         new: cdef_dir_8bpc_sse4:   96.4
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      before: cdef_dir_8bpc_ssse3: 120.6
       after: cdef_dir_8bpc_ssse3: 110.7
         new: cdef_dir_8bpc_sse4:  106.5
      ------------------------------------------
      91568b2a
    • x86: cdef_filter: use 8-bit arithmetic for SSE · 75e88fab
      Victorien Le Couviour--Tuffet authored
      Port of c204da0f for AVX-2
      from Kyle Siefring.
      
      ---------------------
      x86_64:
      ------------------------------------------
      before: cdef_filter_4x4_8bpc_ssse3: 141.7
       after: cdef_filter_4x4_8bpc_ssse3: 131.6
      before: cdef_filter_4x4_8bpc_sse4: 128.3
       after: cdef_filter_4x4_8bpc_sse4: 119.0
      ------------------------------------------
      before: cdef_filter_4x8_8bpc_ssse3: 253.4
       after: cdef_filter_4x8_8bpc_ssse3: 236.1
      before: cdef_filter_4x8_8bpc_sse4: 228.5
       after: cdef_filter_4x8_8bpc_sse4: 213.2
      ------------------------------------------
      before: cdef_filter_8x8_8bpc_ssse3: 429.6
       after: cdef_filter_8x8_8bpc_ssse3: 386.9
      before: cdef_filter_8x8_8bpc_sse4: 379.9
       after: cdef_filter_8x8_8bpc_sse4: 335.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      before: cdef_filter_4x4_8bpc_ssse3: 184.3
       after: cdef_filter_4x4_8bpc_ssse3: 163.3
      before: cdef_filter_4x4_8bpc_sse4: 168.9
       after: cdef_filter_4x4_8bpc_sse4: 146.1
      ------------------------------------------
      before: cdef_filter_4x8_8bpc_ssse3: 335.3
       after: cdef_filter_4x8_8bpc_ssse3: 280.7
      before: cdef_filter_4x8_8bpc_sse4: 305.1
       after: cdef_filter_4x8_8bpc_sse4: 257.9
      ------------------------------------------
      before: cdef_filter_8x8_8bpc_ssse3: 579.1
       after: cdef_filter_8x8_8bpc_ssse3: 500.5
      before: cdef_filter_8x8_8bpc_sse4: 517.0
       after: cdef_filter_8x8_8bpc_sse4: 455.8
      ------------------------------------------
      75e88fab
    • x86: cdef_filter: use a better constant for SSE4 · 22c3594d
      Victorien Le Couviour--Tuffet authored
      Port of dc2ae517 for AVX-2
      from Kyle Siefring.
      
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 141.7
      cdef_filter_4x4_8bpc_sse4: 128.3
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 253.4
      cdef_filter_4x8_8bpc_sse4: 228.5
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 429.6
      cdef_filter_8x8_8bpc_sse4: 379.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 184.3
      cdef_filter_4x4_8bpc_sse4: 168.9
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 335.3
      cdef_filter_4x8_8bpc_sse4: 305.1
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 579.1
      cdef_filter_8x8_8bpc_sse4: 517.0
      ------------------------------------------
      22c3594d
  18. Mar 27, 2019
    • Add SSSE3 implementation for the 16x32, 32x16 and 32x32 blocks in itx · bd12b1ec
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x32_dct_dct_0_8bpc_c: 2464.6
      inv_txfm_add_16x32_dct_dct_0_8bpc_ssse3: 121.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_c: 24751.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_ssse3: 1101.9
      inv_txfm_add_16x32_dct_dct_2_8bpc_c: 24377.0
      inv_txfm_add_16x32_dct_dct_2_8bpc_ssse3: 1117.2
      inv_txfm_add_16x32_dct_dct_3_8bpc_c: 24155.6
      inv_txfm_add_16x32_dct_dct_3_8bpc_ssse3: 2349.3
      inv_txfm_add_16x32_dct_dct_4_8bpc_c: 24175.6
      inv_txfm_add_16x32_dct_dct_4_8bpc_ssse3: 1642.0
      inv_txfm_add_16x32_identity_identity_0_8bpc_c: 10304.7
      inv_txfm_add_16x32_identity_identity_0_8bpc_ssse3: 137.7
      inv_txfm_add_16x32_identity_identity_1_8bpc_c: 10341.6
      inv_txfm_add_16x32_identity_identity_1_8bpc_ssse3: 137.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_c: 10299.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_ssse3: 253.9
      inv_txfm_add_16x32_identity_identity_3_8bpc_c: 10331.4
      inv_txfm_add_16x32_identity_identity_3_8bpc_ssse3: 369.7
      inv_txfm_add_16x32_identity_identity_4_8bpc_c: 10360.4
      inv_txfm_add_16x32_identity_identity_4_8bpc_ssse3: 484.0
      inv_txfm_add_32x16_dct_dct_0_8bpc_c: 2288.4
      inv_txfm_add_32x16_dct_dct_0_8bpc_ssse3: 142.3
      inv_txfm_add_32x16_dct_dct_1_8bpc_c: 23819.9
      inv_txfm_add_32x16_dct_dct_1_8bpc_ssse3: 1740.1
      inv_txfm_add_32x16_dct_dct_2_8bpc_c: 23755.8
      inv_txfm_add_32x16_dct_dct_2_8bpc_ssse3: 1641.4
      inv_txfm_add_32x16_dct_dct_3_8bpc_c: 23839.9
      inv_txfm_add_32x16_dct_dct_3_8bpc_ssse3: 1559.0
      inv_txfm_add_32x16_dct_dct_4_8bpc_c: 23757.7
      inv_txfm_add_32x16_dct_dct_4_8bpc_ssse3: 1579.0
      inv_txfm_add_32x16_identity_identity_0_8bpc_c: 10381.7
      inv_txfm_add_32x16_identity_identity_0_8bpc_ssse3: 126.3
      inv_txfm_add_32x16_identity_identity_1_8bpc_c: 10402.5
      inv_txfm_add_32x16_identity_identity_1_8bpc_ssse3: 126.5
      inv_txfm_add_32x16_identity_identity_2_8bpc_c: 10429.2
      inv_txfm_add_32x16_identity_identity_2_8bpc_ssse3: 244.9
      inv_txfm_add_32x16_identity_identity_3_8bpc_c: 10382.0
      inv_txfm_add_32x16_identity_identity_3_8bpc_ssse3: 491.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_c: 10381.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_ssse3: 468.0
      inv_txfm_add_32x32_dct_dct_0_8bpc_c: 4168.2
      inv_txfm_add_32x32_dct_dct_0_8bpc_ssse3: 204.0
      inv_txfm_add_32x32_dct_dct_1_8bpc_c: 46306.2
      inv_txfm_add_32x32_dct_dct_1_8bpc_ssse3: 2216.0
      inv_txfm_add_32x32_dct_dct_2_8bpc_c: 46300.2
      inv_txfm_add_32x32_dct_dct_2_8bpc_ssse3: 2194.2
      inv_txfm_add_32x32_dct_dct_3_8bpc_c: 46350.1
      inv_txfm_add_32x32_dct_dct_3_8bpc_ssse3: 3484.4
      inv_txfm_add_32x32_dct_dct_4_8bpc_c: 46318.1
      inv_txfm_add_32x32_dct_dct_4_8bpc_ssse3: 3440.9
      inv_txfm_add_32x32_identity_identity_0_8bpc_c: 14663.1
      inv_txfm_add_32x32_identity_identity_0_8bpc_ssse3: 179.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_c: 14737.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_ssse3: 179.2
      inv_txfm_add_32x32_identity_identity_2_8bpc_c: 14640.4
      inv_txfm_add_32x32_identity_identity_2_8bpc_ssse3: 179.1
      inv_txfm_add_32x32_identity_identity_3_8bpc_c: 14638.5
      inv_txfm_add_32x32_identity_identity_3_8bpc_ssse3: 663.8
      inv_txfm_add_32x32_identity_identity_4_8bpc_c: 14635.6
      inv_txfm_add_32x32_identity_identity_4_8bpc_ssse3: 663.9
      bd12b1ec
  19. Mar 26, 2019
  20. Mar 24, 2019
  21. Mar 20, 2019
  22. Mar 19, 2019
    • Add SSSE3 implementation for the 8x32 and 32x8 blocks in itx · 585ac462
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_8x32_dct_dct_0_8bpc_c: 1164.7
      inv_txfm_add_8x32_dct_dct_0_8bpc_ssse3: 79.5
      inv_txfm_add_8x32_dct_dct_1_8bpc_c: 11291.6
      inv_txfm_add_8x32_dct_dct_1_8bpc_ssse3: 508.5
      inv_txfm_add_8x32_dct_dct_2_8bpc_c: 10720.4
      inv_txfm_add_8x32_dct_dct_2_8bpc_ssse3: 507.9
      inv_txfm_add_8x32_dct_dct_3_8bpc_c: 12351.5
      inv_txfm_add_8x32_dct_dct_3_8bpc_ssse3: 687.2
      inv_txfm_add_8x32_dct_dct_4_8bpc_c: 10402.3
      inv_txfm_add_8x32_dct_dct_4_8bpc_ssse3: 687.9
      inv_txfm_add_8x32_identity_identity_0_8bpc_c: 3485.0
      inv_txfm_add_8x32_identity_identity_0_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_c: 3495.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_c: 3503.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_ssse3: 97.8
      inv_txfm_add_8x32_identity_identity_3_8bpc_c: 3489.5
      inv_txfm_add_8x32_identity_identity_3_8bpc_ssse3: 184.4
      inv_txfm_add_8x32_identity_identity_4_8bpc_c: 3498.1
      inv_txfm_add_8x32_identity_identity_4_8bpc_ssse3: 182.8
      inv_txfm_add_32x8_dct_dct_0_8bpc_c: 1220.4
      inv_txfm_add_32x8_dct_dct_0_8bpc_ssse3: 65.6
      inv_txfm_add_32x8_dct_dct_1_8bpc_c: 11120.7
      inv_txfm_add_32x8_dct_dct_1_8bpc_ssse3: 623.8
      inv_txfm_add_32x8_dct_dct_2_8bpc_c: 12236.3
      inv_txfm_add_32x8_dct_dct_2_8bpc_ssse3: 624.7
      inv_txfm_add_32x8_dct_dct_3_8bpc_c: 10866.3
      inv_txfm_add_32x8_dct_dct_3_8bpc_ssse3: 694.1
      inv_txfm_add_32x8_dct_dct_4_8bpc_c: 10322.8
      inv_txfm_add_32x8_dct_dct_4_8bpc_ssse3: 692.5
      inv_txfm_add_32x8_identity_identity_0_8bpc_c: 3368.1
      inv_txfm_add_32x8_identity_identity_0_8bpc_ssse3: 98.6
      inv_txfm_add_32x8_identity_identity_1_8bpc_c: 3381.1
      inv_txfm_add_32x8_identity_identity_1_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_2_8bpc_c: 3376.6
      inv_txfm_add_32x8_identity_identity_2_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_c: 3364.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_ssse3: 182.2
      inv_txfm_add_32x8_identity_identity_4_8bpc_c: 3390.0
      inv_txfm_add_32x8_identity_identity_4_8bpc_ssse3: 182.2
      585ac462
  23. Mar 18, 2019
    • Add SSSE3 implementation for ipred_cfl_ac_420 and ipred_cfl_ac_422 · 5d944dc6
      Xuefeng Jiang authored and Henrik Gramner committed
      cfl_ac_420_w4_8bpc_c: 1621.0
      cfl_ac_420_w4_8bpc_ssse3: 92.5
      cfl_ac_420_w8_8bpc_c: 3344.1
      cfl_ac_420_w8_8bpc_ssse3: 115.4
      cfl_ac_420_w16_8bpc_c: 6024.9
      cfl_ac_420_w16_8bpc_ssse3: 187.8
      cfl_ac_422_w4_8bpc_c: 1762.5
      cfl_ac_422_w4_8bpc_ssse3: 81.4
      cfl_ac_422_w8_8bpc_c: 4941.2
      cfl_ac_422_w8_8bpc_ssse3: 166.5
      cfl_ac_422_w16_8bpc_c: 8261.8
      cfl_ac_422_w16_8bpc_ssse3: 272.3
      5d944dc6
  24. Mar 16, 2019
  25. Mar 14, 2019
  26. Mar 13, 2019