1. 19 Apr, 2019 1 commit
  2. 18 Apr, 2019 1 commit
    • Add SSSE3 implementation for the {16, 32, 64}x64 and 64x{16, 32} blocks in itx · 589e96a1
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x64_dct_dct_0_8bpc_c: 3973.5
      inv_txfm_add_16x64_dct_dct_0_8bpc_ssse3: 185.7
      inv_txfm_add_16x64_dct_dct_1_8bpc_c: 37869.1
      inv_txfm_add_16x64_dct_dct_1_8bpc_ssse3: 2103.1
      inv_txfm_add_16x64_dct_dct_2_8bpc_c: 37822.9
      inv_txfm_add_16x64_dct_dct_2_8bpc_ssse3: 2099.1
      inv_txfm_add_16x64_dct_dct_3_8bpc_c: 37871.7
      inv_txfm_add_16x64_dct_dct_3_8bpc_ssse3: 2663.5
      inv_txfm_add_16x64_dct_dct_4_8bpc_c: 38002.9
      inv_txfm_add_16x64_dct_dct_4_8bpc_ssse3: 2589.7
      inv_txfm_add_32x64_dct_dct_0_8bpc_c: 8319.2
      inv_txfm_add_32x64_dct_dct_0_8bpc_ssse3: 376.9
      inv_txfm_add_32x64_dct_dct_1_8bpc_c: 85956.8
      inv_txfm_add_32x64_dct_dct_1_8bpc_ssse3: 4298.1
      inv_txfm_add_32x64_dct_dct_2_8bpc_c: 89906.2
      inv_txfm_add_32x64_dct_dct_2_8bpc_ssse3: 4291.3
      inv_txfm_add_32x64_dct_dct_3_8bpc_c: 83710.9
      inv_txfm_add_32x64_dct_dct_3_8bpc_ssse3: 5589.5
      inv_txfm_add_32x64_dct_dct_4_8bpc_c: 87733.5
      inv_txfm_add_32x64_dct_dct_4_8bpc_ssse3: 5658.4
      inv_txfm_add_64x16_dct_dct_0_8bpc_c: 3895.9
      inv_txfm_add_64x16_dct_dct_0_8bpc_ssse3: 179.5
      inv_txfm_add_64x16_dct_dct_1_8bpc_c: 51375.2
      inv_txfm_add_64x16_dct_dct_1_8bpc_ssse3: 3859.2
      inv_txfm_add_64x16_dct_dct_2_8bpc_c: 52562.9
      inv_txfm_add_64x16_dct_dct_2_8bpc_ssse3: 4044.1
      inv_txfm_add_64x16_dct_dct_3_8bpc_c: 51347.0
      inv_txfm_add_64x16_dct_dct_3_8bpc_ssse3: 5259.5
      inv_txfm_add_64x16_dct_dct_4_8bpc_c: 49642.2
      inv_txfm_add_64x16_dct_dct_4_8bpc_ssse3: 4008.4
      inv_txfm_add_64x32_dct_dct_0_8bpc_c: 7196.4
      inv_txfm_add_64x32_dct_dct_0_8bpc_ssse3: 355.8
      inv_txfm_add_64x32_dct_dct_1_8bpc_c: 106588.4
      inv_txfm_add_64x32_dct_dct_1_8bpc_ssse3: 4965.3
      inv_txfm_add_64x32_dct_dct_2_8bpc_c: 106230.7
      inv_txfm_add_64x32_dct_dct_2_8bpc_ssse3: 4772.0
      inv_txfm_add_64x32_dct_dct_3_8bpc_c: 107427.0
      inv_txfm_add_64x32_dct_dct_3_8bpc_ssse3: 7146.9
      inv_txfm_add_64x32_dct_dct_4_8bpc_c: 111785.7
      inv_txfm_add_64x32_dct_dct_4_8bpc_ssse3: 7156.2
      inv_txfm_add_64x64_dct_dct_0_8bpc_c: 14512.4
      inv_txfm_add_64x64_dct_dct_0_8bpc_ssse3: 674.2
      inv_txfm_add_64x64_dct_dct_1_8bpc_c: 173246.3
      inv_txfm_add_64x64_dct_dct_1_8bpc_ssse3: 8790.8
      inv_txfm_add_64x64_dct_dct_2_8bpc_c: 174264.6
      inv_txfm_add_64x64_dct_dct_2_8bpc_ssse3: 8767.6
      inv_txfm_add_64x64_dct_dct_3_8bpc_c: 170047.3
      inv_txfm_add_64x64_dct_dct_3_8bpc_ssse3: 10784.9
      inv_txfm_add_64x64_dct_dct_4_8bpc_c: 170182.2
      inv_txfm_add_64x64_dct_dct_4_8bpc_ssse3: 10795.6
  3. 17 Apr, 2019 1 commit
    • Over-allocate level array by 3 bytes · 36e1490b
      Ronald S. Bultje authored
      This is a workaround so that the AVX2 implementation of deblock can
      index the levels array starting from the level type, which causes it
      to over-read by up to 3 bytes. This is intended to fix #269.
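The workaround above amounts to padding an allocation so that a load wider than one element, starting at the last valid byte, still stays inside the buffer. A minimal sketch in C, assuming a flat byte array; `alloc_levels`, `load32` and `LEVEL_OVERREAD_PAD` are hypothetical names for illustration and are not taken from the dav1d source:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical padding constant: a 4-byte load that starts at the last
 * valid byte touches 3 bytes past the logical end of the array. */
#define LEVEL_OVERREAD_PAD 3

/* Hypothetical helper: allocate n level bytes plus padding, zero-filled. */
static uint8_t *alloc_levels(size_t n) {
    return calloc(n + LEVEL_OVERREAD_PAD, 1);
}

/* Stand-in for a SIMD-style 32-bit load from an arbitrary byte offset.
 * With the padding above, idx == n - 1 is still an in-bounds read. */
static uint32_t load32(const uint8_t *levels, size_t n, size_t idx) {
    uint32_t v;
    assert(idx < n);
    memcpy(&v, levels + idx, sizeof(v));
    return v;
}
```

Without the 3 padding bytes, the same `load32` call at the last index would read past the end of the allocation, which is the kind of over-read the commit message describes.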
  4. 16 Apr, 2019 3 commits
    • arm64: loopfilter: Implement NEON loop filters · 0282f6f3
      Martin Storsjö authored
      The exact relative speedup compared to C code is a bit vague and hard
      to measure, depending on exactly how many filtered blocks are skipped,
      as the NEON version always filters 16 pixels at a time, while the
      C code can skip processing individual 4 pixel blocks.
      
      Additionally, the checkasm benchmarking code runs the same function
      repeatedly on the same buffer, which can make the filter take
      different codepaths on each run, as the function updates the buffer
      which will be used as input for the next run.
      
      When tweaking the checkasm test data to try to avoid skipped blocks,
      the relative speedup compared to C is between 2x and 5x, while
      it is around 1x to 4x with the current checkasm test as such.
      
      Benchmark numbers from a tweaked checkasm that avoids skipped
      blocks:
      
                              Cortex A53     A72     A73
      lpf_h_sb_uv_w4_8bpc_c:      2954.7  1399.3  1655.3
      lpf_h_sb_uv_w4_8bpc_neon:    895.5   650.8   692.0
      lpf_h_sb_uv_w6_8bpc_c:      3879.2  1917.2  2257.7
      lpf_h_sb_uv_w6_8bpc_neon:   1125.6   759.5   838.4
      lpf_h_sb_y_w4_8bpc_c:       6711.0  3275.5  3913.7
      lpf_h_sb_y_w4_8bpc_neon:    1744.0  1342.1  1351.5
      lpf_h_sb_y_w8_8bpc_c:      10695.7  6155.8  6638.9
      lpf_h_sb_y_w8_8bpc_neon:    2146.5  1560.4  1609.1
      lpf_h_sb_y_w16_8bpc_c:     11355.8  6292.0  6995.9
      lpf_h_sb_y_w16_8bpc_neon:   2475.4  1949.6  1968.4
      lpf_v_sb_uv_w4_8bpc_c:      2639.7  1204.8  1425.9
      lpf_v_sb_uv_w4_8bpc_neon:    510.7   351.4   334.7
      lpf_v_sb_uv_w6_8bpc_c:      3468.3  1757.1  2021.5
      lpf_v_sb_uv_w6_8bpc_neon:    625.0   415.0   397.8
      lpf_v_sb_y_w4_8bpc_c:       5428.7  2731.7  3068.5
      lpf_v_sb_y_w4_8bpc_neon:    1172.6   792.1   768.0
      lpf_v_sb_y_w8_8bpc_c:       8946.1  4412.8  5121.0
      lpf_v_sb_y_w8_8bpc_neon:    1565.5  1063.6  1062.7
      lpf_v_sb_y_w16_8bpc_c:      8978.9  4411.7  5112.0
      lpf_v_sb_y_w16_8bpc_neon:   1775.0  1288.1  1236.7
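The measurement problem described in this commit message (a benchmark that reruns an in-place filter on a buffer the filter has already modified) can be illustrated with a toy example. This is a minimal sketch, not the AV1 loop filter: `toy_filter` and its threshold rule are hypothetical, chosen only to show how a filter with a skip condition can take a different code path on the second run over the same data.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy in-place edge filter (hypothetical): if the two pixels across an
 * edge differ, but by no more than thresh, replace both with their
 * average. Returns 1 when it filtered and 0 when it skipped. */
static int toy_filter(int *p, int *q, int thresh) {
    const int diff = abs(*p - *q);
    if (diff == 0 || diff > thresh)
        return 0;                 /* skip path: nothing to smooth */
    const int avg = (*p + *q) / 2;
    *p = avg;                     /* filter path: writes back in place */
    *q = avg;
    return 1;
}
```

Calling `toy_filter` twice on the same pair of pixels filters on the first call and skips on the second, so a timing loop over a shared buffer would mostly measure the skip path.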
    • arm64: looprestoration: Add a NEON implementation of SGR · 204bf211
      Martin Storsjö authored
      Relative speedup vs (autovectorized) C code:
                            Cortex A53    A72    A73
      selfguided_3x3_8bpc_neon:   2.91   2.12   2.68
      selfguided_5x5_8bpc_neon:   3.18   2.65   3.39
      selfguided_mix_8bpc_neon:   3.04   2.29   2.98
      
      The relative speedup vs non-vectorized C code is around 2.6-4.6x.
    • msac: Add a cast to indicate intended narrowing from size_t to unsigned · 003fa104
      Martin Storsjö authored
      This fixes this compiler warning with MSVC:
      ../src/msac.c(148): warning C4267: '+=': conversion from 'size_t' to 'unsigned int', possible loss of data
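MSVC reports C4267 when a `size_t` value is implicitly narrowed to a smaller integer type, which happens on 64-bit targets where `size_t` is wider than `unsigned`. A minimal sketch of the pattern, assuming a hypothetical `advance` helper; the names are illustrative and do not reproduce the code in src/msac.c:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical counter update: add a size_t amount to an unsigned count. */
static unsigned advance(unsigned count, size_t consumed) {
    /* Guard the assumption that makes the narrowing safe. */
    assert(consumed <= UINT_MAX);
    /* The explicit cast states that the narrowing is intended, which is
     * what silences the implicit-conversion warning. */
    count += (unsigned) consumed;
    return count;
}
```

The cast does not change the generated arithmetic; it only records the intent, so it is appropriate where the value is known to fit.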
  5. 15 Apr, 2019 1 commit
  6. 10 Apr, 2019 1 commit
    • Add SSSE3 implementation for ipred_paeth · 44d0de41
      Xuefeng Jiang authored
      intra_pred_paeth_w4_8bpc_c: 561.6
      intra_pred_paeth_w4_8bpc_ssse3: 49.2
      intra_pred_paeth_w8_8bpc_c: 1475.8
      intra_pred_paeth_w8_8bpc_ssse3: 103.0
      intra_pred_paeth_w16_8bpc_c: 4697.8
      intra_pred_paeth_w16_8bpc_ssse3: 279.0
      intra_pred_paeth_w32_8bpc_c: 13245.1
      intra_pred_paeth_w32_8bpc_ssse3: 614.7
      intra_pred_paeth_w64_8bpc_c: 32638.9
      intra_pred_paeth_w64_8bpc_ssse3: 1477.6
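For context on what these timings measure, the Paeth predictor picks, for each pixel, whichever of the left, top and top-left neighbours is closest to `left + top - topleft`. A scalar sketch of that per-pixel selection as it is commonly defined; `paeth_pixel` is a hypothetical name and this is not the dav1d C or SSSE3 code:

```c
#include <assert.h>
#include <stdlib.h>

/* Per-pixel Paeth selection (sketch). Ties are resolved in the order
 * left, top, top-left. Inputs are assumed to be 8-bit pixel values. */
static int paeth_pixel(int left, int top, int topleft) {
    assert(left >= 0 && left <= 255);
    assert(top >= 0 && top <= 255);
    assert(topleft >= 0 && topleft <= 255);
    const int base   = left + top - topleft;
    const int ldiff  = abs(left - base);
    const int tdiff  = abs(top - base);
    const int tldiff = abs(topleft - base);
    if (ldiff <= tdiff && ldiff <= tldiff)
        return left;
    return tdiff <= tldiff ? top : topleft;
}
```

A full predictor would apply this to every pixel of the block, with `top` taken from the row above and `left` from the column to the left.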
  7. 08 Apr, 2019 1 commit
  8. 07 Apr, 2019 1 commit
    • arm: Fix typos in comments · 556780b7
      Martin Storsjö authored
      The width register has been set to clz(w)-24, not the other way
      around. And the 32 bit prep function has got the h parameter in
      r4, not in r5.
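The relation in the corrected comment, clz(w)-24, maps power-of-two widths to a small index: a width of 128 gives 0 and a width of 4 gives 5. A minimal sketch in C, assuming a 32-bit `unsigned` and the GCC/Clang builtin `__builtin_clz`; `width_index` is a hypothetical name, and how the assembly goes on to use the value is not shown in this log:

```c
#include <assert.h>

/* Index derived from a power-of-two width, mirroring clz(w) - 24.
 * Assumes unsigned is 32 bits wide and w is a power of two <= 128. */
static int width_index(unsigned w) {
    assert(w != 0 && (w & (w - 1)) == 0);  /* power of two */
    assert(w <= 128);                      /* keeps the result >= 0 */
    return __builtin_clz(w) - 24;
}
```

Larger widths produce smaller indices, which is why the direction of the subtraction mattered enough to correct in the comment.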
  9. 04 Apr, 2019 2 commits
    • arm: Consistently use 8/24 columns indentation for assembly · 5d888dde
      Martin Storsjö authored
      For cases with indented, nested .if/.macro in asm.S, indent those
      by 4 chars.
      
      Some initial assembly files were indented to 4/16 columns, while all
      the actual implementation files, starting with src/arm/64/mc.S, have
      used 8/24 for indentation.
    • Add SSSE3 implementation for ipred_cfl_ac_444 · 0d936a1a
      Xuefeng Jiang authored
      cfl_ac_444_w4_8bpc_c: 978.2
      cfl_ac_444_w4_8bpc_ssse3: 110.4
      cfl_ac_444_w8_8bpc_c: 2312.3
      cfl_ac_444_w8_8bpc_ssse3: 197.5
      cfl_ac_444_w16_8bpc_c: 4081.1
      cfl_ac_444_w16_8bpc_ssse3: 274.1
      cfl_ac_444_w32_8bpc_c: 9544.3
      cfl_ac_444_w32_8bpc_ssse3: 617.1
  10. 28 Mar, 2019 5 commits
    • CI: Check for newline at end of file · abb972a5
      Henrik Gramner authored
    • x86: cdef_dir: optimize best cost finding for SSE · 91568b2a
      Victorien Le Couviour--Tuffet authored
      Port of 65ee1233 (for AVX2, from Kyle Siefring) to SSE4.1, and optimize SSSE3.
      
      ---------------------
      x86_64:
      ------------------------------------------
      before: cdef_dir_8bpc_ssse3: 110.3
       after: cdef_dir_8bpc_ssse3: 105.9
         new: cdef_dir_8bpc_sse4:   96.4
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      before: cdef_dir_8bpc_ssse3: 120.6
       after: cdef_dir_8bpc_ssse3: 110.7
         new: cdef_dir_8bpc_sse4:  106.5
      ------------------------------------------
    • x86: cdef_filter: use 8-bit arithmetic for SSE · 75e88fab
      Victorien Le Couviour--Tuffet authored
      Port of c204da0f (for AVX2, from Kyle Siefring).
      
      ---------------------
      x86_64:
      ------------------------------------------
      before: cdef_filter_4x4_8bpc_ssse3: 141.7
       after: cdef_filter_4x4_8bpc_ssse3: 131.6
      before: cdef_filter_4x4_8bpc_sse4: 128.3
       after: cdef_filter_4x4_8bpc_sse4: 119.0
      ------------------------------------------
      before: cdef_filter_4x8_8bpc_ssse3: 253.4
       after: cdef_filter_4x8_8bpc_ssse3: 236.1
      before: cdef_filter_4x8_8bpc_sse4: 228.5
       after: cdef_filter_4x8_8bpc_sse4: 213.2
      ------------------------------------------
      before: cdef_filter_8x8_8bpc_ssse3: 429.6
       after: cdef_filter_8x8_8bpc_ssse3: 386.9
      before: cdef_filter_8x8_8bpc_sse4: 379.9
       after: cdef_filter_8x8_8bpc_sse4: 335.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      before: cdef_filter_4x4_8bpc_ssse3: 184.3
       after: cdef_filter_4x4_8bpc_ssse3: 163.3
      before: cdef_filter_4x4_8bpc_sse4: 168.9
       after: cdef_filter_4x4_8bpc_sse4: 146.1
      ------------------------------------------
      before: cdef_filter_4x8_8bpc_ssse3: 335.3
       after: cdef_filter_4x8_8bpc_ssse3: 280.7
      before: cdef_filter_4x8_8bpc_sse4: 305.1
       after: cdef_filter_4x8_8bpc_sse4: 257.9
      ------------------------------------------
      before: cdef_filter_8x8_8bpc_ssse3: 579.1
       after: cdef_filter_8x8_8bpc_ssse3: 500.5
      before: cdef_filter_8x8_8bpc_sse4: 517.0
       after: cdef_filter_8x8_8bpc_sse4: 455.8
      ------------------------------------------
    • x86: cdef_filter: use a better constant for SSE4 · 22c3594d
      Victorien Le Couviour--Tuffet authored
      Port of dc2ae517 (for AVX2, from Kyle Siefring).
      
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 141.7
      cdef_filter_4x4_8bpc_sse4: 128.3
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 253.4
      cdef_filter_4x8_8bpc_sse4: 228.5
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 429.6
      cdef_filter_8x8_8bpc_sse4: 379.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 184.3
      cdef_filter_4x4_8bpc_sse4: 168.9
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 335.3
      cdef_filter_4x8_8bpc_sse4: 305.1
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 579.1
      cdef_filter_8x8_8bpc_sse4: 517.0
      ------------------------------------------
  11. 27 Mar, 2019 1 commit
    • Add SSSE3 implementation for the 16x32, 32x16 and 32x32 blocks in itx · bd12b1ec
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x32_dct_dct_0_8bpc_c: 2464.6
      inv_txfm_add_16x32_dct_dct_0_8bpc_ssse3: 121.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_c: 24751.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_ssse3: 1101.9
      inv_txfm_add_16x32_dct_dct_2_8bpc_c: 24377.0
      inv_txfm_add_16x32_dct_dct_2_8bpc_ssse3: 1117.2
      inv_txfm_add_16x32_dct_dct_3_8bpc_c: 24155.6
      inv_txfm_add_16x32_dct_dct_3_8bpc_ssse3: 2349.3
      inv_txfm_add_16x32_dct_dct_4_8bpc_c: 24175.6
      inv_txfm_add_16x32_dct_dct_4_8bpc_ssse3: 1642.0
      inv_txfm_add_16x32_identity_identity_0_8bpc_c: 10304.7
      inv_txfm_add_16x32_identity_identity_0_8bpc_ssse3: 137.7
      inv_txfm_add_16x32_identity_identity_1_8bpc_c: 10341.6
      inv_txfm_add_16x32_identity_identity_1_8bpc_ssse3: 137.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_c: 10299.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_ssse3: 253.9
      inv_txfm_add_16x32_identity_identity_3_8bpc_c: 10331.4
      inv_txfm_add_16x32_identity_identity_3_8bpc_ssse3: 369.7
      inv_txfm_add_16x32_identity_identity_4_8bpc_c: 10360.4
      inv_txfm_add_16x32_identity_identity_4_8bpc_ssse3: 484.0
      inv_txfm_add_32x16_dct_dct_0_8bpc_c: 2288.4
      inv_txfm_add_32x16_dct_dct_0_8bpc_ssse3: 142.3
      inv_txfm_add_32x16_dct_dct_1_8bpc_c: 23819.9
      inv_txfm_add_32x16_dct_dct_1_8bpc_ssse3: 1740.1
      inv_txfm_add_32x16_dct_dct_2_8bpc_c: 23755.8
      inv_txfm_add_32x16_dct_dct_2_8bpc_ssse3: 1641.4
      inv_txfm_add_32x16_dct_dct_3_8bpc_c: 23839.9
      inv_txfm_add_32x16_dct_dct_3_8bpc_ssse3: 1559.0
      inv_txfm_add_32x16_dct_dct_4_8bpc_c: 23757.7
      inv_txfm_add_32x16_dct_dct_4_8bpc_ssse3: 1579.0
      inv_txfm_add_32x16_identity_identity_0_8bpc_c: 10381.7
      inv_txfm_add_32x16_identity_identity_0_8bpc_ssse3: 126.3
      inv_txfm_add_32x16_identity_identity_1_8bpc_c: 10402.5
      inv_txfm_add_32x16_identity_identity_1_8bpc_ssse3: 126.5
      inv_txfm_add_32x16_identity_identity_2_8bpc_c: 10429.2
      inv_txfm_add_32x16_identity_identity_2_8bpc_ssse3: 244.9
      inv_txfm_add_32x16_identity_identity_3_8bpc_c: 10382.0
      inv_txfm_add_32x16_identity_identity_3_8bpc_ssse3: 491.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_c: 10381.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_ssse3: 468.0
      inv_txfm_add_32x32_dct_dct_0_8bpc_c: 4168.2
      inv_txfm_add_32x32_dct_dct_0_8bpc_ssse3: 204.0
      inv_txfm_add_32x32_dct_dct_1_8bpc_c: 46306.2
      inv_txfm_add_32x32_dct_dct_1_8bpc_ssse3: 2216.0
      inv_txfm_add_32x32_dct_dct_2_8bpc_c: 46300.2
      inv_txfm_add_32x32_dct_dct_2_8bpc_ssse3: 2194.2
      inv_txfm_add_32x32_dct_dct_3_8bpc_c: 46350.1
      inv_txfm_add_32x32_dct_dct_3_8bpc_ssse3: 3484.4
      inv_txfm_add_32x32_dct_dct_4_8bpc_c: 46318.1
      inv_txfm_add_32x32_dct_dct_4_8bpc_ssse3: 3440.9
      inv_txfm_add_32x32_identity_identity_0_8bpc_c: 14663.1
      inv_txfm_add_32x32_identity_identity_0_8bpc_ssse3: 179.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_c: 14737.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_ssse3: 179.2
      inv_txfm_add_32x32_identity_identity_2_8bpc_c: 14640.4
      inv_txfm_add_32x32_identity_identity_2_8bpc_ssse3: 179.1
      inv_txfm_add_32x32_identity_identity_3_8bpc_c: 14638.5
      inv_txfm_add_32x32_identity_identity_3_8bpc_ssse3: 663.8
      inv_txfm_add_32x32_identity_identity_4_8bpc_c: 14635.6
      inv_txfm_add_32x32_identity_identity_4_8bpc_ssse3: 663.9
  12. 26 Mar, 2019 1 commit
  13. 24 Mar, 2019 2 commits
  14. 20 Mar, 2019 1 commit
  15. 19 Mar, 2019 1 commit
    • Add SSSE3 implementation for the 8x32 and 32x8 blocks in itx · 585ac462
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_8x32_dct_dct_0_8bpc_c: 1164.7
      inv_txfm_add_8x32_dct_dct_0_8bpc_ssse3: 79.5
      inv_txfm_add_8x32_dct_dct_1_8bpc_c: 11291.6
      inv_txfm_add_8x32_dct_dct_1_8bpc_ssse3: 508.5
      inv_txfm_add_8x32_dct_dct_2_8bpc_c: 10720.4
      inv_txfm_add_8x32_dct_dct_2_8bpc_ssse3: 507.9
      inv_txfm_add_8x32_dct_dct_3_8bpc_c: 12351.5
      inv_txfm_add_8x32_dct_dct_3_8bpc_ssse3: 687.2
      inv_txfm_add_8x32_dct_dct_4_8bpc_c: 10402.3
      inv_txfm_add_8x32_dct_dct_4_8bpc_ssse3: 687.9
      inv_txfm_add_8x32_identity_identity_0_8bpc_c: 3485.0
      inv_txfm_add_8x32_identity_identity_0_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_c: 3495.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_c: 3503.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_ssse3: 97.8
      inv_txfm_add_8x32_identity_identity_3_8bpc_c: 3489.5
      inv_txfm_add_8x32_identity_identity_3_8bpc_ssse3: 184.4
      inv_txfm_add_8x32_identity_identity_4_8bpc_c: 3498.1
      inv_txfm_add_8x32_identity_identity_4_8bpc_ssse3: 182.8
      inv_txfm_add_32x8_dct_dct_0_8bpc_c: 1220.4
      inv_txfm_add_32x8_dct_dct_0_8bpc_ssse3: 65.6
      inv_txfm_add_32x8_dct_dct_1_8bpc_c: 11120.7
      inv_txfm_add_32x8_dct_dct_1_8bpc_ssse3: 623.8
      inv_txfm_add_32x8_dct_dct_2_8bpc_c: 12236.3
      inv_txfm_add_32x8_dct_dct_2_8bpc_ssse3: 624.7
      inv_txfm_add_32x8_dct_dct_3_8bpc_c: 10866.3
      inv_txfm_add_32x8_dct_dct_3_8bpc_ssse3: 694.1
      inv_txfm_add_32x8_dct_dct_4_8bpc_c: 10322.8
      inv_txfm_add_32x8_dct_dct_4_8bpc_ssse3: 692.5
      inv_txfm_add_32x8_identity_identity_0_8bpc_c: 3368.1
      inv_txfm_add_32x8_identity_identity_0_8bpc_ssse3: 98.6
      inv_txfm_add_32x8_identity_identity_1_8bpc_c: 3381.1
      inv_txfm_add_32x8_identity_identity_1_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_2_8bpc_c: 3376.6
      inv_txfm_add_32x8_identity_identity_2_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_c: 3364.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_ssse3: 182.2
      inv_txfm_add_32x8_identity_identity_4_8bpc_c: 3390.0
      inv_txfm_add_32x8_identity_identity_4_8bpc_ssse3: 182.2
  16. 18 Mar, 2019 1 commit
    • Add SSSE3 implementation for ipred_cfl_ac_420 and ipred_cfl_ac_422 · 5d944dc6
      Xuefeng Jiang authored
      cfl_ac_420_w4_8bpc_c: 1621.0
      cfl_ac_420_w4_8bpc_ssse3: 92.5
      cfl_ac_420_w8_8bpc_c: 3344.1
      cfl_ac_420_w8_8bpc_ssse3: 115.4
      cfl_ac_420_w16_8bpc_c: 6024.9
      cfl_ac_420_w16_8bpc_ssse3: 187.8
      cfl_ac_422_w4_8bpc_c: 1762.5
      cfl_ac_422_w4_8bpc_ssse3: 81.4
      cfl_ac_422_w8_8bpc_c: 4941.2
      cfl_ac_422_w8_8bpc_ssse3: 166.5
      cfl_ac_422_w16_8bpc_c: 8261.8
      cfl_ac_422_w16_8bpc_ssse3: 272.3
  17. 16 Mar, 2019 2 commits
  18. 14 Mar, 2019 2 commits
  19. 13 Mar, 2019 1 commit
  20. 12 Mar, 2019 1 commit
  21. 11 Mar, 2019 5 commits
  22. 09 Mar, 2019 2 commits
  23. 08 Mar, 2019 2 commits
    • let dav1d_version() return the project version · 754487c0
      Janne Grunau authored
      Increments the soname revision number for this behavior change.
      Removes the DAV1D_VERSION and DAV1D_VERSION_INT defines and
      dav1d_version_vcs() and dav1d_version_int().
      Also cleans up the version usage in dav1d CLI.
      Refs #241, #255.
    • x86: add SSSE3 cdef dir implementation · d67e3476
      Victorien Le Couviour--Tuffet authored
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_dir_8bpc_c: 1023.1
      cdef_dir_8bpc_ssse3: 110.3
      cdef_dir_8bpc_avx2: 71.1
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_dir_8bpc_c: 1074.8
      cdef_dir_8bpc_ssse3: 120.6
      ------------------------------------------
      
      Thanks to Ronald for the AVX2 XMM version, which was a very good starting
      point.
  24. 06 Mar, 2019 1 commit