VideoLAN / dav1d / Merge Requests / !985

Merged
Created May 05, 2020 by Martin Storsjö (@mstorsjo, Developer)

arm64: itx: Add NEON implementation of itx for 10 bpc

This branch contains a number of minor fixups for the existing 8 bpc itx as well.

Add an element size specifier to the names of the existing individual transform functions for 8 bpc, e.g. inv_dct_8h_x8_neon, to clarify that they operate on input vectors of 8h. Make the symbols public, so that the 10 bpc code can call them from a different object file. The new itx16.S uses the same convention, e.g. inv_dct_4s_x8_neon.
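To illustrate the naming convention, here is a minimal sketch that extracts the element size specifier from a symbol name. The helper `arrangement_of` and the table `LANE_BITS` are hypothetical names for this illustration and are not part of dav1d. The only fact relied on is the standard AArch64 arrangement notation, where b, h, s and d denote 8, 16, 32 and 64 bit lanes.

```python
import re

# AArch64 NEON arrangement letters and their lane widths in bits.
LANE_BITS = {"b": 8, "h": 16, "s": 32, "d": 64}


def arrangement_of(symbol):
    """Return (lanes, bits_per_lane) for a name like inv_dct_8h_x8_neon.

    Hypothetical helper: looks for a "_<count><letter>_" component and
    returns None if the symbol carries no element size specifier.
    """
    m = re.search(r"_(\d+)([bhsd])_", symbol)
    if m is None:
        return None
    return int(m.group(1)), LANE_BITS[m.group(2)]


# The 8 bpc transforms work on 8 lanes of 16 bits, the 10 bpc ones on
# 4 lanes of 32 bits; both fill one 128 bit vector register.
for name in ("inv_dct_8h_x8_neon", "inv_dct_4s_x8_neon"):
    lanes, bits = arrangement_of(name)
    print(name, lanes, bits, lanes * bits)
```

The frontend names in the benchmark table, such as inv_txfm_add_4x4_dct_dct_0_10bpc_neon, carry no such specifier, so the helper returns None for them.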

Compile the existing itx.S regardless of whether 8 bpc support is enabled. In builds with 8 bpc support disabled, this does include the unused frontend functions, but that is hopefully tolerable, as it avoids having to split the file into a shareable file for the transforms and a separate one for the frontends.

This only implements the 10 bpc case, as that case can use transforms operating on 16 bit coefficients in the second pass.
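As a rough illustration of why the second pass fits in 16 bits for 10 bpc, the sketch below computes the width of the intermediate clamp applied between the two passes. It assumes the AV1 rule that this range is max(bitdepth + 6, 16) signed bits; that rule is not stated in this merge request, and the function name `second_pass_input_bits` is hypothetical.

```python
def second_pass_input_bits(bitdepth):
    """Signed bit width of the coefficients fed to the second pass.

    Assumption (not taken from this merge request): the intermediate
    values are clamped to max(bitdepth + 6, 16) bits between passes.
    """
    return max(bitdepth + 6, 16)


for bpc in (8, 10, 12):
    bits = second_pass_input_bits(bpc)
    fits = "fits in 16 bit lanes" if bits <= 16 else "needs 32 bit lanes"
    print(f"{bpc} bpc: {bits} bits, {fits}")
```

Under that assumption, 10 bpc can share the 16 bit second-pass transforms with 8 bpc, while 12 bpc would need wider lanes in both passes.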

Relative speedup vs C for a few functions:

                                     Cortex A53    A72    A73
inv_txfm_add_4x4_dct_dct_0_10bpc_neon:     4.14   4.06   4.49
inv_txfm_add_4x4_dct_dct_1_10bpc_neon:     6.51   6.49   6.42
inv_txfm_add_8x8_dct_dct_0_10bpc_neon:     5.02   4.63   6.23
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:     8.54   7.13  11.96
inv_txfm_add_16x16_dct_dct_0_10bpc_neon:   5.52   6.60   8.03
inv_txfm_add_16x16_dct_dct_1_10bpc_neon:  11.27   9.62  12.22
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   9.60   6.97   8.59
inv_txfm_add_32x32_dct_dct_0_10bpc_neon:   2.60   3.48   3.19
inv_txfm_add_32x32_dct_dct_1_10bpc_neon:  14.65  12.64  16.86
inv_txfm_add_32x32_dct_dct_2_10bpc_neon:  11.57   8.80  12.68
inv_txfm_add_32x32_dct_dct_3_10bpc_neon:   8.79   8.00   9.21
inv_txfm_add_32x32_dct_dct_4_10bpc_neon:   7.58   6.21   7.80
inv_txfm_add_64x64_dct_dct_0_10bpc_neon:   2.41   2.85   2.75
inv_txfm_add_64x64_dct_dct_1_10bpc_neon:  12.91  10.27  12.24
inv_txfm_add_64x64_dct_dct_2_10bpc_neon:  10.96   7.97  10.31
inv_txfm_add_64x64_dct_dct_3_10bpc_neon:   8.95   7.42   9.55
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   7.97   6.12   7.82
Milestone: 0.7.0
Source branch: arm64-itx-10bpc