arm64: mc: Minor misc optimizations

Merged: Martin Storsjö requested to merge mstorsjo/dav1d:arm64-mc-opt into master
  Feb 10, 2020
    • arm64: mc: Reduce the width of a register copy · d4c5ad49
      Martin Storsjö authored and Janne Grunau committed
      Copy only as much as is actually needed/used.
    • arm64: mc: Use two regs for alternating output rows for w4/8 in avg/w_avg/mask · b1167ce1
      Martin Storsjö authored and Janne Grunau committed
      It was already done this way for w32/64. Not doing it for w16, as it
      didn't help there (the two extra setup instructions instead gave a
      small slowdown).
      
      This gives a small speedup on in-order cores like A53.
      
      Before:         Cortex A53     A72     A73
      avg_w4_8bpc_neon:     60.9    25.6    29.0
      avg_w8_8bpc_neon:    143.6    52.8    64.0
      After:
      avg_w4_8bpc_neon:     56.7    26.7    28.5
      avg_w8_8bpc_neon:    137.2    54.5    64.4
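A minimal C sketch of the addressing pattern this commit describes. It is not the actual NEON code. Even and odd output rows are written through two independent offsets (standing in for the two address registers), each advanced by 2 * stride per iteration. The function names, the packed tmp layout (row stride w) and the 8 bpc rounding (tmp1 + tmp2 + 16) >> 5 are assumptions based on my reading of dav1d's C reference for avg; w_avg and mask would differ only in the per-pixel formula. On an in-order core the gain presumably comes from consecutive stores not depending on the same post-incremented register. A C compiler schedules this loop as it sees fit, so the sketch shows the structure only, not the speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 8 bpc avg rounding (dav1d C reference, intermediate_bits = 4):
 * (a + b + 16) >> 5, clipped to [0, 255]. */
static uint8_t avg_px(int16_t a, int16_t b)
{
    int v = (a + b + 16) >> 5;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar model of the w4/w8 store pattern: even and odd
 * output rows go through two independent offsets (standing in for two
 * address registers), each advanced by 2 * stride per iteration.
 * The tmp buffers are assumed to be packed with a row stride of w. */
void avg_two_row_offsets(uint8_t *dst, ptrdiff_t stride,
                         const int16_t *tmp1, const int16_t *tmp2,
                         int w, int h)
{
    assert(h > 0 && h % 2 == 0);
    ptrdiff_t off0 = 0;       /* even rows */
    ptrdiff_t off1 = stride;  /* odd rows */
    const ptrdiff_t step = 2 * stride;
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < w; x++) {
            dst[off0 + x] = avg_px(tmp1[x], tmp2[x]);
            dst[off1 + x] = avg_px(tmp1[w + x], tmp2[w + x]);
        }
        tmp1 += 2 * w;  /* two packed rows consumed per iteration */
        tmp2 += 2 * w;
        off0 += step;
        off1 += step;
    }
}
```

The single-pointer variant would bump one offset by stride after every row, so each store address depends on the previous update. With two offsets, the two stores in an iteration are independent of each other.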
    • arm64: mc: Simplify avg/w_avg/mask by always using the w16 macro · 0bad117e
      Martin Storsjö authored and Janne Grunau committed
      This shortens the source by 40 lines, and gives a significant
      speedup on A53, a small speedup on A72 and a very minor slowdown
      for avg/w_avg on A73.
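A hypothetical C model of what "always using the w16 macro" can mean for w4/w8. It assumes the intermediate buffers are packed with a row stride of w, as in dav1d's C reference. With that layout, 16 consecutive tmp elements span 16 / w output rows, so a single 16-pixel step can serve every width and only the stores differ. avg16() and avg_via_w16() are illustrative names, not functions in dav1d. The 8 bpc rounding (tmp1 + tmp2 + 16) >> 5 with clipping is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the "w16 macro": 16 output pixels from 16 consecutive
 * elements of each intermediate buffer (assumed 8 bpc rounding). */
static void avg16(uint8_t out[16], const int16_t *a, const int16_t *b)
{
    for (int i = 0; i < 16; i++) {
        int v = (a[i] + b[i] + 16) >> 5;
        out[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* Hypothetical driver: every width runs the same 16-pixel step. For
 * w < 16, one step covers 16 / w packed rows, stored row by row. */
void avg_via_w16(uint8_t *dst, ptrdiff_t stride,
                 const int16_t *tmp1, const int16_t *tmp2, int w, int h)
{
    assert(w == 4 || w == 8 || w == 16);
    assert(h > 0 && (w * h) % 16 == 0);
    const int rows_per_step = 16 / w;
    for (int y = 0; y < h; y += rows_per_step) {
        uint8_t px[16];
        avg16(px, tmp1, tmp2);
        for (int r = 0; r < rows_per_step; r++)
            for (int x = 0; x < w; x++)
                dst[(ptrdiff_t)(y + r) * stride + x] = px[r * w + x];
        tmp1 += 16;  /* always 16 tmp elements per step */
        tmp2 += 16;
    }
}
```

This design keeps one arithmetic path for all three widths, which fits the commit's note that the source got 40 lines shorter. Only the store sequence depends on w.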
      
      Before:           Cortex A53     A72     A73
      avg_w4_8bpc_neon:       67.4    26.1    25.4
      avg_w8_8bpc_neon:      158.7    56.3    59.1
      avg_w16_8bpc_neon:     382.9   154.1   160.7
      w_avg_w4_8bpc_neon:     99.9    43.6    39.4
      w_avg_w8_8bpc_neon:    253.2    98.3    99.0
      w_avg_w16_8bpc_neon:   543.1   285.0   301.8
      mask_w4_8bpc_neon:     110.6    51.4    45.1
      mask_w8_8bpc_neon:     295.0   129.9   114.0
      mask_w16_8bpc_neon:    654.6   365.8   369.7
      After:
      avg_w4_8bpc_neon:       60.8    26.3    29.0
      avg_w8_8bpc_neon:      142.8    52.9    64.1
      avg_w16_8bpc_neon:     378.2   153.4   160.8
      w_avg_w4_8bpc_neon:     78.7    41.0    40.9
      w_avg_w8_8bpc_neon:    190.6    90.1   105.1
      w_avg_w16_8bpc_neon:   531.1   279.3   301.4
      mask_w4_8bpc_neon:      86.6    47.2    44.9
      mask_w8_8bpc_neon:     222.0   114.3   114.9
      mask_w16_8bpc_neon:    639.5   356.0   369.8
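To put a number on "significant speedup on A53", the relative change per function can be computed from the Cortex A53 columns of the 0bad117e tables as 100 * (before - after) / before. Lower figures are treated as faster, since the tables do not state a unit. This is plain arithmetic on the reported figures, not a new measurement. The helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relative speedup in percent; positive means the "after" figure is
 * lower (faster). */
double speedup_pct(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Cortex A53 columns of the 0bad117e before/after tables. */
static const struct {
    const char *name;
    double before, after;
} a53[] = {
    { "avg_w4_8bpc_neon",     67.4,  60.8 },
    { "avg_w8_8bpc_neon",    158.7, 142.8 },
    { "avg_w16_8bpc_neon",   382.9, 378.2 },
    { "w_avg_w4_8bpc_neon",   99.9,  78.7 },
    { "w_avg_w8_8bpc_neon",  253.2, 190.6 },
    { "w_avg_w16_8bpc_neon", 543.1, 531.1 },
    { "mask_w4_8bpc_neon",   110.6,  86.6 },
    { "mask_w8_8bpc_neon",   295.0, 222.0 },
    { "mask_w16_8bpc_neon",  654.6, 639.5 },
};

/* Print one line per function: name and speedup on the A53. */
void print_a53_speedups(void)
{
    for (size_t i = 0; i < sizeof a53 / sizeof a53[0]; i++)
        printf("%-20s %5.1f%%\n", a53[i].name,
               speedup_pct(a53[i].before, a53[i].after));
}
```

With these inputs, print_a53_speedups() reports about 10% for avg at w4/w8, about 21 to 25% for w_avg and mask at w4/w8, and 1.2 to 2.3% at w16. Most of the A53 gain is therefore at the two smallest widths.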