x86: Add minor loopfilter asm improvements

Status: Merged. Henrik Gramner requested to merge gramner/dav1d:x86_loopfilter_improvements into master.

The AVX2 changes mainly reduce code size by sharing common code between the luma and chroma functions, but the 8-bit AVX-512 changes also include some small speedups due to more efficient mask calculations:

lpf_h_sb_uv_w4_8bpc_avx512icl: 131.0 -> 129.3
lpf_h_sb_uv_w6_8bpc_avx512icl: 178.9 -> 172.8

lpf_h_sb_y_w4_8bpc_avx512icl:  234.0 -> 228.6
lpf_h_sb_y_w8_8bpc_avx512icl:  384.7 -> 375.8
lpf_h_sb_y_w16_8bpc_avx512icl: 620.8 -> 587.7

lpf_v_sb_uv_w4_8bpc_avx512icl:  32.9 ->  31.1
lpf_v_sb_uv_w6_8bpc_avx512icl:  67.9 ->  64.4

lpf_v_sb_y_w4_8bpc_avx512icl:   64.7 ->  63.2
lpf_v_sb_y_w8_8bpc_avx512icl:  185.2 -> 175.6
lpf_v_sb_y_w16_8bpc_avx512icl: 350.6 -> 314.6
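
To put the "small speedups" in relative terms, the before/after figures above can be converted to percentage reductions. The following is a minimal sketch, not part of the merge request: the helper name `reduction_pct` is hypothetical, and since the description does not state the unit of the figures, the script only assumes that a lower number means faster.

```python
# Before/after figures copied from the merge request description.
# The unit is not stated there; lower is assumed to mean faster.
RESULTS = {
    "lpf_h_sb_uv_w4_8bpc_avx512icl": (131.0, 129.3),
    "lpf_h_sb_uv_w6_8bpc_avx512icl": (178.9, 172.8),
    "lpf_h_sb_y_w4_8bpc_avx512icl": (234.0, 228.6),
    "lpf_h_sb_y_w8_8bpc_avx512icl": (384.7, 375.8),
    "lpf_h_sb_y_w16_8bpc_avx512icl": (620.8, 587.7),
    "lpf_v_sb_uv_w4_8bpc_avx512icl": (32.9, 31.1),
    "lpf_v_sb_uv_w6_8bpc_avx512icl": (67.9, 64.4),
    "lpf_v_sb_y_w4_8bpc_avx512icl": (64.7, 63.2),
    "lpf_v_sb_y_w8_8bpc_avx512icl": (185.2, 175.6),
    "lpf_v_sb_y_w16_8bpc_avx512icl": (350.6, 314.6),
}


def reduction_pct(before, after):
    """Relative reduction in percent (hypothetical helper, not part of dav1d)."""
    return (before - after) / before * 100.0


# Print one line per function with the relative reduction.
for name, (before, after) in RESULTS.items():
    pct = reduction_pct(before, after)
    print(f"{name:<32} {before:6.1f} -> {after:6.1f}  ({pct:.1f}% lower)")
```

By this measure the gains range from roughly 1% (lpf_h_sb_uv_w4) to roughly 10% (lpf_v_sb_y_w16), with the w16 luma functions benefiting the most in both directions.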
