
avcodec/riscv: add h264 dc idct rvv

Closed J. Dekker requested to merge riscv-idct_dc into riscv
3 unresolved threads
  • h264_idct4_add_dc_8bpp_c: 1.7
  • h264_idct4_add_dc_8bpp_rvv_i64: 1.2
  • h264_idct4_add_dc_9bpp_c: 1.5
  • h264_idct4_add_dc_9bpp_rvv_i64: 0.7
  • h264_idct4_add_dc_10bpp_c: 1.5
  • h264_idct4_add_dc_10bpp_rvv_i64: 0.7
  • h264_idct4_add_dc_12bpp_c: 1.5
  • h264_idct4_add_dc_12bpp_rvv_i64: 17.7
  • h264_idct4_add_dc_14bpp_c: 1.7
  • h264_idct4_add_dc_14bpp_rvv_i64: 0.7
  • h264_idct8_add_dc_8bpp_c: 6.2
  • h264_idct8_add_dc_8bpp_rvv_i64: 2.2
  • h264_idct8_add_dc_9bpp_c: 6.0
  • h264_idct8_add_dc_9bpp_rvv_i64: 1.2
  • h264_idct8_add_dc_10bpp_c: 6.0
  • h264_idct8_add_dc_10bpp_rvv_i64: 1.2
  • h264_idct8_add_dc_12bpp_c: 6.2
  • h264_idct8_add_dc_12bpp_rvv_i64: 1.2
  • h264_idct8_add_dc_14bpp_c: 6.2
  • h264_idct8_add_dc_14bpp_rvv_i64: 1.5

Signed-off-by: J. Dekker <jdek@itanimul.li>

Edited by J. Dekker

Merge request reports

Closed by J. Dekker 7 months ago (Aug 28, 2024 10:50am UTC)

Merge details

  • The changes were not merged into riscv.

Activity

    .endif
        addi    a3, a3, 32
        srai    a3, a3, 6
    .if \depth == 8
        sh      zero, 0(a1)
    .else
        sw      zero, 0(a1)
    .endif
        add     t2, a2, a2
        mv      t1, a4
        add     t2, t2, a2
    1:
    .if \depth == 8
        vsetvli zero, a4, e8, m1
        vle8.v  v0, (a0)
        add     a0, a0, a2
        vle8.v  v1, (a0)
        add     a0, a0, a2
        vle8.v  v2, (a0)
        add     a0, a0, a2
        vle8.v  v3, (a0)
        vwcvtu.x.x.v v4, v0
        vwcvtu.x.x.v v6, v1
        vwcvtu.x.x.v v8, v2
        vwcvtu.x.x.v v10, v3
        vsetvli zero, a4, e16, m1
    .else
        vsetvli zero, a4, e16, m1
        vle16.v v4, (a0)
        add     a0, a0, a2
        vle16.v v6, (a0)
        add     a0, a0, a2
        vle16.v v8, (a0)
        add     a0, a0, a2
        vle16.v v10, (a0)
    .endif
        vadd.vx v4, v4, a3
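For readers following the excerpt, here is a minimal scalar sketch in C of what the 8-bit path appears to compute: round and scale the DC coefficient (the `addi`/`srai` pair), clear the coefficient (the `sh zero` store), then add the DC term to every pixel with clipping. The names `clip_u8` and `dc_add_8bit_ref` are hypothetical and this is not the FFmpeg C reference itself. It also assumes `>>` on a negative `int` is an arithmetic shift, as `srai` is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model of a DC-only inverse transform add for 8-bit pixels.
 * dc_add_8bit_ref is a hypothetical name, not an FFmpeg symbol.
 * stride is the distance between rows, in bytes. */
static void dc_add_8bit_ref(uint8_t *dst, int16_t *block,
                            ptrdiff_t stride, int size)
{
    assert(size == 4 || size == 8);

    /* Round and scale the DC coefficient (addi 32, then srai 6). */
    int dc = (block[0] + 32) >> 6;

    /* The coefficient is consumed, so clear it (sh zero, 0(a1)). */
    block[0] = 0;

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}
```

In the vector code the clipping step for 8-bit input needs a wider intermediate, which is why the excerpt widens the loaded rows to 16 bits before the `vadd.vx`.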
  • Rémi Denis-Courmont
  • J. Dekker added 368 commits

    • c6c755da...77d971c3 - 358 earlier commits
    • 7904ec2d - avcodec/vvcdec: refact, remove hf_idx and vf_idx from mc_xxx's param list
    • cae0b012 - avcodec/vvcdec: increase edge_emu_buffer for RPR
    • 1b33c9a5 - avcodec/vvcdec: support Reference Picture Resampling
    • b8eb8b4f - Changelog: add DVB compatible information for VVC decoder
    • a9dc7dd7 - checkasm: vvc_alf: Limit benchmarking to a reasonable subset of functions
    • b1adf6d1 - checkasm: add runs argument to adjust during bench
    • d43e1238 - checkasm: print bench runs when benchmarking
    • 60933671 - checkasm: h264dsp: Avoid out of buffer writes when benchmarking
    • a1e620db - avcodec/riscv: add h264 dc idct rvv
    • d16e9826 - wip


  • Author Owner

    I tried using LMUL=m4 to reduce the number of widen/add/narrow instructions, but in my tests that was slower than using mf2/m1 alone and reducing the total number of vsetvli instructions.

    At first the idea was to write two functions, one for low and one for high bit depth, each covering both 4x4 and 8x8; to me that seemed very vector-y. See ff_h264_idct4_dc_add_8_rvv_new for the idea behind doubling the number of functions (one each for low4, low8, high4 and high8). This reads a lot like traditional SIMD implementations, though. From some (very noisy) benchmarks it still seems to be reasonably faster overall.
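As a rough illustration of the four-variant layout described above, the C sketch below generates one function per combination of bit-depth class and block size. Everything here is an assumption made for illustration: the macro `DEFINE_DC_ADD`, the names `dc_add_low4`, `dc_add_low8`, `dc_add_high4` and `dc_add_high8`, the choice of `int16_t` coefficients for 8-bit and `int32_t` for higher depths, and a stride counted in pixels rather than bytes. It models the data flow only and says nothing about how the RVV code is actually split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generate one DC-add variant per (pixel type, coefficient type, size).
 * All names are hypothetical; stride is counted in pixels, not bytes. */
#define DEFINE_DC_ADD(name, pixel, coeff, size)                            \
static void name(pixel *dst, coeff *block, ptrdiff_t stride, int depth)   \
{                                                                          \
    const int max = (1 << depth) - 1;   /* largest legal pixel value */    \
    int dc;                                                                \
    assert(depth >= 8 && depth <= 8 * (int)sizeof(pixel));                 \
    dc = (int)((block[0] + 32) >> 6);   /* rounded DC term */              \
    block[0] = 0;                       /* coefficient is consumed */      \
    for (int y = 0; y < (size); y++, dst += stride)                        \
        for (int x = 0; x < (size); x++) {                                 \
            int v = dst[x] + dc;                                           \
            dst[x] = (pixel)(v < 0 ? 0 : v > max ? max : v);               \
        }                                                                  \
}

DEFINE_DC_ADD(dc_add_low4,  uint8_t,  int16_t, 4)  /* 8-bit, 4x4      */
DEFINE_DC_ADD(dc_add_low8,  uint8_t,  int16_t, 8)  /* 8-bit, 8x8      */
DEFINE_DC_ADD(dc_add_high4, uint16_t, int32_t, 4)  /* 9..14-bit, 4x4  */
DEFINE_DC_ADD(dc_add_high8, uint16_t, int32_t, 8)  /* 9..14-bit, 8x8  */
```

The point of the split is that each variant has a fixed element width and a fixed row count, so nothing has to be decided inside the loop.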

    I've tried rdtime and rdcycle with varying numbers of runs. I'm going to try clock_gettime() again, since @unlord says that it's fine in dav1d checkasm.
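For context, a wall-clock measurement with `clock_gettime()` can be sketched as below: read a monotonic clock, call the function `1 << log2_runs` times, and divide the elapsed time by the run count. This is a minimal sketch and not checkasm's implementation. The helpers `now_ns`, `bench_ns` and `count_call` are hypothetical, and using `CLOCK_MONOTONIC` is an assumption.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a monotonic clock and return it in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int ret = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(ret == 0);
    (void)ret;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average cost of one call to fn, in nanoseconds, over 1 << log2_runs
 * calls. bench_ns is a hypothetical helper, not a checkasm function. */
static double bench_ns(void (*fn)(void *), void *arg, unsigned log2_runs)
{
    const uint64_t runs = UINT64_C(1) << log2_runs;
    uint64_t start = now_ns();

    for (uint64_t i = 0; i < runs; i++)
        fn(arg);

    return (double)(now_ns() - start) / (double)runs;
}

/* Trivial workload for trying the helper: counts how often it ran. */
static void count_call(void *arg)
{
    (*(volatile unsigned *)arg)++;
}
```

Calling `bench_ns(count_call, &counter, 17)` performs 131072 calls, the same run count that `--runs=17` reports in the logs. Averaging over many calls is what makes a coarse clock usable for functions this short.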

  • Author Owner

    Using clock_gettime():

    user@canaan ~/ffmpeg $ ./tests/checkasm/checkasm --test=h264dsp --bench --runs=17
    benchmarking with native FFmpeg timers
    nop: 64.4
    checkasm: using random seed 3650982421
    checkasm: bench runs 131072 (1 << 17)
    RVVi64:
     - h264dsp.idct              [OK]
    checkasm: all 10 tests passed
    h264_idct4_add_dc_8bpp_c: 57.9
    h264_idct4_add_dc_8bpp_rvv_i64: 30.1
    h264_idct4_add_dc_9bpp_c: 57.9
    h264_idct4_add_dc_9bpp_rvv_i64: 30.1
    h264_idct4_add_dc_10bpp_c: 57.9
    h264_idct4_add_dc_10bpp_rvv_i64: 20.9
    h264_idct4_add_dc_12bpp_c: 48.6
    h264_idct4_add_dc_12bpp_rvv_i64: 21.1
    h264_idct4_add_dc_14bpp_c: 57.9
    h264_idct4_add_dc_14bpp_rvv_i64: 20.9
    h264_idct8_add_dc_8bpp_c: 224.6
    h264_idct8_add_dc_8bpp_rvv_i64: 57.9
    h264_idct8_add_dc_9bpp_c: 224.6
    h264_idct8_add_dc_9bpp_rvv_i64: 39.4
    h264_idct8_add_dc_10bpp_c: 224.6
    h264_idct8_add_dc_10bpp_rvv_i64: 39.4
    h264_idct8_add_dc_12bpp_c: 224.6
    h264_idct8_add_dc_12bpp_rvv_i64: 48.6
    h264_idct8_add_dc_14bpp_c: 224.6
    h264_idct8_add_dc_14bpp_rvv_i64: 48.6
    user@canaan ~/ffmpeg $ ./tests/checkasm/checkasm --test=h264dsp --bench --runs=17
    benchmarking with native FFmpeg timers
    nop: 51.8
    checkasm: using random seed 666058969
    checkasm: bench runs 131072 (1 << 17)
    RVVi64:
     - h264dsp.idct              [OK]
    checkasm: all 10 tests passed
    h264_idct4_add_dc_8bpp_c: 51.8
    h264_idct4_add_dc_8bpp_rvv_i64: 42.5
    h264_idct4_add_dc_9bpp_c: 61.0
    h264_idct4_add_dc_9bpp_rvv_i64: 33.3
    h264_idct4_add_dc_10bpp_c: 61.3
    h264_idct4_add_dc_10bpp_rvv_i64: 24.0
    h264_idct4_add_dc_12bpp_c: 61.0
    h264_idct4_add_dc_12bpp_rvv_i64: 24.0
    h264_idct4_add_dc_14bpp_c: 61.0
    h264_idct4_add_dc_14bpp_rvv_i64: 24.0
    h264_idct8_add_dc_8bpp_c: 227.5
    h264_idct8_add_dc_8bpp_rvv_i64: 61.0
    h264_idct8_add_dc_9bpp_c: 227.8
    h264_idct8_add_dc_9bpp_rvv_i64: 51.8
    h264_idct8_add_dc_10bpp_c: 227.8
    h264_idct8_add_dc_10bpp_rvv_i64: 42.5
    h264_idct8_add_dc_12bpp_c: 218.5
    h264_idct8_add_dc_12bpp_rvv_i64: 51.8
    h264_idct8_add_dc_14bpp_c: 227.8
    h264_idct8_add_dc_14bpp_rvv_i64: 42.5
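To compare the C and RVV figures in the two runs above, the ratio of the two timings can be used as a speedup factor. The helper name `speedup` is hypothetical, and the sketch assumes both numbers are in the same unit and measured in the same run.

```c
#include <assert.h>

/* Ratio of the C timing to the RVV timing for one function.
 * Values above 1.0 mean the RVV version was faster in that run. */
static double speedup(double c_time, double rvv_time)
{
    assert(c_time > 0.0 && rvv_time > 0.0);
    return c_time / rvv_time;
}
```

For example, `speedup(224.6, 57.9)` and `speedup(227.5, 61.0)` give the factor for h264_idct8_add_dc_8bpp in the first and second run respectively.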
  • closed
