  1. Nov 05, 2024
    • riscv64/mc: Only process w*3/4 elements in blend_v · a17c8625
      Nathan E. Egge authored
      Setting VL to w*3/4 in this function only affects 16bpc performance, and
       only on the SpacemiT K1, which has two 128-bit vector units.
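A minimal Python model of what the VL change restricts. It assumes (this is not stated in the log) 6-bit blend weights combined with the formula `(a * (64 - m) + b * m + 32) >> 6`, and a caller-supplied per-column `mask` standing in for the OBMC mask table; `blend_px` and `blend_v` here are illustrative, not dav1d's functions. Only the left `(w * 3) >> 2` columns of each row are blended, so a vector loop whose VL covers exactly those columns never touches the right quarter.

```python
# Minimal model of blend_v: only the left w*3/4 columns of each row are blended.
# Assumptions (not taken from the log): weights m in 0..64, rounding +32,
# and a caller-supplied per-column mask standing in for the OBMC mask table.


def blend_px(a: int, b: int, m: int) -> int:
    """Weighted average of dst pixel a and tmp pixel b; m is the weight of b."""
    return (a * (64 - m) + b * m + 32) >> 6


def blend_v(dst: list[list[int]], tmp: list[list[int]],
            mask: list[int], w: int, h: int) -> list[list[int]]:
    """Blend tmp into dst in place and return dst.

    The number of active columns, (w * 3) >> 2, plays the role of VL in the
    vector version: the right quarter of each row is never read or written.
    """
    active = (w * 3) >> 2
    for y in range(h):
        row_dst, row_tmp = dst[y], tmp[y]
        for x in range(active):
            row_dst[x] = blend_px(row_dst[x], row_tmp[x], mask[x])
    return dst


if __name__ == "__main__":
    out = blend_v([[100] * 4], [[200] * 4], [32] * 4, w=4, h=1)
    print(out)  # [[150, 150, 150, 100]]
```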
      
      Kendryte K230                Before             After         Delta
      
      blend_v_w2_8bpc_c:        220.0 ( 1.00x)    221.3 ( 1.00x)    0.59%
      blend_v_w2_8bpc_rvv:      145.7 ( 1.51x)    148.2 ( 1.49x)    1.72%
      blend_v_w4_8bpc_c:        942.1 ( 1.00x)    943.7 ( 1.00x)    0.17%
      blend_v_w4_8bpc_rvv:      240.4 ( 3.92x)    242.9 ( 3.89x)    1.04%
      blend_v_w8_8bpc_c:       1782.3 ( 1.00x)   1783.8 ( 1.00x)    0.08%
      blend_v_w8_8bpc_rvv:      252.6 ( 7.06x)    254.9 ( 7.00x)    0.91%
      blend_v_w16_8bpc_c:      3650.9 ( 1.00x)   3647.0 ( 1.00x)   -0.11%
      blend_v_w16_8bpc_rvv:     495.5 ( 7.37x)    494.4 ( 7.38x)   -0.22%
      blend_v_w32_8bpc_c:      7013.0 ( 1.00x)   7018.2 ( 1.00x)    0.07%
      blend_v_w32_8bpc_rvv:     807.9 ( 8.68x)    802.0 ( 8.75x)   -0.73%
      
      blend_v_w2_16bpc_c:       226.1 ( 1.00x)    225.5 ( 1.00x)   -0.27%
      blend_v_w2_16bpc_rvv:     148.6 ( 1.52x)    148.9 ( 1.51x)    0.20%
      blend_v_w4_16bpc_c:      1010.7 ( 1.00x)   1006.7 ( 1.00x)   -0.40%
      blend_v_w4_16bpc_rvv:     306.7 ( 3.30x)    307.4 ( 3.27x)    0.23%
      blend_v_w8_16bpc_c:      1990.2 ( 1.00x)   1996.1 ( 1.00x)    0.30%
      blend_v_w8_16bpc_rvv:     519.5 ( 3.83x)    523.4 ( 3.81x)    0.75%
      blend_v_w16_16bpc_c:     3744.5 ( 1.00x)   3742.4 ( 1.00x)   -0.06%
      blend_v_w16_16bpc_rvv:    899.6 ( 4.16x)    906.4 ( 4.13x)    0.76%
      blend_v_w32_16bpc_c:     7047.5 ( 1.00x)   7079.3 ( 1.00x)    0.45%
      blend_v_w32_16bpc_rvv:   1475.5 ( 4.78x)   1483.3 ( 4.77x)    0.53%
      
      SpacemiT K1                  Before             After         Delta
      
      blend_v_w2_8bpc_c:        216.3 ( 1.00x)    214.4 ( 1.00x)   -0.88%
      blend_v_w2_8bpc_rvv:      144.0 ( 1.50x)    143.6 ( 1.49x)   -0.28%
      blend_v_w4_8bpc_c:        919.8 ( 1.00x)    918.1 ( 1.00x)   -0.18%
      blend_v_w4_8bpc_rvv:      236.6 ( 3.89x)    236.4 ( 3.88x)   -0.08%
      blend_v_w8_8bpc_c:       1739.3 ( 1.00x)   1736.8 ( 1.00x)   -0.14%
      blend_v_w8_8bpc_rvv:      236.8 ( 7.34x)    236.3 ( 7.35x)   -0.21%
      blend_v_w16_8bpc_c:      3374.7 ( 1.00x)   3374.9 ( 1.00x)    0.01%
      blend_v_w16_8bpc_rvv:     297.0 (11.36x)    296.8 (11.37x)   -0.07%
      blend_v_w32_8bpc_c:      6647.5 ( 1.00x)   6645.5 ( 1.00x)   -0.03%
      blend_v_w32_8bpc_rvv:     403.3 (16.48x)    402.4 (16.51x)   -0.22%
      
      blend_v_w2_16bpc_c:       221.4 ( 1.00x)    220.1 ( 1.00x)   -0.59%
      blend_v_w2_16bpc_rvv:     146.3 ( 1.51x)    147.3 ( 1.49x)    0.68%
      blend_v_w4_16bpc_c:       973.3 ( 1.00x)    972.7 ( 1.00x)   -0.06%
      blend_v_w4_16bpc_rvv:     280.3 ( 3.47x)    282.1 ( 3.45x)    0.64%
      blend_v_w8_16bpc_c:      1814.8 ( 1.00x)   1816.2 ( 1.00x)    0.08%
      blend_v_w8_16bpc_rvv:     376.6 ( 4.82x)    376.9 ( 4.82x)    0.08%
      blend_v_w16_16bpc_c:     3485.5 ( 1.00x)   3485.5 ( 1.00x)    0.00%
      blend_v_w16_16bpc_rvv:    531.1 ( 6.56x)    525.6 ( 6.63x)   -1.04%
      blend_v_w32_16bpc_c:     6788.3 ( 1.00x)   6778.8 ( 1.00x)   -0.14%
      blend_v_w32_16bpc_rvv:    904.5 ( 7.51x)    854.6 ( 7.93x)   -5.52%
  2. Nov 04, 2024
    • riscv64/mc16: Unroll 16bpc RVV blend_v 2x · 907dd871
      Nathan E. Egge authored
      Kendryte K230                Before             After         Delta
      
      blend_v_w2_16bpc_c:       225.8 ( 1.00x)    225.7 ( 1.00x)   -0.04%
      blend_v_w2_16bpc_rvv:     194.7 ( 1.16x)    148.6 ( 1.52x)  -23.68%
      blend_v_w4_16bpc_c:      1011.3 ( 1.00x)   1005.8 ( 1.00x)   -0.54%
      blend_v_w4_16bpc_rvv:     387.2 ( 2.61x)    305.4 ( 3.29x)  -21.13%
      blend_v_w8_16bpc_c:      1878.5 ( 1.00x)   1872.7 ( 1.00x)   -0.31%
      blend_v_w8_16bpc_rvv:     475.3 ( 3.95x)    435.6 ( 4.30x)   -8.35%
      blend_v_w16_16bpc_c:     3601.9 ( 1.00x)   3601.6 ( 1.00x)   -0.01%
      blend_v_w16_16bpc_rvv:    891.2 ( 4.04x)    892.7 ( 4.03x)    0.17%
      blend_v_w32_16bpc_c:     7043.7 ( 1.00x)   7058.8 ( 1.00x)    0.21%
      blend_v_w32_16bpc_rvv:   1384.5 ( 5.09x)   1478.0 ( 4.78x)    6.75%
      
      SpacemiT K1                  Before             After         Delta
      
      blend_v_w2_16bpc_c:       222.6 ( 1.00x)    220.5 ( 1.00x)   -0.94%
      blend_v_w2_16bpc_rvv:     195.7 ( 1.14x)    146.6 ( 1.50x)  -25.09%
      blend_v_w4_16bpc_c:       972.3 ( 1.00x)    972.0 ( 1.00x)   -0.03%
      blend_v_w4_16bpc_rvv:     349.1 ( 2.79x)    281.9 ( 3.45x)  -19.25%
      blend_v_w8_16bpc_c:      1812.1 ( 1.00x)   1813.0 ( 1.00x)    0.05%
      blend_v_w8_16bpc_rvv:     481.5 ( 3.76x)    376.0 ( 4.82x)  -21.91%
      blend_v_w16_16bpc_c:     3488.4 ( 1.00x)   3484.6 ( 1.00x)   -0.11%
      blend_v_w16_16bpc_rvv:    608.7 ( 5.73x)    523.4 ( 6.66x)  -14.01%
      blend_v_w32_16bpc_c:     6795.3 ( 1.00x)   6792.4 ( 1.00x)   -0.04%
      blend_v_w32_16bpc_rvv:    934.8 ( 7.27x)    907.3 ( 7.49x)   -2.94%
    • riscv64/mc16: Branchless vsetvl in blend_v function · 9710e7de
      Nathan E. Egge authored
      Kendryte K230                Before             After         Delta
      
      blend_v_w2_16bpc_c:       226.0 ( 1.00x)    226.1 ( 1.00x)    0.04%
      blend_v_w2_16bpc_rvv:     194.0 ( 1.16x)    193.9 ( 1.17x)   -0.05%
      blend_v_w4_16bpc_c:      1011.8 ( 1.00x)   1009.4 ( 1.00x)   -0.24%
      blend_v_w4_16bpc_rvv:     392.7 ( 2.58x)    390.8 ( 2.58x)   -0.48%
      blend_v_w8_16bpc_c:      1987.9 ( 1.00x)   1988.0 ( 1.00x)    0.01%
      blend_v_w8_16bpc_rvv:     561.5 ( 3.54x)    560.2 ( 3.55x)   -0.23%
      blend_v_w16_16bpc_c:     3738.1 ( 1.00x)   3739.1 ( 1.00x)    0.03%
      blend_v_w16_16bpc_rvv:    934.1 ( 4.00x)    932.2 ( 4.01x)   -0.20%
      blend_v_w32_16bpc_c:     7031.0 ( 1.00x)   7030.1 ( 1.00x)   -0.01%
      blend_v_w32_16bpc_rvv:   1403.3 ( 5.01x)   1395.8 ( 5.04x)   -0.53%
      
      SpacemiT K1                  Before             After         Delta
      
      blend_v_w2_16bpc_c:       221.0 ( 1.00x)    221.2 ( 1.00x)    0.09%
      blend_v_w2_16bpc_rvv:     195.2 ( 1.13x)    196.0 ( 1.13x)    0.41%
      blend_v_w4_16bpc_c:       969.8 ( 1.00x)    971.9 ( 1.00x)    0.22%
      blend_v_w4_16bpc_rvv:     348.8 ( 2.78x)    349.1 ( 2.78x)    0.09%
      blend_v_w8_16bpc_c:      1812.6 ( 1.00x)   1814.9 ( 1.00x)    0.13%
      blend_v_w8_16bpc_rvv:     486.1 ( 3.73x)    484.3 ( 3.75x)   -0.37%
      blend_v_w16_16bpc_c:     3483.0 ( 1.00x)   3485.1 ( 1.00x)    0.06%
      blend_v_w16_16bpc_rvv:    608.7 ( 5.72x)    607.4 ( 5.74x)   -0.21%
      blend_v_w32_16bpc_c:     6791.8 ( 1.00x)   6794.2 ( 1.00x)    0.04%
      blend_v_w32_16bpc_rvv:    940.6 ( 7.22x)    942.1 ( 7.21x)    0.16%
    • riscv64/mc16: Add VLEN=256 8bpc RVV blend_v function · 28d1c217
      Nathan E. Egge authored
      SpacemiT K1                  Before             After         Delta
      
      blend_v_w2_16bpc_c:       221.5 ( 1.00x)    220.3 ( 1.00x)   -0.54%
      blend_v_w2_16bpc_rvv:     193.5 ( 1.14x)    194.3 ( 1.13x)    0.41%
      blend_v_w4_16bpc_c:       968.8 ( 1.00x)    967.2 ( 1.00x)   -0.17%
      blend_v_w4_16bpc_rvv:     442.2 ( 2.19x)    347.4 ( 2.78x)  -21.44%
      blend_v_w8_16bpc_c:      1809.4 ( 1.00x)   1811.2 ( 1.00x)    0.10%
      blend_v_w8_16bpc_rvv:     557.4 ( 3.25x)    483.2 ( 3.75x)  -13.31%
      blend_v_w16_16bpc_c:     3481.4 ( 1.00x)   3473.4 ( 1.00x)   -0.23%
      blend_v_w16_16bpc_rvv:    844.3 ( 4.12x)    603.1 ( 5.76x)  -28.57%
      blend_v_w32_16bpc_c:     6783.1 ( 1.00x)   6749.8 ( 1.00x)   -0.49%
      blend_v_w32_16bpc_rvv:   1406.1 ( 4.82x)    919.4 ( 7.34x)  -34.61%
    • riscv64/mc16: Add 16bpc RVV blend_v function · aa2deb89
      Nathan E. Egge authored
      Kendryte K230
      
      blend_v_w2_16bpc_c:       226.5 ( 1.00x)
      blend_v_w2_16bpc_rvv:     192.2 ( 1.18x)
      blend_v_w4_16bpc_c:      1010.3 ( 1.00x)
      blend_v_w4_16bpc_rvv:     390.5 ( 2.59x)
      blend_v_w8_16bpc_c:      1994.2 ( 1.00x)
      blend_v_w8_16bpc_rvv:     561.7 ( 3.55x)
      blend_v_w16_16bpc_c:     3737.9 ( 1.00x)
      blend_v_w16_16bpc_rvv:    928.0 ( 4.03x)
      blend_v_w32_16bpc_c:     7064.7 ( 1.00x)
      blend_v_w32_16bpc_rvv:   1428.9 ( 4.94x)
      
      SpacemiT K1
      
      blend_v_w2_16bpc_c:       220.8 ( 1.00x)
      blend_v_w2_16bpc_rvv:     193.5 ( 1.14x)
      blend_v_w4_16bpc_c:       967.3 ( 1.00x)
      blend_v_w4_16bpc_rvv:     439.5 ( 2.20x)
      blend_v_w8_16bpc_c:      1810.2 ( 1.00x)
      blend_v_w8_16bpc_rvv:     555.3 ( 3.26x)
      blend_v_w16_16bpc_c:     3476.4 ( 1.00x)
      blend_v_w16_16bpc_rvv:    830.9 ( 4.18x)
      blend_v_w32_16bpc_c:     6772.9 ( 1.00x)
      blend_v_w32_16bpc_rvv:   1356.3 ( 4.99x)
  3. Oct 31, 2024
    • riscv64/mc16: Unroll 16bpc RVV blend 2x · c783088f
      Nathan E. Egge authored
      Kendryte K230              Before               After         Delta
      
      blend_w4_16bpc_c:       210.0 ( 1.00x)      208.9 ( 1.00x)   -0.52%
      blend_w4_16bpc_rvv:      88.5 ( 2.37x)       66.2 ( 3.15x)  -25.20%
      blend_w8_16bpc_c:       614.1 ( 1.00x)      613.5 ( 1.00x)   -0.10%
      blend_w8_16bpc_rvv:     143.1 ( 4.29x)      126.9 ( 4.83x)  -11.32%
      blend_w16_16bpc_c:     2371.2 ( 1.00x)     2371.3 ( 1.00x)    0.00%
      blend_w16_16bpc_rvv:    461.1 ( 5.14x)      413.2 ( 5.74x)  -10.39%
      blend_w32_16bpc_c:     5998.4 ( 1.00x)     5998.4 ( 1.00x)    0.00%
      blend_w32_16bpc_rvv:    978.4 ( 6.13x)     1013.1 ( 5.92x)    3.55%
      
      SpacemiT K1                Before               After         Delta
      
      blend_w4_16bpc_c:       205.8 ( 1.00x)      205.9 ( 1.00x)    0.05%
      blend_w4_16bpc_rvv:      80.9 ( 2.54x)       64.9 ( 3.17x)  -19.78%
      blend_w8_16bpc_c:       599.9 ( 1.00x)      599.9 ( 1.00x)    0.00%
      blend_w8_16bpc_rvv:     134.4 ( 4.46x)      101.9 ( 5.89x)  -24.18%
      blend_w16_16bpc_c:     2316.5 ( 1.00x)     2316.5 ( 1.00x)    0.00%
      blend_w16_16bpc_rvv:    302.0 ( 7.67x)      262.8 ( 8.81x)  -12.98%
      blend_w32_16bpc_c:     5861.9 ( 1.00x)     5861.4 ( 1.00x)   -0.01%
      blend_w32_16bpc_rvv:    589.6 ( 9.94x)      602.2 ( 9.73x)    2.14%
    • riscv64/mc16: Branchless vsetvl in blend function · 67c60d76
      Nathan E. Egge authored
      Kendryte K230              Before               After         Delta
      
      blend_w4_16bpc_c:       208.8 ( 1.00x)      209.9 ( 1.00x)    0.53%
      blend_w4_16bpc_rvv:      85.9 ( 2.43x)       88.6 ( 2.37x)    3.14%
      blend_w8_16bpc_c:       613.2 ( 1.00x)      614.3 ( 1.00x)    0.18%
      blend_w8_16bpc_rvv:     145.4 ( 4.22x)      143.1 ( 4.29x)   -1.58%
      blend_w16_16bpc_c:     2371.9 ( 1.00x)     2373.6 ( 1.00x)    0.07%
      blend_w16_16bpc_rvv:    464.0 ( 5.11x)      461.2 ( 5.15x)   -0.60%
      blend_w32_16bpc_c:     6005.6 ( 1.00x)     6007.7 ( 1.00x)    0.03%
      blend_w32_16bpc_rvv:    981.6 ( 6.12x)      979.4 ( 6.13x)   -0.22%
      
      SpacemiT K1                Before               After         Delta
      
      blend_w4_16bpc_c:       206.4 ( 1.00x)      205.7 ( 1.00x)   -0.34%
      blend_w4_16bpc_rvv:      79.5 ( 2.60x)       81.0 ( 2.54x)    1.89%
      blend_w8_16bpc_c:       600.7 ( 1.00x)      599.7 ( 1.00x)   -0.17%
      blend_w8_16bpc_rvv:     133.3 ( 4.51x)      134.1 ( 4.47x)    0.60%
      blend_w16_16bpc_c:     2315.9 ( 1.00x)     2315.2 ( 1.00x)   -0.03%
      blend_w16_16bpc_rvv:    305.2 ( 7.59x)      300.7 ( 7.70x)   -1.47%
      blend_w32_16bpc_c:     5861.1 ( 1.00x)     5860.2 ( 1.00x)   -0.02%
      blend_w32_16bpc_rvv:    592.5 ( 9.89x)      589.5 ( 9.94x)   -0.51%
    • riscv64/mc16: Add VLEN=256 8bpc RVV blend function · 3437a26b
      Nathan E. Egge authored
      SpacemiT K1                Before               After         Delta
      
      blend_w4_16bpc_c:       206.8 ( 1.00x)      206.0 ( 1.00x)   -0.39%
      blend_w4_16bpc_rvv:      95.8 ( 2.16x)       77.8 ( 2.65x)  -18.79%
      blend_w8_16bpc_c:       600.4 ( 1.00x)      600.1 ( 1.00x)   -0.05%
      blend_w8_16bpc_rvv:     161.7 ( 3.71x)      131.3 ( 4.57x)  -18.80%
      blend_w16_16bpc_c:     2317.6 ( 1.00x)     2316.5 ( 1.00x)   -0.05%
      blend_w16_16bpc_rvv:    459.6 ( 5.04x)      302.9 ( 7.65x)  -34.09%
      blend_w32_16bpc_c:     5863.0 ( 1.00x)     5863.3 ( 1.00x)    0.01%
      blend_w32_16bpc_rvv:    992.7 ( 5.91x)      578.1 (10.14x)  -41.76%
  4. Oct 29, 2024
    • meson: Move riscv64 8bpc only files into bitdepth sources · e542f661
      Nathan E. Egge authored
      The cdef.S, itx.S and mc.S files contain only 8bpc implementations and
       should only be compiled when 8bpc is enabled (e.g. -Dbitdepths=8).
    • riscv64/mc16: Add 16bpc RVV blend function · ca489d8a
      Nathan E. Egge authored and Luca Barbato committed
      Kendryte K230
      
      blend_w4_16bpc_c:        214.4 ( 1.00x)
      blend_w4_16bpc_rvv:       90.2 ( 2.38x)
      blend_w8_16bpc_c:        618.9 ( 1.00x)
      blend_w8_16bpc_rvv:      147.4 ( 4.20x)
      blend_w16_16bpc_c:      2376.5 ( 1.00x)
      blend_w16_16bpc_rvv:     466.0 ( 5.10x)
      blend_w32_16bpc_c:      6008.6 ( 1.00x)
      blend_w32_16bpc_rvv:     985.0 ( 6.10x)
      
      SpacemiT K1
      
      blend_w4_16bpc_c:        204.9 ( 1.00x)
      blend_w4_16bpc_rvv:       88.3 ( 2.32x)
      blend_w8_16bpc_c:        598.5 ( 1.00x)
      blend_w8_16bpc_rvv:      155.3 ( 3.85x)
      blend_w16_16bpc_c:      2315.4 ( 1.00x)
      blend_w16_16bpc_rvv:     444.4 ( 5.21x)
      blend_w32_16bpc_c:      5860.1 ( 1.00x)
      blend_w32_16bpc_rvv:     993.0 ( 5.90x)
  5. Oct 28, 2024
  6. Oct 21, 2024
    • x86: Improve SSSE3 SGR asm · ef4aff75
      Henrik Gramner authored
       * Use the same approach as the AVX2 code: replace dav1d_sgr_x_by_x[]
         table lookups with floating-point reciprocal instructions.
      
       * Optimize clipping of p-values in the 10bpc code.
      
       * Rename some macros to clarify their functionality.
      
       * Implement various minor tweaks.
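The first bullet trades a table lookup for arithmetic. The toy Python model below illustrates that trade under stated assumptions: the hypothetical table holds round(256 / (z + 1)) clamped to 255, which is an approximation chosen for illustration rather than the exact contents of dav1d_sgr_x_by_x[], and exact double-precision division stands in for the hardware reciprocal instructions. Real reciprocal-estimate instructions are less precise, so SIMD code using them would have to keep that error below the rounding margin.

```python
# Toy model: replace a 256-entry lookup table with a reciprocal computation.
# Assumption: the table approximates 256 / (z + 1), rounded to nearest and
# clamped to 255; this is illustrative, not a copy of dav1d_sgr_x_by_x[].


def build_table() -> list[int]:
    """Hypothetical lookup table indexed by z in 0..255 (integer arithmetic only)."""
    return [min(255, (256 + (z + 1) // 2) // (z + 1)) for z in range(256)]


def via_reciprocal(z: int) -> int:
    """Same value computed from a floating-point reciprocal instead of a lookup."""
    recip = 1.0 / (z + 1)  # the step a SIMD reciprocal instruction would provide
    return min(255, int(256.0 * recip + 0.5))


if __name__ == "__main__":
    table = build_table()
    mismatches = [z for z in range(256) if table[z] != via_reciprocal(z)]
    print(table[:6], mismatches)  # [255, 128, 85, 64, 51, 43] []
```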
  7. Oct 18, 2024
  8. Oct 17, 2024
  9. Oct 16, 2024
    • NEWS: add itx to riscv list · c3fa1db3
      Nathan E. Egge authored
    • riscv64/itx: Replace vwadd+vnsra with vnclip · 789a1f65
      Nathan E. Egge authored
      The vnclip instruction performs a fixed-point rounding right shift with
       saturating narrowing in a single step, so it can replace vwadd followed
       by vnsra in idct_4, idct_8, idct_16, iadst_8 and iadst_16.
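The equivalence can be checked with a scalar Python model of a single vector element. Assumptions not stated in the log: the wide intermediate is a signed 32-bit value narrowed to int16, the constant added by vwadd is the rounding term 1 << (shift - 1), vxrm is set to rnu (round-to-nearest-up) for vnclip, and a shift of 12 is only an example. Under these assumptions the two sequences agree whenever the result fits in int16; they differ only on overflow, where vnsra keeps the low 16 bits while vnclip saturates.

```python
# Scalar model of one lane: vwadd + vnsra versus vnclip (vxrm = rnu).
# Assumptions: 32-bit wide input, 16-bit result, rounding constant 1 << (shift - 1).

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def low16_signed(x: int) -> int:
    """Keep the low 16 bits and reinterpret them as signed, as vnsra's narrowing does."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def vwadd_vnsra(v: int, shift: int) -> int:
    """Old sequence: add the rounding constant, then narrowing arithmetic right shift."""
    return low16_signed((v + (1 << (shift - 1))) >> shift)


def vnclip_rnu(v: int, shift: int) -> int:
    """vnclip with rnu: shift, add the most significant bit shifted out, saturate."""
    rounded = (v >> shift) + ((v >> (shift - 1)) & 1)
    return max(INT16_MIN, min(INT16_MAX, rounded))


if __name__ == "__main__":
    in_range = range(-(1 << 27), (1 << 27) - 4096, 997)
    print(all(vwadd_vnsra(v, 12) == vnclip_rnu(v, 12) for v in in_range))  # True
    print(vwadd_vnsra(40000 << 12, 12), vnclip_rnu(40000 << 12, 12))  # -25536 32767
```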
      Including 572c5a66 (which applies the same change to iadst_4), these
       commits give the following average improvements across all modified 2D
       transform functions:
      
                Kendryte K230     SpacemiT K1
      
         4x4       -5.50%           -4.44%
         8x8       -9.78%           -7.62%
        16x16      -9.70%           -9.04%
         4x8       -8.39%           -7.54%
         8x4       -8.10%           -4.66%
         4x16      -8.16%           -7.74%
        16x4       -8.07%           -6.96%
         8x16      -9.11%           -7.43%
        16x8       -9.87%           -7.81%
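The Delta and speedup columns in these logs are consistent with simple ratios: for example, 99.0 to 93.4 gives -5.66%, and 7047.5 / 1475.5 gives 4.78x. The Python sketch below recomputes them. The log does not say how the per-size averages above were formed, so the arithmetic mean in `mean_delta` is an assumption, and all helper names are illustrative.

```python
# Recompute the derived columns of the benchmark tables in this log.
# Lower timings are better, so a negative delta means the new code is faster.


def delta_pct(before: float, after: float) -> float:
    """Relative change in percent, as in the Delta column."""
    return (after - before) / before * 100.0


def speedup(c_time: float, simd_time: float) -> float:
    """Speedup over the C reference, as in the '( N.NNx)' column."""
    return c_time / simd_time


def mean_delta(pairs: list[tuple[float, float]]) -> float:
    """Assumed averaging: arithmetic mean of per-function (before, after) deltas."""
    return sum(delta_pct(before, after) for before, after in pairs) / len(pairs)


if __name__ == "__main__":
    print(f"{delta_pct(99.0, 93.4):.2f}%")    # -5.66%
    print(f"{speedup(7047.5, 1475.5):.2f}x")  # 4.78x
```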
      
      Kendryte K230                                      Old     New     Delta
      
      inv_txfm_add_4x4_adst_adst_0_8bpc_rvv              99.0    93.4   -5.66%
      inv_txfm_add_4x4_adst_adst_1_8bpc_rvv              99.0    93.4   -5.66%
      inv_txfm_add_4x4_adst_dct_0_8bpc_rvv               93.4    87.2   -6.64%
      inv_txfm_add_4x4_adst_dct_1_8bpc_rvv               93.5    87.2   -6.74%
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv         100.3    94.9   -5.38%
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv         100.3    94.9   -5.38%
      inv_txfm_add_4x4_adst_identity_0_8bpc_rvv          80.5    77.2   -4.10%
      inv_txfm_add_4x4_adst_identity_1_8bpc_rvv          80.5    77.2   -4.10%
      inv_txfm_add_4x4_dct_adst_0_8bpc_rvv               94.1    88.5   -5.95%
      inv_txfm_add_4x4_dct_adst_1_8bpc_rvv               94.1    88.5   -5.95%
      inv_txfm_add_4x4_dct_dct_0_8bpc_rvv                40.3    40.3    0.00%
      inv_txfm_add_4x4_dct_dct_1_8bpc_rvv                92.2    82.1  -10.95%
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv           95.3    89.9   -5.67%
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv           95.3    89.9   -5.67%
      inv_txfm_add_4x4_dct_identity_0_8bpc_rvv           75.5    73.3   -2.91%
      inv_txfm_add_4x4_dct_identity_1_8bpc_rvv           75.5    73.3   -2.91%
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv         100.3    94.7   -5.58%
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv         100.3    94.7   -5.58%
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv           94.8    88.4   -6.75%
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv           94.8    88.5   -6.65%
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv     105.0    96.0   -8.57%
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv     105.0    95.9   -8.67%
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv      81.6    78.5   -3.80%
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv      81.6    78.4   -3.92%
      inv_txfm_add_4x4_identity_adst_0_8bpc_rvv          80.3    77.8   -3.11%
      inv_txfm_add_4x4_identity_adst_1_8bpc_rvv          80.3    77.8   -3.11%
      inv_txfm_add_4x4_identity_dct_0_8bpc_rvv           77.2    71.7   -7.12%
      inv_txfm_add_4x4_identity_dct_1_8bpc_rvv           77.2    71.7   -7.12%
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv      81.5    79.2   -2.82%
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv      81.6    79.2   -2.94%
      inv_txfm_add_4x4_identity_identity_0_8bpc_rvv      62.8    61.6   -1.91%
      inv_txfm_add_4x4_identity_identity_1_8bpc_rvv      62.8    61.6   -1.91%
      inv_txfm_add_4x4_wht_wht_0_8bpc_rvv                67.8    67.8    0.00%
      inv_txfm_add_4x4_wht_wht_1_8bpc_rvv                67.8    67.8    0.00%
      
      inv_txfm_add_8x8_adst_adst_0_8bpc_rvv             403.1   356.1  -11.66%
      inv_txfm_add_8x8_adst_adst_1_8bpc_rvv             403.1   356.0  -11.68%
      inv_txfm_add_8x8_adst_dct_0_8bpc_rvv              360.2   323.2  -10.27%
      inv_txfm_add_8x8_adst_dct_1_8bpc_rvv              360.2   323.2  -10.27%
      inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv         405.2   358.4  -11.55%
      inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv         405.2   358.4  -11.55%
      inv_txfm_add_8x8_adst_identity_0_8bpc_rvv         284.3   261.0   -8.20%
      inv_txfm_add_8x8_adst_identity_1_8bpc_rvv         284.4   260.9   -8.26%
      inv_txfm_add_8x8_dct_adst_0_8bpc_rvv              360.2   322.0  -10.61%
      inv_txfm_add_8x8_dct_adst_1_8bpc_rvv              360.0   321.9  -10.58%
      inv_txfm_add_8x8_dct_dct_0_8bpc_rvv                76.6    77.0    0.52%
      inv_txfm_add_8x8_dct_dct_1_8bpc_rvv               317.2   289.0   -8.89%
      inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv          363.7   324.3  -10.83%
      inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv          363.8   324.3  -10.86%
      inv_txfm_add_8x8_dct_identity_0_8bpc_rvv          241.2   226.9   -5.93%
      inv_txfm_add_8x8_dct_identity_1_8bpc_rvv          241.3   227.0   -5.93%
      inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv         404.9   358.0  -11.58%
      inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv         405.0   358.1  -11.58%
      inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv          365.1   323.8  -11.31%
      inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv          365.2   323.9  -11.31%
      inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv     407.2   359.6  -11.69%
      inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv     406.4   359.5  -11.54%
      inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv     285.8   261.9   -8.36%
      inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv     285.9   261.8   -8.43%
      inv_txfm_add_8x8_identity_adst_0_8bpc_rvv         269.9   244.5   -9.41%
      inv_txfm_add_8x8_identity_adst_1_8bpc_rvv         269.8   244.5   -9.38%
      inv_txfm_add_8x8_identity_dct_0_8bpc_rvv          225.5   209.6   -7.05%
      inv_txfm_add_8x8_identity_dct_1_8bpc_rvv          225.6   209.5   -7.14%
      inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv     270.5   246.5   -8.87%
      inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv     270.5   246.5   -8.87%
      inv_txfm_add_8x8_identity_identity_0_8bpc_rvv     146.5   145.4   -0.75%
      inv_txfm_add_8x8_identity_identity_1_8bpc_rvv     146.4   145.4   -0.68%
      
      inv_txfm_add_16x16_adst_adst_0_8bpc_rvv          1363.4  1212.0  -11.10%
      inv_txfm_add_16x16_adst_adst_1_8bpc_rvv          1363.6  1212.2  -11.10%
      inv_txfm_add_16x16_adst_adst_2_8bpc_rvv          1813.7  1601.4  -11.71%
      inv_txfm_add_16x16_adst_dct_0_8bpc_rvv           1185.9  1074.6   -9.39%
      inv_txfm_add_16x16_adst_dct_1_8bpc_rvv           1186.0  1074.7   -9.38%
      inv_txfm_add_16x16_adst_dct_2_8bpc_rvv           1639.5  1468.9  -10.41%
      inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv      1374.8  1214.8  -11.64%
      inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv      1374.7  1214.6  -11.65%
      inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv      1819.3  1610.9  -11.45%
      inv_txfm_add_16x16_dct_adst_0_8bpc_rvv           1283.3  1139.1  -11.24%
      inv_txfm_add_16x16_dct_adst_1_8bpc_rvv           1283.2  1139.2  -11.22%
      inv_txfm_add_16x16_dct_adst_2_8bpc_rvv           1632.4  1471.9   -9.83%
      inv_txfm_add_16x16_dct_dct_0_8bpc_rvv             160.9   158.7   -1.37%
      inv_txfm_add_16x16_dct_dct_1_8bpc_rvv            1099.5   997.1   -9.31%
      inv_txfm_add_16x16_dct_dct_2_8bpc_rvv            1465.3  1335.2   -8.88%
      inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv       1286.8  1143.2  -11.16%
      inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv       1286.8  1143.3  -11.15%
      inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv       1638.6  1473.5  -10.08%
      inv_txfm_add_16x16_dct_identity_0_8bpc_rvv        806.6   783.3   -2.89%
      inv_txfm_add_16x16_dct_identity_1_8bpc_rvv        806.7   783.4   -2.89%
      inv_txfm_add_16x16_dct_identity_2_8bpc_rvv       1163.1  1105.3   -4.97%
      inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv      1374.3  1216.0  -11.52%
      inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv      1374.3  1216.2  -11.50%
      inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv      1817.5  1609.7  -11.43%
      inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv       1190.4  1073.8   -9.80%
      inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv       1190.4  1073.9   -9.79%
      inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv       1640.4  1472.6  -10.23%
      inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv  1376.0  1224.2  -11.03%
      inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv  1376.0  1224.1  -11.04%
      inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv  1829.3  1616.6  -11.63%
      inv_txfm_add_16x16_identity_dct_0_8bpc_rvv        952.9   882.0   -7.44%
      inv_txfm_add_16x16_identity_dct_1_8bpc_rvv        952.8   881.9   -7.44%
      inv_txfm_add_16x16_identity_dct_2_8bpc_rvv       1172.0  1100.1   -6.13%
      inv_txfm_add_16x16_identity_identity_0_8bpc_rvv   657.6   659.8    0.33%
      inv_txfm_add_16x16_identity_identity_1_8bpc_rvv   657.6   659.7    0.32%
      inv_txfm_add_16x16_identity_identity_2_8bpc_rvv   876.2   878.1    0.22%
      
      inv_txfm_add_4x8_adst_adst_0_8bpc_rvv             197.3   178.0   -9.78%
      inv_txfm_add_4x8_adst_adst_1_8bpc_rvv             197.4   178.0   -9.83%
      inv_txfm_add_4x8_adst_dct_0_8bpc_rvv              174.9   159.9   -8.58%
      inv_txfm_add_4x8_adst_dct_1_8bpc_rvv              174.9   159.9   -8.58%
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv         199.2   180.2   -9.54%
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv         199.2   180.2   -9.54%
      inv_txfm_add_4x8_adst_identity_0_8bpc_rvv         123.3   118.0   -4.30%
      inv_txfm_add_4x8_adst_identity_1_8bpc_rvv         123.3   118.0   -4.30%
      inv_txfm_add_4x8_dct_adst_0_8bpc_rvv              191.1   171.8  -10.10%
      inv_txfm_add_4x8_dct_adst_1_8bpc_rvv              191.1   171.7  -10.15%
      inv_txfm_add_4x8_dct_dct_0_8bpc_rvv               168.9   153.6   -9.06%
      inv_txfm_add_4x8_dct_dct_1_8bpc_rvv               169.0   153.6   -9.11%
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv          193.0   173.9   -9.90%
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv          193.0   173.9   -9.90%
      inv_txfm_add_4x8_dct_identity_0_8bpc_rvv          117.0   111.7   -4.53%
      inv_txfm_add_4x8_dct_identity_1_8bpc_rvv          117.0   111.7   -4.53%
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv         198.0   178.6   -9.80%
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv         198.0   178.6   -9.80%
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv          175.8   160.5   -8.70%
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv          175.8   160.5   -8.70%
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv     199.9   180.5   -9.70%
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv     199.9   180.5   -9.70%
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv     123.6   118.6   -4.05%
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv     123.6   118.6   -4.05%
      inv_txfm_add_4x8_identity_adst_0_8bpc_rvv         171.3   154.2   -9.98%
      inv_txfm_add_4x8_identity_adst_1_8bpc_rvv         171.3   154.2   -9.98%
      inv_txfm_add_4x8_identity_dct_0_8bpc_rvv          148.6   136.5   -8.14%
      inv_txfm_add_4x8_identity_dct_1_8bpc_rvv          148.6   136.5   -8.14%
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv     173.1   156.4   -9.65%
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv     173.2   156.4   -9.70%
      inv_txfm_add_4x8_identity_identity_0_8bpc_rvv      94.3    94.2   -0.11%
      inv_txfm_add_4x8_identity_identity_1_8bpc_rvv      94.2    94.2    0.00%
      
      inv_txfm_add_8x4_adst_adst_0_8bpc_rvv             201.2   188.4   -6.36%
      inv_txfm_add_8x4_adst_adst_1_8bpc_rvv             201.2   188.4   -6.36%
      inv_txfm_add_8x4_adst_dct_0_8bpc_rvv              194.9   175.7   -9.85%
      inv_txfm_add_8x4_adst_dct_1_8bpc_rvv              194.9   175.7   -9.85%
      inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv         202.4   182.3   -9.93%
      inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv         202.4   182.3   -9.93%
      inv_txfm_add_8x4_adst_identity_0_8bpc_rvv         170.1   155.7   -8.47%
      inv_txfm_add_8x4_adst_identity_1_8bpc_rvv         170.1   155.7   -8.47%
      inv_txfm_add_8x4_dct_adst_0_8bpc_rvv              178.0   162.1   -8.93%
      inv_txfm_add_8x4_dct_adst_1_8bpc_rvv              178.0   162.1   -8.93%
      inv_txfm_add_8x4_dct_dct_0_8bpc_rvv               172.8   157.0   -9.14%
      inv_txfm_add_8x4_dct_dct_1_8bpc_rvv               172.9   157.0   -9.20%
      inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv          180.3   163.7   -9.21%
      inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv          180.3   163.7   -9.21%
      inv_txfm_add_8x4_dct_identity_0_8bpc_rvv          147.9   137.9   -6.76%
      inv_txfm_add_8x4_dct_identity_1_8bpc_rvv          147.9   137.9   -6.76%
      inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv         202.4   182.3   -9.93%
      inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv         202.4   182.3   -9.93%
      inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv          196.3   175.9  -10.39%
      inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv          196.3   175.9  -10.39%
      inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv     203.7   183.4   -9.97%
      inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv     203.7   183.4   -9.97%
      inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv     171.1   155.9   -8.88%
      inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv     171.1   155.9   -8.88%
      inv_txfm_add_8x4_identity_adst_0_8bpc_rvv         126.8   120.9   -4.65%
      inv_txfm_add_8x4_identity_adst_1_8bpc_rvv         126.8   120.9   -4.65%
      inv_txfm_add_8x4_identity_dct_0_8bpc_rvv          121.5   117.0   -3.70%
      inv_txfm_add_8x4_identity_dct_1_8bpc_rvv          121.6   117.0   -3.78%
      inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv     129.1   122.3   -5.27%
      inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv     129.1   122.3   -5.27%
      inv_txfm_add_8x4_identity_identity_0_8bpc_rvv      98.5    95.7   -2.84%
      inv_txfm_add_8x4_identity_identity_1_8bpc_rvv      98.5    95.7   -2.84%
      
      inv_txfm_add_4x16_adst_adst_0_8bpc_rvv            384.4   344.6  -10.35%
      inv_txfm_add_4x16_adst_adst_1_8bpc_rvv            384.5   344.6  -10.38%
      inv_txfm_add_4x16_adst_adst_2_8bpc_rvv            429.3   387.3   -9.78%
      inv_txfm_add_4x16_adst_dct_0_8bpc_rvv             333.7   304.3   -8.81%
      inv_txfm_add_4x16_adst_dct_1_8bpc_rvv             333.7   304.2   -8.84%
      inv_txfm_add_4x16_adst_dct_2_8bpc_rvv             381.2   354.2   -7.08%
      inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv        385.7   349.1   -9.49%
      inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv        385.7   349.1   -9.49%
      inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv        433.0   389.3  -10.09%
      inv_txfm_add_4x16_adst_identity_0_8bpc_rvv        251.6   244.2   -2.94%
      inv_txfm_add_4x16_adst_identity_1_8bpc_rvv        251.5   244.1   -2.94%
      inv_txfm_add_4x16_adst_identity_2_8bpc_rvv        300.4   289.6   -3.60%
      inv_txfm_add_4x16_dct_adst_0_8bpc_rvv             378.5   335.6  -11.33%
      inv_txfm_add_4x16_dct_adst_1_8bpc_rvv             378.5   335.5  -11.36%
      inv_txfm_add_4x16_dct_adst_2_8bpc_rvv             420.6   369.5  -12.15%
      inv_txfm_add_4x16_dct_dct_0_8bpc_rvv              323.5   295.3   -8.72%
      inv_txfm_add_4x16_dct_dct_1_8bpc_rvv              323.2   295.2   -8.66%
      inv_txfm_add_4x16_dct_dct_2_8bpc_rvv              362.9   333.0   -8.24%
      inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv         375.3   339.4   -9.57%
      inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv         375.4   339.0   -9.70%
      inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv         414.8   372.2  -10.27%
      inv_txfm_add_4x16_dct_identity_0_8bpc_rvv         240.8   234.7   -2.53%
      inv_txfm_add_4x16_dct_identity_1_8bpc_rvv         240.7   234.7   -2.49%
      inv_txfm_add_4x16_dct_identity_2_8bpc_rvv         283.2   268.0   -5.37%
      inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv        384.2   345.8   -9.99%
      inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv        384.1   345.8   -9.97%
      inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv        432.5   387.7  -10.36%
      inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv         334.9   307.0   -8.33%
      inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv         335.0   307.1   -8.33%
      inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv         386.1   347.2  -10.08%
      inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv    386.7   349.4   -9.65%
      inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv    386.8   349.5   -9.64%
      inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv    436.6   392.9  -10.01%
      inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv    252.4   247.4   -1.98%
      inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv    252.4   247.5   -1.94%
      inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv    302.1   286.7   -5.10%
      inv_txfm_add_4x16_identity_adst_0_8bpc_rvv        348.3   317.4   -8.87%
      inv_txfm_add_4x16_identity_adst_1_8bpc_rvv        348.4   317.5   -8.87%
      inv_txfm_add_4x16_identity_adst_2_8bpc_rvv        361.4   329.0   -8.97%
      inv_txfm_add_4x16_identity_dct_0_8bpc_rvv         301.8   275.8   -8.61%
      inv_txfm_add_4x16_identity_dct_1_8bpc_rvv         301.8   275.8   -8.61%
      inv_txfm_add_4x16_identity_dct_2_8bpc_rvv         312.0   287.4   -7.88%
      inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv    352.2   321.9   -8.60%
      inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv    352.2   322.0   -8.57%
      inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv    363.7   332.5   -8.58%
      inv_txfm_add_4x16_identity_identity_0_8bpc_rvv    215.8   215.0   -0.37%
      inv_txfm_add_4x16_identity_identity_1_8bpc_rvv    215.8   215.1   -0.32%
      inv_txfm_add_4x16_identity_identity_2_8bpc_rvv    228.0   227.0   -0.44%
      
      inv_txfm_add_16x4_adst_adst_0_8bpc_rvv            430.3   388.5   -9.71%
      inv_txfm_add_16x4_adst_adst_1_8bpc_rvv            430.3   388.5   -9.71%
      inv_txfm_add_16x4_adst_adst_2_8bpc_rvv            430.2   388.5   -9.69%
      inv_txfm_add_16x4_adst_dct_0_8bpc_rvv             412.1   374.1   -9.22%
      inv_txfm_add_16x4_adst_dct_1_8bpc_rvv             412.0   374.3   -9.15%
      inv_txfm_add_16x4_adst_dct_2_8bpc_rvv             412.1   374.2   -9.20%
      inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv        432.9   391.0   -9.68%
      inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv        432.8   391.1   -9.63%
      inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv        432.4   391.0   -9.57%
      inv_txfm_add_16x4_adst_identity_0_8bpc_rvv        358.4   332.1   -7.34%
      inv_txfm_add_16x4_adst_identity_1_8bpc_rvv        358.4   332.3   -7.28%
      inv_txfm_add_16x4_adst_identity_2_8bpc_rvv        358.5   332.5   -7.25%
      inv_txfm_add_16x4_dct_adst_0_8bpc_rvv             386.9   347.1  -10.29%
      inv_txfm_add_16x4_dct_adst_1_8bpc_rvv             386.8   347.1  -10.26%
      inv_txfm_add_16x4_dct_adst_2_8bpc_rvv             387.0   346.8  -10.39%
      inv_txfm_add_16x4_dct_dct_0_8bpc_rvv              363.3   330.9   -8.92%
      inv_txfm_add_16x4_dct_dct_1_8bpc_rvv              363.3   330.9   -8.92%
      inv_txfm_add_16x4_dct_dct_2_8bpc_rvv              363.2   331.0   -8.87%
      inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv         383.7   349.8   -8.84%
      inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv         384.3   349.8   -8.98%
      inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv         384.3   349.7   -9.00%
      inv_txfm_add_16x4_dct_identity_0_8bpc_rvv         310.2   288.4   -7.03%
      inv_txfm_add_16x4_dct_identity_1_8bpc_rvv         310.2   288.4   -7.03%
      inv_txfm_add_16x4_dct_identity_2_8bpc_rvv         310.3   288.5   -7.03%
      inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv        434.1   391.5   -9.81%
      inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv        434.1   392.0   -9.70%
      inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv        434.1   392.0   -9.70%
      inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv         423.5   375.5  -11.33%
      inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv         423.5   375.4  -11.36%
      inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv         423.5   375.5  -11.33%
      inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv    438.0   396.1   -9.57%
      inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv    438.1   396.0   -9.61%
      inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv    438.0   395.8   -9.63%
      inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv    361.9   333.0   -7.99%
      inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv    362.4   333.0   -8.11%
      inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv    362.4   333.0   -8.11%
      inv_txfm_add_16x4_identity_adst_0_8bpc_rvv        308.3   296.3   -3.89%
      inv_txfm_add_16x4_identity_adst_1_8bpc_rvv        308.4   296.4   -3.89%
      inv_txfm_add_16x4_identity_adst_2_8bpc_rvv        308.4   296.4   -3.89%
      inv_txfm_add_16x4_identity_dct_0_8bpc_rvv         289.9   279.9   -3.45%
      inv_txfm_add_16x4_identity_dct_1_8bpc_rvv         289.9   280.0   -3.41%
      inv_txfm_add_16x4_identity_dct_2_8bpc_rvv         290.0   279.9   -3.48%
      inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv    311.2   298.9   -3.95%
      inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv    311.1   298.9   -3.92%
      inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv    310.9   298.9   -3.86%
      inv_txfm_add_16x4_identity_identity_0_8bpc_rvv    238.4   243.2    2.01%
      inv_txfm_add_16x4_identity_identity_1_8bpc_rvv    238.4   243.2    2.01%
      inv_txfm_add_16x4_identity_identity_2_8bpc_rvv    238.5   243.2    1.97%
      
      inv_txfm_add_8x16_adst_adst_0_8bpc_rvv            701.5   624.2  -11.02%
      inv_txfm_add_8x16_adst_adst_1_8bpc_rvv            701.6   624.2  -11.03%
      inv_txfm_add_8x16_adst_adst_2_8bpc_rvv            853.5   755.2  -11.52%
      inv_txfm_add_8x16_adst_dct_0_8bpc_rvv             611.1   551.6   -9.74%
      inv_txfm_add_8x16_adst_dct_1_8bpc_rvv             611.2   551.7   -9.73%
      inv_txfm_add_8x16_adst_dct_2_8bpc_rvv             765.0   682.8  -10.75%
      inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv        703.4   629.3  -10.53%
      inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv        703.4   629.5  -10.51%
      inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv        858.1   763.9  -10.98%
      inv_txfm_add_8x16_adst_identity_0_8bpc_rvv        463.7   440.2   -5.07%
      inv_txfm_add_8x16_adst_identity_1_8bpc_rvv        464.3   440.2   -5.19%
      inv_txfm_add_8x16_adst_identity_2_8bpc_rvv        618.6   571.7   -7.58%
      inv_txfm_add_8x16_dct_adst_0_8bpc_rvv             660.3   590.5  -10.57%
      inv_txfm_add_8x16_dct_adst_1_8bpc_rvv             660.2   590.3  -10.59%
      inv_txfm_add_8x16_dct_adst_2_8bpc_rvv             776.2   687.9  -11.38%
      inv_txfm_add_8x16_dct_dct_0_8bpc_rvv              566.9   516.3   -8.93%
      inv_txfm_add_8x16_dct_dct_1_8bpc_rvv              567.1   516.4   -8.94%
      inv_txfm_add_8x16_dct_dct_2_8bpc_rvv              685.9   616.6  -10.10%
      inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv         663.3   593.5  -10.52%
      inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv         663.2   593.5  -10.51%
      inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv         771.7   690.5  -10.52%
      inv_txfm_add_8x16_dct_identity_0_8bpc_rvv         421.3   406.1   -3.61%
      inv_txfm_add_8x16_dct_identity_1_8bpc_rvv         421.3   406.1   -3.61%
      inv_txfm_add_8x16_dct_identity_2_8bpc_rvv         536.6   503.6   -6.15%
      inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv        703.3   627.1  -10.83%
      inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv        703.4   627.2  -10.83%
      inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv        857.7   763.7  -10.96%
      inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv         613.5   552.8   -9.89%
      inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv         613.4   552.7   -9.90%
      inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv         771.0   693.1  -10.10%
      inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv    706.3   631.4  -10.60%
      inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv    706.5   631.7  -10.59%
      inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv    861.1   764.9  -11.17%
      inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv    467.0   443.0   -5.14%
      inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv    467.0   443.0   -5.14%
      inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv    623.7   575.1   -7.79%
      inv_txfm_add_8x16_identity_adst_0_8bpc_rvv        565.6   512.0   -9.48%
      inv_txfm_add_8x16_identity_adst_1_8bpc_rvv        565.6   512.9   -9.32%
      inv_txfm_add_8x16_identity_adst_2_8bpc_rvv        585.6   532.8   -9.02%
      inv_txfm_add_8x16_identity_dct_0_8bpc_rvv         476.4   439.9   -7.66%
      inv_txfm_add_8x16_identity_dct_1_8bpc_rvv         476.4   440.0   -7.64%
      inv_txfm_add_8x16_identity_dct_2_8bpc_rvv         496.3   459.5   -7.41%
      inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv    570.7   516.4   -9.51%
      inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv    570.6   516.3   -9.52%
      inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv    590.2   540.0   -8.51%
      inv_txfm_add_8x16_identity_identity_0_8bpc_rvv    330.9   329.9   -0.30%
      inv_txfm_add_8x16_identity_identity_1_8bpc_rvv    330.9   329.9   -0.30%
      inv_txfm_add_8x16_identity_identity_2_8bpc_rvv    350.8   349.7   -0.31%
      
      inv_txfm_add_16x8_adst_adst_0_8bpc_rvv            855.5   752.1  -12.09%
      inv_txfm_add_16x8_adst_adst_1_8bpc_rvv            855.5   751.9  -12.11%
      inv_txfm_add_16x8_adst_adst_2_8bpc_rvv            855.4   752.1  -12.08%
      inv_txfm_add_16x8_adst_dct_0_8bpc_rvv             765.4   685.5  -10.44%
      inv_txfm_add_16x8_adst_dct_1_8bpc_rvv             765.5   685.3  -10.48%
      inv_txfm_add_16x8_adst_dct_2_8bpc_rvv             765.5   685.5  -10.45%
      inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv        859.2   755.8  -12.03%
      inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv        859.1   756.0  -12.00%
      inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv        859.1   755.9  -12.01%
      inv_txfm_add_16x8_adst_identity_0_8bpc_rvv        612.8   561.9   -8.31%
      inv_txfm_add_16x8_adst_identity_1_8bpc_rvv        612.9   561.9   -8.32%
      inv_txfm_add_16x8_adst_identity_2_8bpc_rvv        612.8   561.9   -8.31%
      inv_txfm_add_16x8_dct_adst_0_8bpc_rvv             765.1   676.0  -11.65%
      inv_txfm_add_16x8_dct_adst_1_8bpc_rvv             765.0   676.2  -11.61%
      inv_txfm_add_16x8_dct_adst_2_8bpc_rvv             765.0   676.2  -11.61%
      inv_txfm_add_16x8_dct_dct_0_8bpc_rvv              674.5   612.0   -9.27%
      inv_txfm_add_16x8_dct_dct_1_8bpc_rvv              674.5   612.1   -9.25%
      inv_txfm_add_16x8_dct_dct_2_8bpc_rvv              674.6   612.0   -9.28%
      inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv         777.2   679.9  -12.52%
      inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv         777.1   680.1  -12.48%
      inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv         777.1   680.0  -12.50%
      inv_txfm_add_16x8_dct_identity_0_8bpc_rvv         522.2   488.2   -6.51%
      inv_txfm_add_16x8_dct_identity_1_8bpc_rvv         522.1   488.2   -6.49%
      inv_txfm_add_16x8_dct_identity_2_8bpc_rvv         522.1   487.5   -6.63%
      inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv        859.2   753.5  -12.30%
      inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv        859.2   753.6  -12.29%
      inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv        859.2   753.5  -12.30%
      inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv         768.9   689.0  -10.39%
      inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv         768.9   689.2  -10.37%
      inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv         768.8   689.2  -10.35%
      inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv    863.0   758.7  -12.09%
      inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv    862.9   758.7  -12.08%
      inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv    863.0   758.6  -12.10%
      inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv    616.5   566.7   -8.08%
      inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv    616.6   566.6   -8.11%
      inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv    616.3   567.0   -8.00%
      inv_txfm_add_16x8_identity_adst_0_8bpc_rvv        618.1   564.5   -8.67%
      inv_txfm_add_16x8_identity_adst_1_8bpc_rvv        618.0   564.5   -8.66%
      inv_txfm_add_16x8_identity_adst_2_8bpc_rvv        617.7   564.6   -8.60%
      inv_txfm_add_16x8_identity_dct_0_8bpc_rvv         527.9   500.6   -5.17%
      inv_txfm_add_16x8_identity_dct_1_8bpc_rvv         527.8   500.7   -5.13%
      inv_txfm_add_16x8_identity_dct_2_8bpc_rvv         527.7   500.7   -5.12%
      inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv    622.3   568.5   -8.65%
      inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv    622.2   568.5   -8.63%
      inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv    622.3   568.4   -8.66%
      inv_txfm_add_16x8_identity_identity_0_8bpc_rvv    373.4   374.4    0.27%
      inv_txfm_add_16x8_identity_identity_1_8bpc_rvv    373.4   374.5    0.29%
      inv_txfm_add_16x8_identity_identity_2_8bpc_rvv    373.4   374.4    0.27%
      
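The Delta column appears to be the relative change of the New timing against the Old timing, (New - Old) / Old, expressed in percent, so negative values mean the new code is faster. The following is a minimal Python sketch that recomputes Delta for one row of these tables. The helper names `delta_pct` and `check_row` are hypothetical and are not part of dav1d or checkasm. Running it over a row is a quick way to spot transcription errors, such as a dropped digit in one of the timing columns.

```python
def delta_pct(old: float, new: float) -> float:
    """Relative change of the new timing vs. the old one, in percent.

    Negative values mean the new code takes less time (is faster).
    """
    return (new - old) / old * 100.0


def check_row(line: str, tol: float = 0.01) -> bool:
    """Return True if a 'name old new delta%' row is self-consistent.

    The printed timings are rounded to 0.1 and the delta to 0.01%, so a
    small tolerance is allowed when comparing the recomputed value.
    """
    name, old, new, delta = line.split()
    reported = float(delta.rstrip("%"))
    return abs(delta_pct(float(old), float(new)) - reported) <= tol


row = "inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv    706.5   631.7  -10.59%"
print(check_row(row))
```

A row whose New value lost a digit (for example `861.1  76.9  -11.17%`) fails the check, because the recomputed change would be roughly -91% rather than -11.17%.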
      SpacemiT K1                                        Old     New     Delta
      
      inv_txfm_add_4x4_adst_adst_0_8bpc_rvv             101.0    96.8   -4.16%
      inv_txfm_add_4x4_adst_adst_1_8bpc_rvv             101.1    96.8   -4.25%
      inv_txfm_add_4x4_adst_dct_0_8bpc_rvv               96.8    91.7   -5.27%
      inv_txfm_add_4x4_adst_dct_1_8bpc_rvv               95.9    91.8   -4.28%
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_rvv         102.2    97.9   -4.21%
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_rvv         102.2    97.9   -4.21%
      inv_txfm_add_4x4_adst_identity_0_8bpc_rvv          82.4    80.4   -2.43%
      inv_txfm_add_4x4_adst_identity_1_8bpc_rvv          82.4    80.5   -2.31%
      inv_txfm_add_4x4_dct_adst_0_8bpc_rvv               97.3    92.6   -4.83%
      inv_txfm_add_4x4_dct_adst_1_8bpc_rvv               97.2    92.3   -5.04%
      inv_txfm_add_4x4_dct_dct_0_8bpc_rvv                41.2    41.3    0.24%
      inv_txfm_add_4x4_dct_dct_1_8bpc_rvv                96.0    87.5   -8.85%
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_rvv           98.5    94.5   -4.06%
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_rvv           98.6    94.7   -3.96%
      inv_txfm_add_4x4_dct_identity_0_8bpc_rvv           78.6    76.1   -3.18%
      inv_txfm_add_4x4_dct_identity_1_8bpc_rvv           78.6    76.0   -3.31%
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_rvv         104.3    99.1   -4.99%
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_rvv         104.4    99.1   -5.08%
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_rvv           98.0    94.6   -3.47%
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_rvv           98.1    94.4   -3.77%
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_rvv     104.2    99.2   -4.80%
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_rvv     104.3    99.2   -4.89%
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_rvv      86.9    81.8   -5.87%
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_rvv      87.0    81.9   -5.86%
      inv_txfm_add_4x4_identity_adst_0_8bpc_rvv          86.0    80.8   -6.05%
      inv_txfm_add_4x4_identity_adst_1_8bpc_rvv          85.9    81.4   -5.24%
      inv_txfm_add_4x4_identity_dct_0_8bpc_rvv           78.5    76.1   -3.06%
      inv_txfm_add_4x4_identity_dct_1_8bpc_rvv           78.6    76.1   -3.18%
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_rvv      85.9    82.5   -3.96%
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_rvv      85.9    82.3   -4.19%
      inv_txfm_add_4x4_identity_identity_0_8bpc_rvv      65.9    64.9   -1.52%
      inv_txfm_add_4x4_identity_identity_1_8bpc_rvv      65.9    64.8   -1.67%
      inv_txfm_add_4x4_wht_wht_0_8bpc_rvv                71.2    71.3    0.14%
      inv_txfm_add_4x4_wht_wht_1_8bpc_rvv                71.2    71.3    0.14%
      
      inv_txfm_add_8x8_adst_adst_0_8bpc_rvv             440.6   399.3   -9.37%
      inv_txfm_add_8x8_adst_adst_1_8bpc_rvv             440.6   399.3   -9.37%
      inv_txfm_add_8x8_adst_dct_0_8bpc_rvv              401.7   368.4   -8.29%
      inv_txfm_add_8x8_adst_dct_1_8bpc_rvv              401.8   368.4   -8.31%
      inv_txfm_add_8x8_adst_flipadst_0_8bpc_rvv         442.4   401.2   -9.31%
      inv_txfm_add_8x8_adst_flipadst_1_8bpc_rvv         442.4   401.1   -9.34%
      inv_txfm_add_8x8_adst_identity_0_8bpc_rvv         329.7   310.1   -5.94%
      inv_txfm_add_8x8_adst_identity_1_8bpc_rvv         329.7   310.1   -5.94%
      inv_txfm_add_8x8_dct_adst_0_8bpc_rvv              401.8   367.4   -8.56%
      inv_txfm_add_8x8_dct_adst_1_8bpc_rvv              401.7   367.3   -8.56%
      inv_txfm_add_8x8_dct_dct_0_8bpc_rvv                79.5    80.2    0.88%
      inv_txfm_add_8x8_dct_dct_1_8bpc_rvv               362.1   335.8   -7.26%
      inv_txfm_add_8x8_dct_flipadst_0_8bpc_rvv          405.0   369.2   -8.84%
      inv_txfm_add_8x8_dct_flipadst_1_8bpc_rvv          405.1   369.2   -8.86%
      inv_txfm_add_8x8_dct_identity_0_8bpc_rvv          290.9   278.2   -4.37%
      inv_txfm_add_8x8_dct_identity_1_8bpc_rvv          290.8   278.2   -4.33%
      inv_txfm_add_8x8_flipadst_adst_0_8bpc_rvv         442.5   401.1   -9.36%
      inv_txfm_add_8x8_flipadst_adst_1_8bpc_rvv         442.5   401.2   -9.33%
      inv_txfm_add_8x8_flipadst_dct_0_8bpc_rvv          405.8   369.2   -9.02%
      inv_txfm_add_8x8_flipadst_dct_1_8bpc_rvv          405.8   369.1   -9.04%
      inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_rvv     444.3   403.0   -9.30%
      inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_rvv     444.3   403.1   -9.27%
      inv_txfm_add_8x8_flipadst_identity_0_8bpc_rvv     331.6   310.9   -6.24%
      inv_txfm_add_8x8_flipadst_identity_1_8bpc_rvv     331.6   310.9   -6.24%
      inv_txfm_add_8x8_identity_adst_0_8bpc_rvv         313.3   292.6   -6.61%
      inv_txfm_add_8x8_identity_adst_1_8bpc_rvv         313.1   292.6   -6.55%
      inv_txfm_add_8x8_identity_dct_0_8bpc_rvv          274.5   260.6   -5.06%
      inv_txfm_add_8x8_identity_dct_1_8bpc_rvv          274.4   260.7   -4.99%
      inv_txfm_add_8x8_identity_flipadst_0_8bpc_rvv     315.3   294.4   -6.63%
      inv_txfm_add_8x8_identity_flipadst_1_8bpc_rvv     315.3   294.4   -6.63%
      inv_txfm_add_8x8_identity_identity_0_8bpc_rvv     202.5   202.5    0.00%
      inv_txfm_add_8x8_identity_identity_1_8bpc_rvv     202.6   202.5   -0.05%
      
      inv_txfm_add_16x16_adst_adst_0_8bpc_rvv          1418.8  1268.2  -10.61%
      inv_txfm_add_16x16_adst_adst_1_8bpc_rvv          1418.9  1268.3  -10.61%
      inv_txfm_add_16x16_adst_adst_2_8bpc_rvv          1943.3  1733.6  -10.79%
      inv_txfm_add_16x16_adst_dct_0_8bpc_rvv           1241.7  1134.6   -8.63%
      inv_txfm_add_16x16_adst_dct_1_8bpc_rvv           1241.5  1134.5   -8.62%
      inv_txfm_add_16x16_adst_dct_2_8bpc_rvv           1772.5  1599.8   -9.74%
      inv_txfm_add_16x16_adst_flipadst_0_8bpc_rvv      1429.8  1270.3  -11.16%
      inv_txfm_add_16x16_adst_flipadst_1_8bpc_rvv      1429.7  1270.1  -11.16%
      inv_txfm_add_16x16_adst_flipadst_2_8bpc_rvv      1951.1  1741.4  -10.75%
      inv_txfm_add_16x16_dct_adst_0_8bpc_rvv           1337.8  1195.8  -10.61%
      inv_txfm_add_16x16_dct_adst_1_8bpc_rvv           1337.5  1196.0  -10.58%
      inv_txfm_add_16x16_dct_adst_2_8bpc_rvv           1763.2  1604.6   -9.00%
      inv_txfm_add_16x16_dct_dct_0_8bpc_rvv             179.3   181.1    1.00%
      inv_txfm_add_16x16_dct_dct_1_8bpc_rvv            1153.8  1060.7   -8.07%
      inv_txfm_add_16x16_dct_dct_2_8bpc_rvv            1601.6  1470.6   -8.18%
      inv_txfm_add_16x16_dct_flipadst_0_8bpc_rvv       1340.7  1199.8  -10.51%
      inv_txfm_add_16x16_dct_flipadst_1_8bpc_rvv       1340.4  1199.8  -10.49%
      inv_txfm_add_16x16_dct_flipadst_2_8bpc_rvv       1771.2  1606.6   -9.29%
      inv_txfm_add_16x16_dct_identity_0_8bpc_rvv        877.9   854.9   -2.62%
      inv_txfm_add_16x16_dct_identity_1_8bpc_rvv        877.7   855.2   -2.56%
      inv_txfm_add_16x16_dct_identity_2_8bpc_rvv       1311.6  1254.1   -4.38%
      inv_txfm_add_16x16_flipadst_adst_0_8bpc_rvv      1428.2  1270.5  -11.04%
      inv_txfm_add_16x16_flipadst_adst_1_8bpc_rvv      1428.3  1270.6  -11.04%
      inv_txfm_add_16x16_flipadst_adst_2_8bpc_rvv      1947.3  1737.3  -10.78%
      inv_txfm_add_16x16_flipadst_dct_0_8bpc_rvv       1245.8  1133.5   -9.01%
      inv_txfm_add_16x16_flipadst_dct_1_8bpc_rvv       1246.0  1133.7   -9.01%
      inv_txfm_add_16x16_flipadst_dct_2_8bpc_rvv       1769.9  1603.9   -9.38%
      inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_rvv  1428.7  1279.7  -10.43%
      inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_rvv  1428.8  1279.5  -10.45%
      inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_rvv  1960.8  1745.8  -10.96%
      inv_txfm_add_16x16_identity_dct_0_8bpc_rvv       1016.6   948.8   -6.67%
      inv_txfm_add_16x16_identity_dct_1_8bpc_rvv       1016.7   948.8   -6.68%
      inv_txfm_add_16x16_identity_dct_2_8bpc_rvv       1319.8  1247.7   -5.46%
      inv_txfm_add_16x16_identity_identity_0_8bpc_rvv   735.4   736.6    0.16%
      inv_txfm_add_16x16_identity_identity_1_8bpc_rvv   735.3   736.4    0.15%
      inv_txfm_add_16x16_identity_identity_2_8bpc_rvv  1037.8  1036.7   -0.11%
      
      inv_txfm_add_4x8_adst_adst_0_8bpc_rvv             197.2   179.9   -8.77%
      inv_txfm_add_4x8_adst_adst_1_8bpc_rvv             197.1   180.0   -8.68%
      inv_txfm_add_4x8_adst_dct_0_8bpc_rvv              177.5   164.2   -7.49%
      inv_txfm_add_4x8_adst_dct_1_8bpc_rvv              177.5   164.3   -7.44%
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_rvv         199.3   181.8   -8.78%
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_rvv         199.0   181.8   -8.64%
      inv_txfm_add_4x8_adst_identity_0_8bpc_rvv         126.7   121.8   -3.87%
      inv_txfm_add_4x8_adst_identity_1_8bpc_rvv         126.7   121.9   -3.79%
      inv_txfm_add_4x8_dct_adst_0_8bpc_rvv              189.8   172.4   -9.17%
      inv_txfm_add_4x8_dct_adst_1_8bpc_rvv              189.8   172.4   -9.17%
      inv_txfm_add_4x8_dct_dct_0_8bpc_rvv               170.2   156.8   -7.87%
      inv_txfm_add_4x8_dct_dct_1_8bpc_rvv               170.2   156.9   -7.81%
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_rvv          192.6   174.2   -9.55%
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_rvv          192.6   174.2   -9.55%
      inv_txfm_add_4x8_dct_identity_0_8bpc_rvv          119.4   114.3   -4.27%
      inv_txfm_add_4x8_dct_identity_1_8bpc_rvv          119.6   114.2   -4.52%
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_rvv         197.7   180.5   -8.70%
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_rvv         197.8   180.6   -8.70%
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_rvv          178.3   165.0   -7.46%
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_rvv          178.3   164.9   -7.52%
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_rvv     199.7   182.5   -8.61%
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_rvv     200.0   182.4   -8.80%
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_rvv     127.2   122.3   -3.85%
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_rvv     127.3   122.5   -3.77%
      inv_txfm_add_4x8_identity_adst_0_8bpc_rvv         172.1   155.0   -9.94%
      inv_txfm_add_4x8_identity_adst_1_8bpc_rvv         172.1   155.0   -9.94%
      inv_txfm_add_4x8_identity_dct_0_8bpc_rvv          148.7   139.4   -6.25%
      inv_txfm_add_4x8_identity_dct_1_8bpc_rvv          148.7   139.5   -6.19%
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_rvv     171.7   156.8   -8.68%
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_rvv     171.6   156.9   -8.57%
      inv_txfm_add_4x8_identity_identity_0_8bpc_rvv      96.8    96.8    0.00%
      inv_txfm_add_4x8_identity_identity_1_8bpc_rvv      96.7    96.7    0.00%
      
      inv_txfm_add_8x4_adst_adst_0_8bpc_rvv             228.1   220.0   -3.55%
      inv_txfm_add_8x4_adst_adst_1_8bpc_rvv             227.9   219.9   -3.51%
      inv_txfm_add_8x4_adst_dct_0_8bpc_rvv              219.4   206.4   -5.93%
      inv_txfm_add_8x4_adst_dct_1_8bpc_rvv              219.4   206.4   -5.93%
      inv_txfm_add_8x4_adst_flipadst_0_8bpc_rvv         229.4   214.7   -6.41%
      inv_txfm_add_8x4_adst_flipadst_1_8bpc_rvv         229.4   214.8   -6.36%
      inv_txfm_add_8x4_adst_identity_0_8bpc_rvv         195.6   187.6   -4.09%
      inv_txfm_add_8x4_adst_identity_1_8bpc_rvv         195.8   187.6   -4.19%
      inv_txfm_add_8x4_dct_adst_0_8bpc_rvv              207.0   195.2   -5.70%
      inv_txfm_add_8x4_dct_adst_1_8bpc_rvv              206.9   195.2   -5.65%
      inv_txfm_add_8x4_dct_dct_0_8bpc_rvv               199.4   188.2   -5.62%
      inv_txfm_add_8x4_dct_dct_1_8bpc_rvv               199.4   188.5   -5.47%
      inv_txfm_add_8x4_dct_flipadst_0_8bpc_rvv          209.5   196.5   -6.21%
      inv_txfm_add_8x4_dct_flipadst_1_8bpc_rvv          209.7   196.6   -6.25%
      inv_txfm_add_8x4_dct_identity_0_8bpc_rvv          175.7   169.5   -3.53%
      inv_txfm_add_8x4_dct_identity_1_8bpc_rvv          175.9   169.6   -3.58%
      inv_txfm_add_8x4_flipadst_adst_0_8bpc_rvv         229.0   214.7   -6.24%
      inv_txfm_add_8x4_flipadst_adst_1_8bpc_rvv         229.3   214.5   -6.45%
      inv_txfm_add_8x4_flipadst_dct_0_8bpc_rvv          220.9   206.7   -6.43%
      inv_txfm_add_8x4_flipadst_dct_1_8bpc_rvv          220.6   206.5   -6.39%
      inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_rvv     230.6   215.9   -6.37%
      inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_rvv     230.7   215.9   -6.42%
      inv_txfm_add_8x4_flipadst_identity_0_8bpc_rvv     196.9   188.9   -4.06%
      inv_txfm_add_8x4_flipadst_identity_1_8bpc_rvv     196.9   188.9   -4.06%
      inv_txfm_add_8x4_identity_adst_0_8bpc_rvv         157.6   154.7   -1.84%
      inv_txfm_add_8x4_identity_adst_1_8bpc_rvv         157.5   154.9   -1.65%
      inv_txfm_add_8x4_identity_dct_0_8bpc_rvv          150.0   147.9   -1.40%
      inv_txfm_add_8x4_identity_dct_1_8bpc_rvv          150.0   147.7   -1.53%
      inv_txfm_add_8x4_identity_flipadst_0_8bpc_rvv     159.6   155.9   -2.32%
      inv_txfm_add_8x4_identity_flipadst_1_8bpc_rvv     159.8   155.6   -2.63%
      inv_txfm_add_8x4_identity_identity_0_8bpc_rvv     128.6   128.8    0.16%
      inv_txfm_add_8x4_identity_identity_1_8bpc_rvv     128.4   129.3    0.70%
      
      inv_txfm_add_4x16_adst_adst_0_8bpc_rvv            373.8   335.9  -10.14%
      inv_txfm_add_4x16_adst_adst_1_8bpc_rvv            373.8   335.7  -10.19%
      inv_txfm_add_4x16_adst_adst_2_8bpc_rvv            417.4   380.0   -8.96%
      inv_txfm_add_4x16_adst_dct_0_8bpc_rvv             328.3   301.7   -8.10%
      inv_txfm_add_4x16_adst_dct_1_8bpc_rvv             328.0   302.0   -7.93%
      inv_txfm_add_4x16_adst_dct_2_8bpc_rvv             374.3   351.3   -6.14%
      inv_txfm_add_4x16_adst_flipadst_0_8bpc_rvv        374.5   339.8   -9.27%
      inv_txfm_add_4x16_adst_flipadst_1_8bpc_rvv        374.3   339.4   -9.32%
      inv_txfm_add_4x16_adst_flipadst_2_8bpc_rvv        422.0   383.8   -9.05%
      inv_txfm_add_4x16_adst_identity_0_8bpc_rvv        248.0   242.9   -2.06%
      inv_txfm_add_4x16_adst_identity_1_8bpc_rvv        248.0   242.2   -2.34%
      inv_txfm_add_4x16_adst_identity_2_8bpc_rvv        298.6   290.3   -2.78%
      inv_txfm_add_4x16_dct_adst_0_8bpc_rvv             370.5   329.4  -11.09%
      inv_txfm_add_4x16_dct_adst_1_8bpc_rvv             370.8   329.0  -11.27%
      inv_txfm_add_4x16_dct_adst_2_8bpc_rvv             409.1   360.9  -11.78%
      inv_txfm_add_4x16_dct_dct_0_8bpc_rvv              321.1   293.7   -8.53%
      inv_txfm_add_4x16_dct_dct_1_8bpc_rvv              321.0   294.3   -8.32%
      inv_txfm_add_4x16_dct_dct_2_8bpc_rvv              357.8   329.8   -7.83%
      inv_txfm_add_4x16_dct_flipadst_0_8bpc_rvv         369.7   332.9   -9.95%
      inv_txfm_add_4x16_dct_flipadst_1_8bpc_rvv         370.4   333.0  -10.10%
      inv_txfm_add_4x16_dct_flipadst_2_8bpc_rvv         405.5   364.9  -10.01%
      inv_txfm_add_4x16_dct_identity_0_8bpc_rvv         241.6   236.6   -2.07%
      inv_txfm_add_4x16_dct_identity_1_8bpc_rvv         241.8   235.6   -2.56%
      inv_txfm_add_4x16_dct_identity_2_8bpc_rvv         281.9   266.9   -5.32%
      inv_txfm_add_4x16_flipadst_adst_0_8bpc_rvv        371.9   337.3   -9.30%
      inv_txfm_add_4x16_flipadst_adst_1_8bpc_rvv        372.2   337.1   -9.43%
      inv_txfm_add_4x16_flipadst_adst_2_8bpc_rvv        419.8   381.5   -9.12%
      inv_txfm_add_4x16_flipadst_dct_0_8bpc_rvv         328.3   302.9   -7.74%
      inv_txfm_add_4x16_flipadst_dct_1_8bpc_rvv         328.4   303.3   -7.64%
      inv_txfm_add_4x16_flipadst_dct_2_8bpc_rvv         380.6   343.7   -9.70%
      inv_txfm_add_4x16_flipadst_flipadst_0_8bpc_rvv    377.7   341.1   -9.69%
      inv_txfm_add_4x16_flipadst_flipadst_1_8bpc_rvv    377.6   341.5   -9.56%
      inv_txfm_add_4x16_flipadst_flipadst_2_8bpc_rvv    423.6   386.7   -8.71%
      inv_txfm_add_4x16_flipadst_identity_0_8bpc_rvv    250.0   245.7   -1.72%
      inv_txfm_add_4x16_flipadst_identity_1_8bpc_rvv    249.3   246.0   -1.32%
      inv_txfm_add_4x16_flipadst_identity_2_8bpc_rvv    296.4   284.7   -3.95%
      inv_txfm_add_4x16_identity_adst_0_8bpc_rvv        343.0   311.2   -9.27%
      inv_txfm_add_4x16_identity_adst_1_8bpc_rvv        342.9   311.0   -9.30%
      inv_txfm_add_4x16_identity_adst_2_8bpc_rvv        354.8   325.0   -8.40%
      inv_txfm_add_4x16_identity_dct_0_8bpc_rvv         298.9   274.9   -8.03%
      inv_txfm_add_4x16_identity_dct_1_8bpc_rvv         298.8   275.0   -7.97%
      inv_txfm_add_4x16_identity_dct_2_8bpc_rvv         310.3   289.1   -6.83%
      inv_txfm_add_4x16_identity_flipadst_0_8bpc_rvv    344.7   314.9   -8.65%
      inv_txfm_add_4x16_identity_flipadst_1_8bpc_rvv    344.5   314.8   -8.62%
      inv_txfm_add_4x16_identity_flipadst_2_8bpc_rvv    358.3   328.6   -8.29%
      inv_txfm_add_4x16_identity_identity_0_8bpc_rvv    219.6   216.1   -1.59%
      inv_txfm_add_4x16_identity_identity_1_8bpc_rvv    218.3   216.3   -0.92%
      inv_txfm_add_4x16_identity_identity_2_8bpc_rvv    231.3   229.6   -0.73%
      
      inv_txfm_add_16x4_adst_adst_0_8bpc_rvv            468.5   428.8   -8.47%
      inv_txfm_add_16x4_adst_adst_1_8bpc_rvv            468.5   428.9   -8.45%
      inv_txfm_add_16x4_adst_adst_2_8bpc_rvv            468.5   428.9   -8.45%
      inv_txfm_add_16x4_adst_dct_0_8bpc_rvv             453.8   414.5   -8.66%
      inv_txfm_add_16x4_adst_dct_1_8bpc_rvv             453.8   414.5   -8.66%
      inv_txfm_add_16x4_adst_dct_2_8bpc_rvv             453.9   414.4   -8.70%
      inv_txfm_add_16x4_adst_flipadst_0_8bpc_rvv        471.0   431.5   -8.39%
      inv_txfm_add_16x4_adst_flipadst_1_8bpc_rvv        471.0   431.3   -8.43%
      inv_txfm_add_16x4_adst_flipadst_2_8bpc_rvv        471.0   431.5   -8.39%
      inv_txfm_add_16x4_adst_identity_0_8bpc_rvv        402.2   375.0   -6.76%
      inv_txfm_add_16x4_adst_identity_1_8bpc_rvv        402.1   375.0   -6.74%
      inv_txfm_add_16x4_adst_identity_2_8bpc_rvv        402.0   375.3   -6.64%
      inv_txfm_add_16x4_dct_adst_0_8bpc_rvv             432.8   392.5   -9.31%
      inv_txfm_add_16x4_dct_adst_1_8bpc_rvv             432.8   392.5   -9.31%
      inv_txfm_add_16x4_dct_adst_2_8bpc_rvv             432.8   392.5   -9.31%
      inv_txfm_add_16x4_dct_dct_0_8bpc_rvv              407.9   378.3   -7.26%
      inv_txfm_add_16x4_dct_dct_1_8bpc_rvv              407.8   378.1   -7.28%
      inv_txfm_add_16x4_dct_dct_2_8bpc_rvv              407.8   378.1   -7.28%
      inv_txfm_add_16x4_dct_flipadst_0_8bpc_rvv         426.0   395.1   -7.25%
      inv_txfm_add_16x4_dct_flipadst_1_8bpc_rvv         425.9   395.0   -7.26%
      inv_txfm_add_16x4_dct_flipadst_2_8bpc_rvv         426.0   395.1   -7.25%
      inv_txfm_add_16x4_dct_identity_0_8bpc_rvv         357.1   338.7   -5.15%
      inv_txfm_add_16x4_dct_identity_1_8bpc_rvv         357.1   338.7   -5.15%
      inv_txfm_add_16x4_dct_identity_2_8bpc_rvv         357.2   338.7   -5.18%
      inv_txfm_add_16x4_flipadst_adst_0_8bpc_rvv        472.4   432.6   -8.43%
      inv_txfm_add_16x4_flipadst_adst_1_8bpc_rvv        472.2   432.6   -8.39%
      inv_txfm_add_16x4_flipadst_adst_2_8bpc_rvv        472.3   432.7   -8.38%
      inv_txfm_add_16x4_flipadst_dct_0_8bpc_rvv         464.3   418.2   -9.93%
      inv_txfm_add_16x4_flipadst_dct_1_8bpc_rvv         464.2   418.2   -9.91%
      inv_txfm_add_16x4_flipadst_dct_2_8bpc_rvv         464.2   418.2   -9.91%
      inv_txfm_add_16x4_flipadst_flipadst_0_8bpc_rvv    474.7   435.1   -8.34%
      inv_txfm_add_16x4_flipadst_flipadst_1_8bpc_rvv    474.8   435.1   -8.36%
      inv_txfm_add_16x4_flipadst_flipadst_2_8bpc_rvv    474.7   435.1   -8.34%
      inv_txfm_add_16x4_flipadst_identity_0_8bpc_rvv    405.9   378.8   -6.68%
      inv_txfm_add_16x4_flipadst_identity_1_8bpc_rvv    406.0   378.8   -6.70%
      inv_txfm_add_16x4_flipadst_identity_2_8bpc_rvv    406.0   378.8   -6.70%
      inv_txfm_add_16x4_identity_adst_0_8bpc_rvv        353.7   342.2   -3.25%
      inv_txfm_add_16x4_identity_adst_1_8bpc_rvv        353.8   342.3   -3.25%
      inv_txfm_add_16x4_identity_adst_2_8bpc_rvv        353.7   342.4   -3.19%
      inv_txfm_add_16x4_identity_dct_0_8bpc_rvv         338.1   327.9   -3.02%
      inv_txfm_add_16x4_identity_dct_1_8bpc_rvv         338.1   327.9   -3.02%
      inv_txfm_add_16x4_identity_dct_2_8bpc_rvv         338.2   327.9   -3.05%
      inv_txfm_add_16x4_identity_flipadst_0_8bpc_rvv    357.5   344.8   -3.55%
      inv_txfm_add_16x4_identity_flipadst_1_8bpc_rvv    357.5   344.9   -3.52%
      inv_txfm_add_16x4_identity_flipadst_2_8bpc_rvv    357.5   344.7   -3.58%
      inv_txfm_add_16x4_identity_identity_0_8bpc_rvv    287.1   297.0    3.45%
      inv_txfm_add_16x4_identity_identity_1_8bpc_rvv    287.2   297.0    3.41%
      inv_txfm_add_16x4_identity_identity_2_8bpc_rvv    287.2   297.0    3.41%
      
      inv_txfm_add_8x16_adst_adst_0_8bpc_rvv            774.3   704.8   -8.98%
      inv_txfm_add_8x16_adst_adst_1_8bpc_rvv            774.4   704.8   -8.99%
      inv_txfm_add_8x16_adst_adst_2_8bpc_rvv            929.5   839.9   -9.64%
      inv_txfm_add_8x16_adst_dct_0_8bpc_rvv             687.9   634.9   -7.70%
      inv_txfm_add_8x16_adst_dct_1_8bpc_rvv             688.0   634.8   -7.73%
      inv_txfm_add_8x16_adst_dct_2_8bpc_rvv             845.5   768.4   -9.12%
      inv_txfm_add_8x16_adst_flipadst_0_8bpc_rvv        779.5   708.5   -9.11%
      inv_txfm_add_8x16_adst_flipadst_1_8bpc_rvv        779.5   708.5   -9.11%
      inv_txfm_add_8x16_adst_flipadst_2_8bpc_rvv        933.3   849.9   -8.94%
      inv_txfm_add_8x16_adst_identity_0_8bpc_rvv        546.5   529.0   -3.20%
      inv_txfm_add_8x16_adst_identity_1_8bpc_rvv        546.5   529.0   -3.20%
      inv_txfm_add_8x16_adst_identity_2_8bpc_rvv        702.5   664.1   -5.47%
      inv_txfm_add_8x16_dct_adst_0_8bpc_rvv             739.9   672.7   -9.08%
      inv_txfm_add_8x16_dct_adst_1_8bpc_rvv             739.9   672.7   -9.08%
      inv_txfm_add_8x16_dct_adst_2_8bpc_rvv             863.1   776.1  -10.08%
      inv_txfm_add_8x16_dct_dct_0_8bpc_rvv              651.2   601.9   -7.57%
      inv_txfm_add_8x16_dct_dct_1_8bpc_rvv              651.2   601.8   -7.59%
      inv_txfm_add_8x16_dct_dct_2_8bpc_rvv              777.6   706.5   -9.14%
      inv_txfm_add_8x16_dct_flipadst_0_8bpc_rvv         742.4   678.9   -8.55%
      inv_txfm_add_8x16_dct_flipadst_1_8bpc_rvv         742.5   678.9   -8.57%
      inv_txfm_add_8x16_dct_flipadst_2_8bpc_rvv         858.8   779.3   -9.26%
      inv_txfm_add_8x16_dct_identity_0_8bpc_rvv         510.8   496.4   -2.82%
      inv_txfm_add_8x16_dct_identity_1_8bpc_rvv         510.6   496.5   -2.76%
      inv_txfm_add_8x16_dct_identity_2_8bpc_rvv         630.0   599.7   -4.81%
      inv_txfm_add_8x16_flipadst_adst_0_8bpc_rvv        778.3   707.2   -9.14%
      inv_txfm_add_8x16_flipadst_adst_1_8bpc_rvv        778.3   707.1   -9.15%
      inv_txfm_add_8x16_flipadst_adst_2_8bpc_rvv        934.4   843.5   -9.73%
      inv_txfm_add_8x16_flipadst_dct_0_8bpc_rvv         689.3   634.7   -7.92%
      inv_txfm_add_8x16_flipadst_dct_1_8bpc_rvv         689.2   634.8   -7.89%
      inv_txfm_add_8x16_flipadst_dct_2_8bpc_rvv         845.8   774.4   -8.44%
      inv_txfm_add_8x16_flipadst_flipadst_0_8bpc_rvv    779.9   710.5   -8.90%
      inv_txfm_add_8x16_flipadst_flipadst_1_8bpc_rvv    780.0   710.4   -8.92%
      inv_txfm_add_8x16_flipadst_flipadst_2_8bpc_rvv    936.4   848.1   -9.43%
      inv_txfm_add_8x16_flipadst_identity_0_8bpc_rvv    550.4   531.3   -3.47%
      inv_txfm_add_8x16_flipadst_identity_1_8bpc_rvv    550.4   531.3   -3.47%
      inv_txfm_add_8x16_flipadst_identity_2_8bpc_rvv    705.3   669.4   -5.09%
      inv_txfm_add_8x16_identity_adst_0_8bpc_rvv        649.0   599.7   -7.60%
      inv_txfm_add_8x16_identity_adst_1_8bpc_rvv        649.0   599.7   -7.60%
      inv_txfm_add_8x16_identity_adst_2_8bpc_rvv        682.8   633.4   -7.23%
      inv_txfm_add_8x16_identity_dct_0_8bpc_rvv         562.1   527.9   -6.08%
      inv_txfm_add_8x16_identity_dct_1_8bpc_rvv         562.0   527.9   -6.07%
      inv_txfm_add_8x16_identity_dct_2_8bpc_rvv         597.4   561.5   -6.01%
      inv_txfm_add_8x16_identity_flipadst_0_8bpc_rvv    652.7   603.6   -7.52%
      inv_txfm_add_8x16_identity_flipadst_1_8bpc_rvv    652.8   603.6   -7.54%
      inv_txfm_add_8x16_identity_flipadst_2_8bpc_rvv    686.6   640.5   -6.71%
      inv_txfm_add_8x16_identity_identity_0_8bpc_rvv    421.6   424.4    0.66%
      inv_txfm_add_8x16_identity_identity_1_8bpc_rvv    421.7   424.4    0.64%
      inv_txfm_add_8x16_identity_identity_2_8bpc_rvv    455.5   458.1    0.57%
      
      inv_txfm_add_16x8_adst_adst_0_8bpc_rvv            935.2   843.2   -9.84%
      inv_txfm_add_16x8_adst_adst_1_8bpc_rvv            935.2   843.3   -9.83%
      inv_txfm_add_16x8_adst_adst_2_8bpc_rvv            935.2   843.1   -9.85%
      inv_txfm_add_16x8_adst_dct_0_8bpc_rvv             857.0   781.1   -8.86%
      inv_txfm_add_16x8_adst_dct_1_8bpc_rvv             856.9   781.1   -8.85%
      inv_txfm_add_16x8_adst_dct_2_8bpc_rvv             856.9   781.0   -8.86%
      inv_txfm_add_16x8_adst_flipadst_0_8bpc_rvv        938.9   846.8   -9.81%
      inv_txfm_add_16x8_adst_flipadst_1_8bpc_rvv        938.8   847.0   -9.78%
      inv_txfm_add_16x8_adst_flipadst_2_8bpc_rvv        938.9   847.0   -9.79%
      inv_txfm_add_16x8_adst_identity_0_8bpc_rvv        711.2   661.6   -6.97%
      inv_txfm_add_16x8_adst_identity_1_8bpc_rvv        711.2   661.6   -6.97%
      inv_txfm_add_16x8_adst_identity_2_8bpc_rvv        711.2   661.6   -6.97%
      inv_txfm_add_16x8_dct_adst_0_8bpc_rvv             846.1   771.5   -8.82%
      inv_txfm_add_16x8_dct_adst_1_8bpc_rvv             845.9   771.5   -8.80%
      inv_txfm_add_16x8_dct_adst_2_8bpc_rvv             846.2   772.1   -8.76%
      inv_txfm_add_16x8_dct_dct_0_8bpc_rvv              767.8   710.3   -7.49%
      inv_txfm_add_16x8_dct_dct_1_8bpc_rvv              767.8   710.4   -7.48%
      inv_txfm_add_16x8_dct_dct_2_8bpc_rvv              767.4   710.4   -7.43%
      inv_txfm_add_16x8_dct_flipadst_0_8bpc_rvv         856.6   775.6   -9.46%
      inv_txfm_add_16x8_dct_flipadst_1_8bpc_rvv         856.5   775.1   -9.50%
      inv_txfm_add_16x8_dct_flipadst_2_8bpc_rvv         856.6   775.2   -9.50%
      inv_txfm_add_16x8_dct_identity_0_8bpc_rvv         623.3   589.9   -5.36%
      inv_txfm_add_16x8_dct_identity_1_8bpc_rvv         623.3   590.0   -5.34%
      inv_txfm_add_16x8_dct_identity_2_8bpc_rvv         623.3   589.7   -5.39%
      inv_txfm_add_16x8_flipadst_adst_0_8bpc_rvv        939.8   846.9   -9.89%
      inv_txfm_add_16x8_flipadst_adst_1_8bpc_rvv        939.8   847.0   -9.87%
      inv_txfm_add_16x8_flipadst_adst_2_8bpc_rvv        939.9   846.9   -9.89%
      inv_txfm_add_16x8_flipadst_dct_0_8bpc_rvv         860.8   784.9   -8.82%
      inv_txfm_add_16x8_flipadst_dct_1_8bpc_rvv         860.7   784.8   -8.82%
      inv_txfm_add_16x8_flipadst_dct_2_8bpc_rvv         860.8   784.9   -8.82%
      inv_txfm_add_16x8_flipadst_flipadst_0_8bpc_rvv    942.7   852.2   -9.60%
      inv_txfm_add_16x8_flipadst_flipadst_1_8bpc_rvv    942.7   852.1   -9.61%
      inv_txfm_add_16x8_flipadst_flipadst_2_8bpc_rvv    942.8   852.1   -9.62%
      inv_txfm_add_16x8_flipadst_identity_0_8bpc_rvv    714.9   667.0   -6.70%
      inv_txfm_add_16x8_flipadst_identity_1_8bpc_rvv    715.0   666.9   -6.73%
      inv_txfm_add_16x8_flipadst_identity_2_8bpc_rvv    715.0   666.9   -6.73%
      inv_txfm_add_16x8_identity_adst_0_8bpc_rvv        707.9   667.2   -5.75%
      inv_txfm_add_16x8_identity_adst_1_8bpc_rvv        707.9   667.3   -5.74%
      inv_txfm_add_16x8_identity_adst_2_8bpc_rvv        707.9   667.2   -5.75%
      inv_txfm_add_16x8_identity_dct_0_8bpc_rvv         630.6   604.8   -4.09%
      inv_txfm_add_16x8_identity_dct_1_8bpc_rvv         630.7   604.9   -4.09%
      inv_txfm_add_16x8_identity_dct_2_8bpc_rvv         630.6   604.8   -4.09%
      inv_txfm_add_16x8_identity_flipadst_0_8bpc_rvv    711.7   671.1   -5.70%
      inv_txfm_add_16x8_identity_flipadst_1_8bpc_rvv    711.9   671.1   -5.73%
      inv_txfm_add_16x8_identity_flipadst_2_8bpc_rvv    711.8   671.2   -5.70%
      inv_txfm_add_16x8_identity_identity_0_8bpc_rvv    485.2   486.2    0.21%
      inv_txfm_add_16x8_identity_identity_1_8bpc_rvv    485.2   486.3    0.23%
      inv_txfm_add_16x8_identity_identity_2_8bpc_rvv    485.2   486.3    0.23%
      789a1f65
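The Delta column appears to be the relative change in benchmark time, (after - before) / before * 100, so a negative value means the RVV kernel became faster. Below is a minimal sketch of that arithmetic. It assumes this is how the column was computed, and `delta_pct` is a hypothetical name.

```c
/* Relative change, in percent, between two benchmark timings.
 * A negative result means the "after" build is faster. */
static double delta_pct(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For example, inv_txfm_add_16x4_dct_adst_0 goes from 432.8 to 392.5, which gives about -9.31% and matches the table. The identity_identity rows are the only ones in this excerpt with a positive delta.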
  10. Oct 14, 2024
  11. Oct 13, 2024
  12. Oct 12, 2024
  13. Oct 09, 2024
    • Bogdan Gligorijević's avatar
      riscv64/mc: warp_8x8 and warp_8x8t 8bpc · b2e7f06c
      Bogdan Gligorijević authored
      Benchmarks:
      - Kendryte K230:
      warp_8x8_8bpc_c:      4549.7 ( 1.00x)
      warp_8x8_8bpc_rvv:    2504.7 ( 1.82x)
      warp_8x8t_8bpc_c:     4414.7 ( 1.00x)
      warp_8x8t_8bpc_rvv:   2465.7 ( 1.79x)
      
      - Banana Pi BPI-F3:
      warp_8x8_8bpc_c:      4431.2 ( 1.00x)
      warp_8x8_8bpc_rvv:    3297.4 ( 1.34x)
      warp_8x8t_8bpc_c:     4299.3 ( 1.00x)
      warp_8x8t_8bpc_rvv:   3255.7 ( 1.32x)
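The figure in parentheses is the speedup over the C reference. It is computed as C time divided by RVV time, which is why every C row reads 1.00x. Here is a minimal sketch of the calculation, with `speedup` as a hypothetical name:

```c
/* Speedup of an optimized kernel over the C reference:
 * c_time / simd_time (logged with two decimals). */
static double speedup(double c_time, double simd_time)
{
    return c_time / simd_time;
}
```

For warp_8x8, this gives 4549.7 / 2504.7 ≈ 1.82 on the Kendryte K230 and 4431.2 / 3297.4 ≈ 1.34 on the Banana Pi BPI-F3.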
    • Niklas Haas's avatar
      riscv64/mc: Re-order instructions · 56f6d166
      Niklas Haas authored
      Re-order instructions to avoid read-after-write hazards. Speedup is about 1% for width=4 on a K230.
    • Niklas Haas's avatar
      riscv64/mc: Add bidir functions · 3d12677c
      Niklas Haas authored
      This code compromises between the performance of a dedicated kernel per
      VLEN/width pair and the flexibility of a fully VLEN-dynamic loop. It uses
      a single special case for w=4 and splits the remaining widths between an
      unrolled four-line fast path and a general-purpose slow path (for large
      widths on small VLEN).
      
      Kendryte K230
      
      avg_w4_8bpc_c:          346.8 ( 1.00x)
      avg_w4_8bpc_rvv:         50.3 ( 6.90x)
      avg_w8_8bpc_c:         1054.9 ( 1.00x)
      avg_w8_8bpc_rvv:        139.1 ( 7.58x)
      avg_w16_8bpc_c:        3396.3 ( 1.00x)
      avg_w16_8bpc_rvv:       350.6 ( 9.69x)
      avg_w32_8bpc_c:       13734.3 ( 1.00x)
      avg_w32_8bpc_rvv:      1226.3 (11.20x)
      avg_w64_8bpc_c:       33260.9 ( 1.00x)
      avg_w64_8bpc_rvv:      3869.4 ( 8.60x)
      avg_w128_8bpc_c:      83441.3 ( 1.00x)
      avg_w128_8bpc_rvv:     9765.1 ( 8.54x)
      
      w_avg_w4_8bpc_c:        444.3 ( 1.00x)
      w_avg_w4_8bpc_rvv:       75.8 ( 5.86x)
      w_avg_w8_8bpc_c:       1365.6 ( 1.00x)
      w_avg_w8_8bpc_rvv:      208.8 ( 6.54x)
      w_avg_w16_8bpc_c:      4420.8 ( 1.00x)
      w_avg_w16_8bpc_rvv:     570.7 ( 7.75x)
      w_avg_w32_8bpc_c:     18010.9 ( 1.00x)
      w_avg_w32_8bpc_rvv:    2074.4 ( 8.68x)
      w_avg_w64_8bpc_c:     43050.4 ( 1.00x)
      w_avg_w64_8bpc_rvv:    5799.5 ( 7.42x)
      w_avg_w128_8bpc_c:   107153.6 ( 1.00x)
      w_avg_w128_8bpc_rvv:  14272.0 ( 7.51x)
      
      mask_w4_8bpc_c:        497.6 ( 1.00x)
      mask_w4_8bpc_rvv:       88.5 ( 5.63x)
      mask_w8_8bpc_c:       1528.5 ( 1.00x)
      mask_w8_8bpc_rvv:      253.1 ( 6.04x)
      mask_w16_8bpc_c:      4953.8 ( 1.00x)
      mask_w16_8bpc_rvv:     679.0 ( 7.30x)
      mask_w32_8bpc_c:     20298.3 ( 1.00x)
      mask_w32_8bpc_rvv:    3012.9 ( 6.74x)
      mask_w64_8bpc_c:     49718.8 ( 1.00x)
      mask_w64_8bpc_rvv:    7291.7 ( 6.82x)
      mask_w128_8bpc_c:   126740.3 ( 1.00x)
      mask_w128_8bpc_rvv:  18351.1 ( 6.91x)
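The commit message describes the strategy but does not show the code. Below is a scalar C model of that strategy, built on several assumptions. The rounding constants follow dav1d's C reference for 8bpc as I understand it (4 intermediate bits, so each 16-bit `tmp` value is roughly pixel << 4). `choose_path`, `avg_row` and the `lmul` parameter are hypothetical names, and the commit does not say which LMUL the fast path uses.

```c
#include <stdint.h>

/* Clamp to the 8-bit pixel range. */
static inline uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Per-pixel ops on 16-bit intermediates (assumed 8bpc rounding). */
static inline uint8_t avg_px(int a, int b)
{
    return clip_u8((a + b + 16) >> 5);
}

static inline uint8_t w_avg_px(int a, int b, int weight) /* weight in 0..16 */
{
    return clip_u8((a * weight + b * (16 - weight) + 128) >> 8);
}

static inline uint8_t mask_px(int a, int b, int m) /* m in 0..64 */
{
    return clip_u8((a * m + b * (64 - m) + 512) >> 10);
}

enum bidir_path { PATH_W4, PATH_FAST_4ROWS, PATH_SLOW };

/* Hypothetical dispatch: w == 4 has its own kernel. Otherwise a row takes the
 * unrolled four-row fast path if it fits in one register group of 16-bit
 * elements (VLEN / 16 * lmul), and the strip-mined slow path if it does not. */
static enum bidir_path choose_path(int w, int vlen_bits, int lmul)
{
    const int vlmax = vlen_bits / 16 * lmul;
    if (w == 4)
        return PATH_W4;
    return w <= vlmax ? PATH_FAST_4ROWS : PATH_SLOW;
}

/* Slow path for one row: process at most vlmax elements per step,
 * the way a vsetvli-driven loop would (vl = min(avl, vlmax)). */
static void avg_row(uint8_t *dst, const int16_t *t1, const int16_t *t2,
                    int w, int vlmax)
{
    for (int x = 0; x < w;) {
        const int vl = (w - x) < vlmax ? (w - x) : vlmax;
        for (int i = 0; i < vl; i++)
            dst[x + i] = avg_px(t1[x + i], t2[x + i]);
        x += vl;
    }
}
```

With VLEN=128 and an assumed LMUL of 2, a row holds 16 elements. Under that assumption w=16 takes the fast path and w=128 falls back to the slow path.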
    • Niklas Haas's avatar
      riscv: Add $vtype helper definitions · 50ac8260
      Niklas Haas authored
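This commit has no description. For reference, the RVV 1.0 `vtype` register packs the register grouping and element width into one value: `vlmul` in bits 2:0, `vsew` in bits 5:3, `vta` in bit 6 and `vma` in bit 7. Helpers like these would need to encode that layout. Below is a minimal C sketch of the encoding. The enum and function names are hypothetical and are not the helpers the commit defines.

```c
#include <stdint.h>

/* vsew encodings (selected element width) from the RVV 1.0 vtype layout. */
enum vsew { VSEW_E8 = 0, VSEW_E16 = 1, VSEW_E32 = 2, VSEW_E64 = 3 };

/* vlmul encodings (register group multiplier); the value 4 is reserved. */
enum vlmul {
    VLMUL_M1 = 0, VLMUL_M2 = 1, VLMUL_M4 = 2, VLMUL_M8 = 3,
    VLMUL_MF8 = 5, VLMUL_MF4 = 6, VLMUL_MF2 = 7
};

/* Pack a vtype value: vlmul[2:0] | vsew[5:3] | vta[6] | vma[7]. */
static uint32_t make_vtype(enum vsew sew, enum vlmul lmul,
                           int tail_agnostic, int mask_agnostic)
{
    return (uint32_t)lmul
         | ((uint32_t)sew << 3)
         | ((uint32_t)(tail_agnostic != 0) << 6)
         | ((uint32_t)(mask_agnostic != 0) << 7);
}
```

`make_vtype(VSEW_E8, VLMUL_M1, 1, 1)` yields 0xC0. That is the value `vsetvli` installs for `e8, m1, ta, ma`.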
    • Nathan E. Egge's avatar
      riscv64/mc: Branchless vsetvl in blend_v function · cc7d8773
      Nathan E. Egge authored
      Kendryte K230
      
      blend_v_w2_8bpc_c:       221.4 ( 1.00x)
      blend_v_w2_8bpc_rvv:     147.7 ( 1.50x)
      blend_v_w4_8bpc_c:       945.3 ( 1.00x)
      blend_v_w4_8bpc_rvv:     243.3 ( 3.89x)
      blend_v_w8_8bpc_c:      1786.9 ( 1.00x)
      blend_v_w8_8bpc_rvv:     256.1 ( 6.98x)
      blend_v_w16_8bpc_c:     3472.1 ( 1.00x)
      blend_v_w16_8bpc_rvv:    351.1 ( 9.89x)
      blend_v_w32_8bpc_c:     6832.1 ( 1.00x)
      blend_v_w32_8bpc_rvv:    635.4 (10.75x)
      
      SpacemiT K1
      
      blend_v_w2_8bpc_c:       218.0 ( 1.00x)
      blend_v_w2_8bpc_rvv:     144.3 ( 1.51x)
      blend_v_w4_8bpc_c:       921.7 ( 1.00x)
      blend_v_w4_8bpc_rvv:     237.1 ( 3.89x)
      blend_v_w8_8bpc_c:      1739.8 ( 1.00x)
      blend_v_w8_8bpc_rvv:     237.4 ( 7.33x)
      blend_v_w16_8bpc_c:     3376.6 ( 1.00x)
      blend_v_w16_8bpc_rvv:    296.3 (11.40x)
      blend_v_w32_8bpc_c:     6647.2 ( 1.00x)
      blend_v_w32_8bpc_rvv:    408.1 (16.29x)
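The commit message does not show how the branch was removed. One common approach is to derive the LMUL field arithmetically from the block width and pass the resulting `vtype` to the register form `vsetvl rd, rs1, rs2`, instead of branching to a separate `vsetivli` for each width. This is an assumption, not the commit's actual code. The C model below compares a per-width `switch` with a branch-free computation for 8-bit elements. Both function names are hypothetical, and `__builtin_ctz` is a GCC/Clang builtin.

```c
/* log2(LMUL) needed so one register group holds w 8-bit elements.
 * Branchy reference for VLEN = 128 (16 bytes per vector register). */
static int lmul_log2_switch_vlen128(int w)
{
    switch (w) {
    case 2: case 4: case 8: case 16: return 0; /* m1 */
    case 32: return 1;                         /* m2 */
    case 64: return 2;                         /* m4 */
    default: return 3;                         /* m8 (w = 128) */
    }
}

/* Branch-free version for power-of-two w and VLEN:
 * log2(w) - log2(VLEN / 8), clamped at zero with a compare instead of a jump. */
static int lmul_log2_branchless(unsigned w, unsigned vlen_bits)
{
    const int diff = __builtin_ctz(w) - __builtin_ctz(vlen_bits / 8);
    return diff * (diff > 0);
}
```

For w=32, this selects m2 with VLEN=128 and m1 with VLEN=256, because each register then holds 32 bytes.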
    • Nathan E. Egge's avatar
      riscv64/mc: Branchless vsetvl in blend_h function · 2da8107e
      Nathan E. Egge authored
      Kendryte K230
      
      blend_h_w2_8bpc_c:        165.9 ( 1.00x)
      blend_h_w2_8bpc_rvv:       83.8 ( 1.98x)
      blend_h_w4_8bpc_c:        295.2 ( 1.00x)
      blend_h_w4_8bpc_rvv:       83.8 ( 3.52x)
      blend_h_w8_8bpc_c:        557.9 ( 1.00x)
      blend_h_w8_8bpc_rvv:       92.5 ( 6.03x)
      blend_h_w16_8bpc_c:      1078.8 ( 1.00x)
      blend_h_w16_8bpc_rvv:     117.3 ( 9.19x)
      blend_h_w32_8bpc_c:      2117.8 ( 1.00x)
      blend_h_w32_8bpc_rvv:     200.5 (10.57x)
      blend_h_w64_8bpc_c:      4194.7 ( 1.00x)
      blend_h_w64_8bpc_rvv:     363.2 (11.55x)
      blend_h_w128_8bpc_c:    10271.4 ( 1.00x)
      blend_h_w128_8bpc_rvv:    844.5 (12.16x)
      
      SpacemiT K1
      
      blend_h_w2_8bpc_c:        162.5 ( 1.00x)
      blend_h_w2_8bpc_rvv:       83.9 ( 1.94x)
      blend_h_w4_8bpc_c:        288.6 ( 1.00x)
      blend_h_w4_8bpc_rvv:       83.7 ( 3.45x)
      blend_h_w8_8bpc_c:        544.7 ( 1.00x)
      blend_h_w8_8bpc_rvv:       84.0 ( 6.48x)
      blend_h_w16_8bpc_c:      1052.8 ( 1.00x)
      blend_h_w16_8bpc_rvv:     102.9 (10.23x)
      blend_h_w32_8bpc_c:      2068.0 ( 1.00x)
      blend_h_w32_8bpc_rvv:     131.4 (15.73x)
      blend_h_w64_8bpc_c:      4093.7 ( 1.00x)
      blend_h_w64_8bpc_rvv:     220.3 (18.58x)
      blend_h_w128_8bpc_c:    10023.1 ( 1.00x)
      blend_h_w128_8bpc_rvv:    467.3 (21.45x)
    • Nathan E. Egge's avatar
      riscv64/mc: Branchless vsetvl in blend function · b374b24c
      Nathan E. Egge authored
      Kendryte K230
      
      blend_w4_8bpc_c:       204.8 ( 1.00x)
      blend_w4_8bpc_rvv:      59.8 ( 3.42x)
      blend_w8_8bpc_c:       608.9 ( 1.00x)
      blend_w8_8bpc_rvv:      87.2 ( 6.98x)
      blend_w16_8bpc_c:     2362.4 ( 1.00x)
      blend_w16_8bpc_rvv:    225.2 (10.49x)
      blend_w32_8bpc_c:     5990.4 ( 1.00x)
      blend_w32_8bpc_rvv:    518.3 (11.56x)
      
      SpacemiT K1
      
      blend_w4_8bpc_c:       201.6 ( 1.00x)
      blend_w4_8bpc_rvv:      58.0 ( 3.48x)
      blend_w8_8bpc_c:       595.1 ( 1.00x)
      blend_w8_8bpc_rvv:      82.1 ( 7.25x)
      blend_w16_8bpc_c:     2308.8 ( 1.00x)
      blend_w16_8bpc_rvv:    189.0 (12.22x)
      blend_w32_8bpc_c:     5853.1 ( 1.00x)
      blend_w32_8bpc_rvv:    339.5 (17.24x)
    • Nathan E. Egge's avatar
      riscv64/mc: Add VLEN=256 8bpc RVV blend_v function · 0e3f70e8
      Nathan E. Egge authored
      SpacemiT K1
      
      blend_v_w2_8bpc_c:       217.0 ( 1.00x)
      blend_v_w2_8bpc_rvv:     143.3 ( 1.51x)
      blend_v_w4_8bpc_c:       921.6 ( 1.00x)
      blend_v_w4_8bpc_rvv:     236.3 ( 3.90x)
      blend_v_w8_8bpc_c:      1738.2 ( 1.00x)
      blend_v_w8_8bpc_rvv:     238.1 ( 7.30x)
      blend_v_w16_8bpc_c:     3376.1 ( 1.00x)
      blend_v_w16_8bpc_rvv:    298.0 (11.33x)
      blend_v_w32_8bpc_c:     6648.0 ( 1.00x)
      blend_v_w32_8bpc_rvv:    409.5 (16.24x)
    • Nathan E. Egge's avatar
      riscv64/mc: Add VLEN=256 8bpc RVV blend_h function · a5b95448
      Nathan E. Egge authored
      SpacemiT K1
      
      blend_h_w2_8bpc_c:        161.8 ( 1.00x)
      blend_h_w2_8bpc_rvv:       83.5 ( 1.94x)
      blend_h_w4_8bpc_c:        288.4 ( 1.00x)
      blend_h_w4_8bpc_rvv:       83.7 ( 3.45x)
      blend_h_w8_8bpc_c:        543.9 ( 1.00x)
      blend_h_w8_8bpc_rvv:       84.5 ( 6.44x)
      blend_h_w16_8bpc_c:      1051.6 ( 1.00x)
      blend_h_w16_8bpc_rvv:     103.8 (10.13x)
      blend_h_w32_8bpc_c:      2066.0 ( 1.00x)
      blend_h_w32_8bpc_rvv:     133.8 (15.44x)
      blend_h_w64_8bpc_c:      4092.7 ( 1.00x)
      blend_h_w64_8bpc_rvv:     225.2 (18.18x)
      blend_h_w128_8bpc_c:    10011.3 ( 1.00x)
      blend_h_w128_8bpc_rvv:    474.7 (21.09x)
    • Nathan E. Egge's avatar
      riscv64/mc: Add VLEN=256 8bpc RVV blend function · 83485c50
      Nathan E. Egge authored
      SpacemiT K1
      
      blend_w4_8bpc_c:       201.3 ( 1.00x)
      blend_w4_8bpc_rvv:      59.3 ( 3.40x)
      blend_w8_8bpc_c:       595.1 ( 1.00x)
      blend_w8_8bpc_rvv:      84.1 ( 7.07x)
      blend_w16_8bpc_c:     2309.0 ( 1.00x)
      blend_w16_8bpc_rvv:    190.5 (12.12x)
      blend_w32_8bpc_c:     5854.7 ( 1.00x)
      blend_w32_8bpc_rvv:    341.6 (17.14x)
    • Nathan E. Egge's avatar
      7f2bb2fb
    • Nathan E. Egge's avatar
      riscv64/mc: Add 8bpc RVV blend_v function · 01da36eb
      Nathan E. Egge authored
      Kendryte K230
      
      blend_v_w2_8bpc_c:       219.6 ( 1.00x)
      blend_v_w2_8bpc_rvv:     141.8 ( 1.55x)
      blend_v_w4_8bpc_c:       942.9 ( 1.00x)
      blend_v_w4_8bpc_rvv:     240.9 ( 3.91x)
      blend_v_w8_8bpc_c:      1783.5 ( 1.00x)
      blend_v_w8_8bpc_rvv:     254.7 ( 7.00x)
      blend_v_w16_8bpc_c:     3466.5 ( 1.00x)
      blend_v_w16_8bpc_rvv:    350.5 ( 9.89x)
      blend_v_w32_8bpc_c:     6825.2 ( 1.00x)
      blend_v_w32_8bpc_rvv:    635.1 (10.75x)
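As I understand dav1d's C `blend_v` (the baseline in these benchmarks), it mixes `dst` and `tmp` column by column using OBMC weights out of 64. It only touches the first w*3/4 columns of each row. Below is a scalar sketch that assumes the `(a * (64 - m) + b * m + 32) >> 6` blend formula. `blend_v_sketch` is a hypothetical name, and the caller passes the per-column mask in here, whereas dav1d reads it from its own OBMC mask table.

```c
#include <stddef.h>
#include <stdint.h>

/* Weighted blend with a 6-bit weight m in 0..64 (64 selects tmp entirely). */
static inline uint8_t blend_px(int a, int b, int m)
{
    return (uint8_t)((a * (64 - m) + b * m + 32) >> 6);
}

/* blend_v: one weight per column, applied to the first w*3/4 columns only. */
static void blend_v_sketch(uint8_t *dst, ptrdiff_t stride, const uint8_t *tmp,
                           int w, int h, const uint8_t *col_mask)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < (w * 3) >> 2; x++)
            dst[x] = blend_px(dst[x], tmp[x], col_mask[x]);
        dst += stride;
        tmp += w; /* tmp is packed with a stride of w */
    }
}
```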
    • Nathan E. Egge's avatar
      riscv64/mc: Add 8bpc RVV blend_h function · d3a94f11
      Nathan E. Egge authored
      Kendryte K230
      
      blend_h_w2_8bpc_c:        165.4 ( 1.00x)
      blend_h_w2_8bpc_rvv:       79.4 ( 2.08x)
      blend_h_w4_8bpc_c:        294.6 ( 1.00x)
      blend_h_w4_8bpc_rvv:       81.5 ( 3.61x)
      blend_h_w8_8bpc_c:        556.9 ( 1.00x)
      blend_h_w8_8bpc_rvv:       90.2 ( 6.17x)
      blend_h_w16_8bpc_c:      1077.6 ( 1.00x)
      blend_h_w16_8bpc_rvv:     116.1 ( 9.29x)
      blend_h_w32_8bpc_c:      2116.2 ( 1.00x)
      blend_h_w32_8bpc_rvv:     200.5 (10.55x)
      blend_h_w64_8bpc_c:      4191.8 ( 1.00x)
      blend_h_w64_8bpc_rvv:     363.3 (11.54x)
      blend_h_w128_8bpc_c:    10264.6 ( 1.00x)
      blend_h_w128_8bpc_rvv:    844.1 (12.16x)
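`blend_h` is the transposed case: as I understand the C reference, each row uses a single weight and only the first h*3/4 rows are blended. Below is a scalar sketch under the same assumed blend formula, with a hypothetical caller-supplied per-row mask:

```c
#include <stddef.h>
#include <stdint.h>

/* Weighted blend with a 6-bit weight m in 0..64. */
static inline uint8_t blend_px(int a, int b, int m)
{
    return (uint8_t)((a * (64 - m) + b * m + 32) >> 6);
}

/* blend_h: one weight per row, applied to the first h*3/4 rows only. */
static void blend_h_sketch(uint8_t *dst, ptrdiff_t stride, const uint8_t *tmp,
                           int w, int h, const uint8_t *row_mask)
{
    const int rows = (h * 3) >> 2;
    for (int y = 0; y < rows; y++) {
        const int m = row_mask[y]; /* same weight across the whole row */
        for (int x = 0; x < w; x++)
            dst[x] = blend_px(dst[x], tmp[x], m);
        dst += stride;
        tmp += w;
    }
}
```

Because the weight is constant along a row, the inner loop is a plain multiply-accumulate over `w` contiguous pixels.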
    • Nathan E. Egge's avatar
      riscv64/mc: Add 8bpc RVV blend function · f851fcd0
      Nathan E. Egge authored
      Kendryte K230
      
      blend_w4_8bpc_c:       204.5 ( 1.00x)
      blend_w4_8bpc_rvv:      56.4 ( 3.62x)
      blend_w8_8bpc_c:       608.6 ( 1.00x)
      blend_w8_8bpc_rvv:      87.3 ( 6.97x)
      blend_w16_8bpc_c:     2363.8 ( 1.00x)
      blend_w16_8bpc_rvv:    225.1 (10.50x)
      blend_w32_8bpc_c:     5990.3 ( 1.00x)
      blend_w32_8bpc_rvv:    518.8 (11.55x)
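Plain `blend` takes a full per-pixel mask instead of a per-row or per-column weight. Below is a scalar sketch of that operation. It assumes the same 6-bit blend formula and that `tmp` and the mask are packed with a stride of `w`. `blend_sketch` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Weighted blend with a 6-bit weight m in 0..64. */
static inline uint8_t blend_px(int a, int b, int m)
{
    return (uint8_t)((a * (64 - m) + b * m + 32) >> 6);
}

/* blend: one weight per pixel, over the whole w x h block. */
static void blend_sketch(uint8_t *dst, ptrdiff_t stride, const uint8_t *tmp,
                         int w, int h, const uint8_t *mask)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = blend_px(dst[x], tmp[x], mask[x]);
        dst += stride;
        tmp += w;
        mask += w;
    }
}
```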
    • Bogdan Gligorijević's avatar
      Tone down loop to only 2 iterations · 848c5a2d
      Bogdan Gligorijević authored
      Benchmark pending
    • Bogdan Gligorijević's avatar
      Scalar dc calculation · a0a08d85
      Bogdan Gligorijević authored
      Current benchmark:
      
      - Kendryte K230:
      inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1729.4 ( 1.00x)
      inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:    153.2 (11.29x)
      
      - SpacemiT K1:
      inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1533.4 ( 1.00x)
      inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:    176.8 ( 8.67x)
    • Bogdan Gligorijević's avatar
      riscv64/itx: Special case 16x16 8bpc dct_dct eob=0 · c8749f06
      Bogdan Gligorijević authored
      Performance comparison:
      
      - SpacemiT K1:                             Master branch:       itx_16x16:
        inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1534.1 ( 1.00x)      1534.9 ( 1.00x)
        inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:   1173.6 ( 1.31x)       173.1 ( 8.87x)
      
      - Kendryte K230:                           Master branch:       itx_16x16:
        inv_txfm_add_16x16_dct_dct_0_8bpc_c:     1576.0 ( 1.00x)      1579.1 ( 1.00x)
        inv_txfm_add_16x16_dct_dct_0_8bpc_rvv:   1095.5 ( 1.44x)       146.8 (10.75x)
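With eob=0 in a DCT_DCT block, only the DC coefficient is non-zero, so the inverse transform produces the same value at every position. The whole 16x16 transform can therefore be replaced by adding one constant to the destination. The scalar sketch below follows the structure of dav1d's C reference DC-only path as I understand it. It scales the DC by 181/256 (about 1/sqrt(2)) once per pass, applies an intermediate rounding shift between passes, and folds in a final >> 4. The intermediate shift of 2 for 16x16 is an assumption, and `dc_only_add_16x16` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 8-bit pixel range. */
static inline uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* DC-only inverse 16x16 DCT_DCT plus add.
 * ASSUMPTION: intermediate shift of 2 between the two passes. */
static void dc_only_add_16x16(uint8_t *dst, ptrdiff_t stride, int16_t *coeff)
{
    const int shift = 2;
    const int rnd = (1 << shift) >> 1;
    int dc = coeff[0];
    coeff[0] = 0;                       /* clear the consumed coefficient */
    dc = (dc * 181 + 128) >> 8;         /* first pass: scale by ~1/sqrt(2) */
    dc = (dc + rnd) >> shift;           /* intermediate rounding shift */
    dc = (dc * 181 + 128 + 2048) >> 12; /* second pass plus final >> 4 */
    for (int y = 0; y < 16; y++, dst += stride)
        for (int x = 0; x < 16; x++)
            dst[x] = clip_u8(dst[x] + dc); /* same offset for every pixel */
}
```

For a DC coefficient of 1024, the offset works out to 8, so a pixel at 100 becomes 108 and a pixel at 250 clips to 255.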
    • Bogdan Gligorijević's avatar
      ipred_paeth · 0cdf1b4b
      Bogdan Gligorijević authored
      Benchmarks:
      - Kendryte K230:
      intra_pred_paeth_w4_8bpc_c:       412.9 ( 1.00x)
      intra_pred_paeth_w4_8bpc_rvv:     688.0 ( 0.60x)
      intra_pred_paeth_w8_8bpc_c:      1206.6 ( 1.00x)
      intra_pred_paeth_w8_8bpc_rvv:    1094.3 ( 1.10x)
      intra_pred_paeth_w16_8bpc_c:     3889.7 ( 1.00x)
      intra_pred_paeth_w16_8bpc_rvv:   1796.7 ( 2.16x)
      intra_pred_paeth_w32_8bpc_c:     9797.2 ( 1.00x)
      intra_pred_paeth_w32_8bpc_rvv:   4323.9 ( 2.27x)
      intra_pred_paeth_w64_8bpc_c:    24242.5 ( 1.00x)
      intra_pred_paeth_w64_8bpc_rvv:  10739.8 ( 2.26x)
      
      - Banana Pi BPI-F3:
      intra_pred_paeth_w4_8bpc_c:       395.1 ( 1.00x)
      intra_pred_paeth_w4_8bpc_rvv:     705.4 ( 0.56x)
      intra_pred_paeth_w8_8bpc_c:      1184.9 ( 1.00x)
      intra_pred_paeth_w8_8bpc_rvv:    1125.3 ( 1.05x)
      intra_pred_paeth_w16_8bpc_c:     3807.8 ( 1.00x)
      intra_pred_paeth_w16_8bpc_rvv:   1850.8 ( 2.06x)
      intra_pred_paeth_w32_8bpc_c:     9985.1 ( 1.00x)
      intra_pred_paeth_w32_8bpc_rvv:   2235.5 ( 4.47x)
      intra_pred_paeth_w64_8bpc_c:    24040.4 ( 1.00x)
      intra_pred_paeth_w64_8bpc_rvv:   5450.0 ( 4.41x)
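Paeth prediction chooses, for each pixel, whichever neighbour (left, top or top-left) is closest to the gradient estimate left + top - topleft. Ties are resolved in the order left, then top, then top-left. Below is a scalar sketch of that selection as I understand dav1d's C reference. `paeth_px` and `paeth_block` are hypothetical names, and for clarity the edge pixels are passed as separate arrays rather than in dav1d's combined edge buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Pick the neighbour closest to base = left + top - topleft. */
static inline uint8_t paeth_px(int left, int top, int topleft)
{
    const int base = left + top - topleft;
    const int ldiff = abs(left - base);     /* equals |top - topleft| */
    const int tdiff = abs(top - base);      /* equals |left - topleft| */
    const int tldiff = abs(topleft - base);
    if (ldiff <= tdiff && ldiff <= tldiff)
        return (uint8_t)left;
    return (uint8_t)(tdiff <= tldiff ? top : topleft);
}

/* Predict a w x h block from the top row, left column and corner pixel. */
static void paeth_block(uint8_t *dst, ptrdiff_t stride, const uint8_t *top,
                        const uint8_t *left, uint8_t topleft, int w, int h)
{
    for (int y = 0; y < h; y++, dst += stride)
        for (int x = 0; x < w; x++)
            dst[x] = paeth_px(left[y], top[x], topleft);
}
```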