User-submitted performance reports
Since there isn't really a place for people to submit dav1d performance reports, here's an issue that people can reply to with their reports.
Please use the following template:
- hardware:
- software:
- dav1d revision:
- dav1d command line:
- input file:
- framerate:
- (optional) perf report:
You can get a perf report by installing the `perf` package and running dav1d like `perf record ./dav1d ...`. Don't use framerate info from the same run though, as perf may slow things down a bit. You can then use `perf report` to get the relevant report information.
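The two-run workflow described above (one plain run for the framerate, a separate run under perf for the profile) could be scripted roughly like this. It is only a sketch: the helper names and the `./dav1d` default path are made up for illustration, the frame count has to be supplied by the caller, and `perf report --stdio` is used here simply to get plain-text output that can be pasted into a reply.

```python
import subprocess
import time


def timing_cmd(dav1d_args, binary="./dav1d"):
    """Command for the plain run whose wall-clock time gives the framerate."""
    return [binary, *dav1d_args]


def perf_record_cmd(dav1d_args, binary="./dav1d"):
    """Command for the separate profiling run (perf may slow decoding down)."""
    return ["perf", "record", binary, *dav1d_args]


def perf_report_cmd():
    """Command that prints the recorded profile as plain text."""
    return ["perf", "report", "--stdio"]


def run_benchmark(dav1d_args, frames):
    """Time one plain run, then profile a second run; return (fps, report)."""
    start = time.perf_counter()
    subprocess.run(timing_cmd(dav1d_args), check=True)
    fps = frames / (time.perf_counter() - start)

    # The profile comes from a second run so it does not skew the framerate.
    subprocess.run(perf_record_cmd(dav1d_args), check=True)
    report = subprocess.run(perf_report_cmd(), check=True,
                            capture_output=True, text=True).stdout
    return fps, report
```

Calling `run_benchmark(["-i", "input.ivf", "-o", "/dev/null", "--muxer", "yuv4mpeg2"], frames=8929)` would then return both values for the template; the file name here is a placeholder.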
- hardware: IBM Power System S822L (20x 2.061GHz POWER8E cores, 256GB RAM)
- software: Alpine Linux edge (2018/09/22), gcc 6.4.0
- dav1d revision: fbba7321 (2018/09/23) + local thread stack size increase patch
- dav1d command line: ./dav1d -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o f.y4m --framethreads 10 --tilethreads 2
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929 / 444,82 = ~20,1
- perf report:
# Overhead  Command  Shared Object  Symbol
# ........  .......  ........................  ............................................
#
31.05%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_8x8_c
11.77%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
6.93%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
6.02%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
5.49%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
2.59%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
2.12%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.0
1.65%  dav1d  libdav1d.so.0.0.1  [.] od_ec_decode_cdf_q15
1.52%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
1.41%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
1.22%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.4
1.18%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.constprop.6
1.15%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
1.10%  dav1d  ld-musl-powerpc64le.so.1  [.] memcpy
1.08%  dav1d  libdav1d.so.0.0.1  [.] decode_b
1.01%  dav1d  libdav1d.so.0.0.1  [.] av1_init_ref_mv_tile_row
1.00%  dav1d  libdav1d.so.0.0.1  [.] av1_find_mv_refs
0.96%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
0.94%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.5
0.92%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
0.89%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
0.81%  dav1d  ld-musl-powerpc64le.so.1  [.] memset
0.80%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.3
0.70%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.constprop.7
0.64%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
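To compare reports like the one above across machines, it can help to sum the overhead per function family (for example everything starting with `cdef_`). The following is a rough sketch, not part of dav1d or perf: the function names and the regular expression are assumptions based on the row layout seen in this thread, and the decimal-comma handling is there because some reports here were produced under a locale that prints `38,76%`.

```python
import re
from collections import defaultdict

# One row of `perf report` text: overhead, command, shared object, [type], symbol.
ROW = re.compile(r"^\s*(\d+[.,]\d+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def parse_rows(text):
    """Return (overhead_percent, symbol) pairs, skipping header lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            # Accept both "31.05%" and "31,05%".
            overhead = float(match.group(1).replace(",", "."))
            rows.append((overhead, match.group(5)))
    return rows


def share_by_prefix(rows, prefixes):
    """Sum overhead for symbols starting with each given prefix."""
    totals = defaultdict(float)
    for overhead, symbol in rows:
        for prefix in prefixes:
            if symbol.startswith(prefix):
                totals[prefix] += overhead
                break
    return dict(totals)


# Four rows taken from the POWER8 report above.
sample = """\
# Overhead  Command  Shared Object  Symbol
31.05%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_8x8_c
11.77%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
6.93%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
2.59%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
"""
rows = parse_rows(sample)
shares = share_by_prefix(rows, ["cdef_", "prep_"])
print(shares)
```

Prefix matching is a crude grouping: it would miss symbols such as `dav1d_cdef_brow_8bpc`, so the prefix list needs adjusting to whatever naming the build in question uses.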
Edited by Shiz
- Contributor
- hardware: AMD Threadripper 2990WX, 64GB RAM
- dav1d revision: 462204ab
- dav1d command line: dav1d -i ~/Videos/Chimera/Chimera-AV1-10bit-1920x1080-6191kbps.ivf --framethreads 32 --tilethreads 2 -o /dev/null.y4m
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 34.8
- (optional) perf report:
28.68%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_8x8_c
15.36%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
11.31%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
5.36%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
4.97%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
2.90%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
2.46%  dav1d  libdav1d.so.0.0.1  [.] decode_b
2.44%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
2.08%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.0
2.00%  dav1d  libdav1d.so.0.0.1  [.] selfguided_c
1.48%  dav1d  libc-2.27.so  [.] __memcpy_ssse3
1.42%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_symbol
1.29%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.8
1.01%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
1.00%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.6
0.98%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
0.84%  dav1d  libdav1d.so.0.0.1  [.] av1_fill_motion_field
0.83%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.9
0.80%  dav1d  libdav1d.so.0.0.1  [.] av1_find_mv_refs
0.78%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_16bpc
0.75%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
0.68%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
0.60%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.11
0.56%  dav1d  libdav1d.so.0.0.1  [.] inv_dct32_1d
0.48%  dav1d  libdav1d.so.0.0.1  [.] blend_c
0.45%  dav1d  libdav1d.so.0.0.1  [.] dav1d_loopfilter_sbrow_16bpc
0.42%  dav1d  libdav1d.so.0.0.1  [.] w_mask_420_c
0.38%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
0.37%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.5
0.35%  dav1d  libdav1d.so.0.0.1  [.] recon_b_inter_16bpc
0.34%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.7
0.28%  dav1d  libdav1d.so.0.0.1  [.] inv_dct4_1d
0.28%  dav1d  libdav1d.so.0.0.1  [.] read_coef_blocks_16bpc
0.28%  dav1d  libdav1d.so.0.0.1  [.] recon_b_intra_16bpc
0.27%  dav1d  libdav1d.so.0.0.1  [.] decode_sb
0.27%  dav1d  libdav1d.so.0.0.1  [.] inv_adst8_1d
0.26%  dav1d  libdav1d.so.0.0.1  [.] av1_find_ref_mvs
0.24%  dav1d  libdav1d.so.0.0.1  [.] mask_edges_inter
0.23%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8_c
0.23%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
0.23%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.10
0.23%  dav1d  libdav1d.so.0.0.1  [.] avg_c
0.20%  dav1d  libdav1d.so.0.0.1  [.] prepare_intra_edges_16bpc
0.20%  dav1d  libdav1d.so.0.0.1  [.] inv_dct64_1d
0.18%  dav1d  libdav1d.so.0.0.1  [.] prep_c
0.17%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3
0.17%  dav1d  libdav1d.so.0.0.1  [.] add_ref_mv_candidate.isra.18
0.16%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_intra
0.15%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8t_c
0.15%  dav1d  libdav1d.so.0.0.1  [.] mc.isra.4
0.14%  dav1d  libdav1d.so.0.0.1  [.] mask_c
How do we get the "perf report"?
- hardware: Intel Xeon E5-2650 v4, 256GB RAM
- software: Alpine Linux edge (2018/09/23), gcc 6.4.0
- dav1d revision: 0b7be94f (2018/09/23) + local thread stack size increase patch
- dav1d command line: ./dav1d -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o f.y4m --framethreads 24 --tilethreads 2
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929 / 520 = ~17,5
- perf report:
# Overhead  Command  Shared Object  Symbol
# ........  .......  ...................  ..........................................
#
27.06%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_8x8_c
11.66%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
8.15%  dav1d  ld-musl-x86_64.so.1  [.] memcpy
6.34%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
5.36%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
5.22%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
2.55%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
2.44%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.0
2.06%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
1.63%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
1.53%  dav1d  libdav1d.so.0.0.1  [.] od_ec_decode_cdf_q15
1.47%  dav1d  libdav1d.so.0.0.1  [.] decode_b
1.27%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.constprop.6
1.20%  dav1d  libdav1d.so.0.0.1  [.] av1_find_mv_refs
1.16%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
1.11%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
1.04%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.4
1.03%  dav1d  libdav1d.so.0.0.1  [.] av1_init_ref_mv_tile_row
0.90%  dav1d  ld-musl-x86_64.so.1  [.] memset
0.85%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
0.78%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.constprop.7
0.73%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
0.69%  dav1d  libdav1d.so.0.0.1  [.] dav1d_loopfilter_sbrow_8bpc
0.65%  dav1d  libdav1d.so.0.0.1  [.] avg_c
0.61%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.3
- Shiz changed the description
- Maintainer
- hardware: RK3399 aka OP1 2xCortex-A72, 4xCortex-A53
- software: ChromeOS Linux container
- dav1d revision: fd120ba (2018-09-23)
- dav1d command line: time ./dav1d -i ~/samples/Chimera-AV1-8bit-1920x1080-6736kbps.ivf --framethreads 3 --tilethreads 3 --muxer yuv4mpeg2 -o /dev/null
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929 / 945 = 9.45
- perf report:
Overhead  Command  Shared Object  Symbol
20.86%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_8x8_c
9.42%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
9.24%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
9.08%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
5.13%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
2.62%  dav1d  libc-2.24.so  [.] memcpy
2.33%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
2.28%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.0
1.96%  dav1d  libdav1d.so.0.0.1  [.] decode_b
1.75%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.4
1.71%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.6
1.70%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
1.65%  dav1d  libdav1d.so.0.0.1  [.] od_ec_decode_cdf_q15
1.46%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
1.44%  dav1d  libdav1d.so.0.0.1  [.] av1_init_ref_mv_tile_row
1.40%  dav1d  libdav1d.so.0.0.1  [.] av1_find_mv_refs
1.25%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
1.11%  dav1d  libc-2.24.so  [.] __GI_memset
1.04%  dav1d  libdav1d.so.0.0.1  [.] blend_c
1.03%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.7
0.93%  dav1d  libdav1d.so.0.0.1  [.] avg_c
0.88%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
0.88%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
0.86%  dav1d  libdav1d.so.0.0.1  [.] dav1d_loopfilter_sbrow_8bpc
0.80%  dav1d  libdav1d.so.0.0.1  [.] prep_c
0.80%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
0.78%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.3
0.77%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.11
0.76%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.2
0.68%  dav1d  libdav1d.so.0.0.1  [.] recon_b_inter_8bpc
0.65%  dav1d  libdav1d.so.0.0.1  [.] decode_sb
0.63%  dav1d  libdav1d.so.0.0.1  [.] recon_b_intra_8bpc
0.62%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
0.61%  dav1d  libdav1d.so.0.0.1  [.] inv_dct32_1d
0.58%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.3
0.50%  dav1d  libdav1d.so.0.0.1  [.] w_mask_c.constprop.1
0.50%  dav1d  libdav1d.so.0.0.1  [.] prepare_intra_edges_8bpc
0.48%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.5
0.47%  dav1d  libdav1d.so.0.0.1  [.] inv_adst8_1d
0.46%  dav1d  libdav1d.so.0.0.1  [.] read_coef_blocks_8bpc
0.42%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.10
0.39%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8_c
0.38%  dav1d  libdav1d.so.0.0.1  [.] inv_dct4_1d
0.34%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.1
0.30%  dav1d  libdav1d.so.0.0.1  [.] mask_c
0.29%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.constprop.1
0.28%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.constprop.0
0.28%  dav1d  libdav1d.so.0.0.1  [.] mask_edges_inter
0.27%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3
0.26%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
- hardware: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4C/8T) + 16 GiB RAM
- software: Fedora Linux 28, GCC 8.1.1
- dav1d revision: 5e05e657 (From: https://copr.fedorainfracloud.org/coprs/eclipseo/dav1d/build/801862/)
- dav1d command line: dav1d --framethreads 8 --tilethreads 2 -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o Chimera-AV1-8bit-1920x1080-6736kbps.y4m
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929 / 403.36 = 22.137 fps
- perf report:
38,76%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c
11,00%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
7,05%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
5,37%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
5,31%  dav1d  libdav1d.so.0.0.1  [.] loopfilter.c
4,20%  dav1d  libdav1d.so.0.0.1  [.] cdef.c
2,56%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3
2,23%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.2
2,17%  dav1d  libdav1d.so.0.0.1  [.] decode_b
1,61%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
1,06%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
1,05%  dav1d  libdav1d.so.0.0.1  [.] od_ec_decode_cdf_q15
1,01%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c
0,77%  dav1d  libdav1d.so.0.0.1  [.] od_ec_dec_normalize
0,76%  dav1d  libdav1d.so.0.0.1  [.] motion_field_projection
0,73%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
0,72%  dav1d  libc-2.27.so  [.] __memmove_avx_unaligned_erms
0,71%  dav1d  libdav1d.so.0.0.1  [.] avg_c
0,59%  dav1d  libdav1d.so.0.0.1  [.] blend_c
0,59%  dav1d  libdav1d.so.0.0.1  [.] av1_find_mv_refs
0,52%  dav1d  libdav1d.so.0.0.1  [.] lf_apply.c
0,51%  dav1d  libdav1d.so.0.0.1  [.] inv_dct32_1d
0,50%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
0,48%  dav1d  libdav1d.so.0.0.1  [.] looprestoration.c
0,46%  dav1d  libdav1d.so.0.0.1  [.] mc.c
0,42%  dav1d  libdav1d.so.0.0.1  [.] od_ec_decode_bool_q15
0,41%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
0,39%  dav1d  libdav1d.so.0.0.1  [.] recon_b_inter_8bpc
0,38%  dav1d  libdav1d.so.0.0.1  [.] read_coef_blocks_8bpc
0,37%  dav1d  libdav1d.so.0.0.1  [.] recon_b_intra_8bpc
0,37%  dav1d  libdav1d.so.0.0.1  [.] add_tpl_ref_mv
0,34%  dav1d  libdav1d.so.0.0.1  [.] inv_adst8_1d
0,33%  dav1d  libdav1d.so.0.0.1  [.] w_mask_c
0,31%  dav1d  libdav1d.so.0.0.1  [.] ref_mvs.c
0,31%  dav1d  libdav1d.so.0.0.1  [.] restore2x8
0,27%  dav1d  libdav1d.so.0.0.1  [.] cdef_apply.c
0,26%  dav1d  libdav1d.so.0.0.1  [.] decode_sb
0,25%  dav1d  libdav1d.so.0.0.1  [.] av1_find_ref_mvs
0,24%  dav1d  libdav1d.so.0.0.1  [.] ipred_prepare.c
0,24%  dav1d  libdav1d.so.0.0.1  [.] itx.c
0,23%  dav1d  libdav1d.so.0.0.1  [.] add_ref_mv_candidate.isra.18
0,23%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8_c
0,20%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
0,18%  dav1d  libdav1d.so.0.0.1  [.] read_coef_tree
0,18%  dav1d  libdav1d.so.0.0.1  [.] mc.isra.4
0,18%  dav1d  libdav1d.so.0.0.1  [.] lf_mask.c
0,17%  dav1d  libdav1d.so.0.0.1  [.] mask_c
0,16%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8t_c
0,16%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_intra
0,16%  dav1d  libdav1d.so.0.0.1  [.] inv_dct64_1d
Edited by Robert-André Mauchin
- Jean-Baptiste Kempf added performance label
- Reporter
- hardware: AMD RYZEN 7 2700X 8-Core 3.7 GHz (8C/16T), 16GB DDR4
- software: Windows 10 build 17134
- dav1d revision: dav1d-0.0.1-37-9075f0ee (SmilingWolf's build)
- dav1d command line: .\dav1d.exe -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o out.y4m --framethreads 8 --tilethreads 2
- input file: Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 48.3 fps
deab2534 update: 52.2 fps on the same setup, 54.2 fps with `--framethreads 16 --tilethreads 2`.
ba789ebf: 69.3 fps with `--framethreads 16 --tilethreads 2`.
bfd16f58 (!253 (merged)): 116.7 fps on Chimera. 133.5 fps on Oliver in 1080p.
480p: Oliver decodes at 443.2 fps, Mountain Bike at 351.5 fps.
720p: 261.5 fps for Oliver, 211.1 fps for Mountain Bike.
Adding more frame threads can bump performance up a bit (~11% from 16 to 100 with 720p Mountain Bike).
Using `--muxer null -o -` bumps Chimera up to about 226.5 fps.
Edited by Raphaël Zumer
- Contributor
Not directly comparable to my previous run as that one accidentally used a 10 bit video.
- hardware: AMD Threadripper 2990WX, 64GB RAM
- dav1d revision: dd576607
- dav1d command line: ./dav1d -i ~/Videos/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --muxer yuv --framethreads 16 --tilethreads 2
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 48.7 fps
libaom gives 53.6 fps.
- hardware: HP 15-AF116NT AMD A6-6310 APU 4GB, 4 cores, 4 threads (more info on https://support.hp.com/ie-en/document/c05055077#AbT0)
- software: Ubuntu 18.04 amd64, gcc 7.3.0
- dav1d revision: 73d5a46c
- dav1d command line: ./dav1d -i ../../Chimera-AV1-8bit-1920x1080-6736kbps.ivf --framethreads 2 --tilethreads 2 -o Chimera-AV1-8bit-1920x1080-6736kbps.y4m
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929/1319,831=6,77
- (optional) perf report:
Overhead Command Shared Object Symbol
........ ....... .................. ...............................................
21.46%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c
9.73%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
8.40%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
7.92%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
7.14%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
3.19%  dav1d  libdav1d.so.0.0.1  [.] decode_b
2.86%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.0
2.37%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
1.76%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
1.69%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_symbol
1.65%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.6
1.46%  dav1d  libdav1d.so.0.0.1  [.] setup_ref_mv_list
1.40%  dav1d  libc-2.27.so  [.] __memcpy_ssse3
1.33%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
1.13%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.4
1.08%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
1.02%  dav1d  libdav1d.so.0.0.1  [.] av1_init_ref_mv_tile_row
1.02%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.7
0.92%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.15
0.87%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
0.77%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.2
0.76%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
0.76%  dav1d  libdav1d.so.0.0.1  [.] read_coef_blocks_8bpc
0.75%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
0.73%  dav1d  libdav1d.so.0.0.1  [.] avg_c
0.67%  dav1d  libdav1d.so.0.0.1  [.] blend_c
0.63%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.3
0.62%  dav1d  libdav1d.so.0.0.1  [.] recon_b_inter_8bpc
0.56%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
0.55%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.3
0.53%  dav1d  libdav1d.so.0.0.1  [.] inv_dct32_1d
0.53%  dav1d  libdav1d.so.0.0.1  [.] av1_find_ref_mvs
0.50%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.14
0.47%  dav1d  libdav1d.so.0.0.1  [.] mask_edges_inter
0.46%  dav1d  libdav1d.so.0.0.1  [.] prep_c
0.46%  dav1d  libdav1d.so.0.0.1  [.] dav1d_loopfilter_sbrow_8bpc
0.40%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
0.38%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8_c
0.37%  dav1d  libdav1d.so.0.0.1  [.] inv_adst8_1d
0.36%  dav1d  libdav1d.so.0.0.1  [.] w_mask_c.constprop.1
0.36%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.5
0.36%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.1
0.33%  dav1d  libdav1d.so.0.0.1  [.] recon_b_intra_8bpc
0.31%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0.constprop.2
0.31%  dav1d  libdav1d.so.0.0.1  [.] decode_sb
0.31%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0.constprop.1
0.30%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_intra
0.30%  dav1d  libdav1d.so.0.0.1  [.] add_ref_mv_candidate.isra.16
- hardware: HP Compaq 6730s with Intel Celeron T1600 CPU 1.66 GHz dual core, 2 threads, 3 GB RAM (more info on https://support.hp.com/us-en/document/c01531960)
- software: Ubuntu 18.04 amd64, gcc 7.3.0
- dav1d revision: 73d5a46c
- dav1d command line: ./dav1d -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf --framethreads 1 --tilethreads 1 -o Chimera-AV1-8bit-1920x1080-6736kbps-2.y4m
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929/((48×60)+28,704)= 3,07
- (optional) perf report:
Overhead Command Shared Object Symbol
........ ....... .................. ...............................................
24.09%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c
10.34%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
8.84%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
7.41%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
6.42%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
2.95%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
2.83%  dav1d  libdav1d.so.0.0.1  [.] decode_b
2.77%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.0
2.03%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
1.90%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.6
1.50%  dav1d  libc-2.27.so  [.] __memcpy_ssse3
1.50%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_symbol
1.49%  dav1d  libdav1d.so.0.0.1  [.] setup_ref_mv_list
1.28%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.7
1.17%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
1.06%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.4
0.96%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
0.89%  dav1d  libdav1d.so.0.0.1  [.] recon_b_inter_8bpc
0.87%  dav1d  libdav1d.so.0.0.1  [.] avg_c
0.79%  dav1d  libdav1d.so.0.0.1  [.] av1_init_ref_mv_tile_row
0.77%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
0.74%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
0.72%  dav1d  libdav1d.so.0.0.1  [.] blend_c
0.71%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
0.70%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.3
0.66%  dav1d  libdav1d.so.0.0.1  [.] av1_find_ref_mvs
0.57%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.3
0.55%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.15
0.51%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
0.49%  dav1d  libdav1d.so.0.0.1  [.] inv_dct32_1d
0.49%  dav1d  libdav1d.so.0.0.1  [.] recon_b_intra_8bpc
0.47%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.2
0.42%  dav1d  libdav1d.so.0.0.1  [.] mask_edges_inter
0.41%  dav1d  libdav1d.so.0.0.1  [.] dav1d_loopfilter_sbrow_8bpc
0.40%  dav1d  libdav1d.so.0.0.1  [.] w_mask_c.constprop.1
0.39%  dav1d  [kernel.kallsyms]  [k] copy_user_generic_unrolled
0.38%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.5
0.36%  dav1d  libdav1d.so.0.0.1  [.] prep_c
0.35%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
0.34%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.1
0.33%  dav1d  libdav1d.so.0.0.1  [.] inv_adst8_1d
0.33%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.14
0.31%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0.constprop.2
0.29%  dav1d  libdav1d.so.0.0.1  [.] prepare_intra_edges_8bpc
0.27%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0.constprop.1
0.27%  dav1d  libdav1d.so.0.0.1  [.] add_ref_mv_candidate.isra.16
0.25%  dav1d  libdav1d.so.0.0.1  [.] decode_sb
0.24%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_intra
- hardware: Intel Pentium J3160, 8GB RAM
- software: Fedora 28 (latest updates)
- dav1d revision: 858689e1
- dav1d command line: dav1d -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --muxer yuv4mpeg2 --framethreads [2|4] --tilethreads [1|2|4]
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 2FT/2TT=5.5 fps, 4FT/1TT=6.6 fps, 4FT/2TT=8.2 fps, 4FT/4TT=7.97 fps (aomdec, d11a4b7d1, averaged 12.9 fps with `--threads=8`)
- perf report:
25.15%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c
10.85%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
10.37%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
7.99%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
6.35%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
3.04%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
2.69%  dav1d  libdav1d.so.0.0.1  [.] decode_b
2.60%  dav1d  libdav1d.so.0.0.1  [.] loop_filter.constprop.0
2.02%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
1.78%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
1.63%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.6
1.41%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_symbol
1.41%  dav1d  libdav1d.so.0.0.1  [.] av1_find_mv_refs
1.35%  dav1d  libc-2.27.so  [.] __memmove_sse2_unaligned_erms
1.13%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
1.12%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.6
1.11%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3.constprop.7
0.96%  dav1d  libdav1d.so.0.0.1  [.] av1_fill_motion_field
0.85%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
0.81%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
0.76%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
0.71%  dav1d  libdav1d.so.0.0.1  [.] blend_c
0.62%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c.constprop.15
0.58%  dav1d  libdav1d.so.0.0.1  [.] recon_b_inter_8bpc
0.58%  dav1d  libdav1d.so.0.0.1  [.] read_coef_blocks_8bpc
0.55%  dav1d  libdav1d.so.0.0.1  [.] selfguided_c
0.54%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.5
0.54%  dav1d  libdav1d.so.0.0.1  [.] avg_c
0.54%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
0.51%  dav1d  libdav1d.so.0.0.1  [.] inv_dct32_1d
0.51%  dav1d  libdav1d.so.0.0.1  [.] dav1d_loopfilter_sbrow_8bpc
Edited by Gideon Mayhak
- hardware: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4C/8T) + 16 GiB RAM
- software: Fedora Linux 29 Beta, GCC 8.2.1
- dav1d revision: da97ba3f (From: https://copr.fedorainfracloud.org/coprs/eclipseo/dav1d/build/805946/)
- dav1d command line: dav1d --framethreads 8 --tilethreads 2 -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o Chimera-AV1-8bit-1920x1080-6736kbps.y4m
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929 / 319.16 = 27.977 fps
- perf report:
39,07%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c
14,88%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
7,54%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
6,03%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
3,69%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3
3,14%  dav1d  libdav1d.so.0.0.1  [.] decode_b
2,30%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
1,53%  dav1d  libdav1d.so.0.0.1  [.] od_ec_decode_cdf_q15
1,43%  dav1d  libdav1d.so.0.0.1  [.] filter_intra_c
1,13%  dav1d  libdav1d.so.0.0.1  [.] motion_field_projection
1,11%  dav1d  libdav1d.so.0.0.1  [.] od_ec_dec_normalize
0,85%  dav1d  libc-2.28.so  [.] __memmove_avx_unaligned_erms
0,85%  dav1d  libdav1d.so.0.0.1  [.] av1_find_mv_refs
0,84%  dav1d  libdav1d.so.0.0.1  [.] blend_c
0,75%  dav1d  libdav1d.so.0.0.1  [.] dav1d_loopfilter_sbrow_8bpc
0,65%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0
0,63%  dav1d  libdav1d.so.0.0.1  [.] od_ec_decode_bool_q15
0,61%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prep_8tap_avx2.hv_w8_loop
0,60%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
0,54%  dav1d  libdav1d.so.0.0.1  [.] dav1d_read_coef_blocks_8bpc
0,53%  dav1d  libdav1d.so.0.0.1  [.] add_tpl_ref_mv
0,51%  dav1d  libdav1d.so.0.0.1  [.] dav1d_recon_b_intra_8bpc
0,45%  dav1d  libdav1d.so.0.0.1  [.] get_mv_projection.isra.11
0,45%  dav1d  libdav1d.so.0.0.1  [.] dav1d_put_8tap_avx2.hv_w8_loop
0,44%  dav1d  libdav1d.so.0.0.1  [.] dav1d_recon_b_inter_8bpc
0,42%  dav1d  libdav1d.so.0.0.1  [.] backup2x8
0,40%  dav1d  libdav1d.so.0.0.1  [.] av1_find_ref_mvs
0,37%  dav1d  libdav1d.so.0.0.1  [.] decode_sb
0,37%  dav1d  libdav1d.so.0.0.1  [.] restore2x8
0,34%  dav1d  libdav1d.so.0.0.1  [.] add_ref_mv_candidate.isra.18
0,34%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8_c
0,33%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prepare_intra_edges_8bpc
0,29%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
0,28%  dav1d  libdav1d.so.0.0.1  [.] mc.isra.4
0,26%  dav1d  libdav1d.so.0.0.1  [.] read_coef_tree
0,26%  dav1d  libdav1d.so.0.0.1  [.] decomp_tx
0,23%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_intra
0,22%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8t_c
0,21%  dav1d  libdav1d.so.0.0.1  [.] mask_edges_inter
0,21%  dav1d  libdav1d.so.0.0.1  [.] smooth_c
0,20%  dav1d  libc-2.28.so  [.] __memset_avx2_unaligned_erms
0,17%  dav1d  libdav1d.so.0.0.1  [.] scan_row_mbmi.isra.20
0,15%  dav1d  libdav1d.so.0.0.1  [.] obmc
0,15%  dav1d  libdav1d.so.0.0.1  [.] scan_col_mbmi.isra.21
0,13%  dav1d  libdav1d.so.0.0.1  [.] dav1d_put_8tap_avx2.hv_w8_loop0
0,12%  dav1d  libdav1d.so.0.0.1  [.] od_ec_dec_refill
0,11%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prep_8tap_avx2.hv_w8_loop0
0,10%  dav1d  libdav1d.so.0.0.1  [.] v_c
0,10%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prep_8tap_avx2.h_loop
0,09%  dav1d  libdav1d.so.0.0.1  [.] dav1d_thread_picture_wait
Impressive 6 fps gain! Keep up the good work!
- hardware: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz + 8GB DDR3 RAM
- software: Arch Linux
- dav1d revision: acd90b71
- dav1d command line: dav1d -i red.ivf -o /dev/null --framethreads=4 --muxer yuv4mpeg2
- input file: RED HDR Reel 1080p (youtube-dl -f 399 https://www.youtube.com/watch?v=PiWyCQV52h0)
- framerate: 2380/(1m6.543s)=35.77fps
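Most framerates in this thread are worked out by hand as frame count divided by elapsed time, with the time written in several notations (`1m6.543s`, `(48×60)+28,704`, `444,82`). A small helper along these lines could do that arithmetic consistently. It is a sketch only: the function names are hypothetical, and the `48m28,704s` spelling in the usage note is just one way of writing the 48 minutes 28,704 seconds reported earlier.

```python
import re


def parse_duration(text):
    """Convert '1m6.543s', '48m28,704s' or '444,82' to seconds."""
    text = text.strip().replace(",", ".")
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s?", text)
    if not match:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def framerate(frames, duration):
    """Frames decoded divided by elapsed wall-clock seconds."""
    return frames / parse_duration(duration)
```

For example, `round(framerate(2380, "1m6.543s"), 2)` reproduces the 35.77 fps above, and `round(framerate(8929, "48m28,704s"), 2)` reproduces the 3,07 fps of the Celeron T1600 report.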
- Owner
Testing all your past samples again would be very useful, now that some CDEF asm got merged.
- Maintainer
cdef avx2 simd is not completely merged and !253 (merged) is more important.
- Reporter
I edited my previous report with a new measurement on the current master version and the current state of !253 (merged).
Edited by Raphaël Zumer
- Owner
!253 (merged) is merged :)
- Contributor
- hardware: AMD Threadripper 2990WX, 64GB RAM
- dav1d revision: 46a3fd20
- dav1d command line: ./dav1d -i ~/Videos/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --muxer yuv --framethreads 16 --tilethreads 2
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 139.4 fps
On a whim I also tried out maxing the number of frame threads.
- dav1d command line: ./dav1d -i ~/Videos/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --muxer yuv --framethreads 256 --tilethreads 4
- framerate: 322.5 fps
EDIT: ran wrong binary, fps numbers updated.
Edited by Thomas Daede
- Owner
framerate: 139.4 fps
It was 48fps in your post, one month ago. :)
- hardware: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4C/8T) + 16 GiB RAM
- software: Fedora Linux 29, GCC 8.2.1
- dav1d revision: 8fbd87e5 (From: https://copr.fedorainfracloud.org/coprs/eclipseo/dav1d/build/816870/)
- dav1d command line: dav1d --framethreads 8 --tilethreads 2 -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o Chimera-AV1-8bit-1920x1080-6736kbps.y4m
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929 / 99.38 = 89.847 fps
- perf report:
10,88%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.3
9,59%  dav1d  libdav1d.so.0.0.1  [.] decode_b
6,38%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
5,56%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_symbol
4,69%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_filter_8x8_avx2.k_loop
3,50%  dav1d  libdav1d.so.0.0.1  [.] av1_find_ref_mvs
3,36%  dav1d  libdav1d.so.0.0.1  [.] motion_field_projection
2,81%  dav1d  libdav1d.so.0.0.1  [.] od_ec_decode_bool_q15
2,76%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_filter_4x4_avx2.k_loop
2,64%  dav1d  libdav1d.so.0.0.1  [.] blend_c
2,11%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0
1,84%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prep_8tap_avx2.hv_w8_loop
1,66%  dav1d  libdav1d.so.0.0.1  [.] dav1d_read_coef_blocks_8bpc
1,54%  dav1d  libdav1d.so.0.0.1  [.] add_tpl_ref_mv
1,51%  dav1d  libdav1d.so.0.0.1  [.] dav1d_recon_b_intra_8bpc
1,48%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_dir_avx2
1,38%  dav1d  libc-2.28.so  [.] __memmove_avx_unaligned_erms
1,35%  dav1d  libdav1d.so.0.0.1  [.] get_mv_projection.isra.15
1,34%  dav1d  libdav1d.so.0.0.1  [.] dav1d_put_8tap_avx2.hv_w8_loop
1,29%  dav1d  libdav1d.so.0.0.1  [.] dav1d_recon_b_inter_8bpc
1,21%  dav1d  libdav1d.so.0.0.1  [.] dav1d_lpf_h_sb_y_avx2.loop
1,15%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
1,07%  dav1d  libdav1d.so.0.0.1  [.] decode_sb
1,04%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8_c
0,98%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prepare_intra_edges_8bpc
0,98%  dav1d  libdav1d.so.0.0.1  [.] add_ref_mv_candidate.isra.16
0,97%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
0,93%  dav1d  libdav1d.so.0.0.1  [.] dav1d_wiener_filter_h_avx2.main_loop
0,91%  dav1d  libdav1d.so.0.0.1  [.] mc.isra.4
0,84%  dav1d  libdav1d.so.0.0.1  [.] dav1d_wiener_filter_v_avx2.loop
0,81%  dav1d  libdav1d.so.0.0.1  [.] mask_edges_inter
0,79%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_intra
0,78%  dav1d  libdav1d.so.0.0.1  [.] decomp_tx
0,78%  dav1d  libdav1d.so.0.0.1  [.] read_coef_tree
0,77%  dav1d  libdav1d.so.0.0.1  [.] backup2x8
0,71%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8t_c
0,59%  dav1d  libc-2.28.so  [.] __memset_avx2_unaligned_erms
0,57%  dav1d  libdav1d.so.0.0.1  [.] dav1d_lpf_v_sb_y_avx2.loop
0,55%  dav1d  libdav1d.so.0.0.1  [.] dav1d_lpf_h_sb_uv_avx2.loop
0,51%  dav1d  libdav1d.so.0.0.1  [.] scan_row_mbmi.isra.17
0,45%  dav1d  libdav1d.so.0.0.1  [.] od_ec_dec_refill
0,44%  dav1d  libdav1d.so.0.0.1  [.] obmc
0,43%  dav1d  libdav1d.so.0.0.1  [.] scan_col_mbmi.isra.18
0,40%  dav1d  libdav1d.so.0.0.1  [.] dav1d_lpf_h_sb_y_avx2.no_flat16
0,36%  dav1d  libdav1d.so.0.0.1  [.] dav1d_put_8tap_avx2.hv_w8_loop0
0,32%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prep_8tap_avx2.hv_w8_loop0
0,31%  dav1d  libdav1d.so.0.0.1  [.] dav1d_loopfilter_sbrow_8bpc
0,30%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prep_8tap_avx2.h_loop
0,28%  dav1d  libdav1d.so.0.0.1  [.] selfguided_c
0,25%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_filter_4x4_avx2.bottom_done
3 times the perf. Next time I'll test with a 2160p version.
- Contributor
All the used files and results: https://drive.google.com/drive/folders/1VVwyimg3IaHv-H6V1tffNQFHbFMGVVsD?usp=sharing
- hardware: HP ZBook 15 G2 (Core i7-4710MQ, 8GB DDR3)
- software: Windows 10 64-bit (Version 1809, Build 17763.55)
- dav1d revision: 2018-09-30 (1c9c2534), 2018-10-23 (5cf1cf1f) and 2018-10-30 (8cef1efc)
Decoding speed in frames per second (ft = frame threads, tt = tile threads):

| Build date | Version | 8-bit / ft 8, tt 2 | 8-bit / ft 16, tt 4 | 10-bit / ft 8, tt 2 | 10-bit / ft 16, tt 4 |
|---|---|---|---|---|---|
| 2018-09-30 | 1c9c2534 | 32,15 | 32,55 | 24,02 | 24,20 |
| 2018-10-23 | 5cf1cf1f | 40,02 | 40,85 | 24,45 | 24,62 |
| 2018-10-30 | 8cef1efc | 97,74 | 103,80 | 24,07 | 24,25 |

I tested one of today's builds against builds from 7 and 30 days ago on Windows. 8-bit decoding performance more than tripled; 10-bit showed no improvement beyond the margin of error.
ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\Chimera-8b-1080p.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\Chimera-10b-1080p.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\Chimera-10b-1080p.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\Chimera-8b-1080p.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\Chimera-8b-1080p.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\Chimera-10b-1080p.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\Chimera-10b-1080p.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\Chimera-8b-1080p.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\Chimera-8b-1080p.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\Chimera-10b-1080p.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 60 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\Chimera-10b-1080p.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME
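A batch file like the one above can also be generated instead of typed by hand. The following is only a minimal sketch: the function name `benchmark_commands` is hypothetical, and the install paths, file names and thread settings are simply the ones that appear in the commands above. It produces one dav1d command line for every combination of build, input file and thread setting.

```python
from itertools import product

# Values copied from the commands above; the function name is hypothetical.
BUILDS = ["0930", "1023", "1030"]
FILES = [r"C:\Chimera-8b-1080p.ivf", r"C:\Chimera-10b-1080p.ivf"]
THREADS = [(8, 2), (16, 4)]  # (frame threads, tile threads)


def benchmark_commands(builds=BUILDS, files=FILES, threads=THREADS):
    """Return one dav1d command line per build/file/thread combination."""
    commands = []
    for build, path, (ft, tt) in product(builds, files, threads):
        exe = rf"C:\dav1d_install_{build}\bin\dav1d"
        commands.append(
            f"{exe} -i {path} -o NUL.y4m --framethreads {ft} --tilethreads {tt}"
        )
    return commands


if __name__ == "__main__":
    # Print the commands; wrapping each one in timing calls is left to the caller.
    for line in benchmark_commands():
        print(line)
```

Generating the list keeps the thread settings and paths consistent across builds, which makes it harder to leave out a combination by accident.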
Edited by Ewout ter Hoeven - Contributor
Also ran some 4k and 5k files (same machine). They have an extremely low bit-rate (a 32 hour mistake in ffmpeg encoding parameters...), so this enables relatively high performance. Files are added to the Google Drive link above.
| Build date | Version | 4k / ft 8, tt 2 | 4k / ft 16, tt 4 | 5k / ft 8, tt 2 | 5k / ft 16, tt 4 |
|---|---|---|---|---|---|
| 2018-09-30 | 1c9c2534 | 28,39 | 29,64 | 17,06 | 17,83 |
| 2018-10-23 | 5cf1cf1f | 32,64 | 34,35 | 18,85 | 20,05 |
| 2018-10-30 | 8cef1efc | 60,42 | 59,37 | 35,52 | 36,20 |

ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\av1_test_Morocco8K_3840.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\av1_test_Morocco8K_3840.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\av1_test_Morocco8K_5120.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\av1_test_Morocco8K_5120.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\av1_test_Morocco8K_3840.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\av1_test_Morocco8K_3840.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\av1_test_Morocco8K_5120.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\av1_test_Morocco8K_5120.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\av1_test_Morocco8K_3840.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\av1_test_Morocco8K_3840.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\av1_test_Morocco8K_5120.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\av1_test_Morocco8K_5120.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME
Edited by Ewout ter Hoeven - Contributor
Decoded a higher bitrate 4K video from Elecard (Summer Nature). Files are in the Drive. Direct link to .ivf.
| Build date | Version | 4k / ft 8, tt 2 | 4k / ft 16, tt 4 |
|---|---|---|---|
| 2018-09-30 | 1c9c2534 | 13,85 | 14,46 |
| 2018-10-23 | 5cf1cf1f | 18,33 | 18,54 |
| 2018-10-30 | 8cef1efc | 29,44 | 31,09 |

ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\AV1_4k_23mbps.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_0930\bin\dav1d -i C:\AV1_4k_23mbps.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\AV1_4k_23mbps.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1023\bin\dav1d -i C:\AV1_4k_23mbps.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\AV1_4k_23mbps.ivf -o NUL.y4m --framethreads 8 --tilethreads 2 & ECHO:| TIME & sleep 40 & ^
ECHO:| TIME & C:\dav1d_install_1030\bin\dav1d -i C:\AV1_4k_23mbps.ivf -o NUL.y4m --framethreads 16 --tilethreads 4 & ECHO:| TIME
Edited by Ewout ter Hoeven - Contributor
Frame threads (vertical) vs tile threads (horizontal) with 8cef1efc. Same system (Core i7-4710MQ, 4-core 8-thread), Chimera 8-bit 1080p. Data.
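| Frame threads \ Tile threads | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| 4 | 72,17 | 92,38 | 97,56 | 98,21 |
| 8 | 83,00 | 97,74 | 105,52 | 105,23 |
| 16 | 87,59 | 96,70 | 104,65 | 104,92 |
| 32 | 90,59 | 97,37 | 103,13 | 99,38 |

Edited by Ewout ter Hoeven
- Owner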
105 fps is nice for 1080p :)
4K has four times as many pixels, so getting around 31 fps there is nice too.
- Contributor
@jbk the progress for 8-bit content is outstanding indeed!
My next idea is to test the same video with different bitrates, but that takes some more time because I need encodes first. Any other tests that would be useful?
- Owner
@EwoutH What is the CPU usage when you get 105 fps? Are all cores fully loaded?
- Contributor
@jbk Yes (screenshot). RAM usage between 310 and 320 MB.
C:\dav1d_install_1030\bin\dav1d -i C:\Chimera-8b-1080p.ivf -o NUL.y4m --framethreads 8 --tilethreads 4
Edited by Ewout ter Hoeven - Contributor
Also tested a 32-bit build (8cef1efc) against the same 64-bit build. The 32-bit build reaches about a quarter to a third of the 64-bit performance on 8-bit content and is about 20% slower on 10-bit content. Used commands and output.
| Build date | Build | Version | 8-bit / ft 8, tt 2 | 8-bit / ft 16, tt 4 | 10-bit / ft 8, tt 2 | 10-bit / ft 16, tt 4 | Mor3840 / ft 8, tt 2 | Mor3840 / ft 16, tt 4 | Mor5120 / ft 8, tt 2 | Mor5120 / ft 16, tt 4 | 4K_23 / ft 8, tt 2 | 4K_23 / ft 16, tt 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-10-30 | 8cef1efc | 64-bit | 97,74 | 103,80 | 24,07 | 24,25 | 60,42 | 59,37 | 35,52 | 36,20 | 29,44 | 31,09 |
| 2018-10-30 | 8cef1efc | 32-bit | 24,75 | 23,86 | 19,36 | 19,74 | 22,54 | failed | failed | failed | 9,23 | failed |

Some 4k and 5k runs failed because memory could not be allocated. The error messages:
Failed to allocate memory of size 12533760: Not enough space
Decoded 3/960 frames (0.3%)
Failed to allocate memory of size 22609920: Not enough space
Decoded 1/3604 frames (0.0%)
Failed to allocate memory of size 12533760: Not enough space
Edited by Ewout ter Hoeven - Owner
That would mean the 32-bit version does not have AVX-2.
- Contributor
@jbk It certainly looks like it. I used the same system (Core i7-4710MQ, Windows 10 64-bit), so the results should give a fair comparison between 32- and 64-bit builds. Shall I open an issue about the missing AVX-2 support?
No idea why, but in my tests I've seen no improvement at all. I'm pasting the details below.
- hardware: HP Compaq 6730s with Intel Celeron T1600 CPU, 1.66 GHz dual core, 2 threads, 3 GB RAM
- software: Ubuntu 18.04 amd64
- dav1d revision: 8cef1efc
- dav1d command line: ./dav1d -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf --framethreads 1 --tilethreads 1 -o Chimera-AV1-8bit-1920x1080-6736kbps-2.y4m
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929/48m0,827s = 3,099
- (optional) perf report:
Overhead  Command  Shared Object      Symbol
........  .......  .................  ...............................................
23.90%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c
10.22%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
 8.99%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
 7.62%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
 6.52%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
 5.73%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
 3.03%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
 2.80%  dav1d  libdav1d.so.0.0.1  [.] decode_b
 2.07%  dav1d  libdav1d.so.0.0.1  [.] av1_find_ref_mvs
 1.95%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
 1.83%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.2.constprop.5
 1.34%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_symbol
 1.31%  dav1d  libc-2.27.so       [.] __memcpy_ssse3
 1.26%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.2.constprop.6
 1.08%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.4
 0.98%  dav1d  libdav1d.so.0.0.1  [.] ipred_filter_c
 0.97%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
 0.90%  dav1d  libdav1d.so.0.0.1  [.] dav1d_recon_b_inter_8bpc
 0.87%  dav1d  libdav1d.so.0.0.1  [.] avg_c
 0.80%  dav1d  libdav1d.so.0.0.1  [.] av1_init_ref_mv_tile_row
 0.73%  dav1d  libdav1d.so.0.0.1  [.] blend_c
 0.72%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
 0.65%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
 0.57%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.3
 0.53%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
 0.50%  dav1d  libdav1d.so.0.0.1  [.] inv_dct32_1d
 0.48%  dav1d  libdav1d.so.0.0.1  [.] dav1d_recon_b_intra_8bpc
 0.44%  dav1d  libdav1d.so.0.0.1  [.] mask_edges_inter
 0.42%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
 0.40%  dav1d  [kernel.kallsyms]  [k] copy_user_generic_unrolled
 0.40%  dav1d  libdav1d.so.0.0.1  [.] w_mask_c.constprop.1
 0.40%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
 0.39%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.5
 0.38%  dav1d  libdav1d.so.0.0.1  [.] prep_c
 0.34%  dav1d  libdav1d.so.0.0.1  [.] inv_adst8_1d
 0.31%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0.constprop.2
 0.29%  dav1d  libdav1d.so.0.0.1  [.] dav1d_prepare_intra_edges_8bpc
 0.28%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0.constprop.1
 0.27%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_intra
And the second box:
- hardware: HP 15-AF116NT AMD A6-6310 APU 4GB, 4 cores, 4 threads
- software: Ubuntu 18.04 amd64
- dav1d revision: 8cef1efc
- dav1d command line: ./dav1d -i ../../Chimera-AV1-8bit-1920x1080-6736kbps.ivf --framethreads 2 --tilethreads 2 -o Chimera-AV1-8bit-1920x1080-6736kbps.y4m
- input file: http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Chimera/Chimera-AV1-8bit-1920x1080-6736kbps.ivf
- framerate: 8929/22m13,625s = 6,695
- (optional) perf report:
Overhead  Command  Shared Object      Symbol
........  .......  .................  ...............................................
21.49%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c
 9.56%  dav1d  libdav1d.so.0.0.1  [.] cdef_filter_block_c.constprop.1
 8.21%  dav1d  libdav1d.so.0.0.1  [.] wiener_c
 7.94%  dav1d  libdav1d.so.0.0.1  [.] prep_8tap_c
 7.14%  dav1d  libdav1d.so.0.0.1  [.] put_8tap_c
 6.62%  dav1d  libdav1d.so.0.0.1  [.] loop_filter
 3.30%  dav1d  libdav1d.so.0.0.1  [.] decode_b
 2.68%  dav1d  libdav1d.so.0.0.1  [.] cdef_find_dir_c
 1.86%  dav1d  libdav1d.so.0.0.1  [.] av1_find_ref_mvs
 1.63%  dav1d  libdav1d.so.0.0.1  [.] update_cdf
 1.59%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.2.constprop.5
 1.54%  dav1d  libdav1d.so.0.0.1  [.] ipred_filter_c
 1.46%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_symbol
 1.39%  dav1d  libc-2.27.so       [.] __memcpy_ssse3
 1.11%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.4
 1.06%  dav1d  libdav1d.so.0.0.1  [.] inv_adst16_1d
 1.04%  dav1d  libdav1d.so.0.0.1  [.] decode_coefs.isra.2.constprop.6
 1.00%  dav1d  libdav1d.so.0.0.1  [.] av1_init_ref_mv_tile_row
 0.74%  dav1d  libdav1d.so.0.0.1  [.] inv_dct16_1d
 0.72%  dav1d  libdav1d.so.0.0.1  [.] avg_c
 0.71%  dav1d  libdav1d.so.0.0.1  [.] msac_decode_bool
 0.67%  dav1d  libdav1d.so.0.0.1  [.] blend_c
 0.65%  dav1d  libdav1d.so.0.0.1  [.] dav1d_read_coef_blocks_8bpc
 0.62%  dav1d  libdav1d.so.0.0.1  [.] dav1d_recon_b_inter_8bpc
 0.57%  dav1d  libdav1d.so.0.0.1  [.] dav1d_cdef_brow_8bpc
 0.55%  dav1d  libdav1d.so.0.0.1  [.] inv_dct8_1d
 0.54%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.3
 0.52%  dav1d  libdav1d.so.0.0.1  [.] inv_dct32_1d
 0.51%  dav1d  libdav1d.so.0.0.1  [.] prep_c
 0.46%  dav1d  libdav1d.so.0.0.1  [.] mask_edges_inter
 0.40%  dav1d  libdav1d.so.0.0.1  [.] decode_sb
 0.40%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_inter
 0.38%  dav1d  libdav1d.so.0.0.1  [.] warp_affine_8x8_c
 0.36%  dav1d  libdav1d.so.0.0.1  [.] inv_adst8_1d
 0.36%  dav1d  libdav1d.so.0.0.1  [.] dav1d_recon_b_intra_8bpc
 0.35%  dav1d  libdav1d.so.0.0.1  [.] w_mask_c.constprop.1
 0.35%  dav1d  libdav1d.so.0.0.1  [.] inv_txfm_add_c.isra.0.constprop.5
 0.32%  dav1d  libdav1d.so.0.0.1  [.] add_ref_mv_candidate.isra.14
 0.32%  dav1d  libdav1d.so.0.0.1  [.] dav1d_create_lf_mask_intra
 0.31%  dav1d  libdav1d.so.0.0.1  [.] selfguided_filter.isra.0.constprop.2
- Owner
@cardpuncher your CPU does not have AVX-2. (It does not even have AVX or SSE4)
- Ewout ter Hoeven mentioned in issue #139 (closed)
- Thierry mentioned in issue #142 (closed)
- Contributor
All the used files and results: https://drive.google.com/drive/folders/1VVwyimg3IaHv-H6V1tffNQFHbFMGVVsD?usp=sharing
- hardware: HP ZBook 15 G2 (Core i7-4710MQ, 8GB DDR3)
- software: Windows 10 64-bit (Version 1809, Build 17763.55)
- dav1d revision: 2018-09-30 (1c9c2534), 2018-10-23 (5cf1cf1f), 2018-10-30 (8cef1efc) and 2018-11-05 (44ad79e9)
| Build date | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p |
|---|---|---|---|---|---|
| 2018-09-30 | 1c9c2534 | 32,5 | 24,2 | 14,5 | 88,8 |
| 2018-10-23 | 5cf1cf1f | 40,8 | 24,6 | 18,5 | 144,2 |
| 2018-10-30 | 8cef1efc | 103,8 | 24,2 | 31,1 | 144,4 |
| 2018-11-05 | 44ad79e9 | 106,0 | 23,6 | 33,4 | 173,2 |

And normalized to last week's build:
| Build date | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p |
|---|---|---|---|---|---|
| 2018-09-30 | 1c9c2534 | 31,4% | 99,8% | 46,5% | 61,5% |
| 2018-10-23 | 5cf1cf1f | 39,4% | 101,5% | 59,6% | 99,9% |
| 2018-10-30 | 8cef1efc | 100,0% | 100,0% | 100,0% | 100,0% |
| 2018-11-05 | 44ad79e9 | 102,1% | 97,5% | 107,4% | 120,0% |

Edited by Ewout ter Hoeven
- Owner
Could you include a comparison to 24518a7e ?
- Contributor
@jbk Here you go (thanks to u/MrSmilingWolf for the build)
| Build date | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p |
|---|---|---|---|---|---|
| 2018-09-27 | 24518a7e | 27,8 | 22,5 | 10,3 | 63,4 |
| 2018-09-30 | 1c9c2534 | 32,5 | 24,2 | 14,5 | 88,8 |
| 2018-10-23 | 5cf1cf1f | 40,8 | 24,6 | 18,5 | 144,2 |
| 2018-10-30 | 8cef1efc | 103,8 | 24,2 | 31,1 | 144,4 |
| 2018-11-05 | 44ad79e9 | 106,0 | 23,6 | 33,4 | 173,2 |

| Build date | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p |
|---|---|---|---|---|---|
| 2018-09-27 | 24518a7e | 100,0% | 100,0% | 100,0% | 100,0% |
| 2018-09-30 | 1c9c2534 | 117,2% | 107,5% | 140,6% | 140,1% |
| 2018-10-23 | 5cf1cf1f | 147,1% | 109,3% | 180,3% | 227,4% |
| 2018-10-30 | 8cef1efc | 373,9% | 107,7% | 302,4% | 227,6% |
| 2018-11-05 | 44ad79e9 | 381,9% | 105,0% | 324,7% | 273,0% |

Edited by Ewout ter Hoeven
- Contributor
A comparison of yesterday's build of aomdec with dav1d, both run through ffmpeg (all files).
- hardware: HP ZBook 15 G2 (Core i7-4710MQ, 8GB DDR3)
- software: Windows 10 64-bit (Version 1809, Build 17763.55) | ffmpeg version N-92374-gd96ae9d5ea
Commands used:
ffmpeg -c:v libaom-av1 -hide_banner -benchmark -y -i video.ivf NUL.yuv
ffmpeg -c:v libdav1d -hide_banner -tilethreads 4 -benchmark -y -i video.ivf NUL.yuv
Results:
| Decoder | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p |
|---|---|---|---|---|---|
| libaom-av1 | 1.0.0-884-gf8b03215b | 39,7 | 20,6 | 12,3 | 66,4 |
| libdav1d | bd747b11 | 106,6 | 22,4 | 29,8 | 158,6 |

And normalized:
| Decoder | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p |
|---|---|---|---|---|---|
| libaom-av1 | 1.0.0-884-gf8b03215b | 100,0% | 100,0% | 100,0% | 100,0% |
| libdav1d | bd747b11 | 268,6% | 109,0% | 242,4% | 239,0% |

On average 114,7% faster, and if we ignore the 10-bit video even 150,0% faster!
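For anyone who wants to redo the normalization on their own numbers, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the fps values are the rounded ones from the ffmpeg results table, so the output can differ from the percentages in the post by a tenth or so (the post presumably used unrounded measurements).

```python
# Frames per second copied from the ffmpeg results table (decimal commas written as points).
FPS = {
    "libaom-av1": {"Chimera 8": 39.7, "Chimera 10": 20.6,
                   "Elecard 4K": 12.3, "Oliver 1080p": 66.4},
    "libdav1d": {"Chimera 8": 106.6, "Chimera 10": 22.4,
                 "Elecard 4K": 29.8, "Oliver 1080p": 158.6},
}


def normalize(fps, baseline):
    """Express every decoder's fps as a percentage of the baseline decoder."""
    return {
        decoder: {clip: 100.0 * value / fps[baseline][clip]
                  for clip, value in clips.items()}
        for decoder, clips in fps.items()
    }


def mean_speedup(normalized_row, clips=None):
    """Average percentage above the baseline over the chosen clips."""
    clips = list(normalized_row) if clips is None else clips
    return sum(normalized_row[c] for c in clips) / len(clips) - 100.0


if __name__ == "__main__":
    row = normalize(FPS, "libaom-av1")["libdav1d"]
    eight_bit = ["Chimera 8", "Elecard 4K", "Oliver 1080p"]
    print(f"all clips: {mean_speedup(row):.1f}% faster")
    print(f"8-bit only: {mean_speedup(row, eight_bit):.1f}% faster")
```

Note that "faster" here is the mean of the per-clip percentages minus 100, which is how the figures in the post come out.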
Edited by Ewout ter Hoeven - Owner
@EwoutH are you comparing to the last libaom? Because 1.0.0 is old.
- Owner
Oh, f8b03215b is quite recent...
- Developer
Does that build have CONFIG_LOWBITDEPTH=0?
- Contributor
I tested aomdec directly from aom master (without ffmpeg); its results are a lot higher, especially with the YouTube video (Oliver 1080p). dav1d is now 77% faster on average (100% faster when only 8-bit content is counted).
dav1d -i video.ivf -o NUL.y4m --framethreads 8 --tilethreads 4
aomdec --summary -t 8 -o NUL.y4m video.ivf
| Decoder | Build date | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p |
|---|---|---|---|---|---|---|
| dav1d | 2018-09-27 | 24518a7e | 27,8 | 22,5 | 10,3 | 63,4 |
| dav1d | 2018-09-30 | 1c9c2534 | 32,5 | 24,2 | 14,5 | 88,8 |
| dav1d | 2018-10-23 | 5cf1cf1f | 40,8 | 24,6 | 18,5 | 144,2 |
| dav1d | 2018-10-30 | 8cef1efc | 103,8 | 24,2 | 31,1 | 144,4 |
| dav1d | 2018-11-05 | 44ad79e9 | 106,0 | 23,6 | 33,4 | 173,2 |
| dav1d | 2018-11-08 | bbc11c99 | 113,2 | 23,8 | 33,6 | 187,6 |
| aomdec | 2018-11-07 | 1.0.0-895-g40582174a | 48,3 | 21,8 | 16,8 | 113,1 |

Normalized:
| Decoder | Build date | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p | Average |
|---|---|---|---|---|---|---|---|
| dav1d | 2018-09-27 | 24518a7e | 57,5% | 103,1% | 61,1% | 56,1% | 69,4% |
| dav1d | 2018-09-30 | 1c9c2534 | 67,5% | 110,8% | 85,9% | 78,5% | 85,7% |
| dav1d | 2018-10-23 | 5cf1cf1f | 84,7% | 112,7% | 110,1% | 127,5% | 108,8% |
| dav1d | 2018-10-30 | 8cef1efc | 215,1% | 111,0% | 184,7% | 127,6% | 159,6% |
| dav1d | 2018-11-05 | 44ad79e9 | 219,7% | 108,2% | 198,3% | 153,1% | 169,8% |
| dav1d | 2018-11-08 | bbc11c99 | 234,5% | 108,9% | 199,4% | 165,8% | 177,2% |
| aomdec | 2018-11-08 | 1.0.0-895-g40582174a | 100,0% | 100,0% | 100,0% | 100,0% | 100,0% |

Edited by Ewout ter Hoeven
- Owner
@EwoutH did you compile with LOW_BITDEPTH ?
- Contributor
CONFIG_LOWBITDEPTH=1 has been the default for a couple weeks: https://aomedia.googlesource.com/aom/+/4c118dc5e4765601e652612bbe9d1b9bdfb9e473
However, I see on the tracker that there have been some divergences between av1-normative and master, so there is a revert for that patch ready to merge: https://aomedia-review.googlesource.com/c/aom/+/74441
The 16,8 FPS for the 4K "Summer Nature" video is consistent with my own results:
CPU: Intel i7-4770 @ 3.40 GHz
Video: Summer Nature 4K from Elecard
Software: Win7 64-bit, GCC 8.2
aomdec: 1.0.0-895-g40582174a
aomdec.exe --threads=8 --progress -o NUL Stream2_AV1_4K_22.7mbps.webm
3604 decoded frames/3604 showed frames in 150295827 us (23.98 fps)
dav1d: bbc11c9
time dav1d -q -i Stream2_AV1_4K_22.7mbps.ivf -o /dev/null --framethreads 8 --tilethreads 4 --muxer yuv4mpeg2
real 1m20,184s -> 44.9 FPS
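The frame rates quoted in these reports are simply the frame count divided by the wall-clock time. A minimal sketch of that arithmetic, with hypothetical helper names, assuming the `real` value has the `NmS,FFFs` shape shown above and that dav1d decoded the same 3604 frames that aomdec reports:

```python
import re


def parse_time_real(text):
    """Convert a shell `time` value such as '1m20,184s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


def frames_per_second(frames, seconds):
    """Average decoding speed over the whole run."""
    return frames / seconds


if __name__ == "__main__":
    # dav1d run: wall-clock time taken from the `time` output above.
    print(round(frames_per_second(3604, parse_time_real("1m20,184s")), 1))
    # aomdec run: elapsed time is reported in microseconds.
    print(round(frames_per_second(3604, 150295827 / 1_000_000), 2))
```

The same helper covers the `8929/48m0,827s` style used in the report template earlier in the thread.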
Edited by SmilingWolf - Contributor
@jbk I used this build; in the source code it looks like it is enabled (-DCONFIG_LOWBITDEPTH=1).
Meanwhile I ran the whole stack on a Ryzen 5 1600, Windows 10 system:
dav1d -i C:\Chimera-8b-1080p.ivf -o NUL.y4m --framethreads 12 --tilethreads 4 -q
| Build date | Build | Chimera 8 | Chimera 10 | Elecard 4K | Oliver 1080p | Dua Lipa 1080p |
|---|---|---|---|---|---|---|
| 2018-09-27 | 24518a7e | 42,4 | 34,6 | 20,5 | fail | 50,2 |
| 2018-09-30 | 1c9c2534 | 46,2 | 35,6 | 24,6 | fail | fail |
| 2018-10-23 | 5cf1cf1f | 57,7 | 36,1 | 32,1 | 232,5 | 70,3 |
| 2018-10-30 | 8cef1efc | 186,6 | 36,5 | 57,9 | 234,1 | 183,1 |
| 2018-11-05 | 44ad79e9 | 194,2 | 36,6 | 60,6 | 248,0 | 205,4 |
| 2018-11-08 | bbc11c99 | 195,9 | 36,8 | 61,1 | 278,1 | 210,4 |
| 2018-11-08 | e4fbbbce | 197,6 | 36,7 | 61,9 | 278,0 | 213,3 |

4K at 60fps :)
Edited by Ewout ter Hoeven