- Jan 03, 2023
-
-
Performance has improved from 11.27fps to 20.50fps by using the following command:

    ./configure && make -j5
    ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

    functions                 performance (c)   performance (asm)
    hadamard_ac_8x8           117               21
    hadamard_ac_8x16          236               42
    hadamard_ac_16x8          235               31
    hadamard_ac_16x16         473               60
    intra_sad_x3_4x4          50                21
    intra_sad_x3_8x8          183               34
    intra_sad_x3_8x8c         181               36
    intra_sad_x3_16x16        643               68
    intra_satd_x3_4x4         83                61
    intra_satd_x3_8x8c        344               81
    intra_satd_x3_16x16       1389              136
    sa8d_8x8                  97                19
    sa8d_16x16                394               68
    satd_4x4                  24                8
    satd_4x8                  51                11
    satd_4x16                 103               24
    satd_8x4                  52                9
    satd_8x8                  108               12
    satd_8x16                 218               24
    satd_16x8                 218               19
    satd_16x16                437               38
    ssd_4x4                   10                5
    ssd_4x8                   24                8
    ssd_4x16                  42                15
    ssd_8x4                   23                5
    ssd_8x8                   37                9
    ssd_8x16                  74                17
    ssd_16x8                  72                11
    ssd_16x16                 140               23
    var2_8x8                  91                37
    var2_8x16                 176               66
    var_8x8                   50                15
    var_8x16                  65                29
    var_16x16                 132               56

Signed-off-by: gxw <guxiwei-hf@loongson.cn>
-
Performance has improved from 10.53fps to 11.27fps by using the following command:

    ./configure && make -j5
    ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

    functions                 performance (c)   performance (asm)
    add4x4_idct               34                9
    add8x8_idct               139               31
    add8x8_idct8              269               39
    add8x8_idct_dc            67                7
    add16x16_idct             564               123
    add16x16_idct_dc          260               22
    dct4x4dc                  18                10
    idct4x4dc                 16                9
    sub4x4_dct                25                7
    sub8x8_dct                101               12
    sub8x8_dct8               160               25
    sub16x16_dct              403               52
    sub16x16_dct8             646               68
    zigzag_scan_4x4_frame     4                 1

Signed-off-by: gxw <guxiwei-hf@loongson.cn>
-
Performance has improved from 6.78fps to 10.53fps by using the following command:

    ./configure && make -j5
    ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

    functions                 performance (c)   performance (asm)
    avg_4x2                   16                5
    avg_4x4                   30                6
    avg_4x8                   63                10
    avg_4x16                  124               19
    avg_8x4                   60                6
    avg_8x8                   119               10
    avg_8x16                  233               19
    avg_16x8                  229               21
    avg_16x16                 451               41
    get_ref_4x4               30                9
    get_ref_4x8               52                11
    get_ref_8x4               45                9
    get_ref_8x8               80                11
    get_ref_8x16              156               16
    get_ref_12x10             137               13
    get_ref_16x8              147               11
    get_ref_16x16             282               16
    get_ref_20x18             278               22
    hpel_filter               5163              686
    lowres_init               5440              286
    mc_chroma_2x2             24                7
    mc_chroma_2x4             42                10
    mc_chroma_4x2             41                7
    mc_chroma_4x4             75                10
    mc_chroma_4x8             144               19
    mc_chroma_8x4             137               15
    mc_chroma_8x8             269               28
    mc_luma_4x4               30                10
    mc_luma_4x8               52                12
    mc_luma_8x4               44                10
    mc_luma_8x8               80                13
    mc_luma_8x16              156               19
    mc_luma_16x8              147               13
    mc_luma_16x16             281               19
    memcpy_aligned            14                9
    memzero_aligned           24                4
    offsetadd_w4              79                18
    offsetadd_w8              142               18
    offsetadd_w16             277               25
    offsetadd_w20             1118              38
    offsetsub_w4              75                18
    offsetsub_w8              140               18
    offsetsub_w16             265               25
    offsetsub_w20             989               39
    weight_w4                 111               19
    weight_w8                 205               19
    weight_w16                396               29
    weight_w20                1143              45
    deinterleave_chroma_fdec  76                9
    deinterleave_chroma_fenc  86                9
    plane_copy_deinterleave   733               90
    plane_copy_interleave     791               245
    store_interleave_chroma   82                12

Signed-off-by: yuanhecai <yuanhecai@loongson.cn>
-
Performance has improved from 6.34fps to 6.78fps by using the following command:

    ./configure && make -j5
    ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

    functions                 performance (c)   performance (asm)
    coeff_last15              3                 2
    coeff_last16              3                 1
    coeff_last64              42                6
    decimate_score15          8                 12
    decimate_score16          8                 11
    decimate_score64          61                43
    dequant_4x4_cqm           16                5
    dequant_4x4_dc_cqm        13                5
    dequant_4x4_dc_flat       13                5
    dequant_4x4_flat          16                5
    dequant_8x8_cqm           71                9
    dequant_8x8_flat          71                9

Signed-off-by: yuanhecai <yuanhecai@loongson.cn>
-
Performance has improved from 6.32fps to 6.34fps by using the following command:

    ./configure && make -j5
    ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

    functions                 performance (c)   performance (asm)
    intra_predict_4x4_dc      3                 2
    intra_predict_4x4_dc8     1                 1
    intra_predict_4x4_dcl     2                 1
    intra_predict_4x4_dct     2                 1
    intra_predict_4x4_ddl     7                 2
    intra_predict_4x4_h       2                 1
    intra_predict_4x4_v       1                 1
    intra_predict_8x8_dc      8                 2
    intra_predict_8x8_dc8     1                 1
    intra_predict_8x8_dcl     5                 2
    intra_predict_8x8_dct     5                 2
    intra_predict_8x8_ddl     27                3
    intra_predict_8x8_ddr     26                3
    intra_predict_8x8_h       4                 2
    intra_predict_8x8_v       3                 1
    intra_predict_8x8_vl      29                3
    intra_predict_8x8_vr      31                4
    intra_predict_8x8c_dc     8                 5
    intra_predict_8x8c_dc8    1                 1
    intra_predict_8x8c_dcl    5                 3
    intra_predict_8x8c_dct    5                 3
    intra_predict_8x8c_h      4                 2
    intra_predict_8x8c_p      58                30
    intra_predict_8x8c_v      4                 1
    intra_predict_16x16_dc    32                8
    intra_predict_16x16_dc8   9                 4
    intra_predict_16x16_dcl   26                6
    intra_predict_16x16_dct   26                6
    intra_predict_16x16_h     23                7
    intra_predict_16x16_p     182               44
    intra_predict_16x16_v     22                4

Signed-off-by: wanglu <wanglu@loongson.cn>
-
Performance has improved from 4.92fps to 6.32fps by using the following command:

    ./configure && make -j5
    ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

    functions                 performance (c)   performance (asm)
    sad_4x4                   13                3
    sad_4x8                   26                7
    sad_4x16                  57                13
    sad_8x4                   24                3
    sad_8x8                   54                8
    sad_8x16                  108               13
    sad_16x8                  95                8
    sad_16x16                 189               13
    sad_x3_4x4                37                6
    sad_x3_4x8                71                13
    sad_x3_8x4                70                8
    sad_x3_8x8                162               14
    sad_x3_8x16               323               25
    sad_x3_16x8               279               15
    sad_x3_16x16              555               27
    sad_x4_4x4                49                8
    sad_x4_4x8                95                17
    sad_x4_8x4                94                8
    sad_x4_8x8                214               16
    sad_x4_8x16               429               33
    sad_x4_16x8               372               18
    sad_x4_16x16              740               34

Signed-off-by: wanglu <wanglu@loongson.cn>
-
Performance has improved from 4.76fps to 4.92fps by using the following command:

    ./configure && make -j5
    ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv

    functions                 performance (c)   performance (asm)
    deblock_luma[0]           79                39
    deblock_luma[1]           91                18
    deblock_luma_intra[0]     63                44
    deblock_luma_intra[1]     71                18
    deblock_strength          104               33

Signed-off-by: gxw <guxiwei-hf@loongson.cn>
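The frame rates quoted in the LoongArch commit messages above chain together (each commit starts from the figure the previous one ended on), so the per-commit and overall gains can be worked out directly. The following is a minimal sketch that uses only the fps figures reported in those messages; the function and variable names are illustrative and not part of x264.

```python
# Frame rates (fps) reported in the commit messages, oldest first.
FPS_STEPS = [4.76, 4.92, 6.32, 6.34, 6.78, 10.53, 11.27, 20.50]


def speedups(fps):
    """Return the speedup factor contributed by each consecutive step."""
    return [after / before for before, after in zip(fps, fps[1:])]


def cumulative_speedup(fps):
    """Return the overall speedup from the first to the last measurement."""
    return fps[-1] / fps[0]


if __name__ == "__main__":
    for factor in speedups(FPS_STEPS):
        print(f"{factor:.2f}x")
    print(f"overall: {cumulative_speedup(FPS_STEPS):.2f}x")
```

The figures all come from the same benchmark command, so the ratios are comparable with each other, but they say nothing about other inputs or thread counts.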
-
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
-
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
-
LSX/LASX is the LoongArch 128-bit/256-bit SIMD architecture.

Signed-off-by: gxw <guxiwei-hf@loongson.cn>
-
- Dec 17, 2022
-
-
Roger Hardiman authored
-
- Oct 28, 2022
-
-
Hubert Mazur authored
Provide routines for the sad functions at high bit depth, i.e. 10 bits.

Benchmarks were run on AWS Graviton 2 instances.

    sad_4x4_c: 583
    sad_4x4_neon: 273
    sad_4x8_c: 1179
    sad_4x8_neon: 366
    sad_4x16_c: 2121
    sad_4x16_neon: 550
    sad_8x4_c: 924
    sad_8x4_neon: 213
    sad_8x8_c: 1711
    sad_8x8_neon: 316
    sad_8x16_c: 3505
    sad_8x16_neon: 497
    sad_16x8_c: 3070
    sad_16x8_neon: 635
    sad_16x16_c: 6113
    sad_16x16_neon: 1118

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
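For readers unfamiliar with the metric, SAD is the sum of absolute differences between two blocks of samples. The sketch below is a plain scalar reference model, not the NEON routine from the commit; the block contents and helper names are made up for illustration, and the only assumption taken from the commit message is the 10-bit sample range.

```python
def sad(block_a, block_b):
    """Sum of absolute differences over two equally shaped blocks of samples."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same width")
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


BIT_DEPTH = 10
MAX_SAMPLE = (1 << BIT_DEPTH) - 1  # 1023 for 10-bit video

# Hypothetical 4x4 blocks with samples in the 10-bit range.
enc = [[0, 512, MAX_SAMPLE, 100] for _ in range(4)]
ref = [[MAX_SAMPLE, 512, 0, 90] for _ in range(4)]

print(sad(enc, ref))  # 4 rows * (1023 + 0 + 1023 + 10) = 8224
```

At 10 bits the worst-case sum for a 16x16 block is 256 * 1023 = 261888, which no longer fits in 16 bits, whereas the 8-bit worst case (256 * 255 = 65280) does. That is one reason high bit depth usually needs its own routines.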
-
- Oct 05, 2022
-
-
Anton Mitrofanov authored
-
- Oct 01, 2022
-
-
Henrik Gramner authored
On most systems any whitespace is fine, but MSYS2 wants ASCII 0x20.
-
- Sep 19, 2022
-
-
Sergei Trofimovich authored
Without the change, a parallel build occasionally fails with:

    $ make --shuffle
    ...
    gcc ... -c common/opencl.c -o common/opencl-8.o
    ...
    common/opencl.c:116:10: fatal error: common/oclobj.h: No such file or directory
      116 | #include "common/oclobj.h"
          |          ^~~~~~~~~~~~~~~~~

The failure is best reproduced with `make --shuffle` mode: https://savannah.gnu.org/bugs/index.php?62100

This happens because `common/oclobj.h` is an autogenerated file. Normally `.depend` would contain this autogenerated dependency, but nothing forces `common/oclobj.h` to be generated.

The change moves the dependency on $(GENERATED) from the final binaries to `.depend` itself:

    .depend: $(GENERATED)
-
- Sep 05, 2022
-
-
- Sep 01, 2022
-
-
Anton Mitrofanov authored
Use pkg-config from the custom PATH.
-
- Aug 31, 2022
-
-
Anton Mitrofanov authored
-
- Jun 01, 2022
-
-
Anton Mitrofanov authored
Use perl for in-place editing because sed doesn't work with symlinks.
-
- Feb 22, 2022
-
-
-
Anton Mitrofanov authored
-
- Feb 21, 2022
-
-
-
-
-
Henrik Gramner authored
When operating on large blocks of data it's common to repeatedly use an instruction on multiple registers. Using the REPX macro makes it easy to quickly write dense code to achieve this without having to explicitly duplicate the same instruction over and over.

For example,

    REPX {paddw x, m4}, m0, m1, m2, m3
    REPX {mova [r0+16*x], m5}, 0, 1, 2, 3

will expand to

    paddw m0, m4
    paddw m1, m4
    paddw m2, m4
    paddw m3, m4
    mova [r0+16*0], m5
    mova [r0+16*1], m5
    mova [r0+16*2], m5
    mova [r0+16*3], m5
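The expansion rule can be modelled outside the assembler. The sketch below is a Python approximation of what the commit message describes: the template is repeated once per argument with the standalone token `x` replaced. The real REPX is an assembler preprocessor macro, so the `repx` helper here is purely hypothetical and only reproduces the example from the message.

```python
import re


def repx(template, *args):
    """Expand `template` once per argument, replacing the standalone token `x`."""
    token = re.compile(r"\bx\b")
    # A callable replacement avoids any special handling of backslashes.
    return [token.sub(lambda _m, a=str(arg): a, template) for arg in args]


for line in repx("paddw x, m4", "m0", "m1", "m2", "m3"):
    print(line)
for line in repx("mova [r0+16*x], m5", 0, 1, 2, 3):
    print(line)
```

Matching `x` only as a whole word keeps an `x` inside another token intact, which is the behaviour one would expect from a token-based macro substitution.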
-
Henrik Gramner authored
Correctly handle emulation of 4-operand instructions (e.g. 'shufps') where src1 is a memory operand.
-
- Feb 19, 2022
-
-
Henrik Gramner authored
With legacy encoding the last operand (the index) must be xmm0, but aside from that emulating non-destructive forms works the same as any other instruction.
-
- Feb 05, 2022
-
-
Anton Mitrofanov authored
-
- Jan 26, 2022
-
-
Anton Mitrofanov authored
-
-
-
- Jan 24, 2022
-
-
Anton Mitrofanov authored
-
- Dec 30, 2021
-
-
Building a shared library without -fPIC does not make sense. On most architectures, especially recent ones, doing so will give link-time errors due to relocations in read-only sections like .text. On some legacy architectures, including i386, it is allowed by default, but will warn, and is highly discouraged due to the overheads it adds at library load time. Most architectures were already listed here as having shared imply PIC, but not all, such as i386 which ends up with unwanted text relocations, as well as architectures not known to the build system currently like RISC-V, which does not permit text relocations by default. There is no good reason to want shared without PIC on any architecture, so just remove the architecture list.
-
- Dec 12, 2021
-
-
Henrik Gramner authored
Back in 2009 when this was added it improved scheduling of lookahead threads on prevalent operating systems at the time. According to more recent testing by Intel however, lowering thread priorities does not improve performance on modern operating systems. And more importantly, doing so on systems with heterogeneous CPU topologies may actually result in a severe performance reduction. Removing this code altogether eliminates the issue with performance degradation on such systems, while having no noticeable impact on regular systems with homogeneous CPU topologies.
-
- Dec 07, 2021
-
-
Claes Nästén authored
/usr/ucb/bin/install on Solaris does not support creating multiple directories in one go, so issue multiple install commands instead.
-
Anton Mitrofanov authored
-
-
- Dec 06, 2021
-
-
Anton Mitrofanov authored
-
- Sep 29, 2021
-
-
The lookahead_thread main loop checks b_exit_thread and exits if it is set. That flag is set by x264_lookahead_delete, which uses ifbuf.mutex to guard accessing it. However, the read in the while-loop condition of lookahead_thread is not guarded, and so TSAN sometimes reports a data race.
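The message describes the race but not the exact fix. One common remedy is to read the flag under the same mutex that guards the write, sketched below. This is a structural illustration only: the `Lookahead` class and its methods are hypothetical stand-ins, and Python threads do not exhibit C-style data races, so the sketch shows the locking pattern rather than reproducing the TSAN report.

```python
import threading
import time


class Lookahead:
    """Hypothetical stand-in for a worker with a mutex-protected exit flag."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.exit_thread = False
        self.iterations = 0

    def should_exit(self):
        # Read the flag under the same mutex that guards the write.
        with self.mutex:
            return self.exit_thread

    def run(self):
        # The loop condition goes through the guarded accessor.
        while not self.should_exit():
            self.iterations += 1
            time.sleep(0.001)

    def delete(self):
        # Writer side: set the flag while holding the mutex.
        with self.mutex:
            self.exit_thread = True


def run_and_stop():
    """Start the worker, request exit, and report whether it stopped."""
    worker = Lookahead()
    thread = threading.Thread(target=worker.run)
    thread.start()
    worker.delete()
    thread.join(timeout=5.0)
    return not thread.is_alive()
```

Calling `run_and_stop()` should return `True` once the worker has observed the flag and left its loop.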
-
This fixes rerunning checkasm with an earlier printed seed, when it's outside of the signed range.
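To see why a seed outside the signed range is a problem, consider a value that fits in 32 unsigned bits but not in 32 signed bits. The sketch below assumes a 32-bit seed, which the message does not state, and uses hypothetical helper names; it does not show how checkasm itself parses its argument.

```python
import struct


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def parse_seed(text):
    """Parse a decimal seed, accepting the full unsigned 32-bit range."""
    value = int(text, 10)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"seed out of 32-bit range: {text}")
    return value


seed = parse_seed("3000000000")  # larger than INT32_MAX (2147483647)
print(seed, as_signed32(seed))
```

Parsing the printed seed as unsigned keeps it identical to the value that was originally used, whereas a signed interpretation of the same bits yields a different number.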
-