  1. Jan 03, 2023
    • loongarch: Improve the performance of pixel series functions · 3bfa1c2f
      guxiwei authored and Lu Wang committed
      
      Performance improved from 11.27 fps to 20.50 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      hadamard_ac_8x8          117             21
      hadamard_ac_8x16         236             42
      hadamard_ac_16x8         235             31
      hadamard_ac_16x16        473             60
      intra_sad_x3_4x4         50              21
      intra_sad_x3_8x8         183             34
      intra_sad_x3_8x8c        181             36
      intra_sad_x3_16x16       643             68
      intra_satd_x3_4x4        83              61
      intra_satd_x3_8x8c       344             81
      intra_satd_x3_16x16      1389            136
      sa8d_8x8                 97              19
      sa8d_16x16               394             68
      satd_4x4                 24              8
      satd_4x8                 51              11
      satd_4x16                103             24
      satd_8x4                 52              9
      satd_8x8                 108             12
      satd_8x16                218             24
      satd_16x8                218             19
      satd_16x16               437             38
      ssd_4x4                  10              5
      ssd_4x8                  24              8
      ssd_4x16                 42              15
      ssd_8x4                  23              5
      ssd_8x8                  37              9
      ssd_8x16                 74              17
      ssd_16x8                 72              11
      ssd_16x16                140             23
      var2_8x8                 91              37
      var2_8x16                176             66
      var_8x8                  50              15
      var_8x16                 65              29
      var_16x16                132             56
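
The two columns above can be reduced to a per-function speedup by dividing the C figure by the asm figure. The following is a minimal sketch, assuming both columns use the same unit and that lower is better (the commit message does not state the unit). The `speedup` and `print_row` helpers are hypothetical and not part of x264.

```c
#include <stdio.h>

/* Hypothetical helper, not part of x264: ratio of the C figure to the
 * asm figure. Assumes both figures share a unit and lower is better. */
static double speedup(double c_cost, double asm_cost)
{
    return c_cost / asm_cost;
}

/* Print the speedup for one row of a benchmark table. */
static void print_row(const char *name, double c_cost, double asm_cost)
{
    printf("%-20s %6.2fx\n", name, speedup(c_cost, asm_cost));
}
```

For example, `print_row("satd_16x16", 437, 38)` reports a ratio of about 11.5 for that row.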
      
      Signed-off-by: gxw <guxiwei-hf@loongson.cn>
    • loongarch: Improve the performance of dct series functions · a064e87b
      guxiwei authored and Lu Wang committed
      
      Performance improved from 10.53 fps to 11.27 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      add4x4_idct              34              9
      add8x8_idct              139             31
      add8x8_idct8             269             39
      add8x8_idct_dc           67              7
      add16x16_idct            564             123
      add16x16_idct_dc         260             22
      dct4x4dc                 18              10
      idct4x4dc                16              9
      sub4x4_dct               25              7
      sub8x8_dct               101             12
      sub8x8_dct8              160             25
      sub16x16_dct             403             52
      sub16x16_dct8            646             68
      zigzag_scan_4x4_frame    4               1
      
      Signed-off-by: gxw <guxiwei-hf@loongson.cn>
    • loongarch: Improve the performance of mc series functions · 46e520a3
      Hecai Yuan authored and Lu Wang committed
      
      Performance improved from 6.78 fps to 10.53 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      avg_4x2                  16              5
      avg_4x4                  30              6
      avg_4x8                  63              10
      avg_4x16                 124             19
      avg_8x4                  60              6
      avg_8x8                  119             10
      avg_8x16                 233             19
      avg_16x8                 229             21
      avg_16x16                451             41
      get_ref_4x4              30              9
      get_ref_4x8              52              11
      get_ref_8x4              45              9
      get_ref_8x8              80              11
      get_ref_8x16             156             16
      get_ref_12x10            137             13
      get_ref_16x8             147             11
      get_ref_16x16            282             16
      get_ref_20x18            278             22
      hpel_filter              5163            686
      lowres_init              5440            286
      mc_chroma_2x2            24              7
      mc_chroma_2x4            42              10
      mc_chroma_4x2            41              7
      mc_chroma_4x4            75              10
      mc_chroma_4x8            144             19
      mc_chroma_8x4            137             15
      mc_chroma_8x8            269             28
      mc_luma_4x4              30              10
      mc_luma_4x8              52              12
      mc_luma_8x4              44              10
      mc_luma_8x8              80              13
      mc_luma_8x16             156             19
      mc_luma_16x8             147             13
      mc_luma_16x16            281             19
      memcpy_aligned           14              9
      memzero_aligned          24              4
      offsetadd_w4             79              18
      offsetadd_w8             142             18
      offsetadd_w16            277             25
      offsetadd_w20            1118            38
      offsetsub_w4             75              18
      offsetsub_w8             140             18
      offsetsub_w16            265             25
      offsetsub_w20            989             39
      weight_w4                111             19
      weight_w8                205             19
      weight_w16               396             29
      weight_w20               1143            45
      deinterleave_chroma_fdec 76              9
      deinterleave_chroma_fenc 86              9
      plane_copy_deinterleave  733             90
      plane_copy_interleave    791             245
      store_interleave_chroma  82              12
      
      Signed-off-by: yuanhecai <yuanhecai@loongson.cn>
    • loongarch: Improve the performance of quant series functions · cfea0ec1
      Hecai Yuan authored and Lu Wang committed
      
      Performance improved from 6.34 fps to 6.78 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      coeff_last15             3               2
      coeff_last16             3               1
      coeff_last64             42              6
      decimate_score15         8               12
      decimate_score16         8               11
      decimate_score64         61              43
      dequant_4x4_cqm          16              5
      dequant_4x4_dc_cqm       13              5
      dequant_4x4_dc_flat      13              5
      dequant_4x4_flat         16              5
      dequant_8x8_cqm          71              9
      dequant_8x8_flat         71              9
      
      Signed-off-by: yuanhecai <yuanhecai@loongson.cn>
    • loongarch: Improve the performance of predict series functions · 72a635d6
      Lu Wang authored and Lu Wang committed
      
      Performance improved from 6.32 fps to 6.34 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      intra_predict_4x4_dc     3               2
      intra_predict_4x4_dc8    1               1
      intra_predict_4x4_dcl    2               1
      intra_predict_4x4_dct    2               1
      intra_predict_4x4_ddl    7               2
      intra_predict_4x4_h      2               1
      intra_predict_4x4_v      1               1
      intra_predict_8x8_dc     8               2
      intra_predict_8x8_dc8    1               1
      intra_predict_8x8_dcl    5               2
      intra_predict_8x8_dct    5               2
      intra_predict_8x8_ddl    27              3
      intra_predict_8x8_ddr    26              3
      intra_predict_8x8_h      4               2
      intra_predict_8x8_v      3               1
      intra_predict_8x8_vl     29              3
      intra_predict_8x8_vr     31              4
      intra_predict_8x8c_dc    8               5
      intra_predict_8x8c_dc8   1               1
      intra_predict_8x8c_dcl   5               3
      intra_predict_8x8c_dct   5               3
      intra_predict_8x8c_h     4               2
      intra_predict_8x8c_p     58              30
      intra_predict_8x8c_v     4               1
      intra_predict_16x16_dc   32              8
      intra_predict_16x16_dc8  9               4
      intra_predict_16x16_dcl  26              6
      intra_predict_16x16_dct  26              6
      intra_predict_16x16_h    23              7
      intra_predict_16x16_p    182             44
      intra_predict_16x16_v    22              4
      
      Signed-off-by: wanglu <wanglu@loongson.cn>
    • loongarch: Improve the performance of sad/sad_x3/sad_x4 series functions · 8018af72
      Lu Wang authored and Lu Wang committed
      
      Performance improved from 4.92 fps to 6.32 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      sad_4x4                 13               3
      sad_4x8                 26               7
      sad_4x16                57               13
      sad_8x4                 24               3
      sad_8x8                 54               8
      sad_8x16                108              13
      sad_16x8                95               8
      sad_16x16               189              13
      sad_x3_4x4              37               6
      sad_x3_4x8              71               13
      sad_x3_8x4              70               8
      sad_x3_8x8              162              14
      sad_x3_8x16             323              25
      sad_x3_16x8             279              15
      sad_x3_16x16            555              27
      sad_x4_4x4              49               8
      sad_x4_4x8              95               17
      sad_x4_8x4              94               8
      sad_x4_8x8              214              16
      sad_x4_8x16             429              33
      sad_x4_16x8             372              18
      sad_x4_16x16            740              34
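
For readers unfamiliar with the naming, the sketch below shows in scalar C what a SAD (sum of absolute differences) and an "x3" variant are generally understood to compute: one source block scored against three candidate blocks in a single call. The function names, signatures and stride conventions are hypothetical and chosen for illustration. This is not the x264 API and not the LSX/LASX code from this commit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two w x h blocks of 8-bit pixels.
 * Each block has its own row stride, given in pixels (assumption). */
static int sad_block(const uint8_t *a, int a_stride,
                     const uint8_t *b, int b_stride, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sum += abs(a[x] - b[x]);
        a += a_stride;
        b += b_stride;
    }
    return sum;
}

/* Illustrative "x3" variant: score one source block against three
 * candidate blocks that share a stride. Hypothetical signature. */
static void sad_x3_block(const uint8_t *src, int src_stride,
                         const uint8_t *ref0, const uint8_t *ref1,
                         const uint8_t *ref2, int ref_stride,
                         int w, int h, int scores[3])
{
    scores[0] = sad_block(src, src_stride, ref0, ref_stride, w, h);
    scores[1] = sad_block(src, src_stride, ref1, ref_stride, w, h);
    scores[2] = sad_block(src, src_stride, ref2, ref_stride, w, h);
}
```

A SIMD version would typically load the source rows once and reuse them for all candidates, which is one plausible reason the batched variants exist.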
      
      Signed-off-by: wanglu <wanglu@loongson.cn>
    • loongarch: Improve the performance of deblock series functions · b5110b16
      guxiwei authored and Lu Wang committed
      
      Performance improved from 4.76 fps to 4.92 fps, measured with the
      following commands:
      ./configure && make -j5
      ./x264 --threads 4 -o out.mkv yuv_1920x1080.yuv
      
      functions           performance     performance
                              (c)            (asm)
      deblock_luma[0]         79               39
      deblock_luma[1]         91               18
      deblock_luma_intra[0]   63               44
      deblock_luma_intra[1]   71               18
      deblock_strength        104              33
      
      Signed-off-by: gxw <guxiwei-hf@loongson.cn>
    • loongarch: Add asm.S file · 596eaa09
      guxiwei authored and Lu Wang committed
      
      Signed-off-by: gxw <guxiwei-hf@loongson.cn>
    • loongarch: Add checkasm support · 3dfcf99b
      guxiwei authored and Lu Wang committed
      
      Signed-off-by: gxw <guxiwei-hf@loongson.cn>
    • loongarch: Initial LSX/LASX support · 78c79319
      guxiwei authored and Lu Wang committed
      
      LSX and LASX are the LoongArch 128-bit and 256-bit SIMD extensions,
      respectively.
      
      Signed-off-by: gxw <guxiwei-hf@loongson.cn>
  2. Dec 17, 2022
  3. Oct 28, 2022
    • aarch64: pixel: add 10bits sad functions · 416e3eb2
      Hubert Mazur authored
      
      Provide SAD routines for high bit depth, i.e. 10 bits.
      Benchmarks were run on AWS Graviton 2 instances.
      
      sad_4x4_c: 583
      sad_4x4_neon: 273
      sad_4x8_c: 1179
      sad_4x8_neon: 366
      sad_4x16_c: 2121
      sad_4x16_neon: 550
      sad_8x4_c: 924
      sad_8x4_neon: 213
      sad_8x8_c: 1711
      sad_8x8_neon: 316
      sad_8x16_c: 3505
      sad_8x16_neon: 497
      sad_16x8_c: 3070
      sad_16x8_neon: 635
      sad_16x16_c: 6113
      sad_16x16_neon: 1118
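
As a point of reference for what the high-bit-depth routines compute, here is a scalar sketch of SAD over pixels stored in 16-bit words. The name `sad_hbd`, its signature, and the choice of strides counted in pixels rather than bytes are assumptions made for illustration. This is not the NEON code from this commit.

```c
#include <stdint.h>

/* Scalar SAD for high-bit-depth pixels stored in 16-bit words.
 * Strides are counted in pixels, not bytes (an assumption).
 * A 32-bit accumulator is used because a 16x16 block of 10-bit
 * pixels can sum to 1023 * 256 = 261888, which overflows 16 bits. */
static uint32_t sad_hbd(const uint16_t *a, intptr_t a_stride,
                        const uint16_t *b, intptr_t b_stride,
                        int w, int h)
{
    uint32_t sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int d = (int)a[x] - (int)b[x];
            sum += (uint32_t)(d < 0 ? -d : d);
        }
        a += a_stride;
        b += b_stride;
    }
    return sum;
}
```

The accumulator width is the main practical difference from the 8-bit case: a vectorized version has to widen earlier or more often to avoid overflow.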
      
      Signed-off-by: Hubert Mazur <hum@semihalf.com>
      Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
  4. Oct 05, 2022
  5. Oct 01, 2022
  6. Sep 19, 2022
    • Makefile: Add missing dependency of '.depend' on 'oclobj.h' · e067ab0b
      Sergei Trofimovich authored
      Without the change, a parallel build occasionally fails with:
      
          $ make --shuffle
          ...
          gcc ... -c common/opencl.c -o common/opencl-8.o ...
          common/opencl.c:116:10: fatal error: common/oclobj.h: No such file or directory
            116 | #include "common/oclobj.h"
                |          ^~~~~~~~~~~~~~~~~
      
      It is most easily reproduced with `make --shuffle`:
         https://savannah.gnu.org/bugs/index.php?62100
      
      This happens because `common/oclobj.h` is an autogenerated file.
      Normally `.depend` would contain this autogenerated dependency.
      But nothing forces `common/oclobj.h` to be generated.
      
      The change moves the dependency on $(GENERATED) from the final
      binaries to `.depend` itself:
      
          .depend: $(GENERATED)
  7. Sep 05, 2022
  8. Sep 01, 2022
  9. Aug 31, 2022
  10. Jun 01, 2022
  11. Feb 22, 2022
  12. Feb 21, 2022
  13. Feb 19, 2022
  14. Feb 05, 2022
  15. Jan 26, 2022
  16. Jan 24, 2022
  17. Dec 30, 2021
    • configure: Always make shared imply PIC · 19856cc4
      Jessica Clarke authored and Anton Mitrofanov committed
      Building a shared library without -fPIC does not make sense. On most
      architectures, especially recent ones, doing so will give link-time
      errors due to relocations in read-only sections like .text. On some
      legacy architectures, including i386, it is allowed by default, but will
      warn, and is highly discouraged due to the overheads it adds at library
      load time. Most architectures were already listed here as having shared
      imply PIC, but not all, such as i386 which ends up with unwanted text
      relocations, as well as architectures not known to the build system
      currently like RISC-V, which does not permit text relocations by
      default. There is no good reason to want shared without PIC on any
      architecture, so just remove the architecture list.
  18. Dec 12, 2021
    • Remove thread priority tweaking · 8a43cc14
      Henrik Gramner authored
      Back in 2009, when this was added, it improved scheduling of lookahead
      threads on the operating systems prevalent at the time.
      
      According to more recent testing by Intel, however, lowering thread
      priorities does not improve performance on modern operating systems.
      And more importantly, doing so on systems with heterogeneous CPU
      topologies may actually result in a severe performance reduction.
      
      Removing this code altogether eliminates the issue with performance
      degradation on such systems, while having no noticeable impact on
      regular systems with homogeneous CPU topologies.
  19. Dec 07, 2021
  20. Dec 06, 2021
  21. Sep 29, 2021