  1. 21 Feb, 2022 2 commits
    • x86inc: Add REPX macro to repeat instructions/operations · 6d10612a
      Henrik Gramner authored
      When operating on large blocks of data it's common to repeatedly use
      an instruction on multiple registers. Using the REPX macro makes it
      easy to quickly write dense code to achieve this without having to
      explicitly duplicate the same instruction over and over.
      
      For example,
      
          REPX {paddw x, m4}, m0, m1, m2, m3
          REPX {mova [r0+16*x], m5}, 0, 1, 2, 3
      
      will expand to
      
          paddw       m0, m4
          paddw       m1, m4
          paddw       m2, m4
          paddw       m3, m4
          mova [r0+16*0], m5
          mova [r0+16*1], m5
          mova [r0+16*2], m5
          mova [r0+16*3], m5
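
      The expansion above can be modelled outside the assembler. The following
      is a minimal Python sketch, not the macro itself: the helper name `repx`
      is hypothetical, and it assumes that only the standalone token `x` in the
      template is substituted.

```python
import re


def repx(template, *args):
    """Model of the REPX expansion: emit one copy of `template` per argument,
    replacing each standalone token `x` with that argument."""
    return [re.sub(r"\bx\b", lambda _m, a=str(arg): a, template) for arg in args]


# Reproduce the two examples from the commit message.
for line in repx("paddw x, m4", "m0", "m1", "m2", "m3"):
    print(line)
for line in repx("mova [r0+16*x], m5", 0, 1, 2, 3):
    print(line)
```

      The word-boundary match keeps the `x` inside other tokens (for example a
      hexadecimal literal such as `0x10`) untouched.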
    • x86inc: Fix edge case in forced VEX-encoding · f52e5e11
      Henrik Gramner authored
      Correctly handle emulation of 4-operand instructions (e.g. 'shufps')
      where src1 is a memory operand.
  2. 19 Feb, 2022 1 commit
  3. 24 Jan, 2022 1 commit
  4. 25 Aug, 2021 1 commit
  5. 14 Jun, 2021 1 commit
    • x86inc: Support memory operands in src1 in 3-operand instructions · e73fc230
      Henrik Gramner authored
      Particularly in code that makes heavy use of macros it's possible
      to end up with 3-operand instructions with a memory operand in src1.
      In the case of SSE this works fine due to automatic move insertions,
      but in AVX that fails since memory operands are only allowed in src2.
      
      The main purpose of this feature is to minimize the amount of code
      changes required to facilitate conversion of existing SSE code to AVX.
  6. 11 Feb, 2021 1 commit
    • x86inc: Add stack probing on Windows · b86ae3c6
      Henrik Gramner authored and committed
      Large stack allocations on Windows need to use stack probing in order
      to guarantee that all stack memory is committed before accessing it.
      This is done by ensuring that the guard page(s) at the end of the
      currently committed pages are touched prior to any pages beyond that.
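
      The probing order described above can be illustrated with a small model.
      This is a sketch under stated assumptions, not the code emitted by
      x86inc: the 4096-byte page size, the helper name `probe_offsets`, and the
      choice to touch one address per full page are all assumptions made for
      illustration.

```python
PAGE_SIZE = 4096  # assumed page size; not taken from the commit message


def probe_offsets(stack_size, page_size=PAGE_SIZE):
    """Return the offsets below the current stack pointer to touch, nearest
    first, so that no page is reached before the page directly above it."""
    return [page_size * i for i in range(1, stack_size // page_size + 1)]


# A 10000-byte allocation spans two full pages, so two probes are needed.
print(probe_offsets(10000))
```

      Allocations smaller than one page yield an empty list, since they cannot
      reach past the guard page that follows the committed region.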
  7. 27 Jan, 2021 1 commit
  8. 24 Jan, 2021 1 commit
  9. 25 Oct, 2020 1 commit
  10. 02 Jul, 2020 1 commit
  11. 10 Jun, 2020 1 commit
  12. 29 Feb, 2020 1 commit
  13. 06 Mar, 2019 7 commits
  14. 06 Aug, 2018 3 commits
    • x86inc: Improve SAVE/LOAD_MM_PERMUTATION macros · 28e48798
      Henrik Gramner authored
      Use register numbers instead of copying the full register names. This makes it
      possible to change register widths in the middle of a function and keep the
      mmreg permutations intact which can be useful for code that only needs larger
      vectors for parts of the function in combination with macros etc.
      
      Also change the LOAD_MM_PERMUTATION macro to use the same default name as the
      SAVE macro. This simplifies swapping from ymm to xmm registers or vice versa:
      
          SAVE_MM_PERMUTATION
          INIT_XMM <cpuflags>
          LOAD_MM_PERMUTATION
    • x86inc: Optimize VEX instruction encoding · 8badb910
      Henrik Gramner authored
      Most VEX-encoded instructions require an additional byte to encode when src2
      is a high register (e.g. x|ymm8..15). If the instruction is commutative we
      can swap src1 and src2 when doing so reduces the instruction length, e.g.
      
          vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8, xmm0
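
      The swap decision can be sketched as follows. This is a simplified model,
      not the logic in x86inc: it only covers register operands of instructions
      in the 0F opcode map with VEX.W = 0, where the 2-byte VEX prefix is usable
      unless src2 (encoded in ModRM.rm) is register 8..15. The function names
      are hypothetical.

```python
def vex_prefix_bytes(src2_reg):
    """VEX prefix length for a register-only instruction in the 0F opcode map
    with VEX.W = 0: a high src2 register forces the 3-byte form."""
    return 3 if src2_reg >= 8 else 2


def order_sources(src1, src2, commutative):
    """Return (src1, src2), swapped only when the instruction is commutative
    and the swap shortens the encoding."""
    if commutative and vex_prefix_bytes(src1) < vex_prefix_bytes(src2):
        return src2, src1
    return src1, src2


# vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8, xmm0
print(order_sources(0, 8, commutative=True))
```

      When both sources are high registers the swap gains nothing, so the
      operands are left in their original order.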
    • x86inc: Fix VEX -> EVEX instruction conversion · 0a84d986
      Henrik Gramner authored
      There's an edge case that wasn't properly handled.
  15. 17 Jan, 2018 2 commits
  16. 24 Dec, 2017 4 commits
  17. 24 Jun, 2017 1 commit
  18. 21 May, 2017 4 commits
    • x86: AVX-512 support · 472ce364
      Henrik Gramner authored
      AVX-512 consists of a plethora of different extensions, but in order to keep
      things a bit more manageable we group together the following extensions
      under a single baseline cpu flag which should cover SKL-X and future CPUs:
       * AVX-512 Foundation (F)
       * AVX-512 Conflict Detection Instructions (CD)
       * AVX-512 Byte and Word Instructions (BW)
       * AVX-512 Doubleword and Quadword Instructions (DQ)
       * AVX-512 Vector Length Extensions (VL)
      
      On x86-64, AVX-512 provides 16 additional vector registers. Prefer using
      those over the existing ones, since that allows us to avoid `vzeroupper`
      unless more than 16 vector registers are required. They also happen to
      be volatile on Windows, which means that we don't need to save and restore
      existing xmm register contents unless more than 22 vector registers are
      required.
      
      Also take the opportunity to drop X264_CPU_CMOV and X264_CPU_SLOW_CTZ while
      we're breaking API by messing with the cpu flags since they weren't really
      used for anything.
      
      Big thanks to Intel for their support.
    • x86: Change assembler from yasm to nasm · d2b5f487
      Henrik Gramner authored
      This is required to support AVX-512.
      
      Drop `-Worphan-labels` from ASFLAGS since it's enabled by default in nasm.
      
      Also change alignmode from `k8` to `p6` since it's more similar to `amdnop`
      in yasm, i.e. it uses long nops without excessive prefixes.
    • x86: Add some additional cpuflag relations · 8c297425
      Henrik Gramner authored
      Simplifies writing assembly code that depends on available instructions.
      
      LZCNT implies SSE2
      BMI1 implies AVX+LZCNT
      AVX2 implies BMI2
      
      Skip printing LZCNT under CPU capabilities when BMI1 or BMI2 is available,
      and don't print FMA4 when FMA3 is available.
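
      The three relations listed above compose transitively. A minimal sketch
      of that closure follows. It models only the implications named in this
      commit message; any other relations between cpu flags are deliberately
      left out, and the names `IMPLIES` and `expand` are hypothetical.

```python
# Only the relations stated in the commit message are modelled here.
IMPLIES = {
    "LZCNT": {"SSE2"},
    "BMI1": {"AVX", "LZCNT"},
    "AVX2": {"BMI2"},
}


def expand(flags):
    """Return `flags` plus every flag reachable through IMPLIES."""
    result = set(flags)
    pending = list(flags)
    while pending:
        for implied in IMPLIES.get(pending.pop(), ()):
            if implied not in result:
                result.add(implied)
                pending.append(implied)
    return result


# BMI1 pulls in AVX and LZCNT directly, and SSE2 through LZCNT.
print(sorted(expand({"BMI1"})))
```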
    • x86inc: Remove argument from WIN64_RESTORE_XMM · 3538df12
      Anton Mitrofanov authored
      The use of rsp was pretty much hardcoded there and probably didn't work
      otherwise with stack_size > 0.
  19. 19 May, 2017 3 commits
  20. 21 Jan, 2017 1 commit
  21. 01 Dec, 2016 1 commit
    • x86inc: Avoid using eax/rax for storing the stack pointer · 0706ddb1
      Henrik Gramner authored
      When allocating stack space with an alignment requirement that is larger
      than the current stack alignment we need to store a copy of the original
      stack pointer in order to be able to restore it later.
      
      If we choose to use another register for this purpose, we should not pick
      eax/rax since it can be overwritten as a return value.
  22. 12 Apr, 2016 1 commit