1. 06 Aug, 2018 3 commits
    • x86inc: Improve SAVE/LOAD_MM_PERMUTATION macros · 28e48798
      Henrik Gramner authored
      Use register numbers instead of copying the full register names. This makes it
      possible to change register widths in the middle of a function and keep the
      mmreg permutations intact, which can be useful for code that only needs larger
      vectors for parts of the function in combination with macros etc.
      Also change the LOAD_MM_PERMUTATION macro to use the same default name as the
      SAVE macro. This simplifies swapping from ymm to xmm registers or vice versa:
          SAVE_MM_PERMUTATION
          INIT_XMM <cpuflags>
          LOAD_MM_PERMUTATION
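A minimal C sketch of the idea (not x86inc's NASM macros): the permutation is stored as physical register numbers, and the width prefix (`xmm`, `ymm`) is applied only when a name is formatted, so a permutation saved under one width stays valid under another. The type and function names and the single default save slot are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

#define NUM_MMREGS 16

/* perm[i] is the physical register number behind logical register m<i>. */
typedef struct {
    int perm[NUM_MMREGS];
} mm_permutation;

/* Hypothetical single default slot shared by save and load. */
static mm_permutation saved_permutation;

static void init_permutation(mm_permutation *p) {
    for (int i = 0; i < NUM_MMREGS; i++)
        p->perm[i] = i;
}

/* Swap which physical registers two logical registers refer to. */
static void swap_mmregs(mm_permutation *p, int a, int b) {
    int t = p->perm[a];
    p->perm[a] = p->perm[b];
    p->perm[b] = t;
}

static void save_permutation(const mm_permutation *p) { saved_permutation = *p; }
static void load_permutation(mm_permutation *p) { *p = saved_permutation; }

/* Build the name only when needed, so the width ("xmm", "ymm", ...) can
 * change without touching the stored register numbers. */
static void mmreg_name(const mm_permutation *p, const char *width, int i,
                       char *buf, size_t len) {
    snprintf(buf, len, "%s%d", width, p->perm[i]);
}
```

After swapping m0 and m3 and saving under ymm, loading into a fresh permutation and formatting with `xmm` still names m0 as `xmm3`.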
    • x86inc: Optimize VEX instruction encoding · 8badb910
      Henrik Gramner authored
      Most VEX-encoded instructions require an additional byte to encode when src2
      is a high register (e.g. x|ymm8..15). If the instruction is commutative we
      can swap src1 and src2 when doing so reduces the instruction length, e.g.
          vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8, xmm0
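A C sketch of the length calculation, assuming register-only operands, the 0F opcode map and VEX.W=0 (as for `vpaddw`). dst (VEX.R) and src1 (VEX.vvvv) can address x|ymm8..15 with the two-byte prefix, but src2 sits in ModRM.rm and needs VEX.B, which only the three-byte prefix has. The helper names are hypothetical.

```c
/* Register-only VEX instruction in the 0F map with W=0, e.g. vpaddw.
 * dst uses VEX.R and src1 uses VEX.vvvv, both available in the 2-byte
 * (C5) prefix; src2 needs VEX.B, which only the 3-byte (C4) prefix has. */
static int vex_reg_insn_len(int dst, int src1, int src2) {
    (void)dst;
    (void)src1;
    int prefix = (src2 >= 8) ? 3 : 2;
    return prefix + 1 /* opcode */ + 1 /* ModRM */;
}

typedef struct {
    int dst, src1, src2;
} vex_operands;

/* For a commutative instruction, move a high register out of src2 when
 * src1 is low, since that allows the shorter prefix. */
static vex_operands shorten_commutative(vex_operands o, int commutative) {
    if (commutative && o.src2 >= 8 && o.src1 < 8) {
        int t = o.src1;
        o.src1 = o.src2;
        o.src2 = t;
    }
    return o;
}
```

Under these assumptions `vpaddw xmm0, xmm0, xmm8` takes 5 bytes and the swapped form takes 4.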
    • x86inc: Fix VEX -> EVEX instruction conversion · 0a84d986
      Henrik Gramner authored
      There's an edge case that wasn't properly handled.
  2. 17 Jan, 2018 2 commits
  3. 24 Dec, 2017 4 commits
  4. 24 Jun, 2017 1 commit
  5. 21 May, 2017 4 commits
    • x86: AVX-512 support · 472ce364
      Henrik Gramner authored
      AVX-512 consists of a plethora of different extensions, but in order to keep
      things a bit more manageable we group together the following extensions
      under a single baseline cpu flag which should cover SKL-X and future CPUs:
       * AVX-512 Foundation (F)
       * AVX-512 Conflict Detection Instructions (CD)
       * AVX-512 Byte and Word Instructions (BW)
       * AVX-512 Doubleword and Quadword Instructions (DQ)
       * AVX-512 Vector Length Extensions (VL)
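A C sketch of the grouping, with made-up bit values (x264's real X264_CPU_* constants are not reproduced here): the single baseline flag is reported only when all five extensions are detected.

```c
#include <stdint.h>

/* Illustrative bit values only. */
#define EXT_AVX512F  (1u << 0)
#define EXT_AVX512CD (1u << 1)
#define EXT_AVX512BW (1u << 2)
#define EXT_AVX512DQ (1u << 3)
#define EXT_AVX512VL (1u << 4)

#define AVX512_BASELINE (EXT_AVX512F | EXT_AVX512CD | EXT_AVX512BW | \
                         EXT_AVX512DQ | EXT_AVX512VL)

/* Report the grouped flag only when every member extension is present. */
static int has_avx512_baseline(uint32_t detected) {
    return (detected & AVX512_BASELINE) == AVX512_BASELINE;
}
```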
      On x86-64, AVX-512 provides 16 additional vector registers; prefer using
      those over existing ones since it allows us to avoid using `vzeroupper`
      unless more than 16 vector registers are required. They also happen to
      be volatile on Windows, which means that we don't need to save and restore
      existing xmm register contents unless more than 22 vector registers are
      required.
      Also take the opportunity to drop X264_CPU_CMOV and X264_CPU_SLOW_CTZ while
      we're breaking API by messing with the cpu flags since they weren't really
      used for anything.
      Big thanks to Intel for their support.
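The register-allocation reasoning in this commit can be checked with a small C model. It assumes a mapping in which logical registers m0..m15 use xmm16..31 and m16..m31 fall back to xmm0..15; the mapping and function names are illustrative, not taken from x86inc. The Win64 facts come from the Microsoft x64 ABI: xmm6..15 are callee-saved, while xmm0..5 and xmm16..31 are volatile, which is where the figure 22 (16 + 6) comes from.

```c
/* Assumed mapping: m0..m15 -> xmm16..31, m16..m31 -> xmm0..15. */
static int physical_reg(int logical) {
    return logical < 16 ? logical + 16 : logical - 16;
}

/* As described in the commit: vzeroupper is only needed once more than
 * 16 registers are in use, i.e. once a legacy register is touched. */
static int needs_vzeroupper(int num_regs) {
    return num_regs > 16;
}

/* Count the callee-saved Win64 registers (xmm6..15) that a function
 * using num_regs logical registers would have to save and restore. */
static int win64_xmm_to_save(int num_regs) {
    int saved = 0;
    for (int i = 0; i < num_regs; i++) {
        int r = physical_reg(i);
        if (r >= 6 && r <= 15)
            saved++;
    }
    return saved;
}
```

With this mapping, up to 22 registers need no saves, and each additional register costs one save, up to 10 for all 32.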
    • x86: Change assembler from yasm to nasm · d2b5f487
      Henrik Gramner authored
      This is required to support AVX-512.
      Drop `-Worphan-labels` from ASFLAGS since it's enabled by default in nasm.
      Also change alignmode from `k8` to `p6` since it's more similar to `amdnop`
      in yasm, i.e. it uses long nops without excessive prefixes.
    • x86: Add some additional cpuflag relations · 8c297425
      Henrik Gramner authored
      Simplifies writing assembly code that depends on available instructions.
       * LZCNT implies SSE2
       * BMI1 implies AVX+LZCNT
       * AVX2 implies BMI2
      Skip printing LZCNT under CPU capabilities when BMI1 or BMI2 is available,
      and don't print FMA4 when FMA3 is available.
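A C sketch of how these relations compose, using hypothetical bit values; it illustrates the relations and is not x264's implementation. The implications are applied until nothing changes, so BMI1 also pulls in SSE2 through LZCNT, and the display filter mirrors the two printing rules.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define F_SSE2  (1u << 0)
#define F_AVX   (1u << 1)
#define F_LZCNT (1u << 2)
#define F_BMI1  (1u << 3)
#define F_BMI2  (1u << 4)
#define F_AVX2  (1u << 5)
#define F_FMA3  (1u << 6)
#define F_FMA4  (1u << 7)

/* Apply the three relations until a fixed point, so chains such as
 * BMI1 -> LZCNT -> SSE2 are followed. */
static uint32_t expand_implied(uint32_t f) {
    uint32_t prev;
    do {
        prev = f;
        if (f & F_LZCNT) f |= F_SSE2;
        if (f & F_BMI1)  f |= F_AVX | F_LZCNT;
        if (f & F_AVX2)  f |= F_BMI2;
    } while (f != prev);
    return f;
}

/* Hide LZCNT when BMI1 or BMI2 is shown, and FMA4 when FMA3 is shown. */
static uint32_t flags_to_print(uint32_t f) {
    if (f & (F_BMI1 | F_BMI2)) f &= ~F_LZCNT;
    if (f & F_FMA3) f &= ~F_FMA4;
    return f;
}
```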
    • x86inc: Remove argument from WIN64_RESTORE_XMM · 3538df12
      Anton Mitrofanov authored
      The use of rsp was pretty much hardcoded there and probably didn't work
      otherwise with stack_size > 0.
  6. 19 May, 2017 3 commits
  7. 21 Jan, 2017 1 commit
  8. 01 Dec, 2016 1 commit
    • x86inc: Avoid using eax/rax for storing the stack pointer · 0706ddb1
      Henrik Gramner authored
      When allocating stack space with an alignment requirement that is larger
      than the current stack alignment we need to store a copy of the original
      stack pointer in order to be able to restore it later.
      If we choose to use another register for this purpose, we should not pick
      eax/rax since it can be overwritten as a return value.
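A C sketch of the arithmetic, with hypothetical names: reserving `size` bytes and rounding down to a power-of-two `align` gives an adjustment that depends on the incoming stack pointer, so it cannot be undone with a constant add and the original value has to be kept in some register.

```c
#include <stdint.h>

typedef struct {
    uintptr_t new_sp;    /* aligned stack pointer used inside the function */
    uintptr_t saved_sp;  /* copy of the incoming stack pointer */
} aligned_frame;

/* Reserve `size` bytes below `sp` and round down to `align`, which must
 * be a power of two. The original sp is returned alongside the result. */
static aligned_frame realign_stack(uintptr_t sp, uintptr_t size, uintptr_t align) {
    aligned_frame f;
    f.saved_sp = sp;
    f.new_sp = (sp - size) & ~(align - 1);
    return f;
}
```

For example, incoming values 0x1008 and 0x1010 with size 0x40 and align 32 both yield 0xFC0, so the original pointer cannot be recovered from the aligned one.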
  9. 12 Apr, 2016 4 commits
  10. 16 Jan, 2016 7 commits
    • Bump dates to 2016 · d23d1865
      Henrik Gramner authored
    • x86inc: Add debug symbols indicating sizes of compiled functions · 366fa858
      Geza Lore authored
      Some debuggers/profilers use this metadata to determine which function a
      given instruction is in; without it they can get confused by local labels
      (if you haven't stripped those). On the other hand, some tools are still
      confused even with this metadata, e.g. this fixes `gdb` but not `perf`.
      Currently only implemented for ELF.
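A simplified C model of why the size metadata helps; it is not how `gdb` or `perf` actually resolve symbols. Attributing an address to the nearest preceding symbol lets a local label inside a function claim the code after it, whereas a lookup that respects each symbol's size finds the enclosing function. The symbol table layout is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint64_t start;
    uint64_t size;   /* 0 = size unknown (no size metadata) */
} symbol;

/* Naive attribution: the nearest symbol at or below addr. A local label
 * inside a function therefore "steals" the addresses that follow it. */
static const symbol *nearest_preceding(const symbol *s, size_t n, uint64_t addr) {
    const symbol *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (s[i].start <= addr && (!best || s[i].start > best->start))
            best = &s[i];
    return best;
}

/* Size-aware attribution: only symbols whose [start, start + size) range
 * contains addr qualify, so a sized function symbol wins over a label. */
static const symbol *containing(const symbol *s, size_t n, uint64_t addr) {
    for (size_t i = 0; i < n; i++)
        if (s[i].size && addr >= s[i].start && addr < s[i].start + s[i].size)
            return &s[i];
    return NULL;
}
```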
    • x86inc: Avoid creating unnecessary local labels · 70c3ba42
      Henrik Gramner authored
      The REP_RET workaround is only needed on old AMD cpus, and the labels clutter
      up the symbol table and confuse debugging/profiling tools, so use EQU to
      create SHN_ABS symbols instead of creating local labels. Furthermore, skip
      the workaround completely in functions that definitely won't run on such cpus.
      This patch doesn't modify any emitted instructions, and doesn't actually affect
      x264 at all. It's only for other projects that use x86inc.asm without an
      appropriate `strip` command in their build system.
      Note that EQU is just creating a local label when using nasm instead of yasm.
      This is probably a bug, but at least it doesn't break anything.
    • x86inc: Simplify AUTO_REP_RET · 5c3d473a
      Henrik Gramner authored
      cpuflags is never undefined anymore; it's set to 0 instead.
      Also fix an incorrect comment.
    • x86inc: Use more consistent indentation · 28d68f09
      Henrik Gramner authored
    • x86inc: Preserve arguments when allocating stack space · 963b99ef
      Henrik Gramner authored
      When allocating stack space with a larger alignment than the known stack
      alignment a temporary register is used for storing the stack pointer.
      Ensure that this isn't one of the registers used for passing arguments.
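A C sketch of the selection rule described here and in the later commit that avoids eax/rax, using the standard x86-64 register encodings and the System V integer argument order (rdi, rsi, rdx, rcx, r8, r9). The candidate list and function name are hypothetical; x86inc makes the actual choice in its macros.

```c
/* Standard x86-64 register encodings. */
enum { REG_RAX, REG_RCX, REG_RDX, REG_RBX, REG_RSP, REG_RBP, REG_RSI, REG_RDI,
       REG_R8, REG_R9, REG_R10, REG_R11 };

/* System V AMD64 integer argument registers, in order. */
static const int sysv_arg_regs[] = { REG_RDI, REG_RSI, REG_RDX, REG_RCX, REG_R8, REG_R9 };

/* Return the first candidate that is neither the return-value register nor
 * one of the first num_args argument registers, or -1 if none is free. */
static int pick_stack_temp(const int *arg_regs, int num_args,
                           const int *candidates, int num_candidates,
                           int return_reg) {
    for (int i = 0; i < num_candidates; i++) {
        int r = candidates[i];
        int clash = (r == return_reg);
        for (int j = 0; j < num_args && !clash; j++)
            if (arg_regs[j] == r)
                clash = 1;
        if (!clash)
            return r;
    }
    return -1;
}
```

With two arguments, rdx is still free; with three, it holds an argument and the next candidate is used instead.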
    • x86inc: Improve FMA instruction handling · 6e503341
      Henrik Gramner authored
       * Correctly handle FMA instructions with memory operands.
       * Print a warning if FMA instructions are used without the correct cpuflag.
       * Simplify the instantiation code.
       * Clarify documentation.
      Only the last operand in FMA3 instructions can be a memory operand. When
      converting FMA4 instructions to FMA3 instructions we can utilize the fact
      that multiply is a commutative operation and reorder operands if necessary
      to ensure that a memory operand is used only as the last operand.
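A C sketch of the operand reordering, assuming at most one memory operand and using Intel's FMA3 semantics (132: d = d*src3 + src2, 213: d = src2*d + src3, 231: d = src2*src3 + d). The types and names are hypothetical, and the case where the destination aliases no source (which would need an extra move) is left unhandled.

```c
/* Operand: a register number, or memory when is_mem is set. */
typedef struct { int reg; int is_mem; } opnd;

typedef enum { FMA3_NONE, FMA3_132, FMA3_213, FMA3_231 } fma3_form;

/* The destination register is implicit; only the two sources are stored. */
typedef struct {
    fma3_form form;
    opnd src2, src3;
} fma3_insn;

static int same_reg(opnd x, int r) { return !x.is_mem && x.reg == r; }

/* Convert FMA4 "d = a*b + c" into an FMA3 form that overwrites d and
 * keeps any memory operand in the last (src3) slot. */
static fma3_insn fma4_to_fma3(int d, opnd a, opnd b, opnd c) {
    fma3_insn r = { FMA3_NONE, {0, 0}, {0, 0} };
    if (same_reg(b, d) && !same_reg(a, d)) {  /* a*b == b*a: put d first */
        opnd t = a; a = b; b = t;
    }
    if (same_reg(c, d)) {                     /* d = a*b + d */
        if (a.is_mem) { opnd t = a; a = b; b = t; }
        if (a.is_mem) return r;               /* guard: two memory operands */
        r.form = FMA3_231; r.src2 = a; r.src3 = b;
    } else if (same_reg(a, d)) {              /* d = d*b + c */
        if (b.is_mem) { r.form = FMA3_132; r.src2 = c; r.src3 = b; }
        else          { r.form = FMA3_213; r.src2 = b; r.src3 = c; }
    }
    return r;                                 /* FMA3_NONE: d aliases no source */
}
```

Because multiplication commutes, a memory multiplicand can always be moved into the last slot, either by swapping a and b or by picking the 132 form instead of 213.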
  11. 03 Jan, 2016 2 commits
  12. 18 Aug, 2015 1 commit
  13. 25 Jul, 2015 4 commits
    • x86: Experimental nasm support · b568a256
      Henrik Gramner authored
      Enables the use of nasm as an alternative to yasm.
      Note that nasm cannot assemble x264 with PIC enabled since it currently doesn't
      support [symbol-$$] addressing which is used extensively by x264's PIC code.
      This includes all 64-bit Windows and 64-bit OS X builds, even non-shared.
      For the above reason nasm is currently intentionally not auto-detected, instead
      the assembler must be explicitly specified using "AS=nasm ./configure".
      Also drop -O2 from ASFLAGS since it's simply ignored anyway.
    • x86inc: Prevent warnings when using `struc` and `endstruc` · d14e38c0
      Timothy Gu authored
      struc and endstruc attempt to revert to the previous section state set by
      the SECTION macro.
      Use the primitive [SECTION] directive instead of the SECTION macro for the
      .note.GNU-stack section to prevent it from being emitted again during endstruc.
    • x86inc: Drop SECTION_TEXT macro · 353b1f88
      Henrik Gramner authored
      The .text section is already 16-byte aligned by default on all supported
      platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
    • x86inc: Disable vpbroadcastq workaround in newer yasm versions · b615f82e
      Henrik Gramner authored
      The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
  14. 23 Feb, 2015 3 commits