Fix clang stack alignment issues
Clang emits aligned AVX stores for operations such as zeroing stack-allocated variables when -mavx is used, even with -fno-tree-vectorize set. This can crash if such a store executes before we've realigned the stack. Previously we only ensured that the stack was realigned before calling assembly functions that access stack-allocated buffers, but this is not sufficient. Fix the issue by performing the stack realignment immediately in all CLI, API and thread entry points instead.
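A minimal sketch of what realigning at an entry point can look like. It assumes the GCC/Clang x86 function attribute force_align_arg_pointer is used. The message above does not say which mechanism the change relies on, and the macro name REALIGN_STACK and the function example_entry_point are hypothetical, chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro name. force_align_arg_pointer is an x86-only
 * GCC/Clang attribute that makes the function realign the stack in its
 * prologue, so it is safe to enter from a caller with a misaligned stack. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Returns 1 if p is a multiple of alignment (which must be a power of two). */
static int is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical API entry point. The attribute is applied here, at the
 * outermost function, rather than only around later assembly calls. */
REALIGN_STACK int example_entry_point(void)
{
    /* A stack buffer that the compiler may zero with aligned vector
     * stores when building with -mavx. */
    _Alignas(32) uint8_t buf[64];
    memset(buf, 0, sizeof(buf));
    return is_aligned(buf, 32);
}
```

Placing the attribute on the outermost function means every frame below it starts from an aligned stack, so compiler-generated stores in callees no longer depend on a later, explicit realignment step.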
Showing with 144 additions and 72 deletions