For arm/arm64, there's no need to align any buffer to 32 bytes
as the assembly doesn't need it and doesn't benefit from it.
The alignment macro would be much more elegant if defined like this:
#define MAX_ALIGN 16
#define ALIGN(align) __attribute__((aligned(MIN(align, MAX_ALIGN))))
This works for GCC and Clang, but the MSVC alignment __declspec
needs a literal alignment value; it can't handle an expression.
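
One way around the MSVC limitation, sketched here under assumptions, is to
choose the capped value per target with the preprocessor, so that every
compiler only ever sees a plain literal. The names ALIGN_32_VAL,
ALIGN_16_VAL, the two-argument ALIGN(decl, a) form, and example_buf are
hypothetical and chosen for illustration. The architecture checks rely on
compiler-predefined macros, where a real build would more likely use its own
configuration defines.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: these compiler-predefined macros identify arm/arm64 targets. */
#if defined(__arm__) || defined(__aarch64__) || defined(_M_ARM) || defined(_M_ARM64)
#define ALIGN_32_VAL 16 /* a request for 32 bytes is capped at 16 */
#else
#define ALIGN_32_VAL 32
#endif
#define ALIGN_16_VAL 16

/* The *_VAL macros expand to literals before substitution, so the MSVC
 * __declspec never receives an expression such as MIN(32, 16). */
#ifdef _MSC_VER
#define ALIGN(decl, a) __declspec(align(a)) decl
#else
#define ALIGN(decl, a) decl __attribute__((aligned(a)))
#endif

/* Hypothetical buffer that asks for "32-byte" alignment. */
ALIGN(static uint8_t example_buf[64], ALIGN_32_VAL);

/* Returns 1 if example_buf starts on an ALIGN_32_VAL boundary, else 0. */
int example_buf_is_aligned(void)
{
    return ((uintptr_t)example_buf % ALIGN_32_VAL) == 0;
}

/* Aborts (unless NDEBUG is defined) if the buffer is misaligned. */
void check_example_buf(void)
{
    assert(example_buf_is_aligned());
}
```

The cost of this approach is one *_VAL macro per alignment that is actually
requested, instead of a single MAX_ALIGN clamp, but the resulting
declarations compile unchanged with GCC, Clang, and MSVC.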