x86: Branch before waiting on popcnt in ipred_z AVX2 functions

Some specific Haswell CPUs have a hardware bug where the popcnt instruction doesn't set the zero flag correctly, which causes the wrong branch to be taken.

popcnt also has a 3-cycle latency on Intel CPUs, so branching on the input value instead of the output reduces the time wasted going down the wrong code path when a branch is mispredicted. The popcnt result is zero exactly when its input is zero, so the branch outcome is unchanged.
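
As a rough illustration only (the register and label names are hypothetical and not taken from the actual ipred_z code), the change amounts to something like:

    ; before: the branch consumes the flags written by popcnt
    popcnt  r5d, r5d
    jz      .no_bits_set

    ; after: the branch consumes flags from a test of the input,
    ; so it depends on neither popcnt's flags nor its latency
    test    r5d, r5d
    jz      .no_bits_set
    popcnt  r5d, r5d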

