win32: make thread killed flag atomic
Setting the flag from an APC had the benefit of not needing atomicity, but it meant that the flag only got set at the next opportunity to run APCs. In particular, vlc_testcancel() is not an alertable function, so it would typically notice the cancellation late. If the thread never went into an alertable sleep, then vlc_testcancel() would not work at all.

Since vlc_cancel() and vlc_testcancel() do not imply any memory barriers, the loads and stores can be relaxed. That removes most, if not all, of the overhead of the atomic operations.
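
For illustration, here is a minimal sketch of the relaxed-atomic flag pattern described above, using C11 <stdatomic.h>. The struct and the sketch_* function names are hypothetical stand-ins, not the actual VLC code, and the sketch deliberately leaves out waking a thread that is blocked in an alertable wait.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread state (not the real VLC struct). */
struct sketch_thread {
    atomic_bool killed; /* written by the canceller, polled by the target */
};

static void sketch_thread_init(struct sketch_thread *th)
{
    atomic_init(&th->killed, false);
}

/* Canceller side: set the flag directly instead of from an APC.
 * Relaxed ordering is enough because cancellation is assumed not to
 * publish any other data to the target thread. */
static void sketch_cancel(struct sketch_thread *th)
{
    atomic_store_explicit(&th->killed, true, memory_order_relaxed);
}

/* Target side: poll the flag. This needs no alertable wait, so it sees
 * the store even if the thread never runs an APC. */
static bool sketch_testcancel(struct sketch_thread *th)
{
    return atomic_load_explicit(&th->killed, memory_order_relaxed);
}
```

On common targets a relaxed load or store of a small flag compiles to a plain memory access, which is why the atomic version is expected to cost little or nothing.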
Showing with 3 additions and 20 deletions