New dynamic peak detection (v4, naive blocking)
After much ado about nothing and a significant amount of testing, I've concluded that simply hard-waiting (blocking) for the GPU at the appropriate point is both simpler and, in general, faster, or at least not significantly slower.
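To make the distinction concrete, here is a minimal sketch, assuming the "GPU work" is simulated by a worker thread. `fake_gpu_peak_kernel`, `detect_blocking` and `detect_polling` are hypothetical names, and the toy local-maxima search is only a stand-in for the real peak detection, not the actual implementation. The blocking version launches the job and waits once for its result. The polling version repeatedly checks for completion before collecting the result, which is the kind of extra machinery a plain blocking wait avoids.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_gpu_peak_kernel(samples):
    # Hypothetical stand-in for GPU work: returns indices of local maxima
    # (a plateau is reported at its first index).
    time.sleep(0.01)  # simulate kernel latency
    return [i for i in range(1, len(samples) - 1)
            if samples[i - 1] < samples[i] >= samples[i + 1]]


def detect_blocking(pool, samples):
    # Naive "hard wait": launch the job, then block until the result is ready.
    future = pool.submit(fake_gpu_peak_kernel, samples)
    return future.result()


def detect_polling(pool, samples, interval=0.001):
    # Alternative: poll for completion, sleeping (or doing other work) in between.
    future = pool.submit(fake_gpu_peak_kernel, samples)
    while not future.done():
        time.sleep(interval)
    return future.result()


if __name__ == "__main__":
    data = [0, 3, 1, 4, 4, 2, 5, 0]
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(detect_blocking(pool, data))  # [1, 3, 6]
        print(detect_polling(pool, data))   # [1, 3, 6]
```

Both variants return the same result; the blocking one simply has less code and no polling interval to tune.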