Split each lookahead frame analysis call into multiple threads. Has a small impact on quality, but does not seem to be consistently any worse. This helps alleviate bottlenecks with many cores and frame threads. In many case, this massively increases performance on many-core systems. For example, over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system. Realtime 1080p30 at --preset slow should now be feasible on real systems. For sliced-threads, this patch should be faster regardless of settings (~10%). By default, lookahead threads are 1/6 of regular threads. This isn't exacting, but it seems to work well for all presets on real systems. With sliced-threads, it's the same as the number of encoding threads.