Use a large LUT for CAVLC zero-run bit codes
Helps the most with trellis and RD, but also helps with bitstream writing. Seems at worst neutral even in the extreme case of a CPU with small L2 cache (e.g. ARM Cortex A8).
Showing with 58 additions and 20 deletions