avoid memory loads that span the border between two cachelines.
on core2 this makes x264_pixel_sad an average of 2x faster. other intel cpus gain various amounts. amd cpus are unaffected.
overall speedup: 1-10%, depending on how much time is spent in fullpel motion estimation.

git-svn-id: svn://svn.videolan.org/x264/trunk@696 df754926-b1dd-0310-bc7b-ec298dee348c
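
for illustration only: a minimal sketch of what "spanning the border between two cachelines" means for a single load. it is not code from this commit. the 64-byte line size and the helper name `load_spans_cacheline` are assumptions made for the example, and the message above does not describe how the actual change (done in the sad routines) avoids such loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size in bytes (64 on core2; other cpus may differ). */
#define CACHELINE_SIZE 64

/* Hypothetical helper, not part of x264: returns 1 if a load of `width`
 * bytes starting at address `addr` touches two cachelines, 0 otherwise.
 * Assumes width <= CACHELINE_SIZE. */
static int load_spans_cacheline(uintptr_t addr, size_t width)
{
    /* Byte offset of the load within its cacheline. */
    size_t offset = (size_t)(addr & (CACHELINE_SIZE - 1));

    /* The load ends past the end of the line iff offset + width > line size. */
    return offset + width > CACHELINE_SIZE;
}
```

with these assumptions, a 16-byte load at offset 48 within a line ends exactly on the line boundary and does not split, while the same load at offset 49 does. in motion estimation the reference block address depends on the candidate motion vector, so a fraction of unaligned reference loads will land on such offsets.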
Showing with 798 additions and 172 deletions