- 10 May, 2009 3 commits
Fiona Glaser authored
Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding. Tried making 0/1-bit-specific versions of the CABAC asm, but the benefit was minimal under GCC 4.3. It helped a decent bit under GCC 3.4, but you shouldn't be using such old versions anyway.
Fiona Glaser authored
Fiona Glaser authored
Move some macros that should have been in x86util.asm to begin with into that file. Fix a typo that didn't cause any issues.
- 21 Apr, 2009 2 commits
Guillaume Poirier authored
Fix "incompatible types in initialization" compilation issues with GCC 4.3, which is stricter than previous compiler versions.
Guillaume Poirier authored
- 18 Apr, 2009 2 commits
Fiona Glaser authored
This measures the total percentage of blocks, intra and inter, that have nonzero coefficients. "y,uvAC,uvDC" refers to luma, chroma AC, and chroma DC blocks, respectively. Note that skip blocks are included in this stat.
Fiona Glaser authored
I'm not entirely sure how this snuck its way out of Holger's intra pred patch.
- 17 Apr, 2009 1 commit
Fiona Glaser authored
- 14 Apr, 2009 1 commit
Fiona Glaser authored
shufps is the most underrated SSE instruction on x86.
- 09 Apr, 2009 1 commit
Fiona Glaser authored
Move calculation of b_intra out of the core residual loop and hardcode it where applicable. Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size. Inlining only cache_mvd is faster and significantly smaller.
- 08 Apr, 2009 1 commit
Fiona Glaser authored
Faster bs_write_te; port the CABAC context selection optimization to CAVLC.
- 05 Apr, 2009 1 commit
Fiona Glaser authored
Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding, it's faster to use a branch than a cmov.
- 31 Mar, 2009 3 commits
Fiona Glaser authored
Fiona Glaser authored
Fiona Glaser authored
- 30 Mar, 2009 3 commits
Fiona Glaser authored
Fiona Glaser authored
Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP)
Fiona Glaser authored
range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3
- 27 Mar, 2009 1 commit
Fiona Glaser authored
Add a second chroma threshold after the DC transform.
- 19 Mar, 2009 1 commit
Fiona Glaser authored
Should slightly improve performance.
- 17 Mar, 2009 1 commit
Fiona Glaser authored
Replace PHADD with FastShuffle (more accurate naming). This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.
- 10 Mar, 2009 1 commit
Fiona Glaser authored
palignr to avoid unaligned loads is worth it in inith, but not initv.
- 09 Mar, 2009 1 commit
Holger Lubitz authored
~10% faster hpel_filter on 64-bit Penryn. 32-bit version by Fiona Glaser.
- 08 Mar, 2009 1 commit
Fiona Glaser authored
Optimized using the DEINTB method from r1122. ~32% faster var_16x16 on Conroe.
- 07 Mar, 2009 3 commits
Fiona Glaser authored
Optimized using the same method as in r1122. Patch partially by Holger. ~8% faster hpel filter on 64-bit Nehalem
Fiona Glaser authored
Holger Lubitz authored
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs. 16x16 SATD: +18% speed on K8 (64-bit), +22% on K10 (32-bit), +42% on Penryn (64-bit), +44% on Nehalem (64-bit), +50% on P4 (32-bit), +98% on Conroe (64-bit). Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat smaller ones in DCT/IDCT/SSD. Overall performance boost is up to ~15% on 64-bit Conroe.
- 06 Mar, 2009 1 commit
Fiona Glaser authored
- 04 Mar, 2009 4 commits
Fiona Glaser authored
Also add psy-trellis to fprofile
Fiona Glaser authored
Same as MMX 8x16 cacheline SAD, but calls SSE2 8x16 SAD in non-cacheline case. Only Nehalem benefits from sizes smaller than 8x16, and Nehalem doesn't use cacheline functions, so no smaller versions are included.
Fiona Glaser authored
Also remove an unused variable
Fiona Glaser authored
Add support for no-b-adapt + pre-scenecut (patch by BugMaster). Pre-scenecut was generally more accurate than regular scenecut, and regular scenecut didn't work in threaded mode anyway. Add a no-scenecut option (scenecut=0 now disables scenecut; previously this was -1). Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2. Simplify pre-scenecut code.
- 03 Mar, 2009 1 commit
Guillaume Poirier authored
Note that this implementation is pretty naive and should be improved by implementing what's discussed in this mailing-list thread: date: Mon, Feb 2, 2009 at 6:58 PM subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines
- 26 Feb, 2009 1 commit
Fiona Glaser authored
Deblocking was very slightly incorrect with partitions=all. Bug found by BugMaster.
- 16 Feb, 2009 1 commit
Fiona Glaser authored
r1105 introduced array overflow in cbp handling
- 14 Feb, 2009 1 commit
tal.aloni authored
- 11 Feb, 2009 2 commits
Anton Mitrofanov authored
A "make distclean" is probably required after updating to this revision.
Fiona Glaser authored
Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET.
- 10 Feb, 2009 1 commit
Manuel Rommel authored
Original thread: date: Mon, Feb 9, 2009 at 9:37 PM subject: [x264-devel] commit: Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors . (Guillaume Poirier )
- 09 Feb, 2009 1 commit
Guillaume Poirier authored