- 18 Apr, 2009 1 commit
-
-
Fiona Glaser authored
I'm not entirely sure how this snuck its way out of holger's intra pred patch.
-
- 17 Apr, 2009 1 commit
-
-
Fiona Glaser authored
-
- 14 Apr, 2009 1 commit
-
-
Fiona Glaser authored
shufps is the most underrated SSE instruction on x86.
-
- 09 Apr, 2009 1 commit
-
-
Fiona Glaser authored
Move calculation of b_intra out of the core residual loop and hardcode it where applicable. Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size. Inlining only cache_mvd is faster and significantly smaller.
-
- 08 Apr, 2009 1 commit
-
-
Fiona Glaser authored
faster bs_write_te, port CABAC context selection optimization to CAVLC.
-
- 05 Apr, 2009 1 commit
-
-
Fiona Glaser authored
Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding, it's faster to use a branch than a cmov.
-
- 31 Mar, 2009 3 commits
-
-
Fiona Glaser authored
-
Fiona Glaser authored
-
Fiona Glaser authored
-
- 30 Mar, 2009 3 commits
-
-
Fiona Glaser authored
-
Fiona Glaser authored
Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP)
-
Fiona Glaser authored
range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3
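The identity above can be checked exhaustively. A minimal sketch, assuming the shifted value is a 9-bit quantity in [256, 511] (which is what "ranges from 4-7" after `>>6` implies); the helper names are hypothetical and are not taken from the x264 source:

```python
# Hypothetical helpers, for illustration only (not x264 code).
def index_by_subtract(x):
    # Table index computed the old way: shift, then subtract 4.
    return (x >> 6) - 4

def index_by_mask(x):
    # Table index computed the new way: shift, then keep the low two bits.
    return (x >> 6) & 3

# Assumption: x lies in [256, 511], so x >> 6 is one of 4, 5, 6, 7.
mismatches = [x for x in range(256, 512)
              if index_by_subtract(x) != index_by_mask(x)]
print("mismatches:", mismatches)
```

The two forms agree only because bit 2 of the shifted value is always set in that interval; outside it (for example at 512) they diverge, so the mask form relies on the range invariant holding.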
-
- 27 Mar, 2009 1 commit
-
-
Fiona Glaser authored
Add a second chroma threshold after the DC transform.
-
- 19 Mar, 2009 1 commit
-
-
Fiona Glaser authored
Should slightly improve performance.
-
- 17 Mar, 2009 1 commit
-
-
Fiona Glaser authored
Replace PHADD with FastShuffle (more accurate naming). This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.
-
- 10 Mar, 2009 1 commit
-
-
Fiona Glaser authored
palignr to avoid unaligned loads is worth it in inith, but not initv.
-
- 09 Mar, 2009 1 commit
-
-
Holger Lubitz authored
~10% faster hpel_filter on 64-bit Penryn. 32-bit version by Fiona Glaser.
-
- 08 Mar, 2009 1 commit
-
-
Fiona Glaser authored
Optimized using the DEINTB method from r1122. ~32% faster var_16x16 on Conroe.
-
- 07 Mar, 2009 3 commits
-
-
Fiona Glaser authored
Optimized using the same method as in r1122. Patch partially by Holger. ~8% faster hpel filter on 64-bit Nehalem.
-
Fiona Glaser authored
-
Holger Lubitz authored
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs. 16x16 SATD: +18% speed on K8 (64-bit), +22% on K10 (32-bit), +42% on Penryn (64-bit), +44% on Nehalem (64-bit), +50% on P4 (32-bit), +98% on Conroe (64-bit). Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD. Overall performance boost is up to ~15% on 64-bit Conroe.
-
- 06 Mar, 2009 1 commit
-
-
Fiona Glaser authored
-
- 04 Mar, 2009 4 commits
-
-
Fiona Glaser authored
Also add psy-trellis to fprofile
-
Fiona Glaser authored
Same as MMX 8x16 cacheline SAD, but calls SSE2 8x16 SAD in non-cacheline case. Only Nehalem benefits from sizes smaller than 8x16, and Nehalem doesn't use cacheline functions, so no smaller versions are included.
-
Fiona Glaser authored
Also remove an unused variable
-
Fiona Glaser authored
Add support for no-b-adapt + pre-scenecut (patch by BugMaster). Pre-scenecut was generally better than regular scenecut in terms of accuracy, and regular scenecut didn't work in threaded mode anyway. Add no-scenecut option (scenecut=0 is now no scenecut; previously it was -1). Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2. Simplify pre-scenecut code.
-
- 03 Mar, 2009 1 commit
-
-
Guillaume Poirier authored
Note that this implementation is pretty naive and should be improved by implementing what's discussed in this ML thread: date: Mon, Feb 2, 2009 at 6:58 PM; subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines
-
- 26 Feb, 2009 1 commit
-
-
Fiona Glaser authored
Deblocking was very slightly incorrect with partitions=all. Bug found by BugMaster.
-
- 16 Feb, 2009 1 commit
-
-
Fiona Glaser authored
r1105 introduced an array overflow in cbp handling.
-
- 14 Feb, 2009 1 commit
-
-
tal.aloni authored
-
- 11 Feb, 2009 2 commits
-
-
Anton Mitrofanov authored
A "make distclean" is probably required after updating to this revision.
-
Fiona Glaser authored
Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET.
-
- 10 Feb, 2009 1 commit
-
-
Manuel Rommel authored
Original thread: date: Mon, Feb 9, 2009 at 9:37 PM; subject: [x264-devel] commit: Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors. (Guillaume Poirier)
-
- 09 Feb, 2009 7 commits
-
-
Guillaume Poirier authored
-
Guillaume Poirier authored
This will allow simplifying vector loads on architectures that can only load 16-byte-aligned data (such as AltiVec).
-
Fiona Glaser authored
Forgetting a %define resulted in SIGILL on 32-bit systems without SSE (e.g. Athlon XP).
-
Fiona Glaser authored
Up to +0.04 dB with CAVLC, generally a lot less with CABAC.
-
Fiona Glaser authored
Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO. Up to 7% faster overall in extreme cases.
-
Fiona Glaser authored
-
Fiona Glaser authored
SSSE3 version of predict_8x8_hu. SSE2 version of predict_8x8c_p. SSSE3 versions of both planar prediction functions. Optimizations to predict_16x16_p_sse2. Some unnecessary REP_RETs -> RETs. SSE2 version of predict_8x8_vr by Holger. SSE2 version of predict_8x8_hd. Don't compile MMX versions of some of the pred functions on x86_64. Remove now-useless x86_64 C versions of 4x4 pred functions. Rewrite some of the x86_64-only C functions in asm.
-