deinterlace: add SSE2 replacement for removed MMX/MMXEXT
This MR was originally intended to purge all MMX/MMXEXT code from the codebase, including converting the MMX/MMXEXT code in deinterlace to SSE2. Because the Android build issue with the latter (discussed below) was holding everything up, most of the work was split into separate MRs, including a simple purge of MMX/MMXEXT from deinterlace in !424 (merged).
This MR now simply contains an SSE2 replacement for the purged deinterlace MMX/MMXEXT.
Activity
added MRStatus::NotCompliant label
Does not work on Android-x86: https://code.videolan.org/jnqnfe/vlc/-/jobs/626135#L7114
Yeah I saw, I needed some sleep before I came back to it.
The failure relates to this NDK issue. I'm in the middle of trying to research what's going on such that a solution can be found. Any help anyone can give with this would be appreciated.
Having had a little time just now to further read into this before I grab some sleep, it appears to be the case that:
- Certain versions of Android have a bug in the app loader that fails to ensure 16-byte stack alignment in certain cases. This may have been fixed in either API v24 (v7 "Nougat"), v25 (v7 "Nougat"), or as late as v28 (v9 "Pie"). I've seen different answers in different threads/posts.
- Android NDK thus started using `-mstackrealign` as a workaround. This forces functions to use an alternative prologue and epilogue implementation that ensures 16-byte alignment of stacks. I believe that they do, or intend to, use this only while the targeted minimum API version is below the one that fixed the problem.
- Unfortunately, the implementation of these alternatives in clang/llvm steals an extra register, and they have no intention of changing this implementation to avoid it (https://bugs.llvm.org/show_bug.cgi?id=37542).
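To make the alignment requirement above concrete, here is a minimal C sketch. It is not taken from VLC or the NDK, and `is_aligned_16` and `local_is_aligned` are hypothetical names. It shows the property that `-mstackrealign` is meant to guarantee: a local that requests 16-byte alignment only gets it if the stack is aligned on entry or the function prologue realigns it.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical check: the compiler must place `spill` on a 16-byte
 * boundary. It can do so either by trusting the ABI's stack alignment
 * on entry, or by realigning the stack pointer in the prologue, which
 * is the behaviour that -mstackrealign forces. */
static int local_is_aligned(void)
{
    _Alignas(16) unsigned char spill[16] = {0};
    return is_aligned_16(spill);
}
```

Aligned SSE2 loads and stores fault on misaligned addresses, which is why a loader that hands over a misaligned stack is a problem for code that spills vector registers.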
So for solutions:
1. Eventually we will no doubt upgrade our minimum supported Android version such that this is no longer a problem. I expect that it is far too soon to do so now though?
2. Someone here posted the following suggestion: "Pass `-mno-stackrealign` if you don't need static constructors to work prior to android-24" which sounds interesting. I'm not certain yet how correct that information is, or if it would apply here.
3. Someone here suggested that setting `HAVE_EBP_AVAILABLE=0` forces clang/llvm to use an alternate-alternate prologue/epilogue implementation that does not use an extra register.
4. Perhaps our (my) assembly could be changed in some way to use fewer registers?
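As a rough illustration of the last option (using fewer registers), the sketch below contrasts a register input constraint with a memory input constraint in GCC/clang extended asm. It is a hypothetical example, not the deinterlace code, and the function names are made up. On non-x86 targets it falls back to plain C so that it still compiles.

```c
#include <stdint.h>

/* Hypothetical illustration: both operands are forced into
 * general-purpose registers by the "r" constraints. */
static int32_t add_reg_operands(int32_t a, int32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
    return a;
#else
    return a + b;
#endif
}

/* Same operation, but the "m" constraint lets the compiler address b
 * in memory, so the asm statement demands only one general-purpose
 * register from the allocator. */
static int32_t add_mem_operand(int32_t a, int32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0" : "+r" (a) : "m" (b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

Whether this is enough to get under the register limit on 32-bit Android x86 depends on how many operands each asm statement has; it is only one way of reducing pressure.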
I'll perhaps start by exploring giving 3 a try after I have slept.
mentioned in commit jnqnfe/vlc@7f9bbc23
added 117 commits
- 551a73c3...68b27c23 - 103 commits from branch videolan:master
- bf23c23c - i420_yuy2/i422_yuy2/i420_rgb: purge MMX
- 4f0c1763 - add VLC_SSE2 define
- 5e3c8aea - deinterlace: convert MMXEXT only accelerations to SSE2
- d786bb03 - deinterlace: purge MMX/MMXEXT
- 82c77ca4 - deinterlace: purge 3dNow
- 81d25762 - deinterlace: use sfence instead of emms for SSE2
- 3c93788e - deinterlace: finish up SSE2 enhancement
- 5ceefe13 - gradfun: purge MMX
- 3234077e - grain: purge use of emms
- c28b4514 - chroma/copy: purge use of emms
- 28d909e4 - configure: purge MMX module
- 82806720 - configure: purge 3dNow module
- 78083df8 - tweak simd targets
- 7f9bbc23 - deinterlace: add workaround for android x86 build failure
mentioned in commit jnqnfe/vlc@c88d8eaf
added 1 commit
- c88d8eaf - deinterlace: add workaround for android x86 build failure
mentioned in commit jnqnfe/vlc@12021006
added 1 commit
- 12021006 - deinterlace: add workaround for android x86 build failure
changed milestone to %4.0
mentioned in merge request !389 (merged)
mentioned in merge request !390 (merged)
mentioned in merge request !391 (closed)
mentioned in merge request !392 (merged)
mentioned in merge request !424 (merged)
added 699 commits
- 12021006...dd38fdc4 - 698 commits from branch videolan:master
- 74ae2aea - deinterlace: add more SSE2 acceleration
added 1 commit
- 4d20c0b1 - deinterlace: add SSE2 replacement for removed MMX/MMXEXT
added 73 commits
- 4d20c0b1...5219ec95 - 72 commits from branch videolan:master
- b6cd30c3 - deinterlace: add SSE2 replacement for removed MMX/MMXEXT
added 1244 commits
- b6cd30c3...04029770 - 1243 commits from branch videolan:master
- 360babf7 - deinterlace: add SSE2 replacement for removed MMX/MMXEXT
    p_out = p_dst->p[i_plane].p_pixels;
    p_out_end = p_out + p_dst->p[i_plane].i_pitch
                      * p_dst->p[i_plane].i_visible_lines;

    /* skip first line for bottom field */
    if( i_field == 1 )
        p_out += p_dst->p[i_plane].i_pitch;

    int wm16 = w % 16; /* remainder */
    int w16 = w - wm16; /* part of width that is divisible by 16 */
    for( ; p_out < p_out_end ; p_out += 2*p_dst->p[i_plane].i_pitch )
    {
        uint8_t *po = p_out;
        int x = 0;

        __asm__ volatile (
            "movd %0, %%xmm1\n"
            "movd %1, %%xmm2\n"
            "pshufd $0, %%xmm2, %%xmm2\n" /* duplicate 32-bits across reg */
            :: "m" (i_strength), "m" (remove_high_u32)
            : "xmm1", "xmm2"
        );
        for( ; x < w16; x += 16 )
        {
            __asm__ volatile (
mentioned in issue #28668
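The `w16` / `wm16` split in the diff excerpt above can be sketched in isolation as follows. This is a hedged illustration, not the MR's code: the excerpt is truncated before the loop body, so the per-pixel operation here (a right shift by `strength`, with a mask that clears bits leaking in from the neighbouring byte) is an assumption, and `darken_line` is a hypothetical name. It uses SSE2 intrinsics instead of inline asm and falls back to scalar code when SSE2 is unavailable.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: shift every 8-bit sample of one line right by
 * `strength` bits (0..7), 16 bytes at a time where possible. */
static void darken_line(uint8_t *po, int w, int strength)
{
    assert(strength >= 0 && strength <= 7);

    int wm16 = w % 16;  /* remainder */
    int w16 = w - wm16; /* part of width that is divisible by 16 */
    int x = 0;

#if defined(__SSE2__)
    /* SSE2 has no per-byte shift, so shift 32-bit lanes and mask off
     * the bits that crossed over from the next byte. */
    const __m128i mask = _mm_set1_epi8((char)(0xFF >> strength));
    const __m128i count = _mm_cvtsi32_si128(strength);
    for (; x < w16; x += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(po + x));
        v = _mm_and_si128(_mm_srl_epi32(v, count), mask);
        _mm_storeu_si128((__m128i *)(po + x), v);
    }
#else
    (void)w16; /* no vector path: the scalar loop handles everything */
#endif

    /* Scalar tail for the remaining wm16 bytes. */
    for (; x < w; x++)
        po[x] = (uint8_t)(po[x] >> strength);
}
```

Unaligned loads and stores are used here so the sketch makes no assumption about the alignment of the picture buffer.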