That said, VLC should validate that the filename is actually valid UTF-8, and reject it or replace the invalid characters if it is not, as libvlc users probably expect valid UTF-8 strings.
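To illustrate the "reject or replace" idea, here is a minimal Python sketch (not VLC code; `sanitize_filename` is a hypothetical name). It assumes the name arrives as raw bytes, tries a strict UTF-8 decode first, and falls back to substituting U+FFFD for whatever the decoder refuses:

```python
def sanitize_filename(raw: bytes) -> str:
    """Return raw decoded as UTF-8, replacing invalid bytes with U+FFFD."""
    try:
        # Strict decoding rejects surrogate encodings, overlong forms, etc.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: keep the valid parts, mark the invalid bytes.
        return raw.decode("utf-8", errors="replace")


# Sample name built from the bytes discussed in this thread.
name = sanitize_filename(b"\xed\xa0\xbc\xed\xbf\xae.mp4")
print(name.endswith(".mp4"))
```

A caller that prefers rejection could instead let the `UnicodeDecodeError` propagate and skip the entry.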
Moving back to the mainline VLC bug tracker as the problem is not specific to iOS - it just does not show on Android due to an implementation detail of the OS.
Felix Paul Kühne changed title from "special character in filename cause the filename to appear blank in iOS app" to "special character in filename cause the filename to appear blank in SMB2 discovery"
0xED, 0xA0, 0xBC: now this is the more surprising part. The 0xED lead byte hints that this might be a surrogate. Specifically, this is the UTF-8 representation of the U+D83C high surrogate.
0xED, 0xBF, 0xAE: this is the UTF-8 representation of the U+DFEE low surrogate.
0x2E, 0x6D, 0x70, 0x34: this is .mp4, as expected again.
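For anyone who wants to check the byte analysis above, here is a small Python sketch. It applies the generic three-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx) with no range checks at all, which is why it happily produces surrogate code points. `decode_three_byte` is just an illustrative helper name:

```python
def decode_three_byte(b0: int, b1: int, b2: int) -> int:
    """Extract the 16 payload bits of a 3-byte UTF-8-style sequence."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


high = decode_three_byte(0xED, 0xA0, 0xBC)
low = decode_three_byte(0xED, 0xBF, 0xAE)
print(f"U+{high:04X} U+{low:04X}")  # prints "U+D83C U+DFEE"
```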
Surrogates
Surrogates are a way to address characters beyond the BMP (Basic Multilingual Plane) in UTF-16: a high surrogate followed by a low surrogate. The two surrogates are combined to get the actual Unicode character. More information about this can be found in the Unicode Book.
Now if we combine our two surrogates, U+D83C and U+DFEE, we get U+1F3EE, which is our familiar lantern 🏮.
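The combination step is plain arithmetic: each surrogate carries 10 bits, and the result is offset by 0x10000. A short Python sketch of that standard UTF-16 formula (`combine_surrogates` is an illustrative name):

```python
def combine_surrogates(high: int, low: int) -> int:
    """Join a UTF-16 high/low surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    # High surrogate supplies the top 10 bits, low surrogate the bottom 10.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83C, 0xDFEE)
print(f"U+{code_point:X}")  # prints "U+1F3EE"
```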
However, surrogates are not necessary and are actually invalid in UTF-8 according to Table 3-7 in The Unicode Standard:
Table 3-7. Well-Formed UTF-8 Byte Sequences

| Code Points        | First Byte | Second Byte | Third Byte | Fourth Byte |
|--------------------|------------|-------------|------------|-------------|
| U+0000..U+007F     | 00..7F     |             |            |             |
| U+0080..U+07FF     | C2..DF     | 80..BF      |            |             |
| U+0800..U+0FFF     | E0         | A0..BF      | 80..BF     |             |
| U+1000..U+CFFF     | E1..EC     | 80..BF      | 80..BF     |             |
| U+D000..U+D7FF     | ED         | 80..9F      | 80..BF     |             |
| U+E000..U+FFFF     | EE..EF     | 80..BF      | 80..BF     |             |
| U+10000..U+3FFFF   | F0         | 90..BF      | 80..BF     | 80..BF      |
| U+40000..U+FFFFF   | F1..F3     | 80..BF      | 80..BF     | 80..BF      |
| U+100000..U+10FFFF | F4         | 80..8F      | 80..BF     | 80..BF      |
So when trying to construct a CFString (macOS/iOS CoreFoundation's string type) it fails and the item gets no name.
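Table 3-7 translates fairly directly into a validator. The Python sketch below is only an illustration of the table, not how CoreFoundation or VLC actually check strings. Each row gives the first-byte range, the allowed range for the second byte, and the number of trailing bytes. Note the ED row, which caps the second byte at 9F and is exactly what excludes U+D800..U+DFFF:

```python
# (first-byte range, second-byte range, number of trailing bytes)
_ROWS = [
    ((0x00, 0x7F), None, 0),
    ((0xC2, 0xDF), (0x80, 0xBF), 1),
    ((0xE0, 0xE0), (0xA0, 0xBF), 2),
    ((0xE1, 0xEC), (0x80, 0xBF), 2),
    ((0xED, 0xED), (0x80, 0x9F), 2),  # 9F cap excludes surrogates
    ((0xEE, 0xEF), (0x80, 0xBF), 2),
    ((0xF0, 0xF0), (0x90, 0xBF), 3),
    ((0xF1, 0xF3), (0x80, 0xBF), 3),
    ((0xF4, 0xF4), (0x80, 0x8F), 3),
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Check data against the byte ranges of Unicode Table 3-7."""
    i, n = 0, len(data)
    while i < n:
        first = data[i]
        for (lo, hi), second, trailing in _ROWS:
            if lo <= first <= hi:
                break
        else:
            return False  # first byte matches no row (80..C1, F5..FF)
        if i + trailing >= n:
            return False  # sequence is truncated
        for k in range(1, trailing + 1):
            # Only the second byte has a row-specific range.
            low_ok, high_ok = second if k == 1 else (0x80, 0xBF)
            if not low_ok <= data[i + k] <= high_ok:
                return False
        i += trailing + 1
    return True


print(is_well_formed_utf8(b"\xed\xa0\xbc\xed\xbf\xae.mp4"))  # prints "False"
print(is_well_formed_utf8(b"\xf0\x9f\x8f\xae.mp4"))          # prints "True"
```

The second call uses F0 9F 8F AE, the well-formed four-byte UTF-8 encoding of U+1F3EE.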
The library seems to be using CESU-8 where VLC expects UTF-8. That wouldn't necessarily be a bug if the library's naming and documentation said so. But it seems that they claim to use UTF-8? If so, it is evidently a library bug.
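If the input really is CESU-8, it can be converted losslessly rather than replaced. A minimal Python sketch, assuming the bytes contain properly paired surrogates (`cesu8_to_str` is a hypothetical helper, and this is not what the library or VLC do): it decodes with the `surrogatepass` error handler, then round-trips through UTF-16 so each pair is joined into a single code point.

```python
def cesu8_to_str(raw: bytes) -> str:
    """Decode CESU-8 bytes into a proper str, joining surrogate pairs."""
    # "surrogatepass" lets the UTF-8 codec accept 3-byte surrogate encodings.
    with_surrogates = raw.decode("utf-8", errors="surrogatepass")
    # Encoding to UTF-16 puts the surrogates back where they belong...
    utf16 = with_surrogates.encode("utf-16-le", errors="surrogatepass")
    # ...and a strict UTF-16 decode joins each pair (lone surrogates raise).
    return utf16.decode("utf-16-le")


fixed = cesu8_to_str(b"\xed\xa0\xbc\xed\xbf\xae.mp4")
print(fixed.encode("utf-8").hex(" "))  # prints "f0 9f 8f ae 2e 6d 70 34"
```

An unpaired surrogate makes the final decode raise `UnicodeDecodeError`, so a caller would still need a reject-or-replace fallback for that case.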
Yeah, unfortunately I only found the upstream ticket after that; it would have saved me a bit of time investigating this… While I can bump the contrib so that the smb2 version has the fix, I am not sure what we should do about Linux, where typically no contribs are used?