Commits on Source (10)
-
this does not apply only to srt files; it is done for all formats handled by this demuxer. it also needed additional clarification.
550cd952 -
we find the period for the extension and overwrite it with a NUL terminator so that we can then find the second-to-last period, which should precede the substring we want to extract. there is absolutely no point in restoring the period afterwards in a working copy that we are just about to destroy. the comment did not even make any sense.
eb07c5f2 -
d582bf98
-
(non-functional) prepares for handling an alternate common pattern; removes unnecessary variable (we can reuse the `psz_tmp` var now instead of also having `psz_language_begin`); better readability.
04aed075 -
...and thus prevent some ugly failures. (the string given to the function is the full filepath, not just the filename.) the attempt to determine the language from subtitle filenames is based upon a single common pattern - PATH/filename.LANG.ext. whilst it works just fine for this pattern, it is not the only pattern commonly used; another is PATH/Subs/x_LANG.ext (where 'x' is an integer). in such cases, where the period for the extension is the only one in the filename, the function could produce an ugly result should any directory in the path happen to contain a period (if none does, NULL would be returned). it would incorrectly capture a chunk of the path as part of the substring extraction, producing results like "FOOBAR/Subs/1_English" (or worse), which then end up as the language name displayed under the subtitle menu and elsewhere. this commit strips the string being processed down to the filename only and thus prevents such ugliness. the next commit will introduce proper handling for the alternate common pattern just mentioned.
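
a minimal sketch of the idea described above, not the demuxer's actual code: `lang_from_dotted_name` and `copy_string` are hypothetical names, and treating '/' as the only path separator is an assumption made for illustration. it reduces the path to the filename first, then cuts off the extension in a working copy and takes whatever follows the last remaining period.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: heap copy of a string (caller frees) */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out != NULL)
        memcpy(out, s, n);
    return out;
}

/* hypothetical function: returns a heap copy of LANG for a path shaped
 * like PATH/filename.LANG.ext, or NULL if the filename itself does not
 * contain two periods. assumes '/' is the path separator. */
static char *lang_from_dotted_name(const char *psz_path)
{
    assert(psz_path != NULL);

    /* keep only the filename so that periods in directories are ignored */
    const char *psz_name = strrchr(psz_path, '/');
    psz_name = psz_name ? psz_name + 1 : psz_path;

    char *psz_work = copy_string(psz_name);
    if (psz_work == NULL)
        return NULL;

    char *psz_lang = NULL;
    char *psz_tmp = strrchr(psz_work, '.');      /* period before ext */
    if (psz_tmp != NULL)
    {
        *psz_tmp = '\0';                         /* drop the extension */
        psz_tmp = strrchr(psz_work, '.');        /* period before LANG */
        if (psz_tmp != NULL && psz_tmp[1] != '\0')
            psz_lang = copy_string(psz_tmp + 1);
    }

    free(psz_work); /* the working copy is discarded, nothing to restore */
    return psz_lang;
}
```

with this sketch, a directory containing a period no longer leaks into the result: a path such as FOO.BAR/Subs/1_English.srt yields NULL rather than a chunk of the path.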
0984f11c -
the only pattern handled was PATH/filename.LANG.ext. another common one is PATH/Subs/x_LANG.ext, which this adds handling for. this simply relies upon falling back to trying to get the substring after the last underscore if trying to get the substring after a period fails. we do not explicitly require the second pattern to occur only in files found under a 'Subs' subdir, since it is not certain that there is value in implementing such a restriction.
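
the fallback order can be sketched roughly as follows. `guess_language` and `tail_after_last` are hypothetical names, '/' as the path separator is an assumption, and returning NULL for a filename without any extension is a choice made for this sketch rather than something the commit states.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical helper: heap copy of the text following the last `sep`
 * within the first `len` bytes of `s`; NULL if `sep` is absent or
 * nothing follows it. */
static char *tail_after_last(const char *s, size_t len, char sep)
{
    for (size_t i = len; i > 0; i--)
    {
        if (s[i - 1] != sep)
            continue;
        size_t n = len - i;
        if (n == 0)
            return NULL;
        char *out = malloc(n + 1);
        if (out != NULL)
        {
            memcpy(out, s + i, n);
            out[n] = '\0';
        }
        return out;
    }
    return NULL;
}

/* hypothetical function: try filename.LANG.ext first, then x_LANG.ext */
static char *guess_language(const char *psz_path)
{
    assert(psz_path != NULL);

    /* work on the filename only */
    const char *psz_name = strrchr(psz_path, '/');
    psz_name = psz_name ? psz_name + 1 : psz_path;

    /* the stem is everything before the extension's period */
    const char *psz_ext = strrchr(psz_name, '.');
    if (psz_ext == NULL)
        return NULL;
    size_t stem_len = (size_t)(psz_ext - psz_name);

    /* first pattern: substring after the last period in the stem */
    char *psz_lang = tail_after_last(psz_name, stem_len, '.');
    /* second pattern: fall back to the last underscore */
    if (psz_lang == NULL)
        psz_lang = tail_after_last(psz_name, stem_len, '_');
    return psz_lang;
}
```

because the period pattern is tried first, a name such as my_movie.fr.srt still resolves through the period rather than the underscore.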
0dd41106 -
at least one subtitle format handled by this demuxer may hold the language as a property specified within the file. we should allow the parser to extract and use that as an alternative to the filename based substring extraction. this sets things up to allow the parser functions to provide that extracted property string.
58ce4b96 -
the substring obtained from filename extraction is perfect in some cases, but in other cases may not be a language at all, just some portion of the filename. stating 'detected language FOO' is a bit odd if it turns out that what we've extracted is not actually a language name. let's fix that by clarifying what we've actually retrieved, and thus distinguish the less reliable filename extraction result from the likely more reliable property available in some subtitle files. also, enclose the value in quotes in both cases: in the filename-based case because this simply makes sense, and in the property case because the value may be a language code, as it is for ASS/SSA.
17b2c064 -
... for language identification. this info property has been supported by libass since v0.10.0. it is currently a 2-char ISO 639-1 code. libass commit adding support: https://github.com/libass/libass/commit/c979365946b2dc2499ede862b6f7da15f9bc0ed1 ; discussion about enhancing the attribute to support 3-char ISO 639-2 codes, possibly BCP 47: https://github.com/libass/libass/issues/404
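
as an illustration only, reading such a property could look like the sketch below. it assumes the header line has the form `Language: xx` with a case-sensitive key, which is an assumption here and not taken from the commit or from libass; `parse_language_property` is a hypothetical name. the value is returned trimmed but otherwise untouched, so it does not depend on the code staying 2 chars long.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical function: returns a heap copy of the value of a
 * "Language:" line with surrounding whitespace removed, or NULL if the
 * line is not a Language line or carries no value. */
static char *parse_language_property(const char *psz_line)
{
    static const char prefix[] = "Language:";
    const size_t prefix_len = sizeof(prefix) - 1;

    assert(psz_line != NULL);
    if (strncmp(psz_line, prefix, prefix_len) != 0)
        return NULL;

    /* skip whitespace between the key and the value */
    const char *p = psz_line + prefix_len;
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;

    /* trim trailing whitespace, including any CR/LF */
    size_t n = strlen(p);
    while (n > 0 && isspace((unsigned char)p[n - 1]))
        n--;
    if (n == 0)
        return NULL;

    char *out = malloc(n + 1);
    if (out != NULL)
    {
        memcpy(out, p, n);
        out[n] = '\0';
    }
    return out;
}
```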
49c7098e -
we only need to allocate the `psz_text` buffer when handling `Dialogue` and `Language` lines. restricting allocation to lines beginning with 'D' or 'L' is a simple way of avoiding most, if not all, of the unnecessary allocations.
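
the guard amounts to a first-character check before allocating. a rough sketch, with hypothetical function names and an assumed buffer size of one line length plus terminator (the real size used by the parser is not stated here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical predicate: only lines that may be `Dialogue` or
 * `Language` lines need the text buffer. */
static bool line_needs_text_buffer(const char *psz_line)
{
    assert(psz_line != NULL);
    return psz_line[0] == 'D' || psz_line[0] == 'L';
}

/* hypothetical wrapper: returns NULL without allocating when the line
 * cannot be of interest; otherwise returns a buffer big enough for the
 * whole line (assumed size), or NULL on allocation failure. */
static char *alloc_text_buffer(const char *psz_line)
{
    if (!line_needs_text_buffer(psz_line))
        return NULL;
    return malloc(strlen(psz_line) + 1);
}
```

the check is deliberately cheap rather than exact: other lines that happen to start with 'D' or 'L' still get a buffer, which is why it avoids most rather than provably all unnecessary allocations.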
d7d8cff6