I’m as big a fan (maniac) of perfectly crafted movie subtitles as I am a regular expression newbie (ignoramus). I simply don’t understand them, and I’m pretty scared of every attempt at, or need for, using them.
Until today, my biggest problem with movie subtitles was “sound-like” sentences. Manually removing 100+ lines from 400+ subtitle files wasn’t an option. Today I told myself that I was going to sit at the computer until I figured out a regular expression that I could feed into Notepad++ to strip all of this junk-like (at least to me) text out of every one of my movie subtitles.
So I did. Finding the proper regular expression for such an extremely easy task is a snap of the fingers for any regular expression freak. For a regexp ignoramus like me, it took no more than five minutes, so I made it home in time for dinner.
Sound-like sentences
If you don’t know what these are, let me show you some examples:
[MEN SPEAKlNG lN FORElGN LANGUAGE] [ALL GRUNTlNG] [MEN SHOUTlNG] [LAUGHS]
They have two common problems. They’re
- looking stupid (at least to me): I’m not deaf and I can hear when someone is laughing without a text telling me so,
- full of mistakes: notice every l (small letter “l”) in place of I (capital letter “I”).
The last one is a well-known effect of running poor OCR software on the graphical subtitles burned into old DVDs, as the DVD specification allows only graphical subtitles. That was necessary to display ideograph-like scripts (Japanese, Chinese, Korean, etc.), because text encodings able to handle them, such as UTF-8, were not in common use when the DVD specification was created.
As I said in the introduction, manual removal of these lines was not an option, and I had to hire regular expressions to get rid of them once and for all.
To cut a long story short, the proper regular expression for this task is as simple as:
- opening square bracket plus
- any number of any character except newline plus
- closing square bracket.
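The three steps above translate almost word for word into regex syntax. What follows is only a Python sketch of that idea, not the Notepad++ dialog itself; the sample line and the lazy variant with `?` are my own assumptions, added to show what happens when one line carries two bracketed tags. In Notepad++ the same pattern would presumably go into the Find what field with the Regular expression search mode selected and ". matches newline" left unchecked.

```python
import re

# Literal translation of the three steps:
# "[", then any number of any character except newline, then "]".
GREEDY = re.compile(r"\[.*\]")

# Assumed refinement: "?" makes the match stop at the first closing bracket,
# so dialogue sitting between two tags on one line is not swallowed.
LAZY = re.compile(r"\[.*?\]")

# Hypothetical subtitle line with two sound-like tags around real dialogue.
line = "[LAUGHS] Nice to meet you. [DOOR SLAMS]"

greedy_result = GREEDY.sub(" ", line)
lazy_result = LAZY.sub(" ", line)

print(repr(greedy_result))  # ' '
print(repr(lazy_result))    # '  Nice to meet you.  '
```

For lines that hold a single tag and nothing else, like the examples earlier, both patterns behave the same; the lazy form only matters when tags and dialogue share a line.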
The remaining part was to push this regex to Notepad++.
Here is a sample Replace dialog configuration for replacing all sound-like texts inside a single subtitle file:
And here is the same dialog configured for replacing all subtitles at once:
Doing a batch replace on all files at once carries some risk, so, as you may see, I always perform such an operation on a copy of all my subtitles.
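The same "work on a copy" habit can be scripted outside Notepad++. The sketch below is one possible way to do it in Python, not what the dialog does internally: `clean_copy`, the folder arguments, and the UTF-8 default are all assumptions made for illustration, and the lazy pattern `\[.*?\]` is my own variant of the expression described earlier.

```python
import re
import shutil
from pathlib import Path

# Assumed pattern: "[", as few non-newline characters as possible, "]".
SOUND_TAG = re.compile(r"\[.*?\]")


def clean_copy(source_dir, target_dir, encoding="utf-8"):
    """Copy source_dir to target_dir, then clean only the copied .srt files.

    Returns the number of copied files that were actually changed.
    The originals in source_dir are never written to.
    """
    source, target = Path(source_dir), Path(target_dir)
    # copytree refuses to overwrite: it raises FileExistsError
    # when target already exists.
    shutil.copytree(source, target)

    changed = 0
    for path in sorted(target.rglob("*.srt")):
        # newline="" keeps the original line endings untouched.
        with open(path, encoding=encoding, newline="") as handle:
            text = handle.read()
        cleaned, count = SOUND_TAG.subn(" ", text)
        if count:
            with open(path, "w", encoding=encoding, newline="") as handle:
                handle.write(cleaned)
            changed += 1
    return changed
```

A call such as `clean_copy("subtitles", "subtitles_clean")` (both folder names hypothetical) would leave `subtitles` as it was and put the cleaned files in `subtitles_clean`. Files saved in another encoding, which is common for older subtitles, would need a different `encoding` argument.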
Notice that I’m replacing sound-like sentences with a single space, not with an empty line, and, most importantly, I’m not removing the entire subtitle entries that contain them.
This is because the .srt format I’m using numbers each and every subtitle with a consecutive integer, so only subtitles at the end of a file can be removed without breaking that order.
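To make the numbering argument concrete, here is a small Python sketch. The three-entry sample, the `cue_numbers` helper, and the lazy pattern are made up for illustration; the sketch only compares the counters left behind by the two approaches and says nothing about how a particular player reacts to a gap.

```python
import re

# Hypothetical three-entry .srt sample; entry 2 holds only a sound-like tag.
SRT = """1
00:00:01,000 --> 00:00:02,500
Hello there.

2
00:00:03,000 --> 00:00:04,000
[LAUGHS]

3
00:00:05,000 --> 00:00:06,500
Goodbye.
"""


def cue_numbers(srt_text):
    """Return the integer counter that opens each blank-line-separated entry."""
    blocks = srt_text.strip().split("\n\n")
    return [int(block.splitlines()[0]) for block in blocks]


# Approach from the text: swap the tag for a single space, keep the entry.
replaced = re.sub(r"\[.*?\]", " ", SRT)

# Alternative: delete the whole entry, counter and timing included.
removed = SRT.replace("2\n00:00:03,000 --> 00:00:04,000\n[LAUGHS]\n\n", "")

print(cue_numbers(replaced))  # [1, 2, 3]
print(cue_numbers(removed))   # [1, 3]
```

With the space in place, entry 2 still exists with its counter and timing and simply shows nothing; deleting it leaves a hole in the sequence unless every later entry is renumbered.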