Split-merge algorithm for converting to simple subtitle formats
From Aegisub Wiki
This page describes an algorithm that splits and re-merges overlapping subtitle lines, such that there are no overlapping lines after the algorithm has been run.
The algorithm can handle an arbitrary number of overlapping lines.
Contents |
[edit] Pre-requirements
- The lines must be sorted by starting time.
- There must be no "superflous" lines, such as comments, blank lines, header data etc.
[edit] Conflict resolution
When lines overlap, the text of the line that appears later is placed above the text of the line that appears earlier.
For example, consider two lines, A and B, where line A is has timing 0-2 and line B has timing 1-3, three subtitle lines will be generated:
A
B A
B
Styles are completely ignored in this algorithm, at least for now. It is undefined what styles are assigned to the new lines if the original A and B lines have different styles.
[edit] Variables maintained
- List of subtitle lines, named LIST
- Pointer into LIST named PTR, this pointer points between two entries in LIST
- Previous subtitle line, PREV, this is the line that is before PTR
- Next/current subtitle line, CUR, this is the line that is after PTR
CUR-S refers to the start time of CUR, CUR-E refers to the end time of CUR. Same for PREV-S and PREV-E.
[edit] Main process
This section describes the main process of the algorithm. There is also a sub-process described below.
PTR is initialised to one item past the start of LIST, ie. it starts with the first line as PREV and second as CUR.
When PTR reaches the end of LIST, the algorithm exits successfully.
The following steps are repeated until PTR reaches the end of LIST:
- If CUR-S >= PREV-E, there is no overlap. Move PTR forward by one and continue with next iteration.
- Otherwise, there is an overlap. Move PTR back by one and remove PREV and CUR from LIST. Store them locally for further use.
- If CUR-S > PREV-S, generate a new line using the text of PREV and timed from PREV-S to CUR-S. Insert using the Line-insertion process.
- Generate a new line with the text "CUR <newline> PREV". The timing of this line is CUR-S to MIN(PREV-E,CUR-E). Insert using the Line-insertion process.
- If CUR-E > PREV-E, generate a new line using the text of CUR and timed from PREV-E to CUR-E. Insert using the Line-insertion process.
- if PREV-E > CUR-E, generate a new line using the text of PREV and timed from CUR-E to PREV-E. Insert using the Line-insertion process.
- Move PTR forward by one.
[edit] Line-insertion process
Here, NEW refers to the line being inserted. PTR is received from the main process, but is not shared. NEXT refers to the item after PTR in LIST.
- If NEXT is null, insert NEW at PTR. Move PTR forward by one. Return.
- If NEW-S <= NEXT-S, insert NEW at PTR. Move PTR forward by one. Return.
- If NEW-S > NEXT-S, move PTR forward by one. Repeat from step 1.
This process ensures that lines generated by the main process are inserted such that the sorted property of LIST is maintained at all times.
[edit] Proof of correctness
None at this time. I am quite sure this algorithm is correct but haven't proven it. If anyone can prove or disprove the correctness of it, please do contact me (jfs/Niels) as it most likely means there is a bug in Aegisub.
[edit] Testcase
The following ASS lines should test all relevant cases for recombining lines.
Dialogue: 0,0:00:00.00,0:00:03.00,Default,,0000,0000,0000,,line 1 Dialogue: 0,0:00:01.00,0:00:04.00,Default,,0000,0000,0000,,line 2 Dialogue: 0,0:00:10.00,0:00:15.00,Default,,0000,0000,0000,,line 3 Dialogue: 0,0:00:12.00,0:00:13.00,Default,,0000,0000,0000,,line 4 Dialogue: 0,0:00:20.00,0:00:25.00,Default,,0000,0000,0000,,line 5 Dialogue: 0,0:00:20.00,0:00:22.00,Default,,0000,0000,0000,,line 6 Dialogue: 0,0:00:30.00,0:00:35.00,Default,,0000,0000,0000,,line 7 Dialogue: 0,0:00:33.00,0:00:35.00,Default,,0000,0000,0000,,line 8 Dialogue: 0,0:00:40.00,0:00:45.00,Default,,0000,0000,0000,,line 9 Dialogue: 0,0:00:40.00,0:00:45.00,Default,,0000,0000,0000,,line 10 Dialogue: 0,0:00:50.00,0:00:53.00,Default,,0000,0000,0000,,line 11 Dialogue: 0,0:00:51.00,0:00:54.00,Default,,0000,0000,0000,,line 12 Dialogue: 0,0:00:52.00,0:00:55.00,Default,,0000,0000,0000,,line 13
[edit] Output in SubRip format
1 00:00:00,000 --> 00:00:01,000 line 1 2 00:00:01,000 --> 00:00:03,000 line 2 line 1 3 00:00:03,000 --> 00:00:04,000 line 2 4 00:00:10,000 --> 00:00:12,000 line 3 5 00:00:12,000 --> 00:00:13,000 line 4 line 3 6 00:00:13,000 --> 00:00:15,000 line 3 7 00:00:20,000 --> 00:00:22,000 line 6 line 5 8 00:00:22,000 --> 00:00:25,000 line 5 9 00:00:30,000 --> 00:00:33,000 line 7 10 00:00:33,000 --> 00:00:35,000 line 8 line 7 11 00:00:40,000 --> 00:00:45,000 line 10 line 9 12 00:00:50,000 --> 00:00:51,000 line 11 13 00:00:51,000 --> 00:00:52,000 line 12 line 11 14 00:00:52,000 --> 00:00:53,000 line 13 line 12 line 11 15 00:00:53,000 --> 00:00:54,000 line 12 line 13 16 00:00:54,000 --> 00:00:55,000 line 13
[edit] Output in TranStation format
SUB [0 N 00:00:00:00>00:00:01:00] line 1 SUB [0 N 00:00:01:00>00:00:03:00] line 2 line 1 SUB [0 N 00:00:03:00>00:00:04:00] line 2 SUB [0 N 00:00:10:00>00:00:12:00] line 3 SUB [0 N 00:00:12:00>00:00:13:00] line 4 line 3 SUB [0 N 00:00:13:00>00:00:15:00] line 3 SUB [0 N 00:00:20:00>00:00:22:00] line 6 line 5 SUB [0 N 00:00:22:00>00:00:25:00] line 5 SUB [0 N 00:00:30:00>00:00:33:00] line 7 SUB [0 N 00:00:33:00>00:00:35:00] line 8 line 7 SUB [0 N 00:00:40:00>00:00:45:00] line 10 line 9 SUB [0 N 00:00:50:00>00:00:51:00] line 11 SUB [0 N 00:00:51:00>00:00:52:00] line 12 line 11 SUB [0 N 00:00:52:00>00:00:53:00] line 13 line 12 line 11 SUB [0 N 00:00:53:00>00:00:54:00] line 12 line 13 SUB [0 N 00:00:54:00>00:00:55:00] line 13
