Split-merge algorithm for converting to simple subtitle formats

From Aegisub Wiki

Jump to: navigation, search

This page describes an algorithm that splits and re-merges overlapping subtitle lines, such that there are no overlapping lines after the algorithm has been run.

The algorithm can handle an arbitrary number of overlapping lines.

Contents

[edit] Pre-requirements

  • The lines must be sorted by starting time.
  • There must be no "superflous" lines, such as comments, blank lines, header data etc.

[edit] Conflict resolution

When lines overlap, the text of the line that appears later is placed above the text of the line that appears earlier.

For example, consider two lines, A and B, where line A is has timing 0-2 and line B has timing 1-3, three subtitle lines will be generated:

A
B
A
B

Styles are completely ignored in this algorithm, at least for now. It is undefined what styles are assigned to the new lines if the original A and B lines have different styles.

[edit] Variables maintained

  • List of subtitle lines, named LIST
  • Pointer into LIST named PTR, this pointer points between two entries in LIST
  • Previous subtitle line, PREV, this is the line that is before PTR
  • Next/current subtitle line, CUR, this is the line that is after PTR

CUR-S refers to the start time of CUR, CUR-E refers to the end time of CUR. Same for PREV-S and PREV-E.

[edit] Main process

This section describes the main process of the algorithm. There is also a sub-process described below.

PTR is initialised to one item past the start of LIST, ie. it starts with the first line as PREV and second as CUR.

When PTR reaches the end of LIST, the algorithm exits successfully.

The following steps are repeated until PTR reaches the end of LIST:

  1. If CUR-S >= PREV-E, there is no overlap. Move PTR forward by one and continue with next iteration.
  2. Otherwise, there is an overlap. Move PTR back by one and remove PREV and CUR from LIST. Store them locally for further use.
  3. If CUR-S > PREV-S, generate a new line using the text of PREV and timed from PREV-S to CUR-S. Insert using the Line-insertion process.
  4. Generate a new line with the text "CUR <newline> PREV". The timing of this line is CUR-S to MIN(PREV-E,CUR-E). Insert using the Line-insertion process.
  5. If CUR-E > PREV-E, generate a new line using the text of CUR and timed from PREV-E to CUR-E. Insert using the Line-insertion process.
  6. if PREV-E > CUR-E, generate a new line using the text of PREV and timed from CUR-E to PREV-E. Insert using the Line-insertion process.
  7. Move PTR forward by one.

[edit] Line-insertion process

Here, NEW refers to the line being inserted. PTR is received from the main process, but is not shared. NEXT refers to the item after PTR in LIST.

  1. If NEXT is null, insert NEW at PTR. Move PTR forward by one. Return.
  2. If NEW-S <= NEXT-S, insert NEW at PTR. Move PTR forward by one. Return.
  3. If NEW-S > NEXT-S, move PTR forward by one. Repeat from step 1.

This process ensures that lines generated by the main process are inserted such that the sorted property of LIST is maintained at all times.

[edit] Proof of correctness

None at this time. I am quite sure this algorithm is correct but haven't proven it. If anyone can prove or disprove the correctness of it, please do contact me (jfs/Niels) as it most likely means there is a bug in Aegisub.

[edit] Testcase

The following ASS lines should test all relevant cases for recombining lines.

Dialogue: 0,0:00:00.00,0:00:03.00,Default,,0000,0000,0000,,line 1
Dialogue: 0,0:00:01.00,0:00:04.00,Default,,0000,0000,0000,,line 2
Dialogue: 0,0:00:10.00,0:00:15.00,Default,,0000,0000,0000,,line 3
Dialogue: 0,0:00:12.00,0:00:13.00,Default,,0000,0000,0000,,line 4
Dialogue: 0,0:00:20.00,0:00:25.00,Default,,0000,0000,0000,,line 5
Dialogue: 0,0:00:20.00,0:00:22.00,Default,,0000,0000,0000,,line 6
Dialogue: 0,0:00:30.00,0:00:35.00,Default,,0000,0000,0000,,line 7
Dialogue: 0,0:00:33.00,0:00:35.00,Default,,0000,0000,0000,,line 8
Dialogue: 0,0:00:40.00,0:00:45.00,Default,,0000,0000,0000,,line 9
Dialogue: 0,0:00:40.00,0:00:45.00,Default,,0000,0000,0000,,line 10
Dialogue: 0,0:00:50.00,0:00:53.00,Default,,0000,0000,0000,,line 11
Dialogue: 0,0:00:51.00,0:00:54.00,Default,,0000,0000,0000,,line 12
Dialogue: 0,0:00:52.00,0:00:55.00,Default,,0000,0000,0000,,line 13

[edit] Output in SubRip format

1
00:00:00,000 --> 00:00:01,000
line 1

2
00:00:01,000 --> 00:00:03,000
line 2
line 1

3
00:00:03,000 --> 00:00:04,000
line 2

4
00:00:10,000 --> 00:00:12,000
line 3

5
00:00:12,000 --> 00:00:13,000
line 4
line 3

6
00:00:13,000 --> 00:00:15,000
line 3

7
00:00:20,000 --> 00:00:22,000
line 6
line 5

8
00:00:22,000 --> 00:00:25,000
line 5

9
00:00:30,000 --> 00:00:33,000
line 7

10
00:00:33,000 --> 00:00:35,000
line 8
line 7

11
00:00:40,000 --> 00:00:45,000
line 10
line 9

12
00:00:50,000 --> 00:00:51,000
line 11

13
00:00:51,000 --> 00:00:52,000
line 12
line 11

14
00:00:52,000 --> 00:00:53,000
line 13
line 12
line 11

15
00:00:53,000 --> 00:00:54,000
line 12
line 13

16
00:00:54,000 --> 00:00:55,000
line 13

[edit] Output in TranStation format

SUB [0 N 00:00:00:00>00:00:01:00]
line 1

SUB [0 N 00:00:01:00>00:00:03:00]
line 2
line 1

SUB [0 N 00:00:03:00>00:00:04:00]
line 2

SUB [0 N 00:00:10:00>00:00:12:00]
line 3

SUB [0 N 00:00:12:00>00:00:13:00]
line 4
line 3

SUB [0 N 00:00:13:00>00:00:15:00]
line 3

SUB [0 N 00:00:20:00>00:00:22:00]
line 6
line 5

SUB [0 N 00:00:22:00>00:00:25:00]
line 5

SUB [0 N 00:00:30:00>00:00:33:00]
line 7

SUB [0 N 00:00:33:00>00:00:35:00]
line 8
line 7

SUB [0 N 00:00:40:00>00:00:45:00]
line 10
line 9

SUB [0 N 00:00:50:00>00:00:51:00]
line 11

SUB [0 N 00:00:51:00>00:00:52:00]
line 12
line 11

SUB [0 N 00:00:52:00>00:00:53:00]
line 13
line 12
line 11

SUB [0 N 00:00:53:00>00:00:54:00]
line 12
line 13

SUB [0 N 00:00:54:00>00:00:55:00]
line 13
Personal tools