1. Detect subtitle regions based on inter-frame visual consistency
2. Separate subtitle pixels from the background
3. Use OCR optimized for subtitle fonts and layouts to recognize text
4. Infer each subtitle's start and end times from the span of frames in which it appears
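
As an illustration of step 4, the sketch below shows one possible way to turn per-frame recognition results into timed cues. It is a minimal example under stated assumptions, not the method's actual implementation: it assumes steps 1 to 3 have already produced one recognized string per frame (an empty string where no subtitle is visible), that the frame rate is constant, and that identical text on consecutive frames belongs to the same subtitle. The names `infer_cues` and `srt_timestamp` are hypothetical helpers introduced here for illustration.

```python
from itertools import groupby


def infer_cues(frame_texts, fps):
    """Group consecutive frames with identical text into (start, end, text) cues.

    frame_texts: list of OCR results, one per frame ("" means no subtitle).
    fps: frames per second, assumed constant (an assumption of this sketch).
    """
    cues = []
    index = 0  # index of the first frame in the current run
    for text, run in groupby(frame_texts):
        length = sum(1 for _ in run)
        if text:  # skip runs of frames that carry no subtitle
            start = index / fps
            end = (index + length) / fps  # end of the last frame in the run
            cues.append((start, end, text))
        index += length
    return cues


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


if __name__ == "__main__":
    # Seven frames at 2 fps: "Hello" on frames 1-3, "Bye" on frames 5-6.
    frames = ["", "Hello", "Hello", "Hello", "", "Bye", "Bye"]
    for start, end, text in infer_cues(frames, fps=2):
        print(srt_timestamp(start), "-->", srt_timestamp(end), text)
```

In practice OCR output tends to flicker between near-identical strings from frame to frame, so a real pipeline would likely compare texts with a similarity threshold rather than exact equality before merging runs.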