Open-Source · Self-Hosted · PaddleOCR

VideOCR Subtitle Extraction: Extract Hardcoded Subtitles from Any Video

VideOCR is an open-source command-line tool that uses PaddleOCR to read burned-in subtitle text from video frames and export it as an SRT file — all locally, with no cloud upload required. This guide covers installation, GPU setup, accuracy expectations, and when a desktop app is the smarter choice.

What Is VideOCR? An Open-Source Subtitle Extraction Tool

VideOCR is a Python library and CLI script that extracts hardcoded (burned-in) subtitles from video files using PaddleOCR — Baidu's open-source deep-learning OCR engine. It was shared on Reddit in March 2026 and quickly gained traction as one of the most capable free, self-hosted alternatives to commercial subtitle extraction services.

Unlike soft subtitles (stored as a separate .srt or .ass track), hardcoded subtitles are baked directly into the video pixels. Extracting them requires computer vision — specifically optical character recognition (OCR) applied frame-by-frame. VideOCR automates this pipeline with PaddleOCR handling the heavy lifting.
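Conceptually, the frame-by-frame pipeline samples frames, runs OCR on each, and then merges consecutive frames showing identical text into timed subtitle cues. Here is a minimal sketch of that grouping step in pure Python; the per-frame OCR results are stubbed in, and the function names are illustrative rather than VideOCR's actual API:

```python
from typing import List, Tuple

def srt_timestamp(frame_idx: int, fps: float) -> str:
    """Convert a frame index to an SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(frame_idx / fps * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def group_frames_to_cues(ocr_results: List[Tuple[int, str]], fps: float) -> List[str]:
    """Collapse runs of consecutive frames carrying identical text into SRT cues."""
    cues, start, prev_text, prev_idx = [], None, None, None
    for idx, text in ocr_results + [(None, None)]:  # sentinel flushes the last cue
        if text != prev_text:
            if prev_text:  # close the previous cue (skip runs with no text)
                cues.append(
                    f"{len(cues) + 1}\n"
                    f"{srt_timestamp(start, fps)} --> {srt_timestamp(prev_idx + 1, fps)}\n"
                    f"{prev_text}\n"
                )
            start = idx
        prev_text, prev_idx = text, idx
    return cues

# Per-frame OCR output stubbed in as (frame_index, detected_text) pairs
frames = [(0, "Hello"), (1, "Hello"), (2, "Hello"), (3, ""), (4, "World"), (5, "World")]
print("\n".join(group_frames_to_cues(frames, fps=25.0)))
```

This is the reason duplicate-cue artifacts appear in raw output: if OCR misreads even one character in the middle of a run, the run splits into two cues with near-identical text.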

- 80 Languages: Supports CJK, Arabic, Cyrillic, Latin, Devanagari, Thai, and more via PaddleOCR language packs.
- CPU & GPU Versions: Runs on any machine CPU-only, or accelerates with an NVIDIA GPU via CUDA for 5–10× faster processing.
- 100% Offline: No internet connection required after the one-time model download. Your video never leaves your machine.
Note: VideOCR extracts subtitle text and outputs an SRT file — it does not remove or paint out the subtitles from the video. For full subtitle removal (clean video output), see hard-sub removal.

VideOCR Installation and Configuration (CPU / GPU)

VideOCR requires Python, FFmpeg, and PaddleOCR. The CPU version is straightforward to install on any platform; the GPU version requires an NVIDIA GPU with CUDA support. Below are the key steps:

1. Install Python 3.8+: Download from python.org and ensure pip is included. On macOS, use Homebrew: brew install python.
2. Install FFmpeg: Windows: winget install ffmpeg. macOS: brew install ffmpeg. Linux: apt install ffmpeg.
3. Clone VideOCR: git clone https://github.com/tkahnGit/videocr (or the current active fork), then cd videocr.
4. Install dependencies: pip install paddlepaddle paddleocr (CPU). For GPU: pip install paddlepaddle-gpu paddleocr, which requires CUDA 11.2+ and cuDNN 8.1+.
5. Run extraction: python videocr/api.py --file myvideo.mp4 --lang en --output subs.srt. Change --lang to the subtitle language code.
# Example: extract English subtitles from an MKV file
python videocr/api.py --file movie.mkv --lang en --output movie_subs.srt
# For GPU (CUDA) acceleration:
pip install paddlepaddle-gpu paddleocr

After extraction, review the SRT file in a text editor or a subtitle editor (such as Subtitle Edit) before using it. Expect a 2–5% character error rate on typical 1080p content.
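The most common cleanup task, merging duplicate cues produced when the same subtitle is detected across several frames, can be scripted. A stdlib-only sketch, assuming the output follows the standard SRT block layout (index line, timing line, text):

```python
import re

def merge_duplicate_cues(srt_text: str) -> str:
    """Merge consecutive SRT cues whose text is identical,
    keeping the first start time and the last end time."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    merged = []  # list of [start, end, text]
    for block in blocks:
        lines = block.splitlines()
        start, end = lines[1].split(" --> ")
        text = "\n".join(lines[2:])
        if merged and merged[-1][2] == text:
            merged[-1][1] = end  # extend the previous cue instead of duplicating
        else:
            merged.append([start, end, text])
    return "\n\n".join(
        f"{i}\n{start} --> {end}\n{text}"
        for i, (start, end, text) in enumerate(merged, 1)
    )

raw = """1
00:00:01,000 --> 00:00:02,000
Hello there

2
00:00:02,000 --> 00:00:03,000
Hello there

3
00:00:03,500 --> 00:00:05,000
General Kenobi"""
print(merge_duplicate_cues(raw))
```

Subtitle Edit's "Merge lines with same text" feature does the same thing interactively; this version is handy for batch jobs.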

VideOCR vs VideoSubFinder vs Commercial Tools

The hardcoded subtitle extraction space includes free open-source tools, free GUI apps, and commercial SaaS platforms. Here is how the major options compare:

| Tool | Type | Cost | GPU | GUI | Languages | Output | Privacy | Setup |
|------|------|------|-----|-----|-----------|--------|---------|-------|
| VideOCR | Open-source CLI | Free | Yes (CUDA) | No | 80 langs | SRT only | Full offline | Complex |
| VideoSubFinder | Windows GUI | Free | — | Yes | ~30 langs | SRT + image | Full offline | Easy |
| Subtitle Edit | Windows GUI | Free | — | Yes | 40+ langs | SRT / ASS / VTT | Full offline | Easy |
| EchoSubs ★ | Desktop app | One-time | — | Yes | 50+ langs | SRT + clean video | Full offline | Installer |
| Media.io | Cloud SaaS | Subscription | — | Web | Varies | SRT + video | Cloud upload | None |
| Pollo AI | Cloud SaaS | Subscription | — | Web | Varies | Clean video | Cloud upload | None |

★ EchoSubs handles both extraction and AI inpainting removal in one desktop workflow.

Supported Formats and Languages (80 Languages)

Video Format Support

Via FFmpeg frame extraction — virtually any container is supported:

MP4 (H.264/H.265), MKV, AVI, MOV, TS / M2TS, WMV, WebM, MPEG-2, FLV

Higher bitrate and resolution (1080p+) improve OCR accuracy; 480p DVD rips may yield only 70–80% accuracy.

Language Coverage

PaddleOCR's multilingual models cover ~80 languages:

English, Chinese (Simplified), Chinese (Traditional), Japanese, Korean, Arabic, Hindi, Russian, Thai, French, German, Spanish, Portuguese, Italian, Turkish, Vietnamese, and 64 more.
Bilingual subtitles: For mixed-language subtitles (common in Asian media with simultaneous Chinese + English), run two separate passes with the appropriate language code and merge the resulting SRT files. PaddleOCR's multilingual model can handle simple bilingual cases in a single pass.
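Merging the two passes amounts to sorting all cues by start time and renumbering them. A minimal stdlib sketch, assuming both passes produced standard SRT blocks (the sample strings are illustrative):

```python
import re

def parse_cues(srt_text: str):
    """Return (start_ms, timing_line, text) tuples from an SRT string."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        timing = lines[1]
        h, m, rest = timing.split(" --> ")[0].split(":")
        s, ms = rest.split(",")
        start_ms = (int(h) * 3600 + int(m) * 60 + int(s)) * 1000 + int(ms)
        cues.append((start_ms, timing, "\n".join(lines[2:])))
    return cues

def merge_passes(srt_a: str, srt_b: str) -> str:
    """Interleave cues from two OCR passes in chronological order."""
    combined = sorted(parse_cues(srt_a) + parse_cues(srt_b))
    return "\n\n".join(
        f"{i}\n{timing}\n{text}"
        for i, (_, timing, text) in enumerate(combined, 1)
    )

# Example: a Chinese-language pass and an English-language pass over the same clip
zh = "1\n00:00:01,000 --> 00:00:03,000\n你好"
en = "1\n00:00:01,000 --> 00:00:03,000\nHello"
print(merge_passes(zh, en))
```

Cues with identical timings end up as adjacent entries; most players render both, which is usually the desired behaviour for bilingual tracks.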

Tutorial: How to Extract Hardcoded Subtitles with VideOCR

1. Prepare your video: Ensure the video is 720p or higher; lower resolution reduces OCR accuracy. If you only need a portion, trim to the section containing subtitles, which significantly reduces processing time.
2. Identify the subtitle language: Pass the correct two-letter language code to VideOCR with the --lang flag (e.g., "en" for English, "zh" for Chinese, "ja" for Japanese). Using the wrong language model is the most common cause of poor results.
3. Set the subtitle region (optional): By default, VideOCR scans the entire frame. For faster processing, define a crop region with --conf to focus only on the subtitle area (typically the bottom 15–20% of the frame). This also reduces false positives from other on-screen text in the scene.
4. Run the extraction: Execute the command and wait. CPU processing at 720p takes roughly 1–2 minutes per minute of video; GPU processing is 5–10× faster. Progress is printed to the terminal.
5. Review and clean the SRT output: Open the output SRT in Subtitle Edit or a text editor. Common issues: duplicate lines (the same subtitle detected across multiple frames), missing punctuation, and OCR errors on stylised fonts. Run a spell-check pass for the target language.
6. Remove subtitles from the video (if needed): If your goal is a clean video without the burned-in text, the SRT file gives you the exact timing and region data needed for inpainting. Use EchoSubs to automate the AI inpainting step — see the hard subtitle removal guide.
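For batch work, the invocation from the steps above can be wrapped in a small launcher. This sketch only assembles the command line used earlier in this guide; the flag names are taken from that example and should be verified against your fork's --help before relying on them:

```python
import subprocess  # used only if you choose to actually run the command
from pathlib import Path
from typing import List, Optional

def build_videocr_command(video: str, lang: str = "en",
                          output: Optional[str] = None) -> List[str]:
    """Assemble the VideOCR invocation shown in this guide.
    Flag names follow the example above; check your fork's --help."""
    out = output or str(Path(video).with_suffix("")) + "_subs.srt"
    return [
        "python", "videocr/api.py",
        "--file", video,
        "--lang", lang,
        "--output", out,
    ]

cmd = build_videocr_command("movie.mkv", lang="ja")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```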


Self-Hosted vs Online Tools: Privacy and Security

Cloud-based subtitle removers (Media.io, Pollo AI, etc.) offer zero-setup convenience, but at a cost: your video is uploaded to a third-party server for processing. For many use cases this is acceptable; for others, it is a dealbreaker.

Self-Hosted / Desktop Advantages

  • No video upload — data never leaves your machine
  • Works fully offline after initial model download
  • No per-video fees or subscription required
  • Suitable for confidential corporate or medical content
  • No watermarks or resolution caps on output
  • Can process unlimited batch jobs without cost scaling

Online Tool Trade-offs

  • Video is uploaded to and processed on remote servers
  • Subscription or per-minute pricing adds up at scale
  • Quality limited by the cloud provider's model version
  • Service outages or shutdowns can block your workflow
  • Free tiers typically include watermarks or caps
  • May retain uploaded content per their privacy policy

EchoSubs: Desktop-First, Optionally Online

EchoSubs is installed once as a desktop application and runs AI models locally — no subscription, no upload. For users who want cloud-based AI parameter suggestions (e.g., auto-detect optimal inpainting strength), an optional online advisor mode is available, but all actual video processing remains on your machine.