Open-Source · Self-Hosted · PaddleOCR

VideOCR Subtitle Extraction: Extract Hardcoded Subtitles from Any Video

VideOCR is an open-source command-line tool that uses PaddleOCR to read burned-in subtitle text from video frames and export it as an SRT file — all locally, with no cloud upload required. This guide covers installation, GPU setup, accuracy expectations, and when a desktop app is the smarter choice.

What Is VideOCR? An Open-Source Subtitle Extraction Tool

VideOCR is a Python library and CLI script that extracts hardcoded (burned-in) subtitles from video files using PaddleOCR — Baidu's open-source deep-learning OCR engine. It was shared on Reddit in March 2026 and quickly gained traction as one of the most capable free, self-hosted alternatives to commercial subtitle extraction services.

Unlike soft subtitles (stored as a separate .srt or .ass track), hardcoded subtitles are baked directly into the video pixels. Extracting them requires computer vision — specifically optical character recognition (OCR) applied frame-by-frame. VideOCR automates this pipeline with PaddleOCR handling the heavy lifting.
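Conceptually, the frame-by-frame pipeline samples frames, runs OCR on each, and then merges consecutive frames showing identical text into timed subtitle cues. Here is a minimal sketch of that grouping step in pure Python; the per-frame OCR results are stubbed in, and the function names are illustrative rather than VideOCR's actual API:

```python
from typing import List, Tuple

def srt_timestamp(frame_idx: int, fps: float) -> str:
    """Convert a frame index to an SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(frame_idx / fps * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def group_frames_to_cues(ocr_results: List[Tuple[int, str]], fps: float) -> List[str]:
    """Collapse runs of consecutive frames carrying identical text into SRT cues."""
    cues, start, prev_text, prev_idx = [], None, None, None
    for idx, text in ocr_results + [(None, None)]:  # sentinel flushes the last cue
        if text != prev_text:
            if prev_text:  # close the previous cue (skip runs with no text)
                cues.append(
                    f"{len(cues) + 1}\n"
                    f"{srt_timestamp(start, fps)} --> {srt_timestamp(prev_idx + 1, fps)}\n"
                    f"{prev_text}\n"
                )
            start = idx
        prev_text, prev_idx = text, idx
    return cues

# Per-frame OCR output stubbed in as (frame_index, detected_text) pairs
frames = [(0, "Hello"), (1, "Hello"), (2, "Hello"), (3, ""), (4, "World"), (5, "World")]
print("\n".join(group_frames_to_cues(frames, fps=25.0)))
```

This is the reason duplicate-cue artifacts appear in raw output: if OCR misreads even one character in the middle of a run, the run splits into two cues with near-identical text.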

- 80 Languages: Supports CJK, Arabic, Cyrillic, Latin, Devanagari, Thai, and more via PaddleOCR language packs.
- CPU & GPU Versions: Runs on any machine CPU-only, or accelerates with an NVIDIA GPU via CUDA for 5–10× faster processing.
- 100% Offline: No internet connection required after the one-time model download. Your video never leaves your machine.
Note: VideOCR extracts subtitle text and outputs an SRT file — it does not remove or paint out the subtitles from the video. For full subtitle removal (clean video output), see hard-sub removal.

VideOCR Installation and Configuration (CPU / GPU)

VideOCR requires Python, FFmpeg, and PaddleOCR. The CPU version is straightforward to install on any platform; the GPU version requires an NVIDIA GPU with CUDA support. Below are the key steps:

1. Install Python 3.8+: Download from python.org and ensure pip is included. On macOS, use Homebrew: brew install python.
2. Install FFmpeg: Windows: winget install ffmpeg. macOS: brew install ffmpeg. Linux: apt install ffmpeg.
3. Clone VideOCR: git clone https://github.com/tkahnGit/videocr (or the current active fork), then cd videocr.
4. Install dependencies: pip install paddlepaddle paddleocr (CPU). For GPU: pip install paddlepaddle-gpu paddleocr, which requires CUDA 11.2+ and cuDNN 8.1+.
5. Run extraction: python videocr/api.py --file myvideo.mp4 --lang en --output subs.srt. Change --lang to the subtitle language code.
# Example: extract English subtitles from an MKV file
python videocr/api.py --file movie.mkv --lang en --output movie_subs.srt
# For GPU (CUDA) acceleration:
pip install paddlepaddle-gpu paddleocr

After extraction, review the SRT file in a text editor or a subtitle editor (such as Subtitle Edit) before using it. Expect a 2–5% character error rate on typical 1080p content.
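The most common cleanup task, merging duplicate cues produced when the same subtitle is detected across several frames, can be scripted. A stdlib-only sketch, assuming the output follows the standard SRT block layout (index line, timing line, text):

```python
import re

def merge_duplicate_cues(srt_text: str) -> str:
    """Merge consecutive SRT cues whose text is identical,
    keeping the first start time and the last end time."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    merged = []  # list of [start, end, text]
    for block in blocks:
        lines = block.splitlines()
        start, end = lines[1].split(" --> ")
        text = "\n".join(lines[2:])
        if merged and merged[-1][2] == text:
            merged[-1][1] = end  # extend the previous cue instead of duplicating
        else:
            merged.append([start, end, text])
    return "\n\n".join(
        f"{i}\n{start} --> {end}\n{text}"
        for i, (start, end, text) in enumerate(merged, 1)
    )

raw = """1
00:00:01,000 --> 00:00:02,000
Hello there

2
00:00:02,000 --> 00:00:03,000
Hello there

3
00:00:03,500 --> 00:00:05,000
General Kenobi"""
print(merge_duplicate_cues(raw))
```

Subtitle Edit's "Merge lines with same text" feature does the same thing interactively; this version is handy for batch jobs.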

VideOCR vs VideoSubFinder vs Commercial Tools

The hardcoded subtitle extraction space includes free open-source tools, free GUI apps, and commercial SaaS platforms. Here is how the major options compare:

| Tool | Type | Cost | GPU | GUI | Languages | Output | Privacy | Setup |
|------|------|------|-----|-----|-----------|--------|---------|-------|
| VideOCR | Open-source CLI | Free | Yes (CUDA) | No | 80 langs | SRT only | Full offline | Complex |
| VideoSubFinder | Windows GUI | Free | — | Yes | ~30 langs | SRT + image | Full offline | Easy |
| Subtitle Edit | Windows GUI | Free | — | Yes | 40+ langs | SRT / ASS / VTT | Full offline | Easy |
| EchoSubs ★ | Desktop app | One-time | — | Yes | 50+ langs | SRT + clean video | Full offline | Installer |
| Media.io | Cloud SaaS | Subscription | — | Web | Varies | SRT + video | Cloud upload | None |
| Pollo AI | Cloud SaaS | Subscription | — | Web | Varies | Clean video | Cloud upload | None |

★ EchoSubs handles both extraction and AI inpainting removal in one desktop workflow.

Supported Formats and Languages (80 Languages)

Video Format Support

Via FFmpeg frame extraction — virtually any container is supported:

MP4 (H.264/H.265), MKV, AVI, MOV, TS / M2TS, WMV, WebM, MPEG-2, FLV

Higher bitrate and resolution (1080p+) improve OCR accuracy; 480p DVD rips may yield only 70–80% accuracy.

Language Coverage

PaddleOCR's multilingual models cover ~80 languages:

English, Chinese (Simplified), Chinese (Traditional), Japanese, Korean, Arabic, Hindi, Russian, Thai, French, German, Spanish, Portuguese, Italian, Turkish, Vietnamese, and 64 more.
Bilingual subtitles: For mixed-language subtitles (common in Asian media with simultaneous Chinese + English), run two separate passes with the appropriate language code and merge the resulting SRT files. PaddleOCR's multilingual model can handle simple bilingual cases in a single pass.
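Merging the two passes amounts to sorting all cues by start time and renumbering them. A minimal stdlib sketch, assuming both passes produced standard SRT blocks (the sample strings are illustrative):

```python
import re

def parse_cues(srt_text: str):
    """Return (start_ms, timing_line, text) tuples from an SRT string."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        timing = lines[1]
        h, m, rest = timing.split(" --> ")[0].split(":")
        s, ms = rest.split(",")
        start_ms = (int(h) * 3600 + int(m) * 60 + int(s)) * 1000 + int(ms)
        cues.append((start_ms, timing, "\n".join(lines[2:])))
    return cues

def merge_passes(srt_a: str, srt_b: str) -> str:
    """Interleave cues from two OCR passes in chronological order."""
    combined = sorted(parse_cues(srt_a) + parse_cues(srt_b))
    return "\n\n".join(
        f"{i}\n{timing}\n{text}"
        for i, (_, timing, text) in enumerate(combined, 1)
    )

# Example: a Chinese-language pass and an English-language pass over the same clip
zh = "1\n00:00:01,000 --> 00:00:03,000\n你好"
en = "1\n00:00:01,000 --> 00:00:03,000\nHello"
print(merge_passes(zh, en))
```

Cues with identical timings end up as adjacent entries; most players render both, which is usually the desired behaviour for bilingual tracks.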

Tutorial: How to Extract Hardcoded Subtitles with VideOCR

1. Prepare your video: Ensure the video is 720p or higher; lower resolution reduces OCR accuracy. If you only need a portion, trim to the section containing subtitles, which significantly reduces processing time.
2. Identify the subtitle language: Pass the correct two-letter language code to VideOCR with the --lang flag (e.g., "en" for English, "zh" for Chinese, "ja" for Japanese). Using the wrong language model is the most common cause of poor results.
3. Set the subtitle region (optional): By default, VideOCR scans the entire frame. For faster processing, define a crop region with --conf to focus only on the subtitle area (typically the bottom 15–20% of the frame). This also reduces false positives from other on-screen text in the scene.
4. Run the extraction: Execute the command and wait. CPU processing at 720p takes roughly 1–2 minutes per minute of video; GPU processing is 5–10× faster. Progress is printed to the terminal.
5. Review and clean the SRT output: Open the output SRT in Subtitle Edit or a text editor. Common issues: duplicate lines (the same subtitle detected across multiple frames), missing punctuation, and OCR errors on stylised fonts. Run a spell-check pass for the target language.
6. Remove subtitles from the video (if needed): If your goal is a clean video without the burned-in text, the SRT file gives you the exact timing and region data needed for inpainting. Use EchoSubs to automate the AI inpainting step — see the hard subtitle removal guide.
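For batch work, the invocation from the steps above can be wrapped in a small launcher. This sketch only assembles the command line used earlier in this guide; the flag names are taken from that example and should be verified against your fork's --help before relying on them:

```python
import subprocess  # used only if you choose to actually run the command
from pathlib import Path
from typing import List, Optional

def build_videocr_command(video: str, lang: str = "en",
                          output: Optional[str] = None) -> List[str]:
    """Assemble the VideOCR invocation shown in this guide.
    Flag names follow the example above; check your fork's --help."""
    out = output or str(Path(video).with_suffix("")) + "_subs.srt"
    return [
        "python", "videocr/api.py",
        "--file", video,
        "--lang", lang,
        "--output", out,
    ]

cmd = build_videocr_command("movie.mkv", lang="ja")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```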


Self-Hosted vs Online Tools: Privacy and Security

Cloud-based subtitle removers (Media.io, Pollo AI, etc.) offer zero-setup convenience, but at a cost: your video is uploaded to a third-party server for processing. For many use cases this is acceptable; for others, it is a dealbreaker.

Self-Hosted / Desktop Advantages

  • No video upload — data never leaves your machine
  • Works fully offline after initial model download
  • No per-video fees or subscription required
  • Suitable for confidential corporate or medical content
  • No watermarks or resolution caps on output
  • Can process unlimited batch jobs without cost scaling

Online Tool Trade-offs

  • Video is uploaded to and processed on remote servers
  • Subscription or per-minute pricing adds up at scale
  • Quality limited by the cloud provider's model version
  • Service outages or shutdowns can block your workflow
  • Free tiers typically include watermarks or caps
  • May retain uploaded content per their privacy policy

EchoSubs: Desktop-First, Optionally Online

EchoSubs is installed once as a desktop application and runs AI models locally — no subscription, no upload. For users who want cloud-based AI parameter suggestions (e.g., auto-detect optimal inpainting strength), an optional online advisor mode is available, but all actual video processing remains on your machine.