PDF Changer

Is OCR Text Recognition private by default?

Frequently asked question for OCR Text Recognition.

Yes. All OCR processing happens in your browser using Tesseract.js, which runs the Tesseract engine compiled to WebAssembly (WASM). No page images leave your device. Language data files are downloaded the first time they are needed and cached in the browser afterwards.
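As an illustration of what "in your browser" means in practice, here is a minimal sketch of a local OCR loop. It is not PDF Changer's actual code. The `OcrWorker` interface and the `ocrPagesLocally` function are hypothetical names used only for this example; the interface mirrors the general shape of a Tesseract.js worker (a `recognize` call that resolves to a result containing `data.text`, plus `terminate`). In a real page, the factory would typically wrap Tesseract.js's `createWorker`, whose exact arguments depend on the library version.

```typescript
// Hypothetical minimal worker shape, modeled on a Tesseract.js worker.
// The real library returns a richer result object than shown here.
interface OcrWorker<TImage> {
  recognize(image: TImage): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

// Recognize pages one at a time with a single worker.
// Page images are only ever handed to the local worker;
// this function makes no network requests of its own.
async function ocrPagesLocally<TImage>(
  createWorker: () => Promise<OcrWorker<TImage>>,
  pages: TImage[],
): Promise<string[]> {
  const worker = await createWorker();
  const texts: string[] = [];
  try {
    for (const page of pages) {
      const result = await worker.recognize(page);
      texts.push(result.data.text);
    }
  } finally {
    // Always release the worker, even if a page fails.
    await worker.terminate();
  }
  return texts;
}
```

Passing the worker factory in as a parameter keeps the sketch independent of any particular library version and makes it easy to exercise with a stand-in worker.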

Limitations and what this does not protect

Tesseract.js runs in a Web Worker. Each page consumes roughly 50-100 MB of RAM during processing. Documents over 50 pages may cause memory pressure on devices with less than 4 GB of free memory.
Handwritten text is poorly supported. Tesseract is designed for printed text, so expect less than 30% accuracy on handwriting.
Multi-column layouts are only partially supported. Tesseract reads left to right by default and may interleave text from adjacent columns on complex layouts.
Table structure is not preserved. Cell contents are extracted as text, but row and column relationships are lost.
Local processing cannot fix compromised devices, accounts, or unsafe sharing channels.
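
The memory figures in the list above can be turned into a rough estimate. The sketch below assumes, as a conservative upper bound, that memory use grows linearly with page count and that nothing is freed between pages; the text does not say whether that is how the tool behaves. The function names `estimateOcrMemory` and `mayCauseMemoryPressure` are hypothetical and exist only for this example.

```typescript
// Per-page figures taken from the text above: roughly 50-100 MB per page.
const MB_PER_PAGE_LOW = 50;
const MB_PER_PAGE_HIGH = 100;

interface MemoryEstimate {
  lowMB: number;
  highMB: number;
}

// Assumption: linear growth with page count (worst case, no memory released).
function estimateOcrMemory(pageCount: number): MemoryEstimate {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return {
    lowMB: pageCount * MB_PER_PAGE_LOW,
    highMB: pageCount * MB_PER_PAGE_HIGH,
  };
}

// True when the upper estimate exceeds the memory reported as free.
function mayCauseMemoryPressure(pageCount: number, freeMB: number): boolean {
  return estimateOcrMemory(pageCount).highMB > freeMB;
}
```

Under these assumptions, a 50-page document is estimated at 2500 to 5000 MB, so `mayCauseMemoryPressure(50, 4096)` returns `true`, in line with the 50-page and 4 GB guidance above.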