OCR Text Recognition privacy model

This tool is classified as a heavy workload and runs in On-device mode. Current status: beta (fully functional). Release note: OCR quality depends on scan quality. The first run downloads language data.

What this does

Applies the selected transformation to the document or exported output.
Keeps processing local to the browser when the tool is marked On-device.
Uses monthly local counters for usage quotas.
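
As an illustration of how a monthly local counter could work, here is a minimal sketch. It is not the tool's actual implementation: the key format, the function names (usageKey, getUsage, tryConsume), the MemoryStore class, and the choice of UTC months are all assumptions made for the example. The storage interface is shaped so that the browser's localStorage could be passed in directly.

```typescript
// Minimal storage shape; window.localStorage satisfies it in a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Builds a per-month key such as "usage:ocr:2024-03".
// The key format is hypothetical; UTC is used so the month boundary is unambiguous.
function usageKey(tool: string, now: Date): string {
  const m = now.getUTCMonth() + 1;
  const month = m < 10 ? "0" + m : String(m);
  return "usage:" + tool + ":" + now.getUTCFullYear() + "-" + month;
}

// Reads the counter for the current month; missing or corrupt values count as 0.
function getUsage(store: KeyValueStore, tool: string, now: Date): number {
  const raw = store.getItem(usageKey(tool, now));
  const parsed = raw === null ? 0 : parseInt(raw, 10);
  return isNaN(parsed) ? 0 : parsed;
}

// Increments the counter and returns true while under quota; returns false otherwise.
function tryConsume(store: KeyValueStore, tool: string, quota: number, now: Date): boolean {
  const used = getUsage(store, tool, now);
  if (used >= quota) {
    return false;
  }
  store.setItem(usageKey(tool, now), String(used + 1));
  return true;
}

// In-memory stand-in for localStorage, useful outside a browser.
class MemoryStore implements KeyValueStore {
  private data: Record<string, string> = {};
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}
```

Because the month is part of the key, a new month starts from zero without any explicit reset step, and nothing in this sketch leaves the device.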

What this does not protect

It does not remove names or sensitive content visible in document text or images.
It does not guarantee legal anonymity or protect against a compromised endpoint.
For hybrid tools, privacy depends on explicit cloud opt-in when enabled.
Tesseract.js runs in a Web Worker. Each page consumes roughly 50-100 MB of RAM during processing. Documents over 50 pages may cause memory pressure on devices with less than 4 GB of free memory.
Handwritten text is poorly supported. Tesseract is designed for printed text. Expect less than 30% accuracy on handwriting.
Multi-column layouts are partially supported. Tesseract reads left-to-right by default and may interleave columns on complex layouts.
Tables are not preserved structurally. Cell contents are extracted as text, but row/column relationships are lost.
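
The memory figures above can be turned into a simple pre-flight check before a long document is processed. The sketch below is illustrative only: the function name mayCauseMemoryPressure and the constant names are hypothetical, the thresholds are copied from the limits stated above, and the free-memory value is assumed to be supplied by the caller, since browsers do not reliably expose free RAM.

```typescript
// Thresholds taken from the limits stated above; the names are hypothetical.
const PAGE_LIMIT = 50; // documents over this many pages are considered large
const MIN_FREE_GB = 4; // devices with less free memory than this are at risk

// Returns true when a document is likely to cause memory pressure.
// freeMemoryGb is a caller-supplied estimate; when it is unknown,
// the check is conservative and treats free memory as low.
function mayCauseMemoryPressure(pageCount: number, freeMemoryGb?: number): boolean {
  if (pageCount <= PAGE_LIMIT) {
    return false;
  }
  return freeMemoryGb === undefined || freeMemoryGb < MIN_FREE_GB;
}
```

A caller might use the result to suggest splitting the document into smaller parts before running OCR, rather than blocking the job outright.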

Safe workflow defaults

Verify output manually before sharing.
Use security guidance at /security for higher-risk scenarios.
Keep original and transformed files separated to avoid accidental leaks.
