OCR Text Recognition privacy model
This tool is classified as a heavy workload and runs in On-device mode. Current status: beta (fully functional). Release note: OCR quality depends on scan quality. The first run downloads language data.
What this does
- Applies the selected transformation to the document or exported output.
- Keeps processing local to the browser when the tool is marked On-device.
- Uses monthly local counters for usage quotas.
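As a rough illustration of a monthly local counter, the sketch below keys usage by calendar month so that a new month starts from zero without an explicit reset. It is not the tool's actual implementation: the key prefix, the function names, the quota value, and the choice of UTC months are all assumptions made for the example. The store interface mirrors the getItem/setItem pair of the browser's localStorage, so window.localStorage could be passed in its place.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key prefix; the tool's real storage key is not documented here.
const KEY_PREFIX = "ocr-usage-";

// Builds a per-month key such as "ocr-usage-2024-03" (UTC month, an assumption).
function monthKey(now: Date): string {
  const month = now.getUTCMonth() + 1;
  const padded = month < 10 ? "0" + month : String(month);
  return KEY_PREFIX + now.getUTCFullYear() + "-" + padded;
}

// Returns the usage recorded for the month containing `now` (0 if none or unreadable).
function getUsage(store: KeyValueStore, now: Date): number {
  const raw = store.getItem(monthKey(now));
  const parsed = raw === null ? 0 : parseInt(raw, 10);
  return isNaN(parsed) ? 0 : parsed;
}

// Records `units` of usage if the monthly quota allows it.
// Returns true when recorded, false when the quota would be exceeded.
function tryConsume(
  store: KeyValueStore,
  now: Date,
  units: number,
  monthlyQuota: number
): boolean {
  const used = getUsage(store, now);
  if (used + units > monthlyQuota) {
    return false;
  }
  store.setItem(monthKey(now), String(used + units));
  return true;
}

// In-memory stand-in for localStorage, used only for demonstration.
class MemoryStore implements KeyValueStore {
  private data: Record<string, string> = {};

  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key)
      ? this.data[key]
      : null;
  }

  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}
```

Because the counter lives only in local storage, clearing browser data resets it. That fits a local convenience quota, not an enforced limit.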
What this does not protect
- It does not remove names or sensitive content visible in document text or images.
- It does not guarantee legal anonymity or protect against a compromised endpoint.
- For hybrid tools, privacy depends on explicit cloud opt-in when enabled.
- Tesseract.js runs in a Web Worker. Each page consumes ~50-100MB of RAM during processing. Documents over 50 pages may cause memory pressure on devices with less than 4GB free.
- Handwritten text is poorly supported. Tesseract is designed for printed text. Expect less than 30% accuracy on handwriting.
- Multi-column layouts are partially supported. Tesseract reads left-to-right by default and may interleave columns on complex layouts.
- Tables are not preserved structurally. Cell contents are extracted as text, but row/column relationships are lost.
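The memory figures above can be turned into a quick pre-flight check. The sketch below is only an estimate under a stated assumption: that per-page memory accumulates across the document, which is a worst case and not something the figures confirm. The function and constant names are hypothetical. The free-memory value has to be supplied by the caller, since browsers do not reliably expose it.

```typescript
// Figures taken from the limits listed above.
const MB_PER_PAGE_LOW = 50;
const MB_PER_PAGE_HIGH = 100;
const PAGE_WARNING_THRESHOLD = 50; // "documents over 50 pages"
const FREE_MEMORY_THRESHOLD_MB = 4096; // "less than 4GB free"

interface MemoryEstimate {
  lowMb: number;
  highMb: number;
}

// Worst-case estimate, assuming memory for every page is held at once.
function estimatePeakMemory(pageCount: number): MemoryEstimate {
  return {
    lowMb: pageCount * MB_PER_PAGE_LOW,
    highMb: pageCount * MB_PER_PAGE_HIGH,
  };
}

// True when the document matches the stated risk condition:
// more than 50 pages on a device with less than 4GB free.
function mayCauseMemoryPressure(
  pageCount: number,
  freeMemoryMb: number
): boolean {
  return (
    pageCount > PAGE_WARNING_THRESHOLD &&
    freeMemoryMb < FREE_MEMORY_THRESHOLD_MB
  );
}
```

For a 60-page document, estimatePeakMemory(60) gives a range of 3000 to 6000 MB, and mayCauseMemoryPressure(60, 2048) returns true. Splitting such a document into smaller parts before running OCR is one way to stay under the threshold.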
Safe workflow defaults
- Verify output manually before sharing.
- Use security guidance at /security for higher-risk scenarios.
- Keep original and transformed files separated to avoid accidental leaks.