
Browser OCR Workspace

Run OCR on PDFs and images without uploading files to a server. PDF pages are rendered locally with PDF.js, and OCR then runs in your browser through Tesseract.js.
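A minimal sketch of the render-then-recognize flow, in TypeScript. `PageSource`, `Recognizer`, and `ocrDocument` are hypothetical names, not the tool's code. In a real build the renderer would wrap PDF.js (for example `getDocument` and `page.render`) and the recognizer would wrap a Tesseract.js worker (`worker.recognize`), but that wiring is an assumption and is not shown here.

```typescript
// Hypothetical stand-ins for PDF.js (rendering) and Tesseract.js (recognition).
// They are NOT the libraries' real APIs; `Img` is whatever image type the
// renderer produces and the recognizer accepts (e.g. a canvas).
interface PageSource<Img> {
  pageCount: number;
  // Render 1-based page `n` to an image.
  renderPage(n: number): Promise<Img>;
}

interface Recognizer<Img> {
  recognize(image: Img): Promise<string>;
}

interface PageResult {
  page: number;
  text: string;
}

// Render and recognize pages one after another, reporting each finished page
// so a results list can fill in page order while later pages are still running.
async function ocrDocument<Img>(
  source: PageSource<Img>,
  recognizer: Recognizer<Img>,
  onPage?: (result: PageResult) => void,
): Promise<PageResult[]> {
  const results: PageResult[] = [];
  for (let n = 1; n <= source.pageCount; n++) {
    const image = await source.renderPage(n);
    const text = await recognizer.recognize(image);
    const result: PageResult = { page: n, text };
    results.push(result);
    onPage?.(result);
  }
  return results;
}
```

Passing both stages in as interfaces keeps the pipeline testable with fakes and outside a browser; a single image (page count 1) can reuse the same loop.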

Client-side only · PDF, JPG, PNG, GIF, and pasted images · 25 on-demand language packs
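The accepted formats above could be checked with a small classifier like the one below. This is a hypothetical sketch (`classifyInput` and the extension fallback are assumptions, not the tool's code); it relies on the fact that a browser `File.type` can be an empty string when the type cannot be determined.

```typescript
// Hypothetical classifier for the listed formats: PDF, JPG, PNG, GIF.
type InputKind = "pdf" | "image" | "unsupported";

const IMAGE_TYPES = new Set(["image/jpeg", "image/png", "image/gif"]);
const IMAGE_EXTENSIONS = new Set(["jpg", "jpeg", "png", "gif"]);

function classifyInput(name: string, mimeType: string): InputKind {
  const type = mimeType.toLowerCase();
  if (type === "application/pdf") return "pdf";
  if (IMAGE_TYPES.has(type)) return "image";
  // MIME type empty or unrecognized: fall back to the file extension.
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot + 1).toLowerCase() : "";
  if (ext === "pdf") return "pdf";
  if (IMAGE_EXTENSIONS.has(ext)) return "image";
  return "unsupported";
}
```

In a page this would be called as `classifyInput(file.name, file.type)`; pasted screenshots usually arrive as `image/png`.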

Runtime note

The tool loads its PDF and OCR runtime assets locally from this repo, then downloads language data on demand and caches it in your browser. It works best on the published site or when served over http://localhost; some browsers may block Web Worker startup for pages opened from file://.
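One way to surface this note automatically is to inspect the page's protocol before starting OCR. The helper below is a hypothetical sketch; in a browser it would receive `window.location.protocol`, which includes the trailing colon (for example `"file:"`).

```typescript
// Hypothetical helper: returns a warning for origins where worker startup
// may be blocked, or null when the page is served over HTTP(S).
function runtimeWarning(protocol: string): string | null {
  if (protocol === "http:" || protocol === "https:") return null;
  if (protocol === "file:") {
    return "Opened from file://; some browsers may block the OCR worker. Serve the page over http://localhost instead.";
  }
  return `Unrecognized protocol ${protocol}; the OCR worker may not start.`;
}
```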

Input: upload, drop, or paste

Language pack

Language data is downloaded on demand from the public Tesseract.js language package CDN and then cached in your browser for reuse.
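The on-demand pattern can be sketched as a cache keyed by language code that stores the download promise, so repeated or overlapping requests for the same language share one fetch. `LanguagePackCache` and the `Fetcher` signature are hypothetical; the cache is in memory only and does not model the tool's persistent browser cache.

```typescript
// Hypothetical fetcher: downloads the data for a language code such as "eng".
type Fetcher = (code: string) => Promise<Uint8Array>;

// Stores one in-flight or completed download per language code, so selecting
// the same language again (or twice in quick succession) triggers one fetch.
class LanguagePackCache {
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  get(code: string): Promise<Uint8Array> {
    const cached = this.entries.get(code);
    if (cached) return cached;
    const pending = this.fetcher(code);
    this.entries.set(code, pending);
    // Forget failed downloads so a later request can retry.
    pending.catch(() => this.entries.delete(code));
    return pending;
  }
}
```

Caching the promise rather than the bytes is what deduplicates concurrent requests: the second caller receives the same pending download instead of starting another.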

Drop a PDF or image here

or click (or press Enter) to choose a file

Paste an image from the clipboard anywhere on the page.
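Pasted images arrive through the clipboard's item list. The sketch below picks the first image file from those items; `ClipboardItemLike` and `firstPastedImage` are hypothetical names, and the structural interface exists only so the logic runs without a DOM.

```typescript
// Structural stand-in for DataTransferItem (kind, type, getAsFile), so the
// logic can run outside a browser. `F` is the file type (File in a page).
interface ClipboardItemLike<F> {
  kind: string;
  type: string;
  getAsFile(): F | null;
}

// Return the first image file among the clipboard items, or null if none.
function firstPastedImage<F>(items: ArrayLike<ClipboardItemLike<F>>): F | null {
  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    if (item.kind === "file" && item.type.startsWith("image/")) {
      const file = item.getAsFile();
      if (file) return file;
    }
  }
  return null;
}
```

In a page, a `paste` listener on `document` could pass `event.clipboardData.items` (after checking that `clipboardData` is not null); a `DataTransferItemList` already has the `length` and indexed access this function expects.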

Example flow: drop a scanned PDF, watch each page render below, then copy the full extracted document.
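The final "copy the full document" step amounts to joining per-page results in page order. The formatter below is a hypothetical sketch; the `--- Page N ---` marker line is an assumed convention, not necessarily what the tool emits.

```typescript
interface PageText {
  page: number;
  text: string;
}

// Hypothetical formatter: joins per-page OCR output into one copyable string,
// ordered by page number, with each page's text trimmed and preceded by a marker.
function assembleDocument(pages: PageText[]): string {
  return [...pages]
    .sort((a, b) => a.page - b.page)
    .map((p) => `--- Page ${p.page} ---\n${p.text.trim()}`)
    .join("\n\n");
}
```

The resulting string could then be handed to `navigator.clipboard.writeText`, which requires a secure context such as https or http://localhost.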

Job status: Idle

Ready.

Current source: None
Pages / images: 0
OCR language: English
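The status line could be derived from per-page progress messages. This sketch assumes messages shaped like Tesseract.js logger output (a `status` string and a `progress` fraction from 0 to 1); `formatStatus` and `ProgressMessage` are hypothetical names.

```typescript
// Assumed progress message shape: a status label and a 0..1 progress fraction.
interface ProgressMessage {
  status: string;
  progress: number;
}

// Hypothetical formatter for the job status line. Returns the idle text when
// nothing is loaded or no progress has been reported yet.
function formatStatus(page: number, totalPages: number, msg: ProgressMessage | null): string {
  if (totalPages === 0 || msg === null) return "Ready.";
  // Clamp to [0, 1] before converting to a whole percentage.
  const pct = Math.round(Math.min(1, Math.max(0, msg.progress)) * 100);
  return `Page ${page} of ${totalPages}: ${msg.status} (${pct}%)`;
}
```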

Attribution

OCR runtime: Tesseract.js and Tesseract.js Core (Apache-2.0).

PDF rendering: PDF.js (Apache-2.0).

Language data is fetched on demand from the public Tesseract.js language package CDN and cached locally in the browser.

Pages and Images: no pages rendered yet

Nothing processed yet

Choose a file, load the example PDF, or paste an image to begin.