Browser OCR Workspace

Run OCR against PDFs and images without sending files to a server. PDF pages render locally with PDF.js, then OCR runs through Tesseract in your browser.
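The render-then-recognize flow described above can be sketched as a small loop. This is only an illustration, not the tool's actual code: `ocrDocument`, `RenderPage`, `Recognize`, and `PageImage` are hypothetical names, and the two callbacks stand in for whatever wraps PDF.js page rendering and a Tesseract.js worker in the real page.

```typescript
// Hypothetical stand-ins: in the browser, renderPage would wrap PDF.js
// and recognize would wrap a Tesseract.js worker. Neither library is called here.
type PageImage = { pageNumber: number; pixels: Uint8Array };
type RenderPage = (pageNumber: number) => Promise<PageImage>;
type Recognize = (image: PageImage) => Promise<string>;

async function ocrDocument(
  pageCount: number,
  renderPage: RenderPage,
  recognize: Recognize,
): Promise<string[]> {
  const texts: string[] = [];
  // Process pages sequentially: render one page, then run OCR on it.
  for (let pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
    const image = await renderPage(pageNumber);
    texts.push(await recognize(image));
  }
  // One entry per page, in page order.
  return texts;
}
```

Joining the returned array with blank lines would give a single string for the full extracted document.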

Client-side only. Accepts PDF, JPG, PNG, GIF, and pasted images. 25 on-demand language packs.
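As a rough sketch of how an input could be routed by type, the function below maps a MIME type to either the PDF path or the image path. The name `classifyInput` and the exact MIME set are assumptions based on the formats listed above, not the tool's confirmed logic.

```typescript
type InputKind = "pdf" | "image" | "unsupported";

// Assumed MIME types for the listed image formats (JPG, PNG, GIF).
const IMAGE_TYPES = new Set(["image/jpeg", "image/png", "image/gif"]);

function classifyInput(mimeType: string): InputKind {
  // Normalize so that "IMAGE/PNG" and " image/png " are treated the same.
  const normalized = mimeType.trim().toLowerCase();
  if (normalized === "application/pdf") return "pdf";
  if (IMAGE_TYPES.has(normalized)) return "image";
  return "unsupported";
}
```

In a browser, the `type` property of a dropped or pasted `File` would be the natural argument.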

Runtime note

The tool loads its PDF and OCR runtime assets from this repo, then fetches language data on demand and caches it in your browser. It works best on the published site or over http://localhost, because some browsers block worker startup for pages opened from file://.
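A page could warn about the file:// limitation up front with a check along these lines. This is a minimal sketch: `checkRuntimeOrigin` and `RuntimeCheck` are hypothetical names, and treating every non-HTTP protocol as unsupported is an assumption.

```typescript
type RuntimeCheck = { ok: boolean; reason: string };

// Takes a protocol string in the form the browser reports it, e.g. "file:" or "https:".
function checkRuntimeOrigin(protocol: string): RuntimeCheck {
  if (protocol === "file:") {
    return {
      ok: false,
      reason: "Opened from file://; some browsers block worker startup here. Serve over http://localhost instead.",
    };
  }
  if (protocol === "http:" || protocol === "https:") {
    return { ok: true, reason: "" };
  }
  // Assumption: anything else is treated as unsupported.
  return { ok: false, reason: `Unrecognized protocol: ${protocol}` };
}
```

In the browser this would be called as `checkRuntimeOrigin(window.location.protocol)` before starting any worker.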

Input: upload, drop, or paste
Language data is downloaded on demand from the public Tesseract.js language package CDN and then cached in your browser for reuse.
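The download-once, reuse-afterwards behaviour can be illustrated with a small cache-first loader. This is a sketch of the general pattern only: `createLanguageLoader` and `FetchLanguageData` are hypothetical names, and the in-memory `Map` stands in for whatever persistent browser storage the real tool relies on.

```typescript
// Hypothetical downloader: resolves with the raw language data for a code such as "eng".
type FetchLanguageData = (lang: string) => Promise<Uint8Array>;

function createLanguageLoader(fetchData: FetchLanguageData) {
  // Caching the promise (not the bytes) means concurrent requests share one download.
  const cache = new Map<string, Promise<Uint8Array>>();

  return function load(lang: string): Promise<Uint8Array> {
    let pending = cache.get(lang);
    if (pending === undefined) {
      pending = fetchData(lang);
      cache.set(lang, pending);
      // Drop failed downloads so a later call can retry.
      pending.catch(() => cache.delete(lang));
    }
    return pending;
  };
}
```

With this shape, choosing the same OCR language twice triggers only one download, while a new language triggers exactly one more.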
Drop a PDF or image here, or click (or press Enter) to choose a file.
Paste an image from the clipboard anywhere on the page.
Example flow: drop a scanned PDF, watch each page render below, then copy the full extracted document.
Attribution

OCR runtime: Apache-2.0 Tesseract.js and Tesseract.js Core.

PDF rendering: Apache-2.0 PDF.js.

Choose a file, load the example PDF, or paste an image to begin.