A powerful optical character recognition (OCR) extension to capture and convert images to text
This extension adds a toolbar button to your browser to perform OCR. When you press this button, you can select a region in the currently active window. The extension captures that area and tries to recognize the text inside it using its built-in OCR engine (Tesseract). It uses the "tesseract.js" library, which supports more than 100 languages, automatic text orientation, and script detection.
This extension loads the JS library on the page and removes it when you are done. This way, there is no long-term resource usage.
1. On the first run, the extension might take a few minutes to fetch the training data from the internet. Since this resource is cached, all subsequent calls are going to be fast.
2. Optical character recognition (OCR) is slow, so this extension displays a progress bar for each detection module.
3. This extension performs OCR locally in your browser. There is no server-side interaction; it only downloads the language training data once.
4. This tool can be used to extract text from images, PDF documents, and PowerPoint slides, or to extract the content of a web page when text selection is disabled.
5. If the text extraction confidence is low, the extension inverts the image and retries (particularly useful on dark themes).
6. If the text extraction is not accurate, you can modify the image and drop it into the interface to retry.
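The invert-and-retry behavior described in note 5 can be sketched roughly as follows. This is only an illustration, not the extension's actual code: the `OcrResult` shape, the `Recognizer` callback, and the confidence threshold of 60 are all assumptions made for the example.

```typescript
// Hypothetical result shape: recognized text plus a confidence score (assumed 0-100).
interface OcrResult {
  text: string;
  confidence: number;
}

// Hypothetical recognizer callback; a real one would wrap the OCR engine.
type Recognizer = (pixels: Uint8ClampedArray) => Promise<OcrResult>;

// Invert the R, G and B channels of RGBA pixel data; alpha (every 4th byte) is kept.
function invertRgba(pixels: Uint8ClampedArray): Uint8ClampedArray {
  return pixels.map((value, index) => (index % 4 === 3 ? value : 255 - value));
}

// Run OCR once; if the confidence is below the (assumed) threshold,
// retry on the inverted image and keep whichever result scores higher.
async function recognizeWithInvertRetry(
  pixels: Uint8ClampedArray,
  recognize: Recognizer,
  minConfidence: number = 60,
): Promise<OcrResult> {
  const first = await recognize(pixels);
  if (first.confidence >= minConfidence) {
    return first;
  }
  const second = await recognize(invertRgba(pixels));
  return second.confidence > first.confidence ? second : first;
}
```

Keeping the better of the two results means the retry can never make the output worse than the first pass, at the cost of a second (slow) recognition run on low-confidence captures.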
Supports browser-level page zooming and OS-level screen zooming
Adds image language detection (still in beta; this is a slow operation)
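Supporting page zoom and OS-level screen zoom typically comes down to converting the selected region from CSS pixels into the pixels of the captured image. The sketch below shows one way this could be done; the `Rect` type and the `toCapturePixels` helper are hypothetical names, and it assumes the capture is scaled by a single factor such as the browser's `window.devicePixelRatio`.

```typescript
// Hypothetical rectangle type used for both CSS-pixel and capture-pixel regions.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Scale a selection made in CSS pixels to the coordinate space of the
// captured bitmap. `scale` is assumed to combine page zoom and OS scaling.
function toCapturePixels(selection: Rect, scale: number): Rect {
  return {
    left: Math.round(selection.left * scale),
    top: Math.round(selection.top * scale),
    width: Math.round(selection.width * scale),
    height: Math.round(selection.height * scale),
  };
}
```

In a page context, this might be called as `toCapturePixels(selection, window.devicePixelRatio)` before cropping the screenshot to the selected area.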