28/10/2016

Tesseract.js

A JavaScript-based OCR tool.

From Bengali to Swahili, MIT's recent release of Tesseract as a JavaScript library lets you detect words in images in over 60 languages.

Image: Converting Russian.

The library supports automatic text orientation and script detection, and provides a simple interface for reading paragraph, word, and character bounding boxes. It can run in the browser or on a server with Node.js.
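To give a feel for what working with word bounding boxes might look like, here is a minimal sketch in plain JavaScript. It does not call the library itself. The input shape (a `words` array whose entries carry `text`, `confidence`, and a `bbox` with `x0`, `y0`, `x1`, `y1`) is an assumption modeled on the kind of result an OCR call reports, and the `wordBoxes` helper, its `minConfidence` parameter, and the sample data are all hypothetical.

```javascript
// Sketch: turn OCR word results into simple rectangles.
// Assumed input shape (hypothetical, not taken from the library's docs):
//   { words: [{ text, confidence, bbox: { x0, y0, x1, y1 } }] }

/**
 * Returns the words at or above `minConfidence`, each with a
 * top-left origin plus a width and height derived from its bounding box.
 */
function wordBoxes(result, minConfidence = 60) {
  return (result.words || [])
    .filter((word) => word.confidence >= minConfidence)
    .map((word) => ({
      text: word.text,
      x: word.bbox.x0,
      y: word.bbox.y0,
      width: word.bbox.x1 - word.bbox.x0,
      height: word.bbox.y1 - word.bbox.y0,
    }));
}

// Hand-written sample data standing in for a real recognition result.
const sample = {
  words: [
    { text: 'Привет', confidence: 91, bbox: { x0: 10, y0: 20, x1: 110, y1: 50 } },
    { text: 'м?р', confidence: 35, bbox: { x0: 120, y0: 20, x1: 180, y1: 50 } },
  ],
};

console.log(wordBoxes(sample));
```

Rectangles in this form could be drawn over the source image to highlight recognized words, with the confidence threshold filtering out likely misreads. The exact field names returned by a given release of Tesseract.js should be checked against its documentation.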

As a JavaScript library, Tesseract.js offers two main benefits over other OCR tools, according to its developers.

"The first reason is convenience -- the C++ version of Tesseract can be tricky to install, and nearly impossible for people with rare setups or limited privileges," the developers told InfoWorld.

"The second reason is that for some applications, it's just too expensive or painful to set up a server to offload image processing onto. Tesseract.js lets you offload the computationally expensive task of text recognition to the client, allowing your service to scale to arbitrarily many users without having to figure out how to set up -- and to pay for -- compute clusters doing OCR."

Tesseract.js is available on GitHub.
