Papers
Topics
Authors
Recent
2000 character limit reached

Recognition of Handwritten Roman Script Using Tesseract Open source OCR Engine

Published 30 Mar 2010 in cs.CV | (1003.5891v1)

Abstract: In the present work, we have used Tesseract 2.01 open source Optical Character Recognition (OCR) Engine under Apache License 2.0 for recognition of handwriting samples of lower case Roman script. Handwritten isolated and free-flow text samples were collected from multiple users. Tesseract is trained to recognize user-specific handwriting samples of both the categories of document pages. On a single user model, the system is trained with 1844 isolated handwritten characters and the performance is tested on 1133 characters, taken form the test set. The overall character-level accuracy of the system is observed as 83.5%. The system fails to segment 5.56% characters and erroneously classifies 10.94% characters.

Citations (11)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.