OCR Error Correction Using Character Correction and Feature-Based Word Classification (1604.06225v1)
Published 21 Apr 2016 in cs.IR and cs.CL
Abstract: This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, corrects the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.
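To make the two ingredients named in the abstract concrete, here is a minimal sketch of how a weighted confusion matrix and a simple language model can be combined to rank correction candidates. It is not the paper's method: the paper uses a learned classifier over features, whereas this sketch simply adds two log-probabilities. The confusion weights, word counts, constants, and function names are all invented for illustration, Latin characters are used instead of Arabic for readability, and only same-length substitution (recognition) errors are handled, not segmentation errors.

```python
import math

# Hypothetical confusion weights P(observed char | intended char).
# Keys are (intended, observed); the values are invented, not from the paper.
CONFUSION = {("l", "1"): 0.10, ("o", "0"): 0.08, ("e", "c"): 0.05}
P_SAME = 0.90    # assumed probability that a character is read correctly
P_OTHER = 1e-4   # assumed floor for substitutions missing from the matrix

# Toy unigram counts standing in for a shallow language model.
WORD_COUNTS = {"hello": 50, "cello": 5, "jelly": 20}
TOTAL = sum(WORD_COUNTS.values())


def channel_log_prob(observed: str, intended: str) -> float:
    """Log-probability that `intended` was read as `observed` (substitutions only)."""
    if len(observed) != len(intended):
        return float("-inf")  # insertions/deletions are out of scope here
    log_p = 0.0
    for obs_ch, true_ch in zip(observed, intended):
        if obs_ch == true_ch:
            p = P_SAME
        else:
            p = CONFUSION.get((true_ch, obs_ch), P_OTHER)
        log_p += math.log(p)
    return log_p


def correct(observed: str) -> str:
    """Return the lexicon word maximizing channel score plus unigram score."""
    best, best_score = observed, float("-inf")
    for word, count in WORD_COUNTS.items():
        score = channel_log_prob(observed, word) + math.log(count / TOTAL)
        if score > best_score:
            best, best_score = word, score
    return best  # falls back to the input when no candidate is viable


print(correct("he1lo"))
print(correct("hell0"))
```

In this toy setup the confusion weights let `he1lo` and `hell0` both resolve to `hello`, while an input with no same-length lexicon entry is returned unchanged. A feature-based classifier, as the abstract describes, would instead take quantities like these two scores as inputs and learn how to weigh them.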