Efficient, Lexicon-Free OCR using Deep Learning (1906.01969v1)

Published 5 Jun 2019 in cs.CV and cs.LG

Abstract: Contrary to popular belief, Optical Character Recognition (OCR) remains a challenging problem when text occurs in unconstrained environments, like natural scenes, due to geometrical distortions, complex backgrounds, and diverse fonts. In this paper, we present a segmentation-free OCR system that combines deep learning methods, synthetic training data generation, and data augmentation techniques. We render synthetic training data using large text corpora and over 2000 fonts. To simulate text occurring in complex natural scenes, we augment extracted samples with geometric distortions and with a proposed data augmentation technique, alpha-compositing with background textures. Our models employ a convolutional neural network encoder to extract features from text images. Inspired by the recent progress in neural machine translation and language modeling, we examine the capabilities of both recurrent and convolutional neural networks in modeling the interactions between input elements.
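
The alpha-compositing augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedure, so the code below assumes the standard blend of a rendered text image over a background texture with a uniform opacity; the function name `alpha_composite`, the grayscale value range, and the stand-in images are all hypothetical choices made for illustration.

```python
import numpy as np


def alpha_composite(text_img, texture, alpha):
    """Blend a rendered text image over a background texture.

    text_img, texture: float arrays in [0, 1] with identical shapes.
    alpha: opacity of the text layer in [0, 1] (scalar or per-pixel array).
    NOTE: a generic blend, assumed here; not necessarily the paper's exact formula.
    """
    text_img = np.asarray(text_img, dtype=np.float64)
    texture = np.asarray(texture, dtype=np.float64)
    if text_img.shape != texture.shape:
        raise ValueError("text image and texture must have the same shape")
    alpha = np.clip(alpha, 0.0, 1.0)
    # Weighted sum: alpha = 1 keeps only the text, alpha = 0 keeps only the texture.
    return alpha * text_img + (1.0 - alpha) * texture


# Stand-in inputs: a white line image with a dark bar in place of real glyphs,
# and random noise in place of a real background texture.
rng = np.random.default_rng(0)
text_img = np.ones((32, 128))
text_img[12:20, 16:112] = 0.0
texture = rng.uniform(0.3, 0.9, size=text_img.shape)

augmented = alpha_composite(text_img, texture, alpha=0.7)
```

In a training pipeline, the texture and the opacity would typically be sampled at random for each rendered line so that the model sees the same text over many different backgrounds.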

Authors (2)
  1. Marcin Namysl (5 papers)
  2. Iuliu Konya (1 paper)
Citations (34)
