
Robust Scene Text Recognition with Automatic Rectification (1603.03915v2)

Published 12 Mar 2016 in cs.CV
Abstract: Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model.

The paper "Robust Scene Text Recognition with Automatic Rectification" introduces RARE, a novel model designed for recognizing irregular text in natural images. The challenge stems from factors like perspective distortion and curved character placement, distinct from traditional OCR tasks focused on regular text. The proposed RARE framework integrates a Spatial Transformer Network (STN) with a Sequence Recognition Network (SRN) to enhance robustness to such irregularities.

Core Contributions

RARE's architecture brings several key innovations:

  • Integration of STN and SRN: The STN predicts a Thin-Plate-Spline (TPS) transformation, rectifying the image into a form more suitable for recognition. This transformation is versatile enough to handle various text distortions effectively.
  • End-to-End Trainability: RARE is trained end-to-end from images and their text labels alone, with no additional geometric annotations. This works because both the STN and the SRN are differentiable, so the recognition loss can be back-propagated through the rectification step.
  • Attention-Based Sequence Recognition: The SRN is an attention-based model built on a convolutional-recurrent structure. It extracts a sequential representation from the rectified image and then recognizes the text character by character.
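
To make the rectification step more concrete, here is a minimal sketch of a thin-plate-spline mapping in plain Python. It is not the authors' implementation. It assumes a set of control points on the rectified image (`base_pts`, placed here along the top and bottom edges) and matching points on the input image (`src_pts`, which in RARE would be predicted by the STN; here they are made-up values). The kernel r² ln r², the function names, and the small linear solver are illustrative choices, and the sampling step that would read pixel values at the mapped locations is omitted.

```python
import math

def tps_kernel(r2):
    # Radial basis U = r^2 * ln(r^2), taking the squared distance r2; U(0) = 0.
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)

def solve(A, B):
    # Gauss-Jordan elimination with partial pivoting.
    # A is n x n, B is n x m (several right-hand sides); returns the n x m solution.
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points may be degenerate")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_tps(base_pts, src_pts):
    # Solve for TPS parameters so that each base point maps to its source point.
    k = len(base_pts)
    n = k + 3
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0, 0.0] for _ in range(n)]
    for i, (xi, yi) in enumerate(base_pts):
        for j, (xj, yj) in enumerate(base_pts):
            A[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        A[i][k:k + 3] = [1.0, xi, yi]                      # affine part
        A[k][i], A[k + 1][i], A[k + 2][i] = 1.0, xi, yi    # side conditions
        B[i] = list(src_pts[i])
    return solve(A, B)

def tps_map(params, base_pts, point):
    # Map a location on the rectified image to a location on the input image.
    x, y = point
    feats = [tps_kernel((x - cx) ** 2 + (y - cy) ** 2) for cx, cy in base_pts]
    feats += [1.0, x, y]
    return tuple(sum(f * params[i][d] for i, f in enumerate(feats)) for d in range(2))

# Hypothetical control points: straight edges on the rectified image,
# a curved text outline on the input image.
base_pts = [(-1.0, -1.0), (0.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
src_pts = [(-0.9, -0.5), (0.0, -0.9), (0.9, -0.5), (-0.8, 0.6), (0.0, 0.2), (0.8, 0.6)]
params = fit_tps(base_pts, src_pts)
center_on_input = tps_map(params, base_pts, (0.0, 0.0))
```

Evaluating `tps_map` on every pixel of the output grid gives the locations to sample from the input image. Because the mapping is a smooth function of the source points, gradients can flow back to whatever network predicts them.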

Experimental Insights

Extensive evaluations on several benchmarks underscore RARE's efficacy:

  • Performance: RARE achieves state-of-the-art or highly competitive accuracy on several benchmarks, and its robustness is most visible on irregular text. On the IIIT5K dataset, RARE outperformed existing methods, and it showed notable improvements in recognizing curved and perspective text.
  • Comparison with Baselines: Without a lexicon, RARE demonstrated superior performance over previous architectures such as CRNN, especially on datasets like SVT-Perspective and CUTE80, which focus on distorted text.
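
For readers unfamiliar with the "with lexicon" and "without a lexicon" settings, the sketch below shows one common way to score them. It is an illustration, not the paper's evaluation code. Snapping a prediction to the nearest lexicon word by edit distance is an assumed convention here, and the paper's own lexicon procedure may differ. The function names and the sample strings are hypothetical.

```python
def edit_distance(a, b):
    # Levenshtein distance computed with a rolling one-row table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def constrain_to_lexicon(prediction, lexicon):
    # Replace the raw prediction with the closest lexicon word (assumed convention).
    return min(lexicon, key=lambda w: edit_distance(prediction.lower(), w.lower()))

def word_accuracy(predictions, labels, lexicons=None):
    # Fraction of images whose predicted word matches the label, ignoring case.
    # If per-image lexicons are given, predictions are first snapped to them.
    correct = 0
    for i, (pred, label) in enumerate(zip(predictions, labels)):
        if lexicons is not None:
            pred = constrain_to_lexicon(pred, lexicons[i])
        correct += pred.lower() == label.lower()
    return correct / len(labels)

# Made-up recognizer outputs for two images.
predictions = ["hcuse", "street"]
labels = ["house", "street"]
lexicons = [["house", "horse", "mouse"], ["street", "stress"]]

free_acc = word_accuracy(predictions, labels)
lexicon_acc = word_accuracy(predictions, labels, lexicons)
```

The lexicon-free setting is the stricter one, since every character must be right with no candidate list to fall back on. That is why lexicon-free results on distorted-text datasets are a useful indicator of how much rectification helps.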

Implications and Future Directions

The findings suggest several implications:

  • Practical Deployments: Given its robustness and flexibility, RARE can be effectively deployed in real-world applications where text appears distorted due to environmental factors.
  • Potential Integrations: Future work could explore integrating RARE with text detection systems to build comprehensive end-to-end solutions for scene text reading.

The research illustrates a critical advancement in handling irregular text recognition by unifying rectification and recognition processes within a single trainable model architecture. The extension of STN capabilities combined with attention-based sequence recognition sets a foundation for further improvements in scene text recognition technologies.

Authors (5)
  1. Baoguang Shi (15 papers)
  2. Xinggang Wang (163 papers)
  3. Pengyuan Lyu (19 papers)
  4. Cong Yao (70 papers)
  5. Xiang Bai (221 papers)
Citations (569)