Rethinking Irregular Scene Text Recognition (1908.11834v2)

Published 30 Aug 2019 in cs.CV, cs.LG, and eess.IV

Abstract: Reading text from natural images is challenging due to the great variety in text font, color, and size, as well as complex backgrounds. The perspective distortion and non-linear spatial arrangement of characters make it even more difficult. While rectification-based methods are intuitively grounded and have pushed the envelope so far, their potential is far from fully exploited. In this paper, we present a bag of tricks that significantly improve the performance of rectification-based methods. On curved text datasets, our method achieves an accuracy of 89.6% on CUTE-80 and 76.3% on Total-Text, improvements over the previous state of the art of 6.3% and 14.7%, respectively. Furthermore, our combination of tricks helped us win the ICDAR 2019 Arbitrary-Shaped Text Challenge (Latin script), achieving an accuracy of 74.3% on the held-out test set. We release our code as well as data samples for further exploration at https://github.com/Jyouhou/ICDAR2019-ArT-Recognition-Alchemy

Overview of the Paper "Rethinking Irregular Scene Text Recognition"

This paper addresses the problem of recognizing irregular (for example, curved or perspective-distorted) text in natural scenes, a task of growing importance for applications such as instant translation and robotic navigation. The authors systematically examine techniques for improving rectification-based methods for irregular scene text recognition and report substantial accuracy gains over prior state-of-the-art methods, particularly on curved-text benchmarks.

Key Contributions

The authors present a series of modifications and enhancements, guided by the hypothesis that existing text recognition methods are hampered by the inadequate handling of irregular text shapes, such as curved text. Key contributions of the paper include:

Dataset Augmentation: The authors propose a new way of generating synthetic curved text, producing the CurvedSynth dataset. Models trained on CurvedSynth significantly outperform those trained on earlier synthetic datasets such as SynthText and Synth90K, particularly on benchmarks containing curved text (e.g., CUTE80, Total-Text, and IC19-ArT).
Input Preprocessing: The authors introduce "squarization", which preserves aspect ratios during preprocessing; combined with random rotations during training, it yields improvements, particularly on irregular text datasets.
Robust Model Modifications: The paper examines performing rectification at the image level and at the feature level. The results are mixed but suggest possible directions for building more robust recognizers.
Evaluation on New Datasets: RectTotal, a dataset rectified with TextSnake, provides a new test bed that illustrates the potential of performing rectification as a preprocessing step rather than inside the recognizer.
Comprehensive Experimental Study: The paper extensively compares ways of combining synthetic and real-world training data. Mixing real-world data into training at a 15% ratio alongside synthetic data yields clear improvements across several benchmarks.
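The 15% real-data ratio can be made concrete with a small batch builder. This is a minimal sketch under stated assumptions: the helper name mixed_batch is hypothetical, and applying the ratio as a fixed share of every batch (rather than, say, as per-sample sampling weights) is an illustrative choice, not a detail confirmed by the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixed_batch(
    synthetic: Sequence[str],
    real: Sequence[str],
    batch_size: int,
    real_ratio: float = 0.15,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Draw one training batch with a fixed share of real-world samples.

    Hypothetical helper: the 15% default comes from the summary; using a
    fixed per-batch count (instead of sampling weights) is an assumption.
    """
    rng = rng or random.Random()
    n_real = round(batch_size * real_ratio)   # e.g. 15 of 100, 10 of 64
    n_syn = batch_size - n_real
    # Sample with replacement from each pool, tagging the source.
    batch = [("real", s) for s in rng.choices(real, k=n_real)]
    batch += [("synthetic", s) for s in rng.choices(synthetic, k=n_syn)]
    rng.shuffle(batch)                        # avoid ordering by source
    return batch
```

Keeping the real-data share fixed per batch means every gradient step sees some real images even though the real pool is much smaller than the synthetic one.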
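The difference between image-level and feature-level rectification is mainly which tensor gets resampled. Rectifiers such as ASTER's predict a thin-plate-spline sampling grid; to keep this sketch short it swaps in a plain affine transform (as in the original spatial transformer network). The function name affine_sample and the bilinear, zero-padded sampling are illustrative assumptions, and how the paper predicts its transform is not reproduced here.

```python
from typing import Tuple

import numpy as np


def affine_sample(x: np.ndarray, theta: np.ndarray, out_hw: Tuple[int, int]) -> np.ndarray:
    """Resample a (C, H, W) array with a 2x3 affine matrix in normalized coords.

    The same call works on a raw image (C = 1 or 3) or a CNN feature map
    (C = many channels); in this sketch that is the only difference between
    image-level and feature-level rectification.
    """
    c, h, w = x.shape
    oh, ow = out_hw
    # Output grid in normalized [-1, 1] coordinates, rows first.
    ys, xs = np.meshgrid(np.linspace(-1, 1, oh), np.linspace(-1, 1, ow), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(oh * ow)])  # (3, oh*ow)
    src = theta @ grid                                           # (2, oh*ow)
    # Normalized -> pixel coordinates (corners map to corner pixels).
    px = (src[0] + 1) * (w - 1) / 2
    py = (src[1] + 1) * (h - 1) / 2
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    out = np.zeros((c, oh * ow), dtype=float)
    # Bilinear interpolation: accumulate the four neighbouring pixels.
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            wgt = (1 - np.abs(px - xi)) * (1 - np.abs(py - yi))
            valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)  # zeros outside
            out[:, valid] += wgt[valid] * x[:, yi[valid], xi[valid]]
    return out.reshape(c, oh, ow)
```

With an identity matrix the output reproduces the input; in a real model theta (or a TPS grid) would be predicted by a small localization network. Rectifying a downsampled feature map is cheaper than rectifying the full image but works from coarser information.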

Numerical Results

The numerical results support the effectiveness of these techniques: the proposed method achieves an accuracy of 89.6% on CUTE80, an improvement of 6.3% over the previous best result, and 76.3% on Total-Text, an increase of 14.7%. The combination of tricks the authors used in the ICDAR 2019 Arbitrary-Shaped Text Challenge (Latin script) reached a final accuracy of 74.3% on the held-out test set and won the challenge.
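The reported gains are easiest to interpret as absolute percentage points. Under that assumption (not stated explicitly in the summary), the figures above imply the previous best results and the corresponding relative improvements, which this small sketch computes.

```python
# Reported accuracy (%) and gain for each benchmark, as given in the summary.
reported = {
    "CUTE80": (89.6, 6.3),
    "Total-Text": (76.3, 14.7),
}


def implied_baseline(acc: float, gain: float) -> float:
    """Previous best, ASSUMING the gain is in absolute percentage points."""
    return round(acc - gain, 1)


def relative_gain(acc: float, gain: float) -> float:
    """Gain expressed as a fraction of the implied previous best."""
    return gain / (acc - gain)


for name, (acc, gain) in reported.items():
    print(f"{name}: implied previous best {implied_baseline(acc, gain)}%, "
          f"relative gain {relative_gain(acc, gain):.1%}")
```

Read this way, CUTE80 moves from about 83.3% to 89.6% (roughly a 7.6% relative gain) and Total-Text from about 61.6% to 76.3% (roughly 23.9% relative), which is why the Total-Text result stands out.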

Implications and Future Work

The research has both practical and methodological implications for scene text recognition. It shows that strategies combining synthetic and real data can yield recognition systems that handle both regular and irregular text. The findings on squarization and on input dimensions point to adaptive input resizing as a direction for future work.
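One reading of "squarization" that is consistent with the description in this summary is: pad a word crop to a square canvas so its aspect ratio is preserved, then resize to a fixed square input. The sketch below follows that reading; the function name squarize, centered constant padding, and nearest-neighbour resizing are all assumptions rather than details confirmed by the paper.

```python
import numpy as np


def squarize(img: np.ndarray, size: int = 64, fill: int = 0) -> np.ndarray:
    """Pad an H x W (x C) crop to a square, then resize it to size x size.

    Assumed behaviour: centered constant padding plus nearest-neighbour
    resizing; the paper's exact procedure may differ.
    """
    h, w = img.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    # Square canvas with the original crop centered, aspect ratio untouched.
    canvas = np.full((side, side) + img.shape[2:], fill, dtype=img.dtype)
    canvas[top:top + h, left:left + w] = img
    # Nearest-neighbour resize: map each output index back to a source index.
    idx = (np.arange(size) * side / size).astype(int)
    return canvas[idx][:, idx]
```

By contrast, a plain resize of a 32x100 crop to a fixed 64x64 input would stretch the text vertically by about 3x; with padding, the text keeps its shape, and a square canvas also leaves room for rotated or steeply curved text.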

RectTotal provides a useful benchmark for evaluating text recognition models on rectified inputs. More broadly, the study highlights the potential of robust detection systems that rectify irregular text before it reaches the recognizer.
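TextSnake describes a text instance as a center line with a local radius at each point, which is what lets a detector straighten curved text before recognition. The exact procedure used to build RectTotal is not described here; the sketch below shows one simple way such a representation can unroll a curved region, by sampling along the normal at each center-line point. The function rectify_along_centerline and its nearest-neighbour sampling are hypothetical.

```python
import numpy as np


def rectify_along_centerline(
    img: np.ndarray, centers: np.ndarray, radii: np.ndarray, out_h: int = 32
) -> np.ndarray:
    """Unroll a curved text region into a straight strip (hypothetical helper).

    centers: (N, 2) array of (x, y) points ordered along the text center line
             (assumed left to right, consecutive points distinct).
    radii:   (N,) local half-height of the text at each center point.
    Returns an (out_h, N) strip, one column per center-line point.
    """
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Tangent from neighbouring points (one-sided differences at the ends).
    tangents = np.gradient(centers, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    # Normal = tangent rotated 90 degrees; image y grows downward.
    normals = np.stack([-tangents[:, 1], tangents[:, 0]], axis=1)
    # Offsets from -r to +r along each normal, one per output row.
    t = np.linspace(-1.0, 1.0, out_h)[:, None, None]                  # (H, 1, 1)
    pts = centers[None] + t * radii[None, :, None] * normals[None]    # (H, N, 2)
    # Nearest-neighbour lookup, clamped to the image bounds.
    xs = np.clip(np.rint(pts[..., 0]).astype(int), 0, img.shape[1] - 1)
    ys = np.clip(np.rint(pts[..., 1]).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs]
```

For a straight horizontal center line this simply crops the band of rows within one radius of the line; for a curved line each column is taken perpendicular to the local text direction, so the output strip is horizontal text that a standard recognizer can read.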

These conclusions motivate further research not only into data generation but also into recognition architectures that can handle varying text shapes without the computational overhead of exhaustive rectification, a potentially promising direction for low-resource deployment.

In summary, this paper presents a meticulous investigation into the challenges of irregular scene text recognition, making noteworthy advances that are likely to inform future research and practice in the domain.

Authors (5)

Shangbang Long (13 papers)
Yushuo Guan (8 papers)
Bingxuan Wang (10 papers)
Kaigui Bian (44 papers)
Cong Yao (70 papers)

Citations (8)

GitHub

GitHub - Jyouhou/ICDAR2019-ArT-Recognition-Alchemy: PKU Team Zero's code for participation in ICDAR2019 ArT Recognition track (Champion) (222 stars)