
SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text Recognition Models

Published 20 Jul 2021 in cs.CV | (2107.09313v1)

Abstract: For successful scene text recognition (STR) models, synthetic text image generators have alleviated the lack of annotated text images from the real world. Specifically, they generate multiple text images with diverse backgrounds, font styles, and text shapes and enable STR models to learn visual patterns that might not be accessible from manually annotated data. In this paper, we introduce a new synthetic text image generator, SynthTIGER, by analyzing techniques used for text image synthesis and integrating effective ones under a single algorithm. Moreover, we propose two techniques that alleviate the long-tail problem in length and character distributions of training data. In our experiments, SynthTIGER achieves better STR performance than the combination of synthetic datasets, MJSynth (MJ) and SynthText (ST). Our ablation study demonstrates the benefits of using sub-components of SynthTIGER and the guideline on generating synthetic text images for STR models. Our implementation is publicly available at https://github.com/clovaai/synthtiger.


Summary

  • The paper introduces SynthTIGER, a synthetic text image generator that integrates effective text-rendering techniques from prior generators and adds two techniques for balancing text length and character distributions, improving scene text recognition accuracy.
  • It renders text in five steps (text shape selection, text style selection, transformation, blending, and post-processing) to simulate the diverse appearance of real-world text.
  • In experiments, SynthTIGER outperforms MJ, ST, and their combination, and reported gains on Japanese text and document data suggest cross-lingual and cross-domain applicability.

SynthTIGER: A Novel Framework for Synthetic Text Image Generation

The paper "SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text Recognition Models" presents a synthetic text image generator for training scene text recognition (STR) models. SynthTIGER analyzes the techniques used by prior generators, notably MJSynth (MJ) and SynthText (ST), and integrates the effective ones into a single algorithm.

Methodology

SynthTIGER advances on two fronts: text rendering and control of the text distribution. The rendering process consists of five components intended to capture how text appears in the real world: Text Shape Selection, Text Style Selection, Transformation, Blending, and Post-processing. They can be grouped as follows:

  1. Text Shape and Style: Using a wide range of fonts and style choices, SynthTIGER mimics the diversity of real-world text images by varying color, texture, and visual effects.
  2. Transformation and Blending: The rendered text is geometrically transformed and blended with background textures, producing realistic word box images; post-processing then adds realistic noise.
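The style, blending, and post-processing steps in the list can be illustrated with a minimal NumPy sketch. It is not SynthTIGER's implementation: the function names, the drop shadow used as the example style effect, the normal (alpha) blend mode, and the Gaussian noise model are all assumptions, and a filled rectangle stands in for glyphs that a real generator would render from a font. Geometric transformation is omitted for brevity.

```python
import numpy as np


def shift_mask(mask, dy, dx):
    """Shift a 2-D alpha mask down by dy and right by dx pixels (dy, dx >= 0), zero-padded."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    out[dy:, dx:] = mask[: h - dy, : w - dx]
    return out


def composite(background, color, alpha):
    """Alpha-composite a solid-colored layer over an RGB image (normal blend mode)."""
    a = alpha[..., None]  # (H, W, 1) so it broadcasts over the RGB channels
    return a * np.asarray(color, dtype=float) + (1.0 - a) * background


def render_word_box(text_mask, background, text_color, shadow_color,
                    shadow_offset=(2, 2), shadow_opacity=0.5, noise_std=0.02, rng=None):
    """Hypothetical sketch: style (drop shadow) -> blending -> post-processing (noise)."""
    rng = np.random.default_rng() if rng is None else rng
    # Style: a semi-transparent drop shadow made from a shifted copy of the text mask.
    shadow = shift_mask(text_mask, *shadow_offset) * shadow_opacity
    image = composite(background, shadow_color, shadow)
    # Blending: place the colored text layer over the background texture.
    image = composite(image, text_color, text_mask)
    # Post-processing: additive Gaussian noise, clipped back to the valid [0, 1] range.
    image = image + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(image, 0.0, 1.0)


rng = np.random.default_rng(0)
text_mask = np.zeros((32, 100))
text_mask[8:24, 10:90] = 1.0  # placeholder for glyphs rendered from a font
background = rng.uniform(0.6, 0.9, size=(32, 100, 3))  # stand-in for a texture crop
word_box = render_word_box(text_mask, background, text_color=(0.1, 0.1, 0.3),
                           shadow_color=(0.0, 0.0, 0.0), rng=rng)
print(word_box.shape)  # (32, 100, 3)
```

Keeping each stage a separate function mirrors the pipeline structure: stages can be swapped or disabled independently, which is also what an ablation over rendering components requires.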

SynthTIGER also tackles the long-tail problem in the length and character distributions of training data with two augmentations: text length distribution augmentation and character distribution augmentation. Both aim to give rare text lengths and rare characters adequate representation, addressing a weakness of existing synthetic datasets.
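One plausible way to implement the two augmentations is sketched below in plain Python. The sampling rules, the probabilities `p_char` and `p_length`, and the names `build_length_index` and `sample_text` are hypothetical; the paper's exact procedure may differ. With probability `p_char` a random string is built from characters drawn uniformly from the charset; with probability `p_length` a length is chosen uniformly first and then a word of that length; otherwise a word is drawn with its natural corpus frequency.

```python
import random
from collections import defaultdict


def build_length_index(corpus):
    """Group corpus words by length so a length can be chosen first, then a word."""
    index = defaultdict(list)
    for word in corpus:
        index[len(word)].append(word)
    return dict(index)


def sample_text(corpus, by_length, charset, max_len, p_char, p_length, rng):
    """Draw one training string (hypothetical sketch; requires p_char + p_length <= 1)."""
    r = rng.random()
    if r < p_char:
        # Character augmentation: characters drawn uniformly from the charset,
        # so rare characters are seen as often as common ones.
        length = rng.randint(1, max_len)
        return "".join(rng.choice(charset) for _ in range(length))
    if r < p_char + p_length:
        # Length augmentation: choose a length uniformly among those in the corpus,
        # then a word of that length, flattening the long tail of long words.
        length = rng.choice(sorted(by_length))
        return rng.choice(by_length[length])
    # Default: sample with the corpus's natural word-frequency distribution.
    return rng.choice(corpus)


rng = random.Random(0)
corpus = ["the", "the", "the", "of", "a", "scene", "recognition"]
by_length = build_length_index(corpus)
charset = "abcdefghijklmnopqrstuvwxyz0123456789"
samples = [sample_text(corpus, by_length, charset, max_len=12,
                       p_char=0.25, p_length=0.5, rng=rng) for _ in range(1000)]
```

In this toy corpus "recognition" is the only 11-character word, so under natural sampling it appears in about 1 of 7 draws, while the length branch selects it in 1 of 5 draws, illustrating how uniform length sampling lifts the tail.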

Experimental Evaluation

The researchers conducted extensive experiments comparing SynthTIGER against existing datasets like MJ and ST. Notable findings include:

  • Superior STR Performance: STR models trained on SynthTIGER data achieved higher accuracy on standard benchmarks than models trained on MJ, ST, or their combination, indicating that the generated data is more useful for training.
  • Rendering Function Insights: Ablation studies showed that each text rendering component contributes to overall performance, with texture blending and transformation being particularly important.
  • Real-world Applicability: Performance improvements on Japanese text and on document datasets suggest that SynthTIGER transfers across languages and domains.

Implications and Future Directions

The findings underscore how strongly the quality of synthetic data affects STR model accuracy. Modeling realistic noise in the rendered text appears particularly valuable for model robustness. Practically, SynthTIGER may substantially reduce reliance on manually annotated datasets, which are resource-intensive to produce.

More broadly, the blending and noise-injection techniques could inform research on more sophisticated synthetic data generation, potentially beyond text recognition in domains such as augmented reality and virtual environments.

The open-source implementation (https://github.com/clovaai/synthtiger) also makes it easier for the OCR community to reproduce and build on this work.

The paper offers a practical contribution to the field: a clear path for improving the quality and diversity of training data for STR models.
