Character Region Awareness for Text Detection (1904.01941v1)

Published 3 Apr 2019 in cs.CV

Abstract: Scene text detection methods based on neural networks have emerged recently and have shown promising results. Previous methods trained with rigid word-level bounding boxes exhibit limitations in representing the text region in an arbitrary shape. In this paper, we propose a new scene text detection method to effectively detect text area by exploring each character and affinity between characters. To overcome the lack of individual character level annotations, our proposed framework exploits both the given character-level annotations for synthetic images and the estimated character-level ground-truths for real images acquired by the learned interim model. In order to estimate affinity between characters, the network is trained with the newly proposed representation for affinity. Extensive experiments on six benchmarks, including the TotalText and CTW-1500 datasets which contain highly curved texts in natural images, demonstrate that our character-level text detection significantly outperforms the state-of-the-art detectors. According to the results, our proposed method guarantees high flexibility in detecting complicated scene text images, such as arbitrarily-oriented, curved, or deformed texts.


Character Region Awareness for Text Detection

The paper "Character Region Awareness for Text Detection" introduces CRAFT, a scene text detection framework that localizes text by detecting individual character regions and the affinities between them. Earlier detectors, which rely mainly on word-level bounding boxes, often struggle with curved, deformed, or elongated text. CRAFT addresses this with a character-level approach that improves detection accuracy for complex text shapes.

Methodological Advancements

Character and Affinity Region Scores: CRAFT uses a convolutional neural network to predict two score maps: a character region score and an affinity score. The region score localizes individual characters, whereas the affinity score links adjacent characters into a single text instance. This dual-score design lets CRAFT handle irregular text shapes more effectively than methods that rely on rigid bounding boxes.
Weakly-Supervised Learning: Existing real-image datasets rarely provide character-level annotations. To work around this, the authors train in a weakly-supervised manner, combining synthetic images that carry character-level annotations with estimated character-level ground truths for real images. The estimates come from an interim model, which produces character-level predictions for datasets annotated only at the word level.
Architecture and Network Design: CRAFT's architecture is based on a modified VGG-16 with batch normalization, plus skip connections similar to those in U-Net. This configuration aggregates features across levels and improves localization performance.
Robust Post-Processing: The post-processing algorithm thresholds the region and affinity scores to extract bounding shapes, so it does not need Non-Maximum Suppression (NMS). The system can also generate bounding polygons for arbitrarily shaped text.
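
The post-processing idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the two score maps are plain 2-D lists of values in [0, 1], marks a pixel as text when either score exceeds its threshold, groups text pixels with a 4-connected flood fill, and returns one axis-aligned box per group. The function name `detect_boxes`, the threshold values of 0.4, and the axis-aligned boxes (instead of rotated boxes or polygons) are all assumptions made for illustration.

```python
from collections import deque


def detect_boxes(region, affinity, region_thr=0.4, affinity_thr=0.4):
    """Group thresholded region/affinity pixels into text instances.

    region and affinity are equally sized 2-D lists of scores in [0, 1].
    Returns one (x_min, y_min, x_max, y_max) box per connected component.
    Threshold defaults are illustrative, not values taken from the paper.
    """
    h, w = len(region), len(region[0])
    # A pixel counts as text if either score clears its threshold.
    mask = [[region[y][x] > region_thr or affinity[y][x] > affinity_thr
             for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy 3x7 maps: characters at x=0 and x=2 are linked by affinity at x=1;
# the characters at x=5 and x=6 touch each other directly.
region = [[0.0] * 7,
          [0.9, 0.0, 0.9, 0.0, 0.0, 0.9, 0.8],
          [0.0] * 7]
affinity = [[0.0] * 7,
            [0.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0],
            [0.0] * 7]

boxes = detect_boxes(region, affinity)
print(boxes)  # -> [(0, 1, 2, 1), (5, 1, 6, 1)]
```

In this toy example the affinity score is what merges the first two characters into one instance. With the affinity map zeroed out, the same region map yields three separate boxes, which shows why no NMS step is needed: instances come from connected components rather than from overlapping box proposals.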

Experimental Validation

The evaluation spans six benchmark datasets, including TotalText and CTW-1500, which contain highly curved text in natural images. The authors report that CRAFT outperforms state-of-the-art text detectors, most notably on curved and arbitrarily oriented text. The reported precision and recall improve across the datasets, which the authors present as evidence of CRAFT's robustness and adaptability.

Implications and Future Directions

CRAFT's character-level detection has practical implications for a range of text detection applications. Its ability to detect and delineate complex text shapes in natural scenes can benefit real-time applications such as instant translation, image retrieval, and augmented reality. From a theoretical perspective, the focus on character and affinity regions offers an alternative to traditional word-centric models.

Future work could integrate recognition modules to build end-to-end text spotting systems, potentially improving accuracy and robustness in recognition tasks. Datasets with richer character-level annotations could also improve the framework's performance in multilingual scenarios, especially for scripts with cursive or non-segmented characters.

Conclusion

The CRAFT framework advances text detection primarily through its focus on character regions and inter-character affinities. This shift in method improves detection flexibility and accuracy, especially for irregular text shapes. The paper positions CRAFT as a foundation for further work in text detection and recognition.


Authors (5)

Youngmin Baek (7 papers)
Bado Lee (9 papers)
Dongyoon Han (49 papers)
Sangdoo Yun (71 papers)
Hwalsuk Lee (10 papers)

Citations (718)

