Analysis of "ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection"
The paper "ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection" presents an innovative methodology for addressing the persistent challenges inherent in scene text detection, particularly in contexts involving arbitrary-shaped text.
Core Contributions
ContourNet introduces a multi-faceted approach designed to enhance the precision of scene text detection by tackling two primary challenges: false positives (FPs) and the large scale variance of text. The proposed architecture comprises the following key components:
- Adaptive Region Proposal Network (Adaptive-RPN): This component optimizes the Intersection over Union (IoU) between predicted and ground-truth regions to generate text proposals that are largely insensitive to scale variation. By utilizing a set of pre-defined points rather than traditional bounding-box regression, Adaptive-RPN localizes text regions of varying scale and shape more accurately (see the first sketch after this list).
- Local Orthogonal Texture-aware Module (LOTM): LOTM models local texture information in two orthogonal directions, borrowing the intuition of classical edge detection to separate text from non-text regions. Because many FPs exhibit a strong texture response in only one direction, describing contour points in both directions makes such responses easier to suppress (see the second sketch after this list).
- Point Re-scoring Algorithm: Applied at test time, this algorithm filters the predictions from LOTM, keeping only contour points with high confidence in both orthogonal directions, which further mitigates the impact of FPs (also illustrated in the second sketch below).
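The point-set proposal idea can be made concrete with a minimal sketch. The snippet below (PyTorch; tensor shapes, function names, and the plain IoU loss are illustrative assumptions, not the authors' exact formulation) shows how a set of predicted points can be collapsed into an enclosing box and trained with an IoU objective, which is what makes the localization largely scale-invariant.

```python
import torch

def points_to_box(points):
    """Collapse a set of k predicted points (assumed shape [N, k, 2], (x, y))
    into an enclosing axis-aligned box [N, 4] = (x1, y1, x2, y2)."""
    x_min, _ = points[..., 0].min(dim=1)
    y_min, _ = points[..., 1].min(dim=1)
    x_max, _ = points[..., 0].max(dim=1)
    y_max, _ = points[..., 1].max(dim=1)
    return torch.stack([x_min, y_min, x_max, y_max], dim=1)

def iou_loss(pred_boxes, gt_boxes, eps=1e-6):
    """IoU loss between predicted and ground-truth boxes ([N, 4] each).
    Optimizing IoU directly makes the regression target scale-invariant,
    which is the motivation credited to Adaptive-RPN."""
    ix1 = torch.max(pred_boxes[:, 0], gt_boxes[:, 0])
    iy1 = torch.max(pred_boxes[:, 1], gt_boxes[:, 1])
    ix2 = torch.min(pred_boxes[:, 2], gt_boxes[:, 2])
    iy2 = torch.min(pred_boxes[:, 3], gt_boxes[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = area_p + area_g - inter
    return 1.0 - inter / (union + eps)
```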
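The second sketch is a simplified stand-in for LOTM together with the re-scoring step: two branches with orthogonal 1-D kernels each predict a contour heatmap, and only locations confident in both maps are kept. Channel counts, kernel size, and the threshold are assumptions made for illustration, not values from the paper.

```python
import torch
import torch.nn as nn

class OrthogonalTextureHeads(nn.Module):
    """Two parallel branches with 1-D kernels in orthogonal directions,
    each producing a per-pixel contour-point probability map
    (a simplified stand-in for LOTM)."""
    def __init__(self, in_channels=256, k=3):
        super().__init__()
        self.horizontal = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=(1, k), padding=(0, k // 2)),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 1, kernel_size=1),
        )
        self.vertical = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=(k, 1), padding=(k // 2, 0)),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 1, kernel_size=1),
        )

    def forward(self, features):
        # Each map is [N, 1, H, W] with values in (0, 1).
        return torch.sigmoid(self.horizontal(features)), torch.sigmoid(self.vertical(features))

def rescore_points(h_map, v_map, threshold=0.5):
    """Keep only locations confident in *both* orthogonal maps; responses
    that fire in a single direction (a common source of false positives)
    are suppressed. Returns a binary mask of surviving contour points."""
    return (h_map > threshold) & (v_map > threshold)
```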
Experimental Evaluation
The efficacy of ContourNet is validated through extensive experiments on three benchmarks: Total-Text, CTW1500, and ICDAR2015, which together cover multi-oriented and curved text. ContourNet achieves state-of-the-art F-measures of 85.4% on Total-Text and 83.9% on CTW1500 without external training data, a substantial improvement over previous methods in both robust text-region localization and FP reduction.
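For reference, the F-measure cited above is the standard harmonic mean of detection precision and recall; the helper below is a generic illustration, not tied to the paper's evaluation code.

```python
def f_measure(precision, recall):
    """F-measure (F1): harmonic mean of precision and recall,
    the headline metric on Total-Text, CTW1500, and ICDAR2015."""
    return 2 * precision * recall / (precision + recall)
```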
Implications and Future Work
The practical implications of this research extend to applications demanding high-precision, arbitrary-shaped text detection, such as automatic image captioning, augmented reality interfaces, and various document analysis tasks. The robustness of ContourNet in handling diverse and challenging text shapes indicates its potential adaptability to real-world conditions.
From a theoretical perspective, the integration of scale-invariant localization (via IoU-driven Adaptive-RPN) and orthogonal texture modeling (through LOTM) presents a meaningful advancement in scene text detection methodologies. This suggests a promising direction for future research, particularly the exploration of edge detection techniques within deep learning frameworks.
Conclusion
ContourNet represents an advanced framework for scene text detection, built on two emphases: accurate localization under scale variation and FP suppression through orthogonal texture modeling. These contributions underscore the paper's significance within computer vision and pave the way for further research into adaptable, robust scene text detection.