R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection (1706.09579v2)

Published 29 Jun 2017 in cs.CV

Abstract: In this paper, we propose a novel method called Rotational Region CNN (R2CNN) for detecting arbitrary-oriented texts in natural scene images. The framework is based on Faster R-CNN [1] architecture. First, we use the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations. Second, for each axis-aligned text box proposed by RPN, we extract its pooled features with different pooled sizes and the concatenated features are used to simultaneously predict the text/non-text score, axis-aligned box and inclined minimum area box. At last, we use an inclined non-maximum suppression to get the detection results. Our approach achieves competitive results on text detection benchmarks: ICDAR 2015 and ICDAR 2013.

Overview of Rotational Region CNN for Orientation Robust Scene Text Detection

The paper introduces a novel framework, Rotational Region CNN (R2CNN), designed to enhance the detection of arbitrary-oriented text in natural scene images. This approach is based on the Faster R-CNN architecture and focuses on addressing the challenges posed by varying text orientations in natural scenes, which traditional horizontal text detectors often fail to capture accurately.

Methodology

The proposed R2CNN framework incorporates several key innovations:

  1. Region Proposal Network (RPN): The RPN is utilized to propose axis-aligned bounding boxes that enclose the text regions. This step is foundational, allowing the system to address arbitrary orientations by generating candidates from convolutional feature maps.
  2. Pooled Feature Extraction: For each text box proposed by the RPN, the method applies three ROI poolings with different pooled sizes (7×7, 11×3, and 3×11). The extracted features are concatenated to facilitate better prediction of text presence, axis-aligned boxes, and inclined minimum area boxes.
  3. Multi-task Problem Formulation: Text detection is treated as a multi-task problem. The approach simultaneously predicts text/non-text scores, axis-aligned boxes, and inclined boxes for each proposal. This formulation allows comprehensive modeling of text annotations in images.
  4. Inclined Non-Maximum Suppression (NMS): Suppression is performed on the inclined boxes rather than the axis-aligned ones. This avoids a failure mode of traditional NMS: the axis-aligned boxes of neighboring inclined texts can overlap heavily (high IoU), causing correct detections to be suppressed.
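The inclined NMS step can be sketched in a few dozen lines of plain Python. This is an illustrative toy, not the authors' implementation: rotated-box IoU is computed by clipping one quadrilateral against the other (Sutherland–Hodgman) and measuring areas with the shoelace formula, then standard greedy suppression runs on those rotated IoUs. All function names are hypothetical; boxes are assumed to be convex quadrilaterals given in counter-clockwise order.

```python
def clip_polygon(subject, clipper):
    # Sutherland-Hodgman: clip convex polygon `subject` against convex
    # polygon `clipper`; both are lists of (x, y) in counter-clockwise order.
    def inside(p, a, b):
        # Point p is on or to the left of directed edge a->b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p1, p2, a, b):
        # Intersection of segment p1->p2 with the infinite line through a->b.
        dx1, dy1 = p2[0] - p1[0], p2[1] - p1[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p1[0]) * dy2 - (a[1] - p1[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p1[0] + t * dx1, p1[1] + t * dy1)

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        input_list, output = output, []
        for j in range(len(input_list)):
            p, q = input_list[j], input_list[(j + 1) % len(input_list)]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
        if not output:
            return []
    return output

def poly_area(poly):
    # Shoelace formula for the area of a simple polygon.
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1]
                   - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))) / 2.0

def rotated_iou(poly_a, poly_b):
    # IoU of two convex quadrilaterals (inclined boxes).
    inter = clip_polygon(poly_a, poly_b)
    inter_area = poly_area(inter) if len(inter) >= 3 else 0.0
    union = poly_area(poly_a) + poly_area(poly_b) - inter_area
    return inter_area / union if union > 0 else 0.0

def inclined_nms(boxes, scores, iou_thresh=0.3):
    # Greedy NMS: keep boxes in descending score order, suppressing any
    # candidate whose rotated IoU with an already-kept box is too high.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(rotated_iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

The only difference from conventional NMS is the overlap measure: because IoU is computed on the inclined quadrilaterals, two nearly parallel slanted text lines whose axis-aligned boxes overlap strongly are no longer suppressed against each other.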

Results and Performance

The R2CNN framework demonstrated strong performance on two key benchmarks:

  • ICDAR 2015: The system achieved an F-measure of 82.54%, with a recall of 79.68% and precision of 85.62%.
  • ICDAR 2013: On this dataset, optimized for horizontal text detection, the method achieved an F-measure of 87.73%.
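For reference, the F-measure is the harmonic mean of precision and recall; plugging in the ICDAR 2015 precision and recall above reproduces the reported 82.54%:

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall (the standard F1 score).
    return 2 * precision * recall / (precision + recall)

# ICDAR 2015 numbers quoted above: precision 85.62%, recall 79.68%.
f1 = f_measure(0.8562, 0.7968)
```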

The experiments discussed include various configurations, evaluating the impact of anchor scales, NMS strategies, and the effect of different pooled sizes. The results consistently showed improvements over the Faster R-CNN baseline, particularly with the use of smaller anchor scales and inclined NMS.
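To make the role of the pooled sizes concrete, here is a simplified NumPy sketch of pooling one single-channel proposal patch to the three grids and concatenating the flattened results. This is a toy under stated assumptions, not the paper's implementation: real ROI pooling operates on multi-channel convolutional feature maps shared across proposals, and the function names here are illustrative.

```python
import numpy as np

def pool_region(feat, out_h, out_w):
    # Adaptive max-pool a 2-D feature patch to a fixed (out_h, out_w) grid,
    # as ROI pooling does for one channel. Assumes the patch is at least as
    # large as the output grid, so every pooling cell is non-empty.
    h, w = feat.shape
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w), dtype=feat.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = feat[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

def multi_size_features(feat):
    # Pool the same proposal to 7x7, 11x3, and 3x11 grids and concatenate,
    # mirroring the three pooled sizes used by R2CNN.
    pooled = [pool_region(feat, h, w) for (h, w) in [(7, 7), (11, 3), (3, 11)]]
    return np.concatenate([p.ravel() for p in pooled])
```

The concatenated vector (49 + 33 + 33 = 115 values per channel in this toy) feeds the shared prediction heads, so the elongated 11×3 and 3×11 grids contribute features sampled more densely along one axis than the square 7×7 grid does.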

Implications and Future Directions

This research contributes significantly to the domain of scene text detection by providing a robust solution for arbitrary orientations. The methodological advancements enable better handling of real-world challenges posed by text in varying orientations and styles. The R2CNN framework's adaptability suggests potential refinements and integrations with other detection frameworks like SSD and YOLO.

Future explorations could focus on enhancing the model's adaptability to complex text scenarios and integrating it with recognition modules for end-to-end text reading solutions in natural scenes. The approach's flexibility in accommodating different text orientations and sizes makes it a valuable reference point for ongoing research in scene text detection and related fields.

Authors (8)
  1. Yingying Jiang (10 papers)
  2. Xiangyu Zhu (85 papers)
  3. Xiaobing Wang (11 papers)
  4. Shuli Yang (3 papers)
  5. Wei Li (1121 papers)
  6. Hua Wang (199 papers)
  7. Pei Fu (14 papers)
  8. Zhenbo Luo (9 papers)
Citations (515)