Rotation-Sensitive Regression for Oriented Scene Text Detection (1803.05265v1)

Published 14 Mar 2018 in cs.CV

Abstract: Text in natural images is of arbitrary orientations, requiring detection in terms of oriented bounding boxes. Normally, a multi-oriented text detector often involves two key tasks: 1) text presence detection, which is a classification problem disregarding text orientation; 2) oriented bounding box regression, which concerns about text orientation. Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks. To address this issue, we propose to perform classification and regression on features of different characteristics, extracted by two network branches of different designs. Concretely, the regression branch extracts rotation-sensitive features by actively rotating the convolutional filters, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. The proposed method named Rotation-sensitive Regression Detector (RRD) achieves state-of-the-art performance on three oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17 and COCO-Text. Furthermore, RRD achieves a significant improvement on a ship collection dataset, demonstrating its generality on oriented object detection.

Rotation-Sensitive Regression for Oriented Scene Text Detection

This paper presents a novel method for detecting oriented scene text, titled Rotation-sensitive Regression Detector (RRD). It addresses the challenge of accurately detecting text of arbitrary orientations in natural images, using oriented bounding boxes instead of the conventional horizontal ones. The method stands out by employing distinct feature extraction paths for classification and regression, a departure from the shared features approach commonly used in previous works, which often suffers from the conflicting requirements of rotation variance in regression and invariance in classification.

Methodology

RRD separates text detection into two distinct tasks:

  1. Text Presence Detection: This involves classifying areas of an image as containing text, regardless of orientation, using rotation-invariant features.
  2. Oriented Bounding Box Regression: This task regresses the oriented bounding box of the text for precise localization, using rotation-sensitive features.

The innovation lies in applying convolution differently in these tasks. The regression branch uses rotation-sensitive features by actively rotating convolutional filters, which enhances orientation awareness. The classification branch, in contrast, pools the features to maintain rotation invariance.
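This split can be illustrated with a minimal sketch, assuming PyTorch and only four filter orientations at 90° steps (so rotation reduces to torch.rot90 and needs no interpolation); the paper's Active Rotating Filters instead use eight interpolated orientations. The regression branch would consume the full orientation stack, while the classification branch would consume the maximum response over orientations:

```python
import torch
import torch.nn.functional as F


def rotation_sensitive_conv(x, weight, num_orient=4):
    """Convolve the input with rotated copies of one shared filter bank.

    Sketch only: rotations are limited to 90-degree steps via torch.rot90 so no
    interpolation is needed; the paper's Active Rotating Filters use eight
    interpolated orientations instead.
    """
    responses = []
    for k in range(num_orient):
        w_rot = torch.rot90(weight, k, dims=(2, 3))  # rotate every kernel by k*90 degrees
        responses.append(F.conv2d(x, w_rot, padding=weight.shape[-1] // 2))
    # Shape (batch, num_orient, out_channels, H, W): the rotation-sensitive stack.
    return torch.stack(responses, dim=1)


def orientation_pooling(feats):
    """Max over the orientation axis yields rotation-invariant features."""
    return feats.max(dim=1).values


x = torch.randn(1, 32, 64, 64)                 # backbone feature map
w = torch.randn(64, 32, 3, 3)                  # shared 3x3 filter bank
sensitive = rotation_sensitive_conv(x, w)      # regression branch input
invariant = orientation_pooling(sensitive)     # classification branch input
print(sensitive.shape, invariant.shape)        # (1, 4, 64, 64, 64) and (1, 64, 64, 64)
```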

The architecture is inspired by SSD (Single Shot MultiBox Detector), with a VGG16 backbone. Rotation-sensitive regression is achieved with oriented response convolutions, built on the Active Rotating Filters of the ORN framework, which produce a response map for each of several filter orientations and thereby capture orientation-sensitive features. The rotation-invariant features for classification are obtained by an oriented response pooling layer that pools across these orientation responses. RRD is further enhanced by an inception block with convolutional kernels of three scales, providing flexible receptive fields better suited to elongated text.
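To make the two-branch design concrete, here is a hypothetical SSD-style head, assuming the rotation-sensitive stack produced by the sketch above, four default boxes per location, and quadrilateral regression (eight offsets per box); the actual RRD head uses ORN layers with eight orientations on the VGG16/SSD backbone.

```python
import torch
import torch.nn as nn


class TwoBranchHead(nn.Module):
    """Hypothetical SSD-style head with separate regression/classification paths.

    Expects a rotation-sensitive stack of shape (B, num_orient, C, H, W), e.g. from
    rotation_sensitive_conv above. The regression path keeps the full orientation
    stack and predicts eight quadrilateral offsets per default box; the
    classification path sees only the orientation-pooled (rotation-invariant)
    features and predicts a text/non-text score per default box.
    """

    def __init__(self, channels=64, num_orient=4, num_default_boxes=4):
        super().__init__()
        self.regress = nn.Conv2d(channels * num_orient, num_default_boxes * 8,
                                 kernel_size=3, padding=1)
        self.classify = nn.Conv2d(channels, num_default_boxes * 2,
                                  kernel_size=3, padding=1)

    def forward(self, rotation_sensitive):
        b, o, c, h, w = rotation_sensitive.shape
        sensitive = rotation_sensitive.reshape(b, o * c, h, w)   # rotation-sensitive
        invariant = rotation_sensitive.max(dim=1).values         # rotation-invariant
        quad_offsets = self.regress(sensitive)    # (B, boxes*8, H, W)
        text_scores = self.classify(invariant)    # (B, boxes*2, H, W)
        return quad_offsets, text_scores
```

The design choice mirrored here is that only the orientation-pooled tensor feeds the text/non-text classifier, while the regressor retains the full orientation stack, which is the core decoupling the paper argues for.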

Results

The RRD demonstrates state-of-the-art performance on several benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17, COCO-Text, and an additional dataset comprising oriented ship images. Key results include:

  • On the MSRA-TD500 dataset, RRD achieves an F-measure of 0.79, surpassing previous methods.
  • On the RCTW-17 dataset, RRD records a precision of 0.77 and an F-measure of 0.67, showing substantial improvement over the baseline methods.
  • Strong performance on HRSC2016, a non-text dataset of rotated ships, demonstrates the method's generality and applicability beyond text detection.

Implications and Future Directions

The RRD approach effectively resolves the inherent tension between orientation-sensitive regression and orientation-invariant classification. This decoupling can potentially be adapted to other oriented object detection tasks, and the architecture could be embedded into existing detection frameworks to improve their performance without significant computational overhead.

The paper suggests exploring stronger rotation-sensitive and rotation-invariant features for improved oriented object detection. The conceptual simplicity and efficacy of RRD point to a promising direction for further research, especially for the long, thin text lines characteristic of non-Latin scripts.

In conclusion, the RRD offers a robust framework for oriented scene text detection, while its extension to other types of rotated object detection marks a promising avenue for future exploration in computer vision applications.

Authors (5)
  1. Minghui Liao
  2. Zhen Zhu
  3. Baoguang Shi
  4. Xiang Bai
  5. Gui-Song Xia

Citations (435)