Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation (1802.08948v2)

Published 25 Feb 2018 in cs.CV

Abstract: Previous deep learning based state-of-the-art scene text detection methods can be roughly classified into two categories. The first category treats scene text as a type of general objects and follows general object detection paradigm to localize scene text by regressing the text box locations, but troubled by the arbitrary-orientation and large aspect ratios of scene text. The second one segments text regions directly, but mostly needs complex post processing. In this paper, we present a method that combines the ideas of the two types of methods while avoiding their shortcomings. We propose to detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions. In inference stage, candidate boxes are generated by sampling and grouping corner points, which are further scored by segmentation maps and suppressed by NMS. Compared with previous methods, our method can handle long oriented text naturally and doesn't need complex post processing. The experiments on ICDAR2013, ICDAR2015, MSRA-TD500, MLT and COCO-Text demonstrate that the proposed algorithm achieves better or comparable results in both accuracy and efficiency. Based on VGG16, it achieves an F-measure of 84.3% on ICDAR2015 and 81.5% on MSRA-TD500.

Review of "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation"

In "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation," Lyu et al. introduce an approach to detecting text in natural scene images that targets the weaknesses of two earlier families of methods. Detectors adapted from general object detection regress text boxes directly and struggle with the arbitrary orientations and large aspect ratios of scene text, while segmentation-based methods usually require complex post-processing. The proposed hybrid method combines ideas from both families while avoiding these shortcomings.

The method localizes the corner points of text bounding boxes and segments text regions in relative positions. During inference, candidate boxes are generated by sampling and grouping the detected corner points, scored using the segmentation maps, and filtered by non-maximum suppression (NMS). This lets the method handle long, oriented text naturally, without complex post-processing. On ICDAR2013, ICDAR2015, MSRA-TD500, MLT, and COCO-Text, the authors report better or comparable results in both accuracy and efficiency.
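The inference steps described above (group corner points into candidates, score them with a segmentation map, suppress overlaps) can be illustrated with a minimal sketch. It is a deliberate simplification and not the authors' implementation: it assumes axis-aligned boxes, a single text-probability map, and only top-left and bottom-right corner points, whereas the paper handles oriented boxes and position-sensitive maps. The function names and both thresholds are hypothetical.

```python
import numpy as np

def box_score(seg_map, box):
    """Mean text probability inside an axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    region = seg_map[y1:y2, x1:x2]
    return float(region.mean()) if region.size else 0.0

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def detect(top_left_pts, bottom_right_pts, seg_map,
           score_thresh=0.5, iou_thresh=0.3):
    """Simplified pipeline: group corners, score candidates, run greedy NMS.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    # 1. Group corner points: every geometrically valid pair is a candidate.
    candidates = [
        (x1, y1, x2, y2)
        for (x1, y1) in top_left_pts
        for (x2, y2) in bottom_right_pts
        if x2 > x1 and y2 > y1
    ]
    # 2. Score each candidate with the segmentation map, drop weak ones.
    scored = [(box_score(seg_map, b), b) for b in candidates]
    scored = [sb for sb in scored if sb[0] >= score_thresh]
    # 3. Greedy NMS: keep the best box, drop boxes that overlap it too much.
    scored.sort(key=lambda sb: sb[0], reverse=True)
    kept = []
    for score, box in scored:
        if all(iou(box, k) <= iou_thresh for _, k in kept):
            kept.append((score, box))
    return kept
```

Because a candidate is kept only if the segmentation map supports it, spurious corner pairings that enclose mostly background are discarded before NMS, which is why no heavier post-processing is needed in this simplified setting.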

Key Contributions

Detection Framework: Detects text by localizing the corner points of its bounding box, which accommodates arbitrary orientations and large aspect ratios because a box is determined by its corners regardless of its angle or shape.
Segmentation Technique: Uses position-sensitive segmentation maps, which segment text regions by relative position and adapt to text at different granularities, whether characters, words, or text lines.
Algorithmic Efficiency: Builds text boxes by sampling and grouping corner points rather than directly regressing whole text boxes, and does not require complex post-processing.
Experimental Validation: With a VGG16 backbone, reports an F-measure of 84.3% on ICDAR2015 and 81.5% on MSRA-TD500, results the authors describe as better than or comparable to previous methods.
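To make the idea of position-sensitive scoring concrete, here is a minimal sketch under stated assumptions. It is not the authors' code: it assumes an axis-aligned box, a 2 x 2 grid of bins, and one segmentation channel per bin in row-major order (top-left, top-right, bottom-left, bottom-right). The function name `position_sensitive_score` is hypothetical.

```python
import numpy as np

def position_sensitive_score(seg_maps, box, grid=2):
    """Score a box with position-sensitive segmentation maps.

    seg_maps: array of shape (grid * grid, H, W); channel k is assumed to
              respond to the k-th bin of a text box in row-major order.
    box:      axis-aligned (x1, y1, x2, y2) in pixel coordinates.
    """
    x1, y1, x2, y2 = box
    # Split the box into a grid x grid layout of bins.
    xs = np.linspace(x1, x2, grid + 1).round().astype(int)
    ys = np.linspace(y1, y2, grid + 1).round().astype(int)
    bin_scores = []
    for r in range(grid):
        for c in range(grid):
            # Each bin is read only from the channel for its relative position.
            patch = seg_maps[r * grid + c, ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            bin_scores.append(float(patch.mean()) if patch.size else 0.0)
    # The box score is the average over all bins.
    return sum(bin_scores) / len(bin_scores)
```

A box that covers only part of a text instance scores lower than a well-aligned one, because its bins no longer line up with the channels trained for those relative positions. That property is what lets the segmentation maps rank competing candidates.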

Implications

The work has both practical and theoretical implications. Practically, the proposed method offers an efficient solution for text detection in settings where text appears at arbitrary orientations and scales. This is valuable in applications such as product identification in retail environments, navigation systems for autonomous vehicles, and automatic document analysis, where scene complexity is a common challenge.

Theoretically, the approach encourages a reevaluation of conventional bounding-box-based detection strategies, prompting the exploration of corner-based detection paradigms in other domains such as general object detection and instance segmentation tasks.

Future Prospects

The method could be integrated with end-to-end optical character recognition (OCR) systems by pairing it with a character recognition model, potentially in a unified framework. Further work could also adapt it to more complex scene text layouts, such as curved text or overlapping instances.

Conclusion

Overall, Lyu et al.'s work is a significant contribution to scene text detection. By combining corner localization with position-sensitive segmentation, it addresses key limitations of both regression-based and segmentation-based methods, and it provides a foundation for adaptable and efficient text detection systems. It also sets a promising precedent for hybrid approaches in other visual recognition tasks.

Authors (5)

Pengyuan Lyu
Cong Yao
Wenhao Wu
Shuicheng Yan
Xiang Bai
