Multi-Oriented Text Detection with Fully Convolutional Networks (1604.04018v2)

Published 14 Apr 2016 in cs.CV

Abstract: In this paper, we propose a novel approach for text detection in natural images. Both local and global cues are taken into account for localizing text lines in a coarse-to-fine procedure. First, a Fully Convolutional Network (FCN) model is trained to predict the salient map of text regions in a holistic manner. Then, text line hypotheses are estimated by combining the salient map and character components. Finally, another FCN classifier is used to predict the centroid of each character, in order to remove the false hypotheses. The framework is general for handling text in multiple orientations, languages and fonts. The proposed method consistently achieves the state-of-the-art performance on three text detection benchmarks: MSRA-TD500, ICDAR2015 and ICDAR2013.


Multi-Oriented Text Detection with Fully Convolutional Networks: An Expert Overview

The paper "Multi-Oriented Text Detection with Fully Convolutional Networks" presents a method for detecting text in natural images that handles text in multiple orientations, languages, and fonts. It uses a coarse-to-fine procedure that combines local cues (character components) with global cues (a region-level text salient map) to localize text lines.

Methodology

The proposed framework uses two Fully Convolutional Networks (FCNs): one generates a text salient map, and the other predicts character centroids that are used to filter text line candidates. The approach is divided into the following stages:

Text Salient Map Generation: The method starts by training an FCN to predict salient maps of text regions. This network processes images holistically, capturing both local and global text features to generate pixel-wise text/non-text predictions efficiently.
Text Line Hypothesis Generation: Character components are extracted with MSER (Maximally Stable Extremal Regions) and combined with the salient map to form text line hypotheses. Unlike many earlier methods, which assume mostly horizontal text, this step explicitly accounts for arbitrary orientations.
Orientation Estimation and Candidate Extraction: A projection-based method is employed to determine the text orientation from character components within detected text blocks, enhancing accuracy by aligning the candidates with the actual text direction.
False Hypotheses Filtering: A second FCN is trained to predict character centroids, and text line candidates that are not supported by this prediction are discarded as false detections. Because centroids are predicted per character, the filter applies to text of diverse orientations and scales.
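The Text Salient Map Generation stage outputs a per-pixel text probability. A minimal sketch of turning such a map into candidate text blocks is shown below, assuming a fixed threshold and 4-connected grouping; both are hypothetical choices, since this overview does not say how blocks are extracted from the map. The map is assumed to be a list of rows of floats in [0, 1], and MSER extraction is not shown.

```python
from collections import deque


def text_blocks(saliency, threshold=0.5):
    """Threshold a per-pixel text saliency map and group foreground pixels
    into 4-connected regions. Returns each region's bounding box as
    (row_min, col_min, row_max, col_max), in row-major discovery order.
    The threshold value is a hypothetical default."""
    rows, cols = len(saliency), len(saliency[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or saliency[r][c] < threshold:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and saliency[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


saliency = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.2, 0.0],
    [0.0, 0.1, 0.0, 0.95],
]
print(text_blocks(saliency))  # [(0, 0, 1, 1), (2, 3, 2, 3)]
```

In a full pipeline, character components would then be kept only where they fall inside such blocks; that combination rule is likewise left unspecified here.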
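The projection-based orientation estimation step can be illustrated with a short sketch. It assumes character components are axis-aligned boxes (cx, cy, w, h) and scores each candidate angle by the largest number of components that a single straight line at that angle passes through. The 1-degree step and the variance tie-break are hypothetical choices, not details taken from the paper. Angles are measured counter-clockwise from the x axis of the input coordinates; in image coordinates, where y points down, the sign flips.

```python
import math
from statistics import pvariance


def estimate_orientation(components, step_deg=1):
    """Estimate the dominant text direction, in degrees within [-90, 90),
    from character components given as (cx, cy, w, h) boxes.

    Each component is projected onto the normal of a line at the candidate
    angle; a line at normal offset h "hits" a component when h lies inside
    the component's projected extent. The angle whose best line hits the
    most components wins; ties go to the tightest (lowest-variance) fit."""
    best_score, best_angle = None, None
    for angle in range(-90, 90, step_deg):
        t = math.radians(angle)
        nx, ny = -math.sin(t), math.cos(t)  # unit normal of the line
        # Projected centre and half-extent of each box along the normal.
        proj = [(cx * nx + cy * ny, 0.5 * (w * abs(nx) + h * abs(ny)))
                for cx, cy, w, h in components]
        # Try a line through each component centre.
        for offset, _ in proj:
            hits = [p for p, half in proj if abs(p - offset) <= half]
            score = (len(hits), -pvariance(hits))
            if best_score is None or score > best_score:
                best_score, best_angle = score, angle
    return best_angle


# Six characters along a 30-degree baseline, plus one stray component.
c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
chars = [(20 * i * c30, 20 * i * s30, 6, 6) for i in range(6)]
print(estimate_orientation(chars + [(200, -150, 6, 6)]))  # 30
```

Counting hits per line, rather than fitting a regression line through all centres, keeps a single stray component from pulling the estimate off the dominant direction.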
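The False Hypotheses Filtering step can be sketched as follows. This overview does not give the paper's exact rejection criteria, so the rule here is a hypothetical stand-in. Local maxima of the centroid probability map are treated as character centroids, and a hypothesis is kept only if it contains at least `min_count` centroids with a mean score of at least `min_mean`. Hypotheses are simplified to axis-aligned boxes (row_min, col_min, row_max, col_max), whereas real text line hypotheses may be oriented.

```python
def centroid_peaks(cmap, min_score=0.5):
    """Return (row, col, score) for every pixel of a character-centroid
    probability map that is >= min_score and not smaller than any of its
    8 neighbours (a simple local-maximum test; plateaus yield every pixel)."""
    rows, cols = len(cmap), len(cmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = cmap[r][c]
            if v < min_score:
                continue
            neighbours = [cmap[y][x]
                          for y in range(max(0, r - 1), min(rows, r + 2))
                          for x in range(max(0, c - 1), min(cols, c + 2))
                          if (y, x) != (r, c)]
            if all(v >= n for n in neighbours):
                peaks.append((r, c, v))
    return peaks


def keep_hypothesis(box, peaks, min_count=2, min_mean=0.6):
    """Hypothetical filter: keep a box only if it contains at least
    min_count centroid peaks whose mean score is at least min_mean."""
    r0, c0, r1, c1 = box
    inside = [s for r, c, s in peaks if r0 <= r <= r1 and c0 <= c <= c1]
    if not inside or len(inside) < min_count:
        return False
    return sum(inside) / len(inside) >= min_mean


cmap = [
    [0.0, 0.9, 0.0, 0.8, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.55],
]
peaks = centroid_peaks(cmap)
print(keep_hypothesis((0, 0, 1, 4), peaks))  # True: two strong centroids
print(keep_hypothesis((2, 4, 2, 5), peaks))  # False: only one centroid
```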

Experimental Results

The method is evaluated on three benchmark datasets: MSRA-TD500, ICDAR2015, and ICDAR2013. According to the paper, it achieves state-of-the-art performance on all three, which shows that the framework adapts to both horizontal and multi-oriented text.

MSRA-TD500: With a precision of 0.83 and a recall of 0.67, the method handles varied orientations robustly and improves on previously reported results.
ICDAR Datasets: The framework also performs well on the ICDAR benchmarks, including the challenging incidental scene text of ICDAR2015.
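Precision and recall figures like those above are usually summarized by the F-measure, their harmonic mean, which is the single score typically reported on these benchmarks. The sketch below computes it. Applied to the rounded MSRA-TD500 values quoted here, it gives about 0.74; this number is derived from those two figures rather than copied from the paper's tables.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(0.83, 0.67), 2))  # 0.74
```

The harmonic mean is dominated by the smaller of the two values, so a detector cannot raise its F-measure much by trading away recall for precision.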

Contributions and Implications

The main contribution is a framework that integrates local character components with global, region-based cues for text detection. FCNs are used both for semantic labeling of text regions and for character centroid prediction, which makes the pipeline robust to text of diverse orientations and scales in natural scenes.

Separating text block detection from text line candidate generation (via MSER components and orientation estimation) is also a methodological step forward for scene text detection, making the approach applicable to real-world settings where text appears in arbitrary orientations.

Limitations and Future Directions

While the method performs strongly, challenges remain in accurately detecting highly curved text and in reaching real-time processing speeds. There is also room for improvement on scenes with heavy clutter or reflections. Future work could use more sophisticated learning techniques for component extraction and explore end-to-end architectures that combine text detection with recognition.

Overall, this approach presents a valuable contribution to text detection methodologies, paving the way for more comprehensive and adaptable scene text recognition systems.


Authors (6)

Zheng Zhang
Chengquan Zhang
Wei Shen
Cong Yao
Wenyu Liu
Xiang Bai

Citations (514)
