TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (1812.01393v2)

Published 4 Dec 2018 in cs.CV

Abstract: Scene text detection is an important step of scene text reading system. The main challenges lie on significantly varied sizes and aspect ratios, arbitrary orientations and shapes. Driven by recent progress in deep learning, impressive performances have been achieved for multi-oriented text detection. Yet, the performance drops dramatically in detecting curved texts due to the limited text representation (e.g., horizontal bounding boxes, rotated rectangles, or quadrilaterals). It is of great interest to detect curved texts, which are actually very common in natural scenes. In this paper, we present a novel text detector named TextField for detecting irregular scene texts. Specifically, we learn a direction field pointing away from the nearest text boundary to each text point. This direction field is represented by an image of two-dimensional vectors and learned via a fully convolutional neural network. It encodes both binary text mask and direction information used to separate adjacent text instances, which is challenging for classical segmentation-based approaches. Based on the learned direction field, we apply a simple yet effective morphological-based post-processing to achieve the final detection. Experimental results show that the proposed TextField outperforms the state-of-the-art methods by a large margin (28% and 8%) on two curved text datasets: Total-Text and CTW1500, respectively, and also achieves very competitive performance on multi-oriented datasets: ICDAR 2015 and MSRA-TD500. Furthermore, TextField is robust in generalizing to unseen datasets. The code is available at https://github.com/YukangWang/TextField.

Overview of "TextField: Learning A Deep Direction Field for Irregular Scene Text Detection"

This paper introduces TextField, an approach for detecting irregular scene text with a learned deep direction field. Scene text detection underpins applications such as product search, scene understanding, and autonomous driving. However, text in natural scenes varies widely in orientation, shape, size, and aspect ratio, and curved text in particular is common. Traditional methods and existing multi-oriented detectors lose accuracy on such irregular text, largely because they rely on simple text representations (horizontal bounding boxes, rotated rectangles, or quadrilaterals).

TextField Methodology

TextField addresses these limitations with a new representation: a direction field in which each text pixel stores a vector pointing away from its nearest text boundary point. The field is represented as an image of two-dimensional vectors and is predicted by a fully convolutional network (FCN). It encodes both the binary text mask and the direction information needed to separate adjacent text instances, which is difficult for classical segmentation-based approaches. A simple morphological post-processing step then turns the predicted field into the final detections.
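To make the representation concrete, here is a minimal pure-Python sketch of how a ground-truth direction field could be built from an instance label map. It is an illustration under stated assumptions, not the authors' implementation: the function names `direction_field` and `text_mask`, the brute-force nearest-pixel search, the use of the nearest pixel outside the instance as a stand-in for the nearest boundary point, and the `thresh` value are all hypothetical. A practical version would use a distance transform rather than an exhaustive search.

```python
import math


def direction_field(instance_map):
    """Build a unit-vector direction field from an instance label map.

    instance_map: 2-D list of ints, 0 = background, k > 0 = text instance k.
    Returns a 2-D list of (vx, vy) tuples. Each text pixel gets a unit vector
    pointing from its nearest pixel outside the instance towards itself;
    non-text pixels get (0.0, 0.0).
    """
    h, w = len(instance_map), len(instance_map[0])
    field = [[(0.0, 0.0) for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            label = instance_map[y][x]
            if label == 0:
                continue  # non-text pixel keeps the zero vector
            best, best_d2 = None, None
            # Brute-force search (assumption: fine for tiny illustrative maps).
            for qy in range(h):
                for qx in range(w):
                    if instance_map[qy][qx] == label:
                        continue  # same instance, not a boundary candidate
                    d2 = (x - qx) ** 2 + (y - qy) ** 2
                    if best_d2 is None or d2 < best_d2:
                        best, best_d2 = (qx, qy), d2
            if best is None:
                continue  # the instance fills the whole map
            d = math.sqrt(best_d2)
            field[y][x] = ((x - best[0]) / d, (y - best[1]) / d)
    return field


def text_mask(field, thresh=0.5):
    """Recover a binary text mask by thresholding the vector magnitude."""
    return [[math.hypot(vx, vy) > thresh for (vx, vy) in row] for row in field]


# A single 3x3 text block surrounded by background.
block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
block_field = direction_field(block)

# Two touching instances in one row: directions oppose at the shared border.
touching_field = direction_field([[1, 1, 2, 2]])
```

In the `touching_field` example, the pixel at index 1 points left and the pixel at index 2 points right, even though a plain binary mask would merge the two instances into one region. That opposition at the shared border is the cue the overview describes for separating adjacent text.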

Architecture and Training: The network uses a VGG16 backbone pre-trained on ImageNet and fuses features from multiple levels to handle text instances at different scales. The training loss is adapted to the task, and online hard negative mining is used to counter the imbalance between text (foreground) and background pixels.
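Online hard negative mining can be sketched as follows. This is a generic illustration of the idea, not the paper's exact loss: the function name `ohnm_l2_loss`, the plain (unweighted) squared L2 error, and the 3:1 negative-to-positive ratio are assumptions chosen for the example.

```python
def ohnm_l2_loss(pred, target, neg_pos_ratio=3):
    """Mean squared L2 error with online hard negative mining.

    pred, target: 2-D lists of (vx, vy) tuples of the same shape.
    A pixel counts as text (positive) when its target vector is non-zero.
    All positives are kept; among negatives, only the ones with the largest
    error are kept, at most neg_pos_ratio per positive (assumed ratio).
    """
    pos_err, neg_err = [], []
    for pred_row, tgt_row in zip(pred, target):
        for (px, py), (tx, ty) in zip(pred_row, tgt_row):
            err = (px - tx) ** 2 + (py - ty) ** 2
            if tx != 0.0 or ty != 0.0:
                pos_err.append(err)
            else:
                neg_err.append(err)
    # Keep only the hardest negatives so background does not dominate.
    n_neg = min(len(neg_err), neg_pos_ratio * max(len(pos_err), 1))
    hard_neg = sorted(neg_err, reverse=True)[:n_neg]
    kept = pos_err + hard_neg
    return sum(kept) / len(kept) if kept else 0.0


# One text pixel and five background pixels (illustrative values only).
target = [[(1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]]
pred = [[(1.0, 0.0), (0.5, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 1.0), (0.0, 0.0)]]
loss_ratio_3 = ohnm_l2_loss(pred, target, neg_pos_ratio=3)
loss_ratio_1 = ohnm_l2_loss(pred, target, neg_pos_ratio=1)
```

With a smaller ratio, fewer easy background pixels dilute the average, so the same prediction yields a larger loss that is dominated by the hardest false responses.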

Experimental Results

Empirical evaluations on Total-Text and SCUT-CTW1500 show that TextField outperforms state-of-the-art methods by substantial margins (a 28% F-measure improvement on Total-Text and 8% on SCUT-CTW1500). Both datasets focus on curved and irregular text, and TextField sets a new state of the art on them. On the multi-oriented datasets ICDAR 2015 and MSRA-TD500, TextField achieves competitive performance, which indicates that the method is not limited to curved text. Cross-dataset evaluations further demonstrate that TextField generalizes well to unseen datasets without re-training on the target dataset.
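For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The helper below is a minimal sketch; the function name `f_measure` is hypothetical and the input values are illustrative, not results from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only: high precision cannot compensate for low recall.
balanced = f_measure(0.8, 0.8)
skewed = f_measure(1.0, 0.5)
```

Because the harmonic mean penalizes imbalance, a detector has to be both precise and complete to score well.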

Implications and Future Directions

TextField's results have both practical and theoretical implications. Practically, its ability to generalize and to delineate the boundaries of irregular text instances accurately makes it suitable for direct integration into text recognition systems. Theoretically, the direction field shows that segmentation-based detectors can be extended with per-pixel directional cues to separate closely spaced instances of irregular shape.

Future improvements might target the text superpixel grouping process, possibly by learning text center lines, and address current limitations such as object occlusion and large character spacing. Resolving these issues would improve the precision and applicability of TextField across diverse environments and conditions.

Authors (6)

Yongchao Xu (43 papers)
Yukang Wang (5 papers)
Wei Zhou (308 papers)
Yongpan Wang (13 papers)
Zhibo Yang (43 papers)
Xiang Bai (221 papers)

Citations (244)

GitHub

GitHub - YukangWang/TextField: TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019) (100 stars)