TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
The paper "TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes" by Shangbang Long et al. addresses the challenge of detecting scene text arranged in arbitrary shapes. Traditional methods have primarily represented text with rigid geometric forms, such as axis-aligned rectangles, rotated rectangles, or quadrangles, which fall short when the text is curved. This paper introduces TextSnake, a novel representation that can accurately describe text of varying orientation and curvature.
Core Contributions
The authors propose TextSnake, which conceptualizes a text instance as a sequence of overlapping disks centered on the text center line, each parameterized by its position, radius, and orientation. This formulation represents curvilinear text with consistent fidelity, where rigid rectangle- or quadrangle-based geometries fail.
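To make the representation concrete, here is a minimal Python sketch (not from the paper) of a text instance as a list of disks, with a membership test that treats the text region as the union of the disks. The `Disk` class and `covers` function are illustrative names, not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Disk:
    """One disk on the text center line: center, radius, local orientation."""
    cx: float
    cy: float
    r: float
    theta: float  # orientation of the center line at this disk, in radians

def covers(snake: List[Disk], x: float, y: float) -> bool:
    """A point belongs to the text instance if any disk covers it."""
    return any(math.hypot(x - d.cx, y - d.cy) <= d.r for d in snake)

# A gently curving "snake" of three overlapping disks.
snake = [Disk(0.0, 0.0, 5.0, 0.0),
         Disk(4.0, 1.0, 5.0, 0.2),
         Disk(8.0, 3.0, 5.0, 0.4)]
print(covers(snake, 2.0, 0.5))    # inside the curved region -> True
print(covers(snake, 20.0, 20.0))  # far outside the region -> False
```

Because each disk carries its own radius and orientation, the chain can bend and taper freely, which is precisely what fixed rectangles or quadrangles cannot do.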
A key component of the TextSnake framework is a Fully Convolutional Network (FCN) that estimates the geometric attributes directly from the image: per-pixel scores for the text region and the text center line, together with the local radius and orientation. Text instances are then reconstructed from these maps, enabling efficient detection on datasets featuring curved and multi-oriented text. Extensive experiments demonstrate the performance gains this representation provides.
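The decoding step can be sketched as follows. This is a simplified, hypothetical illustration (the function name and thresholding scheme are assumptions): each pixel whose center-line score passes a threshold becomes one disk, taking its radius and orientation from the corresponding prediction maps. The actual pipeline additionally groups pixels into instances and strides along the center line rather than keeping every pixel.

```python
import math

def decode_disks(tcl, radius, cos_t, sin_t, thresh=0.5):
    """Turn per-pixel prediction grids (H x W lists of lists) into disks.

    tcl:    text-center-line score per pixel
    radius: predicted disk radius per pixel
    cos_t, sin_t: predicted orientation components per pixel
    """
    disks = []
    for y, row in enumerate(tcl):
        for x, score in enumerate(row):
            if score > thresh:
                theta = math.atan2(sin_t[y][x], cos_t[y][x])
                disks.append((x, y, radius[y][x], theta))
    return disks

# Tiny 2x3 grid with one confident center-line pixel (made-up values).
tcl    = [[0.1, 0.9, 0.2], [0.0, 0.3, 0.1]]
radius = [[0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
cos_t  = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
sin_t  = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(decode_disks(tcl, radius, cos_t, sin_t))  # [(1, 0, 4.0, 0.0)]
```

Predicting cosine and sine separately, then recovering the angle with `atan2`, avoids the discontinuity a raw angle regression would face at the wrap-around point.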
Experimental Evaluation
The authors validate the efficacy of TextSnake across several benchmarks, including Total-Text and SCUT-CTW1500, which focus on curved text, alongside more traditional datasets like ICDAR 2015 and MSRA-TD500. The results indicate that TextSnake achieves state-of-the-art performance, particularly on datasets that feature non-linear text forms. Notably, on Total-Text, TextSnake surpasses the baseline by over 40% in F-measure, highlighting its effectiveness in handling challenging text shapes.
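For reference, the F-measure reported on these benchmarks is the harmonic mean of detection precision and recall. A minimal sketch (the input values below are illustrative, not the paper's reported numbers):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, as used in text detection benchmarks."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only.
print(round(f_measure(0.83, 0.74), 3))  # 0.782
```

Because the harmonic mean is dominated by the smaller of the two values, a detector must score well on both precision and recall to post a high F-measure.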
Implications and Future Directions
TextSnake's flexible representation paves the way for improved scene text recognition systems. By providing a structured yet adaptable way to encode text geometry, it strengthens a detector's ability to handle the shape variation found in natural scenes. The representation could also benefit other computer vision tasks where irregularly shaped objects must be modeled accurately.
Furthermore, future research could focus on integrating TextSnake with robust text recognition models, potentially leading to end-to-end architectures that accommodate arbitrary text layouts. This progression may also extend to real-world systems like autonomous vehicles or augmented reality platforms, where interpreting scene text in diverse settings is crucial.
Conclusion
The paper by Long et al. marks a significant advance in text detection methodology with the introduction of TextSnake. By addressing the limitations of fixed geometric representations, the authors provide a versatile tool for handling text of diverse forms, broadening the scope of scene text detection systems. The documented improvements across multiple benchmarks underscore TextSnake's potential to redefine text detection in realistic environments.