
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Published 4 Jul 2018 in cs.CV | (1807.01544v2)

Abstract: Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations (axis-aligned rectangles, rotated rectangles or quadrangles) adopted to describe text, existing methods may fall short when dealing with much more free-form text instances, such as curved text, which are actually very common in real-world scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed as TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with potentially variable radius and orientation. Such geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves state-of-the-art or comparable performance on Total-Text and SCUT-CTW1500, the two newly published benchmarks with special emphasis on curved text in natural images, as well as the widely-used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake outperforms the baseline on Total-Text by more than 40% in F-measure.

Citations (493)

Summary

  • The paper presents TextSnake, a novel method that represents text as overlapping disks to capture arbitrary shapes.
  • It employs a Fully Convolutional Network to estimate geometric attributes directly, enabling robust detection of curved and multi-oriented text.
  • In experiments, TextSnake outperforms the baseline on Total-Text by more than 40% in F-measure and achieves state-of-the-art or comparable results on SCUT-CTW1500, ICDAR 2015, and MSRA-TD500.

The paper "TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes" by Shangbang Long et al. addresses the detection of scene text that follows arbitrary shapes. Existing methods describe text with axis-aligned rectangles, rotated rectangles, or quadrangles, which fall short on free-form instances such as curved text. The paper introduces TextSnake, a representation that can describe text in horizontal, oriented, and curved forms.

Core Contributions

The authors propose TextSnake, which models a text instance as an ordered sequence of overlapping disks centered on the text's symmetric axis. Each disk carries a position, a radius, and an orientation, and both radius and orientation may vary along the sequence. Because the disks follow the text's center line, the same representation covers horizontal, oriented, and curved text, whereas rectangles and quadrangles fit curved instances poorly.
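To make the disk-sequence idea concrete, the following is a minimal sketch, not the authors' implementation. The names `Disk`, `disks_to_polygon`, and `polygon_area` are hypothetical, and the sketch assumes that `theta` is the direction of the local center line, so the text boundary lies one radius away along the normal to that direction.

```python
from dataclasses import dataclass
import math

import numpy as np


@dataclass
class Disk:
    """One element of the ordered sequence (hypothetical container)."""
    cx: float      # disk center, x
    cy: float      # disk center, y
    radius: float  # roughly half the local text height
    theta: float   # assumed: direction of the local center line, in radians


def disks_to_polygon(disks):
    """Approximate the text outline by offsetting each center by +/- radius
    along the normal to the local center-line direction."""
    upper, lower = [], []
    for d in disks:
        nx, ny = -math.sin(d.theta), math.cos(d.theta)  # unit normal
        upper.append((d.cx + d.radius * nx, d.cy + d.radius * ny))
        lower.append((d.cx - d.radius * nx, d.cy - d.radius * ny))
    # Walk along one side, then back along the other, to close the outline.
    return np.array(upper + lower[::-1])


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    x, y = poly[:, 0], poly[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


# A straight, horizontal instance: five disks of radius 5 spaced 10 apart.
disks = [Disk(cx=10.0 * i, cy=0.0, radius=5.0, theta=0.0) for i in range(5)]
outline = disks_to_polygon(disks)
print(outline.shape, polygon_area(outline))
```

For curved text, only the per-disk `theta` and `radius` values change. The outline construction stays the same, which is the practical appeal of a center-line-based representation.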

A key component of the TextSnake framework is a Fully Convolutional Network (FCN) that estimates the geometric attributes of the disks directly from the image. The resulting detector is evaluated on datasets that emphasize curved text as well as on multi-oriented text benchmarks, and the experiments are used to demonstrate the benefit of the representation.
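As an illustration of how dense, per-pixel geometry predictions could be turned back into disks, here is a hedged sketch. The summary above does not list the network's output channels, so the five maps used here (`tr_prob`, `tcl_prob`, `radius_map`, `cos_map`, `sin_map`), the thresholds, and both function names are assumptions made for illustration, and the decoding is deliberately simpler than a full post-processing pipeline.

```python
import numpy as np


def decode_disks(tr_prob, tcl_prob, radius_map, cos_map, sin_map,
                 tr_thresh=0.5, tcl_thresh=0.5):
    """Collect one disk (x, y, radius, theta) per predicted center-line pixel.

    All five inputs are assumed to be H x W arrays produced by the network.
    """
    # Keep center-line pixels only where the text-region map agrees.
    center_mask = (tcl_prob > tcl_thresh) & (tr_prob > tr_thresh)
    ys, xs = np.nonzero(center_mask)
    # arctan2 is scale-invariant, so cos/sin need not be normalized first.
    theta = np.arctan2(sin_map[ys, xs], cos_map[ys, xs])
    return np.stack([xs, ys, radius_map[ys, xs], theta], axis=1)


def rasterize_disks(disks, shape):
    """Reconstruct the text area as the union of all disks."""
    mask = np.zeros(shape, dtype=bool)
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    for cx, cy, r, _theta in disks:
        mask |= (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
    return mask


# Synthetic prediction: a horizontal center line on row 10, radius 3.
h, w = 20, 40
tcl = np.zeros((h, w))
tcl[10, 5:35] = 1.0
tr = np.ones((h, w))
radius = np.full((h, w), 3.0)
cos_t, sin_t = np.ones((h, w)), np.zeros((h, w))

disks = decode_disks(tr, tcl, radius, cos_t, sin_t)
text_mask = rasterize_disks(disks, (h, w))
print(len(disks), int(text_mask.sum()))
```

Predicting the angle as a cosine and sine pair rather than a raw angle avoids the wrap-around discontinuity at ±π. This is a common design choice, stated here as an assumption rather than as a detail confirmed by the summary.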

Experimental Evaluation

The authors evaluate TextSnake on Total-Text and SCUT-CTW1500, two benchmarks that emphasize curved text in natural images, and on the widely used ICDAR 2015 and MSRA-TD500 datasets. The detector achieves state-of-the-art or comparable performance on all four. Notably, on Total-Text, TextSnake outperforms the baseline by more than 40% in F-measure, which highlights its effectiveness on curved text.
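For readers unfamiliar with the metric, F-measure is the harmonic mean of precision and recall. The short sketch below shows the computation. The function name `f_measure` and the input values are illustrative only and are not results from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (also called F1 or F-score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    return 2 * precision * recall / (precision + recall)


# Illustrative values, not taken from the paper.
print(round(f_measure(0.8, 0.6), 4))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-measure by trading recall away for precision or the reverse.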

Implications and Future Directions

TextSnake's flexible representation paves the way for improved scene text recognition systems. By providing a structured yet adaptable means to encapsulate text geometries, this approach enhances the detector's ability to handle variations in text shapes seen in natural scenes. This representation could also benefit other applications in computer vision where non-standard object shapes need accurate modeling.

Furthermore, future research could focus on integrating TextSnake with robust text recognition models, potentially leading to end-to-end architectures that accommodate arbitrary text layouts. This progression may also extend to real-world systems like autonomous vehicles or augmented reality platforms, where interpreting scene text in diverse settings is crucial.

Conclusion

The paper by Long et al. advances text detection methodology with the introduction of TextSnake. By addressing the limitations of fixed geometric representations, the authors provide a versatile tool for handling text of various forms, extending the scope of scene text detection systems. The results reported on Total-Text, SCUT-CTW1500, ICDAR 2015, and MSRA-TD500 underscore its usefulness for text detection in realistic environments.
