Shape Robust Text Detection with Progressive Scale Expansion Network (1903.12473v2)

Published 28 Mar 2019 in cs.CV

Abstract: Scene text detection has witnessed rapid progress especially with the recent development of convolutional neural networks. However, there still exists two challenges which prevent the algorithm into industry applications. On the one hand, most of the state-of-art algorithms require quadrangle bounding box which is in-accurate to locate the texts with arbitrary shape. On the other hand, two text instances which are close to each other may lead to a false detection which covers both instances. Traditionally, the segmentation-based approach can relieve the first problem but usually fail to solve the second challenge. To address these two challenges, in this paper, we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances with arbitrary shapes. More specifically, PSENet generates the different scale of kernels for each text instance, and gradually expands the minimal scale kernel to the text instance with the complete shape. Due to the fact that there are large geometrical margins among the minimal scale kernels, our method is effective to split the close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances. Extensive experiments on CTW1500, Total-Text, ICDAR 2015 and ICDAR 2017 MLT validate the effectiveness of PSENet. Notably, on CTW1500, a dataset full of long curve texts, PSENet achieves a F-measure of 74.3% at 27 FPS, and our best F-measure (82.2%) outperforms state-of-art algorithms by 6.6%. The code will be released in the future.

Overview of Shape Robust Text Detection with Progressive Scale Expansion Network

The paper introduces the Progressive Scale Expansion Network (PSENet), a method for scene text detection. It targets two challenges in existing text detection algorithms: locating text of arbitrary shape, which quadrangle bounding boxes capture poorly, and separating text instances that lie close to each other. PSENet is segmentation-based and adds a scale expansion algorithm that progressively grows small text kernels into complete text shapes.

Key Contributions

PSENet's handling of arbitrarily shaped text rests on three contributions:

  1. Progressive Scale Expansion Algorithm: Starting from the smallest predicted kernel of each text instance, PSENet grows the instance step by step through the larger kernels until it reaches the full text contour, using a strategy inspired by Breadth-First Search (BFS). Because every instance is seeded from its own minimal kernel, the expansion yields a separate label for each text instance rather than one merged region.
  2. Kernel-Based Framework: For each text instance the network predicts several kernels, that is, segmentation areas of the same instance at different scales. The smallest kernels of neighboring instances are separated by large geometric margins, so closely positioned instances can be told apart, which avoids the incorrect merging seen in conventional segmentation methods.
  3. Extensive Benchmarking: The authors evaluate PSENet on CTW1500, Total-Text, ICDAR 2015, and ICDAR 2017 MLT. On CTW1500, the best F-measure of 82.2% outperforms previous state-of-the-art algorithms by 6.6%.
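The expansion step described in the first two contributions can be illustrated with a small sketch. This is not the authors' implementation: it assumes the kernels are already available as binary masks (plain nested lists here) ordered from smallest to largest, uses 4-connectivity, and resolves contested pixels on a first-come-first-served basis. The function names `label_components` and `progressive_scale_expansion` are hypothetical.

```python
from collections import deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (an assumption)


def label_components(mask):
    """Label connected regions of a binary mask; 0 means background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel out to the largest.

    `kernels` is a list of binary masks ordered from smallest to largest.
    """
    labels = label_components(kernels[0])  # one seed label per instance
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Start a BFS from every pixel that already carries a label.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                # Claim unlabeled pixels of the current kernel; a pixel that
                # is already labeled keeps its label (first come, first served).
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels


# Toy example: two instances that touch at full scale but not at small scale.
small = [[0, 1, 0, 0, 0, 0, 1, 0]]
medium = [[1, 1, 1, 0, 0, 1, 1, 1]]
large = [[1, 1, 1, 1, 1, 1, 1, 1]]
result = progressive_scale_expansion([small, medium, large])
print(result)  # [[1, 1, 1, 1, 2, 2, 2, 2]]
```

In this toy case, labeling the largest mask directly would produce a single merged region, while seeding from the smallest kernels keeps the two instances apart.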

Numerical Results

The performance of PSENet is highlighted by key metrics and comparisons:

  • On the CTW1500 dataset, which is full of long curved texts, PSENet achieves an F-measure of 74.3% at 27 FPS, and its best F-measure of 82.2% outperforms other state-of-the-art algorithms by 6.6%.
  • The method also shows robust performance on the Total-Text dataset, achieving an F-measure of 80.9%.
  • PSENet manages to balance high precision and recall across multiple datasets and text orientations, further reinforcing its versatility and effectiveness.
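For readers unfamiliar with the metric, the F-measure reported above is conventionally the harmonic mean of precision and recall. The sketch below shows that standard formula only; the function name `f_measure` is hypothetical and the inputs in the usage line are illustrative values, not results from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    return 2 * precision * recall / (precision + recall)


# Illustrative values only: the harmonic mean is pulled toward the lower input.
score = f_measure(0.9, 0.5)
```

Because the harmonic mean penalizes imbalance, a detector needs both high precision and high recall to reach a high F-measure.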

Implications

The proposed PSENet has significant implications for the domain of scene text detection:

  • Practical Applications: By effectively detecting texts of arbitrary shapes, PSENet promises enhancements in applications such as autonomous driving and augmented reality, where text detection plays a crucial role.
  • Methodological Innovation: The introduction of the Progressive Scale Expansion algorithm could inspire new ways to address similar segmentation challenges in other computer vision tasks, potentially broadening the scope of applications beyond text detection.

Future Directions

The paper opens several avenues for future research:

  • End-to-End Training: While the current implementation involves separate steps for kernel generation and expansion, a potential area for development is achieving an end-to-end trainable solution, thereby improving efficiency and integration.
  • Broader Segmentation Applications: Beyond text detection, the progressive expansion concept could be adapted for more complex instance-level segmentation problems, especially where precision in crowded environments is critical.

In conclusion, PSENet offers a compelling advancement in the field of text detection through its unique approach to handling shape variability and proximity issues. Its success across various benchmarks underscores its potential, not only as a standalone text detection solution but also as a framework adaptable to other complex segmentation challenges.

Authors (7)
  1. Wenhai Wang (123 papers)
  2. Enze Xie (84 papers)
  3. Xiang Li (1002 papers)
  4. Wenbo Hou (5 papers)
  5. Tong Lu (85 papers)
  6. Gang Yu (114 papers)
  7. Shuai Shao (57 papers)
Citations (512)