
Shape Robust Text Detection with Progressive Scale Expansion Network (1806.02559v1)

Published 7 Jun 2018 in cs.CV

Abstract: The challenges of shape robust text detection lie in two aspects: 1) most existing quadrangular bounding box based detectors are difficult to locate texts with arbitrary shapes, which are hard to be enclosed perfectly in a rectangle; 2) most pixel-wise segmentation-based detectors may not separate the text instances that are very close to each other. To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance. These predictions correspond to different 'kernels' produced by shrinking the original text instance into various scales. Consequently, the final detection can be conducted through our progressive scale expansion algorithm which gradually expands the kernels with minimal scales to the text instances with maximal and complete shapes. Due to the fact that there are large geometrical margins among these minimal kernels, our method is effective to distinguish the adjacent text instances and is robust to arbitrary shapes. The state-of-the-art results on ICDAR 2015 and ICDAR 2017 MLT benchmarks further confirm the great effectiveness of PSENet. Notably, PSENet outperforms the previous best record by absolute 6.37% on the curve text dataset SCUT-CTW1500. Code will be available in https://github.com/whai362/PSENet.


The paper introduces the Progressive Scale Expansion Network (PSENet), a novel segmentation-based approach designed for text detection in natural scenes, particularly addressing the challenges posed by texts of arbitrary shapes. The authors detail two primary limitations of existing methods: the difficulties faced by quadrangular bounding box-based detectors in accurately enclosing non-rectangular texts, and the challenges segmentation-based detectors encounter in separating closely located text instances.

Methodology

PSENet addresses both limitations with progressive scale expansion. For each text instance, the network produces multiple predictions, each corresponding to a 'kernel': a copy of the instance shrunk to a different scale. Detection then starts from the smallest kernels, which are separated by wide margins even when the texts themselves nearly touch, and expands them step by step to the full text shapes. This lets PSENet keep closely packed text instances apart while following arbitrary text geometries.
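To make the idea of kernels at several scales concrete, the sketch below computes how far a text polygon's boundary would be pulled inward for each kernel. The summary above does not state the shrinking rule; the formulas used here (scale ratios spaced linearly between a minimal ratio `m` and 1, and an offset of area × (1 − r²) / perimeter) are an assumption about how such labels can be generated, and the function names are hypothetical. The polygon clipping step that would actually apply each offset is not shown.

```python
import math


def kernel_scale_ratios(n, m):
    """Return n ratios, linearly spaced from m (smallest kernel) up to 1.0.

    Assumed rule: r_i = 1 - (1 - m) * (n - i) / (n - 1) for i = 1..n.
    """
    if n < 2:
        return [1.0]
    return [1.0 - (1.0 - m) * (n - i) / (n - 1) for i in range(1, n + 1)]


def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a simple polygon given as (x, y) tuples."""
    area = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(area) / 2.0, perimeter


def shrink_offsets(points, n, m):
    """Inward offset (in pixels) for each of the n kernels, smallest kernel first.

    Assumed rule: d_i = area * (1 - r_i ** 2) / perimeter.
    The last offset is 0 because the largest kernel is the full text instance.
    """
    area, perimeter = polygon_area_perimeter(points)
    return [area * (1.0 - r * r) / perimeter for r in kernel_scale_ratios(n, m)]


# Example: a 100 x 20 pixel text box with three kernels and minimal ratio 0.5.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
offsets = shrink_offsets(box, n=3, m=0.5)
```

A smaller `m` yields a thinner innermost kernel and therefore a wider margin between neighbouring instances, at the cost of a harder segmentation target.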

The key components of the approach include:

  • Multiple Scale Kernels: Instead of treating each text instance as a single segment, the method predicts several kernels for it, obtained by shrinking the instance's boundary to various scales. Every kernel keeps the shape of the original instance, which is what allows the method to handle texts with unusual shapes.
  • Progressive Expansion: Starting from the minimal kernels, each detected region is grown step by step into the next larger kernel using a Breadth-First-Search (BFS) inspired algorithm, until it reaches the full text shape. Because the regions grow gradually and in parallel, the boundary between adjacent texts is settled pixel by pixel and distinct text instances are not merged.
  • Robustness and Effectiveness: The minimal kernels are separated by large geometric margins, so instances that would merge in a single full-size segmentation map remain distinct from the start. The summary also credits the multi-scale kernels with providing smoother supervision during training, which it links to higher precision.
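The expansion step described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the kernels arrive as binary masks ordered from smallest to largest, uses 4-connectivity, and resolves a pixel claimed by two instances in favour of whichever reaches it first. The function names are hypothetical.

```python
from collections import deque

import numpy as np

# 4-connected neighbourhood: up, down, left, right.
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def label_components(mask):
    """Label 4-connected components of a binary mask with BFS (0 = background)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                next_label += 1
                labels[sy, sx] = next_label
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow the components of the smallest kernel through each larger kernel.

    kernels: list of 2-D binary masks of equal shape, smallest kernel first.
    Returns an integer label map with one label per text instance.
    """
    kernels = [np.asarray(k, dtype=bool) for k in kernels]
    labels = label_components(kernels[0])
    h, w = labels.shape
    for kernel in kernels[1:]:
        # Seed the queue with every pixel that already belongs to an instance.
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                # Claim only unlabeled pixels inside the current kernel;
                # an already claimed pixel keeps its first label.
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny, nx] and labels[ny, nx] == 0):
                    labels[ny, nx] = labels[y, x]
                    queue.append((ny, nx))
    return labels


# Two texts that touch in the full-size mask but are far apart in the small kernel.
small = np.array([[1, 0, 0, 0, 0, 0, 1]])
full = np.array([[1, 1, 1, 1, 1, 1, 1]])
result = progressive_scale_expansion([small, full])
```

In this toy input the full-size mask is one connected blob, so labeling it directly would yield a single instance; seeding from the small kernel keeps two labels and lets them meet in the middle.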

Results and Contributions

PSENet reports state-of-the-art results on ICDAR 2015 and ICDAR 2017 MLT, and its performance is strongest on curved text. On the curved-text dataset SCUT-CTW1500, it improves on the previous best result by an absolute 6.37%.

Implications and Future Directions

From a practical standpoint, more reliable detection of irregular and densely packed text can benefit applications that read text in varied environments, such as autonomous driving and augmented reality. From a theoretical standpoint, the work suggests a direction for other detection problems that involve irregularly shaped or densely packed instances.

Future research could explore end-to-end learning integration for the scale expansion process, potentially leading to performance enhancements and reduced computational costs. Additionally, adapting the progressive scale expansion methodology to other instance segmentation tasks may offer solutions to similar challenges in crowded object scenes.

In summary, PSENet provides a robust framework for detecting text with complex shapes and text instances that lie close together. Its combination of multi-scale kernels and progressive expansion is relevant both to text detection and to broader instance-level segmentation tasks.

Authors (6)
  1. Xiang Li
  2. Wenhai Wang
  3. Wenbo Hou
  4. Ruo-Ze Liu
  5. Tong Lu
  6. Jian Yang
Citations (578)