
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network (2002.10200v2)

Published 24 Feb 2020 in cs.CV

Abstract: Scene text detection and recognition has received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods either are costly for character annotation or need to maintain a complex pipeline, which is often not suitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on arbitrarily-shaped benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy, meanwhile significantly improving the speed. In particular, on Total-Text, our realtime version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy. Code is available at https://tinyurl.com/AdelaiDet

Adaptive Bezier-Curve Network (ABCNet) for Scene Text Spotting

The paper "ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network" presents a novel approach to the challenging problem of scene text detection and recognition. This task is complicated by the diversity of text shapes, fonts, and sizes found in natural environments. The authors propose a method that directly addresses the limitations of existing character-based and segmentation-based approaches by leveraging the properties of Bezier curves, offering significant improvements in both speed and accuracy over previous methods.

Key Contributions

  1. Bezier Curve Representation: For the first time, the paper parameterizes arbitrarily-shaped scene text with Bezier curves. This simplifies detection by avoiding the complex processing pipelines typical of segmentation-based methods. The method fits cubic Bezier curves to text boundaries and, empirically, handles the wide variety of text shapes encountered in the wild.
  2. BezierAlign Sampling: A novel BezierAlign layer is designed for accurate feature sampling of text instances, crucial for connecting the detection branch to the recognition branch. This method enables precise feature extraction, which is critical for maintaining high recognition accuracy while keeping computational overhead low.
  3. Efficiency and Accuracy: The proposed method introduces negligible computation overhead compared to standard bounding box detection, achieving a real-time performance level that is rarely seen in existing methods for this domain. The efficiency of ABCNet enables its deployment in real-world applications, addressing a key shortcoming of other contemporary approaches.
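To make the first two contributions concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes each text instance is bounded by a top and a bottom cubic Bezier curve with four control points each. It evaluates the curves with the Bernstein basis, interpolates between them to build a rectangular grid of sample locations, and reads a single-channel feature map at those locations with bilinear interpolation. The function names (`cubic_bezier`, `sampling_grid`, `bilinear_sample`) and the toy inputs are hypothetical and chosen only for illustration.

```python
import numpy as np


def cubic_bezier(control_points, t):
    """Evaluate a cubic Bezier curve with the Bernstein basis.

    control_points: 4 (x, y) pairs; t: parameters in [0, 1], shape (n,).
    Returns an array of shape (n, 2).
    """
    p = np.asarray(control_points, dtype=float)      # (4, 2)
    t = np.asarray(t, dtype=float)[:, None]          # (n, 1)
    basis = np.hstack([
        (1 - t) ** 3,
        3 * t * (1 - t) ** 2,
        3 * t ** 2 * (1 - t),
        t ** 3,
    ])                                               # (n, 4)
    return basis @ p


def sampling_grid(top_ctrl, bottom_ctrl, out_h, out_w):
    """Return an (out_h, out_w, 2) grid of (x, y) points between two curves."""
    t = np.linspace(0.0, 1.0, out_w)
    top = cubic_bezier(top_ctrl, t)                  # (out_w, 2)
    bottom = cubic_bezier(bottom_ctrl, t)            # (out_w, 2)
    v = np.linspace(0.0, 1.0, out_h)[:, None, None]  # (out_h, 1, 1)
    # Linear interpolation from the top boundary (v=0) to the bottom (v=1).
    return (1 - v) * top[None] + v * bottom[None]


def bilinear_sample(feature, grid):
    """Bilinearly sample a 2-D feature map at the (x, y) points in grid."""
    h, w = feature.shape
    x = np.clip(grid[..., 0], 0, w - 1)
    y = np.clip(grid[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    upper = feature[y0, x0] * (1 - wx) + feature[y0, x1] * wx
    lower = feature[y1, x0] * (1 - wx) + feature[y1, x1] * wx
    return upper * (1 - wy) + lower * wy


# Toy example (assumed data): two straight, horizontal "curves" on a 3x4 map.
top_ctrl = [(0, 0), (1, 0), (2, 0), (3, 0)]
bottom_ctrl = [(0, 2), (1, 2), (2, 2), (3, 2)]
feature = np.tile(np.arange(4.0), (3, 1))            # feature[y, x] = x
grid = sampling_grid(top_ctrl, bottom_ctrl, out_h=3, out_w=4)
rectified = bilinear_sample(feature, grid)
```

With curved control points the same grid bends to follow the text, so the sampled output is a rectified, fixed-size feature patch that a recognition branch could consume. Because the grid is built from a handful of polynomial evaluations, the extra cost over box-based sampling is small, which is consistent with the paper's claim of negligible overhead.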

Experimental Evaluation

The authors validate ABCNet's performance on benchmark datasets for arbitrarily-shaped scene text, specifically Total-Text and CTW1500. The results are noteworthy:

  • Total-Text: ABCNet achieves state-of-the-art accuracy, reaching an F-measure of 78.4% with multi-scale testing, while its real-time version is over ten times faster than recent state-of-the-art methods with competitive recognition accuracy.
  • CTW1500: The method similarly outperforms previous approaches, demonstrating its robustness across different datasets.

The method's real-time capabilities are underscored by the processing speeds reported: 17.9 FPS in standard configurations, with a potential of up to 22.8 FPS in optimized settings.
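As a quick sanity check on what these frame rates mean per image, the conversion from FPS to latency is simply 1000 / FPS milliseconds. The short snippet below applies it to the two figures quoted above; the helper name `ms_per_frame` is hypothetical.

```python
def ms_per_frame(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


for label, fps in [("standard configuration", 17.9), ("optimized setting", 22.8)]:
    print(f"{label}: {fps} FPS is about {ms_per_frame(fps):.1f} ms per image")
```

That works out to roughly 56 ms and 44 ms per image, respectively.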

Implications and Future Directions

The introduction of Bezier curves for scene text spotting represents a significant step forward in the field. By addressing the computational challenges associated with detecting and recognizing arbitrarily-shaped text, ABCNet paves the way for more responsive and adaptable AI systems capable of interpreting text in complex environments.

From a theoretical standpoint, the parameterization of text with Bezier curves could inspire further research into similar mathematical representations for other irregularly shaped data formats in computer vision. Practically, the system's efficacy and speed suggest potential applications in real-time translation devices, augmented reality interfaces, and autonomous systems requiring text interpretation capabilities.

Looking forward, developments could focus on expanding the adaptive capacities of the Bezier curve approach to accommodate languages with more complex character sets, as well as integrating the model into broader AI systems targeting comprehensive scene understanding tasks.

In sum, the paper presents a robust and innovative solution to the scene text spotting problem, marking a noteworthy advancement in computer vision methodologies and their applications.

Authors (6)
  1. Yuliang Liu (82 papers)
  2. Hao Chen (1005 papers)
  3. Chunhua Shen (404 papers)
  4. Tong He (124 papers)
  5. Lianwen Jin (116 papers)
  6. Liangwei Wang (11 papers)
Citations (295)