Detecting Curve Text in the Wild: New Dataset and New Solution
The paper "Detecting Curve Text in the Wild: New Dataset and New Solution" contributes to scene text detection on two fronts: a new dataset, CTW1500, and a polygon-based detector designed specifically for curve text. Both respond to a limitation of existing datasets and methods, which focus primarily on axis-aligned or quadrilateral text regions and therefore describe curved text poorly.
Dataset and Methodology
CTW1500 is built specifically for curve text and contains over 10,000 text annotations across 1,500 images. Its focus on curve text sets it apart: curved text is common in real-world scenes, yet existing datasets cover it inadequately. Each text region is labeled with a 14-point polygon, which offers more flexibility and precision than a traditional bounding box.
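To make the 14-point labels concrete, the sketch below stores one annotation as a list of (x, y) vertices and derives its enclosing box, per-point offsets from the box corner, and its area. The function names and the box-plus-offsets encoding are assumptions chosen for illustration; this summary does not specify the dataset's actual file layout.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def polygon_to_box_and_offsets(points: List[Point]) -> Tuple[Box, List[Point]]:
    """Return the enclosing box and each vertex's offset from its top-left corner.

    Hypothetical encoding, assumed for illustration only.
    """
    if len(points) != 14:
        raise ValueError("expected a 14-point polygon")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)
    offsets = [(x - xmin, y - ymin) for x, y in points]
    return (xmin, ymin, xmax, ymax), offsets


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A straight text line expressed with 14 points: 7 along the top edge
# (left to right) and 7 along the bottom edge (right to left).
top = [(float(x), 0.0) for x in range(0, 70, 10)]
bottom = [(float(x), 20.0) for x in range(60, -10, -10)]
annotation = top + bottom

box, offsets = polygon_to_box_and_offsets(annotation)
area = polygon_area(annotation)
```

For a curved line, the same 14 vertices would simply follow the upper and lower boundaries of the text, so the polygon area can be much smaller than the area of the enclosing box.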
The proposed Curve Text Detector (CTD) is trained on this dataset and detects curve text directly, without relying on empirical combination methods to assemble a final region. It integrates a recurrent transverse and longitudinal offset connection (TLOC), which helps the detector learn context and spatial relationships among the annotated points. This RNN-based connection yields more accurate and smoother localization of curve text regions.
Strong Results and Innovative Techniques
Experimental results on CTW1500 show CTD outperforming state-of-the-art methods by a substantial margin, even with a lightweight backbone (a reduced ResNet-50). In particular, CTD combined with TLOC performs well on both the curve and non-curve text subsets, which indicates robustness and versatility. Two post-processing techniques, non-polygon suppression (NPS) and polygonal non-maximum suppression (PNMS), further refine detection accuracy, reducing false positives and enhancing generalization.
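The idea behind polygonal non-maximum suppression can be sketched as ordinary greedy NMS with the overlap measured between polygons rather than between boxes. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0.5 threshold, and the grid-sampling approximation of polygon IoU are all choices made here to keep the example free of external dependencies.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(x: float, y: float, poly: Polygon) -> bool:
    """Ray-casting test: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def approx_polygon_iou(a: Polygon, b: Polygon, step: float = 1.0) -> float:
    """Approximate IoU by sampling cell centers on a regular grid (an assumption)."""
    xs = [p[0] for p in a] + [p[0] for p in b]
    ys = [p[1] for p in a] + [p[1] for p in b]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    inter = union = 0
    y = y_min + step / 2
    while y < y_max:
        x = x_min + step / 2
        while x < x_max:
            in_a = point_in_polygon(x, y, a)
            in_b = point_in_polygon(x, y, b)
            if in_a and in_b:
                inter += 1
            if in_a or in_b:
                union += 1
            x += step
        y += step
    return inter / union if union else 0.0


def polygon_nms(polys: List[Polygon], scores: List[float],
                iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring polygon, drop heavy overlaps."""
    order = sorted(range(len(polys)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(approx_polygon_iou(polys[i], polys[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections and one separate detection.
det_a = [(0, 0), (10, 0), (10, 10), (0, 10)]
det_b = [(1, 0), (11, 0), (11, 10), (1, 10)]
det_c = [(20, 0), (30, 0), (30, 10), (20, 10)]
kept = polygon_nms([det_a, det_b, det_c], [0.9, 0.8, 0.7])
```

Here `det_b` overlaps `det_a` almost entirely and has a lower score, so it is suppressed, while `det_c` survives because it overlaps nothing. A production version would likely compute exact polygon intersections with a geometry library instead of sampling a grid.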
Implications and Future Work
The research holds both practical and theoretical significance. Practically, it provides a robust solution for applications that require accurate scene text detection, such as real-time translation and autonomous systems. Theoretically, it suggests a shift in scene text detection away from boxes and quadrilaterals, encouraging further exploration of polygon-based systems.
Future work may expand CTW1500 into a comprehensive recognition dataset, as the authors suggest, building on its existing polygon annotations. Detection methods that balance speed and flexibility are another direction that could further improve curve text detection.
In summary, this work is a valuable addition to the field: it provides a new dataset and a methodological framework that address an unmet need, namely detecting curve text in the wild.