
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection (2003.07493v2)

Published 17 Mar 2020 in cs.CV

Abstract: Arbitrary shape text detection is a challenging task due to the high variety and complexity of scene texts. In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection. In our method, an innovative local graph bridges a text proposal model via Convolutional Neural Network (CNN) and a deep relational reasoning network via Graph Convolutional Network (GCN), making our network end-to-end trainable. To be concrete, every text instance will be divided into a series of small rectangular components, and the geometry attributes (e.g., height, width, and orientation) of the small components will be estimated by our text proposal model. Given the geometry attributes, the local graph construction model can roughly establish linkages between different text components. For further reasoning and deducing the likelihood of linkages between the component and its neighbors, we adopt a graph-based network to perform deep relational reasoning on local graphs. Experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.

The paper presents a novel Deep Relational Reasoning Graph Network (DRRG) designed explicitly for the detection of text with arbitrary shapes in complex scene images. The outlined methodology leverages graph convolutional networks (GCNs) to enhance the performance of text detection models by integrating relational reasoning over local graphs of text components. This approach aims to overcome limitations observed in prior methods, particularly those relying solely on Convolutional Neural Networks (CNNs), which are poorly suited to modeling the non-Euclidean relationships among the components of irregularly shaped text.
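To make the idea of relational reasoning over a local graph more concrete, here is a minimal sketch of one graph-convolution-style step: each node averages its own feature vector with those of its neighbors, and the result is passed through a linear map and a ReLU. The function names `mean_aggregate` and `linear_relu`, the mean aggregator, and the toy features are assumptions made for illustration; the summary does not specify the layer design used in DRRG.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(features: List[Vector], neighbors: Dict[int, List[int]]) -> List[Vector]:
    """Replace each node feature with the mean of itself and its neighbors.

    Hypothetical aggregator chosen for simplicity, not the paper's layer.
    """
    aggregated = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in neighbors.get(i, [])]
        aggregated.append([sum(column) / len(group) for column in zip(*group)])
    return aggregated


def linear_relu(features: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Apply a weight matrix of shape (in_dim x out_dim), then ReLU."""
    columns = list(zip(*weight))
    return [
        [max(0.0, sum(x * w for x, w in zip(row, col))) for col in columns]
        for row in features
    ]


# Toy local graph: three components in a chain 0 - 1 - 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]

hidden = linear_relu(mean_aggregate(features, neighbors), identity)
```

After one such step, every node's representation already mixes in information from adjacent components, which is what lets a later classifier score a link using context rather than a single component in isolation.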

The proposed methodology begins with a text proposal model built on CNNs that predicts geometric attributes for each detected text component, such as height, width, and orientation. These components serve as nodes within local graphs, over which a graph-based network performs relational deduction. The inclusion of GCNs allows the model to infer linkages between text components, thus improving the accuracy of grouping components into coherent text instances.
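The final grouping step can be illustrated with a short sketch: given a linkage likelihood for each candidate pair of components, keep the pairs above a threshold and merge connected components into text instances. The union-find approach, the `group_components` name, the 0.5 threshold, and the scores below are all assumptions for illustration; the summary does not state how DRRG performs this grouping.

```python
from typing import Dict, List, Tuple


def group_components(
    num_components: int,
    link_scores: Dict[Tuple[int, int], float],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Merge components whose predicted link likelihood meets the threshold.

    Uses union-find, a hypothetical choice for this sketch.
    """
    parent = list(range(num_components))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in link_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: Dict[int, List[int]] = {}
    for i in range(num_components):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up likelihoods for five components; the weak 2-3 link splits them in two.
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
instances = group_components(5, scores)
```

In this toy case the components fall into two instances, `[0, 1, 2]` and `[3, 4]`, because the only link joining them scores below the threshold.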

One of the primary technical innovations in the paper is its local graph construction model, which aids in refining the connections between different text components based on their geometric characteristics. As a result, this model enhances the relational reasoning capability of the DRRG network, providing a framework for assessing linkage likelihoods between adjacent components, which is particularly beneficial for texts with irregular or curved shapes. The method yields state-of-the-art performance on multiple public datasets, demonstrating its effectiveness in dealing with the complexities inherent in arbitrary shape text detection tasks.
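As a rough illustration of local graph construction, the sketch below connects each component (the pivot) to its k nearest neighbors by center distance. The `Component` fields, the `build_local_graphs` name, the Euclidean distance criterion, and `k=2` are assumptions; the summary only says that linkages are established from geometric attributes, so the paper's actual similarity measure and neighborhood sizes may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    """Hypothetical container for one rectangular text component."""
    x: float        # center x
    y: float        # center y
    height: float
    width: float
    angle: float    # orientation in radians


def build_local_graphs(components: List[Component], k: int = 2) -> Dict[int, List[int]]:
    """Map each pivot index to the indices of its k nearest components."""
    graphs: Dict[int, List[int]] = {}
    for i, pivot in enumerate(components):
        distances = [
            (math.hypot(pivot.x - other.x, pivot.y - other.y), j)
            for j, other in enumerate(components)
            if j != i
        ]
        distances.sort()  # ties are broken by the smaller index
        graphs[i] = [j for _, j in distances[:k]]
    return graphs


# Four components along a horizontal line; the last one is far from the rest.
components = [
    Component(x=0.0, y=0.0, height=8.0, width=4.0, angle=0.0),
    Component(x=10.0, y=0.0, height=8.0, width=4.0, angle=0.0),
    Component(x=20.0, y=0.0, height=8.0, width=4.0, angle=0.0),
    Component(x=100.0, y=0.0, height=8.0, width=4.0, angle=0.0),
]
local_graphs = build_local_graphs(components, k=2)
```

Note that the distant fourth component still receives two neighbors here. A purely geometric rule produces such rough candidate links, which is why a learned scoring step is needed to decide which of them are real.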

The experimental results reported in the paper underscore the efficacy of the DRRG network, particularly in its precision and recall metrics across various challenging datasets, including Total-Text, CTW-1500, and MSRA-TD500. These datasets contain diverse text shapes and orientations, effectively testing the robustness of the proposed method. The integration of GCNs shows significant improvements over baseline models, particularly in datasets populated with long and curved text instances, where CNN-based methods often falter.
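For readers less familiar with the metrics mentioned above, the following sketch shows how precision, recall, and their harmonic mean (the F-measure) are typically computed from detection counts. The function name `detection_scores` and the example counts are illustrative assumptions and are not figures from the paper; how a detection is matched to a ground-truth instance depends on each dataset's evaluation protocol.

```python
from typing import Tuple


def detection_scores(
    true_positives: int, num_detections: int, num_ground_truth: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) from raw detection counts."""
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Made-up counts: 8 correct detections out of 10 predicted, 16 ground-truth instances.
precision, recall, f_measure = detection_scores(8, 10, 16)
```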

The implications of this research are multifaceted. Practically, DRRG can be applied to numerous domains requiring robust text detection, such as augmented reality applications, automated document analysis, and real-time video processing. Theoretically, the paper positions relational reasoning over graphs as a powerful tool for advancing the capabilities of AI systems tasked with complex perceptual challenges. Future developments may explore broader applications of GCNs beyond text detection, potentially integrating these techniques into systems that require intricate component aggregation and reasoning.

In conclusion, the DRRG proposed in this paper represents a significant advancement in the detection of arbitrary shape texts, providing an effective solution to overcome challenges experienced by existing methods reliant on CNNs. This novel integration of GCNs opens pathways for future research into relational reasoning applications within AI, driving progress across a spectrum of cognitive and perceptual tasks.

Authors (7)
  1. Hongfa Wang (29 papers)
  2. Xu-Cheng Yin (35 papers)
  3. Shi-Xue Zhang (12 papers)
  4. Xiaobin Zhu (21 papers)
  5. Jie-Bo Hou (4 papers)
  6. Chang Liu (863 papers)
  7. Chun Yang (45 papers)
Citations (171)