Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection (1703.01425v1)

Published 4 Mar 2017 in cs.CV

Abstract: Detecting incidental scene text is a challenging task because of multi-orientation, perspective distortion, and variation of text size, color and scale. Retrospective research has only focused on using rectangular bounding box or horizontal sliding window to localize text, which may result in redundant background noise, unnecessary overlap or even information loss. To address these issues, we propose a new Convolutional Neural Networks (CNNs) based method, named Deep Matching Prior Network (DMPNet), to detect text with tighter quadrangle. First, we use quadrilateral sliding windows in several specific intermediate convolutional layers to roughly recall the text with higher overlapping area and then a shared Monte-Carlo method is proposed for fast and accurate computing of the polygonal areas. After that, we design a sequential protocol for relative regression which can exactly predict text with compact quadrangle. Moreover, an auxiliary smooth Ln loss is also proposed for further regressing the position of text, which has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability. The effectiveness of our approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental scene text localization". The performance of our method is evaluated by using F-measure and found to be 70.64%, outperforming the existing state-of-the-art method with F-measure 63.76%.

Deep Matching Prior Network: Multi-oriented Text Detection

The paper presents the Deep Matching Prior Network (DMPNet), a novel approach aimed at enhancing the localization accuracy of incidental scene text detection. This task is inherently complex due to challenges posed by text orientations, distortions, and variations in scale, size, and color. Traditional methods rely heavily on rectangular bounding boxes or horizontal sliding windows, which often result in background noise, overlaps, and potentially significant information loss. DMPNet addresses these limitations through several key innovations.

DMPNet uses Convolutional Neural Networks (CNNs) to detect text with quadrilateral bounding boxes rather than the conventional rectangular ones. It first applies quadrilateral sliding windows in several specific intermediate convolutional layers, which recall text regions with a higher overlapping area than rectangular windows do. Because the overlap between arbitrary quadrilaterals is more expensive to compute than the overlap between axis-aligned rectangles, the paper proposes a shared Monte-Carlo method that computes polygonal areas quickly and accurately.
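The Monte-Carlo area computation can be illustrated with a minimal sketch. It assumes that "shared" means one set of random sample points is drawn around the ground-truth quadrilateral and reused for every candidate sliding window; the function names, the sample count, and the ray-casting inside test are illustrative choices, not details taken from the paper.

```python
import random


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the simple polygon `poly`."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mc_overlap_areas(gt_quad, windows, n_samples=20000, seed=0):
    """Estimate the ground-truth area and its overlap with each window.

    Assumption: points are sampled once in the bounding rectangle of
    `gt_quad` and shared by all windows, so each extra window only costs
    one inside test per retained point.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in gt_quad]
    ys = [p[1] for p in gt_quad]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    bbox_area = (x1 - x0) * (y1 - y0)

    samples = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n_samples)]
    # Keep only the points that fall inside the ground-truth quadrilateral.
    gt_points = [(px, py) for px, py in samples if point_in_polygon(px, py, gt_quad)]
    gt_area = bbox_area * len(gt_points) / n_samples

    overlaps = []
    for window in windows:
        hits = sum(point_in_polygon(px, py, window) for px, py in gt_points)
        overlaps.append(bbox_area * hits / n_samples)
    return gt_area, overlaps


if __name__ == "__main__":
    gt = [(0, 0), (2, 0), (2, 2), (0, 2)]
    candidates = [[(1, 0), (3, 0), (3, 2), (1, 2)], [(5, 5), (6, 5), (6, 6), (5, 6)]]
    print(mc_overlap_areas(gt, candidates))
```

The estimate is approximate and its error shrinks as `n_samples` grows, so the sample count trades accuracy against speed.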

For precise localization, the paper designs a sequential protocol for relative regression, which lets the network predict text with compact quadrangles. In addition, an auxiliary smooth $L_n$ loss is proposed for further regressing the text position; according to the paper, it has better overall performance than the $L_2$ and smooth $L_1$ losses in terms of robustness and stability.
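To make the comparison of the three losses concrete, the sketch below evaluates them on a few regression errors. The summary above does not state the formula for the smooth $L_n$ loss; the form used here, $(|d| + 1)\ln(|d| + 1) - |d|$, is an assumption that should be checked against the paper. The smooth $L_1$ definition is the standard one from Fast R-CNN, and $L_2$ is taken as the plain squared error.

```python
import math


def l2_loss(d):
    """Squared error; its gradient grows linearly with the error."""
    return d * d


def smooth_l1_loss(d):
    """Standard smooth L1: quadratic near zero, linear for |d| >= 1."""
    a = abs(d)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def smooth_ln_loss(d):
    """Assumed smooth Ln form: (|d| + 1) * ln(|d| + 1) - |d|."""
    a = abs(d)
    return (a + 1.0) * math.log(a + 1.0) - a


def smooth_ln_grad(d):
    """Derivative of the assumed form: sign(d) * ln(|d| + 1)."""
    return math.copysign(math.log(abs(d) + 1.0), d)


if __name__ == "__main__":
    for d in (0.1, 1.0, 10.0):
        print(d, l2_loss(d), smooth_l1_loss(d), smooth_ln_loss(d), smooth_ln_grad(d))
```

Under this assumed form, the gradient magnitude grows only logarithmically with the error, so large outliers are damped more than with $L_2$ while the loss stays smooth at zero.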

Experiments on ICDAR 2015 Robust Reading Competition Challenge 4 ("Incidental scene text localization"), a public word-level, multi-oriented scene text dataset, support DMPNet's effectiveness. The paper reports an F-measure of 70.64%, compared with 63.76% for the previous state-of-the-art method. These results suggest that tighter quadrangles help detect multi-oriented text while including less background noise in each detection.
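For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below shows the computation; the detection counts in the usage example are made up for illustration and are not results from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from detection counts (0.0 when undefined)."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formula.
    p, r = precision_recall(true_pos=80, false_pos=20, false_neg=40)
    print(p, r, f_measure(p, r))
```

Because it is a harmonic mean, the F-measure is pulled toward the lower of the two values, so a detector cannot score well by maximizing only precision or only recall.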

Implications

DMPNet has implications for computer vision applications that require precise text localization under challenging conditions, such as autonomous vehicles, visual assistance aids, and multilingual translation systems. Its use of quadrilateral sliding windows designed from prior knowledge is a step toward shape-adaptive detection and suggests potential improvements to general object detection models.

Speculation on Future Directions

Further development of DMPNet could explore automated shape optimization for sliding windows, reducing the need for manual design and potentially improving detection recall. Faster shared computation of overlaps between complex polygonal regions could also benefit real-time applications. As AI systems process more unstructured and distorted inputs from natural environments, methods like DMPNet may see wider integration into commercial products and more general object detection systems. These findings also encourage broader adoption of alternative labeling schemes, such as quadrilateral annotations, which follow the physical layout of scene text more closely and make datasets more useful for future models.

Authors (2)

Yuliang Liu (82 papers)
Lianwen Jin (116 papers)

Citations (299)
