Detecting Oriented Text in Natural Images by Linking Segments (1703.06520v3)

Published 19 Mar 2017 in cs.CV

Abstract: Most state-of-the-art text detection methods are specific to horizontal Latin text and are not fast enough for real-time applications. We introduce Segment Linking (SegLink), an oriented text detection method. The main idea is to decompose text into two locally detectable elements, namely segments and links. A segment is an oriented box covering a part of a word or text line; a link connects two adjacent segments, indicating that they belong to the same word or text line. Both elements are detected densely at multiple scales by an end-to-end trained, fully-convolutional neural network. Final detections are produced by combining segments connected by links. Compared with previous methods, SegLink improves along the dimensions of accuracy, speed, and ease of training. It achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin. It runs at over 20 FPS on 512x512 images. Moreover, without modification, SegLink is able to detect long lines of non-Latin text, such as Chinese.

Authors (3)

Baoguang Shi
Xiang Bai
Serge Belongie

Citations (637)

Summary

Overview of "Detecting Oriented Text in Natural Images by Linking Segments"

The paper "Detecting Oriented Text in Natural Images by Linking Segments" presents SegLink, a method for detecting oriented text in natural images. SegLink decomposes text into two locally detectable elements, segments and links, and assembles final detections from them. This addresses two limitations of prior methods, which mostly target horizontal Latin text and are often too slow for real-time use.

Key Contributions

The authors change the unit of detection. Instead of detecting entire words or lines directly, SegLink detects smaller elements, segments and links, that a convolutional neural network (CNN) can predict from local image context. A segment is an oriented box covering part of a word or text line; a link connects two adjacent segments and indicates that they belong to the same word or line. This decomposition lets SegLink handle text with varied orientations and aspect ratios, such as long text lines, which are difficult for general object detectors because of their bounding-box design.

Methodology

SegLink's network is built on VGG-16, extended with additional convolutional layers so that segments can be detected at multiple scales. Convolutional predictors applied to several feature-map layers output the confidence and geometry (including orientation) of segments, together with the confidence of links. Two types of links are used: within-layer links connect adjacent segments detected on the same feature layer, and cross-layer links connect segments detected on adjacent feature layers. Because the same text can be detected on more than one layer, cross-layer links allow these redundant segments to be merged into a single detection.
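The two link types can be illustrated by listing which grid locations a segment may link to. The sketch below is a minimal illustration, not the authors' code: it assumes an 8-connected neighborhood for within-layer links and assumes that the preceding feature layer has twice the resolution, so each location has four cross-layer neighbors. Neither detail is stated in this summary, and the function names are hypothetical.

```python
def within_layer_neighbors(x, y, width, height):
    """Locations a segment at (x, y) may link to on the same feature layer.

    Assumption: an 8-connected neighborhood, clipped at the map border.
    """
    return [
        (nx, ny)
        for ny in range(y - 1, y + 2)
        for nx in range(x - 1, x + 2)
        if (nx, ny) != (x, y) and 0 <= nx < width and 0 <= ny < height
    ]


def cross_layer_neighbors(x, y, prev_width, prev_height):
    """Locations on the preceding (finer) layer that (x, y) may link to.

    Assumption: the preceding layer has twice the resolution, so (x, y)
    covers the 2x2 block starting at (2x, 2y) on that layer.
    """
    return [
        (nx, ny)
        for ny in (2 * y, 2 * y + 1)
        for nx in (2 * x, 2 * x + 1)
        if 0 <= nx < prev_width and 0 <= ny < prev_height
    ]


if __name__ == "__main__":
    print(within_layer_neighbors(1, 1, 3, 3))
    print(cross_layer_neighbors(1, 1, 4, 4))
```

Under these assumptions an interior location has eight within-layer candidates and four cross-layer candidates, and the network only needs to score each candidate link rather than reason about whole words.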

Key steps include:

Segment and Link Detection: A fully-convolutional network densely predicts segments and links. Each segment prediction consists of a confidence score and geometric offsets that define its oriented box, and each link prediction is a confidence score. Predictions are made on several feature layers so that text of different sizes is covered.
Combining Segments: The detected segments and links form a graph in which segments are nodes and links are edges. A depth-first search (DFS) finds the connected groups of segments, and each group is combined into a single word or text-line detection.
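The combining step can be sketched as a connected-components search over the segment graph. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the input layout (a list of segment scores and a dict of link scores keyed by segment index pairs), and the 0.5 thresholds are all hypothetical. It only groups segment indices; fitting one oriented box to each group is left out.

```python
def combine_segments(segment_scores, link_scores, seg_thresh=0.5, link_thresh=0.5):
    """Group segment indices that are connected by confident links.

    segment_scores: list of floats, one confidence per segment (hypothetical layout).
    link_scores: dict mapping (i, j) index pairs to a link confidence.
    The thresholds are illustrative defaults, not values from the paper.
    """
    # Keep only confident segments as graph nodes.
    kept = {i for i, score in enumerate(segment_scores) if score >= seg_thresh}

    # Keep only confident links whose endpoints both survived.
    adjacency = {i: [] for i in kept}
    for (a, b), score in link_scores.items():
        if score >= link_thresh and a in kept and b in kept:
            adjacency[a].append(b)
            adjacency[b].append(a)

    # Iterative depth-first search to collect connected components.
    visited = set()
    groups = []
    for start in sorted(kept):
        if start in visited:
            continue
        visited.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.95, 0.2, 0.7]
    links = {(0, 1): 0.9, (1, 2): 0.3, (2, 4): 0.8, (3, 4): 0.9}
    print(combine_segments(scores, links))
```

In the usage example, segment 3 is discarded for low confidence and the weak link between segments 1 and 2 is dropped, so the remaining segments split into two groups, each of which would become one detection.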

Experimental Results

SegLink is evaluated on three benchmarks: ICDAR 2015 Incidental Text (IC15), MSRA-TD500, and ICDAR 2013 (IC13). On IC15, SegLink achieves an f-measure of 75.0%, outperforming the previous best by a large margin, a gain attributed mainly to higher recall. On MSRA-TD500, which features multi-lingual text including non-Latin script, SegLink maintains high precision while running at 8.9 FPS. On IC13, SegLink achieves competitive results for horizontal text detection, which shows that the method is not limited to oriented text.
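Since the results are reported as f-measure, it may help to recall how that metric combines precision and recall. The sketch below uses the standard harmonic-mean definition; the input values are made up for illustration and are not results from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative values only: same precision, two different recalls.
    print(f_measure(0.8, 0.6))
    print(f_measure(0.8, 0.7))
```

Because the harmonic mean is pulled toward the lower of the two values, raising recall at a fixed precision raises the f-measure, which is consistent with a recall-driven gain.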

Implications and Future Work

The work has implications for both practical applications and the understanding of text detection in natural images. The segment-linking approach could support real-time detection of varied text, including non-Latin scripts, across diverse environments. Future work could extend the method to more complex text shapes, such as curved text, and integrate recognition to form an end-to-end text system.

In summary, SegLink provides a flexible and efficient solution for detecting oriented text, improving on the previous state of the art through its network design and its decomposition of text into segments and links.
