ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
This essay examines the paper "ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)", in which the authors introduce a competition aimed at advancing Chinese text recognition in natural images. The competition is notable for its emphasis on Chinese, addressing a field whose datasets and competitions have traditionally centered on English.
Dataset and Tasks
The paper presents the CTW-12k dataset, an extensive collection of 12,263 annotated images that captures the complexities of Chinese text in varied environments. The dataset distinguishes itself by including both natural and born-digital images, providing the diversity needed for robust algorithm development. Annotation is performed at the text-line level and covers the location and transcription of each text instance; text lines that are difficult to read are flagged accordingly.
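As an illustration of line-level annotation, the snippet below parses one annotation record into a polygon, a difficulty flag, and a transcription. The comma-separated layout shown (eight polygon coordinates, a difficulty flag, a quoted transcription) is an assumed format for illustration only; the paper's released annotation files are the authoritative reference.

```python
# Hypothetical annotation line (assumed format, not confirmed by the paper):
#   x1,y1,x2,y2,x3,y3,x4,y4,<difficult flag>,"<transcription>"
# Naive parse: assumes the transcription itself contains no ',"' sequence.

def parse_annotation_line(line):
    """Split one line into a 4-vertex polygon, difficulty flag, transcription."""
    head, _, quoted = line.partition(',"')
    fields = head.split(",")
    coords = [int(v) for v in fields[:8]]
    polygon = list(zip(coords[0::2], coords[1::2]))  # four (x, y) vertices
    difficult = fields[8] == "1"
    transcription = quoted.rstrip('"')
    return polygon, difficult, transcription
```

Parsing `'10,20,110,20,110,50,10,50,0,"文本"'` would yield the four corner points, `difficult = False`, and the transcription `文本`.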
The RCTW-17 competition introduces two distinct tasks: text localization and end-to-end recognition. The text localization task requires accurately identifying text regions within images using polygonal descriptors. The end-to-end recognition task extends this challenge by requiring participants to both locate and accurately transcribe the text, providing a more comprehensive evaluation of text recognition capabilities.
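The polygonal matching underlying the localization task can be sketched as an intersection-over-union computation over quadrilaterals. The following is a minimal illustration assuming convex polygons with vertices listed counter-clockwise (Sutherland-Hodgman clipping plus the shoelace formula); the competition's official evaluation code is the authoritative implementation.

```python
def _cross_side(p, a, b):
    """>= 0 if p lies on or to the left of directed edge a->b (inside for CCW)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def _segment_line_intersection(p, q, a, b):
    """Intersection of segment p-q with the infinite line through a-b."""
    denom = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
    t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / denom
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def polygon_area(pts):
    """Shoelace formula; degenerate polygons have zero area."""
    if len(pts) < 3:
        return 0.0
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def convex_clip(subject, clipper):
    """Sutherland-Hodgman: clip `subject` by each edge of convex CCW `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        if not inputs:
            break
        for j in range(len(inputs)):
            p, q = inputs[j], inputs[(j + 1) % len(inputs)]
            p_in = _cross_side(p, a, b) >= 0
            q_in = _cross_side(q, a, b) >= 0
            if q_in:
                if not p_in:
                    output.append(_segment_line_intersection(p, q, a, b))
                output.append(q)
            elif p_in:
                output.append(_segment_line_intersection(p, q, a, b))
    return output

def polygon_iou(poly_a, poly_b):
    """Intersection-over-union of two convex CCW polygons."""
    inter = polygon_area(convex_clip(poly_a, poly_b))
    union = polygon_area(poly_a) + polygon_area(poly_b) - inter
    return inter / union if union > 0 else 0.0
```

For two unit squares overlapping by half their area, this yields an IoU of 1/3.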
Evaluation Protocols
The evaluation for text localization relies on the average precision (AP) metric, consistent with the PASCAL VOC methodology. The authors have adapted the standard metrics to accommodate polygonal representations, prioritizing methods that balance precision and recall effectively.
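Once each ranked detection has been judged correct or incorrect (the polygon-matching step is abstracted away here), the VOC-style AP can be computed as the area under the interpolated precision-recall curve. This sketch uses the precision-envelope formulation; whether the competition's adaptation uses this or the older 11-point interpolation is a detail settled by the official evaluation code.

```python
def average_precision(detections, num_gt):
    """AP from (score, is_correct) pairs: area under the precision envelope.

    detections: iterable of (confidence score, bool correct-match flag)
    num_gt:     total number of ground-truth instances
    """
    dets = sorted(detections, key=lambda d: -d[0])  # rank by confidence
    tp = 0
    precisions, recalls = [], []
    for rank, (_score, correct) in enumerate(dets, start=1):
        tp += 1 if correct else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A detector that ranks a correct, an incorrect, then a correct detection against two ground truths scores AP = 5/6 under this formulation.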
For end-to-end recognition, a novel metric, Average Edit Distance (AED), is introduced. This metric assesses the integration of detection and recognition by computing edit distances between detected transcriptions and the ground truth, with lower values indicating better performance; because the distances are not normalized by length, longer text instances carry proportionally more weight. This approach addresses inherent challenges in Chinese text recognition, such as the large character set and complex text layouts, demanding a nuanced combination of localization and recognition.
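The idea behind AED can be sketched with a standard Levenshtein distance plus a per-image cost that also penalizes missed ground truths and spurious detections. This is an illustrative simplification under assumed matching rules; the official evaluation procedure may differ in how detections are paired with ground truths.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,                 # delete ca
                         cur[j - 1] + 1,              # insert cb
                         prev[j - 1] + (ca != cb))    # substitute ca -> cb
        prev = cur
    return prev[-1]

def image_edit_cost(matched_pairs, unmatched_gt, unmatched_det):
    """Edit cost for one image (illustrative): matched detection/ground-truth
    pairs contribute their edit distance; each missed ground truth and each
    false detection is charged its full length."""
    cost = sum(edit_distance(det, gt) for det, gt in matched_pairs)
    cost += sum(len(gt) for gt in unmatched_gt)
    cost += sum(len(det) for det in unmatched_det)
    return cost

def average_edit_distance(per_image_costs):
    """Final score: mean cost over all images; lower is better."""
    return sum(per_image_costs) / len(per_image_costs)
```

Note how a single substituted character costs 1 while an entirely missed line costs its full length, which is what tilts the metric toward long text instances.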
Results and Analysis
The competition documented a diverse range of submissions, demonstrating various methodologies in tackling Chinese text recognition. Notable submissions, exemplified by contributions from teams at Peking University and CASIA, showcased innovative uses of neural networks and integrated detection-recognition frameworks. The results highlight the continued dominance of convolutional and recurrent neural architectures in handling complex, multi-character recognition tasks.
Upon analysis, the paper identifies recurring challenges across the submissions, such as difficulty in detecting elongated text lines and in suppressing redundant detections. The authors also note frequent recognition errors stemming from perspective distortions and structural similarities among Chinese characters, pointing towards areas for methodological improvement.
Implications and Future Directions
The RCTW-17 competition signifies a focused intent to fuel progress in Chinese language image processing, opening possibilities for subsequent research and application in related fields. The continuation of this competition as an ongoing, online challenge is poised to strengthen the resource pool available for Chinese text recognition and encourage continuous participation from the research community. Future efforts will include iterative improvements to the dataset and annotations, ensuring their relevance and utility for cutting-edge recognition systems.
In conclusion, the ICDAR2017 Competition on Reading Chinese Text in the Wild has effectively catalyzed interest and development in Chinese text recognition, making an important contribution to this domain of computer vision research. The introduction of novel evaluation protocols and a comprehensive dataset opens new opportunities for advances in robust and versatile text recognition systems.