Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
The paper "Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study" by Haiyang Yu et al. addresses the underexplored field of Chinese Text Recognition (CTR), proposing a comprehensive benchmark to stimulate research in this domain. Despite the advances in text recognition driven by deep learning, existing methods focus predominantly on English text, leaving a gap in the recognition of Chinese text that is significant given the large number of Chinese speakers worldwide.
Contribution and Methodology
The authors identify key reasons for the limited focus on CTR: the absence of standardized datasets, unified evaluation protocols, and comprehensive baseline results. To mitigate these issues, the authors provide:
- Dataset Compilation: The authors devise a well-structured approach to collate Chinese text datasets from various public competitions, papers, and projects. These datasets are categorized into four distinct scenarios: scene, web, document, and handwriting, ensuring coverage across different application contexts.
- Evaluation Protocols: By establishing a set of unified evaluation protocols, the authors aim to offer fair assessment tools for CTR methodologies. This includes guidelines for handling traditional and simplified characters, addressing variances that often pose a challenge in the evaluation phase.
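To make the traditional/simplified issue concrete, the normalization step of such a protocol can be sketched as follows. This is a minimal illustration, not the paper's actual protocol: `TRAD_TO_SIMP` is a tiny hypothetical mapping table, whereas a real evaluation would rely on a complete conversion table.

```python
# Tiny illustrative traditional -> simplified mapping; a real protocol
# would use a full conversion table covering all variant characters.
TRAD_TO_SIMP = {"學": "学", "習": "习", "國": "国"}

def normalize(text: str) -> str:
    """Map traditional characters to simplified and strip surrounding whitespace."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text.strip())

def line_accuracy(preds, labels):
    """Fraction of predictions that exactly match labels after normalization."""
    correct = sum(normalize(p) == normalize(l) for p, l in zip(preds, labels))
    return correct / len(labels)

# Traditional and simplified renderings of the same text count as a match.
print(line_accuracy(["學習", "中国"], ["学习", "中國"]))  # → 1.0
```

Normalizing both prediction and label before comparison ensures a model is not penalized merely for emitting a variant form of the correct character.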
- Baseline Performance: A comprehensive set of experiments is conducted across several prominent text recognition methods, such as CRNN, ASTER, and TransOCR. The authors provide a detailed performance benchmark, underscoring challenges unique to Chinese text, such as complex character structures and a far larger character set than Latin scripts.
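Among these baselines, CRNN decodes its per-frame classifier outputs with CTC best-path decoding: collapse consecutive repeats, then drop blanks. A minimal sketch, using a hypothetical four-character alphabet (the blank index and index-to-character map below are illustrative assumptions, not the paper's configuration):

```python
BLANK = 0  # conventional CTC blank index (assumed here)
ALPHABET = {1: "中", 2: "文", 3: "识", 4: "别"}  # hypothetical index -> char map

def ctc_greedy_decode(frame_ids):
    """CTC best-path decoding: collapse consecutive repeats, then remove blanks."""
    out, prev = [], None
    for idx in frame_ids:
        if idx != prev and idx != BLANK:
            out.append(ALPHABET[idx])
        prev = idx
    return "".join(out)

# Repeated frames ("中中") collapse; blanks separate genuine repetitions.
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 3, 4]))  # → "中文识别"
```

The large Chinese character set makes this per-frame classification step harder than in the Latin case: the output layer must discriminate among thousands of classes rather than a few dozen.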
Results and Observations
The empirical results highlight discrepancies in performance between Chinese and English text recognition. Baselines such as CRNN and TransOCR were evaluated, revealing that methods traditionally effective on English datasets encounter notable performance drops when applied to Chinese texts. This is attributed to the unique characteristics of Chinese characters, such as a larger character set and intricate structures.
Furthermore, the introduction of radical-level supervision demonstrated potential performance improvements, suggesting that fine-grained architectural adjustments tailored to linguistic attributes could enhance recognition accuracy.
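The idea behind radical-level supervision can be sketched by converting a label string into the radical sequence an auxiliary decoder head would be trained to predict. The decomposition table below is a tiny hypothetical example; real systems use full IDS-style decomposition tables covering thousands of characters.

```python
# Hypothetical per-character radical decompositions; illustrative only.
RADICALS = {"明": ["日", "月"], "好": ["女", "子"], "林": ["木", "木"]}

def radical_targets(text):
    """Flatten a label string into its radical-level supervision sequence."""
    seq = []
    for ch in text:
        seq.extend(RADICALS.get(ch, [ch]))  # fall back to the char itself
    return seq

print(radical_targets("明好"))  # → ['日', '月', '女', '子']
```

During training, this sequence can serve as the target of an auxiliary loss alongside the character-level loss, exposing the model to shared sub-character structure across the large character set.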
Implications and Future Directions
This paper's contributions have significant implications for both practical applications in multilingual environments and theoretical advancements in linguistic-based AI research. By addressing the specific challenges of CTR through structured datasets and standardized protocols, the paper lays down a foundational benchmark necessary for enhancing methodologies tailored to Chinese script.
Future research could explore more sophisticated architectures or hybrid models that leverage linguistic insights from the properties of Chinese text. Richer representation-learning techniques also hold potential for mitigating the inherent complexities observed in CTR.
In conclusion, Yu et al.'s work significantly enriches the CTR domain, providing a valuable benchmark for future innovations. The insights from this paper encourage holistic approaches in character recognition tasks across diverse languages, emphasizing the need for inclusive and comprehensive research endeavors in AI.