ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views

Published 25 Mar 2019 in cs.CV | (1903.10412v1)

Abstract: In this paper, we introduce the ShopSign dataset, which is a newly developed natural scene text dataset of Chinese shop signs in street views. Although a few scene text datasets are already publicly available (e.g. ICDAR2015, COCO-Text), there are few images in these datasets that contain Chinese texts/characters. Hence, we collect and annotate the ShopSign dataset to advance research in Chinese scene text detection and recognition. The new dataset has three distinctive characteristics: (1) large-scale: it contains 25,362 Chinese shop sign images, with a total number of 196,010 text-lines. (2) diversity: the images in ShopSign were captured in different scenes, from downtown to developing regions, using more than 50 different mobile phones. (3) difficulty: the dataset is very sparse and imbalanced. It also includes five categories of hard images (mirror, wooden, deformed, exposed and obscure). To illustrate the challenges in ShopSign, we run baseline experiments using state-of-the-art scene text detection methods (including CTPN, TextBoxes++ and EAST), and cross-dataset validation to compare their corresponding performance on the related datasets such as CTW, RCTW and ICPR 2018 MTWI challenge dataset. The sample images and detailed descriptions of our ShopSign dataset are publicly available at: https://github.com/chongshengzhang/shopsign.





Summary

The paper presents the ShopSign dataset, comprising 25,362 images and 196,010 text lines to tackle challenges in Chinese scene text detection and recognition.
It evaluates established methods like CTPN, TextBoxes++, and EAST, revealing that models trained on non-Chinese data underperform on this dataset.
The study underscores the need for language-specific datasets and synthetic data generation to address real-world complexities such as data imbalance and diverse text conditions.

Insights into the ShopSign Dataset: A Comprehensive Resource for Chinese Scene Text Detection and Recognition

The paper "ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views" introduces a new dataset aimed at advancing Chinese scene text detection and recognition, referred to here as Chinese Photo OCR. The dataset is a valuable contribution because publicly available scene text datasets such as ICDAR2015 and COCO-Text contain few images with Chinese text. This summary examines the dataset's development, characteristics, and implications for future research.

Three characteristics distinguish ShopSign from existing datasets. First, scale: it contains 25,362 images with 196,010 text lines, which makes it comparable in scope to, if not larger than, previously released Chinese scene text datasets. Second, diversity: the images were captured in scenes ranging from downtown areas to developing regions across China, using more than 50 different mobile phones, which yields variation in environmental conditions, text orientations, and backgrounds. Third, difficulty: the dataset is sparse and imbalanced, and it includes five categories of hard images typical of real-world scenarios (mirror, wooden, deformed, exposed, and obscure).

To evaluate the dataset's utility, the authors ran baseline experiments with established scene text detection methods, including CTPN, TextBoxes++, and EAST, along with cross-dataset validation against related datasets such as CTW, RCTW, and the ICPR 2018 MTWI challenge dataset. The methods were also tested on the hard image categories, showing how ShopSign can be used to probe the robustness of text detection algorithms on Chinese script. The experiments indicate that models trained on datasets not specifically oriented towards Chinese text perform suboptimally on ShopSign. This suggests that language-specific factors significantly influence detection performance and supports the need for datasets tailored to the complexities of Chinese characters.
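Detection baselines of this kind are usually compared by precision, recall, and F-measure over matched boxes. The following is a minimal sketch of that scoring, not the paper's evaluation protocol: it assumes axis-aligned boxes, a greedy one-to-one matching, and an IoU threshold of 0.5, and the names `iou` and `detection_scores` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(
    preds: List[Box], gts: List[Box], thresh: float = 0.5
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one image.

    Each prediction is greedily matched to the unmatched ground-truth
    box with the highest IoU; the match counts as a true positive when
    that IoU reaches the (assumed) threshold.
    """
    matched = set()
    true_pos = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(p, g)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= thresh:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure
```

Scene text benchmarks often annotate quadrilaterals rather than axis-aligned rectangles, in which case the `iou` function would need a polygon intersection instead.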

Theoretically, ShopSign not only serves as a foundational dataset for benchmarking but also stimulates dialogue around the unique obstacles inherent in Chinese text recognition, such as handling large character sets and imbalanced data. Practically, this dataset has applications across numerous domains requiring accurate text recognition in natural scenes, including urban planning, autonomous navigation, and digital archiving in Chinese contexts.

Looking ahead, the creators of ShopSign suggest developing larger-scale synthetic datasets and applying generative techniques such as GANs to produce complex Chinese text scenes. Addressing the data sparsity and class imbalance within the dataset also remains crucial. The paper presents synthetic data creation as a way to help machine learning models overcome these challenges and improve character recognition.
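Class imbalance in a large character set can be made concrete by counting how often each character appears in the text-line transcriptions. The sketch below is an illustration only: the sample strings are made up, the function names are hypothetical, and inverse-frequency weighting is a common general-purpose remedy rather than a technique the paper is stated to use.

```python
from collections import Counter
from typing import Dict, Iterable


def char_frequencies(lines: Iterable[str]) -> Counter:
    """Count every non-whitespace character across text-line transcriptions."""
    return Counter(ch for line in lines for ch in line if not ch.isspace())


def inverse_frequency_weights(counts: Counter) -> Dict[str, float]:
    """Weight each character by total / (num_classes * count).

    Rare characters get weights above 1 and frequent ones below 1,
    so a weighted loss pays more attention to the long tail.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {ch: total / (num_classes * c) for ch, c in counts.items()}


# Made-up transcriptions standing in for annotated shop-sign text lines.
sample_lines = ["超市", "超市便利店", "药店"]
counts = char_frequencies(sample_lines)
weights = inverse_frequency_weights(counts)
```

In this toy sample, "超" appears twice and "药" once, so "药" receives the larger weight; on a real dataset the same counts also show which characters have too few examples to learn from.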

In conclusion, ShopSign is a carefully constructed resource for Chinese scene text detection and recognition research. It fills a gap in the field by providing extensive real-world data and by encouraging the generation and use of synthetic data to cope with the linguistic complexity of Chinese. The authors hope that making ShopSign accessible will drive further innovation and improved methods in Chinese Photo OCR.
