
Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach (2011.14021v1)

Published 27 Nov 2020 in cs.CV

Abstract: Text segmentation is a prerequisite in many real-world text-related tasks, e.g., text style transfer and scene text removal. However, facing the lack of high-quality datasets and dedicated investigations, this critical prerequisite has been left as an assumption in many works, and has been largely overlooked by current research. To bridge this gap, we propose TextSeg, a large-scale fine-annotated text dataset with six types of annotations: word- and character-wise bounding polygons, masks and transcriptions. We also introduce Text Refinement Network (TexRNet), a novel text segmentation approach that adapts to the unique properties of text, e.g., non-convex boundary and diverse texture, which often impose burdens on traditional segmentation models. In our TexRNet, we propose text-specific network designs to address such challenges, including key features pooling and attention-based similarity checking. We also introduce trimap and discriminator losses that show significant improvement on text segmentation. Extensive experiments are carried out on both our TextSeg dataset and other existing datasets. We demonstrate that TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods. Our dataset and code will be made available at https://github.com/SHI-Labs/Rethinking-Text-Segmentation.

Rethinking Text Segmentation: A Novel Dataset and Refinement Approach

Text segmentation is central to many text-related computer vision tasks, such as text style transfer and scene text removal. Despite its importance, progress has been held back by the lack of high-quality datasets and of segmentation algorithms designed for the particular characteristics of text. The paper by Xu et al. makes two contributions to address this: a large-scale text segmentation dataset named TextSeg and a segmentation method called Text Refinement Network (TexRNet).

TextSeg Dataset

The TextSeg dataset addresses the limitations of existing datasets by offering a large, finely annotated collection built for text segmentation. It contains over 4,000 images collected from sources such as posters, road signs, and digital designs, giving a balanced mix of scene text and design text. TextSeg provides six types of annotations: word- and character-wise bounding polygons, masks, and transcriptions, which goes beyond the scope of earlier datasets such as Total-Text and COCO_TS. The coverage of text effects and the character-level annotations support a fine-grained analysis of text structure, making the dataset useful for both academic and practical work.
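One way to picture the six annotation types is as two levels (word, character) each carrying three kinds of label (polygon, mask, transcription). The sketch below is only an illustration of that structure: the class and field names are hypothetical and are not taken from the TextSeg files or the authors' code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TextUnit:
    """One annotated unit, either a word or a character (hypothetical schema)."""
    polygon: List[Point]      # bounding polygon vertices in pixel coordinates
    mask: List[List[int]]     # binary pixel mask, 1 = text pixel
    transcription: str        # the text content of this unit


@dataclass
class AnnotatedImage:
    """Hypothetical container; field names are assumptions, not TextSeg's format."""
    words: List[TextUnit] = field(default_factory=list)
    characters: List[TextUnit] = field(default_factory=list)

    def annotation_types(self) -> List[str]:
        # Two levels times three label kinds gives the six annotation types.
        return [
            f"{level}-{kind}"
            for level in ("word", "character")
            for kind in ("polygon", "mask", "transcription")
        ]


sample = AnnotatedImage()
types = sample.annotation_types()
print(types)
```

Keeping words and characters in separate lists mirrors the word-wise and character-wise split described above; a real loader would need to follow whatever layout the released dataset actually uses.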

Text Refinement Network (TexRNet)

The second contribution, TexRNet, is a segmentation approach designed for the particular challenges of text, such as non-convex boundaries and diverse textures. It combines two text-specific design elements, key features pooling and attention-based similarity checking, so that the network adapts its prediction to the content of each image. The adaptation relies on computing query-key similarities across image regions, which is intended to address the difficulty traditional segmentation models have with variation in text scale and aspect ratio. TexRNet also adds two loss functions, a trimap loss and a glyph discriminator loss, aimed at improving boundary precision and text readability, respectively.
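The general idea of pooling one key feature per class and then checking each pixel's similarity to those keys can be illustrated with a toy example. This is a minimal sketch under simplifying assumptions, not TexRNet's actual formulation: the probability-weighted mean, the cosine similarity, the softmax temperature, and all function names here are choices made for illustration.

```python
import math


def pool_key_features(features, probs):
    """Return one key vector per class: the probability-weighted mean feature."""
    num_classes = len(probs[0])
    dim = len(features[0])
    keys = []
    for c in range(num_classes):
        total = sum(p[c] for p in probs) or 1.0  # guard against an empty class
        key = [
            sum(p[c] * f[d] for f, p in zip(features, probs)) / total
            for d in range(dim)
        ]
        keys.append(key)
    return keys


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def refine(features, probs, temperature=0.1):
    """Re-score each pixel with a softmax over its similarity to each class key."""
    keys = pool_key_features(features, probs)
    refined = []
    for f in features:
        scores = [cosine(f, k) / temperature for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        z = sum(exps)
        refined.append([e / z for e in exps])
    return refined


# Four toy pixels with 2-D features and coarse [background, text] probabilities.
# The third pixel looks like text, but the coarse prediction is undecided.
features = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
refined = refine(features, probs)
print(refined[2])
```

In this toy setup the undecided third pixel moves toward the text class because its feature is close to the pooled text key, which is the kind of image-specific adaptation the paragraph above describes.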

Experimental Validation and Results

TexRNet is evaluated through extensive experiments on five datasets, including the newly introduced TextSeg. The results show a consistent improvement in text segmentation performance, with a gain of nearly 2% in foreground IoU (fgIoU) over other state-of-the-art segmentation methods. These results on established benchmarks indicate that TexRNet is effective for text-specific tasks and suggest that its design may also be relevant to segmentation more generally.
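For readers unfamiliar with the metric, foreground IoU is the intersection over union of the predicted and ground-truth text pixels. The following is a minimal sketch on nested lists of 0/1 values; the function name and the convention for two empty masks are assumptions, and the paper's evaluation code may aggregate over a dataset differently.

```python
def fg_iou(pred, target):
    """Foreground IoU between two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[0, 1, 1, 0],
        [0, 1, 1, 0]]
target = [[0, 1, 0, 0],
          [0, 1, 1, 0]]
score = fg_iou(pred, target)
print(score)  # 3 shared text pixels out of 4 in the union -> 0.75
```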

Implications and Future Prospects

The research has implications for practical applications, illustrated by downstream tasks such as text removal and text style transfer built on TexRNet. More precise segmentation enables better text-based image manipulation, which points to potential industrial use. The TextSeg dataset is expected to support the development of more capable models and methods for text processing. Future work could explore integrating TexRNet into real-time applications or extending it to multilingual text datasets. Together, the dataset and the method support both the study and the practical use of text segmentation.

Authors (6)
  1. Xingqian Xu (23 papers)
  2. Zhifei Zhang (156 papers)
  3. Zhaowen Wang (55 papers)
  4. Brian Price (41 papers)
  5. Zhonghao Wang (20 papers)
  6. Humphrey Shi (97 papers)
Citations (56)