ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) (1909.07145v1)

Published 16 Sep 2019 in cs.CV

Abstract: This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 - 54.91%. Apart from the results, this paper also details the ArT dataset, tasks description, evaluation metrics, and participants' methods. The dataset, the evaluation kit, and the results are publicly available at https://rrc.cvc.uab.es/?ch=14

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT

The paper "ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT" reports on a competition that challenged participants with three tasks related to arbitrary-shaped text in the wild: scene text detection, scene text recognition, and scene text spotting. The competition received 78 submissions from 46 unique teams and individuals worldwide, highlighting the ongoing interest and advancements in scene text understanding. This report provides critical insight into the performance metrics, dataset attributes, and the methods of top-performing participants.

The ArT Dataset

The ArT dataset serves as the basis for the competition and aims to address the shortcomings of existing datasets by providing a more extensive collection of images featuring arbitrary-shaped text. With a combined total of 10,166 images, sourced from existing benchmarks alongside newly collected samples, ArT substantially broadens the scope of scene text understanding research. Ground truth is provided as polygonal annotations, which offer the flexibility and precision needed to capture the varied geometries of text in natural scenes. The dataset merges previous successful benchmarks such as Total-Text and SCUT-CTW1500 with newly curated images, providing a robust platform for evaluating models on both Latin and Chinese scripts.
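To make the polygonal annotation idea concrete, the sketch below shows a hypothetical ground-truth record (the field names are illustrative, not the official ArT schema) together with a shoelace-formula area computation, which is a typical building block when working with polygon labels:

```python
# Hypothetical polygonal ground-truth record in the spirit of ArT's
# annotations; field names here are illustrative, not the official schema.
annotation = {
    "points": [(0, 0), (4, 0), (4, 2), (0, 2)],  # polygon vertices in order
    "transcription": "TEXT",
    "language": "Latin",
    "illegible": False,
}

def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Unlike axis-aligned rectangles, such polygons can trace curved or rotated text lines with arbitrarily many vertices.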

Performance Metrics and Results

In the scene text detection task, submissions were evaluated with an Intersection over Union (IoU) threshold-based metric; the leading model achieved an H-Mean of 82.65% at a 0.5 IoU threshold. Top performers frequently relied on segmentation-based approaches, underscoring their effectiveness at tightly enclosing arbitrary-shaped text within images.
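The evaluation protocol can be sketched as follows. This is a simplified illustration, not the official evaluation kit: it uses axis-aligned boxes and greedy matching for brevity, whereas the challenge matches polygons. The H-Mean is the harmonic mean of precision and recall over matches at the IoU threshold:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def detection_hmean(preds, gts, thresh=0.5):
    """H-Mean of precision/recall with one-to-one matching at an IoU threshold."""
    matched = set()
    tp = 0
    for p in preds:
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j in matched:
                continue  # each ground truth may be matched only once
            iou = box_iou(p, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= thresh:
            tp += 1
            matched.add(best_j)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

With one correct detection out of two predictions against two ground truths, precision and recall are both 0.5, giving an H-Mean of 0.5.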

For the text recognition tasks, the split into Latin-only and mixed-script tracks allowed a diverse range of models to demonstrate proficiency with specific text orientations and scripts. Models combining attention-based RNN decoders with image rectification steps proved adept, with leading submissions reaching up to 85.32% on the mixed-script recognition task.
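Recognition quality in ICDAR robust-reading challenges is commonly scored with normalized edit distance alongside exact-match accuracy. The sketch below shows a standard Levenshtein-based 1-N.E.D. computation; it is a generic illustration of the metric family, not the official scoring code:

```python
def edit_distance(a, b):
    """Levenshtein distance via single-row dynamic programming."""
    m, n = len(a), len(b)
    row = list(range(n + 1))
    for i in range(1, m + 1):
        prev, row[0] = row[0], i
        for j in range(1, n + 1):
            cur = row[j]
            row[j] = min(row[j] + 1,          # deletion
                         row[j - 1] + 1,      # insertion
                         prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return row[n]

def one_minus_ned(pred, gt):
    """1 - normalized edit distance; 1.0 means an exact match."""
    if not pred and not gt:
        return 1.0
    return 1.0 - edit_distance(pred, gt) / max(len(pred), len(gt))
```

Averaging this score over all test words rewards near-misses proportionally, which matters for long Chinese transcriptions where a single wrong character should not zero out the score.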

The scene text spotting task was particularly challenging, requiring both detection and recognition of text in a single end-to-end pipeline. A top H-Mean of 54.91% for mixed-script spotting indicates significant room for improvement and innovation in end-to-end text detection and recognition pipelines.

Implications and Future Directions

The implications of this research highlight the importance of accommodating non-linear text orientations and of developing robust methodologies that go beyond conventional rectangular bounding-box frameworks. Practical applications of these findings can lead to substantial improvements in OCR systems and similar technologies employed in various real-world settings, including autonomous systems and smart environments requiring text interpretation capabilities.

As this challenge pushes the boundaries of current scene text understanding technologies, future directions in AI must consider the integration of semantic information, language-specific knowledge, and improved metrics such as TIoU to increase the reliability and performance of text detection systems. Recognizing and interpreting arbitrary text instances reflects a crucial step towards achieving comprehensive, intelligent visual recognition systems capable of interfacing seamlessly with dynamic real-world environments.

In conclusion, the ICDAR2019 RRC-ArT competition spotlights critical advancements and prevailing challenges within the field of arbitrary-shaped text understanding, advocating for continual innovation and refinement of methods to bridge the gap between current capabilities and practical application needs.

Authors (14)
  1. Chee-Kheng Chng
  2. Yuliang Liu
  3. Yipeng Sun
  4. Chun Chet Ng
  5. Canjie Luo
  6. Zihan Ni
  7. ChuanMing Fang
  8. Shuaitao Zhang
  9. Junyu Han
  10. Errui Ding
  11. Jingtuo Liu
  12. Dimosthenis Karatzas
  13. Chee Seng Chan
  14. Lianwen Jin
Citations (197)