
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks (2006.06244v1)

Published 11 Jun 2020 in cs.CV

Abstract: Despite the recent success of text detection and recognition methods, existing evaluation metrics fail to provide a fair and reliable comparison among those methods. In addition, there exists no end-to-end evaluation metric that takes characteristics of OCR tasks into account. Previous end-to-end metric contains cascaded errors from the binary scoring process applied in both detection and recognition tasks. Ignoring partially correct results raises a gap between quantitative and qualitative analysis, and prevents fine-grained assessment. Based on the fact that character is a key element of text, we hereby propose a Character-Level Evaluation metric (CLEval). In CLEval, the *instance matching* process handles split and merge detection cases, and the *scoring process* conducts character-level evaluation. By aggregating character-level scores, the CLEval metric provides a fine-grained evaluation of end-to-end results composed of the detection and recognition as well as individual evaluations for each module from the end-performance perspective. We believe that our metrics can play a key role in developing and analyzing state-of-the-art text detection and recognition methods. The evaluation code is publicly available at https://github.com/clovaai/CLEval.


Summary

  • The paper introduces CLEval as a novel metric that evaluates OCR performance at the character level to capture subtle detection and recognition errors.
  • The methodology integrates instance matching with a character scoring process using Pseudo-Character Centers and longest common subsequence analysis.
  • Experimental results show that CLEval yields more faithful assessments than traditional metrics such as IoU, providing granular feedback that helps improve text detection and recognition systems.

CLEval: Evaluating Text Detection and Recognition at the Character Level

The paper proposes a novel Character-Level Evaluation metric (CLEval) for assessing the performance of text detection and recognition methods. The research addresses deficiencies in existing evaluation metrics that rely on binary scoring processes, which inadequately capture the nuances of text detection and recognition tasks. The existing metrics are criticized for accumulating cascaded errors and not accounting for partially correct results, leading to a gap between quantitative and qualitative analysis.

Key Contributions and Methodology

The CLEval metric introduces a character-level evaluation process that emphasizes fine-grained assessment of both text detection and recognition. It combines an instance matching process with a character scoring process. The instance matching process handles split and merge detection cases and operates on a character-by-character basis, ensuring that partially correct results receive proportional credit. This allows for a more precise and nuanced evaluation than traditional binary metrics.

A central innovation in CLEval is the use of Pseudo-Character Centers (PCCs), synthesized points within word-level bounding boxes that stand in for character locations when explicit character annotations are unavailable. Each ground-truth box is divided along its length into as many segments as the word has characters, and the center of each segment serves as that character's PCC. These points then determine whether detection boxes correctly encapsulate the characters.
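A minimal sketch of this idea, assuming axis-aligned rectangles for simplicity (CLEval itself handles arbitrary quadrilaterals); the helper names `pseudo_character_centers` and `contained_pccs` are illustrative, not part of the CLEval codebase:

```python
def pseudo_character_centers(box, num_chars):
    """Approximate character centers for an axis-aligned word box.

    Splits the box into `num_chars` equal-width slices along its
    horizontal axis and returns each slice's center point.
    """
    x_min, y_min, x_max, y_max = box
    width = (x_max - x_min) / num_chars
    y_c = (y_min + y_max) / 2.0
    return [(x_min + width * (i + 0.5), y_c) for i in range(num_chars)]

def contained_pccs(det_box, pccs):
    """Return the PCCs that fall inside a detection box."""
    x_min, y_min, x_max, y_max = det_box
    return [(x, y) for x, y in pccs
            if x_min <= x <= x_max and y_min <= y <= y_max]

# Example: ground-truth word "HELLO" in a 100x20 box, and a detection
# covering only the left 60% of the word.
gt_box = (0, 0, 100, 20)
pccs = pseudo_character_centers(gt_box, len("HELLO"))
det_box = (0, 0, 60, 20)
print(len(contained_pccs(det_box, pccs)), "of", len(pccs), "characters covered")
```

Containment checks like this underpin the instance matching step: a detection is associated with the ground-truth characters whose PCCs it encloses, which naturally accommodates split and merge cases.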

The CLEval character scoring mechanism calculates recall and precision by evaluating the longest common subsequence (LCS) between ground-truth and predicted texts, applying penalties for incorrect text sequences and overlapping characters. This lets the metric address both granularity (whether a word is captured as a single coherent instance rather than split across or merged between detections) and correctness (how accurately the recognized text reproduces the ground-truth characters).
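A simplified sketch of the scoring idea, omitting CLEval's granularity penalties and overlap handling (function names are illustrative):

```python
def lcs_length(a: str, b: str) -> int:
    """Classic dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ca == cb else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def char_recall_precision(gt_text: str, pred_text: str):
    """Character-level recall/precision from the LCS of GT and prediction."""
    correct = lcs_length(gt_text, pred_text)
    recall = correct / len(gt_text) if gt_text else 0.0
    precision = correct / len(pred_text) if pred_text else 0.0
    return recall, precision

# A partially correct recognition still earns proportional credit.
r, p = char_recall_precision("RIVERSIDE", "RIVERSlDE")  # one wrong character
print(f"recall={r:.2f}, precision={p:.2f}")  # ~0.89 each, not a binary 0
```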

Evaluation and Results

The paper evaluates CLEval against other prevalent metrics using synthetic and real-world datasets, highlighting the limitations of area-based and intersection-over-union (IoU) metrics in OCR tasks. Through rigorous experimentation, CLEval is shown to provide reliable and fine-grained assessment capabilities that are well-suited for complex text detection scenarios often found in real-world applications.
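To see the limitation concretely, consider a detection that covers slightly under half of a ground-truth word. A fixed IoU threshold scores it as a complete miss, while character-level accounting still credits the covered characters. This is an illustrative comparison, not an experiment from the paper:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

gt  = (0, 0, 100, 20)   # 5-character ground-truth word box
det = (0, 0, 45, 20)    # detection covering the first ~45% of the word

score = iou(gt, det)                     # 0.45 -> a "miss" at a 0.5 threshold
centers = [10, 30, 50, 70, 90]           # PCC x-coordinates (see earlier sketch)
covered = sum(x <= 45 for x in centers)  # 2 of 5 characters still covered
print(f"IoU={score:.2f}, binary match: {score >= 0.5}, char credit: {covered}/5")
```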

Beyond a single score, CLEval reports intermediate statistics, such as character-level missing, overlapping, and false-positive counts, offering insights that can guide further development of text detection and recognition methods.
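One way such tallies might be accumulated over a dataset is sketched below; the field names are hypothetical and do not mirror CLEval's actual output schema:

```python
from dataclasses import dataclass

@dataclass
class CharStats:
    """Hypothetical per-dataset tallies of character-level outcomes."""
    total_gt: int = 0        # ground-truth characters seen
    correct: int = 0         # characters credited to some detection
    missing: int = 0         # GT characters no detection covered
    overlapping: int = 0     # GT characters claimed by multiple detections
    false_positive: int = 0  # predicted characters with no GT counterpart

    def update(self, gt_chars, matched, multi_matched, spurious):
        self.total_gt += gt_chars
        self.correct += matched
        self.missing += gt_chars - matched
        self.overlapping += multi_matched
        self.false_positive += spurious

    @property
    def recall(self):
        return self.correct / self.total_gt if self.total_gt else 0.0

stats = CharStats()
stats.update(gt_chars=5, matched=4, multi_matched=1, spurious=2)
print(f"recall={stats.recall:.2f}, missing={stats.missing}, "
      f"overlapping={stats.overlapping}, false positives={stats.false_positive}")
```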

Implications for Future Research

As a more nuanced metric, CLEval provides researchers with a tool that brings OCR evaluation closer to the practical needs of the end user. By addressing both granularity and correctness issues, CLEval paves the way for the creation and assessment of more sophisticated and accurate text detection and recognition systems.

This work has potential implications for the development of future OCR systems and their evaluation metrics. CLEval could be particularly influential in advancing automated performance evaluations and benchmarking, encouraging a closer alignment between qualitative user satisfaction and quantitative evaluation results.

In sum, the CLEval metric is a robust proposal for moving beyond traditional binary scoring, enabling detailed evaluations that reflect real-world applications of OCR technologies. As research into text detection and recognition continues to grow, CLEval marks a significant step toward more faithful evaluation methodologies.
