
Out of Length Text Recognition with Sub-String Matching (2407.12317v3)

Published 17 Jul 2024 in cs.CV

Abstract: Scene Text Recognition (STR) methods have demonstrated robust performance in word-level text recognition. However, in real applications the text image is sometimes long because it is detected as containing multiple horizontal words. This triggers the requirement to build long text recognition models from readily available short (i.e., word-level) text datasets, which has received little prior study. In this paper, we term this task Out of Length (OOL) text recognition. We establish the first Long Text Benchmark (LTB) to facilitate the assessment of different methods in long text recognition. Meanwhile, we propose a novel method called OOL Text Recognition with sub-String Matching (SMTR). SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features, matching the sub-string and simultaneously recognizing its next and previous character. SMTR can recognize text of arbitrary length by iterating the process above. To avoid being trapped in recognizing highly similar sub-strings, we introduce a regularization training to compel SMTR to effectively discover subtle differences between similar sub-strings for precise matching. In addition, we propose an inference augmentation strategy to alleviate confusion caused by identical sub-strings in the same text and improve the overall recognition efficiency. Extensive experimental results reveal that SMTR, even when trained exclusively on short text, outperforms existing methods in public short text benchmarks and exhibits a clear advantage on LTB. Code: https://github.com/Topdu/OpenOCR.


Summary

  • The paper presents SMTR, a method that overcomes the limitations of absolute positional encoding by leveraging sub-string encoding for text recognition.
  • It uses dual queries and multi-head attention to iteratively identify and decode continuous text from images, achieving superior accuracy on long texts.
  • The results suggest that integrating traditional string-matching techniques with modern attention mechanisms significantly improves recognition versatility in diverse layout scenarios.

Overview of "Out of Length Text Recognition with Sub-String Matching"

The paper "Out of Length Text Recognition with Sub-String Matching" presents a novel approach to Scene Text Recognition (STR) focused on Out of Length (OOL) text recognition. This task addresses the challenge of recognizing text of arbitrary length, often continuous lines of multiple horizontal words, when models are trained exclusively on datasets consisting mostly of short (word-level) text samples.

Key Contributions

To tackle OOL text recognition, the authors propose SMTR (OOL Text Recognition with Sub-String Matching). SMTR identifies sub-strings within a text image and recognizes their adjacent characters using cross-attention mechanisms. Unlike existing STR methods that rely on absolute positional information, SMTR is inspired by classical string-matching techniques and exploits relative sub-string positioning. This allows it to handle text of any length by iteratively matching a sub-string and recognizing the characters adjacent to it.
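The iterative recognize-by-matching loop can be sketched in miniature. The snippet below is an illustrative simplification, not the paper's implementation: `predict_next` stands in for the cross-attention matcher (here a toy oracle that looks characters up in a known string), and only the forward (next-character) query is modeled, whereas SMTR additionally maintains a previous-character query for bidirectional decoding.

```python
def decode_out_of_length(predict_next, k=5, max_len=200):
    """Iteratively extend the recognized text: at each step the last k
    tokens form the sub-string query, and the matcher predicts the
    character that follows that sub-string in the image."""
    tokens = ["<s>"]                      # start-of-sequence marker
    while len(tokens) - 1 < max_len:
        sub = tokens[-k:]                 # sliding sub-string window
        nxt = predict_next(sub)           # matcher module stands in here
        if nxt == "</s>":                 # end-of-sequence predicted
            break
        tokens.append(nxt)
    return "".join(tokens[1:])

# Toy oracle "matcher": locates the sub-string in a ground-truth text
# and returns the character that follows it.
target = "OUTOFLENGTHTEXTRECOGNITION"

def toy_predict(sub):
    core = "".join(t for t in sub if t != "<s>")
    pos = target.find(core) + len(core) if core else 0
    return target[pos] if pos < len(target) else "</s>"

recognized = decode_out_of_length(toy_predict)  # → "OUTOFLENGTHTEXTRECOGNITION"
```

Note that the loop body never depends on the absolute position of a character in the line, only on the local sub-string, which is what makes the scheme length-agnostic.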

Methodology

SMTR leverages:

  • Sub-String Encoding: encodes a sub-string into two types of queries (next and previous) that drive sub-string matching and inference within the image.
  • Sub-String Matcher: uses multi-head cross-attention to locate the sub-string's position in the image features, guiding the prediction of its adjacent characters.
  • Regularization Training: resolves confusion between highly similar sub-strings by compelling the model to highlight subtle distinctions between them.
  • Inference Augmentation: applied during decoding to mitigate confusion caused by identical sub-strings within the same text and to improve recognition accuracy and efficiency.
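As a rough illustration of the cross-attention at the heart of the matcher, the NumPy sketch below shows a single-head query attending over a sequence of image features. All names, dimensions, and the way the next/previous queries are derived here are illustrative assumptions, not the paper's architecture (SMTR uses learned multi-head modules):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, feats, wq, wk, wv):
    """Single-head scaled dot-product cross-attention: one query vector
    attends to a (T, d) sequence of visual features and returns a
    context vector plus the attention weights over the T positions."""
    q = query @ wq                                   # (d,)
    k = feats @ wk                                   # (T, d)
    v = feats @ wv                                   # (T, d)
    att = softmax(k @ q / np.sqrt(q.shape[-1]))      # (T,) sums to 1
    return att @ v, att

rng = np.random.default_rng(0)
d, T = 16, 32
feats = rng.normal(size=(T, d))       # visual features from the encoder
sub_emb = rng.normal(size=d)          # pooled embedding of the current sub-string
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

# Hypothetical "next" and "previous" queries derived from the same
# sub-string embedding (stand-ins for the paper's two learned queries).
next_q, prev_q = sub_emb + 1.0, sub_emb - 1.0
ctx_next, att_next = cross_attention(next_q, feats, wq, wk, wv)
ctx_prev, att_prev = cross_attention(prev_q, feats, wq, wk, wv)
```

In the full model, `ctx_next` and `ctx_prev` would feed a classification head to predict the next and previous characters, while the attention weights implicitly localize the sub-string in the image.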

Evaluation and Results

The paper introduces the Long Text Benchmark (LTB), isolating longer text instances from various STR datasets to specifically evaluate long text recognition capabilities. Empirical evaluations reveal that SMTR achieves superior performance over existing attention-based and CTC-based methods on LTB, with significant improvements in accurately recognizing long texts compared to models leveraging absolute position embeddings. SMTR also demonstrates competitive results on short text benchmarks, highlighting its versatility and robustness across different text lengths.
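A benchmark of this kind can be assembled, in the simplest case, by filtering existing datasets on label length. The sketch below assumes a plain character-count threshold; the value 25 and the sample format are illustrative, and the paper's actual LTB construction criteria may differ:

```python
def build_long_text_subset(samples, min_len=25):
    """Keep only samples whose label is at least min_len characters.
    samples: iterable of (image_path, label) pairs."""
    return [(img, lab) for img, lab in samples if len(lab) >= min_len]

# Hypothetical pool mixing word-level and line-level annotations.
pool = [("a.png", "STOP"),
        ("b.png", "OUT OF LENGTH TEXT RECOGNITION")]
long_subset = build_long_text_subset(pool)  # keeps only b.png
```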

Implications and Future Directions

The implications of this research are substantial for the field of STR, particularly in applications that require understanding multiple words in line-level contexts. The SMTR paradigm signifies a shift towards more dynamic and adaptable text recognition systems, especially in environments lacking adequate length variation in training datasets.

From a theoretical standpoint, the success of sub-string matching and regularization demonstrates the powerful potential of combining traditional pattern recognition techniques with modern attention mechanisms. Future research could explore optimizing the computational efficiency of SMTR, as the inference process remains iterative. Additionally, addressing other complex text layouts across various languages could benefit from SMTR’s adaptable framework, providing a broader applicability of this method in real-world scenarios where text is abundant and diverse in length.
