SVIPTR: Fast and Efficient Scene Text Recognition with Vision Permutable Extractor (2401.10110v5)

Published 18 Jan 2024 in cs.CV

Abstract: Scene Text Recognition (STR) is an important and challenging upstream task for building structured information databases, which involves recognizing text within images of natural scenes. Although current state-of-the-art (SOTA) models for STR exhibit high performance, they typically suffer from low inference efficiency due to their reliance on hybrid architectures composed of visual encoders and sequence decoders. In this work, we propose a VIsion Permutable extractor for fast and efficient Scene Text Recognition (SVIPTR), which achieves an impressive balance between high performance and rapid inference speeds in the domain of STR. Specifically, SVIPTR leverages a visual-semantic extractor with a pyramid structure, characterized by the permutation and combination of local and global self-attention layers. This design results in a lightweight and efficient model whose inference is insensitive to input length. Extensive experimental results on various standard datasets for both Chinese and English scene text recognition validate the superiority of SVIPTR. Notably, the SVIPTR-T (Tiny) variant delivers highly competitive accuracy on par with other lightweight models and achieves SOTA inference speeds. Meanwhile, the SVIPTR-L (Large) variant attains SOTA accuracy among single-encoder-type models, while maintaining a low parameter count and favorable inference speed. Our proposed method provides a compelling solution for the STR challenge, which greatly benefits real-world applications requiring fast and efficient STR. The code is publicly available at https://github.com/cxfyxl/VIPTR.
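
To make the phrase "permutation and combination of local and global self-attention layers" more concrete, the sketch below illustrates the two ingredients in a generic way: a mask that distinguishes local (windowed) from global attention, and an enumeration of the distinct orderings of local and global layers within one stage. It is not taken from the SVIPTR code. The function names `attention_mask` and `layer_orderings`, the 1D sequence, and the window size are all hypothetical choices made for illustration; the abstract does not specify how the layers are defined or arranged.

```python
from itertools import permutations


def attention_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean mask (True = position i may attend to j).

    window=None gives global attention (every position sees every other).
    An integer window gives local attention restricted to |i - j| <= window.
    Both the 1D layout and the window rule are illustrative assumptions.
    """
    return [
        [window is None or abs(i - j) <= window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def layer_orderings(num_local, num_global):
    """List the distinct orderings of 'L' (local) and 'G' (global) layers.

    Hypothetical helper: it only enumerates candidate arrangements and says
    nothing about which arrangement the paper actually uses.
    """
    layers = "L" * num_local + "G" * num_global
    return sorted({"".join(p) for p in permutations(layers)})


local_mask = attention_mask(5, window=1)   # each token sees its neighbours only
global_mask = attention_mask(5)            # each token sees the whole sequence
orderings = layer_orderings(2, 1)          # arrangements of two local, one global

for row in local_mask:
    print("".join("x" if allowed else "." for allowed in row))
print(orderings)
```

In a sketch like this, a local layer restricts each position to a small neighbourhood while a global layer lets every position attend to the whole sequence; a model search would then compare the candidate orderings per pyramid stage.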

Authors (11)
  1. Xianfu Cheng (9 papers)
  2. Weixiao Zhou (5 papers)
  3. Xiang Li (1002 papers)
  4. Xiaoming Chen (140 papers)
  5. Jian Yang (503 papers)
  6. Tongliang Li (18 papers)
  7. Zhoujun Li (122 papers)
  8. Hang Zhang (164 papers)
  9. Tao Sun (143 papers)
  10. Wei Zhang (1489 papers)
  11. Yuying Mai (2 papers)
Citations (1)