A Holistic Representation Guided Attention Network for Scene Text Recognition (1904.01375v5)

Published 2 Apr 2019 in cs.CV

Abstract: Reading irregular scene text of arbitrary shape in natural images remains a challenging problem, despite recent progress. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder that is guided by a holistic representation. The holistic representation guides the attention-based decoder to focus on more accurate areas. As no recurrent module is adopted, our model can be trained in parallel: it achieves a 1.5x to 9.4x speedup in the backward pass and a 1.3x to 7.9x speedup in the forward pass compared with RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.
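The abstract describes a decoder that attends directly over 2-D CNN features, with a holistic (global) image representation steering the attention and no recurrent module in the loop. The sketch below is a minimal illustration of that idea, not the authors' implementation: the module names, the use of global average pooling as the holistic representation, the learnable per-position queries, and all layer sizes are assumptions made for the example.

```python
# Minimal sketch (not the paper's released code) of a holistic-representation-guided,
# non-recurrent attention decoder over 2-D CNN features.
import torch
import torch.nn as nn


class HolisticGuidedDecoder(nn.Module):
    def __init__(self, feat_dim=512, num_classes=97, max_len=25):
        super().__init__()
        self.max_len = max_len
        # Holistic representation: global average pooling of the 2-D feature map
        # (one simple choice; the paper's exact construction may differ).
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Learnable positional queries, one per output character slot.
        self.queries = nn.Parameter(torch.randn(max_len, feat_dim))
        # The holistic vector modulates every query, "guiding" the attention.
        self.guide = nn.Linear(feat_dim, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feat_2d):
        # feat_2d: (B, C, H, W) feature map from a CNN backbone.
        b, c, h, w = feat_2d.shape
        holistic = self.pool(feat_2d).flatten(1)              # (B, C)
        keys = feat_2d.flatten(2).transpose(1, 2)             # (B, H*W, C)
        # Broadcast the holistic guidance into the character queries.
        q = self.queries.unsqueeze(0) + self.guide(holistic).unsqueeze(1)  # (B, T, C)
        # All character positions attend to the 2-D features at once
        # (no recurrence), so training parallelizes over the sequence.
        ctx, _ = self.attn(q, keys, keys)                      # (B, T, C)
        return self.classifier(ctx)                            # (B, T, num_classes)


if __name__ == "__main__":
    cnn_feat = torch.randn(2, 512, 8, 32)        # stand-in for backbone output
    logits = HolisticGuidedDecoder()(cnn_feat)
    print(logits.shape)                          # torch.Size([2, 25, 97])
```

Because every output position is produced by a single parallel attention pass rather than a step-by-step RNN, both forward and backward passes avoid sequential dependencies, which is consistent with the speedups the abstract reports.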

Authors (6)
  1. Lu Yang (82 papers)
  2. Fan Dang (4 papers)
  3. Peng Wang (831 papers)
  4. Hui Li (1004 papers)
  5. Zhen Li (334 papers)
  6. Yanning Zhang (170 papers)
Citations (36)