A Holistic Representation Guided Attention Network for Scene Text Recognition (1904.01375v5)

Published 2 Apr 2019 in cs.CV

Abstract: Reading irregular scene text of arbitrary shape in natural images remains a challenging problem, despite recent progress. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder that is guided by a holistic representation. The holistic representation guides the attention-based decoder to focus on more accurate areas. As no recurrent module is adopted, our model can be trained in parallel: it achieves a 1.5x to 9.4x speedup in the backward pass and a 1.3x to 7.9x speedup in the forward pass compared with RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.
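The abstract describes a decoder that attends directly over 2-D CNN features, with a holistic (global) image representation steering the attention and no recurrent module in the loop. The sketch below is a minimal illustration of that idea, not the authors' implementation: the module names, the use of global average pooling as the holistic representation, the learnable per-position queries, and all layer sizes are assumptions made for the example.

```python
# Minimal sketch (not the paper's released code) of a holistic-representation-guided,
# non-recurrent attention decoder over 2-D CNN features.
import torch
import torch.nn as nn


class HolisticGuidedDecoder(nn.Module):
    def __init__(self, feat_dim=512, num_classes=97, max_len=25):
        super().__init__()
        self.max_len = max_len
        # Holistic representation: global average pooling of the 2-D feature map
        # (one simple choice; the paper's exact construction may differ).
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Learnable positional queries, one per output character slot.
        self.queries = nn.Parameter(torch.randn(max_len, feat_dim))
        # The holistic vector modulates every query, "guiding" the attention.
        self.guide = nn.Linear(feat_dim, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feat_2d):
        # feat_2d: (B, C, H, W) feature map from a CNN backbone.
        b, c, h, w = feat_2d.shape
        holistic = self.pool(feat_2d).flatten(1)              # (B, C)
        keys = feat_2d.flatten(2).transpose(1, 2)             # (B, H*W, C)
        # Broadcast the holistic guidance into the character queries.
        q = self.queries.unsqueeze(0) + self.guide(holistic).unsqueeze(1)  # (B, T, C)
        # All character positions attend to the 2-D features at once
        # (no recurrence), so training parallelizes over the sequence.
        ctx, _ = self.attn(q, keys, keys)                      # (B, T, C)
        return self.classifier(ctx)                            # (B, T, num_classes)


if __name__ == "__main__":
    cnn_feat = torch.randn(2, 512, 8, 32)        # stand-in for backbone output
    logits = HolisticGuidedDecoder()(cnn_feat)
    print(logits.shape)                          # torch.Size([2, 25, 97])
```

Because every output position is produced by a single parallel attention pass rather than a step-by-step RNN, both forward and backward passes avoid sequential dependencies, which is consistent with the speedups the abstract reports.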

Authors (6)
  1. Lu Yang (82 papers)
  2. Fan Dang (4 papers)
  3. Peng Wang (831 papers)
  4. Hui Li (1004 papers)
  5. Zhen Li (334 papers)
  6. Yanning Zhang (170 papers)
Citations (36)