DTrOCR: Decoder-only Transformer for Optical Character Recognition (2308.15996v1)

Published 30 Aug 2023 in cs.CV

Abstract: Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative LLM that is pre-trained on a large corpus. We examined whether a generative LLM that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.

Authors (1)

Masato Fujitake (8 papers)

Citations (18)


