DTrOCR: Decoder-only Transformer for Optical Character Recognition (2308.15996v1)

Published 30 Aug 2023 in cs.CV

Abstract: Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative LLM that is pre-trained on a large corpus. We examined whether a generative LLM that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.

Authors (1)
  1. Masato Fujitake
Citations (18)
