DTrOCR: Decoder-only Transformer for Optical Character Recognition

Fujitake, Masato
Abstract

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
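
To make the decoder-only idea concrete, the following is a minimal sketch of how an image and its text could share one input sequence under a causal attention mask, with no separate encoder. It is an illustration under stated assumptions, not the method described in the paper: the abstract does not say how the image enters the decoder, so the patch splitting, the linear projection `w_patch`, the embedding table `tok_emb`, and all sizes below are hypothetical choices.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W) grayscale image into flattened, non-overlapping patches."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    rows, cols = h // patch, w // patch
    return (image.reshape(rows, patch, cols, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(rows * cols, patch * patch))

def build_decoder_input(image, token_ids, patch, w_patch, tok_emb):
    """Return one sequence (image patches first, then text tokens) and a causal mask.

    Assumption: patches are embedded by a single linear projection; the
    abstract does not specify this step.
    """
    img_seq = patchify(image, patch) @ w_patch   # (n_patches, d_model)
    txt_seq = tok_emb[token_ids]                 # (n_tokens, d_model)
    seq = np.concatenate([img_seq, txt_seq], axis=0)
    n = seq.shape[0]
    # Position i may attend only to positions j <= i, as in a generative language model.
    causal_mask = np.tril(np.ones((n, n), dtype=bool))
    return seq, causal_mask

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
patch, d_model, vocab = 8, 16, 100
image = rng.random((32, 128))                    # a text-line image
w_patch = rng.normal(size=(patch * patch, d_model))
tok_emb = rng.normal(size=(vocab, d_model))
token_ids = np.array([1, 42, 7])                 # placeholder text token ids

seq, causal_mask = build_decoder_input(image, token_ids, patch, w_patch, tok_emb)
print(seq.shape, causal_mask.shape)
```

In this layout every text position can attend to all image patches and to earlier text tokens, so a single stack of decoder blocks would handle both feature extraction and text generation. The decoder blocks themselves and the pre-trained weights are omitted here.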