ASTER: An Attentional Scene Text Recognizer with Flexible Rectification
Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications. Despite the maturity of Optical Character Recognition (OCR) systems dedicated to document text, scene text recognition remains a challenging problem. The large variations in background, appearance, and layout pose significant challenges, which traditional OCR methods cannot handle effectively.

Recent advances in scene text recognition are driven by the success of deep learning-based recognition models. Among them are methods that recognize text character by character using convolutional neural networks (CNNs), methods that classify words with CNNs [24], [26], and methods that recognize character sequences using a combination of a CNN and a recurrent neural network (RNN) [54]. In spite of their success, these methods do not explicitly address the problem of irregular text, that is, text that is not horizontal and frontal, has a curved layout, etc. Instances of irregular text frequently appear in natural scenes. As exemplified in Figure 1, typical cases include oriented text, perspective text [49], and curved text. Designed without invariance to such irregularities, previous methods often struggle to recognize such text instances.