EAST: An Efficient and Accurate Scene Text Detector

Previous approaches for scene text detection have already achieved promising performances across various benchmarks. However, they usually fall short when dealing with challenging scenarios, even when equipped with deep neural network models, because the overall performance is determined by the interplay of multiple stages and components in the pipelines. In this work, we propose a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes. The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning), with a single neural network. The simplicity of our pipeline allows concentrating efforts on designing loss functions and neural network architecture. Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2 fps at 720p resolution.