Chinese Text in the Wild Chinese Character Dataset
License: Other

Chinese Text in the Wild is a large dataset of Chinese text appearing in natural images. It contains 32,285 images with 1,018,402 Chinese characters, far more than previous datasets of this kind. The images come from Tencent Street View and were captured in dozens of cities across China without targeting any particular kind of scene.
Because of its diversity and complexity, the dataset is extremely challenging. It includes planar text, raised text, text in urban and rural settings, text under low illumination, distant text, partially occluded text, and more.
In every image, all Chinese characters are annotated by experts. Each character is labeled with its underlying character, a bounding box, and 6 attributes indicating whether it is occluded, has a complex background, is distorted, is 3D (raised) text, is artistic text, or is handwritten.
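To make the per-character annotation concrete, here is a minimal Python sketch of one character record and a helper that tallies the 6 attributes across a set of records. The class name `CharAnnotation`, the attribute keys, and the `(x, y, width, height)` bounding-box layout are hypothetical choices for illustration; the dataset's released annotation files may use a different schema.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical keys for the 6 per-character attributes; the official
# annotation files may name them differently.
ATTRIBUTES = (
    "occluded",
    "complex_background",
    "distorted",
    "raised",       # 3D text
    "artistic",
    "handwritten",
)


@dataclass(frozen=True)
class CharAnnotation:
    text: str                                # the underlying Chinese character
    bbox: Tuple[float, float, float, float]  # assumed (x, y, width, height) in pixels
    attributes: FrozenSet[str] = frozenset()  # subset of ATTRIBUTES that apply

    def __post_init__(self) -> None:
        # Each record describes exactly one character.
        if len(self.text) != 1:
            raise ValueError("each annotation covers exactly one character")
        unknown = self.attributes - set(ATTRIBUTES)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")


def count_by_attribute(chars: List[CharAnnotation]) -> Dict[str, int]:
    """Count how many characters carry each attribute."""
    counts = {a: 0 for a in ATTRIBUTES}
    for c in chars:
        for a in c.attributes:
            counts[a] += 1
    return counts


if __name__ == "__main__":
    chars = [
        CharAnnotation("中", (10, 20, 32, 32), frozenset({"occluded"})),
        CharAnnotation("国", (44, 20, 32, 32), frozenset({"occluded", "raised"})),
        CharAnnotation("人", (78, 20, 30, 32)),
    ]
    print(count_by_attribute(chars))
```

Storing the attributes as a set of flags rather than six separate booleans keeps the record compact and makes per-attribute statistics, such as the share of occluded characters, a simple count.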