HyperAI

Chinese Text in the Wild Chinese Character Dataset

Date

3 years ago

Organization

License

Other

Categories

Chinese Text in the Wild is a large dataset of Chinese text in natural images. It contains 32,285 images with 1,018,402 Chinese character instances, far more than previous datasets. The images come from Tencent Street View and were captured in dozens of different cities in China, without preference for any particular purpose.

The diversity and complexity of the images make this dataset extremely challenging. It includes planar text, raised text, text in urban and rural scenes, poorly lit text, distant text, partially occluded text, and more.

In each image, all Chinese characters are annotated by experts. Each character instance is labeled with its underlying character, its bounding box, and six attributes indicating whether it is occluded, has a complex background, is distorted, is 3D (raised) text, is artistic text, or is handwritten.
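
To make the annotation structure concrete, here is a minimal Python sketch of how one character annotation could be represented in memory. The class name, field names, and the (x, y, width, height) box layout are assumptions chosen for illustration; they are not the dataset's actual file format, which should be checked against the official release.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class CharAnnotation:
    """Hypothetical record for one annotated character (names are illustrative)."""
    text: str                                 # the underlying character
    bbox: Tuple[float, float, float, float]   # assumed layout: (x, y, width, height) in pixels
    # The six yes/no attributes described in the text above.
    occluded: bool = False
    complex_background: bool = False
    distorted: bool = False
    raised_3d: bool = False
    artistic: bool = False
    handwritten: bool = False


# Attribute field names, in the order they are listed in the description.
ATTRIBUTE_NAMES = (
    "occluded",
    "complex_background",
    "distorted",
    "raised_3d",
    "artistic",
    "handwritten",
)


def active_attributes(ann: CharAnnotation) -> List[str]:
    """Return the names of the attributes that are set for this character."""
    return [name for name in ATTRIBUTE_NAMES if getattr(ann, name)]


def count_by_attribute(annotations: Iterable[CharAnnotation]) -> Dict[str, int]:
    """Count how many characters carry each attribute."""
    counts = {name: 0 for name in ATTRIBUTE_NAMES}
    for ann in annotations:
        for name in active_attributes(ann):
            counts[name] += 1
    return counts


# Made-up sample data, not taken from the dataset.
sample = [
    CharAnnotation("店", (10.0, 20.0, 32.0, 32.0), occluded=True),
    CharAnnotation("街", (50.0, 20.0, 30.0, 34.0), artistic=True, distorted=True),
]
```

Keeping the six attributes as separate boolean fields makes it straightforward to select hard subsets, for example only occluded or only handwritten characters, when evaluating a recognizer.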