HyperAI

CC-OCR Text Recognition Dataset

Date

3 months ago

Size

1.49 GB

Organization

Huazhong University of Science and Technology
South China University of Technology

Publish URL

github.com

The CC-OCR dataset was jointly developed in 2024 by Alibaba Group, Huazhong University of Science and Technology, and South China University of Technology as a comprehensive and challenging benchmark for evaluating large multimodal models on text recognition (OCR) tasks. The associated paper is "CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy".

The dataset covers four core tasks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. It contains 39 subsets and 7,058 fully annotated images. CC-OCR targets a gap in current multimodal model evaluation, namely complex document structures and fine-grained visual challenges, and is intended to support the progress of these models in practical applications.

CC-OCR.torrent
Seeding: 1 | Downloading: 1 | Completed: 52 | Total downloads: 94
  • CC-OCR/
    • README.md
      1.52 KB
    • README.txt
      3.04 KB
    • data/
      • CC-OCR.zip
        1.49 GB