OCRBench Text Recognition Benchmark Dataset
Date: 2 days ago
Size: 60.8 MB
OCRBench is a text recognition benchmark dataset released by Huazhong University of Science and Technology and Microsoft Research. It is an evaluation benchmark for the optical character recognition (OCR) capabilities of large multimodal models (LMMs). The associated paper is "OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models", which evaluates how well LMMs handle a range of text-related tasks.
The dataset contains 1,000 manually screened and corrected question-answer pairs drawn from five representative text-related tasks: text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER).
The data includes:
- Text recognition: 300 images (regular, irregular, artistic, and other text types).
- Scene text-centric visual question answering: 200 questions.
- Document-oriented visual question answering: 200 questions.
- Key information extraction: 200 questions.
- Handwritten mathematical expression recognition: 100 images from the HME100K dataset.
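As a quick sanity check on the breakdown above, the per-task counts can be tallied to confirm that they add up to the 1,000 question-answer pairs. This is only a sketch: the dictionary keys and the `task_shares` helper are illustrative names chosen here, not identifiers from the dataset files.

```python
# Per-task sample counts, copied from the list above.
# The key names are hypothetical labels, not field names from OCRBench itself.
TASK_COUNTS = {
    "text_recognition": 300,
    "scene_text_centric_vqa": 200,
    "document_oriented_vqa": 200,
    "key_information_extraction": 200,
    "handwritten_math_expression_recognition": 100,
}


def task_shares(counts):
    """Return each task's fraction of the total number of samples."""
    total = sum(counts.values())
    return {task: n / total for task, n in counts.items()}


total = sum(TASK_COUNTS.values())
shares = task_shares(TASK_COUNTS)

if __name__ == "__main__":
    print(f"total samples: {total}")
    for task, share in shares.items():
        print(f"{task}: {share:.0%}")
```

Text recognition makes up the largest share of the benchmark, with the three question-answering and extraction tasks weighted equally.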
OCRBench.torrent