HyperAI

OCRBench Text Recognition Benchmark Dataset

Date

2 days ago

Size

60.8 MB

Organization

Huazhong University of Science and Technology

Publish URL

huggingface.co

OCRBench is a text recognition benchmark dataset released by Huazhong University of Science and Technology and Microsoft Research. It is an evaluation benchmark for the optical character recognition (OCR) capabilities of large multimodal models (LMMs). It was introduced in the paper "OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models" and evaluates LMMs across different text-related tasks.

The dataset contains 1000 manually screened and corrected question-answer pairs from five representative text-related tasks: text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER).

The data includes:

  • Text recognition: 300 images (including regular, irregular, artistic, and other text types).
  • Scene text-centric visual question answering: 200 questions.
  • Document-oriented visual question answering: 200 questions.
  • Key information extraction: 200 questions.
  • Handwritten mathematical expression recognition: 100 images from the HME100K dataset.
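
As a quick sanity check on the breakdown above, the per-task counts can be tallied to confirm they add up to the 1000 question-answer pairs stated earlier. This is only an illustrative sketch: the dictionary keys are short labels chosen here for readability, not field names from the dataset files, and `task_shares` is a hypothetical helper.

```python
# Per-task counts copied from the list above.
# The keys are illustrative labels, not identifiers from the dataset itself.
TASK_COUNTS = {
    "text recognition": 300,
    "scene text-centric VQA": 200,
    "document-oriented VQA": 200,
    "key information extraction": 200,
    "handwritten mathematical expression recognition": 100,
}


def task_shares(counts):
    """Return each task's share of the benchmark as a fraction of the total."""
    total = sum(counts.values())
    return {task: n / total for task, n in counts.items()}


total = sum(TASK_COUNTS.values())
shares = task_shares(TASK_COUNTS)

print(f"total question-answer pairs: {total}")
for task, share in shares.items():
    print(f"{task}: {share:.0%}")
```

Text recognition accounts for the largest share of the benchmark, so a model's overall score is weighted most heavily toward that task if all questions count equally.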
OCRBench.torrent
  • OCRBench/
    • README.md (1.65 KB)
    • README.txt (3.3 KB)
    • data/
      • OCRBench.zip (60.8 MB)