Date

9 months ago

Size

6.43 GB

Organization

Paper URL

2501.00321

Tags

OCR

OCRBench-v2 is an optical character recognition (OCR) evaluation benchmark for large multimodal models (LMMs), released in 2025 by Huazhong University of Science and Technology, South China University of Technology, ByteDance, and other institutions. The accompanying paper, "OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning", evaluates the OCR capabilities of LMMs across a range of text-related tasks. The dataset is a large-scale upgrade of OCRBench. It includes 10,000 manually verified Chinese-English question-answer pairs as a public test set, plus a private test set of 1,500 manually annotated text-rich images drawn from a variety of sources, including print books, e-books, scanned documents, and web content. The data covers 31 typical text scenarios and 23 subtasks, grouped into eight core OCR capabilities: text recognition, text detection, text reference location, relationship extraction, element parsing, mathematical operations, visual-text understanding, and knowledge reasoning.

OCRBenchv2.torrent

Seeding: 2, Downloading: 0, Completed: 38, Total Downloads: 159

OCRBenchv2/
- README.md
  1.81 KB
- README.txt
  3.62 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

2 months ago

CHIMERA General Inference Synthetic Dataset

4 months ago

THINGS-EEG EEG Dataset

5 months ago

THINGS-MEG Magnetoencephalography Dataset

5 months ago

THINGS-fMRI Functional Magnetic Resonance Imaging Dataset

5 months ago

CL-bench Context Learning Evaluation Benchmark Dataset

4 months ago

LightOnOCR-mix-0126 Text Transcription Dataset

5 months ago

Nemotron-Math-v2 Mathematical Inference Dataset

5 months ago

GroundingME Complex Scene Understanding Evaluation Dataset

6 months ago

MCIF Multimodal Cross-Language Instruction Following Dataset

6 months ago

TxT360-3efforts Multi-Task Inference Dataset

6 months ago
