OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Abstract
Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4x more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios), and thorough evaluation metrics, with 10,000 human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with 1,500 manually annotated images. The consistent evaluation trends observed across both public and private test sets validate OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below 50 (out of 100) and suffer from five types of limitations: recognition of less frequently encountered text, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The project website is at: https://99franklin.github.io/ocrbench_v2/