
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue
Release Date: 5/28/2025
Abstract

Logical reasoning is a fundamental aspect of human intelligence and an essential capability for multimodal large language models (MLLMs). Despite significant advances in multimodal reasoning, existing benchmarks fail to comprehensively evaluate reasoning abilities because they lack an explicit categorization of logical reasoning types and rest on an unclear understanding of reasoning. To address these issues, we introduce MME-Reasoning, a comprehensive benchmark designed to evaluate the reasoning ability of MLLMs, covering all three types of reasoning (i.e., inductive, deductive, and abductive) in its questions. We carefully curate the data to ensure that each question effectively evaluates reasoning ability rather than perceptual skills or knowledge breadth, and we extend the evaluation protocols to cover diverse question types. Our evaluation reveals substantial limitations of state-of-the-art MLLMs under holistic assessments of logical reasoning: even the most advanced MLLMs achieve limited performance in comprehensive logical reasoning and show notable performance imbalances across reasoning types. In addition, we conduct an in-depth analysis of approaches commonly believed to enhance reasoning abilities, such as "thinking mode" and rule-based RL. These findings highlight the critical limitations and performance imbalances of current MLLMs in diverse logical reasoning scenarios, providing comprehensive and systematic insights into the understanding and evaluation of reasoning capabilities.