BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

Zhiheng Xi, Guanyu Li, Yutao Fan, Honglin Guo, Yufang Liu, Xiaoran Fan, Jiaqi Liu, Jingchao Ding, Wangmeng Zuo, Zhenfei Yin, Lei Bai, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract

In this paper, we introduce BMMR, a large-scale bilingual, multimodal, multi-disciplinary reasoning dataset for the community to develop and evaluate large multimodal models (LMMs). BMMR comprises 110k college-level questions spanning 300 UNESCO-defined subjects, in diverse formats (multiple-choice, fill-in-the-blank, and open-ended QA), sourced from both print and digital media such as books, exams, and quizzes. All data are curated and filtered via a scalable human-in-the-loop framework, and each instance is paired with a high-quality reasoning path. The dataset is organized into two parts: BMMR-Eval, which comprises 20,458 high-quality instances to comprehensively assess LMMs' knowledge and reasoning across multiple disciplines in both Chinese and English; and BMMR-Train, which contains 88,991 instances to support further research and development, extending the current focus on mathematical reasoning to diverse disciplines and domains. In addition, we propose a process-based multi-discipline verifier (i.e., BMMR-Verifier) for accurate and fine-grained evaluation of reasoning paths. Extensive experiments on 24 models reveal that (i) even SOTA models (e.g., o3 and Gemini-2.5-Pro) leave substantial headroom on BMMR-Eval; (ii) reasoning models exhibit discipline bias and outperform LMMs only on specific subjects; (iii) open-source models still trail their proprietary counterparts; and (iv) fine-tuning on BMMR-Train narrows this gap. Additionally, we conduct reasoning-chain analyses using BMMR-Verifier and other in-depth studies, uncovering the challenges LMMs currently face in multidisciplinary reasoning. We will release the data, and we hope our work can offer insights and contributions to the community.