
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Runqi Qiao, Qiuna Tan, Peiqing Yang, Yanzi Wang, Xiaowan Wang, Enhui Wan, Sitong Zhou, Guanting Dong, Yuchen Zeng, Yida Xu, Jie Wang, Chong Sun, Chen Li, Honggang Zhang
Abstract

Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this paper, we introduce We-Math 2.0, a unified system that integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to comprehensively enhance the mathematical reasoning abilities of MLLMs. The key contributions of We-Math 2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level hierarchical system encompassing 491 knowledge points and 1,819 fundamental principles. (2) MathBook-Standard & Pro: We develop MathBook-Standard, a dataset that ensures broad conceptual coverage and flexibility through dual expansion. Additionally, we define a three-dimensional difficulty space and generate 7 progressive variants per problem to build MathBook-Pro, a challenging dataset for robust training. (3) MathBook-RL: We propose a two-stage RL framework comprising: (i) Cold-Start Fine-tuning, which aligns the model with knowledge-oriented chain-of-thought reasoning; and (ii) Progressive Alignment RL, leveraging average-reward learning and dynamic data scheduling to achieve progressive alignment across difficulty levels. (4) MathBookEval: We introduce a comprehensive benchmark covering all 491 knowledge points with diverse reasoning step distributions. Experimental results show that MathBook-RL performs competitively with existing baselines on four widely-used benchmarks and achieves strong results on MathBookEval, suggesting promising generalization in mathematical reasoning.
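
The abstract does not say how a three-dimensional difficulty space yields exactly 7 variants per problem. One reading consistent with the numbers, offered here only as an assumption, is that each variant raises difficulty along a non-empty subset of the three dimensions, which gives 2^3 - 1 = 7 combinations. The sketch below enumerates those combinations; the axis names `dim_a`, `dim_b`, and `dim_c` are hypothetical placeholders, since the abstract does not name the dimensions, and the function is illustrative rather than the authors' generation procedure.

```python
from itertools import combinations

# Hypothetical axis names: the abstract does not name the three dimensions.
DIMENSIONS = ("dim_a", "dim_b", "dim_c")


def difficulty_variants(dimensions):
    """Return every non-empty subset of `dimensions`, smallest subsets first.

    Assumption (not stated in the abstract): each variant increases
    difficulty along exactly the dimensions in one such subset.
    """
    variants = []
    for size in range(1, len(dimensions) + 1):
        variants.extend(combinations(dimensions, size))
    return variants


variants = difficulty_variants(DIMENSIONS)
print(len(variants))  # 7, i.e. 2**3 - 1
for variant in variants:
    print(variant)
```

Ordering the subsets by size gives one plausible notion of "progressive": variants that change a single dimension come before those that change two, and the variant that changes all three comes last.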