3 months ago

Yiyang Zhou Haoqin Tu Zijun Wang Zeyu Wang Niklas Muennighoff Fan Nie Yejin Choi James Zou Chaorui Deng Shen Yan

Abstract

We propose MIRA, a new benchmark designed to evaluate models in scenarios where generating intermediate visual images is essential for successful reasoning. Unlike traditional CoT methods that rely solely on text, tasks in MIRA require models to generate and use intermediate images - such as sketches, structural diagrams, or path drawings - to guide their reasoning process. This setup closely mirrors how humans solve complex problems through "drawing to think". Accordingly, MIRA focuses on tasks that are intrinsically challenging and involve complex structures, spatial relationships, or reasoning steps that are difficult to express through language alone. To ensure high-quality evaluation data, we include 546 multimodal problems, annotated with intermediate visual images and final answers. We also propose a unified evaluation protocol for MIRA that spans three levels of evaluation input: direct input with image and question only, text-only CoT input with image and thinking prompts, and Visual-CoT input with both annotated image clues and textual thinking prompts. To probe the upper bound of model capacity on our benchmark, we also report pass@k and majority voting accuracies under different k settings. Experimental results show that existing multimodal large language models, including the strongest private models as well as strong open-weight models, perform poorly when relying solely on textual prompts. However, when intermediate visual cues are provided, model performance improves consistently, yielding an average relative gain of 33.7% across all models and tasks. We also probe the upper bound by expanding the search space and by designing textual prompts aligned with Visual-CoT, but both yield only limited improvements compared to our Visual-CoT setting. These results underscore the critical role of imagined visual information in enabling successful reasoning on MIRA.
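
The abstract names three quantities without defining them: pass@k, majority voting accuracy, and relative gain. The Python sketch below shows one common way to compute each. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say which pass@k estimator is used, so the widely used unbiased form 1 - C(n-c, k) / C(n, k) is assumed here, and the function names `pass_at_k`, `majority_vote`, and `relative_gain` are hypothetical.

```python
from collections import Counter
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled answers of which c are correct.

    Assumption: the unbiased estimator 1 - C(n - c, k) / C(n, k), i.e. the
    probability that a random subset of k samples contains a correct one.
    """
    if not (0 < k <= n) or not (0 <= c <= n):
        raise ValueError("require 0 < k <= n and 0 <= c <= n")
    if n - c < k:
        # Fewer than k wrong samples: every subset of size k has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("answers must be non-empty")
    return Counter(answers).most_common(1)[0][0]


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (improved - baseline) / baseline
```

For example, with 4 sampled answers of which 1 is correct, `pass_at_k(4, 1, 2)` evaluates to 0.5, and `majority_vote(["A", "B", "A"])` returns `"A"`. Under this definition of relative gain, a reported gain of 33.7% corresponds to `relative_gain(baseline, improved)` being about 0.337 when averaged over models and tasks; how the paper averages is not stated in the abstract.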

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
