a day ago

Qiyue Gao, Kun Zhou, Jiannan Xiang, Zihan Liu, Dequan Yang, Junrong Chen, Arif Ahmad, Cong Zeng, Ganesh Bannur, Xinqi Huang

Abstract

World models (WMs) are intended to serve as internal simulators of the real world that enable agents to understand, anticipate, and act upon complex environments. Existing WM benchmarks remain narrowly focused on next-state prediction and visual fidelity, overlooking the richer simulation capabilities required for intelligent behavior. To address this gap, we introduce WR-Arena, a comprehensive benchmark for evaluating WMs along three fundamental dimensions of next world simulation: (i) Action Simulation Fidelity, the ability to interpret and follow semantically meaningful, multi-step instructions and generate diverse counterfactual rollouts; (ii) Long-horizon Forecast, the ability to sustain accurate, coherent, and physically plausible simulations across extended interactions; and (iii) Simulative Reasoning and Planning, the ability to support goal-directed reasoning by simulating, comparing, and selecting among alternative futures in both structured and open-ended environments. We build a task taxonomy and curate diverse datasets designed to probe these capabilities, moving beyond single-turn and perceptual evaluations. Through extensive experiments with state-of-the-art WMs, our results expose a substantial gap between current models and human-level hypothetical reasoning, and establish WR-Arena as both a diagnostic tool and a guideline for advancing next-generation world models capable of robust understanding, forecasting, and purposeful action.

World Reasoning Arena

Qiyue Gao, Kun Zhou, Jiannan Xiang, Zihan Liu, Dequan Yang, Junrong Chen, Arif Ahmad, Cong Zeng, Ganesh Bannur, Xinqi Huang, and 7 more
