3 months ago

Harold Haodong Chen, Disen Lan, Wen-Jie Shu, Qingyang Liu, Zihan Wang, Sirui Chen, Wenkai Cheng, Kanghao Chen, Hongfei Zhang, Zixin Zhang, and 3 more

Abstract

The rapid evolution of video generative models has shifted their focus from producing visually plausible outputs to tackling tasks requiring physical plausibility and logical consistency. However, despite recent breakthroughs such as Veo 3's chain-of-frames reasoning, it remains unclear whether these models can exhibit reasoning capabilities similar to large language models (LLMs). Existing benchmarks predominantly evaluate visual fidelity and temporal coherence, failing to capture higher-order reasoning abilities. To bridge this gap, we propose TiViBench, a hierarchical benchmark specifically designed to evaluate the reasoning capabilities of image-to-video (I2V) generation models. TiViBench systematically assesses reasoning across four dimensions: i) Structural Reasoning & Search, ii) Spatial & Visual Pattern Reasoning, iii) Symbolic & Logical Reasoning, and iv) Action Planning & Task Execution, spanning 24 diverse task scenarios across 3 difficulty levels. Through extensive evaluations, we show that commercial models (e.g., Sora 2, Veo 3.1) demonstrate stronger reasoning potential, while open-source models reveal untapped potential that remains hindered by limited training scale and data diversity. To further unlock this potential, we introduce VideoTPO, a simple yet effective test-time strategy inspired by preference optimization. By performing LLM self-analysis on generated candidates to identify strengths and weaknesses, VideoTPO significantly enhances reasoning performance without requiring additional training, data, or reward models. Together, TiViBench and VideoTPO pave the way for evaluating and advancing reasoning in video generation models, setting a foundation for future research in this emerging field.

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
