3 months ago

Runqi Qiao, Qiuna Tan, Minghan Yang, Guanting Dong, Peiqing Yang, Shiqiang Lang, Enhui Wan, Xiaowan Wang, Yida Xu, Lan Yang, and 3 more

Abstract

Empowering Large Multimodal Models (LMMs) to deeply integrate image interaction with long-horizon reasoning capabilities remains a long-standing challenge in this field. Recent advances in vision-centric reasoning explore a promising "Thinking with Images" paradigm for LMMs, marking a shift from image-assisted reasoning to image-interactive thinking. While this milestone enables models to focus on fine-grained image regions, progress remains constrained by limited visual tool spaces and task-specific workflow designs. To bridge this gap, we present V-Thinker, a general-purpose multimodal reasoning assistant that enables interactive, vision-centric thinking through end-to-end reinforcement learning. V-Thinker comprises two key components: (1) a Data Evolution Flywheel that automatically synthesizes, evolves, and verifies interactive reasoning datasets across three dimensions: diversity, quality, and difficulty; and (2) a Visual Progressive Training Curriculum that first aligns perception via point-level supervision, then integrates interactive reasoning through a two-stage reinforcement learning framework. Furthermore, we introduce VTBench, an expert-verified benchmark targeting vision-centric interactive reasoning tasks. Extensive experiments demonstrate that V-Thinker consistently outperforms strong LMM-based baselines in both general and interactive reasoning scenarios, providing valuable insights for advancing image-interactive reasoning applications.

V-Thinker: Interactive Thinking with Images
