
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Qinsi Wang, Bo Liu, Tianyi Zhou, Jing Shi, Yueqian Lin, Yiran Chen, Hai Helen Li, Kun Wan, Wentian Zhao
Abstract

Although reinforcement learning (RL) can effectively enhance the reasoning capabilities of vision-language models (VLMs), current methods remain heavily dependent on labor-intensive datasets that require extensive manual construction and verification, leading to extremely high training costs and consequently constraining the practical deployment of VLMs. To address this challenge, we propose Vision-Zero, a domain-agnostic framework enabling VLM self-improvement through competitive visual games generated from arbitrary image pairs. Specifically, Vision-Zero encompasses three main attributes: (1) Strategic Self-Play Framework: Vision-Zero trains VLMs in "Who Is the Spy"-style games, where the models engage in strategic reasoning and actions across multiple roles. Through interactive gameplay, models autonomously generate their training data without human annotation. (2) Gameplay from Arbitrary Images: Unlike existing gamified frameworks, Vision-Zero can generate games from arbitrary images, thereby enhancing the model's reasoning ability across diverse domains and showing strong generalization to different tasks. We demonstrate this versatility using three distinct types of image datasets: CLEVR-based synthetic scenes, charts, and real-world images. (3) Sustainable Performance Gain: We introduce Iterative Self-Play Policy Optimization (Iterative-SPO), a novel training algorithm that alternates between Self-Play and reinforcement learning with verifiable rewards (RLVR), mitigating the performance plateau often seen in self-play-only training and achieving sustained long-term improvements. Despite using label-free data, Vision-Zero achieves state-of-the-art performance on reasoning, chart question answering, and vision-centric understanding tasks, surpassing other annotation-based methods. Models and code have been released at https://github.com/wangqinsi1/Vision-Zero.
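
To make the Iterative-SPO schedule described above concrete, the sketch below shows the alternation the abstract outlines: a self-play phase in which the model generates its own game rollouts (and outcome-based rewards) from arbitrary image pairs, followed by an RLVR phase on automatically verifiable questions. Every name in the sketch (play_spy_game, verifiable_task, VLMPolicy.update, iterative_spo, and all parameters) is a hypothetical placeholder chosen for illustration, not the released Vision-Zero API; consult the repository linked above for the actual implementation.

    # Minimal sketch of an Iterative-SPO-style training loop.
    # All names here are illustrative assumptions, not the Vision-Zero code.

    import random
    from dataclasses import dataclass
    from typing import List, Sequence


    @dataclass
    class Rollout:
        """One self-generated training example: prompts, responses, reward."""
        prompts: List[str]
        responses: List[str]
        reward: float


    class VLMPolicy:
        """Stand-in for the trainable VLM; update() would apply a
        policy-gradient step (e.g. a GRPO/PPO update) on the rollouts."""

        def update(self, rollouts: Sequence[Rollout]) -> None:
            raise NotImplementedError  # placeholder for the real optimizer step


    def play_spy_game(policy: VLMPolicy, image_pair) -> Rollout:
        """Run one 'Who Is the Spy'-style game on an arbitrary image pair.
        The reward comes from the game outcome, so no human labels are needed."""
        raise NotImplementedError  # placeholder for the game environment


    def verifiable_task(policy: VLMPolicy, image_pair) -> Rollout:
        """Answer an automatically checkable question derived from the images
        (RLVR): reward 1.0 if the answer verifies, else 0.0."""
        raise NotImplementedError  # placeholder for the verifiable-reward task


    def iterative_spo(policy: VLMPolicy, image_pairs, rounds: int = 10,
                      selfplay_steps: int = 200, rlvr_steps: int = 200) -> VLMPolicy:
        """Alternate self-play and RLVR phases, as the abstract describes,
        to avoid the plateau that self-play-only training tends to hit."""
        for _ in range(rounds):
            # Phase 1: label-free self-play on arbitrary image pairs.
            for _ in range(selfplay_steps):
                policy.update([play_spy_game(policy, random.choice(image_pairs))])
            # Phase 2: reinforcement learning with verifiable rewards.
            for _ in range(rlvr_steps):
                policy.update([verifiable_task(policy, random.choice(image_pairs))])
        return policy

The design intuition, as stated in the abstract, is that the verifiable-reward phase supplies an external training signal when the self-play reward alone would plateau, while the self-play phase keeps generating fresh, annotation-free data.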