GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Abstract
Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, and sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearance (e.g., textures, colors), object placements, and camera viewpoints. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.
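
As a rough illustration of the ingredients the abstract names, the PyTorch sketch below shows one way a VLA-style policy could consume an RGBD observation alongside a tokenized instruction and decode chain-of-thought logits before predicting an action. This is not the authors' released code: every module name, dimension, and the token layout are assumptions made purely for illustration.

# Hypothetical sketch (not GigaBrain-0's actual architecture): a minimal
# VLA-style policy over RGBD input with a CoT token head and an action head.
import torch
import torch.nn as nn

class RGBDVLAPolicy(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, n_actions=7):
        super().__init__()
        # 4-channel patch embedding (RGB + depth), so the policy sees
        # spatial geometry as well as appearance.
        self.patch_embed = nn.Conv2d(4, d_model, kernel_size=16, stride=16)
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.cot_head = nn.Linear(d_model, vocab_size)    # CoT token logits
        self.action_head = nn.Linear(d_model, n_actions)  # continuous action

    def forward(self, rgbd, instr_ids):
        # rgbd: (B, 4, H, W); instr_ids: (B, T) tokenized instruction.
        vis = self.patch_embed(rgbd).flatten(2).transpose(1, 2)  # (B, P, D)
        txt = self.tok_embed(instr_ids)                          # (B, T, D)
        h = self.backbone(torch.cat([vis, txt], dim=1))
        cot_logits = self.cot_head(h)         # supervise against CoT text
        action = self.action_head(h[:, -1])   # decode action from last state
        return cot_logits, action

policy = RGBDVLAPolicy()
cot, act = policy(torch.randn(2, 4, 224, 224), torch.randint(0, 1000, (2, 12)))
print(cot.shape, act.shape)  # torch.Size([2, 208, 1000]) torch.Size([2, 7])

In such a setup, the CoT head would be trained on intermediate reasoning text (object states, subgoals) while the action head regresses motor commands, which is one plausible reading of the "embodied CoT supervision" described above.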