HyperAI

NVIDIA Cosmos 3 advances Physical AI reasoning and models

NVIDIA has unveiled Cosmos 3, a frontier foundation model designed to advance Physical AI by integrating physical reasoning, world generation, and action prediction into a single unified architecture. The release aims to equip robots, autonomous vehicles, and smart environments with the ability to understand their surroundings, forecast future events, and generate appropriate actions. Unlike previous iterations, which required separate models for different tasks, Cosmos 3 uses a Mixture-of-Transformers architecture to handle reasoning and generation simultaneously. This simplifies development and removes the need for complex orchestration between multiple inference pipelines.

The company is open-sourcing the model weights, training scripts, deployment tools, and six synthetic datasets covering domains such as robotics, physics simulation, human motion, driving, and warehouse operations. The datasets are available on Hugging Face to support post-training and model adaptation. Two model sizes are currently available, Cosmos 3 Nano and Cosmos 3 Super, which offer different performance levels for different compute constraints. The model supports text, image, and video as input and output modalities, enabling applications that range from video generation for rare edge cases to action-conditioned world modeling for robotic policy learning.

To address the limitations of automated benchmarks, which struggle to differentiate high-performing models, NVIDIA introduced the Cosmos Human Evaluation framework, referred to as HUE. The system uses atomic binary verification to assess video generation quality along four dimensions: semantic alignment, physical laws, geometric reasoning, and visual integrity. By generating fact-based questions that are reviewed by human experts, HUE provides a more reliable metric for comparing model quality across physical AI domains. Early results indicate that Cosmos 3 leads open-source models on the VANTAGE-Bench, R-Bench, PAIBench-G, Physics-IQ, and RoboLab benchmarks.

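The article does not describe how HUE turns individual yes/no verdicts into a score, so the following is only a minimal sketch of what atomic binary verification could look like in code. The `AtomicCheck` schema, the category labels, the sample questions, and the unweighted averaging in `overall_score` are all assumptions made for illustration, not NVIDIA's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicCheck:
    """One yes/no question about a generated video (hypothetical schema)."""
    category: str   # e.g. "semantic_alignment" or "physical_laws"
    question: str   # a single fact-based question
    passed: bool    # the reviewer's binary verdict


def pass_rates(checks):
    """Return the fraction of passed checks for each category."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for check in checks:
        totals[check.category] += 1
        passes[check.category] += int(check.passed)
    return {category: passes[category] / totals[category] for category in totals}


def overall_score(checks):
    """Unweighted mean of the per-category pass rates (an assumed aggregation)."""
    rates = pass_rates(checks)
    return sum(rates.values()) / len(rates) if rates else 0.0


# Made-up verdicts for one generated video, covering the four dimensions
# named in the article.
checks = [
    AtomicCheck("semantic_alignment", "Does a robot arm appear in the video?", True),
    AtomicCheck("semantic_alignment", "Is the arm holding a red cube?", False),
    AtomicCheck("physical_laws", "Does the dropped cube fall downward?", True),
    AtomicCheck("geometric_reasoning", "Does the cube keep its shape when rotated?", True),
    AtomicCheck("visual_integrity", "Is the video free of flickering artifacts?", False),
]

rates = pass_rates(checks)
print(rates)
print(overall_score(checks))
```

Keeping each question atomic means a reviewer only ever answers yes or no, so per-category pass rates can be compared across models without asking anyone to assign a subjective quality rating.
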
Developers can further customize Cosmos 3 using fully open training recipes for Supervised Fine-Tuning and action post-training. These workflows allow teams to adapt the model to specific datasets, enabling capabilities such as generating future observations conditioned on robot actions, inferring actions from demonstrations, and predicting action sequences.

For production deployment, NVIDIA offers Cosmos 3 models via NIM microservices. These packaged services include optimized inference runtimes, delivering high performance without requiring manual infrastructure tuning. The Cosmos 3 Reasoner NIM is available immediately, with the Generator NIM expected soon.

NVIDIA emphasized that this release makes physical AI development more reproducible and accessible. By providing end-to-end tools from training to deployment, the company intends to accelerate the adoption of advanced Physical AI systems in industries including logistics, manufacturing, and autonomous transport. The full codebase, documentation, and model weights are accessible through NVIDIA NGC and GitHub, supported by a comprehensive technical report and tutorial videos demonstrating setup and usage.
