Online Tutorial: An Innovation in Physical AI Systems, Quick Start with the NVIDIA World Foundation Model, Capable of Simulating Sunlight and Haze

a year ago

At CES 2025 in early January, Jensen Huang, wearing a new leather jacket, delivered several surprises. Besides the RTX 5090, billed as the "world's fastest GPU", the Cosmos world foundation model also attracted widespread attention.

"The next frontier of AI is physics." Huang used Cosmos to illustrate where this prediction comes from and why it is credible.

As the name implies, a world model generates and simulates a virtual world, including the spatial relationships and physical interactions of the objects in a scene. The Cosmos world foundation models are a set of open diffusion and autoregressive Transformer models for physics-aware video generation. They were trained on 9 trillion tokens drawn from 20 million hours of real-world data covering human interactions, environments, industrial settings, robotics, and driving.

NVIDIA senior scientist Jim Fan summarized Cosmos on his social media account:

* Two model types are available: diffusion (continuous tokens) and autoregressive (discrete tokens);

* Two generation modes are supported: text to video (text->video) and text + video to video (text+video->video).
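
To make the difference between continuous and discrete tokens concrete, here is a minimal sketch. It is a generic illustration and not the tokenizer Cosmos actually uses: the 2-D vectors, the four-entry codebook, and the `quantize` helper are all hypothetical. A diffusion model works on real-valued vectors like the ones below, while an autoregressive model predicts integer ids one after another.

```python
import math

# Hypothetical 2-D "latent" vectors standing in for encoded video patches.
# These real-valued vectors play the role of continuous tokens.
continuous_tokens = [(0.9, 0.1), (0.2, 0.8), (0.45, 0.6)]

# Hypothetical codebook: each prototype vector is identified by its list index.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]


def quantize(vector, codebook):
    """Return the index of the codebook entry closest to `vector`."""
    distances = [math.dist(vector, entry) for entry in codebook]
    return distances.index(min(distances))


# Mapping every vector to its nearest codebook index yields discrete tokens.
discrete_tokens = [quantize(v, codebook) for v in continuous_tokens]
print(discrete_tokens)  # prints [1, 2, 3]
```

The nearest-neighbour lookup is only one simple way to turn vectors into ids; it is used here because it fits in a few lines.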

To make it easy to try this world foundation model, the tutorial section of HyperAI's official website now offers "One-click deployment of Cosmos world basic model". Give it a try!

Tutorial address:

https://go.hyper.ai/GTCAL

Demo Run

1. Log in to hyper.ai. On the "Tutorial" page, select "Deploy Cosmos World Basic Model with One Click" and click "Run this Tutorial Online".

2. After the page redirects, click "Clone" in the upper right corner to clone the tutorial into your own container.

3. Select "NVIDIA RTX A6000" as the compute resource and "PyTorch" as the image. Choose "Pay as you go" or a "Daily/Weekly/Monthly Package" as needed, then click "Continue".

New users who register through the invitation link below receive 4 hours of free RTX 4090 time and 5 hours of free CPU time!

HyperAI exclusive invitation link (copy and open in browser):

https://openbayes.com/console/signup?r=Ada0322_QZy7

In addition, OpenBayes' New Year event is in progress: single-card RTX 4090 and RTX A6000 daily and weekly packages are half price!

4. Wait for resources to be allocated. The first clone takes about 7 minutes. When the status changes to "Running", click "Open Workspace" and then open a "Terminal".

5. Enter the following command to activate the environment:

conda activate ./cosmos

6. Enter the following command to switch to the Cosmos directory:

cd Cosmos

7. Enter the following command to start the model's Gradio interface:

PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/gradio_text2world.py --checkpoint_dir /input0 --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World --offload_prompt_upsampler --offload_text_encoder_model --offload_guardrail_models --video_save_name Cosmos-1.0-Diffusion-7B-Text2World

Once the output shows port 8080, open the API address shown on the right to access the Gradio interface.
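
The launch command in step 7 passes several flags to the script. The sketch below shows one way such flags could be read with Python's standard `argparse` module. It is a hypothetical stand-in, not the actual parser of `gradio_text2world.py`: only the flag names come from the command above, while the defaults, the help texts, and the treatment of the `--offload_*` flags as simple on/off switches are assumptions.

```python
import argparse


def build_parser():
    """Build a hypothetical parser that mirrors the flag names from step 7."""
    parser = argparse.ArgumentParser(
        description="Illustrative stand-in for the launch script's command line"
    )
    # Directory containing the downloaded model checkpoints (default is assumed).
    parser.add_argument("--checkpoint_dir", default="checkpoints")
    # Sub-directory name of the diffusion transformer checkpoint.
    parser.add_argument("--diffusion_transformer_dir", required=True)
    # Base name for the generated video file (default is assumed).
    parser.add_argument("--video_save_name", default="output")
    # Assumed to be boolean switches: present means True, absent means False.
    for component in ("prompt_upsampler", "text_encoder_model", "guardrail_models"):
        parser.add_argument(f"--offload_{component}", action="store_true")
    return parser


args = build_parser().parse_args([
    "--checkpoint_dir", "/input0",
    "--diffusion_transformer_dir", "Cosmos-1.0-Diffusion-7B-Text2World",
    "--offload_prompt_upsampler",
    "--offload_text_encoder_model",
    "--offload_guardrail_models",
    "--video_save_name", "Cosmos-1.0-Diffusion-7B-Text2World",
])
print(args.checkpoint_dir)  # prints /input0
```

With `argparse`, an option that is given more than once keeps its last value, so the final `--checkpoint_dir` on a command line is the one that takes effect.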

Results

1. In the Gradio interface, type a prompt into "Enter your prompt" and click "Submit" to start inference. The generated video appears after a few minutes.
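
If the interface does not load, it can help to confirm that the server is reachable before typing a prompt. The following is a minimal sketch using only the Python standard library. It assumes the interface listens on port 8080, as in the Demo Run steps, and that the check runs inside the workspace itself; the host `127.0.0.1` and the `is_port_open` helper are assumptions for illustration.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 8080 comes from the tutorial; the host is an assumption.
    if is_port_open("127.0.0.1", 8080):
        print("Port 8080 is accepting connections.")
    else:
        print("Port 8080 is not reachable yet.")
```

A successful TCP connection only shows that something is listening on the port; it does not prove that the model has finished loading.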

The editor generated a video of a natural scene; the prompt is given below for reference.

prompt: One morning, the sun shines through the clouds between the mountains, illuminating the tranquil lake. The lake is clear and surrounded by lush green forests. The mountains in the distance are shrouded in mist, a few birds are flying in the sky, the air is fresh, quiet and peaceful.

We have set up a "Stable Diffusion Tutorial Exchange Group". You are welcome to join to discuss technical issues and share your results.

Scan the QR code below to add HyperaiXingXing on WeChat (WeChat ID: Hyperai01), and include the note "SD Tutorial Exchange Group" to join the group chat.
