Qwen3-Omni: An All-Rounder That Breaks Through Modal Boundaries
1. Tutorial Introduction

Qwen3-Omni is the industry's first native end-to-end omnimodal AI model, released by Alibaba's Tongyi Qianwen team in September 2025. It accepts text, image, audio, and video input and streams its output in real time as text and natural speech, addressing the long-standing problem that multimodal models had to trade off performance in one modality against another. The accompanying paper is "Qwen3-Omni Technical Report".
This tutorial uses dual-GPU RTX A6000 computing resources and provides two models, Qwen3-Omni-30B-A3B-Instruct and Qwen3-Omni-30B-A3B-Thinking, for testing.
Qwen3-Omni-30B-A3B-Instruct is the instruction-tuned model of Qwen3-Omni-30B-A3B. It contains both the thinker and the talker components, supports audio, video, and text input, and outputs audio and text.
Qwen3-Omni-30B-A3B-Thinking is the thinking model of Qwen3-Omni-30B-A3B. It contains only the thinker component, is capable of chain-of-thought reasoning, supports audio, video, and text input, and outputs text.
2. Demo
Online audio conversation

Online video conversation

Offline audio conversation


Offline video conversation

Image understanding

3. Operation steps
1. Start the container

2. Usage steps
If "Bad Gateway" is displayed, the model is still initializing. Because the model is large, wait about 3-5 minutes and then refresh the page.
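If you would rather not refresh by hand, the wait can be scripted. The following is a minimal sketch, not part of the tutorial's container: the function names, the polling interval, and the URL in the usage note are all hypothetical, and it assumes the page answers with HTTP 502 while the model loads and HTTP 200 once it is ready.

```python
import time
import urllib.error
import urllib.request


def probe_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 502 Bad Gateway arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_ready(url, probe=probe_status, interval=15.0,
                     max_wait=600.0, sleep=time.sleep):
    """Poll url until it returns 200; give up after max_wait seconds of sleeping.

    probe and sleep are injectable so the loop can be exercised offline.
    """
    waited = 0.0
    while True:
        if probe(url) == 200:
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

For example, `wait_until_ready("http://localhost:8080")` would block until the page loads or ten minutes of waiting have passed; replace the placeholder address with the one your container actually exposes.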
Online audio conversation

Online video conversation

Offline audio conversation

Offline video conversation

Image understanding

Parameter Description:
- System Prompt: The initial instruction given to the model, which sets its role and behavior for the conversation.
- Temperature: The smaller the value, the more conservative and deterministic the output; the larger the value, the more random and novel it is.
- Top-p: Sample only from the smallest set of most likely tokens whose cumulative probability reaches p. The smaller p is, the fewer candidates there are and the more conservative the text is.
- Top-k: Keep only the k tokens with the highest probability. The smaller k is, the fewer candidates there are and the more conservative the text is.
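To make the three sampling parameters concrete, here is a small standard-library sketch of how they are commonly combined. It is an illustration, not Qwen3-Omni's actual decoding code: the function names are hypothetical, and the order of operations (temperature, then top-k, then top-p) is an assumption based on common practice.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: divide the logits before softmax.
    # Values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = {i: w / total for i, w in enumerate(weights)}

    # Rank tokens from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables it in this sketch).
    if top_k > 0:
        ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)
    probs = {i: probs[i] / mass for i in ranked}

    # Top-p: keep the shortest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

With logits `[2.0, 1.0, 0.0, -1.0]`, setting `top_k=2` leaves only the first two tokens as candidates, and lowering `temperature` to 0.1 puts almost all of the probability on the first token, which is why small values of any of the three parameters make the output more conservative.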
4. Discussion
🖌️ If you come across a high-quality project, please leave us a message to recommend it! We have also set up a tutorial discussion group. Scan the QR code and add the note [SD Tutorial] to join the group, discuss technical issues, and share your results.
