Alibaba Releases Qwen3 Reasoning Models, Claims Gains over Llama 4

Alibaba has open-sourced the Qwen3 series of "hybrid" reasoning models, which the company says offer significant advances over the Llama 4 models. The new models, such as Qwen3-235B-A22B, were refined through four post-training stages:

1. Long-CoT Cold Start: Alibaba fine-tuned the models on a diverse set of long chain-of-thought (CoT) reasoning data covering mathematics, coding, logical reasoning, and STEM fields, giving the model a foundation of basic reasoning ability.

2. Long-CoT Reasoning RL: The company applied large-scale reinforcement learning (RL) with rule-based rewards to strengthen the model's exploration and problem-solving skills.

3. Thinking Mode Fusion: The model was trained on a combination of long chain-of-thought data and everyday conversational data so that a single model supports both a thinking mode and a non-thinking mode. The aim is to balance strong reasoning with quick responses.

4. General RL: The model underwent reinforcement learning on more than 20 tasks across various domains, including instruction following, format adherence, and agent capabilities. This stage further improved the model's general abilities and corrected undesirable behaviors.

For the smaller, lightweight models, Alibaba used knowledge distillation: a large, already trained "teacher" model transfers its capabilities to a more parameter-efficient "student" model. This lets the smaller models retain much of the performance of their larger counterparts.

The four stages are applied in sequence on top of the pre-trained base model, with each stage targeting a specific capability.
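The article does not say which distillation objective Alibaba used. As a rough illustration of the teacher-student idea only, the sketch below implements the classic soft-label distillation loss: the student is penalized by the KL divergence between the teacher's and the student's temperature-softened output distributions. The function names, the temperature value, and the toy logits are all assumptions made for this example, not details from the Qwen3 release.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper for illustration; the T^2 factor is the usual
    scaling that keeps gradient magnitudes comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Toy next-token logits over a three-word vocabulary (made-up numbers).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student whose distribution matches the teacher's gets a loss near zero, and the loss grows as the two distributions diverge. In real training this term would be computed per token over the full vocabulary and minimized by gradient descent on the student's parameters.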
Alibaba presents this multi-stage approach as what sets the Qwen3 series apart in versatility and effectiveness.
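To make the "rule-based rewards" used in the reasoning RL stage more concrete, here is a minimal sketch of what such a reward could look like for a math-style task. The article does not describe Alibaba's actual rules; the `<answer>` tag convention, the function name, and the reward values below are hypothetical choices for this example.

```python
import re

# Hypothetical output convention: the final answer is wrapped in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def rule_based_reward(response: str, reference: str) -> float:
    """Score a response with fixed rules instead of a learned reward model.

    Assumed scheme: 0.0 for a format violation, 0.1 for a well-formed
    but wrong answer, 1.0 for a well-formed and correct answer.
    """
    matches = ANSWER_PATTERN.findall(response)
    if len(matches) != 1:
        return 0.0  # missing or repeated answer block
    predicted = matches[0].strip()
    return 1.0 if predicted == reference.strip() else 0.1


print(rule_based_reward("6 * 7 = 42, so <answer>42</answer>", "42"))
print(rule_based_reward("I think it is <answer>41</answer>", "42"))
print(rule_based_reward("The answer is 42.", "42"))
```

Because the score comes from a deterministic check rather than a learned model, it is cheap to compute at scale and harder for the policy to exploit, which is one common reason such rewards are used for tasks with verifiable answers.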

