HyperAI

Qwen3.5-397B-A17B, released by Alibaba on February 16, 2026, is a notable step in the evolution of large foundation models. It is a Mixture-of-Experts (MoE) model with 397 billion total parameters, of which only 17 billion are active per token. It arrives in a crowded field of recent high-profile releases that includes GLM-5, MiniMax M2.5, and Kimi K2.5.

Its standout innovation is a hybrid attention architecture. This design combines Gated Delta Networks with standard full attention. Gated Delta Networks are a linear attention variant that builds on Mamba2 and were introduced in the paper "Gated Delta Networks: Improving Mamba2 with Delta Rule". The two layer types are interleaved in a 3:1 ratio. Three out of every four transformer blocks use linear attention, whose cost scales roughly linearly with sequence length. The fourth block uses full attention to preserve precision. This layout reduces computational overhead during long-context processing without sacrificing accuracy. The gated attention mechanism also helps stabilize training by mitigating attention sinks and extreme activations.

Qwen3.5 also scales reinforcement learning to the agent level. It is trained across million-agent environments with increasingly complex task distributions. This approach is in line with MiniMax’s Forge and Zhipu’s Slime, and it aims to improve real-world adaptability and long-horizon reasoning.

The model is natively multimodal. Vision and language are integrated through early-fusion training, which removes the need for separate vision adapters. It outperforms Qwen3-VL on visual understanding while maintaining strong text performance. It also supports 201 languages and dialects, up from Qwen3’s 119, which makes it the most linguistically diverse open model available. Quality varies, however, especially for low-resource languages.

Across benchmark categories, Qwen3.5 is rarely the outright leader. Instruction following is the exception. It leads on IFBench (76.5) and MultiChallenge (67.6), ahead of GPT-5.2 and Claude.
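The 3:1 hybrid layout and the gated delta rule described above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions. It is not Qwen3.5's implementation. The recurrence follows the update published in the Gated Delta Networks paper, S_t = α_t S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ. The helper names (`layer_schedule`, `gated_delta_step`, `zeros`, `rand_vec`), the fixed α and β values, and the 8-layer depth are hypothetical. Production kernels compute this in a chunked parallel form with learned per-token gates, and that form is not shown here.

```python
import math
import random


def layer_schedule(num_layers, linear_per_full=3):
    """Hypothetical 3:1 layout: three linear-attention blocks, then one full-attention block."""
    period = linear_per_full + 1
    return ["full" if (i + 1) % period == 0 else "linear" for i in range(num_layers)]


def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


def gated_delta_step(S, q, k, v, alpha, beta):
    """One naive recurrent step of the gated delta rule.

    S is a d_v x d_k state (list of rows). alpha in (0, 1] decays the whole
    state; beta in (0, 1] sets how strongly v is written under key k:
        S_t = alpha * S_{t-1} (I - beta * k k^T) + beta * v k^T
        o_t = S_t q
    """
    q, k = l2_normalize(q), l2_normalize(k)
    d_v, d_k = len(S), len(k)
    # Value the old state currently returns for key k.
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Remove a beta-fraction of that old value, decay by alpha, then add the new value.
    S_new = [[alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
              for j in range(d_k)] for i in range(d_v)]
    out = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, out


if __name__ == "__main__":
    print(layer_schedule(8))
    # With alpha = beta = 1, a second write under the same key replaces the first.
    key = [1.0, 2.0, 0.5]
    S = zeros(2, 3)
    S, _ = gated_delta_step(S, key, key, [1.0, 0.0], alpha=1.0, beta=1.0)
    S, out = gated_delta_step(S, key, key, [0.0, 5.0], alpha=1.0, beta=1.0)
    print(out)  # approximately [0.0, 5.0]
    # The state stays d_v x d_k however many tokens are processed.
    random.seed(0)
    S = zeros(2, 3)
    for _ in range(1000):
        S, _ = gated_delta_step(S, rand_vec(3), rand_vec(3), rand_vec(2), alpha=0.95, beta=0.5)
    print(len(S), len(S[0]))  # -> 2 3
```

The linear-attention state keeps a fixed size no matter how many tokens pass through it. A full-attention block's key/value cache grows with every token. In a 3:1 layout, only one block in four pays that growing cost.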
On reasoning and math, it scores 91.3 on AIME 2026 and 94.8 on HMMT Feb 25. These scores are strong but not dominant. Coding performance is solid. Its 76.4 on SWE-bench Verified matches K2.5 and Gemini 3 Pro but trails GPT-5.2 and Claude. It stands out on SecCodeBench (68.3) and shows strong vision capabilities, with 85.0 on MMMU, 88.6 on MathVision, and 90.8 on OmniDocBench. Its ZEROBench score of 12 is particularly impressive given how difficult that benchmark is.

Agentic performance is mixed. It scores 86.7 on Tau2-Bench (second to Claude) and 46.1 on MCPMark (behind GPT-5.2). It reaches 78.6 on BrowseComp using a discard-all context-management strategy. That result is a reminder of how heavily such benchmark results depend on scaffolding.

The attention landscape has become fragmented. DeepSeek led the way on efficient attention, but each Chinese lab now takes its own approach. Qwen3.5 and Kimi K2.5 use hybrid linear/full attention. MiniMax relies on its proprietary Lightning Attention. GLM-5 combines DeepSeek Sparse Attention with Multi-Head Latent Attention.

With Qwen3.5, Alibaba signals a shift in emphasis from model size to architectural innovation. The question is no longer just dense versus MoE, but how attention is managed. The model’s results suggest that hybrid architectures may offer the best balance of efficiency and performance. The initial release of only the 397B-A17B variant hints at a broader family rollout. That rollout could include smaller models that adopt the same hybrid design. The release also validates the direction first previewed in Qwen3-Next, now scaled to production readiness.

In a year in which attention mechanisms are the new frontier, Qwen3.5 stands out as a well-rounded, strategically advanced model. It suggests that being the best at one thing is no longer enough. Being excellent across the board, especially in instruction following and multimodal integration, may be the new standard.
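A back-of-envelope calculation makes the efficiency argument more concrete. The sketch below compares attention memory at inference time for two models of the same depth. In one, every block uses full attention. The other uses a 3:1 hybrid. The sketch also computes the MoE active-parameter fraction from the published 397B/17B figures. All architectural dimensions are hypothetical placeholders, not Qwen3.5's actual configuration. These placeholders are the 48 layers, the head counts, the head sizes, and the 2-byte elements. The function names are also illustrative.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Full attention caches one key and one value vector per token, per KV head, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def linear_state_bytes(n_layers, n_heads, d_k, d_v, bytes_per_elem=2):
    # Linear attention keeps a fixed d_k x d_v state per head, independent of seq_len.
    return n_layers * n_heads * d_k * d_v * bytes_per_elem


def attention_memory(seq_len, n_layers=48, linear_per_full=3,
                     n_kv_heads=2, head_dim=256, n_lin_heads=32, d_k=128, d_v=128):
    """Return (all-full-attention bytes, 3:1 hybrid bytes). All dimensions are hypothetical."""
    n_full = n_layers // (linear_per_full + 1)
    n_linear = n_layers - n_full
    all_full = kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim)
    hybrid = (kv_cache_bytes(seq_len, n_full, n_kv_heads, head_dim)
              + linear_state_bytes(n_linear, n_lin_heads, d_k, d_v))
    return all_full, hybrid


if __name__ == "__main__":
    for seq_len in (4_096, 32_768, 262_144):
        all_full, hybrid = attention_memory(seq_len)
        print(f"{seq_len:>7} tokens: all-full {all_full / 2**30:6.2f} GiB, "
              f"hybrid {hybrid / 2**30:6.2f} GiB, ratio {hybrid / all_full:.3f}")
    # MoE sparsity from the published figures: 17B active out of 397B total.
    print(f"active parameter fraction: {17 / 397:.1%}")  # -> 4.3%
```

As the sequence grows, the hybrid-to-full ratio approaches 1/4. This happens because only one block in four keeps a cache that grows with length. The fixed linear-attention state matters mainly at short lengths. Real savings also depend on the KV-head count and the cache format, so treat the printed numbers as illustrative.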

Qwen3.5 Unveils Hybrid Attention Breakthrough, Competes Across Benchmarks with Sparsity and Multimodal Strength

Related Links

Command Palette

Qwen3.5 Unveils Hybrid Attention Breakthrough, Competes Across Benchmarks with Sparsity and Multimodal Strength

Related Links

Command Palette

Qwen3.5 Unveils Hybrid Attention Breakthrough, Competes Across Benchmarks with Sparsity and Multimodal Strength

Related Links

Online Tutorial | Compress a 27B Large Model to 7.2GB! Ternary-Bonsai Uses "ternary Magic" to Make Large Models Run on Personal computers.

Online Tutorial | Compress a 27B Large Model to 7.2GB! Ternary-Bonsai Uses "ternary Magic" to Make Large Models Run on Personal computers.