HyperAI

Tencent Unveils Hunyuan-A13B: Efficient 13B Active Parameter MoE Model with Dual-Mode Reasoning and 256K Context Length


Tencent’s Hunyuan team has unveiled Hunyuan-A13B, a new open-source large language model (LLM) built on a sparse Mixture-of-Experts (MoE) architecture. Although the model has 80 billion parameters in total, only 13 billion are active during inference, making it highly efficient and cost-effective. It is notable for its dual-mode reasoning capability, supporting both fast and slow thinking, and for a context window of up to 256K tokens.

Architecture: Sparse MoE with 13B Active Parameters

Hunyuan-A13B's core design is a fine-grained MoE system with one shared expert and 64 non-shared experts, of which eight are activated per forward pass. This configuration, informed by extensive scaling experiments, maintains high performance while minimizing computational demands. The model comprises 32 layers, uses SwiGLU activations, and has a vocabulary of 128,000 tokens. It employs Grouped Query Attention (GQA) to improve memory efficiency, particularly during long-context inference.

Training Curriculum: Optimized for Long-Context Adaptation

The training process follows a comprehensive curriculum, starting with a 20 trillion-token pretraining phase. This is followed by fast annealing and long-context adaptation, during which the context window is scaled first to 32K tokens and then to 256K using NTK-aware positional encoding, ensuring stable performance across a wide range of sequence lengths.

Dual-Mode Reasoning: Balancing Latency and Complexity

One of the most innovative aspects of Hunyuan-A13B is its dual-mode Chain-of-Thought (CoT) capability. Users can choose between a low-latency fast-thinking mode for quick, routine queries and a more detailed slow-thinking mode for complex, multi-step reasoning. Switching is straightforward: the tag "/no think" selects fast inference and "/think" selects reflective reasoning, letting compute scale with the task's complexity.
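As a minimal sketch of how this toggle works in practice, the helper below prepends the mode tag described in the article to a user query. The tag strings follow the article's wording; the exact chat-template handling in the official release may differ.

```python
# Sketch: selecting Hunyuan-A13B's reasoning mode via prompt tags.
# Tag names ("/think", "/no think") are taken from the article; how the
# official chat template consumes them is an assumption here.

def build_prompt(user_query: str, slow_thinking: bool) -> str:
    """Prefix the query with the reasoning-mode tag.

    slow_thinking=True  -> "/think"    (detailed chain-of-thought)
    slow_thinking=False -> "/no think" (low-latency direct answer)
    """
    tag = "/think" if slow_thinking else "/no think"
    return f"{tag} {user_query}"

print(build_prompt("Summarize this report.", slow_thinking=False))
# -> /no think Summarize this report.
```

In this scheme, routine queries can default to fast thinking, with the slow-thinking tag reserved for multi-step tasks where the extra latency pays off.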
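The expert routing described in the architecture section can be illustrated with a toy top-k selection: 64 routed experts, eight of which fire per token, alongside one always-on shared expert. This is a simplified sketch, not the model's actual router (which uses a learned gating projection).

```python
import random

# Toy illustration of fine-grained MoE routing: 1 shared expert plus
# 64 routed experts, of which the top 8 by gate score fire per token.
# Gate scores here are random; a real router computes them with a
# learned projection of the token's hidden state.

NUM_EXPERTS = 64
TOP_K = 8

def route(gate_scores):
    """Return the indices of the top-k experts for one token."""
    ranked = sorted(range(NUM_EXPERTS), key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:TOP_K])

scores = [random.random() for _ in range(NUM_EXPERTS)]
active = route(scores)
# The shared expert always runs; the 8 routed experts vary per token,
# so only a fraction of the 80B total parameters is touched per step.
print("routed experts:", active)
```

Because only the selected experts' weights participate in each forward pass, the per-token compute tracks the 13B active parameters rather than the 80B total.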
Post-Training: Enhancing Performance with Reinforcement Learning

After pretraining, Hunyuan-A13B undergoes multi-stage supervised fine-tuning (SFT) and reinforcement learning (RL). The RL phases use task-specific reward models, including sandboxed execution environments for code and rule-based checks for agent behavior. For agent training, the team constructed a variety of tool-use scenarios, combining roles such as planner, checker, and tool executor to generate more than 20,000 format combinations. This extensive training reinforces the model's ability to carry out real-world workflows such as spreadsheet processing, information search, and structured reasoning.

Evaluation: Top-Tier Agentic Performance

Hunyuan-A13B achieves state-of-the-art results across numerous agentic benchmarks, including BFCL-v3, τ-Bench, C3-Bench, and ComplexFuncBench, outperforming larger models on tool-calling tasks and in long-context scenarios. For instance, it scores 87.7 on PenguinScrolls, close to Gemini 2.5 Pro, and maintains a score of 73.9 on RULER even at context lengths of 64K to 128K tokens, where it surpasses models such as Qwen3-A22B and DeepSeek R1.

Inference Optimization and Deployment

The model integrates with popular inference frameworks such as vLLM, SGLang, and TensorRT-LLM. It supports multiple precision formats, including W16A16, W8A8, and FP8 KV cache, along with features such as Auto Prefix Caching and Chunk Prefill. Hunyuan-A13B reaches a throughput of up to 1981.99 tokens per second at batch size 32 (2048 input tokens, 14336 output tokens per request), making it suitable for real-time applications.

Open Source and Industry Impact

Hunyuan-A13B is available on platforms like Hugging Face and GitHub under a permissive open-source license. Its architecture is optimized for efficient research and production use, particularly in latency-sensitive and long-context settings.
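To put the reported throughput figure in perspective, a quick back-of-envelope calculation shows what the aggregate number implies per request. This only rearranges the figures quoted above; it assumes the 32 streams share throughput evenly, which real batched serving only approximates.

```python
# Back-of-envelope check on the reported figure: 1981.99 tokens/s
# aggregate at batch size 32, with 14336 output tokens per request.
# Assumes throughput is split evenly across the 32 concurrent streams.

batch_size = 32
aggregate_tps = 1981.99
output_len = 14336

per_stream_tps = aggregate_tps / batch_size   # ~61.9 tokens/s per request
batch_seconds = output_len / per_stream_tps   # wall-clock to drain the batch

print(f"{per_stream_tps:.1f} tokens/s per stream")
print(f"~{batch_seconds / 60:.1f} minutes to generate the full batch")
```

Roughly 62 tokens per second per concurrent stream is comfortably above human reading speed, which is what makes the batch-32 configuration viable for real-time serving.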
By offering a highly scalable and capable model, Tencent aims to democratize access to powerful LLMs, encouraging broader experimentation and deployment across the AI community.

Conclusion

Combining the efficiency of an MoE architecture, robust dual-mode reasoning, and open-source availability, Hunyuan-A13B stands out as a versatile and powerful option for AI developers. It addresses key challenges in computational efficiency and task complexity, making it a valuable addition to the landscape of large language models. For the full details, see the research paper.
