German Team Unveils DeepSeek R1T2 Model, Achieving 200% Speed Improvement
DeepSeek appears to have taken too long to release R2, prompting the German firm TNG Technology Consulting GmbH (TNG) to develop its own variant, DeepSeek-TNG R1T2 Chimera (R1T2). The new model claims a 200% speed improvement over DeepSeek-R1-0528 and is an open-source hybrid model with 671 billion parameters, the latest addition to TNG's Chimera series of large models.

DeepSeek-R1-0528 is known for detailed, lengthy responses driven by extended chain-of-thought reasoning. R1T2, by contrast, is designed to be more concise, generating comparably intelligent answers with significantly fewer words. That reduction in verbosity translates directly into faster inference and lower computational cost.

R1T2 builds on the Assembly-of-Experts (AoE) method proposed by TNG. Henrik Klagges, TNG's co-founder and first author of the accompanying paper, has led the company for 24 years. He graduated from Oxford University in 1994 and founded TNG in 2001. Today the company employs 917 people, 99.9% of whom hold academic degrees, and over 50% of whom have Ph.D.s in mathematics, physics, or computer science.

In earlier experiments, TNG combined expert tensors from DeepSeek-V3-0324 and DeepSeek-R1 to create the DeepSeek-R1T-Chimera model (R1T). R1T2 extends this approach, achieving significant gains in efficiency and speed while retaining the strong reasoning capabilities of DeepSeek-R1. Specifically, R1T2 merges three parent models: DeepSeek-R1-0528, DeepSeek-R1, and DeepSeek-V3-0324. It inherits the reasoning strength of DeepSeek-R1-0528, the structured thought patterns of DeepSeek-R1, and the concise, instruction-oriented behavior of DeepSeek-V3-0324, all without additional fine-tuning or retraining.

Benchmark tests conducted by TNG show that R1T2 retains 90% to 92% of the intelligence of its most capable parent, DeepSeek-R1-0528, while cutting output tokens by roughly 60%. Because decoding time scales roughly with the number of generated tokens, this efficiency cuts inference time and compute load and effectively doubles the speed. R1T2 is also about 20% more concise than the original DeepSeek-R1, making it attractive for high-throughput or cost-sensitive deployments that cannot sacrifice intelligence.

The AoE method differs from the Mixture-of-Experts (MoE) approach, which dynamically activates different components, or "experts," depending on the input. In traditional MoE models such as DeepSeek-V3 and Mixtral, only a subset of experts (e.g., 8 out of 256) is active during each forward pass for a given token. This enables very large models with high parameter counts and specialized functions while keeping inference costs manageable, since each token activates only a small portion of the network. Pretraining models of this scale can consume on the order of 10^13 to 10^15 FLOPs per weight, making it extremely costly. To leverage that investment more effectively, TNG developed the AoE method.

Unlike MoE, AoE is a model-fusion technique that selectively interpolates the weight tensors of pretrained MoE models to create a new composite model. The merge can be performed in linear time, allowing efficient sub-model variants to be built from existing parent models. Weight tensors are interpolated individually, enhancing or suppressing semantic features as needed. TNG's implementation of AoE focuses on merging the routed expert tensors, which handle specialized reasoning, while retaining the efficient shared layers and attention mechanisms of faster models such as DeepSeek-V3-0324.
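To make the idea concrete, here is a minimal sketch of AoE-style weight merging under simplifying assumptions: two parent checkpoints with identical tensor names and shapes, a single interpolation coefficient, and a naive name match on "experts" to select the routed expert tensors. None of these details come from TNG's implementation; they are illustrative choices only.

```python
# Minimal sketch of Assembly-of-Experts-style weight merging (illustrative only;
# not TNG's actual code). Assumes two parent state dicts with identical keys and shapes.
import torch

def merge_aoe(parent_reasoning: dict, parent_fast: dict, lam: float = 0.6) -> dict:
    """Interpolate routed-expert tensors; keep shared/attention tensors from the fast parent.

    parent_reasoning: state dict of the stronger reasoning parent (e.g. an R1-style model)
    parent_fast:      state dict of the faster, more concise parent (e.g. a V3-style model)
    lam:              interpolation weight toward the reasoning parent (hypothetical value)
    """
    merged = {}
    for name, w_fast in parent_fast.items():
        w_reason = parent_reasoning[name]
        if "experts" in name:  # routed expert tensors carry the specialized reasoning
            merged[name] = lam * w_reason + (1.0 - lam) * w_fast
        else:                  # shared layers / attention come from the faster parent
            merged[name] = w_fast.clone()
    return merged

# Toy usage with random tensors standing in for real checkpoints.
names = ["layers.0.attn.q_proj.weight", "layers.0.mlp.experts.0.w1.weight"]
a = {n: torch.randn(4, 4) for n in names}
b = {n: torch.randn(4, 4) for n in names}
chimera = merge_aoe(a, b)
print({n: t.shape for n, t in chimera.items()})
```

The point the sketch illustrates is that the merge is a single pass over existing tensors rather than a training run, which is why AoE construction scales linearly with model size.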
As a result, the Chimera models, including R1T and R1T2, inherit strong reasoning abilities while avoiding the verbosity and latency of their most powerful parent. For CTOs, AI platform owners, engineering managers, and IT procurement teams, the benefits of R1T2 are clear:

1. Lower inference costs: With fewer output tokens per task, R1T2 reduces GPU time and energy consumption, cutting infrastructure costs, especially in high-throughput or real-time environments.

2. High-quality reasoning without redundancy: R1T2 retains much of the reasoning capability of top models such as DeepSeek-R1-0528 while avoiding their verbose responses, making it well suited to structured tasks such as math, programming, and logic, where concise answers are preferred.

3. Open source and customizable: The MIT license allows full deployment control and customization, enabling private hosting, model alignment, or further training in regulated or isolated environments.

4. A modular future: The AoE method points toward models assembled modularly from specialized existing variants rather than trained from scratch.

However, companies adopting R1T2 should be aware of potential limitations, particularly if they rely on function calling, tool use, or advanced agent orchestration. TNG makes early Chimera variants available through platforms such as OpenRouter and Chutes, which process billions of tokens daily. While R1T2 is well suited to general inference tasks, TNG advises against using it in scenarios that require function calling, owing to limitations inherited from the DeepSeek-R1 series.

For European users, TNG recommends evaluating whether R1T2 complies with the EU AI Act, which takes effect on August 2, 2025. Companies operating in the EU should review the regulations and consider discontinuing use if they cannot meet the requirements. U.S.-based companies serving users in the U.S. or other non-EU markets are not subject to these rules, giving them greater flexibility in deploying the free, fast, open-source model.

This development reflects a broader trend in which developers worldwide build advanced AI variants on top of established models, underscoring the growing influence of non-U.S. tech communities in the AI landscape. For further reading, see the related paper on arXiv, the Hugging Face repository, and commentary from AI industry leaders on platforms such as X.
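As a closing illustration of the hosted-deployment route mentioned above, the sketch below queries an OpenRouter-hosted Chimera endpoint through OpenRouter's OpenAI-compatible chat-completions API. The model identifier, prompt, and token limit are illustrative assumptions, not values confirmed by TNG or OpenRouter; check OpenRouter's catalog for the actual model ID.

```python
# Minimal sketch: querying a hosted R1T2 endpoint via OpenRouter's
# OpenAI-compatible API. The model identifier below is an assumption
# for illustration, not a confirmed ID.
import os
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_ID = "tngtech/deepseek-r1t2-chimera"  # hypothetical identifier

payload = {
    "model": MODEL_ID,
    "messages": [
        {"role": "user", "content": "Prove that the sum of two even integers is even."}
    ],
    "max_tokens": 512,  # concise outputs are the point of R1T2
}
headers = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```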