DeepSeek Unveils Breakthrough AI Training Method to Scale LLMs Efficiently and Stably
China’s DeepSeek has unveiled a novel AI training method that could significantly simplify the scaling of large language models, potentially influencing the future development of foundational AI systems. The company published a research paper on Wednesday detailing a technique called "Manifold-Constrained Hyper-Connections," or mHC, developed by its founder Liang Wenfeng and other researchers.

The method addresses a major challenge in scaling language models: as models grow larger and more complex, internal information sharing between components increases, which can lead to instability and training collapse. DeepSeek’s mHC approach enables richer communication across model layers while maintaining stability and computational efficiency through constrained architectural design (a simplified illustrative sketch of this idea appears at the end of this article).

According to Wei Sun, principal analyst for AI at Counterpoint Research, the breakthrough represents a significant leap forward. She described the technique as a strategic innovation that reduces training costs while boosting performance. Sun noted that DeepSeek’s ability to redesign its training stack end to end demonstrates strong technical maturity and a capacity for rapid, unconventional experimentation.

The announcement follows DeepSeek’s earlier “Sputnik moment” in January 2025, when its R1 reasoning model demonstrated performance on par with top-tier systems such as OpenAI’s o1 at a fraction of the cost. That launch disrupted the AI landscape and signaled China’s growing capability in frontier AI development.

Lian Jye Su, chief analyst at Omdia, said the publication of this research reflects growing confidence in China’s AI industry. By openly sharing key findings while still delivering proprietary advantages through new models, DeepSeek is positioning openness as a strategic strength.

The timing of the paper has sparked speculation about the upcoming release of DeepSeek’s next flagship model, R2. Originally expected in mid-2025, the launch was delayed after Liang Wenfeng expressed dissatisfaction with early results and amid ongoing challenges related to access to advanced AI chips. While the new paper does not explicitly reference R2, analysts believe the mHC technique is likely to be integrated into DeepSeek’s next-generation model. Sun suggested that rather than a standalone R2, the innovation may form the foundation of a new version, possibly V4, building on the improvements already rolled into the V3 model.

Still, some caution remains. Business Insider’s Alistair Barr noted that despite its technical advances, DeepSeek has struggled to gain traction in Western markets, where distribution and ecosystem reach remain critical advantages for leaders like OpenAI and Google. While the new training method marks a major technical achievement, its real-world impact will depend on how effectively DeepSeek can deploy and scale its models globally.
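To make the architectural idea more concrete, the sketch below illustrates the general concept of hyper-connections with a constrained mixing step: the usual single residual stream is widened into several parallel streams, and each layer mixes them through a matrix projected onto a bounded set so that cross-stream communication cannot destabilize training. This is a minimal, hypothetical illustration only; the class and parameter names, the row-stochastic (softmax) constraint, and the block structure are assumptions for exposition and are not taken from DeepSeek’s paper, which the article does not describe in this detail.

```python
# Illustrative sketch: parallel residual streams with a constrained mixing matrix.
# All names and design choices here are assumptions, not DeepSeek's actual mHC design.
import torch
import torch.nn as nn


class ConstrainedHyperConnectionBlock(nn.Module):
    def __init__(self, d_model: int, n_streams: int = 4):
        super().__init__()
        self.n_streams = n_streams
        # Unconstrained parameters that will be projected into a constrained mixing matrix.
        self.mix_logits = nn.Parameter(torch.zeros(n_streams, n_streams))
        # Stand-in for the layer's usual sublayers (attention / MLP).
        self.layer = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (batch, n_streams, seq, d_model)
        # Softmax over rows yields a row-stochastic mixing matrix, so every output
        # stream is a convex combination of input streams and the mixing step
        # cannot blow up activations as depth grows.
        mix = torch.softmax(self.mix_logits, dim=-1)           # (S, S)
        mixed = torch.einsum("os,bsld->bold", mix, streams)    # cross-stream mixing
        # Aggregate the streams, apply the layer, and write the update back to
        # all streams (a simple broadcast; richer designs learn read/write maps).
        h = mixed.mean(dim=1)                                   # (batch, seq, d_model)
        update = self.layer(h)
        return mixed + update.unsqueeze(1)


# Usage: widen a standard hidden state into parallel streams, pass it through the
# block, and collapse back to a single stream at the end of the stack.
x = torch.randn(2, 128, 512)                                    # (batch, seq, d_model)
streams = x.unsqueeze(1).repeat(1, 4, 1, 1)
block = ConstrainedHyperConnectionBlock(d_model=512, n_streams=4)
out = block(streams)
print(out.shape)  # torch.Size([2, 4, 128, 512])
```

The constraint is the point of the sketch: unconstrained mixing between streams gives layers more ways to share information but also more ways to amplify signals and destabilize training, whereas projecting the mixing weights onto a bounded set keeps the extra communication paths well behaved.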
