Moore Threads Unveils Next-Gen Huagang GPUs with 15x Gaming Boost, 50x Ray Tracing Gain, and AI Chip Rivaling Hopper and Blackwell

Chinese GPU manufacturer Moore Threads has unveiled its next-generation architecture, Huagang, during its MUSA Developer Conference, setting the stage for a major leap in both gaming and AI performance. The new architecture will power two upcoming products: Lushan, a high-performance gaming GPU, and Huashan, a specialized AI accelerator, both expected to launch next year.

Lushan, the gaming-focused GPU, is set to succeed the current MTT S80 and S90 models, which have struggled to match the performance of mainstream competitors like NVIDIA’s RTX 4060. Moore Threads claims Lushan will deliver a 15x improvement in "AAA" gaming performance, though the specific metric (likely rasterization) remains undefined, and a 50x increase in ray tracing capabilities. The company also highlights a 64x boost in AI compute, 16x in texture geometry processing, 4x in texture fill rate, 8x in atomic access, and a 4x increase in memory capacity. If accurate, this would mean the new GPUs could feature up to 64 GB of GDDR6 memory, a significant jump from the 16 GB found in current models.

A key focus of the new design is modern API support, including full compatibility with DirectX 12 Ultimate, which should improve game compatibility and performance. The architecture includes a second-generation hardware ray tracing engine and a new AI processing block designed for the company’s UniTE unified rendering architecture, which is aimed at closing the gap with NVIDIA, AMD, and Intel in rendering efficiency and feature completeness.

Alongside Lushan, Moore Threads introduced Huashan, a dual-chiplet AI GPU equipped with eight HBM memory modules. The company positions Huashan’s performance as comparable to GPUs based on NVIDIA’s Hopper and Blackwell architectures, with memory bandwidth surpassing even the B200. Moore Threads claims a 50% improvement in compute density and a 10x leap in energy efficiency.
Huashan supports FP4 through FP64 precision and introduces proprietary low-precision formats (MTFP4, MTFP6, and MTFP8), which could offer advantages in AI training and inference workloads. Connectivity is another area of focus, with Moore Threads planning to deploy these GPUs at scale in AI data centers. The MTLink 4.0 interconnect is designed to link over 100,000 GPUs at speeds of up to 1314 GB/s, enabling large-scale AI model training.

While no benchmarks for the new GPUs have been released, Moore Threads demonstrated the MTT S5000 GPU, part of the current lineup, running DeepSeek V3 at 1000 tokens per second in decode mode and 4000 tokens per second in prefill. These results are said to outperform NVIDIA’s Hopper-based systems, a notable claim given Hopper’s dominance in the Chinese AI GPU market. The MTT S5000 is set to launch next year, but it is not part of the Huagang family.

The full details of the Lushan and Huashan GPUs are expected to be revealed in the coming months. As China continues to push for semiconductor self-reliance, Moore Threads’ new hardware could play a pivotal role in building a domestic alternative to Western GPU giants like NVIDIA, AMD, and Intel.
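
To put the quoted MTT S5000 throughput figures in perspective, the following is a rough sketch of how prefill and decode rates translate into time per request. It assumes the quoted 4000 and 1000 tokens per second apply to a single request and stay constant, which the announcement does not specify; the function name `estimate_request_seconds` and the example token counts are hypothetical.

```python
def estimate_request_seconds(prompt_tokens, output_tokens,
                             prefill_tps=4000.0, decode_tps=1000.0):
    """Estimate prefill, decode, and total time for one request.

    The default rates are the figures quoted for the MTT S5000 demo.
    Assumption: rates are constant and apply to a single request;
    the announcement does not say how they were measured.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput rates must be positive")
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    prefill_s = prompt_tokens / prefill_tps  # time to ingest the prompt
    decode_s = output_tokens / decode_tps    # time to generate the reply
    return prefill_s, decode_s, prefill_s + decode_s


if __name__ == "__main__":
    # Hypothetical request: 2000-token prompt, 500-token reply.
    prefill_s, decode_s, total_s = estimate_request_seconds(2000, 500)
    print(f"prefill {prefill_s:.2f}s, decode {decode_s:.2f}s, total {total_s:.2f}s")
```

Under these assumptions, decode time dominates whenever the reply is longer than a quarter of the prompt, since the quoted prefill rate is four times the decode rate.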
