AWS Unveils Trainium3 UltraServer, Teases Trainium4 with NVIDIA NVLink Fusion Integration
Amazon Web Services has unveiled significant advancements in its AI infrastructure, introducing the Trainium3 UltraServer system and teasing the upcoming Trainium4 chip as part of its push to challenge Nvidia's dominance in AI hardware. At AWS re:Invent 2025, AWS announced that its third-generation Trainium3 chip delivers over 4x faster performance and 4x more memory than the previous generation, while also being 40% more energy efficient. The Trainium3 UltraServer can host 144 chips per unit, with thousands of units linked together to support up to 1 million chips in a single AI workload, ten times the capacity of the prior generation. This enables massive-scale AI training and inference, particularly for complex models requiring high throughput and low latency.

A key driver behind the success of AWS's AI chips is its strategic partnership with Anthropic, a major AI company in which Amazon is a significant investor. AWS revealed that over 500,000 Trainium2 chips power Anthropic's Project Rainier, one of the largest AI clusters ever built, deployed across multiple U.S. data centers. This collaboration has become a cornerstone of AWS's AI strategy, with Anthropic relying heavily on AWS for model training even as it also runs workloads on Microsoft's cloud via Nvidia GPUs. Amazon CEO Andy Jassy emphasized that Trainium2 has already become a multi-billion-dollar revenue stream, with over 1 million chips in production and use by more than 100,000 companies, especially through AWS's Bedrock AI development platform.

To further accelerate innovation and deployment, AWS has partnered with NVIDIA to integrate its Trainium chips with NVIDIA's NVLink Fusion platform. This rack-scale architecture enables high-bandwidth, low-latency interconnectivity between AI chips using NVLink 6 technology and the MGX rack design. By incorporating the NVLink Fusion chiplet and the Vera Rubin NVLink Switch tray, AWS can connect up to 72 custom ASICs all-to-all at 3.6 TB/s per ASIC, delivering a total of roughly 260 TB/s of scale-up bandwidth.
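The headline figures above are internally consistent, which a quick back-of-the-envelope check confirms. The snippet below is a simple sanity check using only numbers quoted in this article; the "260 TB/s" total is presumably the 72 x 3.6 TB/s product rounded up.

```python
# Sanity-check the scale figures quoted in the article.

ASICS_PER_DOMAIN = 72        # custom ASICs connected all-to-all via NVLink Fusion
BW_PER_ASIC_TBPS = 3.6       # scale-up bandwidth per ASIC, in TB/s

total_bw = ASICS_PER_DOMAIN * BW_PER_ASIC_TBPS
print(f"Aggregate scale-up bandwidth: {total_bw:.1f} TB/s")  # 259.2 TB/s, ~260 TB/s

CHIPS_PER_ULTRASERVER = 144       # Trainium3 chips per UltraServer
MAX_CHIPS_PER_WORKLOAD = 1_000_000

# Ceiling division: how many UltraServers a 1-million-chip workload implies.
ultraservers_needed = -(-MAX_CHIPS_PER_WORKLOAD // CHIPS_PER_ULTRASERVER)
print(f"UltraServers for a 1M-chip workload: {ultraservers_needed}")  # 6945
```

So a maximum-scale workload implies on the order of seven thousand UltraServers networked together, which gives a sense of the data-center footprint behind the "1 million chips" claim.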
This integration allows Trainium4 to coexist with Nvidia GPUs in the same system, enabling hybrid deployments that combine Amazon's cost-efficient infrastructure with Nvidia's proven performance and CUDA ecosystem. While CUDA remains the dominant software framework for AI development, AWS's strategy aims to reduce dependency on Nvidia by offering competitive price-performance advantages.

The Trainium4 chip, currently in development, will support NVLink Fusion, allowing seamless interoperability with Nvidia GPUs while leveraging AWS's proprietary Nitro System, Elastic Fabric Adapters (EFAs), and Graviton CPUs. This hybrid approach could help AWS attract AI developers who want to avoid vendor lock-in or reduce costs without sacrificing performance.

The collaboration also addresses major challenges in deploying custom AI silicon: long development cycles, complex supplier ecosystems, and high risks of delays. NVLink Fusion provides a modular, proven infrastructure stack, including GPUs, DPUs, networking hardware, and cooling systems, enabling hyperscalers to reduce time-to-market and lower development costs.

Despite Nvidia's entrenched position in AI chips and software, AWS's combination of in-house silicon, strategic partnerships, and cost leadership positions it as a formidable competitor. With Trainium3 already driving substantial revenue and Trainium4 poised to enhance interoperability, AWS is building a scalable, flexible AI ecosystem that could reshape the industry. While full disruption of Nvidia may be unlikely, AWS is clearly aiming to capture a significant share of the AI infrastructure market, offering customers a powerful, cost-effective alternative.
