Pegatron Unveils 128-GPU Rack-Scale System with AMD Instinct MI350X for High-Performance AI Training and Inference
At Computex, Pegatron introduced a rack-scale solution powered by 128 of AMD's next-generation Instinct MI350X accelerators, designed to tackle demanding AI inference and training tasks. While AMD is developing its own in-house rack-scale solutions, Pegatron's system will serve as a valuable stepping stone, allowing AMD and its partners to refine designs for the upcoming Instinct MI450X-based IF64 and IF128 platforms, expected to be available in about a year.

The Pegatron AS501-4A1/AS500-4A1 rack-scale system consists of sixteen 5U compute trays. Each tray houses one AMD EPYC 9005-series processor and eight AMD Instinct MI350X accelerators optimized for AI and high-performance computing (HPC) workloads, for 128 GPUs in total. Liquid cooling keeps performance consistent even under heavy computational loads. Built in a 51OU ORV3 form factor, the system is aimed at cloud data centers that adhere to Open Compute Project (OCP) standards, such as those operated by Meta.

To connect GPUs in different chassis, the system relies on 400 GbE Ethernet. This approach contrasts with Nvidia's GB200/GB300 NVL72 platform, which uses the company's ultra-fast NVLink interconnect to link 72 GPUs. As a result, the scale-up capabilities of the Instinct MI350X system are limited compared with Nvidia's NVL72, which can handle more tightly synchronized large language model (LLM) training tasks.

Pegatron's 128-GPU rack offers a theoretical peak of up to 1,177 petaflops (PFLOPS) of FP4 compute for inference, assuming near-linear scaling. Each MI350X accelerator carries 288GB of HBM3E high-bandwidth memory, giving the system 36.8 terabytes of high-speed memory in total. This capacity allows the system to host very large AI models, potentially outpacing the memory capabilities of Nvidia's current Blackwell-based GPUs.

Despite its scaling limitations, the system represents a significant step forward, particularly for inference workloads and multi-instance training scenarios. Its strong performance and substantial memory make it a compelling option for researchers and organizations working with large datasets and complex models today. The system also serves as a testbed for developing and optimizing AMD Instinct-based solutions, paving the way for the next-generation Instinct MI400-series products.

By challenging Nvidia's dominance in the high-performance AI market, Pegatron's system underscores the growing competition and innovation in this sector. It also highlights the value of open standards like OCP in fostering collaboration and broader adoption of advanced AI technologies across the industry.
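As a quick sanity check on those headline figures, here is a minimal back-of-the-envelope sketch in Python. The per-accelerator FP4 throughput (~9.2 PFLOPS) is an assumption inferred from the quoted 1,177-PFLOPS rack total rather than an official AMD specification; the memory math follows directly from the per-GPU capacity stated above.

```python
# Napkin math for the rack-level figures, assuming near-linear scaling
# across all 128 accelerators.

NUM_GPUS = 128
FP4_PFLOPS_PER_GPU = 9.2   # assumed per-MI350X FP4 throughput, inferred
                           # from the quoted rack total (not an AMD spec)
HBM3E_GB_PER_GPU = 288     # per-accelerator HBM3E capacity

peak_fp4_pflops = NUM_GPUS * FP4_PFLOPS_PER_GPU
total_hbm_tb = NUM_GPUS * HBM3E_GB_PER_GPU / 1000  # decimal terabytes

print(f"Aggregate FP4 compute: {peak_fp4_pflops:,.1f} PFLOPS")  # 1,177.6
print(f"Total HBM3E capacity:  {total_hbm_tb:.3f} TB")          # 36.864
```

Both results line up with the roughly 1,177 PFLOPS and 36.8TB quoted above, with the caveat that these are theoretical peaks: real-world throughput across the 400 GbE scale-out fabric will land below the linear-scaling ceiling.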