Nvidia Delays SOCAMM Memory Tech to Next-Gen Rubin GPUs Amid Design and Supply Chain Challenges
Nvidia has postponed the introduction of its SOCAMM memory technology, originally planned for the Blackwell Ultra GB300, to a future GPU generation codenamed "Rubin." According to ZDNet, the decision was driven by several factors, including problems with the GB300’s motherboard design and supply chain challenges.
The GB300 was originally meant to debut alongside SOCAMM. It is a smaller, more workstation-friendly variant of the larger GB200 and combines a Blackwell datacenter GPU and a Grace CPU in a motherboard package suited to OEM desktop and workstation environments.
The GB300’s original board design, codenamed "Cordelia," was ambitious: it aimed to place two Grace CPUs and four Blackwell GPUs on a single board while using SOCAMM memory. The design ran into significant reliability problems, notably data loss, which forced Nvidia to switch to the existing, more stable "Bianca" design. Bianca pairs a single Grace CPU with two Blackwell GPUs and uses conventional LPDDR memory.
SOCAMM itself also faced reliability and thermal management challenges, further complicating its integration into the GB300. Combined with ongoing supply chain difficulties, these problems led to the delay. Nvidia, a company valued at more than a trillion dollars, is grappling with production yields as it ramps up its supply chain for the upcoming GB300, and falling back on existing, more reliable technology should ease those problems and improve overall product stability.
SOCAMM, developed in collaboration with SK Hynix and Micron, is a significant step forward in memory form factors. It draws on the CAMM2 standard but is designed for datacenter use, and it offers higher memory performance and density.
Each SOCAMM module measures 14mm x 90mm and carries four 16-die LPDDR5 memory stacks, for a total capacity of 128GB and a data rate of 7.5 Gbps. These specifications make SOCAMM a potentially transformative technology for high-performance computing and datacenter applications.
SOCAMM is now set to debut with Rubin, Nvidia’s next-generation datacenter GPU architecture and the successor to Blackwell. Details about Rubin and its Ultra variant are scarce, but projections indicate support for 12 stacks of HBM4E delivering 13TB/s of memory bandwidth by 2027. Rubin is expected to use 5.5-reticle-size CoWoS interposers and 100mm x 100mm substrates manufactured by TSMC. It is also expected to be backward-compatible with the existing Blackwell NVL72 infrastructure, which should ease the transition for users.
In summary, the delay of SOCAMM to the Rubin generation reflects the priority Nvidia places on reliability and performance. By resolving the GB300’s motherboard design issues and managing supply chain constraints, Nvidia aims to bring the new memory technology to market once it can meet the stringent demands of enterprise and datacenter applications.