
Cerebras Secures $10 Billion Inference Deal with OpenAI

For generative AI to move beyond a fleeting trend and become a mainstream force, inference costs must plummet and token generation speeds must improve dramatically, especially as AI evolves from chatbots into autonomous agents that operate without human oversight.

A pivotal development in this race is Nvidia's $20 billion acqui-hire of Groq on Christmas Eve 2025, which included licensing Groq's Language Processing Unit (LPU) technology and hiring key engineers such as co-founder Jonathan Ross and COO Sunny Madra. The move signals Nvidia's intent to supercharge inference performance, particularly through deterministic, inference-optimized hardware, in contrast to its more general-purpose, dynamically scheduled GPUs.

While Nvidia's Blackwell GB200 and GB300 NVL72 systems have already improved cost per token, and the upcoming Rubin VB200 promises better efficiency still, specialized systems such as Cerebras' CS-3 wafer-scale machines and Groq's GroqRack still lead in raw speed. The CS-3, built around the WSE-3 wafer-scale engine, delivered 2,700 tokens per second and a time-to-first-token of just 280 milliseconds in tests with OpenAI's GPT-OSS-120B model, outperforming Groq's cloud offering on speed while pricing output at 69 cents per million tokens versus Groq's 75 cents.

This raises the question: why would OpenAI enter a $10 billion cloud deal with Cerebras if cost per token matters? The answer lies in future potential. OpenAI likely has insider knowledge of Cerebras' upcoming WSE-4 processors and CS-4 systems, expected later in 2026, which may feature 3D-stacked SRAM, optical interconnects, and expanded MemoryX capacity, enabling far greater memory bandwidth and reduced hardware needs. These advances could drastically cut inference costs and lift performance beyond what current CS-3 clusters offer.

The deal has OpenAI renting 750 megawatts of compute capacity from Cerebras, equivalent to approximately 32,768 CS-3 systems across 16,384 racks.
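The speed and price figures above can be sanity-checked with a quick back-of-envelope calculation. This sketch uses only the numbers quoted in the article; the 1,000-token response length is an illustrative assumption, not a reported figure.

```python
# Back-of-envelope check on the quoted inference figures:
# 2,700 tokens/s, 280 ms time-to-first-token, and output pricing
# of $0.69 (Cerebras) vs. $0.75 (Groq) per million tokens.

TOKENS_PER_SECOND = 2_700        # Cerebras CS-3 running GPT-OSS-120B
TTFT_SECONDS = 0.280             # time to first token
CEREBRAS_USD_PER_MTOK = 0.69     # output pricing, per million tokens
GROQ_USD_PER_MTOK = 0.75

response_tokens = 1_000          # assumed response length (illustrative)

# Wall-clock latency: wait for the first token, then stream the rest.
latency_s = TTFT_SECONDS + (response_tokens - 1) / TOKENS_PER_SECOND

# Output cost of one such response on each platform.
cerebras_cost = response_tokens / 1e6 * CEREBRAS_USD_PER_MTOK
groq_cost = response_tokens / 1e6 * GROQ_USD_PER_MTOK

print(f"latency:  {latency_s:.2f} s")              # ~0.65 s end to end
print(f"Cerebras: ${cerebras_cost:.6f} per response")
print(f"Groq:     ${groq_cost:.6f} per response")
```

At these rates a 1,000-token answer streams out in well under a second and costs a fraction of a tenth of a cent in output tokens, which is why the economics hinge less on any single query and more on aggregate scale.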
Though the upfront cost would be around $131 billion, OpenAI avoids capital expenditure by leasing capacity. Cerebras, in turn, uses the revenue to fund future expansion, creating a scalable, self-sustaining cloud infrastructure. The partnership is expected to scale from Q1 2026 through 2028, with OpenAI driving massive, continuous workloads that validate the system's performance and reliability. Cerebras CEO Andrew Feldman has framed it as a transformative, world-scale partnership aimed at bringing high-performance inference to a billion users. The deal also likely keeps competitors from accessing Cerebras' hardware, though antitrust concerns may limit exclusivity.

Despite the Groq acquisition, OpenAI is not abandoning its own Titan inference XPU project with Broadcom. Instead, the Cerebras deal appears to be a strategic hedge: leveraging proven, high-performance hardware now while developing proprietary silicon for the future.

The real win is not just speed or cost, but proof that large-scale, reliable AI inference is feasible at global scale. The partnership marks a turning point: AI is no longer just about training; it is about deploying intelligent systems that work faster, cheaper, and smarter, paving the way for the next era of augmented thinking.
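The deal-size figures quoted above (750 MW of capacity, roughly 32,768 CS-3 systems in 16,384 racks, about $131 billion upfront) imply some per-system numbers worth making explicit. All derived values below are back-of-envelope estimates from the article's figures, not vendor specifications.

```python
# Per-system figures implied by the deal numbers quoted above.
# Derived values are rough estimates, not Cerebras specifications.

TOTAL_POWER_MW = 750          # leased compute capacity
NUM_SYSTEMS = 32_768          # approximate CS-3 count
NUM_RACKS = 16_384
UPFRONT_COST_USD = 131e9      # estimated purchase price avoided by leasing

systems_per_rack = NUM_SYSTEMS / NUM_RACKS            # 2 CS-3s per rack
kw_per_system = TOTAL_POWER_MW * 1_000 / NUM_SYSTEMS  # ~22.9 kW each
usd_per_system = UPFRONT_COST_USD / NUM_SYSTEMS       # ~$4.0M each

print(f"{systems_per_rack:.0f} systems per rack")
print(f"{kw_per_system:.1f} kW per system")
print(f"${usd_per_system / 1e6:.1f}M per system")
```

The roughly 23 kW per system and $4 million per system that fall out of these figures illustrate why leasing, rather than buying, is the only practical structure for a deployment of this size.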
