OpenAI outlines the principles of global-scale computing for the future of AI
Richard Ho, head of hardware at OpenAI, delivered a forward-looking keynote at the AI Infra Summit in Santa Clara, outlining the foundational principles of the global-scale computing required to support the next era of generative AI. While he did not disclose details about the rumored “Titan” inference chip co-developed with Broadcom, his talk centered on the transformative infrastructure needs driven by increasingly complex, long-lived AI agents.

Ho emphasized that the future of AI lies not just in more powerful models, but in systems capable of sustaining persistent, collaborative agent workflows. Unlike traditional chat interactions that reset after each user input, future agents will operate continuously, executing multi-day tasks, coordinating with other agents, and reacting in real time to dynamic inputs. This shift demands stateful computing, persistent memory, and ultra-low-latency interconnects across racks and data centers: the hallmarks of a true global-scale computer.

The presentation highlighted exponential growth in compute requirements, illustrated by a chart tracking training compute (in FLOPs) against performance on the MMLU benchmark. GPT-4 notably flattened the curve, but Ho suggested that models like GPT-5, estimated to require around 10²⁷ FLOPs, could approach near-perfect scores, rendering MMLU obsolete as a yardstick. Meanwhile, newer architectures such as the o3 mixture-of-experts model operate around 10²⁶ FLOPs, signaling a move toward efficiency through smarter reasoning rather than sheer scale alone. Another chart traced the evolution of model size from AlexNet (60 million parameters in 2012) to GPT-4 (an estimated 1.5 trillion parameters), showing continued exponential growth in compute demand despite a flattening trend in recent years. This growth remains feasible only because of advances in reduced-precision arithmetic and efficient data formats, yet training costs remain astronomical, and the return on investment is still uncertain outside a few key players such as Nvidia and OpenAI.

A major theme was the need for hardware-level safety and alignment. Ho argued that relying solely on software-based safeguards is insufficient, given the “devious” nature of advanced models. He called for integrating real-time kill switches into AI cluster orchestration, silicon-level telemetry to detect anomalous behavior, secure enclaves in CPUs and XPUs, and trusted execution environments to enforce safety policies at the hardware level (a toy sketch of the kill-switch idea appears at the end of this article).

He also stressed the importance of observability as a core hardware feature, not just for debugging but for continuous monitoring of latency, power efficiency, and system reliability. With AI workloads becoming increasingly distributed and optical networking on the rise, Ho warned that current network reliability is still inadequate and called for extensive validation on optical and communications testbeds.

Finally, Ho advocated for deep collaboration among chipmakers, foundries, packaging houses, and cloud providers to ensure dual sourcing and supply-chain resilience for critical components. Bold as this vision is, he acknowledged the challenge of aligning such diverse stakeholders.

In sum, Ho’s message was clear: the era of global-scale AI demands not just more compute, but a reimagined infrastructure that is secure, observable, low-latency, and built from the ground up for autonomous, persistent agents. OpenAI, with Ho’s deep expertise from Arm, Google, and Lightmatter, is positioning itself at the forefront of this transformation.
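
To make Ho’s call for kill switches in cluster orchestration more concrete, here is a minimal, purely illustrative Python sketch of an orchestration loop that polls per-accelerator telemetry and fences a device when its readings look anomalous. Every name, threshold, and data format below is an assumption made for this example; Ho did not describe a specific interface, and this is not OpenAI’s implementation.

```python
"""
Illustrative sketch only: a toy orchestration loop that polls per-accelerator
telemetry and trips a kill switch when behavior looks anomalous. All names,
thresholds, and the TelemetrySample format are invented for illustration and
do not describe OpenAI's or any vendor's real interfaces.
"""

from dataclasses import dataclass
import random
import time


@dataclass
class TelemetrySample:
    device_id: str
    power_watts: float    # instantaneous board power
    anomaly_score: float  # 0.0 (normal) .. 1.0 (highly anomalous), from on-chip monitors


def read_telemetry(device_id: str) -> TelemetrySample:
    """Stand-in for silicon-level telemetry; a real system would read hardware counters."""
    return TelemetrySample(
        device_id=device_id,
        power_watts=random.uniform(300, 700),
        anomaly_score=random.random(),
    )


def trip_kill_switch(device_id: str, reason: str) -> None:
    """Stand-in for a hardware-enforced stop (e.g., fencing the device and halting its jobs)."""
    print(f"[KILL SWITCH] {device_id}: {reason} -- workload halted")


def orchestration_loop(devices: list[str], anomaly_threshold: float = 0.95,
                       power_limit_watts: float = 650.0, cycles: int = 5) -> None:
    """Poll telemetry each cycle and act without waiting for software-level safeguards."""
    for _ in range(cycles):
        for dev in devices:
            sample = read_telemetry(dev)
            if sample.anomaly_score > anomaly_threshold:
                trip_kill_switch(dev, f"anomaly score {sample.anomaly_score:.2f}")
            elif sample.power_watts > power_limit_watts:
                trip_kill_switch(dev, f"power {sample.power_watts:.0f} W over limit")
        time.sleep(0.1)  # a real orchestrator would run a much tighter, hardware-assisted loop


if __name__ == "__main__":
    orchestration_loop([f"xpu-{i}" for i in range(4)])
```

In a real deployment the decision path would live in a hardware-assisted control plane, backed by the secure enclaves and trusted execution environments Ho described, rather than in application-level Python; that is precisely his point about enforcing safety policy below the software layer.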