OpenAI Outlines Vision for Global-Scale AI Computing and Agent-Driven Infrastructure
OpenAI has outlined the foundational principles for building the global-scale computing systems needed to support the next era of artificial intelligence, emphasizing that the future of AI is not just about more powerful models but about rethinking infrastructure from the ground up to handle complex, long-running, agent-driven workloads. Richard Ho, head of hardware at OpenAI, presented the vision in his keynote at the AI Infra Summit in Santa Clara, framing the challenge as one of building computing systems that operate at planetary scale, far beyond the warehouse-scale data centers of the past.

Ho highlighted the exponential growth in the compute required to train large language models, illustrating the trend with a chart that plots aggregate training compute against performance on the Massive Multitask Language Understanding (MMLU) benchmark. GPT-4 marked a notable bend in that curve, but Ho noted that future models such as GPT-5 and the o3 series, likely leveraging mixture-of-experts or chain-of-thought reasoning, will require staggering amounts of compute, potentially on the order of 10^27 floating-point operations. He suggested that such models may asymptotically approach perfect scores on MMLU, rendering the benchmark effectively obsolete, though he stopped short of confirming exact figures. A second chart traced the growth of model size and compute from AlexNet in 2012 to GPT-4, which OpenAI estimates at around 1.5 trillion parameters. Although that curve has flattened somewhat in recent years, the underlying trend remains exponential; advances in numerical precision, such as lower-bit floating-point and integer formats, have helped keep training feasible, though it remains extremely costly.

The central shift Ho identified is the move from interactive, human-paced conversation to persistent, long-lived agent workflows. In this paradigm, AI agents operate autonomously over extended periods, performing tasks in the background without constant human input. That demands stateful computing, sustained memory retention, and real-time coordination across multiple agents. Ho stressed that this in turn requires low-latency interconnects not just within racks but across data centers and even continents, so that agents stay synchronized and reliable. A key challenge, he said, is managing tail latencies: occasional slow messages that can derail complex multi-agent tasks, because a single agent that is late in sharing critical information can stall the entire workflow (a toy simulation of this barrier effect appears below). This places immense pressure on networking infrastructure, which Ho described as a major source of tension in current AI system designs.

Beyond performance, Ho emphasized the need for hardware-level safety and alignment. Relying solely on software-based safeguards is insufficient, he argued, because models can behave unpredictably. He called instead for security features embedded in the hardware itself: real-time kill switches in the orchestration fabric, silicon-level telemetry to detect anomalous behavior, secure enclaves in CPUs and XPUs, and trusted execution environments that enforce alignment policies at the chip level.

Ho also pointed to the lack of standardized benchmarks for agent-aware hardware and systems, and argued that observability should be designed into the hardware itself rather than bolted on for debugging. He raised concerns about the reliability of emerging optical networking technologies, calling for extensive work on optical testbeds before widespread deployment.
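To make the tail-latency point concrete, here is a minimal sketch (not something Ho presented) of why a synchronized step across many agents is governed by its slowest participant. The latency distribution, agent counts, and trial sizes below are illustrative assumptions, not figures from the talk.

```python
# Illustrative only: how a synchronization barrier amplifies tail latency.
# The latency distribution and agent counts are assumptions for the example,
# not numbers from Ho's keynote.
import random
import statistics

random.seed(0)

def step_latency_ms() -> float:
    """One agent's step time: typically around 10 ms, with a heavy lognormal tail."""
    return 10.0 * random.lognormvariate(0.0, 0.8)

def barrier_latency_ms(num_agents: int) -> float:
    """A synchronized step finishes only when the slowest agent finishes."""
    return max(step_latency_ms() for _ in range(num_agents))

TRIALS = 20_000
single = [step_latency_ms() for _ in range(TRIALS)]
print(f"single agent, mean step: {statistics.mean(single):6.1f} ms")

for n in (8, 64, 512, 4096):
    steps = [barrier_latency_ms(n) for _ in range(TRIALS // 10)]
    print(f"{n:>5} agents, mean synchronized step: {statistics.mean(steps):6.1f} ms")
```

Even though a typical agent finishes quickly, the synchronized step stretches as the group grows, which is why Ho frames tail latency, rather than average latency, as the limiting factor for coordinated multi-agent workloads.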
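Ho did not describe a concrete mechanism for the kill switches and silicon-level telemetry he called for, but one way to picture the idea is a watchdog in the orchestration fabric that halts a device the moment its telemetry leaves an approved envelope. The sketch below is hypothetical; the class names, telemetry fields, and thresholds are invented for illustration and do not correspond to any real OpenAI or vendor interface.

```python
# Hypothetical sketch of a telemetry-driven kill switch in an orchestration
# fabric. Field names, thresholds, and the halt hook are invented for
# illustration only.
from dataclasses import dataclass

@dataclass
class XpuTelemetry:
    device_id: str
    watts: float              # instantaneous power draw
    hbm_bandwidth_gbps: float
    unattested_kernels: int   # kernels that failed enclave attestation

@dataclass
class SafetyEnvelope:
    max_watts: float = 1200.0
    max_hbm_bandwidth_gbps: float = 9000.0
    max_unattested_kernels: int = 0

class KillSwitch:
    """Halts a device as soon as its telemetry leaves the allowed envelope."""

    def __init__(self, envelope: SafetyEnvelope, halt_device) -> None:
        self.envelope = envelope
        self.halt_device = halt_device  # callback into the fabric, e.g. fence traffic, power-gate

    def check(self, t: XpuTelemetry) -> bool:
        violations = []
        if t.watts > self.envelope.max_watts:
            violations.append("power")
        if t.hbm_bandwidth_gbps > self.envelope.max_hbm_bandwidth_gbps:
            violations.append("memory bandwidth")
        if t.unattested_kernels > self.envelope.max_unattested_kernels:
            violations.append("attestation")
        if violations:
            self.halt_device(t.device_id, reason=", ".join(violations))
            return False
        return True

# Example use: the halt is just a print here; a real fabric would isolate the part.
switch = KillSwitch(SafetyEnvelope(), lambda dev, reason: print(f"HALT {dev}: {reason}"))
switch.check(XpuTelemetry("xpu-17", watts=900.0, hbm_bandwidth_gbps=7200.0, unattested_kernels=0))
switch.check(XpuTelemetry("xpu-17", watts=1350.0, hbm_bandwidth_gbps=7200.0, unattested_kernels=2))
```

In this picture, a trusted execution environment plays the complementary role of attesting which kernels may run at all, which is what the attestation counter in the sketch stands in for.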
Finally, he advocated for deeper collaboration across the AI ecosystem—between chipmakers, packaging firms, foundries, and cloud providers—to ensure dual sourcing and supply chain resilience for critical components. While OpenAI can call for such partnerships, the execution will depend on the willingness of industry players to align on shared infrastructure goals. In sum, Ho’s message is clear: the future of AI is not just about bigger models, but about building a new kind of global computing infrastructure—one that is persistent, secure, coordinated, and capable of supporting intelligent agents that act autonomously at machine speed.