HyperAI

At Google Cloud Next, Google unveiled the eighth generation of its custom Tensor Processing Units (TPUs), introducing two specialized architectures designed for the emerging era of AI agents: the TPU 8t for training and the TPU 8i for inference. Developed in partnership with Google DeepMind, these chips aim to address the demands of autonomous systems that require complex reasoning, multi-step workflows, and continuous learning loops.

The TPU 8t serves as a high-performance training engine, engineered to reduce frontier model development cycles from months to weeks. It delivers nearly three times the compute performance per pod of the previous generation, with high throughput and inter-chip bandwidth. A key feature of the TPU 8t is its ability to maintain over 97% goodput, a metric reflecting the share of compute time spent on productive work. This reliability is achieved through capabilities such as real-time telemetry, automatic rerouting around faulty links, and Optical Circuit Switching, which allows hardware to reconfigure around failures without human intervention. These features minimize downtime caused by hardware errors or network stalls, which can otherwise cost significant training time at scale.

The TPU 8i, by contrast, is optimized for latency-sensitive inference workloads. It features enhanced memory bandwidth to handle the rapid interactions required by AI agents, where even minor inefficiencies can compound at scale. While both chips can run a variety of workloads, their specialized designs unlock significant efficiency gains. The TPU 8i complements the TPU 8t, reflecting Google's decision, made with long hardware development cycles in mind, to build distinct chips for training and serving.

Both new processors run on Google's Axion Arm-based CPU host, enabling a fully co-designed system from software to silicon. This approach optimizes the entire stack for performance and efficiency, moving beyond chip-level metrics to system-wide improvements. The new TPUs support native integration with major frameworks including JAX, PyTorch, SGLang, and vLLM, and offer bare metal access without virtualization overhead. Customers like Citadel Securities are already using TPUs for cutting-edge AI workloads, setting a precedent for industry adoption.

Power efficiency remains a central pillar of the new architecture. The TPU 8t and 8i deliver up to twice the performance per watt of the previous Ironwood generation. This is achieved through integrated power management, co-designed network connectivity, and fourth-generation liquid cooling technology that supports performance densities unattainable with air cooling. Google reports that its data centers now deliver six times more computing power per unit of electricity than they did five years ago, a trajectory the new chips continue.

Available later this year, the TPU 8t and 8i will serve as the backbone of Google's AI Hypercomputer, a unified platform combining purpose-built hardware, open software, and flexible consumption models. These chips represent the culmination of over a decade of development, addressing the specific infrastructure needs of the agentic era. By redefining what is possible in model training, agent orchestration, and complex reasoning, Google aims to empower organizations to build more capable AI systems efficiently and at scale. Interested customers may request further information regarding availability and implementation.
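
To make the goodput figure cited for the TPU 8t more concrete, here is a minimal Python sketch of the arithmetic. It assumes goodput is simply productive compute time divided by total wall-clock time; the article does not give Google's exact definition. The 30-day run length and the 90% comparison value are hypothetical numbers chosen for illustration, not reported results.

```python
def goodput(productive_hours: float, total_hours: float) -> float:
    """Assumed definition: fraction of wall-clock time spent on productive compute."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= productive_hours <= total_hours:
        raise ValueError("productive_hours must be between 0 and total_hours")
    return productive_hours / total_hours


def lost_hours(total_hours: float, goodput_fraction: float) -> float:
    """Wall-clock hours not spent on productive compute at a given goodput."""
    if not 0 <= goodput_fraction <= 1:
        raise ValueError("goodput_fraction must be between 0 and 1")
    return total_hours * (1.0 - goodput_fraction)


if __name__ == "__main__":
    run_hours = 30 * 24  # hypothetical 30-day training run

    # 0.97 is the figure from the article; 0.90 is a made-up comparison point.
    for fraction in (0.97, 0.90):
        print(f"goodput {fraction:.0%}: {lost_hours(run_hours, fraction):.1f} hours lost")
```

Under these assumptions, a 30-day run at 97% goodput loses roughly 21.6 hours to failures and stalls, compared with 72 hours at the hypothetical 90% level.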

Google unveils eighth-gen TPUs for agentic era
