
Five Generations of TPU: Architectural Stability at Supercomputer Scale

This presentation examines Google's TPU evolution from v2 to Ironwood, revealing how a stable microarchitecture achieved 3600x performance scaling while improving power efficiency 30-fold. We explore the resilience innovations enabled by optical circuit switches, the surprising adaptability of domain-specific ASICs across shifting DNN paradigms, and the emergence of Compute Carbon Intensity as a holistic sustainability metric for AI infrastructure.

Script

Google's TPU training supercomputers scaled aggregate performance 3600-fold across five generations, despite the end of Dennard scaling and the slowdown of Moore's Law. The authors document how architectural stability, not constant redesign, enabled this exponential growth from TPU v2 to Ironwood.

The two-core TPU microarchitecture, built from systolic matrix units, vector processors, and compiler-controlled memory, remained unchanged across all generations. This stability let the same architecture adapt from early convolutional networks to Transformers and diffusion models, disproving early skepticism about the longevity of domain-specific ASICs.
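To make the phrase "systolic matrix unit" concrete, here is a minimal cycle-level sketch of a generic weight-stationary systolic array computing a matrix product. It illustrates the dataflow idea only; it is an assumption-level teaching model, not the TPU's actual matrix-unit design, and the function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(X, W):
    """Cycle-level sketch of a weight-stationary systolic array computing X @ W.

    Illustrative model only (not the TPU's real MXU). PE(k, n) permanently
    holds W[k][n]. Row k of activations enters from the left, delayed
    (skewed) by k cycles, and moves one PE to the right per cycle. Partial
    sums move one PE down per cycle; finished dot products exit the bottom.
    """
    M, K, N = len(X), len(W), len(W[0])
    assert all(len(row) == K for row in X), "inner dimensions must match"
    act = [[None] * N for _ in range(K)]   # activation currently held in each PE
    psum = [[0] * N for _ in range(K)]     # partial sum leaving each PE
    Y = [[0] * N for _ in range(M)]
    last_cycle = (M - 1) + (K - 1) + (N - 1)  # cycle when Y[M-1][N-1] completes
    for t in range(last_cycle + 1):
        new_act = [[None] * N for _ in range(K)]
        new_psum = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    m = t - k  # skewed injection: X[m][k] enters at cycle m + k
                    a = X[m][k] if 0 <= m < M else None
                else:
                    a = act[k][n - 1]  # activation passed from the left neighbour
                above = psum[k - 1][n] if k > 0 else 0  # partial sum from above
                new_act[k][n] = a
                new_psum[k][n] = above + (a * W[k][n] if a is not None else 0)
        act, psum = new_act, new_psum
        for n in range(N):
            m = t - (K - 1) - n  # row m of column n leaves the bottom this cycle
            if 0 <= m < M:
                Y[m][n] = psum[K - 1][n]
    return Y
```

Because weights stay put and operands only move between neighbouring cells, each value fetched from memory is reused across a whole row or column of multiply-accumulate units, which is the property that makes this layout attractive for dense DNN math.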

Beginning with TPU v4, optical circuit switches (OCS) transformed system resilience by enabling independent rack commissioning and dynamic topology reconfiguration. When hardware fails, the OCS isolates the defective units and quickly reroutes traffic around them, achieving over 90% goodput for synchronous training across 9216-node supercomputers.
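Goodput here means the fraction of wall-clock time spent on useful training progress. The sketch below is a hypothetical back-of-envelope model, not the paper's methodology: it assumes every failure stalls the whole synchronous job for a fixed recovery time, that half a checkpoint interval of work is lost per failure on average, and that writing a checkpoint pauses training. All parameter values in the usage lines are illustrative, not measurements.

```python
MINUTES_PER_DAY = 24 * 60


def estimated_goodput(failures_per_day, recovery_min, ckpt_interval_min, ckpt_cost_min):
    """Hypothetical estimate of goodput for a synchronous training job.

    Assumptions (illustrative only):
    - each failure stalls the whole job for `recovery_min` minutes
      (e.g. isolating a bad rack and reconfiguring the interconnect);
    - on average half a checkpoint interval of work is lost and redone;
    - each checkpoint write pauses training for `ckpt_cost_min` minutes.
    """
    lost = failures_per_day * (recovery_min + ckpt_interval_min / 2)
    remaining = max(MINUTES_PER_DAY - lost, 0.0)
    # Of the remaining time, a fixed fraction goes to writing checkpoints.
    useful = remaining * ckpt_interval_min / (ckpt_interval_min + ckpt_cost_min)
    return useful / MINUTES_PER_DAY


# Illustrative comparison: fast vs. slow recovery, everything else equal.
fast = estimated_goodput(failures_per_day=2, recovery_min=10, ckpt_interval_min=30, ckpt_cost_min=1)
slow = estimated_goodput(failures_per_day=2, recovery_min=120, ckpt_interval_min=30, ckpt_cost_min=1)
print(f"fast recovery: {fast:.3f}")  # 0.934
print(f"slow recovery: {slow:.3f}")  # 0.786
```

Under these assumptions, shrinking recovery time is the lever that moves goodput most, which is why the ability to route around a failed rack instead of waiting for a repair matters for synchronous jobs.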

Performance per Watt improved 30-fold across the TPU lineage, with Ironwood alone delivering a 6x jump over the previous generation. The authors reveal that design priorities shifted from performance per total cost of ownership to performance per Watt, driven by hard physical limits on data center power availability.
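The headline ratios can be unpacked with a little arithmetic. The sketch below assumes five generations means four generation-to-generation steps, and its last line additionally assumes the 3600x and 30x figures are measured on the same whole-system basis; if performance per Watt is quoted per chip instead, the implied power growth does not follow.

```python
PERF_GAIN = 3600.0          # aggregate performance, first to fifth generation
PERF_PER_WATT_GAIN = 30.0   # performance per Watt over the same span
LAST_STEP_PPW_GAIN = 6.0    # Ironwood over the previous generation
STEPS = 4                   # assumption: five generations -> four steps

# Geometric-mean gain per generational step.
perf_per_step = PERF_GAIN ** (1 / STEPS)           # ~7.7x per step
ppw_per_step = PERF_PER_WATT_GAIN ** (1 / STEPS)   # ~2.3x per step

# Perf/W gain delivered by all steps before Ironwood, combined.
earlier_ppw_gain = PERF_PER_WATT_GAIN / LAST_STEP_PPW_GAIN  # 5x

# Only valid if both gains are measured at the same system level:
# performance = (performance / Watt) * Watts, so Watts grew by the ratio.
implied_power_growth = PERF_GAIN / PERF_PER_WATT_GAIN  # 120x

print(perf_per_step, ppw_per_step, earlier_ppw_gain, implied_power_growth)
```

The contrast between the 6x final step and the roughly 5x combined gain of all earlier steps is one way to read the shift in priority toward performance per Watt.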

Compute Carbon Intensity (CCI) combines embodied and operational emissions into a single metric measured in grams of carbon dioxide equivalent (gCO2e) per exaFLOP. Ironwood reduced both components nearly fourfold compared to TPU v5p, enabling practical carbon budgeting for foundation model training and turning sustainability from an aspiration into a measurable engineering constraint.
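A minimal sketch of how such a metric could be computed and then used for budgeting. It assumes `embodied_g` is the share of manufacturing emissions already attributed to the workload and that operational emissions are energy times grid carbon intensity; the paper's exact accounting may differ, and all function names and numbers here are hypothetical.

```python
def operational_emissions_g(energy_kwh, grid_g_per_kwh):
    """Operational emissions (gCO2e): energy used times grid carbon intensity."""
    return energy_kwh * grid_g_per_kwh


def compute_carbon_intensity(embodied_g, operational_g, exaflops):
    """Hypothetical CCI: total gCO2e divided by compute delivered, in exaFLOPs.

    `embodied_g` is assumed to be the manufacturing emissions already
    amortized to this workload (an assumption, not the paper's definition).
    """
    if exaflops <= 0:
        raise ValueError("exaflops must be positive")
    return (embodied_g + operational_g) / exaflops


def carbon_budget_g(cci, workload_exaflops):
    """Estimated emissions (gCO2e) for a planned workload at a given CCI."""
    return cci * workload_exaflops


# Illustrative numbers only.
cci = compute_carbon_intensity(1000.0, operational_emissions_g(10.0, 100.0), 4.0)
print(cci)                        # 500.0 gCO2e per exaFLOP
print(carbon_budget_g(cci, 2.0))  # 1000.0 gCO2e for a 2-exaFLOP job
```

Because CCI is a sum of two terms over the same denominator, cutting both the embodied and operational components by about 4x cuts CCI by about 4x, and a training run's estimated footprint then scales linearly with its planned compute.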

The TPU trajectory demonstrates that training accelerators can maintain a stable microarchitecture across paradigm shifts, much like the long-lived IBM System/360 and x86 instruction set architectures. If you want to explore more research like this and create your own explainer videos, visit EmergentMind.com.