Ironwood: Quantum-Resilient Protocols & AI Hardware
- The name Ironwood refers to two unrelated systems: a post-quantum key agreement and authentication protocol based on braid groups, and Google's seventh-generation TPU for AI.
- The Ironwood MKAAP leverages non-abelian braid group algebra to resist both classical and quantum attacks, achieving efficient key agreement on constrained platforms.
- The TPU 7 architecture employs multi-chiplet designs and high-speed interconnects to deliver exascale AI performance with enhanced energy efficiency and fault tolerance.
Ironwood denotes two distinct systems in contemporary computing and cryptography. The first is the Ironwood Meta Key Agreement and Authentication Protocol (MKAAP), a post-quantum authentication and key agreement protocol built on braid group algebra. The second is Ironwood (TPU 7), the seventh generation of Google’s AI training supercomputers, which caps eight years of architectural scaling in high-performance, resilient, and sustainable AI hardware. The following sections cover both systems: their internal methodology, core mathematical and computational innovations, performance metrics, and architectural significance in their respective domains.
1. Ironwood Meta Key Agreement and Authentication Protocol (MKAAP)
Ironwood MKAAP is an asymmetric-style protocol for mutual authentication and ephemeral key agreement. It is designed to withstand quantum attacks by relying on the hardness of group-theoretic problems in braid groups rather than on elliptic-curve or number-theoretic primitives. It authenticates two entities, a “Home Device” (HD) and a “Device” (D), after a single pre-provisioning stage performed by a Trusted Third Party (TTP); no real-time interaction with the third party is needed (Anshel et al., 2017).
System Model and Provisioning
Key parameters and entities in Ironwood include:
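- $N$: an even integer, the number of strands of the Artin braid group $B_N$.
- $\mathbb{F}_q$: a finite field with $q$ elements.
- $m_0$: a non-singular $N \times N$ base matrix over $\mathbb{F}_q$.
- Two TTP-selected sets of conjugates in $B_N$, whose elements commute pairwise across the sets.
- T-values $\tau_1, \dots, \tau_N$: non-unit elements of $\mathbb{F}_q$, distributed to each device.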
During key provisioning, the TTP samples private matrices (polynomials in the base matrix $m_0$) and private braids for each device $D$, and generates signed device-specific certificates. After provisioning, HD needs to store only its own key material and the T-values.
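Sampling private matrices as polynomials in one base matrix guarantees that they commute with each other and with the base matrix. The sketch below checks that property over a small prime field. The prime `P = 257`, the 3×3 matrix `M0`, and the helper names are illustrative assumptions, not Ironwood parameters.

```python
import random
from typing import List

Matrix = List[List[int]]

P = 257  # assumed prime field F_p, chosen only for illustration


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices over F_p."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) % P for c in range(n)]
            for r in range(n)]


def poly_in_matrix(coeffs: List[int], m0: Matrix) -> Matrix:
    """Evaluate c0*I + c1*m0 + c2*m0^2 + ... over F_p using Horner's rule."""
    n = len(m0)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, m0)               # result <- result * m0
        for i in range(n):
            result[i][i] = (result[i][i] + c) % P  # result <- result + c*I
    return result


# Hypothetical 3x3 base matrix (determinant 18, so non-singular mod 257).
M0 = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]

rng = random.Random(7)
private_a = poly_in_matrix([rng.randrange(P) for _ in range(4)], M0)
private_b = poly_in_matrix([rng.randrange(P) for _ in range(4)], M0)

print(mat_mul(private_a, private_b) == mat_mul(private_b, private_a))  # True
```

The same check fails for most pairs of arbitrary matrices. That contrast is the reason for drawing private matrices from the polynomial ring generated by a single base matrix.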
Key Agreement and Protocol Flow
The interactive protocol consists of these high-level stages:
- The device D presents its signed certificate to HD.
- HD selects random matrices and braids, computes E-Multiplications via the colored Burau representation, and combines the results with the public data received from D.
- Once the blended values have been exchanged, the shared secret is derived from designated columns of the resulting matrices; device-side verification ensures protocol consistency and mutual authentication.
- Mutual authentication is finalized with hashes or MACs computed over the shared secret and fresh nonces.
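The final confirmation step can be illustrated with a generic HMAC-based key-confirmation exchange. This is a minimal sketch that assumes both parties already hold the same shared-secret bytes. The direction labels, the choice of HMAC-SHA-256, and the function names are hypothetical and are not taken from the Ironwood specification.

```python
import hashlib
import hmac
import secrets


def confirmation_tag(shared_secret: bytes, label: bytes,
                     nonce_hd: bytes, nonce_d: bytes) -> bytes:
    """HMAC-SHA-256 over a direction label and both fresh nonces (hypothetical format)."""
    return hmac.new(shared_secret, label + nonce_hd + nonce_d, hashlib.sha256).digest()


def verify(shared_secret: bytes, label: bytes,
           nonce_hd: bytes, nonce_d: bytes, tag: bytes) -> bool:
    expected = confirmation_tag(shared_secret, label, nonce_hd, nonce_d)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


# Both sides are assumed to have derived the same secret from the protocol run.
secret_hd = secret_d = secrets.token_bytes(32)
nonce_hd, nonce_d = secrets.token_bytes(16), secrets.token_bytes(16)

# Distinct labels prevent a tag from being reflected back to its sender.
tag_from_d = confirmation_tag(secret_d, b"D->HD", nonce_hd, nonce_d)
tag_from_hd = confirmation_tag(secret_hd, b"HD->D", nonce_hd, nonce_d)

print(verify(secret_hd, b"D->HD", nonce_hd, nonce_d, tag_from_d))
print(verify(secret_d, b"HD->D", nonce_hd, nonce_d, tag_from_hd))
```

Because fresh nonces from both sides enter every tag, a recorded tag from an earlier session does not verify in a new one.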
Algebraic Backbone
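The protocol’s security derives from three sources: the infinite, non-abelian, torsion-free structure of the braid groups $B_N$; their representation by colored Burau matrices; and the E-Multiplication operation. E-Multiplication combines a pair $(M, \sigma)$, consisting of a matrix over $\mathbb{F}_q$ and a permutation, with the colored Burau pair $(\mathrm{CB}(\beta), \sigma_\beta)$ of a braid $\beta$:
$$(M, \sigma) \star \bigl(\mathrm{CB}(\beta), \sigma_\beta\bigr) = \bigl(M \cdot \sigma(\mathrm{CB}(\beta))\!\downarrow,\ \sigma\,\sigma_\beta\bigr)$$
Here $\downarrow$ substitutes the T-values $\tau_1, \dots, \tau_N$ for the variables $t_1, \dots, t_N$ in the Laurent polynomial entries, and $\sigma(\cdot)$ permutes those variables according to $\sigma$.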
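To make E-Multiplication concrete, the sketch below applies the rule $(M, \sigma) \star (\mathrm{CB}(b_i), s_i) = (M \cdot \sigma(\mathrm{CB}(b_i))\!\downarrow, \sigma s_i)$ one generator at a time. It uses the colored Burau matrices as usually defined in the Algebraic Eraser literature. The sketch handles only positive generators, works over an assumed prime field, and uses hypothetical T-values and a hypothetical starting matrix. It does not reproduce Ironwood's actual parameters. As a sanity check, the braid relation $b_1 b_2 b_1 = b_2 b_1 b_2$ and far commutativity $b_1 b_3 = b_3 b_1$ should give identical results.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]
Perm = Tuple[int, ...]
State = Tuple[Matrix, Perm]

P = 257              # assumed prime field F_p (illustrative)
TAU = [3, 5, 7, 11]  # hypothetical T-values tau_1..tau_N, here N = 4


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) % P for c in range(n)]
            for r in range(n)]


def burau_generator_evaluated(i: int, n: int, sigma: Perm) -> Matrix:
    """sigma(CB(b_i)) evaluated at the T-values, for 0-based generator index i.

    CB(b_i) is the identity except row i, which holds (t_i, -t_i, 1) in
    columns i-1, i, i+1 (no t_i entry when i == 0). Applying sigma replaces
    t_i by t_sigma(i); evaluation then substitutes tau_sigma(i).
    """
    t = TAU[sigma[i]]
    m = [[int(r == c) for c in range(n)] for r in range(n)]
    if i > 0:
        m[i][i - 1] = t % P
    m[i][i] = (-t) % P
    m[i][i + 1] = 1
    return m


def e_multiply(state: State, word: Sequence[int]) -> State:
    """E-multiply (M, sigma) by a word of positive generators b_i (0-based indices)."""
    m, sigma = state
    n = len(m)
    for i in word:
        m = mat_mul(m, burau_generator_evaluated(i, n, sigma))
        s = list(sigma)
        s[i], s[i + 1] = s[i + 1], s[i]  # sigma composed with transposition (i, i+1)
        sigma = tuple(s)
    return m, sigma


start: State = ([[2, 1, 0, 0], [1, 3, 1, 0], [0, 1, 4, 1], [0, 0, 1, 5]], (0, 1, 2, 3))

lhs = e_multiply(start, [0, 1, 0])  # b_1 b_2 b_1
rhs = e_multiply(start, [1, 0, 1])  # b_2 b_1 b_2
print(lhs == rhs)                                              # True
print(e_multiply(start, [0, 2]) == e_multiply(start, [2, 0]))  # True
```

The braid-relation check matters because E-Multiplication has to depend only on the braid, not on which word represents it. A wrong colored Burau convention would typically break this check.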
2. Security Properties and Quantum Resistance of MKAAP
Ironwood MKAAP is specifically constructed to resist both classical and quantum attacks:
- Classical attacks, including invalid-public-key, length-based, and simultaneous conjugacy attacks, are thwarted for two reasons: an adversary cannot obtain both conjugate sets, and received public data are validated through nonzero-entry checks and certificate verification.
- Quantum resistance: Shor’s algorithm is ineffective over $B_N$ because the group is non-abelian and infinite. Classical secret guessing scales linearly with the size of the secret space, so Grover’s quantum search yields only a quadratic speedup, not an exponential one. For the recommended choices of $N$ and $q$, the estimated brute-force attack cost exceeds the authors’ stated security bound.
- Weak-key mitigation: the probability that weak (commuting) matrices occur is negligible.
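A little arithmetic makes the quadratic-versus-exponential distinction concrete. Exhaustive classical search over $S$ secrets needs on the order of $S$ guesses. Grover's algorithm needs roughly $(\pi/4)\sqrt{S}$ oracle queries, so it halves the security level measured in bits rather than collapsing it. The search-space sizes below are arbitrary examples, not Ironwood's parameters.

```python
import math


def security_bits(log2_space: float) -> tuple:
    """Return (classical_bits, grover_bits) for a secret space of size 2**log2_space.

    Classical brute force needs ~2**log2_space guesses; Grover's algorithm
    needs ~(pi/4) * 2**(log2_space / 2) queries.
    """
    classical = log2_space
    grover = math.log2(math.pi / 4) + log2_space / 2
    return classical, grover


for bits in (128, 256):  # illustrative secret-space sizes, not Ironwood parameters
    c, g = security_bits(bits)
    print(f"2^{bits} secrets: classical ~2^{c:.0f}, Grover ~2^{g:.1f}")
```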
3. Implementation and Performance: MKAAP
Ironwood is engineered for efficient execution on the resource-constrained platforms typical of IoT deployments:
| Platform | Clock | ROM (bytes) | RAM (bytes) | Avg. Key-Agreement Time |
|---|---|---|---|---|
| MSP430 | 25 MHz | 3,126 | 354 | 212 ms |
| ARM Cortex-M3 (LPC1768) | 48 MHz | 2,578 | 1,192 | 37.4 ms |
| ARM Cortex-M3 (CC2650) | 48 MHz | 3,568 | 1,192 | 37.4–37.6 ms |
For comparison, Curve25519 key agreement typically requires 200–700 ms and on the order of 8 kB of code on comparable MCUs. Ironwood completes mutual authentication and shared-secret agreement in 37.4 ms on the Cortex-M3 parts and 212 ms on the MSP430. It uses less than 4 kB of ROM and less than 2 kB of RAM, and its primitives are quantum-resilient (Anshel et al., 2017).
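The platforms in the table run at different clock rates, so converting the averaged timings to clock cycles (cycles = time × frequency) gives a rough clock-independent comparison. This is plain arithmetic on the table values. It ignores differences in word size, instruction set, and memory wait states.

```python
# (platform, clock in MHz, average key-agreement time in ms), taken from the table
measurements = [
    ("MSP430", 25, 212.0),
    ("ARM Cortex-M3 (LPC1768)", 48, 37.4),
]


def cycles(clock_mhz: float, time_ms: float) -> float:
    """Clock cycles = frequency (Hz) x time (s)."""
    return clock_mhz * 1e6 * time_ms * 1e-3


for name, mhz, ms in measurements:
    print(f"{name}: ~{cycles(mhz, ms) / 1e6:.2f} M cycles")
```

The result is about 5.3 million cycles on the 16-bit MSP430 versus about 1.8 million on the 32-bit Cortex-M3. Part of that gap likely reflects the wider datapath rather than clock speed alone.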
4. Architectural Innovations in Google Ironwood TPU (TPU 7)
Google’s “Ironwood” is the seventh-generation TPU, the latest in a lineage focused on architectural stability, massive scaling, and efficiency for AI training (Jouppi et al., 14 Jun 2026). Its architecture is characterized by:
- Multi-chiplet packaging: Two compute dies per package with eight HBM3E stacks, four per die; four times the HBM2E stacks of previous generations.
- Enhanced TensorCores: each has four BF16 arrays and four FP8 arrays, doubling both the count and the size of the v5p arrays.
- Vector and fabric scale-out: 16 vector lanes, each with four full ALUs (previously two restricted ALUs).
- Persistent SparseCores: Four per node, each with 16 tiles for embedding and collective operations.
- High-speed interconnect: six ICI links per node, forming a full 3D torus at pod scale, with distributed on-chip routers.
The VMEM scratchpad remains compiler-managed (128 MiB), eschewing hardware caches for predictable memory operations.
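In a 3D torus, each node has exactly six nearest neighbors: one step in each direction along each of the three axes, with wraparound at the edges. That is why six ICI links per node suffice for the topology. The sketch below computes those neighbors. The 4×4×4 shape is an illustrative assumption, not a statement about Ironwood's actual cube or pod dimensions.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six wraparound neighbors of `node` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]  # wraparound link
            neighbors.append(tuple(coord))
    return neighbors


dims = (4, 4, 4)  # illustrative torus shape
print(torus_neighbors((0, 0, 3), dims))
```

The wraparound links are what distinguish a torus from a plain mesh. Without them, a node such as `(0, 0, 3)` on the boundary would have fewer than six neighbors.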
5. Performance, Scaling, and Power Efficiency of Ironwood TPU
Ironwood achieves significant scaling in compute and memory bandwidth:
- HBM memory: per-node capacity (GiB) and bandwidth (GB/s) both increase substantially over TPU v2.
- Per-node compute performance: peak throughput in the PFLOPS range.
- Pod-scale throughput: peak throughput in the EFLOPS range.
Power and sustainability advances are quantitatively significant:
- Power efficiency: substantially improved over TPU v2, driven by both architectural and process scaling.
- Carbon intensity (“CCI”): reported in gCO₂e/ExaFLOP (equivalently gCO₂e/FLOP), improving on TPU v4 in both operational and embodied terms.
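The two carbon-intensity units differ only by the size of an exaFLOP, which is $10^{18}$ floating-point operations, so converting between them is a single scaling step. The helper names below are hypothetical. No reported Ironwood figure is used.

```python
EXA = 10**18  # floating-point operations in one exaFLOP


def per_exaflop_to_per_flop(g_co2e_per_exaflop: float) -> float:
    """Convert gCO2e per 10^18 FLOP into gCO2e per single FLOP."""
    return g_co2e_per_exaflop / EXA


def per_flop_to_per_exaflop(g_co2e_per_flop: float) -> float:
    """Convert gCO2e per single FLOP into gCO2e per 10^18 FLOP."""
    return g_co2e_per_flop * EXA


print(per_exaflop_to_per_flop(1.0))  # 1 gCO2e/ExaFLOP expressed per FLOP
```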
6. Fault Tolerance, Network Architecture, and Scalability
Ironwood incorporates advanced features for large-scale AI job reliability and deployment:
- Optical circuit switches (OCS): scalable 3D MEMS-mirror OCSes reconfigure in milliseconds, supporting rapid topology changes, incremental upgrades, and routing around failures; a single cube of chips forms the network building block.
- Functional Built-In Self-Test (FBIST): PVT testers embedded in the MXUs run during production, during burn-in, and in situ, so that silently faulty hardware can be detected and excluded.
- Hardware VPU replay: compiler-transparent, lane-randomized replay detects transient datapath errors on the fly, maintaining about 90% goodput at pod scale.
7. Defining Features of Ironwood Systems
Across both Ironwood cryptography and AI hardware, the following design features are emphasized (Anshel et al., 2017; Jouppi et al., 14 Jun 2026):
- For MKAAP: a distinctive blend of asymmetric deployment properties and symmetric-like TTP bootstrapping, a quantum-resilient braid group structure, and extremely low resource demands on target platforms.
- For TPU 7: Enduring value of systolic matrix-multiply cores, narrow floating-point formats (BF16/FP8/FP4), dedicated HBM main memory, custom high-speed interconnects, DMA-managed on-chip SRAM, and vector units supporting general non-matrix operations. OCS-enabled scaling and SparseCore-accelerated embedding and collective operations are distinctive to the TPU lineage.
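The systolic matrix-multiply cores mentioned above can be illustrated with a small cycle-by-cycle model of an output-stationary systolic array. Operands are skewed so that processing element (i, j) receives `A[i][k]` and `B[k][j]` at cycle `t = i + j + k` and accumulates their product in place. This is a generic textbook dataflow chosen for illustration. It is not a description of the TPU's actual MXU design.

```python
from typing import List

Matrix = List[List[float]]


def systolic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Simulate an output-stationary systolic array computing C = A @ B.

    PE (i, j) holds C[i][j]. With the usual input skew, A[i][k] and B[k][j]
    reach PE (i, j) at cycle t = i + j + k, so every PE performs at most one
    multiply-accumulate per cycle.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    # The last MAC happens at t = (rows - 1) + (cols - 1) + (inner - 1).
    total_cycles = rows + cols + inner - 2
    for t in range(total_cycles):
        for i in range(rows):
            for j in range(cols):
                k = t - i - j
                if 0 <= k < inner:
                    c[i][j] += a[i][k] * b[k][j]
    return c


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

The model shows why systolic arrays scale well. Each PE needs only local communication with its neighbors, and an `n × n` product finishes in a number of cycles that grows linearly with `n`, not cubically.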
These systems illustrate that stable architectural primitives, combined with targeted advances in scale, resilience, and efficiency, enable quantum-resilient cryptography and exascale AI training with leading energy and carbon efficiency (Anshel et al., 2017; Jouppi et al., 14 Jun 2026).