Exascale Computing Capabilities
- Exascale computing capabilities are defined by systems executing at least 10^18 FLOP/s with heterogeneous hardware and scalable architectures.
- They leverage innovative node designs, deep memory hierarchies, and high-bandwidth networks to support extreme data-centric workflows and complex scientific simulations.
- Co-designed software stacks employing MPI + X models and task-based runtimes ensure fault resilience, energy efficiency, and scalable performance.
Exascale computing denotes systems capable of executing at least 10^18 floating-point operations per second (FLOP/s) with scalable support for extreme concurrency, data-centric workflows, domain-specific heterogeneity, and fault resilience. The transition from petascale (10^15 FLOP/s) to exascale requires an order-of-magnitude shift in system architecture, software methodologies, and application co-design to meet the stringent demands of energy efficiency, memory bandwidth, and data movement. This paradigm fundamentally enables scientific discovery in domains ranging from materials and biomolecular science to astrophysics, plasma physics, and engineering simulations (Abdulbaqi, 2018).
1. System Architectures and Hardware Foundations
Modern exascale platforms are shaped by node heterogeneity, deep memory hierarchies, and low-latency, high-bandwidth network fabric. Leading systems such as Aurora (ALCF), Frontier (OLCF), and LUMI (CSC) employ tens of thousands of nodes with multi-socket CPUs (e.g., Intel Sapphire Rapids, AMD EPYC), dense GPU arrays (e.g., Intel Ponte Vecchio, AMD MI250X), and in-node HBM2e/DDR5 memory (Ibeid et al., 3 Dec 2025, Allen et al., 10 Sep 2025).
Key architectural features include:
- Heterogeneous Nodes: Hybrid CPU/GPU designs support compute-intensive and data-intensive portions of workflows. CPUs provide tens to over one hundred cores per node, GPUs number up to 6 per node, with per-GPU HBM2e capacity reaching 128 GB and bandwidth exceeding 2 TB/s (Ibeid et al., 3 Dec 2025).
- Memory Topology: Multi-level caching (L1/L2/L3), on-package HBM, node-local DDR4/DDR5, and burst-buffer NVMe combine to deliver aggregate hierarchical bandwidth at the petabyte-per-second scale (Allen et al., 10 Sep 2025).
- Network Fabric: Dragonfly/Slingshot interconnects provide on the order of 1 PB/s bisection bandwidth with microsecond-scale point-to-point latencies. High-radix topologies enable efficient routing and congestion management across NICs and switches (Aurora) (Ibeid et al., 3 Dec 2025).
- Power Envelope: Entire systems operate within 20–30 MW, driving per-FLOP energy budgets of tens of picojoules (well below 1 nJ), a critical constraint guiding kernel fusion, memory locality, and data-movement minimization; a worked estimate follows this list (Abdulbaqi, 2018, Carrasco-Busturia et al., 3 Mar 2024).
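Using the figures quoted above (20–30 MW at 10^18 FLOP/s), the implied per-FLOP energy budget follows directly; this is a back-of-envelope estimate, not a measured system figure:

```latex
% Per-FLOP energy budget implied by a 20-30 MW envelope at 10^18 FLOP/s
E_{\mathrm{flop}} = \frac{P_{\mathrm{system}}}{R_{\mathrm{sustained}}}
  = \frac{20\ \mathrm{MW}}{10^{18}\ \mathrm{FLOP/s}} = 20\ \mathrm{pJ/FLOP},
\qquad
\frac{30\ \mathrm{MW}}{10^{18}\ \mathrm{FLOP/s}} = 30\ \mathrm{pJ/FLOP}.
```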
2. Software Stacks and Programming Models
Exascale systems integrate multi-layered software stacks designed for portability, fault tolerance, and scalable exploitation of concurrency. High-level architectures are dominated by modular libraries, hybrid parallelism, and task-graph scheduling.
- Programming Abstractions:
- MPI + X Hybrids: Distributed-memory MPI combined with node-level OpenMP/CUDA/HIP/DPC++ for threads and accelerator kernels; a minimal sketch follows this list (Abdulbaqi, 2018, Xiaa et al., 1 Oct 2025).
- Task-Based Runtimes: PaRSEC, Parsl, Balsam, and Legion express computations as DAGs, enabling dynamic scheduling and latency hiding (Xiaa et al., 1 Oct 2025).
- Performance Portability: Libraries (AMReX, Kokkos, OCCA, oneAPI) expose single-source code paths retargetable to CPUs, NVIDIA/AMD/Intel GPUs, ARM SVE (Roussel-Hard et al., 6 Mar 2025, Shukla et al., 21 Dec 2025).
- Fault Tolerance and Resilience:
- Local checkpoint/restart strategies; algorithm-based fault tolerance (ABFT); asynchronous checkpointing to burst buffers or object stores; global distributed transactions (Abdulbaqi, 2018, Narasimhamurthy et al., 2018).
- Separation of concerns between user code, domain libraries, and device-specific backends allows rapid adaptation as hardware evolves (Giannozzi et al., 2021).
- Legacy and Big-Data Support: Integration of HDF5, pNFS, JSON/XML, and MPI Storage Windows preserves support for existing HPC and data-analytic workflows (Narasimhamurthy et al., 2018, Narasimhamurthy et al., 2018).
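The MPI + X model referenced above can be illustrated with a minimal, self-contained sketch combining MPI ranks with OpenMP threads. This is a generic illustration under simplifying assumptions (an AXPY-style kernel and one global reduction), not code from any of the cited applications; production exascale codes add GPU offload, communication/computation overlap, and resilience hooks.

```cpp
// Minimal MPI + OpenMP ("MPI + X") sketch: node-level threading plus
// distributed-memory reduction. Illustrative only.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    int provided = 0;
    // Request thread support so OpenMP threads may coexist with MPI calls.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long long n_local = 1 << 20;            // per-rank slice of a global array
    std::vector<double> x(n_local, 1.0), y(n_local, 2.0);
    const double a = 0.5;

    // Node-level parallelism: the "X" in MPI + X (here OpenMP threads).
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long long i = 0; i < n_local; ++i) {
        y[i] += a * x[i];                          // AXPY-style compute kernel
        local_sum += y[i];
    }

    // Distributed-memory parallelism: global reduction across ranks.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("ranks=%d threads=%d global_sum=%.3e\n",
                    nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}
```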
3. Core Computational Algorithms and Performance Metrics
Application domains at exascale exploit specialized discretizations, parallelization strategies, and communication-avoiding methods for scalable simulation of complex phenomena.
- Particle and Mesh-Based Methods:
- Particle-in-cell (PIC), molecular dynamics (MD), and mesh-based finite-volume/spectral-element solvers employ hierarchical domain decomposition (Huebl et al., 2022, Shukla et al., 21 Dec 2025).
- Block-structured AMR enables localized refinement, dynamic load balancing via space-filling curves or patch migration (Vay et al., 2018, Huebl et al., 2022).
- Proxy apps (CabanaMD, ExaMiniMD, ExaSP2) are used for rapid benchmarking and algorithmic co-design (Mniszewski et al., 2021).
- Parallel Scaling Laws:
- Strong Scaling: speedup $S(P) = T(1)/T(P)$ with efficiency $E_s(P) = S(P)/P$ for a fixed total problem size. Efficiencies of roughly 70% or higher at scales of tens of thousands of cores or thousands of nodes are reported for QM/MM MD, GW, cosmological, and hydrodynamics codes (Carrasco-Busturia et al., 3 Mar 2024, Zhang et al., 27 Sep 2025, Frontiere et al., 3 Oct 2025).
- Weak Scaling: efficiency $E_w(P) = T(1)/T(P)$ for fixed per-process workload; observed close to ideal for up to tens of thousands of GPUs/nodes (Frontiere et al., 3 Oct 2025).
- Roofline Model: Performance is bounded by $\min(P_{\text{peak}}, B \times I)$, where $P_{\text{peak}}$ is device FLOP/s, $B$ is memory bandwidth, and $I$ is arithmetic intensity; see the sketch after this list (Abdulbaqi, 2018, Giannozzi et al., 2021).
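The scaling and roofline relations above can be made concrete with a small sketch. The peak, bandwidth, and timing figures below are hypothetical, chosen only to show how the bounds are evaluated:

```cpp
// Sketch of the performance bounds quoted above: parallel scaling efficiency
// and the roofline bound. All numbers are illustrative, not measured values.
#include <algorithm>
#include <initializer_list>
#include <cstdio>

// Strong-scaling efficiency: E_s(P) = T(1) / (P * T(P)).
double strong_efficiency(double t1, double tp, int p) { return t1 / (p * tp); }

// Weak-scaling efficiency (fixed work per process): E_w(P) = T(1) / T(P).
double weak_efficiency(double t1, double tp) { return t1 / tp; }

// Roofline bound: attainable FLOP/s <= min(peak, bandwidth * intensity).
double roofline(double peak_flops, double bw_bytes, double intensity) {
    return std::min(peak_flops, bw_bytes * intensity);
}

int main() {
    // Hypothetical GPU: 50 TFLOP/s FP64 peak, 2 TB/s HBM bandwidth.
    const double peak = 50e12, bw = 2e12;
    for (double ai : {0.5, 1.0, 10.0, 100.0})   // arithmetic intensity (FLOP/byte)
        std::printf("AI=%6.1f  bound=%.2e FLOP/s\n", ai, roofline(peak, bw, ai));

    // Hypothetical strong-scaling run: T(1) = 1000 s, T(512) = 2.6 s on 512 nodes.
    std::printf("strong eff = %.1f%%\n", 100.0 * strong_efficiency(1000.0, 2.6, 512));
    std::printf("weak eff   = %.1f%%\n", 100.0 * weak_efficiency(1000.0, 1050.0));
    return 0;
}
```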
4. Data-Centric Computing, I/O, and Storage Hierarchies
Exascale science is characterized by extreme data volumes (100 PB–1 EB/week), necessitating deep, multi-tiered I/O architectures and object-centric data management.
- Multi-Tier Storage: NVRAM/3D XPoint (Tier-1, ~20 μs latency), SSD (Tier-2), SAS HDD (Tier-3), SMR/SATA archival (Tier-4); aggregate bandwidth scales with the number of devices per tier (Narasimhamurthy et al., 2018, Narasimhamurthy et al., 2018).
- Object Stores (Mero/DAOS): Support distributed transactions, containerized data layouts, and metadata-rich indexing for billions of objects (Narasimhamurthy et al., 2018, Allen et al., 10 Sep 2025).
- Function Shipping and In-Situ Analytics: Compute offload occurs directly on storage nodes, minimizing data movement energy and latency. MPI Streams decouple simulation and analysis ranks for streaming post-processing (Narasimhamurthy et al., 2018, Narasimhamurthy et al., 2018).
- Performance Metrics: Near-linear scaling of read/write bandwidth into the multi-GB/s range at prototype scale; aggregate sustained I/O of several TB/s (Frontier-E) during trillion-particle runs; a minimal collective-write sketch follows this list (Frontiere et al., 3 Oct 2025).
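As a minimal illustration of the parallel-I/O layer beneath these metrics, the sketch below writes one contiguous checkpoint slice per rank with a single collective MPI-IO call. It is a generic example, not the SAGE/DAOS or CRK-HACC I/O path; the file name and buffer size are arbitrary placeholders.

```cpp
// Minimal parallel-I/O sketch: each rank writes its slice of a checkpoint
// with one collective MPI-IO call. Production workflows layer HDF5, object
// stores (DAOS/Mero), or asynchronous burst buffers on top of such primitives.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const long long n_local = 1 << 22;             // doubles per rank (~32 MiB)
    std::vector<double> data(n_local, static_cast<double>(rank));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Contiguous global layout: rank r owns bytes [r, r+1) * n_local * 8.
    MPI_Offset offset = static_cast<MPI_Offset>(rank) * n_local * sizeof(double);
    MPI_File_write_at_all(fh, offset, data.data(), n_local, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```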
5. Domain Applications and Performance Benchmarks
Exascale enables previously infeasible simulations and workflows in fundamental and applied science.
- Materials Science and Quantum Simulations:
- Massive-scale GW calculations reach FP64 kernel rates of 1.07 EFLOP/s (Frontier) and 0.7 EFLOP/s (Aurora), with performance portability across AMD/Intel GPUs (Zhang et al., 27 Sep 2025).
- exa-AMD demonstrates automated phase-diagram construction via DAG-scheduled ML and DFT workflows, with high parallel efficiency up to 128 nodes (Xiaa et al., 1 Oct 2025).
- Quantum ESPRESSO achieves a 3.3× speedup over CPU-only execution for large-cell DFT, emphasizing the necessity of accelerator-friendly kernels, fused memory accesses, and portable library interfaces; a generic kernel-fusion sketch follows this list (Giannozzi et al., 2021).
- Astrophysics and Cosmology:
- CRK-HACC executes four-trillion-particle hydrodynamics runs, attaining 513 PFLOP/s peak, 46.6 billion particles/s throughput, and ~90% scaling to 9,000 nodes. The I/O hierarchy writes 100 PB in a week at a sustained 5.45 TB/s (Frontiere et al., 3 Oct 2025).
- HERACLES++ demonstrates sub-degree-resolution 3D supernova shock simulations, sustaining high cell-update rates per GPU and leveraging Kokkos/MPI hybrid parallelism with modular functor organization (Roussel-Hard et al., 6 Mar 2025).
- SPACE CoE codes (RAMSES, Pluto, OpenGadget3, BHAC, ChaNGa) achieve 90% weak scaling over thousands of GPUs/cores and introduce ML-driven in-situ analysis and federated learning workflows (Shukla et al., 21 Dec 2025).
- Fusion/Fission Engineering:
- NekRS achieves trillion-point spectral-element CFD on Frontier/Aurora with sustained multi-petaflop/s rates, and demonstrates GPU-resident overset-grid Schwarz preconditioning at scale (Min et al., 27 Sep 2024).
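The kernel-fusion theme recurring in these benchmarks (and in the energy discussion above) reduces to trading extra passes over memory for a single fused pass. The following sketch is a generic, hypothetical illustration, not code from Quantum ESPRESSO or any other cited package:

```cpp
// Generic illustration of kernel fusion: two separate sweeps over an array
// (two reads + two writes per element) are fused into one sweep (one read +
// one write), cutting memory traffic -- the dominant energy cost at exascale.
#include <vector>
#include <cmath>

// Unfused: each loop streams the whole array through memory.
void unfused(std::vector<double>& v, double a, double b) {
    for (double& x : v) x = a * x + b;     // kernel 1
    for (double& x : v) x = std::sqrt(x);  // kernel 2
}

// Fused: one pass, same arithmetic, roughly half the DRAM traffic.
void fused(std::vector<double>& v, double a, double b) {
    for (double& x : v) x = std::sqrt(a * x + b);
}
```

On bandwidth-bound hardware the fused form performs identical arithmetic with roughly half the memory traffic, which is the same reasoning behind fused accelerator kernels in the cited codes.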
6. Co-Design, Energy Constraints, and Future Trends
Exascale capability is predicated on holistic co-design of hardware, system software, and application codes.
- Co-Design Methodology: Iterative refinement of proxy apps, modular libraries (Cabana, PROGRESS/BML), and runtime frameworks aligns scientific kernels with hardware capabilities (Mniszewski et al., 2021, Goz et al., 2017).
- Energy-Aware Design: Optimizing data movement, kernel fusion, precision management, and DVFS scheduling is mandatory under power envelopes of 20–30 MW (~20 pJ/FLOP target) (Abdulbaqi, 2018, Giannozzi et al., 2021).
- Resilience Strategies: Frequent checkpoints, application-aware redundancy, and ABFT mitigate the elevated fault rates expected at full-system component counts; a first-order checkpoint-interval estimate follows this list (Abdulbaqi, 2018, Narasimhamurthy et al., 2018).
- Programming Paradigm Evolution: Movement toward task-DAG runtimes, asynchronous collective communication, and accelerator programming models (CUDA/HIP/DPC++) ensures scalability, portability, and maintainability (Xiaa et al., 1 Oct 2025, Huebl et al., 2022).
- Opportunities and Open Challenges: Integration of quantum/neuromorphic accelerators, federated cross-facility ML, further reductions in memory power, and scalable data analytic/visualization workflows are poised to expand exascale utility (Carrasco-Busturia et al., 3 Mar 2024, Shukla et al., 21 Dec 2025, Goz et al., 2017).
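One common way to reason about the checkpoint frequency mentioned under resilience is Young's first-order approximation for the optimal checkpoint interval. The formula is a standard rule of thumb rather than a result from the cited papers, and the cost/MTBF figures below are hypothetical:

```cpp
// First-order estimate of the optimal checkpoint interval (Young's
// approximation): tau_opt ~ sqrt(2 * C * MTBF), where C is the time to write
// one checkpoint and MTBF is the system mean time between failures.
#include <cmath>
#include <cstdio>

double optimal_checkpoint_interval(double checkpoint_cost_s, double mtbf_s) {
    return std::sqrt(2.0 * checkpoint_cost_s * mtbf_s);
}

int main() {
    // Hypothetical figures: 5-minute checkpoint to a burst buffer,
    // 6-hour system-level MTBF.
    const double cost = 300.0, mtbf = 6.0 * 3600.0;
    const double tau = optimal_checkpoint_interval(cost, mtbf);
    std::printf("checkpoint every ~%.0f s (~%.1f h)\n", tau, tau / 3600.0);
    return 0;
}
```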
7. Tables: Key System Benchmarks and Scaling Metrics
| System | Application | Scale | Peak Perf. / Throughput | Scaling Eff. |
|---|---|---|---|---|
| Aurora | HPL-MxP | 9,500 nodes | 11.64 EF/s | 78.84% (@HPL DP) |
| Frontier-E | CRK-HACC | 9,000 nodes | 513 PF/s | 92% strong, 95% weak |
| JUWELS | QM/MM MD (MiMiC) | 80,000 cores | 5.4 ps/day | 70% strong |
| Perlmutter | BLAST PIC | 256 GPUs | — | 97% weak |
| Frontier | NekRS (CHIMERA fusion) | 33,792 ranks | ~12 PF/s | 80% strong |
References
- (Abdulbaqi, 2018) Programming at Exascale: Challenges and Innovations.
- (Ibeid et al., 3 Dec 2025) Scaling MPI Applications on Aurora.
- (Allen et al., 10 Sep 2025) Aurora: Architecting Argonne's First Exascale Supercomputer for Accelerated Scientific Discovery.
- (Zhang et al., 27 Sep 2025) Advancing Quantum Many-Body GW Calculations on Exascale Supercomputing Platforms.
- (Min et al., 27 Sep 2024) Exascale Simulations of Fusion and Fission Systems.
- (Frontiere et al., 3 Oct 2025) Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability.
- (Giannozzi et al., 2021) Quantum ESPRESSO toward the exascale.
- (Carrasco-Busturia et al., 3 Mar 2024) Multiscale Biomolecular Simulations in the Exascale Era.
- (Xiaa et al., 1 Oct 2025) exa-AMD: An Exascale-Ready Framework for Accelerating the Discovery and Design of Functional Materials.
- (Huebl et al., 2022) Next Generation Computational Tools for the Modeling and Design of Particle Accelerators at Exascale.
- (Vay et al., 2018) Warp-X: a new exascale computing platform for beam-plasma simulations.
- (Roussel-Hard et al., 6 Mar 2025) HERACLES++: A multidimensional Eulerian code for exascale computing.
- (Mniszewski et al., 2021) Enabling particle applications for exascale computing platforms.
- (Goz et al., 2017) Cosmological Simulations in Exascale Era.
- (Narasimhamurthy et al., 2018) SAGE: Percipient Storage for Exascale Data Centric Computing.
- (Narasimhamurthy et al., 2018) The SAGE Project: a Storage Centric Approach for Exascale Computing.
- (Shukla et al., 21 Dec 2025) EuroHPC SPACE CoE: Redesigning Scalable Parallel Astrophysical Codes for Exascale.
- (Taffoni et al., 2019) Shall numerical astrophysics step into the era of Exascale computing?
Exascale computing capabilities represent a convergence of heterogeneous hardware, deep software stacks, communication-avoiding algorithms, and resilient I/O architectures, powering unprecedented simulations across science and engineering domains. Scientific progress at this scale depends on co-engineered workflows capable of sustaining 10^18 FLOP/s, handling petabyte-to-exabyte data volumes, and reliably maintaining productivity within a strict energy budget.