Exascale Computing Capabilities

Updated 28 December 2025
  • Exascale computing capabilities are defined as systems executing at least 10^18 FLOP/s with heterogeneous hardware and scalable architectures.
  • They leverage innovative node designs, deep memory hierarchies, and high-bandwidth networks to support extreme data-centric workflows and complex scientific simulations.
  • Co-designed software stacks employing MPI + X models and task-based runtimes ensure fault resilience, energy efficiency, and scalable performance.

Exascale computing denotes systems capable of executing at least $10^{18}$ floating-point operations per second (FLOP/s) with scalable support for extreme concurrency, data-centric workflows, domain-specific heterogeneity, and fault resilience. The transition from petascale ($10^{15}$ FLOP/s) to exascale requires an order-of-magnitude shift in system architecture, software methodologies, and application co-design to meet the stringent demands of energy efficiency, memory bandwidth, and data movement. This paradigm fundamentally enables scientific discovery in domains ranging from materials and biomolecular science to astrophysics, plasma physics, and engineering simulations (Abdulbaqi, 2018).

1. System Architectures and Hardware Foundations

Modern exascale platforms are shaped by node heterogeneity, deep memory hierarchies, and low-latency, high-bandwidth network fabric. Leading systems such as Aurora (ALCF), Frontier (OLCF), and LUMI (CSC) employ tens of thousands of nodes with multi-socket CPUs (e.g., Intel Sapphire Rapids, AMD EPYC), dense GPU arrays (e.g., Intel Ponte Vecchio, AMD MI250X), and in-node HBM2e/DDR5 memory (Ibeid et al., 3 Dec 2025, Allen et al., 10 Sep 2025).

Key architectural features include:

  • Heterogeneous Nodes: Hybrid CPU/GPU designs support compute-intensive and data-intensive portions of workflows. CPUs typically offer 2×52 cores/node, GPUs up to 6 per node, with per-GPU HBM2e capacity reaching 128 GB and bandwidth exceeding 2 TB/s (Ibeid et al., 3 Dec 2025).
  • Memory Topology: Multi-level caching (L1/L2/L3), on-package HBM, node-local DDR4/DDR5, and burst-buffer NVMe combine to deliver hierarchical bandwidth of ~PB/s in aggregate (Allen et al., 10 Sep 2025).
  • Network Fabric: Dragonfly/Slingshot interconnects provide ~1 PB/s bisection bandwidth with point-to-point latencies $\lesssim 2\,\mu$s. High-radix topologies enable efficient routing and congestion management across >85,000 NICs and >5,000 switches (Aurora) (Ibeid et al., 3 Dec 2025).
  • Power Envelope: Entire systems operate within 20–30 MW, driving per-flop energy costs <1 nJ, a critical constraint guiding kernel fusion, memory locality, and data movement minimization (Abdulbaqi, 2018, Carrasco-Busturia et al., 3 Mar 2024); a back-of-envelope estimate follows this list.
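
To make the power constraint concrete, the relation below ties the facility power envelope to the energy available per floating-point operation; the 20 MW and $10^{18}$ FLOP/s figures are taken from the system parameters above, and the result is a rough estimate rather than a measured value.

$$
E_{\text{FLOP}} = \frac{P_{\text{system}}}{R_{\text{FLOP}}} = \frac{2\times10^{7}\ \text{W}}{10^{18}\ \text{FLOP/s}} = 20\ \text{pJ/FLOP}
$$

This is consistent with the <20 pJ/FLOP target cited in the co-design discussion below and sits well under the 1 nJ ceiling noted above.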

2. Software Stacks and Programming Models

Exascale systems integrate multi-layered software stacks designed for portability, fault tolerance, and scalable exploitation of concurrency. High-level architectures are dominated by modular libraries, hybrid parallelism, and task-graph scheduling.
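
As one concrete illustration of the MPI + X model, the sketch below pairs MPI ranks across nodes with OpenMP threads within a node for a distributed dot product. It is a minimal, hypothetical example, not drawn from any of the cited codes.

```cpp
// Minimal MPI + OpenMP (MPI + X) sketch: distributed dot product.
// Compile (e.g.): mpicxx -fopenmp dot.cpp -o dot
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    int provided = 0;
    // Request threaded MPI; FUNNELED suffices since only the main thread calls MPI.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank owns a local slice of the global vectors.
    const std::size_t n_local = 1 << 20;
    std::vector<double> x(n_local, 1.0), y(n_local, 2.0);

    // Node-level parallelism: OpenMP threads reduce over the local slice.
    double local_dot = 0.0;
    #pragma omp parallel for reduction(+ : local_dot)
    for (std::size_t i = 0; i < n_local; ++i)
        local_dot += x[i] * y[i];

    // Inter-node parallelism: a single collective combines the partial sums.
    double global_dot = 0.0;
    MPI_Allreduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("global dot = %.1f (ranks=%d, threads/rank=%d)\n",
                    global_dot, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```

On GPU-accelerated nodes the "X" is typically CUDA/HIP/DPC++ or a portability layer such as Kokkos rather than OpenMP, but the division of labor between rank-level and node-level parallelism has the same structure.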

3. Core Computational Algorithms and Performance Metrics

Application domains at exascale exploit specialized discretizations, parallelization strategies, and communication-avoiding methods for scalable simulation of complex phenomena.
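
A common ingredient of the communication-avoiding and communication-hiding strategies referenced above is overlapping halo exchange with interior computation in domain-decomposed stencil solvers. The sketch below is a simplified 1D illustration with assumed periodic neighbors, not an excerpt from any cited application.

```cpp
// Simplified 1D halo exchange with communication/computation overlap.
// Each rank owns n interior cells plus one ghost cell on each side.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1024;                      // interior cells per rank
    std::vector<double> u(n + 2, 1.0), unew(n + 2, 0.0);

    // Periodic neighbors (an assumption for this sketch).
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    MPI_Request reqs[4];
    // Post non-blocking receives into the ghost cells and sends of boundary cells.
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    // Overlap: update interior cells that do not depend on the ghost cells.
    for (int i = 2; i <= n - 1; ++i)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    // Complete the exchange, then update the two boundary-adjacent cells.
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    unew[1] = 0.5 * (u[0] + u[2]);
    unew[n] = 0.5 * (u[n - 1] + u[n + 1]);

    u.swap(unew);
    MPI_Finalize();
    return 0;
}
```

The same pattern generalizes to 3D domain decompositions and to GPU-resident buffers, where hiding inter-node latency behind on-node work becomes even more important.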

4. Data-Centric Computing, I/O, and Storage Hierarchies

Exascale science is characterized by extreme data volumes (100 PB–1 EB/week), necessitating deep, multi-tiered I/O architectures and object-centric data management.

  • Multi-Tier Storage: NVRAM/3D XPoint (Tier-1, <20 $\mu$s latency), SSD (Tier-2), SAS HDD (Tier-3), SMR/SATA archival (Tier-4); aggregate bandwidth $B_{\text{total}} = \sum_{i=1}^{4} B_i \cdot N_i$ for $N_i$ devices of per-device bandwidth $B_i$ in tier $i$ (Narasimhamurthy et al., 2018).
  • Object Stores (Mero/DAOS): Support distributed transactions, containerized data layouts, and metadata-rich indexing for billions of objects (Narasimhamurthy et al., 2018, Allen et al., 10 Sep 2025).
  • Function Shipping and In-Situ Analytics: Compute offload occurs directly on storage nodes, minimizing data-movement energy and latency. MPI Streams decouple simulation and analysis ranks for streaming post-processing (Narasimhamurthy et al., 2018); a minimal rank-partitioning sketch follows this list.
  • Performance Metrics: Linear scaling of read/write bandwidth up to ~45 GB/s (prototype scale), aggregate sustained I/O of ~5.5 TB/s (Frontier-E) during trillion-particle runs (Frontiere et al., 3 Oct 2025).
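
The decoupling of simulation and analysis ranks described above can be illustrated with standard MPI communicator splitting. This is a minimal, hypothetical sketch of the general pattern, not the MPI Streams interface of the cited work; the 3:1 split of simulation to analysis ranks is an arbitrary assumption.

```cpp
// Partition MPI_COMM_WORLD into simulation and analysis ranks, then stream
// data between the two groups via an intercommunicator. Run with >= 2 ranks.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Assumption for this sketch: roughly the last quarter of ranks do analysis.
    const int n_analysis = (size / 4 > 0) ? size / 4 : 1;
    const bool is_analysis = rank >= size - n_analysis;

    // Split into two disjoint intracommunicators.
    MPI_Comm group;
    MPI_Comm_split(MPI_COMM_WORLD, is_analysis ? 1 : 0, rank, &group);

    // Build an intercommunicator so simulation ranks can stream to analysis ranks.
    const int remote_leader = is_analysis ? 0 : size - n_analysis;
    MPI_Comm inter;
    MPI_Intercomm_create(group, 0, MPI_COMM_WORLD, remote_leader, 42, &inter);

    const int N = 1024;
    std::vector<double> field(N, static_cast<double>(rank));
    if (!is_analysis) {
        // Simulation side: ship a field slice to a fixed analysis partner.
        int my_sim_rank;
        MPI_Comm_rank(group, &my_sim_rank);
        MPI_Send(field.data(), N, MPI_DOUBLE, my_sim_rank % n_analysis, 0, inter);
    } else {
        // Analysis side: receive slices from its assigned simulation ranks.
        int n_sim = size - n_analysis, my_an_rank;
        MPI_Comm_rank(group, &my_an_rank);
        for (int src = my_an_rank; src < n_sim; src += n_analysis) {
            MPI_Recv(field.data(), N, MPI_DOUBLE, src, 0, inter, MPI_STATUS_IGNORE);
            // ... in-situ reduction/visualization would go here ...
        }
    }

    MPI_Comm_free(&inter);
    MPI_Comm_free(&group);
    MPI_Finalize();
    return 0;
}
```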

5. Domain Applications and Performance Benchmarks

Exascale enables previously infeasible simulations and workflows in fundamental and applied science.

  • Materials Science and Quantum Simulations:
    • Massive-scale GW calculations ($>10^4$ atoms) reach FP64 kernel rates of ~1.07 EFLOP/s (Frontier) and 0.7 EFLOP/s (Aurora), with performance portability across AMD/Intel GPUs (Zhang et al., 27 Sep 2025).
    • exa-AMD demonstrates automated phase diagram construction via DAG-screened ML and DFT workflows, with >80% efficiency up to 128 nodes (Xiaa et al., 1 Oct 2025).
    • Quantum ESPRESSO achieves 3.3× speedup over CPUs for large-cell DFT, emphasizing the necessity of accelerator-friendly kernels, fused memory accesses, and portable library interfaces (Giannozzi et al., 2021).
  • Astrophysics and Cosmology:
    • CRK-HACC executes four-trillion-particle hydrodynamics runs, attaining 513 PFLOP/s peak, 46.6 billion particles/s throughput, and >90% scaling to 9,000 nodes; the I/O hierarchy writes >100 PB in a week with sustained 5.45 TB/s (Frontiere et al., 3 Oct 2025). A worked throughput estimate follows this list.
    • HERACLES++ demonstrates sub-degree 3D supernova shock simulations with $O(10^{10})$ cells/s per GPU, leveraging Kokkos/MPI hybrid parallelism and modular functor organization (Roussel-Hard et al., 6 Mar 2025).
    • SPACE CoE codes (RAMSES, Pluto, OpenGadget3, BHAC, ChaNGa) achieve >90% weak scaling over thousands of GPUs/cores and introduce ML-driven in-situ analysis and federated learning workflows (Shukla et al., 21 Dec 2025).
  • Fusion/Fission Engineering:
    • NekRS achieves trillion-point spectral element CFD on Frontier/Aurora, with sustained rates of ~12 PF/s, and demonstrates GPU-resident overset grid Schwarz preconditioning at scale (Min et al., 27 Sep 2024).
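
If the CRK-HACC throughput quoted above is read as particle updates per second, one global update of the four-trillion-particle volume takes on the order of

$$
\frac{4\times10^{12}\ \text{particles}}{4.66\times10^{10}\ \text{particles/s}} \approx 86\ \text{s},
$$

a rough estimate under that interpretation rather than a figure reported in the cited work.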

6. Co-Design, Energy Efficiency, and Resilience

Exascale capability is predicated on holistic co-design of hardware, system software, and application codes.

  • Co-Design Methodology: Iterative refinement of proxy apps, modular libraries (Cabana, PROGRESS/BML), and runtime frameworks aligns scientific kernels with hardware capabilities (Mniszewski et al., 2021, Goz et al., 2017).
  • Energy-Aware Design: Optimizing data movement, kernel fusion, precision management, and DVFS scheduling is mandatory under power envelopes <30 MW (<20 pJ/FLOP target) (Abdulbaqi, 2018, Giannozzi et al., 2021).
  • Resilience Strategies: Frequent checkpoints, application-aware redundancy, and ABFT mitigate elevated fault rates ($>10^{-6}$ FIT/h/node) (Abdulbaqi, 2018, Narasimhamurthy et al., 2018); a minimal checkpoint/restart sketch follows this list.
  • Programming Paradigm Evolution: Movement toward task-DAG runtimes, asynchronous collective communication, and accelerator programming models (CUDA/HIP/DPC++) ensures scalability, portability, and maintainability (Xiaa et al., 1 Oct 2025, Huebl et al., 2022).
  • Opportunities and Open Challenges: Integration of quantum/neuromorphic accelerators, federated cross-facility ML, further reductions in memory power, and scalable data analytics/visualization workflows are poised to expand exascale utility (Carrasco-Busturia et al., 3 Mar 2024, Shukla et al., 21 Dec 2025, Goz et al., 2017).
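
As a concrete illustration of the application-level checkpointing listed under resilience strategies, the sketch below periodically writes each rank's state to a per-rank file and resumes from the latest checkpoint on restart. The file naming, checkpoint interval, and state layout are assumptions made for this example, not details of the cited systems.

```cpp
// Minimal application-level checkpoint/restart sketch (one file per MPI rank).
#include <mpi.h>
#include <cstdio>
#include <string>
#include <vector>

// Write the step counter and state vector to a per-rank checkpoint file.
static void write_checkpoint(int rank, int step, const std::vector<double>& state) {
    const std::string path = "ckpt_rank" + std::to_string(rank) + ".bin";
    if (std::FILE* f = std::fopen(path.c_str(), "wb")) {
        std::fwrite(&step, sizeof(step), 1, f);
        std::fwrite(state.data(), sizeof(double), state.size(), f);
        std::fclose(f);
    }
}

// Try to restore; returns the step to resume from (0 if no checkpoint exists).
static int read_checkpoint(int rank, std::vector<double>& state) {
    const std::string path = "ckpt_rank" + std::to_string(rank) + ".bin";
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return 0;
    int step = 0;
    std::fread(&step, sizeof(step), 1, f);
    std::fread(state.data(), sizeof(double), state.size(), f);
    std::fclose(f);
    return step;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n_steps = 1000, ckpt_interval = 100;   // assumed values
    std::vector<double> state(4096, 0.0);
    int start = read_checkpoint(rank, state);        // resume if possible

    for (int step = start; step < n_steps; ++step) {
        for (double& x : state) x += 1.0;            // stand-in for real work
        if ((step + 1) % ckpt_interval == 0) {
            // Keep ranks loosely synchronized so checkpoints are consistent.
            MPI_Barrier(MPI_COMM_WORLD);
            write_checkpoint(rank, step + 1, state);
        }
    }

    if (rank == 0) std::printf("completed %d steps\n", n_steps - start);
    MPI_Finalize();
    return 0;
}
```

Production codes typically layer asynchronous or multi-level checkpointing (e.g., burst buffers before the parallel file system) on top of this basic pattern to amortize I/O cost against the fault rate.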

7. Tables: Key System Benchmarks and Scaling Metrics

| System | Application | Nodes/GPUs | Peak Perf. / Throughput | Scaling Eff. |
| --- | --- | --- | --- | --- |
| Aurora | HPL-MxP | 9,500 nodes | 11.64 EF/s | 78.84% (@HPL DP) |
| Frontier-E | CRK-HACC | 9,000 nodes | 513 PF/s | 92% strong, 95% weak |
| JUWELS | QM/MM MD (MiMiC) | 80,000 cores | 5.4 ps/day | 70% strong |
| Perlmutter | BLAST PIC | 256 GPUs | | 97% weak |
| Trinion Fusion | NekRS CHIMERA | 33,792 ranks | ~12 PF/s | 80% strong |
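
For reference, the strong and weak scaling efficiencies reported in the table follow the standard definitions below, where $T_N$ is the time to solution on $N$ nodes and $T_1$ is a single-node (or smallest-run) baseline; strong scaling holds the total problem size fixed, while weak scaling holds the per-node problem size fixed. The exact baselines used in each cited study may differ.

$$
E_{\text{strong}}(N) = \frac{T_1}{N \, T_N}, \qquad E_{\text{weak}}(N) = \frac{T_1}{T_N}
$$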

Exascale computing capabilities represent a convergence of heterogeneous hardware, deep software stacks, communication-avoiding algorithms, and resilient I/O architectures, powering unprecedented simulations across science and engineering domains. Scientific progress at this scale depends on co-engineered workflows capable of sustaining $>10^{18}$ FLOP/s, handling petabyte-to-exabyte data, and reliably maintaining productivity within a strict energy budget.
