PDEBench: Benchmark for PDE Surrogates

Updated 8 February 2026
  • PDEBench is an open-source benchmark suite that standardizes evaluation of ML models on time-dependent partial differential equations, covering diverse problems like advection, Burgers, and Navier–Stokes.
  • It offers pre-generated datasets across 1D, 2D, and 3D with comprehensive simulation codes, APIs, and detailed metadata for reproducible scientific experiments.
  • The benchmark integrates advanced evaluation metrics, facilitating robust research in neural operator learning, surrogate modeling, and cross-domain pretraining.

PDEBench is an open-source benchmark suite designed to rigorously evaluate scientific machine learning models on time-dependent partial differential equations (PDEs). It has become a central resource for the systematic assessment and development of neural operators, surrogate modeling techniques, and foundation models aimed at learning solution operators for a broad spectrum of physical systems. PDEBench provides standardized datasets, simulation code, APIs, and baseline metrics, spanning canonical problems such as advection, Burgers’, reaction–diffusion, sorption, Navier–Stokes, shallow-water, Darcy flow, and compressible/incompressible multi-dimensional fluid flows. The benchmark is deeply embedded in both classical and contemporary research on scientific ML, serving as the foundational evaluation platform for a wide array of recent advances in PDE operator learning, few-shot transfer, multi-physics pretraining, and sim-to-real transfer scenarios (Takamoto et al., 2022).

1. Problem Suite and Data Specifications

PDEBench contains a diverse set of PDE types and configurations across 1D, 2D, and 3D, selected to probe the breadth of spatiotemporal modeling and numerical surrogate challenges. Each class of problem is defined along four dimensions: governing equation, parameter variations, initial/boundary conditions, and discretization scheme.

  • 1D Problems:
    • Advection: $\partial_t u + \beta \partial_x u = 0$, $x\in(0,1)$, $t\in(0,2]$, periodic BC, IC as sum of two randomized sine waves, $\beta \in \{0.1, 0.4, 1.0, 4.0\}$.
    • Burgers: $\partial_t u + \partial_x(u^2 / 2) = \nu / \pi \,\partial_{xx}u$, $\nu \in \{10^{-3},10^{-2},10^{-1},1\}$, periodic BC.
    • Diffusion–Reaction: $\partial_t u - \nu\partial_{xx}u - \rho u(1-u) = 0$, $(\nu,\rho)$ varied, periodic BC.
    • Diffusion–Sorption: nonlinear diffusion with explicit retardation, mixed BCs.
  • 2D Problems:
    • Diffusion–Reaction: 2-component FitzHugh–Nagumo, Neumann BC.
    • Steady-state Darcy flow: $-\nabla\cdot(a(\mathbf{x})\nabla u) = \beta$, with random fields.
    • Shallow-water and compressible/incompressible Navier–Stokes equations, each with its own forcing and outflow or periodic BCs.
  • 3D Problems:
    • Compressible Navier–Stokes flow on a $128^3$ grid, periodic/outflow boundary conditions.
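As a concrete instance of the simplest entry in this list, the sketch below builds a two-sine initial condition and evaluates the exact solution $u(x,t) = u_0(x - \beta t)$ of the 1D advection equation under periodic boundary conditions. It is an illustration only, not the PDEBench generator: the function names, the amplitude and wavenumber ranges, and the cell-centred grid are all assumptions made for the example.

```python
import numpy as np

def make_ic(rng, n_waves=2, k_max=8):
    """Return u0(x): a sum of sine waves with random amplitude, wavenumber, phase.

    The ranges below are illustrative assumptions, not PDEBench's settings.
    """
    amps = rng.uniform(0.0, 1.0, size=n_waves)
    # Integer multiples of 2*pi keep u0 periodic on the unit interval.
    ks = 2.0 * np.pi * rng.integers(1, k_max + 1, size=n_waves)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_waves)

    def u0(x):
        x = np.asarray(x, dtype=float)
        return sum(a * np.sin(k * x + p) for a, k, p in zip(amps, ks, phases))

    return u0

def advect_exact(u0, x, t, beta):
    """Exact solution of u_t + beta * u_x = 0 with periodic BC on (0, 1)."""
    return u0((x - beta * t) % 1.0)

rng = np.random.default_rng(0)
u0 = make_ic(rng)
x = (np.arange(1024) + 0.5) / 1024          # 1024 cell centres on (0, 1)
t = np.linspace(0.0, 2.0, 201)              # time axis on [0, 2]
beta = 0.4
# Trajectory tensor of shape (n_time, n_x).
traj = np.stack([advect_exact(u0, x, ti, beta) for ti in t])
```

Because the profile is only translated, it returns to the initial condition after one period $1/\beta$, which gives a cheap sanity check for any solver or surrogate on this problem.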

Datasets are pre-generated and stored in HDF5 format in standardized tensor shapes, with explicit YAML metadata for parameters, BC/IC description, and discretization. Resolutions reach up to 1024 points in 1D, $512^2$ in 2D, and $128^3$ in 3D. For each problem, thousands to tens of thousands of simulation samples are available, generated via classical finite-difference, finite-volume, PyClaw, or spectral codes (Takamoto et al., 2022).

2. Standardized API, Codebase, and Extensibility

PDEBench provides an open Python API for both dataset loading and simulation generation. Data can be accessed locally or downloaded via DOI (DaRUS platform), and is directly compatible with PyTorch and other ML frameworks. The codebase includes:

  • Example scripts for on-the-fly data generation with customizable discretizations via Hydra.
  • Baseline model implementations (U-Net, FNO, PINN, gradient-based inverse surrogates).
  • Training loops, evaluation routines, and logging/checkpointing with minimal user configuration required.

Data loading and usage in neural operators is streamlined through provided PyTorch DataLoader wrappers and utility scripts. The API is extensible: new PDEs, BCs, and parameter regimes can be added by subclassing simulation or data classes. All datasets maintain full provenance, including grid, timestep, random seed, and physical parameter descriptions (Takamoto et al., 2022).
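To make the data-loading step more concrete, here is a minimal sketch of how a trajectory tensor can be cut into (history, next-frame) training pairs for an autoregressive surrogate. It is not the PDEBench loader: the array layout `(n_samples, n_time, n_x)`, the function name `make_windows`, and the `initial_step` parameter name are assumptions chosen for illustration, and random data stands in for a real dataset.

```python
import numpy as np

def make_windows(traj, initial_step=10):
    """Cut trajectories into autoregressive training pairs.

    traj: array of shape (n_samples, n_time, n_x) (assumed layout).
    Returns inputs of shape (M, initial_step, n_x) and targets of shape (M, n_x),
    where each target is the frame that directly follows its input window.
    """
    n_samples, n_time, n_x = traj.shape
    inputs, targets = [], []
    for t in range(initial_step, n_time):
        inputs.append(traj[:, t - initial_step:t, :])   # the last `initial_step` frames
        targets.append(traj[:, t, :])                   # the frame to predict
    return np.concatenate(inputs, axis=0), np.concatenate(targets, axis=0)

# Stand-in data: 4 trajectories, 21 time steps, 16 grid points.
rng = np.random.default_rng(0)
traj = rng.normal(size=(4, 21, 16))
X, Y = make_windows(traj, initial_step=10)
```

Arrays like `X` and `Y` can then be wrapped in whatever dataset abstraction the training framework expects.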

3. Evaluation Metrics and Physics-Informed Assessment

PDEBench includes a multidimensional evaluation protocol that goes beyond conventional pointwise errors to assess physical fidelity and surrogate generalization.

  • Data fidelity metrics:
    • RMSE, normalized RMSE (nRMSE), maximum error (MaxErr).
  • Physics-inspired metrics:
    • Conservation RMSE (cRMSE): global integral conservation violation.
    • Boundary RMSE (bRMSE): (pseudo-)norm over boundary points, quantifies BC satisfaction.
    • Spectral RMSE (fRMSE): computed over low, middle, high Fourier bands to expose performance on different solution scales.
  • Additional metrics include: rollout stability, frequency drift, and inverse problem accuracy.
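The metrics listed above can be sketched in a few lines of NumPy. This is one plausible reading of the definitions for 1D data on a uniform grid, not the PDEBench implementation: the array layout `(n_samples, n_time, n_x)`, the normalization choices, and the Fourier band edges used in `frmse` are assumptions.

```python
import numpy as np

# Arrays are assumed to have shape (n_samples, n_time, n_x) on a uniform 1D grid.

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def nrmse(pred, true):
    # RMSE divided by the RMS magnitude of the reference solution.
    return rmse(pred, true) / float(np.sqrt(np.mean(true ** 2)))

def max_err(pred, true):
    return float(np.max(np.abs(pred - true)))

def crmse(pred, true):
    # On a uniform grid the spatial mean is proportional to the global integral,
    # so this measures the violation of the conserved total per snapshot.
    diff = pred.mean(axis=-1) - true.mean(axis=-1)
    return float(np.sqrt(np.mean(diff ** 2)))

def brmse(pred, true):
    # RMSE restricted to the two boundary points of the 1D domain.
    boundary = [0, -1]
    return rmse(pred[..., boundary], true[..., boundary])

def frmse(pred, true, bands=((0, 4), (5, 12), (13, None))):
    # Error spectrum split into low / middle / high wavenumber bands
    # (band edges are an assumption for this sketch).
    n_x = true.shape[-1]
    err_hat = np.abs(np.fft.rfft(pred - true, axis=-1)) / n_x
    out = []
    for lo, hi in bands:
        stop = None if hi is None else hi + 1
        out.append(float(np.sqrt(np.mean(err_hat[..., lo:stop] ** 2))))
    return tuple(out)

x = np.arange(128) / 128
true = np.broadcast_to(np.sin(2 * np.pi * x), (3, 5, 128))
pred_offset = true + 0.1                                 # constant bias
pred_wiggle = true + 0.1 * np.sin(2 * np.pi * 20 * x)    # high-frequency error
```

The two synthetic predictions show why several metrics are needed. The constant bias registers in cRMSE and in the low Fourier band. The oscillatory error has zero spatial mean, so it leaves cRMSE essentially untouched and appears only in the high band.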

These metrics have revealed task-specific challenges: for example, high-frequency shock content in low-viscosity Burgers' flows causes significant FNO degradation beyond its spectral bandlimit, while normalized errors in weak-forcing Darcy may diverge despite small absolute error due to near-zero solution magnitude (Takamoto et al., 2022).
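The Darcy observation follows from simple arithmetic. The equation is linear in $u$, so (assuming homogeneous boundary conditions and a fixed coefficient field) the solution magnitude scales with the forcing $\beta$, while nRMSE divides the error by that magnitude. The toy sketch below uses a hypothetical `scale` factor in place of the forcing strength and a fixed absolute error; the profile and the numbers are invented for illustration.

```python
import numpy as np

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def nrmse(pred, true):
    return rmse(pred, true) / float(np.sqrt(np.mean(true ** 2)))

x = np.linspace(0.0, 1.0, 64)
profile = np.sin(np.pi * x)      # stand-in solution shape
abs_error = 1e-3                 # same absolute error in both cases

results = {}
for scale in (1.0, 1e-3):        # strong vs. weak forcing (hypothetical values)
    true = scale * profile
    pred = true + abs_error
    results[scale] = (rmse(pred, true), nrmse(pred, true))

ratio = results[1e-3][1] / results[1.0][1]   # growth of nRMSE at fixed RMSE
```

RMSE is identical in both cases, while nRMSE grows by the inverse of the scale factor, which is why normalized and absolute errors should be read together for weak-forcing samples.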

4. Role in the SciML and Foundation Model Ecosystem

PDEBench is the canonical benchmark for PDE surrogate evaluation and multi-physics pretraining in scientific machine learning. Its datasets and protocols are used in:

5. Baseline Methods and Performance Results

PDEBench includes precomputed baseline results for key operator learning and surrogate architectures:

| Model | Key Characteristics | Typical Tasks/Regimes |
| --- | --- | --- |
| U-Net | Encoder–decoder CNN, skip connections | Localized/structured dynamics |
| FNO | Mesh-invariant, global Fourier filter layers | Global dynamics, band-limited flows, 1D–3D |
| PINN | Physics-constrained MLP via loss Lagrangian | Low-data, stiff, or BC-critical tasks |
| Gradient-based Inverse | IC/parameter inversion via backprop | Inverse/identification tasks |

Performance on hard problems:

  • Low-viscosity Burgers: FNO suffers spectral ringing on shocks; U-Net requires autoregressive stabilization for long rollouts.
  • Complex Navier–Stokes and shock tubes: Neural surrogates may struggle with high-amplitude discontinuities, motivating further development in adaptive/spectral operator design.
  • Grid-invariance and BC handling: CompNO (Hmida et al., 12 Jan 2026) achieves exact Dirichlet satisfaction with zero boundary loss, robust generalization on ×2 grid, and error stability across Peclet/Reynolds numbers.
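The spectral ringing mentioned for low-viscosity Burgers can be reproduced without any neural network. The sketch below truncates the Fourier series of a step profile, used here as a stand-in for a shock, and measures the overshoot next to the discontinuity. It is not an FNO; it only illustrates what a representation limited to a fixed number of low Fourier modes does to a discontinuity. The grid size and mode counts are arbitrary choices.

```python
import numpy as np

n = 1024
x = np.arange(n) / n
u = np.where(x < 0.5, 1.0, -1.0)          # step profile standing in for a shock

def truncate_modes(u, n_modes):
    """Keep only the lowest `n_modes` Fourier modes of a periodic signal."""
    u_hat = np.fft.rfft(u)
    u_hat[n_modes:] = 0.0
    return np.fft.irfft(u_hat, n=u.size)

overshoot = {}
for n_modes in (8, 16, 64):
    u_trunc = truncate_modes(u, n_modes)
    # How far the band-limited reconstruction exceeds the true maximum.
    overshoot[n_modes] = float(u_trunc.max() - u.max())
```

Keeping more modes narrows the oscillations but does not remove the overshoot (the Gibbs phenomenon), which is one reason sharp fronts are hard for band-limited operators.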

Unified models such as OmniArch, UPS, and AMR-Transformer are reported to achieve order-of-magnitude improvements over classical FNO/U-Net baselines on 1D–3D tasks, with fewer trainable parameters or less training data.

6. Limitations, Identified Hard Cases, and Future Directions

PDEBench has revealed persistent challenges in learning-based surrogate modeling:

  • Surrogates often fail on temporal extrapolation: beyond the training horizon, error growth rarely stays sublinear.
  • Black-box CNNs struggle with sharp discontinuities unless equipped with explicit spectral or conservation priors.
  • Rollout instability and BC/parameter sensitivity remain major open problems, motivating research into mesh-adaptive, conservation-law-respecting, and multi-domain operators.
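A toy experiment shows why rollout error is hard to keep sublinear. In the sketch below, a hypothetical one-step "surrogate" applies the exact advection step but with a 1% amplitude gain. Feeding its output back in compounds the gain, so the relative error after $n$ steps is $(1+\epsilon)^n - 1$. The surrogate, the step size, and the gain are invented for illustration and do not model any specific architecture.

```python
import numpy as np

n_x, n_steps = 256, 200
beta, dt, eps = 0.4, 0.01, 0.01            # eps: per-step amplitude error (assumed)
x = np.arange(n_x) / n_x
k = np.fft.rfftfreq(n_x, d=1.0 / n_x)      # integer wavenumbers 0 .. n_x/2
u0 = np.sin(2 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)

def exact_step(u):
    # Shift the profile by beta*dt using a phase factor in Fourier space.
    return np.fft.irfft(np.fft.rfft(u) * np.exp(-2j * np.pi * k * beta * dt), n=n_x)

def surrogate_step(u):
    # Hypothetical surrogate: exact step with a small amplitude gain.
    return (1.0 + eps) * exact_step(u)

def rollout(step, u, n_steps):
    """Apply `step` autoregressively and return all states, shape (n_steps+1, n_x)."""
    states = [u]
    for _ in range(n_steps):
        u = step(u)
        states.append(u)
    return np.stack(states)

ref = rollout(exact_step, u0, n_steps)
pred = rollout(surrogate_step, u0, n_steps)
# Per-step normalized RMSE of the rollout.
err = np.sqrt(np.mean((pred - ref) ** 2, axis=-1)) / np.sqrt(np.mean(ref ** 2, axis=-1))
```

Even though each individual step is 99% accurate, the error over the second half of the rollout is far larger than over the first half, so the growth is faster than linear.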

Proposed research directions include mesh- and spectrum-adaptive neural operators, explicit conservation and entropy stability constraints, multi-phase and irregular-domain extensions, as well as domain/parameter generalization via hypernetworks and hybrid (neural–numerical) architectures.

PDEBench code, datasets, and documentation are maintained at https://github.com/pdebench/PDEBench, and the platform is being actively extended with new PDEs, metrics, and evaluation protocols to support the evolving needs of the scientific ML community (Takamoto et al., 2022).
