PDEBench: Benchmark for PDE Surrogates

Updated 8 February 2026

PDEBench is an open-source benchmark suite that standardizes evaluation of ML models on time-dependent partial differential equations, covering diverse problems like advection, Burgers, and Navier–Stokes.
It offers pre-generated datasets across 1D, 2D, and 3D with comprehensive simulation codes, APIs, and detailed metadata for reproducible scientific experiments.
The benchmark integrates advanced evaluation metrics, facilitating robust research in neural operator learning, surrogate modeling, and cross-domain pretraining.

PDEBench is an open-source benchmark suite for evaluating scientific machine learning models on time-dependent partial differential equations (PDEs). It is widely used to assess and develop neural operators, surrogate models, and foundation models that learn solution operators for physical systems. PDEBench provides standardized datasets, simulation code, APIs, and baseline metrics for canonical problems: advection, Burgers’, reaction–diffusion, sorption, shallow-water, Darcy flow, and compressible and incompressible Navier–Stokes flows in multiple dimensions. Recent work on PDE operator learning, few-shot transfer, multi-physics pretraining, and sim-to-real transfer uses it as an evaluation platform (Takamoto et al., 2022).

1. Problem Suite and Data Specifications

PDEBench contains a diverse set of PDE types and configurations across 1D, 2D, and 3D, selected to probe the breadth of spatiotemporal modeling and numerical surrogate challenges. Each class of problem is defined along four dimensions: governing equation, parameter variations, initial/boundary conditions, and discretization scheme.

1D Problems:
- Advection: $\partial_t u + \beta \partial_x u = 0$, $x\in(0,1)$, $t\in(0,2]$, periodic BC, IC as sum of two randomized sine waves, $\beta \in \{0.1, 0.4, 1.0, 4.0\}$.
- Burgers: $\partial_t u + \partial_x(u^2 / 2) = (\nu / \pi)\,\partial_{xx}u$, $\nu \in \{10^{-3},10^{-2},10^{-1},1\}$, periodic BC.
- Diffusion–Reaction: $\partial_t u - \nu\partial_{xx}u - \rho u(1-u) = 0$, $(\nu,\rho)$ varied, periodic BC.
- Diffusion–Sorption: nonlinear diffusion with explicit retardation, mixed BCs.
2D Problems:
- Diffusion–Reaction: 2-component FitzHugh–Nagumo, Neumann BC.
- Steady-state Darcy flow: $-\nabla\cdot(a(\mathbf{x})\nabla u) = \beta$, where $a(\mathbf{x})$ is a random coefficient field and $\beta$ is the forcing term.
- Navier–Stokes, shallow-water, compressible/incompressible, with distinct forcing, outflow, or periodic BCs.
3D Problems:
- Compressible Navier–Stokes flow on a $128^3$ grid, periodic/outflow boundary conditions.
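
For the 1D advection problem above, the exact solution is the initial condition transported at speed $\beta$, i.e. $u(x,t) = u_0(x - \beta t)$ wrapped periodically onto $(0,1)$. The following is a minimal NumPy sketch of that relation, not the PDEBench data generator. The function names, wavenumbers, amplitudes, and phases are illustrative assumptions; the source only states that the initial condition is a sum of two randomized sine waves.

```python
import numpy as np

def two_sine_ic(x, k1=1, k2=3, a1=1.0, a2=0.5, phi1=0.0, phi2=0.7):
    """Illustrative initial condition: sum of two sine waves (hypothetical parameters)."""
    return a1 * np.sin(2 * np.pi * k1 * x + phi1) + a2 * np.sin(2 * np.pi * k2 * x + phi2)

def advect_exact(u0_fn, x, t, beta):
    """Exact periodic advection: u(x, t) = u0((x - beta * t) mod 1)."""
    return u0_fn(np.mod(x - beta * t, 1.0))

# Uniform periodic grid on [0, 1); the right end point is identified with the left.
x = np.linspace(0.0, 1.0, 1024, endpoint=False)

u_initial = advect_exact(two_sine_ic, x, t=0.0, beta=1.0)
u_quarter = advect_exact(two_sine_ic, x, t=0.25, beta=1.0)  # shifted by a quarter period
u_period = advect_exact(two_sine_ic, x, t=1.0, beta=1.0)    # one full period, back at the start
```

Such a closed-form reference is useful for sanity-checking a loader or a surrogate on the simplest problem in the suite before moving to equations without analytic solutions.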

Datasets are pre-generated and stored in HDF5 format in standardized tensor shapes, with explicit YAML metadata for parameters, BC/IC description, and discretization. Resolutions reach up to 1024 points in 1D, $512^2$ in 2D, and $128^3$ in 3D. For each problem, thousands to tens of thousands of simulation samples are available, generated via classical finite-difference, finite-volume, PyClaw, or spectral codes (Takamoto et al., 2022).

2. Standardized API, Codebase, and Extensibility

PDEBench provides an open Python API for both dataset loading and simulation generation. Data can be accessed locally or downloaded via DOI (DaRUS platform), and is directly compatible with PyTorch and other ML frameworks. The codebase includes:

- Example scripts for on-the-fly data generation with customizable discretizations via Hydra.
- Baseline model implementations (U-Net, FNO, PINN, gradient-based inverse surrogates).
- Training loops, evaluation routines, and logging/checkpointing with minimal user configuration required.

Data loading and usage in neural operators is streamlined through provided PyTorch DataLoader wrappers and utility scripts. The API is extensible: new PDEs, BCs, and parameter regimes can be added by subclassing simulation or data classes. All datasets maintain full provenance, including grid, timestep, random seed, and physical parameter descriptions (Takamoto et al., 2022).
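
To illustrate how trajectory tensors are typically turned into training pairs for an autoregressive surrogate, here is a minimal NumPy sketch. It does not use the PDEBench API: the `(samples, time, space)` layout, the function name `make_windows`, and the window lengths `n_in` and `n_out` are all assumptions made for illustration, and the random array stands in for data that would normally be read from an HDF5 file.

```python
import numpy as np

def make_windows(data, n_in=10, n_out=1):
    """Slice trajectories of shape (N, T, X) into (input, target) window pairs.

    Each input holds n_in consecutive time steps and each target holds the
    n_out steps that immediately follow. Layout and names are hypothetical.
    """
    n_samples, n_steps, n_x = data.shape
    n_win = n_steps - n_in - n_out + 1
    if n_win < 1:
        raise ValueError("trajectory too short for the requested window lengths")
    inputs, targets = [], []
    for start in range(n_win):
        inputs.append(data[:, start:start + n_in])
        targets.append(data[:, start + n_in:start + n_in + n_out])
    # Stack windows from all start indices along the sample axis.
    return np.concatenate(inputs, axis=0), np.concatenate(targets, axis=0)

# Stand-in data: 4 trajectories, 21 time steps, 32 grid points.
rng = np.random.default_rng(0)
data = rng.standard_normal((4, 21, 32))

inputs, targets = make_windows(data, n_in=10, n_out=1)
print(inputs.shape, targets.shape)
```

The resulting arrays can be wrapped in whatever dataset abstraction the training framework expects; the provided PyTorch wrappers mentioned above serve that purpose in the actual codebase.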

3. Evaluation Metrics and Physics-Informed Assessment

PDEBench includes a multidimensional evaluation protocol that goes beyond conventional pointwise errors to assess physical fidelity and surrogate generalization.

Data fidelity metrics:
- RMSE, normalized RMSE (nRMSE), maximum error (MaxErr).
Physics-inspired metrics:
- Conservation RMSE (cRMSE): global integral conservation violation.
- Boundary RMSE (bRMSE): (pseudo-)norm over boundary points, quantifies BC satisfaction.
- Spectral RMSE (fRMSE): computed over low, middle, high Fourier bands to expose performance on different solution scales.
Additional metrics include: rollout stability, frequency drift, and inverse problem accuracy.
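
The metrics listed above can be sketched for 1D fields as follows. These are simplified NumPy versions written for illustration; the normalizations and band boundaries are assumptions and may differ from the definitions in the PDEBench code, so they should not be used to reproduce published numbers.

```python
import numpy as np

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def nrmse(pred, true):
    # RMSE divided by the root-mean-square magnitude of the reference.
    return rmse(pred, true) / float(np.sqrt(np.mean(true ** 2)))

def max_err(pred, true):
    return float(np.max(np.abs(pred - true)))

def crmse(pred, true, dx):
    # Error in the global spatial integral (Riemann sum) of each field.
    diff = (np.sum(pred, axis=-1) - np.sum(true, axis=-1)) * dx
    return float(np.sqrt(np.mean(diff ** 2)))

def brmse(pred, true):
    # RMSE restricted to the two end points of the 1D domain.
    return rmse(pred[..., [0, -1]], true[..., [0, -1]])

def frmse(pred, true, k_lo, k_hi):
    # RMS magnitude of the error's Fourier coefficients in the band [k_lo, k_hi).
    err_hat = np.fft.rfft(pred - true, axis=-1) / pred.shape[-1]
    return float(np.sqrt(np.mean(np.abs(err_hat[..., k_lo:k_hi]) ** 2)))

# Toy check: a prediction that is off by a constant 0.1 everywhere.
n_x = 64
x = np.linspace(0.0, 1.0, n_x, endpoint=False)
true = np.sin(2 * np.pi * x)[None, :]   # shape (1, n_x)
pred = true + 0.1

scores = {
    "RMSE": rmse(pred, true),
    "nRMSE": nrmse(pred, true),
    "MaxErr": max_err(pred, true),
    "cRMSE": crmse(pred, true, dx=1.0 / n_x),
    "bRMSE": brmse(pred, true),
    "fRMSE_low": frmse(pred, true, 0, 4),
    "fRMSE_high": frmse(pred, true, 4, n_x // 2 + 1),
}
```

In this toy case the constant offset appears in the pointwise, conservation, and boundary metrics and in the low-frequency band, while the high-frequency band stays at zero, which is the kind of separation the spectral metric is meant to expose.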

These metrics have revealed task-specific challenges: for example, high-frequency shock content in low-viscosity Burgers' flows causes significant FNO degradation beyond its spectral bandlimit, while normalized errors in weak-forcing Darcy may diverge despite small absolute error due to near-zero solution magnitude (Takamoto et al., 2022).
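
The normalized-error effect described above can be reproduced with a small worked example. The sketch below keeps the absolute error fixed and shrinks the reference solution; the sine profile, offset, and scales are arbitrary illustrative choices, not values taken from the Darcy dataset.

```python
import numpy as np

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def nrmse(pred, true):
    # RMSE divided by the root-mean-square magnitude of the reference.
    return rmse(pred, true) / float(np.sqrt(np.mean(true ** 2)))

x = np.linspace(0.0, 1.0, 64, endpoint=False)
profile = np.sin(2 * np.pi * x)
offset = 1e-3  # fixed absolute prediction error

results = []
for scale in (1.0, 1e-2, 1e-4):
    true = scale * profile          # smaller scale mimics weaker forcing
    pred = true + offset
    results.append((scale, rmse(pred, true), nrmse(pred, true)))

for scale, abs_e, rel_e in results:
    print(f"scale={scale:g}  RMSE={abs_e:.2e}  nRMSE={rel_e:.2e}")
```

RMSE stays at the fixed offset throughout, while nRMSE grows in inverse proportion to the solution magnitude, so the two metrics should be read together when solutions are close to zero.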

4. Role in the SciML and Foundation Model Ecosystem

PDEBench is a widely used benchmark for PDE surrogate evaluation and multi-physics pretraining in scientific machine learning. Its datasets and protocols are used in:

- Foundation model development: OmniArch (Chen et al., 2024), UPS (Shen et al., 2024), and CompNO (Hmida et al., 12 Jan 2026) perform unified 1D–3D or cross-modal pretraining directly on the full suite, setting performance baselines for one-step and rollout prediction, inverse tasks, and zero-shot transfer.
- Operator generalization studies: AMR-Transformer (Xu et al., 13 Mar 2025) and APEBench (Koehler et al., 2024) employ PDEBench to validate grid invariance, parameter generalization, and long-horizon stability.
- Control and RL frameworks: PDE Control Gym (Bhan et al., 2024) builds RL environments around core 1D/2D tasks with proprioceptive and exteroceptive observation models.
- Symbolic and model discovery: MDBench (Bideh et al., 24 Sep 2025) for equation recovery and RealPDEBench (Hu et al., 5 Jan 2026) for sim-to-real gap analysis both extend the original PDEBench data and use it as ground truth.

5. Baseline Methods and Performance Results

PDEBench includes precomputed baseline results for key operator learning and surrogate architectures:

Model	Key Characteristics	Typical Tasks/Regimes
U-Net	Encoder–decoder CNN, skip connections	Localized/structured dynamics
FNO	Mesh-invariant, global Fourier filter layers	Global dynamics, band-limited flows, 1D–3D
PINN	Physics-constrained MLP trained with PDE-residual loss terms	Low-data, stiff, or BC-critical tasks
Gradient-based Inverse	IC/parameter inversion via backprop	Inverse/identification tasks

Performance on hard problems:

- Low-viscosity Burgers: FNO suffers spectral ringing on shocks; U-Net requires autoregressive stabilization for long rollouts.
- Complex Navier–Stokes and shock tubes: Neural surrogates may struggle with high-amplitude discontinuities, motivating further development in adaptive/spectral operator design.
- Grid-invariance and BC handling: CompNO (Hmida et al., 12 Jan 2026) achieves exact Dirichlet satisfaction with zero boundary loss, robust generalization on a ×2 grid, and error stability across Péclet/Reynolds numbers.

Enhanced unified models such as OmniArch, UPS, and AMR-Transformer achieve order-of-magnitude improvements over classical FNO/U-Net on 1D–3D tasks with fewer trainable parameters or data.

6. Limitations, Identified Hard Cases, and Future Directions

PDEBench has revealed persistent challenges in learning-based surrogate modeling:

- Surrogates often fail on temporal extrapolation, rarely keeping error growth sublinear beyond the training horizon.
- Black-box CNNs struggle with sharp discontinuities unless equipped with explicit spectral or conservation priors.
- Rollout instability and BC/parameter sensitivity remain major open problems, motivating research into mesh-adaptive, conservation-law-respecting, and multi-domain operators.
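
Error growth over an autoregressive rollout can be measured with a loop like the one sketched below. The sketch uses a first-order upwind finite-difference step as a hypothetical stand-in for a learned one-step surrogate, because its small per-step error (numerical diffusion) accumulates in a way that is easy to check against the exact advection solution. The grid size, step count, and CFL number are illustrative assumptions.

```python
import numpy as np

def upwind_step(u, beta, dt, dx):
    """One first-order upwind step for u_t + beta * u_x = 0 (beta > 0, periodic)."""
    return u - beta * dt / dx * (u - np.roll(u, 1))

def rollout(step_fn, u0, n_steps):
    """Autoregressive rollout: each prediction is fed back as the next input."""
    states = [u0]
    for _ in range(n_steps):
        states.append(step_fn(states[-1]))
    return np.stack(states)  # shape (n_steps + 1, n_x)

n_x, n_steps, beta = 128, 200, 1.0
x = np.linspace(0.0, 1.0, n_x, endpoint=False)
dx = 1.0 / n_x
dt = 0.5 * dx / beta  # CFL number 0.5 keeps the stand-in scheme stable

u0 = np.sin(2 * np.pi * x)
pred = rollout(lambda u: upwind_step(u, beta, dt, dx), u0, n_steps)

# Exact reference: the initial sine wave shifted by beta * t at every step.
t = dt * np.arange(n_steps + 1)
true = np.sin(2 * np.pi * (x[None, :] - beta * t[:, None]))

# Normalized RMSE at each rollout step.
err = np.sqrt(np.mean((pred - true) ** 2, axis=1)) / np.sqrt(np.mean(true ** 2, axis=1))
print(f"nRMSE at step 0: {err[0]:.3f}, step 100: {err[100]:.3f}, step 200: {err[200]:.3f}")
```

Replacing `upwind_step` with a trained model's forward pass gives the same per-step error curve for a neural surrogate, which can then be inspected for the super-linear growth described in the first item above.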

Proposed research directions include mesh- and spectrum-adaptive neural operators, explicit conservation and entropy stability constraints, multi-phase and irregular-domain extensions, as well as domain/parameter generalization via hypernetworks and hybrid (neural–numerical) architectures.

PDEBench code, datasets, and documentation are maintained at https://github.com/pdebench/PDEBench, and the platform is being actively extended with new PDEs, metrics, and evaluation protocols to support the evolving needs of the scientific ML community (Takamoto et al., 2022).


References (9)

PDEBENCH: An Extensive Benchmark for Scientific Machine Learning (2022)

OmniArch: Building Foundation Model For Scientific Computing (2024)

UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation (2024)

CompNO: A Novel Foundation Model approach for solving Partial Differential Equations (2026)

AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation (2025)

APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs (2024)

PDE Control Gym: A Benchmark for Data-Driven Boundary Control of Partial Differential Equations (2024)

MDBench: Benchmarking Data-Driven Methods for Model Discovery (2025)

RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data (2026)
