
Data-Efficient Time-Dependent PDE Surrogates

Updated 1 January 2026
  • The paper demonstrates decoupling of spatial and temporal learning via operator neural surrogates, reducing required samples by orders of magnitude while supporting zero-shot time extrapolation.
  • It applies structured inductive biases—using graph neural networks and SE(2)-equivariance—to capture PDE locality and symmetry, achieving sub-1% relative $L^2$ errors on benchmark problems.
  • Hybrid multi-fidelity, latent-space, and active learning approaches further cut simulation cost, enabling scalable and robust real-time surrogate modeling of complex dynamical systems.

Data-efficient time-dependent partial differential equation (PDE) surrogates are machine learning models or hybrid numerical/neural frameworks designed to emulate the evolution of PDE-governed dynamical systems using minimal high-fidelity data, often with orders-of-magnitude lower computational cost versus conventional solvers. Their development is motivated by the increasing demands of uncertainty quantification, optimization, and real-time control in domains where direct numerical simulation is intractable for many-query or high-dimensional settings. This article details modern approaches to data-efficiency in time-dependent PDE surrogates, covering operator-based neural architectures, graph-based inductive bias, multi-fidelity and hybrid models, active learning, and continuous-time resolution independence, with a focus on technical advances validated by recent large-scale benchmarks.

1. Operator Neural Surrogates: Decoupling Space and Time

Data-efficient surrogate learning for time-dependent PDEs frequently leverages neural operator formalisms that exploit structural decomposition between spatial and temporal variables. The hybrid DeepONet framework, exemplified in "Hybrid DeepONet Surrogates for Multiphase Flow in Porous Media" (Santos et al., 4 Nov 2025), formalizes the solution operator as

$$\mathcal{G}: v(\cdot) \mapsto u(\cdot, t),\quad u(x,t)\approx \sum_{k=1}^{p} b_k(v)(x)\, t_k(t)$$

where $v(x)$ encodes spatially heterogeneous inputs (e.g., permeability fields) and $u(x,t)$ is the desired time-dependent solution. The surrogate network decouples spatial learning (branch, $b_k$) from temporal learning (trunk, $t_k$), which drastically reduces memory and parameter requirements compared to treating $(x,t)$ jointly (as in full FNOs); a minimal sketch of this split follows the list below.

  • Spatial branches may use Fourier Neural Operators (FNO), classic MLPs, or Kolmogorov–Arnold Networks (KAN) for functional richness and inductive bias.
  • Temporal trunks are typically small MLPs or KANs, enabling direct prediction at arbitrary time stamps, supporting "zero-shot" time extrapolation without retraining.
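The following is a minimal PyTorch sketch of such a branch/trunk surrogate; the convolutional branch, layer sizes, and all names are illustrative assumptions rather than the architecture of any cited paper:

```python
import torch
import torch.nn as nn

class BranchTrunkSurrogate(nn.Module):
    """Minimal DeepONet-style surrogate: u(x, t) ≈ sum_k b_k(v)(x) * t_k(t)."""

    def __init__(self, n_input_channels=1, p=64, trunk_width=64):
        super().__init__()
        # Branch: encodes the spatial input field v(x) into p coefficient fields b_k(v)(x).
        # A spectral or graph operator could be swapped in here; a small CNN keeps the sketch short.
        self.branch = nn.Sequential(
            nn.Conv2d(n_input_channels, 32, 3, padding=1), nn.GELU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.GELU(),
            nn.Conv2d(32, p, 3, padding=1),
        )
        # Trunk: a lightweight MLP mapping a scalar time stamp t to p temporal basis values t_k(t).
        self.trunk = nn.Sequential(
            nn.Linear(1, trunk_width), nn.GELU(),
            nn.Linear(trunk_width, trunk_width), nn.GELU(),
            nn.Linear(trunk_width, p),
        )

    def forward(self, v, t):
        # v: (batch, channels, H, W) spatial input field; t: (batch, n_times, 1) query times.
        b = self.branch(v)                   # (batch, p, H, W)
        tau = self.trunk(t)                  # (batch, n_times, p)
        # Contract over the shared index k to form u(x, t) at every query time.
        return torch.einsum("bkhw,bnk->bnhw", b, tau)
```

Because the query time enters only through the lightweight trunk, the same trained model can be evaluated at time stamps never seen during training.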

This decomposition yields substantial data-efficiency:

  • Only the branch must encode spatial complexity; the trunk is lightweight, as the time signal is low-dimensional.
  • Training needs orders of magnitude fewer samples; e.g., relative $L^2$ errors below $0.07$ are achieved on million-cell 3D porous flow problems from only $20$ training trajectories.
  • Models with $10^5$–$10^7$ parameters achieve accuracy comparable to monolithic FNOs requiring $10^7$–$10^9$ parameters and multi-GPU memory (Santos et al., 4 Nov 2025).

This structural split enables modularity: spatial branches can be replaced by more advanced neural operators (wavelet spectral, SE(2)-equivariant, or graph-based), and time trunks by continuous-time models or RNNs, broadening applicability to a wide class of time-dependent physical systems.
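Continuing the sketch above, zero-shot time extrapolation then reduces to passing the trunk query times beyond the training horizon; all tensors below are synthetic placeholders:

```python
model = BranchTrunkSurrogate(n_input_channels=1, p=64)

v = torch.randn(4, 1, 64, 64)                                          # e.g. permeability-like input fields
t_train = torch.linspace(0.0, 1.0, 20).view(1, -1, 1).expand(4, -1, -1)
t_extrap = torch.linspace(1.0, 2.0, 10).view(1, -1, 1).expand(4, -1, -1)

u_train = model(v, t_train)     # times covered by the training grid
u_future = model(v, t_extrap)   # zero-shot queries beyond the training horizon
print(u_future.shape)           # torch.Size([4, 10, 64, 64])
```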

2. Data Efficiency via Structured Inductive Bias and Graph Neural Surrogates

Incorporating PDE locality and geometric symmetries into neural surrogates is key for sample-efficient learning. Two exemplary approaches are:

  • Graph Neural Simulators (GNS) (Nayak et al., 7 Sep 2025): The GNS method builds a message-passing neural network (MPNN) on a spatial graph (e.g., regular or irregular mesh), where node features represent field values and coordinates, and edge features encode geometric relations. The model is trained to regress instantaneous time derivatives (i.e., it learns $\partial u/\partial t$), which are integrated via explicit time discretization,

$$u^{n+1} = u^n + \Delta t\,\text{GNS}(u^n)$$

This aligns with conventional numerical schemes and respects locality, resulting in sub-1% relative $L^2$ errors for 2D Burgers and Allen–Cahn equations using only 30 training trajectories (3% of the data) (Nayak et al., 7 Sep 2025). GNS outperforms neural operator baselines (FNO, DeepONet) by $1$–$2$ orders of magnitude in the low-data regime, owing to its explicit encoding of causality and spatial interactions; a schematic rollout loop in this spirit appears after the list below.

  • SE(2)-equivariant Graph Neural Networks (Bånkestad et al., 2024): For problems where 2D rotational and translational symmetry is fundamental (e.g., fluid dynamics), incorporating SE(2)-equivariance via principal-axis alignment within GNN message-passing layers yields an additional $8\times$ reduction in sample complexity compared to non-equivariant baselines for irregular-mesh and geometry-varying Navier–Stokes surrogates.
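A schematic rollout loop for such a learned-derivative surrogate is given below; `gns` stands for any trained message-passing model and is a hypothetical callable, not the reference implementation:

```python
import torch

def rollout(gns, u0, dt, n_steps):
    """Explicit-Euler rollout of a surrogate that predicts du/dt on the mesh nodes.

    gns     : callable mapping node states u^n -> estimated time derivative du/dt
    u0      : (n_nodes, n_fields) initial condition on the graph
    dt      : time-step size in the scheme u^{n+1} = u^n + dt * GNS(u^n)
    n_steps : number of autoregressive steps
    """
    u = u0
    trajectory = [u0]
    for _ in range(n_steps):
        du_dt = gns(u)               # learned instantaneous time derivative
        u = u + dt * du_dt           # explicit time integration, as in classical solvers
        trajectory.append(u)
    return torch.stack(trajectory)   # (n_steps + 1, n_nodes, n_fields)
```

During training, the network output at state $u^n$ would be supervised against finite-difference targets $(u^{n+1}-u^n)/\Delta t$ computed from reference trajectories.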

These approaches demonstrate that explicit encoding of physical structure, causality, and symmetries is critical for robust generalization from limited data in time-dependent regimes.

3. Multi-Fidelity, Hybrid, and Latent Space Approaches

Effective exploitation of low-fidelity or reduced-order models further enhances data efficiency in time-dependent PDE surrogates:

  • Multi-fidelity Reduced-Order Modeling (Conti et al., 2023): High-fidelity (HF) solution snapshots are compressed via proper orthogonal decomposition (POD) to form a reduced basis; low-fidelity (LF) models (coarse mesh, mis-specified physics) are projected onto this basis. A multi-fidelity LSTM is trained to map LF reduced coefficients to their HF counterparts over time and parameter variations,

$$M_{\text{LF}\to\text{HF}}: (t, \mu, a^{\text{LF}}(t; \mu)) \mapsto a^{\text{HF}}(t; \mu)$$

This enables $100\times$ speedup and $3$–$5\times$ error reduction versus pure LF surrogates and accurately captures transient instabilities; a compressed sketch of this POD-plus-LSTM pipeline follows the list below.

  • Physics-Enhanced Deep Surrogates (PEDS) (Pestourie et al., 2021): Injects coarse, explainable physics (e.g., coarse-grid or simplified-physics time-steppers) into the surrogate pipeline, where a small neural network perturbs the LF input to match HF outputs. PEDS reduces the high-fidelity data requirement by $100\times$ to reach target errors, achieves $3$–$8\times$ improvement over pure data-driven approaches, and immediately extends to time-dependent problems using coarse time resolvers in the LF surrogate.
  • Latent-space and Continuous Convolutional Surrogates (Hagnberger et al., 19 May 2025): CALM-PDE encodes irregular or regular field samples into a fixed-size set of adaptive latent tokens, propagates dynamics in latent space using Transformer blocks, and reconstructs to arbitrary query points via continuous convolutional decoders. Adaptive placement of latent tokens in complex regions (e.g., boundary layers) and $O(\ell M)$ computational scaling of continuous convolutions allow competitive accuracy with as few as 200–1000 training trajectories, compared to several thousand required by Transformer-only surrogates.
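The sketch below outlines the multi-fidelity POD-plus-LSTM pipeline described above; the reduced dimension, layer widths, and function names are assumptions for illustration, not the configuration of Conti et al.:

```python
import numpy as np
import torch
import torch.nn as nn

# --- Offline: POD basis from a handful of high-fidelity snapshots ---
# S_hf: (n_dof, n_snapshots) matrix of HF solution snapshots.
def pod_basis(S_hf, r):
    U, _, _ = np.linalg.svd(S_hf, full_matrices=False)
    return U[:, :r]                               # first r POD modes

# --- Online: sequence model mapping LF reduced coefficients to HF ones ---
class LF2HF(nn.Module):
    def __init__(self, r, n_params, hidden=128):
        super().__init__()
        # Input at each step: time t, PDE parameters mu, and LF coefficients a_LF(t; mu).
        self.lstm = nn.LSTM(1 + n_params + r, hidden, batch_first=True)
        self.head = nn.Linear(hidden, r)          # predicted HF coefficients a_HF(t; mu)

    def forward(self, t, mu, a_lf):
        # t: (batch, T, 1), mu: (batch, T, n_params), a_lf: (batch, T, r)
        x = torch.cat([t, mu, a_lf], dim=-1)
        h, _ = self.lstm(x)
        return self.head(h)

# Reconstruction of the full field: u_HF(t; mu) ≈ V @ a_HF(t; mu), with V the retained POD modes.
```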

4. Active Learning and Smart Data Acquisition

Active learning frameworks that adaptively select high-value queries—either parameter combinations or specific time steps—offer another route to data efficiency:

  • MelissaDL × Breed (Dymchenko et al., 2024) uses on-line supervised training with active steering via Adaptive Multiple Importance Sampling (AMIS), measuring the surrogate's local loss variance to focus simulation resources on hard-to-learn regions of input parameter space. In online 2D heat equation experiments, Breed reduces overfitting and increases generalization in challenging regions for small models, with negligible computational overhead.
  • Selective Time-Step Acquisition for PDEs (STAP) (Kim et al., 22 Nov 2025) extends active learning from parameter-space exploration to selective temporal sampling. Rather than generating full PDE trajectories at every iteration, STAP queries only those time steps expected to maximally reduce model uncertainty, as estimated by a committee-based acquisition function. This allows a given computational budget to cover a more diverse set of initial/boundary conditions, and in benchmarks on Burgers', KdV, Kuramoto–Sivashinsky, and Navier–Stokes, STAP reduces average and quantile errors by large margins compared to full-trajectory active learning.
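A schematic of committee-based time-step selection is sketched below; the disagreement score is a generic ensemble variance and is only an assumption about the flavor of acquisition function, not STAP's exact criterion:

```python
import torch

def select_time_steps(committee, u0, candidate_steps, budget):
    """Pick the rollout steps where an ensemble of surrogates disagrees most.

    committee       : list of trained surrogate models (callables u^n -> u^{n+1})
    u0              : initial state tensor
    candidate_steps : number of rollout steps considered as acquisition candidates
    budget          : how many time steps to query from the full-order solver
    """
    disagreement = []
    states = [u0.clone() for _ in committee]
    for _ in range(candidate_steps):
        preds = [model(state) for model, state in zip(committee, states)]
        stacked = torch.stack(preds)                    # (n_models, *state_shape)
        disagreement.append(stacked.var(dim=0).mean())  # committee variance at this step
        states = preds                                  # each member rolls out on its own prediction
    scores = torch.stack(disagreement)
    return torch.topk(scores, k=budget).indices         # indices of time steps to label
```

The selected indices would then be handed to the full-order solver, and the newly labeled states appended to the training set before the next acquisition round.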

5. Resolution Independence, Continuous-Time Operators, and Autoregressive Advances

A major axis of recent progress is the construction of surrogates invariant to input and output discretization, improving robustness in realistic data-scarce, irregularly sampled regimes:

  • NCDE-DeepONet (Abueidda et al., 3 Jul 2025) introduces Neural Controlled Differential Equation (NCDE) branches that encode entire boundary and/or load histories as controlled ODEs. Spline-interpolated input paths enable the branch to process arbitrarily sampled histories, while trunk MLPs probe arbitrary space–time locations. This endows the surrogate with "input-resolution independence" (robust to sampling sparsity) and "output-resolution independence" (arbitrary space–time queries), with 40% lower prediction errors than GRU-based DeepONet alternatives.
  • Continuous Flow Operator (CFO) (Hou et al., 4 Dec 2025) leverages flow matching on spline-fitted trajectory data to directly learn the continuous-time right-hand side of PDEs. CFO manages irregular or sparse time grids in training and supports arbitrary temporal querying at inference. On canonical benchmarks, CFO trained with only 25% of the time points outperforms autoregressive baselines on full data by up to 87% in relative error, and is orders of magnitude faster owing to fewer function evaluations and no ODE solver backpropagation in training.
  • Autoregressive Ensembling (Khurjekar et al., 5 Jul 2025) demonstrates that training ensembles of (e.g., UNet-attention) autoregressive models with random initializations and forming inference-time averages consistently reduces long-horizon error drift by 15–33% and halves mean absolute error as $N_{\text{ensemble}}$ grows from 2 to 8. Each surrogate is data- and compute-efficient, requiring only $L=3$ previous time steps and modestly sized architectures; a minimal ensemble-rollout sketch appears after this list.
  • Spectral and Diffusion-Augmented Objectives (Lippe et al., 2023): PDE-Refiner introduces a diffusion-style multi-step denoising refinement specifically designed to improve accuracy for low-amplitude, high-frequency spatial modes. The injected spectral noise acts as a strong regularizer, such that with only 10% of training trajectories, PDE-Refiner matches the accuracy of a baseline model trained with 100% of data. This constitutes a form of spectrum-wide data augmentation, uniquely well-suited to low-data fluid dynamics setups.
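A minimal sketch of inference-time ensemble averaging for autoregressive rollouts is shown below; the model interface (last $L$ states in, next state out) is an assumption for illustration:

```python
import torch

def ensemble_rollout(models, history, n_steps):
    """Average next-step predictions of independently trained autoregressive surrogates.

    models  : list of surrogates, each mapping the last L states (stacked on dim 0) to the next state
    history : (L, *state_shape) tensor of the most recent states
    n_steps : rollout horizon
    """
    outputs = []
    for _ in range(n_steps):
        # The ensemble mean damps the error drift of any single member.
        next_state = torch.stack([m(history) for m in models]).mean(dim=0)
        outputs.append(next_state)
        # Slide the window: drop the oldest state, append the ensemble prediction.
        history = torch.cat([history[1:], next_state.unsqueeze(0)], dim=0)
    return torch.stack(outputs)
```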

6. Quantitative Benchmarks and Practical Guidelines

A range of methodologies have been quantitatively benchmarked:

| Method/Architecture | Problem | Samples | Test Rel. $L^2$ Error | Memory/Params | Key Reference |
| --- | --- | --- | --- | --- | --- |
| Hybrid DeepONet (FNO+MLP) | 3D porous flow, 1.12M cells | 20 | 0.062 | $10^5$–$10^7$ params | (Santos et al., 4 Nov 2025) |
| Graph Neural Simulator | 2D Burgers/Allen–Cahn | 30–50 | <1% | n/a | (Nayak et al., 7 Sep 2025) |
| CALM-PDE (latent) | 2D NS, 64×64 grid | 200–1000 | <3% | $\ll$ Transformer | (Hagnberger et al., 19 May 2025) |
| Multi-fidelity LF→HF LSTM+POD | Cylinder NS | 15 | 3.5% | >$100\times$ speedup | (Conti et al., 2023) |
| PEDS (coarse+MLP) | Diffusion/Fisher's eqn. | 1000 | <5% | $100\times$ data reduction | (Pestourie et al., 2021) |
| CFO (spline-matched flow) | Burgers', SWE | 25% of time pts | <1% | time-resolution-independent | (Hou et al., 4 Dec 2025) |
| PDE-Refiner (diffusion) | KS eqn. | 10% data | matched baseline (100% data) | 3–8 denoising steps | (Lippe et al., 2023) |

Guidelines emerging from these empirical studies:

  • Prefer branch/trunk or encoder–decoder decompositions to isolate high-dimensional spatial structure from low-dimensional time/parameter queries.
  • Where applicable, integrate local inductive bias (physics, mesh connectivity, symmetry) into neural architectures.
  • Exploit cheap low-fidelity models either via multi-fidelity learning (e.g., LF→HF mapping) or as physics-enhanced surrogates (PEDS).
  • Pursue active learning for sample selection in both parameter and time domains.
  • Use continuous-time operator learning (CFO, NCDE-DeepONet) for robustness to sampling sparsity and arbitrary-grained inference.

7. Limitations, Trade-offs, and Future Directions

While state-of-the-art data-efficient PDE surrogate models achieve impressive accuracy and scalability, certain caveats and open questions remain:

  • High-fidelity surrogates for multi-scale phenomena (e.g., fully developed turbulence) remain challenging, as small latent spaces or purely local inductive biases may undersample rare dynamics or sharp features.
  • Continuous-time operator models (CFO, NCDE-DeepONet) incur inference cost from ODE integration; they remain slower than one-shot encoder–decoder models, although the overhead is mitigated by selective stepping and by avoiding backpropagation through the solver during training.
  • Autoregressive surrogates, even when ensembled, can accumulate errors over very long horizons. Explicit enforcement of physical invariants (e.g., mass/energy conservation) within denoising or refinement steps may further improve stability (Lippe et al., 2023).
  • Practical integration of active learning frameworks with HPC solver batch scheduling, streaming data ingestion, and on-the-fly retraining is non-trivial but critical for scaling to real-world engineering workflows (Dymchenko et al., 2024).
  • Transferability to unstructured, multi-physics, or partially observed systems is an ongoing research focus, with promising directions leveraging SE(2)-equivariance, graph or mesh neural operators, and continuous/implicit representations.

The collective findings across these architectures and strategies demonstrate that data-efficient surrogate modeling for time-dependent PDEs is now a practicable and mature technology, enabling simulation-accelerated science and engineering with dramatically reduced data requirements and wall-clock times (Santos et al., 4 Nov 2025, Nayak et al., 7 Sep 2025, Conti et al., 2023, Pestourie et al., 2021, Hagnberger et al., 19 May 2025, Dymchenko et al., 2024, Abueidda et al., 3 Jul 2025, Hou et al., 4 Dec 2025, Lippe et al., 2023).
