P3D: Scalable Neural Surrogates

Updated 10 March 2026

The paper presents scalable neural surrogate models that integrate CNNs, Transformers, and GNNs to simulate high-resolution 3D PDE physics with hundreds of millions of degrees of freedom.
It employs rigorous domain decomposition and patchwise processing to optimize memory, ensure parallelism, and maintain mathematically equivalent gradient updates across simulation patches.
Advanced multifidelity training and mesh-free neuro-symbolic solvers reduce simulation costs while achieving competitive accuracy compared to classical Monte Carlo methods.

P3D refers to a class of scalable neural surrogate modeling techniques for high-dimensional and high-resolution scientific simulation problems, especially 3D physics governed by PDEs. P3D-type methods enable learning surrogates for massive simulation domains (up to hundreds of millions of degrees of freedom) by combining neural operator models, architectural advances (CNNs, Transformers, GNNs), principled domain decomposition, transfer learning across dimensionalities, and mesh-free neuro-symbolic solvers. Their defining characteristics include efficient scaling with grid size, aggressive memory and compute optimizations, and the ability to integrate global context and physical constraints. These advances have established P3D surrogates as state-of-the-art for applications in fluid dynamics, uncertainty quantification, optimization-constrained design, and multiscale physical modeling (Holzschuh et al., 12 Sep 2025, Bartoldson et al., 2023, Propp et al., 2024, Alkin et al., 13 Feb 2025, El-Kabid et al., 24 Jul 2025, Parker et al., 2024, Mistani et al., 2022).

1. Neural Surrogate Architectures for Large-Scale 3D Physics

P3D surrogates encompass a variety of architecture classes—hybrid CNN-Transformer backbones (Holzschuh et al., 12 Sep 2025), mesh-based graph neural networks (Bartoldson et al., 2023), fully convolutional encoder-decoders with multifidelity transfer (Propp et al., 2024), neural integral operators (e.g., FNO, U-NO) (El-Kabid et al., 24 Jul 2025), and mesh-free neuro-symbolic PDE solvers (Mistani et al., 2022). These designs share several scaling principles:

Hybrid Local/Global Feature Extraction: 3D CNNs or GNNs extract spatially localized representations; Transformer backbones or context modules fuse global interactions within or across patches (Holzschuh et al., 12 Sep 2025).
Patchwise Processing and Memory Optimization: Training on overlapping spatial crops (patches) drastically lowers memory requirements. Windowed attention or domain decomposition ensures each local patch "sees" its full neighborhood, preserving fidelity while allowing distributed or minibatch-based processing (Bartoldson et al., 2023, Holzschuh et al., 12 Sep 2025).
Operator Learning and Field Querying: Architectures such as neural operators (FNO, U-NO, DFNO) learn mappings between function spaces, enabling prediction at arbitrary spatial resolutions and zero-shot generalization to unseen grids (El-Kabid et al., 24 Jul 2025).
Anchored Decoding and Geometry Handling: Geometry-preserving encoders and branch decoders ensure faithful conditioning on complex boundary or CAD inputs, as seen in the AB-UPT/GP-UPT family (Alkin et al., 13 Feb 2025).
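
The resolution independence attributed to neural operators above can be illustrated with a minimal NumPy sketch of a single Fourier-space layer in one dimension. This is not the FNO, U-NO, or DFNO implementation from the cited work; the function name `spectral_conv_1d`, the number of retained modes, and the random weights are assumptions made for illustration only.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    """Apply a Fourier-space linear filter to a periodic 1D signal.

    u       : real array of shape (n,), samples on a uniform periodic grid
    weights : complex array of shape (modes,), one multiplier per retained
              low-frequency mode (hypothetical "learned" parameters)
    """
    n = u.shape[0]
    modes = weights.shape[0]
    # norm="forward" divides by n, so coefficients do not depend on grid size
    u_hat = np.fft.rfft(u, norm="forward")
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = u_hat[:modes] * weights  # reweight low modes, drop the rest
    return np.fft.irfft(out_hat, n=n, norm="forward")

rng = np.random.default_rng(0)
weights = rng.normal(size=4) + 1j * rng.normal(size=4)

def sample_input(n):
    # Band-limited test function sampled on an n-point grid over [0, 1)
    x = np.arange(n) / n
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)

# The same weights are applied at two different resolutions
coarse = spectral_conv_1d(sample_input(64), weights)
fine = spectral_conv_1d(sample_input(128), weights)
```

Because the weights act on Fourier coefficients rather than on grid values, the fine-grid output subsampled by two (`fine[::2]`) coincides with the coarse-grid output for this band-limited input.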

These models typically minimize pixelwise or fieldwise regression losses (MSE, L1) but may be augmented by physical constraint terms such as divergence-free penalties or PDE residuals, supporting both deterministic surrogates and diffusion/probabilistic samplers (Holzschuh et al., 12 Sep 2025, El-Kabid et al., 24 Jul 2025).
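
A regression loss augmented by a divergence-free penalty, as described above, might look like the following sketch. It assumes a velocity field stored as an array of shape `(3, nx, ny, nz)` on a periodic uniform grid and uses central differences; the names `divergence`, `surrogate_loss`, and the weight `lam` are hypothetical and not taken from the cited papers.

```python
import numpy as np

def divergence(v, h=1.0):
    """Central-difference divergence of v (shape (3, nx, ny, nz)), periodic grid."""
    return sum(
        (np.roll(v[i], -1, axis=i) - np.roll(v[i], 1, axis=i)) / (2.0 * h)
        for i in range(3)
    )

def surrogate_loss(pred, target, lam=0.1, h=1.0):
    """Mean squared error plus lam times the mean squared divergence of pred."""
    mse = np.mean((pred - target) ** 2)
    div_penalty = np.mean(divergence(pred, h) ** 2)
    return mse + lam * div_penalty

n = 16
h = 2 * np.pi / n
grid = np.arange(n) * h
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")

# Each component is constant along its own axis, so the divergence vanishes
target = np.stack([np.sin(y), np.sin(z), np.sin(x)])
# Each component varies along its own axis, so the divergence is nonzero
compressive = np.stack([np.sin(x), np.sin(y), np.sin(z)])
```

With `lam=0` the loss reduces to plain MSE; a positive `lam` penalizes the `compressive` field while leaving the divergence-free `target` unaffected.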

2. Domain Decomposition and Patch Training

One of the technical foundations of P3D surrogates is the rigorous application of domain decomposition during both training and inference, exemplified by patchwise MeshGraphNets (Bartoldson et al., 2023) and CNN/Transformer patch fusion (Holzschuh et al., 12 Sep 2025). For a mesh-based simulation domain $\mathcal{G}=(V,E)$, the mesh is split into $P$ patches, each consisting of a core region extended by a layer of ghost nodes copied from neighboring patches. Message passing or convolutional operations are restricted to these patches, with the following crucial guarantees:

Mathematical Equivalence: For $m$ message-passing steps and a ghost region of width $k$ with $m\leq k$, patchwise training yields gradient updates mathematically identical to training on the entire domain, provided the loss is aggregated over core (non-ghost) nodes only (Bartoldson et al., 2023).
Parallelism and Efficiency: Batched or distributed patch updates enable training on domains with tens of millions of nodes using conventional hardware, with strong scaling that is nearly linear in the number of GPUs (Bartoldson et al., 2023, Holzschuh et al., 12 Sep 2025).
Global Context Correction: A small context or seq2seq model integrates patch-level information, ensuring coherent global outputs and capturing long-range correlations—essential for turbulence and multiscale phenomena (Holzschuh et al., 12 Sep 2025).
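
The equivalence condition $m\leq k$ can be checked on a toy problem. The sketch below uses a 1D chain of nodes and a simple nearest-neighbor update as a stand-in for a message-passing layer; it is not the MeshGraphNets implementation, and the function names and the parameters `a` and `b` are assumptions. It verifies that core-node outputs match the full-domain computation, which is what makes a loss summed over cores (and hence its gradient) identical.

```python
import numpy as np

def mp_step(h, a, b):
    """One nearest-neighbor update; nodes beyond the array ends contribute zero."""
    padded = np.pad(h, 1)
    return np.tanh(a * h + b * (padded[:-2] + padded[2:]))

def run_full(h, m, a, b):
    """Apply m update steps to the whole array."""
    for _ in range(m):
        h = mp_step(h, a, b)
    return h

def run_patchwise(h, m, k, n_patches, a, b):
    """Process each patch (core plus k ghost nodes per side), keep core outputs."""
    n = h.shape[0]
    bounds = np.linspace(0, n, n_patches + 1).astype(int)
    out = np.empty_like(h)
    for s, e in zip(bounds[:-1], bounds[1:]):
        lo, hi = max(0, s - k), min(n, e + k)   # extended patch, clipped to domain
        patch = run_full(h[lo:hi], m, a, b)
        out[s:e] = patch[s - lo : s - lo + (e - s)]  # discard ghost outputs
    return out

rng = np.random.default_rng(0)
h0 = rng.normal(size=40)
a, b, m = 0.5, 0.3, 3

full = run_full(h0, m, a, b)
patched_ok = run_patchwise(h0, m, k=3, n_patches=4, a=a, b=b)   # k >= m
patched_bad = run_patchwise(h0, m, k=1, n_patches=4, a=a, b=b)  # k < m
```

Errors introduced at an artificial patch edge travel one node inward per step, so after `m` steps they have not reached the core when `k >= m`; with `k < m` the core outputs near patch boundaries differ from the full-domain result.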

For meshless domains or field settings, multi-patch inference is similarly fused by averaging overlapping predictions or aggregating via adaptive instance normalization schemes guided by region tokens (Holzschuh et al., 12 Sep 2025, El-Kabid et al., 24 Jul 2025).
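
Fusion by averaging overlapping predictions can be sketched in a few lines. The example is one-dimensional for brevity, and the helper name `fuse_overlapping` and its arguments are hypothetical; the adaptive instance normalization variant mentioned above is not shown.

```python
import numpy as np

def fuse_overlapping(patches, starts, n):
    """Average patch predictions into a length-n field.

    patches : list of 1D arrays, each a prediction on one patch
    starts  : list of start indices of each patch in the global field
    n       : length of the global field
    """
    total = np.zeros(n)
    count = np.zeros(n)
    for p, s in zip(patches, starts):
        total[s : s + len(p)] += p
        count[s : s + len(p)] += 1
    # Cells covered by no patch stay at zero instead of dividing by zero
    return total / np.maximum(count, 1)

# Two patches of length 4 that overlap on indices 2 and 3
fused = fuse_overlapping([np.ones(4), 3.0 * np.ones(4)], starts=[0, 2], n=6)
```

In the overlap the two predictions (1 and 3) are averaged to 2, while cells covered by a single patch keep that patch's value.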

3. Multifidelity and Transfer-Learning Schemes

High-fidelity simulation data in three (or higher) spatial dimensions are computationally expensive to generate. P3D surrogates leverage transfer learning and multifidelity training between lower- and higher-dimensional data to reduce sample and computational complexity (Propp et al., 2024). The key workflow is:

Low-Fidelity Pretraining: Train the core encoder-decoder or operator backbone on a large set of $(d-1)$-dimensional solutions (e.g., 2D slices of a 3D domain), exploiting much lower sample generation costs.
Layerwise or Full Fine-Tuning: Freeze most network parameters and adapt only the final layers to a small, high-fidelity $d$-dimensional dataset; then optionally unfreeze all layers for fine-tuning.
Reverse Curse-of-Dimensionality: This mixture achieves, for a fixed wall-clock budget $B$, an effective $M$-fold increase in the number of training samples, where $M$ is the grid size along the new dimension (e.g., $P$ 0) (Propp et al., 2024).
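
The pretrain-then-freeze pattern in the steps above can be sketched on a toy regression problem. Everything here is an illustrative assumption: a one-hidden-layer network, a cheap 1D function standing in for low-fidelity data, a perturbed version standing in for high-fidelity data, and a closed-form least-squares fit of the output layer in place of gradient-based fine-tuning. The toy does not change dimensionality, so it shows only the freezing mechanics, not the cross-dimensional transfer of the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 16
W1 = rng.normal(size=(1, hidden))   # hidden-layer weights (frozen in stage 2)
b1 = rng.normal(size=hidden)        # hidden-layer biases (frozen in stage 2)
w2 = np.zeros(hidden)               # output-layer weights (adapted in stage 2)

def features(x):
    return np.tanh(x @ W1 + b1)

def mse(x, y):
    return np.mean((features(x) @ w2 - y) ** 2)

# Stage 1: pretrain all parameters on plentiful "low-fidelity" data
x_lo = rng.uniform(-1, 1, size=(2000, 1))
y_lo = np.sin(3 * x_lo[:, 0])
lr = 0.01
for _ in range(500):
    phi = features(x_lo)
    r = phi @ w2 - y_lo                                   # residuals
    g_w2 = 2 * phi.T @ r / len(r)                         # dL/dw2
    g_pre = (2 * r[:, None] * w2[None, :] / len(r)) * (1 - phi**2)
    W1 -= lr * (x_lo.T @ g_pre)                           # dL/dW1
    b1 -= lr * g_pre.sum(axis=0)                          # dL/db1
    w2 -= lr * g_w2

# Stage 2: freeze the hidden layer, refit only the output layer
# on a small "high-fidelity" dataset
x_hi = rng.uniform(-1, 1, size=(30, 1))
y_hi = np.sin(3 * x_hi[:, 0]) + 0.3 * x_hi[:, 0] ** 2
mse_before = mse(x_hi, y_hi)
W1_frozen = W1.copy()
w2 = np.linalg.lstsq(features(x_hi), y_hi, rcond=None)[0]
mse_after = mse(x_hi, y_hi)
```

Because the least-squares fit is optimal over the output weights for fixed features, the high-fidelity training error after stage 2 cannot exceed the error before it.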

This architecture-agnostic approach achieves up to $P$ 1 reduction in required high-fidelity solver runs for a fixed accuracy, and in benchmarks it matches or outperforms classical Monte Carlo uncertainty quantification with orders of magnitude less data (Propp et al., 2024).
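
The $M$-fold sample increase quoted in the workflow above follows from simple budget arithmetic. As an illustrative assumption (not a statement from the cited work), suppose the cost of one solver run scales linearly with the number of grid points, so that a $d$-dimensional run costs $c_d = M\,c_{d-1}$. For a fixed budget $B$, the number of affordable samples in each setting is then

$$N_{d-1} = \frac{B}{c_{d-1}} = \frac{M\,B}{c_d} = M\,N_d ,$$

so the same budget buys $M$ times as many $(d-1)$-dimensional samples as $d$-dimensional ones. Solvers whose cost grows faster than linearly in the number of unknowns would make the ratio larger.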

4. Mesh-Free and Neuro-Symbolic Approaches

Certain P3D surrogates forgo mesh-based discretizations, employing neural parameterizations of the solution field and directly minimizing residuals of discretized PDE operators (Mistani et al., 2022). In such neuro-symbolic pipelines:

Architecture: Region-specific MLPs parameterize the field (e.g., for $P$ 2), evaluated at arbitrary collocation points.
Residual Minimization: The loss is the preconditioned norm of the discretized PDE residual at a large set of random points, including interface and boundary collocation. The symbolic discretization kernel is embedded within the differentiable computation graph.
Convergence Matching Classical Solvers: For benchmark Helmholtz/interface problems, with only modest MLP size, mesh-free surrogates achieve second-order accuracy in $P$ 3 and competitive $P$ 4 norms compared to immersed-interface/ghost-fluid solvers, while retaining full grid and geometry agnosticism (Mistani et al., 2022).

This framework is fully parallelizable, supporting strong and weak scaling up to effective $P$ 5 grids, and once trained, enables millisecond-time inference for new boundary or source conditions.
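
The idea of minimizing a discretized PDE residual of a neural field at random collocation points can be sketched on a 1D Poisson problem, $-u''=f$ on $(0,1)$ with $u(0)=u(1)=0$ and exact solution $\sin(\pi x)$. This sketch makes several simplifying assumptions that differ from the cited solver: the field is a one-hidden-layer network with frozen random hidden weights, so only the output coefficients are fitted; the fit is a linear least-squares solve rather than gradient-based training; no preconditioning or interface handling is included; and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_col, h = 50, 200, 1e-2
W = rng.uniform(-5.0, 5.0, n_feat)   # frozen random hidden weights
B = rng.uniform(-5.0, 5.0, n_feat)   # frozen random hidden biases

def features(x):
    """Hidden-layer outputs tanh(W x + B), shape (len(x), n_feat)."""
    return np.tanh(np.outer(x, W) + B)

def fd_operator(x):
    """Finite-difference stencil for -d2/dx2 applied to every feature."""
    return -(features(x - h) - 2.0 * features(x) + features(x + h)) / h**2

def f(x):
    return np.pi**2 * np.sin(np.pi * x)

x_col = rng.uniform(h, 1.0 - h, n_col)   # random interior collocation points
x_bc = np.array([0.0, 1.0])              # boundary collocation points

# Stack discretized-PDE rows and boundary rows into one least-squares system
A = np.vstack([fd_operator(x_col), features(x_bc)])
rhs = np.concatenate([f(x_col), np.zeros(2)])
coef, *_ = np.linalg.lstsq(A, rhs, rcond=None)

def u(x):
    """Evaluate the fitted field at arbitrary points (no mesh required)."""
    return features(np.atleast_1d(x)) @ coef

x_test = np.linspace(0.0, 1.0, 101)
max_err = np.max(np.abs(u(x_test) - np.sin(np.pi * x_test)))
```

The stencil is evaluated on the network itself rather than on a stored grid, so the collocation points can be placed anywhere and the fitted field `u` can be queried at any location.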

5. Operator-Based and Spectral Surrogates

P3D design principles can be formalized via operator-theoretic and spectral mathematics (Herrmann et al., 2022, El-Kabid et al., 24 Jul 2025):

Affine Encoding and Frame Representation: Surrogate maps $P$ 6 are composed with stable affine encoders $P$ 7 and decoders $P$ 8 based on ONBs, frames, or Riesz bases aligned with the PDE's physical regularity.
Neural Operator or Spectral Expansion: The surrogate $P$ 9, where $m$ 0 is a ReLU deep neural network of $m$ 1 size, yields algebraic error rates $m$ 2, with $m$ 3 determined by smoothness of $m$ 4 ( $m$ 5) and $m$ 6-target ( $m$ 7).
gPC/Polynomial Surrogates: Sparse polynomial chaos expansions on the coefficient random field also deliver $m$ 8 complexity and error bounds, requiring only $m$ 9 functional evaluations (Herrmann et al., 2022).

This formalism confirms that P3D surrogates, when equipped with appropriate encoding and regularization, can transcend the curse-of-dimensionality and realize online $k$ 0 complexity for infinite-dimensional input spaces.
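
A polynomial surrogate of the kind referred to above can be illustrated with a one-parameter toy problem. The sketch projects the parametric solution $u(y)=1/(2+y)$ of $(2+y)\,u=1$, with $y$ uniform on $[-1,1]$, onto Legendre polynomials using Gauss-Legendre quadrature. It is a dense single-parameter expansion, not the sparse high-dimensional construction analyzed in the cited work, and the function names are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre as L

def qoi(y):
    """Toy parametric solution: u(y) solves (2 + y) u = 1."""
    return 1.0 / (2.0 + y)

def fit_legendre(func, degree, n_quad=32):
    """Legendre coefficients of func by Gauss-Legendre projection.

    Uses n_quad evaluations of func, the analogue of solver runs.
    """
    nodes, weights = L.leggauss(n_quad)
    V = L.legvander(nodes, degree)                   # V[i, k] = P_k(nodes[i])
    norms = 2.0 / (2.0 * np.arange(degree + 1) + 1)  # integral of P_k^2 on [-1, 1]
    return (V * weights[:, None]).T @ func(nodes) / norms

def surrogate(coef, y):
    """Evaluate the polynomial surrogate at parameter values y."""
    return L.legval(y, coef)

y_test = np.linspace(-1.0, 1.0, 201)
err_deg4 = np.max(np.abs(surrogate(fit_legendre(qoi, 4), y_test) - qoi(y_test)))
err_deg10 = np.max(np.abs(surrogate(fit_legendre(qoi, 10), y_test) - qoi(y_test)))
```

Once the coefficients are computed, evaluating the surrogate costs one polynomial evaluation per query, and raising the degree from 4 to 10 reduces the maximum error on the test grid.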