Graph Neural Operators

Updated 9 March 2026
  • Graph Neural Operators are neural operator architectures that leverage graph networks to learn mappings between function spaces, particularly for solving PDEs.
  • They employ mesh-invariant designs using kernel integrals, multi-scale fusions, and attention mechanisms to efficiently approximate continuous operators on irregular domains.
  • Benchmark studies show that GNOs enable significant speedups and robustness in surrogate modeling for complex physical systems compared to traditional numerical solvers.

Graph Neural Operators (GNOs) are a prominent class of neural operator architectures designed to learn mappings between function spaces, particularly solution operators to partial differential equations (PDEs), leveraging the expressive power and geometric flexibility of graph neural networks. GNOs generalize traditional neural networks from finite-dimensional spaces to operator settings, enabling mesh-invariant, geometry-aware, and efficient surrogate modeling for complex physical systems, especially on irregular or unstructured spatial domains.

1. Mathematical Foundation and Operator Formulation

GNOs are grounded in the neural operator framework, wherein the target is an operator

\mathcal{G} : \mathcal{F} \to \mathcal{U}

mapping input functions (e.g., coefficients, forcings, or initial conditions) to solution functions of a PDE. In continuous form, the neural operator is constructed as a composition of local point-wise mappings and non-local integral operators:

\mathcal{G}_\theta = \mathcal{Q} \circ \sigma_L \left( W_{L-1} + K_{L-1} + b_{L-1} \right) \circ \cdots \circ \sigma_1 \left( W_0 + K_0 + b_0 \right) \circ \mathcal{P}

where $K_\ell(v)(x) = \int_{D} \kappa^{(\ell)}(x,y)\,v(y)\,dy$ is a learned kernel integral, and $\sigma_\ell$ are pointwise nonlinearities. For graph-based discretizations, the spatial domain $D$ is sampled at points $\{x_i\}$, forming nodes in a graph $G$ with edges defined by proximity (e.g., within radius $r$ or by $k$-nearest neighbors). The integral becomes a sum over neighbors:

v_i^{(t+1)} = \sigma \bigg( W v_i^{(t)} + \sum_{j\in\mathcal{N}(i)} \kappa_\theta(x_i, x_j)\, v_j^{(t)} + b_i \bigg)

with $\kappa_\theta(x_i, x_j)$ parameterized, for example, by an MLP of relative (or absolute) coordinates, potentially augmented with physical features (Kovachki et al., 2021, Li et al., 2023). Discretization-invariance is achieved by sharing all kernel and bias parameters across nodes irrespective of the mesh.
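The discretized update above can be sketched directly in NumPy. The radius, feature widths, and the two-layer MLP standing in for $\kappa_\theta$ are illustrative assumptions, not values from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample the domain D = [0, 1]^2 at n points; these become the graph nodes.
n, d_in, d_hid = 64, 2, 8
x = rng.random((n, d_in))          # node coordinates x_i
v = rng.random((n, d_hid))         # node features v_i^(t)

# Radius graph: connect nodes within radius r (the neighborhood N(i)).
r = 0.25
dist = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
adj = (dist <= r) & ~np.eye(n, dtype=bool)

# Hypothetical two-layer MLP kappa_theta acting on relative coordinates,
# returning a d_hid x d_hid matrix per edge.
W1 = rng.normal(size=(d_in, 32)) / np.sqrt(d_in)
W2 = rng.normal(size=(32, d_hid * d_hid)) / np.sqrt(32)

def kappa(dx):
    """Kernel network kappa_theta evaluated on the relative coordinate x_j - x_i."""
    h = np.tanh(dx @ W1)
    return (h @ W2).reshape(d_hid, d_hid)

# Shared per-layer parameters, identical at every node (mesh-independent).
W = rng.normal(size=(d_hid, d_hid)) / np.sqrt(d_hid)
b = np.zeros(d_hid)

def gno_layer(x, v, adj):
    """One GNO update: v_i <- sigma(W v_i + sum_j kappa(x_i, x_j) v_j + b)."""
    out = np.empty_like(v)
    for i in range(len(x)):
        msg = sum(kappa(x[j] - x[i]) @ v[j] for j in np.flatnonzero(adj[i]))
        out[i] = np.tanh(W @ v[i] + msg + b)
    return out

v_next = gno_layer(x, v, adj)
print(v_next.shape)  # (64, 8)
```

Because `W`, `W1`, `W2`, and `b` are shared across nodes and `kappa` depends only on coordinates, the same parameters apply unchanged to a finer or coarser sampling of the domain.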

2. Core Architectures and Variants

Several GNO architectural designs have been developed to address the challenges of operator learning on complex or irregular domains:

  • Plain GNO (radius-graph kernel integral): Implements a local neighborhood integral as a message-passing layer. Each GNO layer is a sum over spatial neighbors within a fixed radius, weighted by a learned kernel $\kappa_\ell(x, y)$ (Li et al., 2023, Kovachki et al., 2021).
  • Mollified GNO (mGNO): Replaces the non-differentiable indicator kernel with a smooth mollifier $w(x, y)$, enabling exact automatic differentiation of outputs with respect to spatial positions, critical for physics-informed loss terms involving derivatives (Lin et al., 11 Apr 2025).
  • Multi-scale GNO (MGNO): Leverages multiscale graph hierarchies (V-, F-, W-cycles) to enhance global propagation, improve expressive capacity, and mimic multigrid solvers; down-sampling and up-sampling operators communicate across graph resolutions (Migus et al., 2022).
  • Spatio-spectral GNO (Sp²GNO): Integrates spatial (local) graph convolution with spectral (global) graph Fourier filtering, combining efficient propagation of local and long-range dependencies (Sarkar et al., 2024).
  • Attention-enhanced GNO (GOLA): Introduces Fourier-based node encodings and attention mechanisms for message weighting, providing robust operator learning with sparsely and irregularly sampled input locations (Li et al., 25 May 2025).
  • Two-scale GNO (LGN): Hierarchically addresses lattice systems, learning coarse (reduced) and fine-scale solution maps between graph representations (Jain et al., 2024).

This diversity underscores the methodological adaptability of GNOs to application- and data-driven constraints.
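To make the mGNO idea concrete, the sketch below contrasts the hard indicator cutoff of a plain radius graph with one possible smooth bump mollifier; the specific functional form is an assumption for illustration, not the one used in the cited work:

```python
import numpy as np

r = 0.25  # neighborhood radius

def indicator(d):
    """Hard cutoff of a plain radius-graph kernel: not differentiable at d = r."""
    return (d <= r).astype(float)

def mollifier(d):
    """A smooth bump that equals 1 at d = 0 and decays to 0 at d = r,
    with all derivatives vanishing at the boundary (illustrative choice)."""
    w = np.zeros_like(d)
    inside = d < r
    w[inside] = np.exp(1.0 - 1.0 / (1.0 - (d[inside] / r) ** 2))
    return w

d = np.linspace(0.0, 0.5, 11)
print(np.round(indicator(d), 3))
print(np.round(mollifier(d), 3))
```

Replacing the indicator with such a weight makes the kernel integral a smooth function of node positions, so spatial derivatives needed by physics-informed losses can flow through autograd.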

3. Discretization-Invariance and Approximation Properties

A defining property of GNOs is discretization-invariance: the same learned weights generalize across grids or point clouds of varying densities, mesh refinements, or even geometric configurations. Discretization-convergence is formalized as follows:

  • For the radius-graph integral layer,

v_\ell(x) \approx \int_{B_r(x)} \kappa_\ell(x, y)\, v_{\ell-1}(y)\, dy \quad \leadsto \quad \sum_{j} \kappa_\ell(x, y_j)\, v_{\ell-1}(y_j)\, \mu(y_j)

as the mesh fill-distance $h \to 0$, the discrete sum converges to the continuum integral at rate $O(h^p)$ (typically $p = 1$) under the following conditions:

  • $\varepsilon$-ball radius graphs, not fixed-$k$ neighbors, ensuring uniform spatial coverage.
  • Locally consistent Riemann/area weights $\mu(y_j)$.
  • Universality of MLPs in approximating continuous kernels (Li et al., 2023, Kovachki et al., 2021).

The universal approximation theorem for neural operators applies in this context: for any continuous operator and compact input set, there exists a GNO that approximates the operator to arbitrary precision (Kovachki et al., 2021). This theoretical basis supports mesh-agnostic super-resolution and zero-shot generalization.
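The quadrature view behind discretization-convergence can be checked numerically in one dimension; the kernel, input function, and radius below are illustrative choices, not taken from the cited papers:

```python
import numpy as np

# Approximate the local integral int_{B_r(x0)} kappa(x0, y) v(y) dy
# by a weighted sum over sampled nodes with Riemann weights mu = |B_r| / n.
kappa = lambda x, y: np.exp(-(x - y) ** 2)   # illustrative smooth kernel
v = lambda y: y ** 2                          # illustrative input function
x0, r = 0.5, 0.2

def riemann_sum(n):
    y = np.linspace(x0 - r, x0 + r, n)        # n nodes covering B_r(x0)
    mu = (2 * r) / n                          # locally consistent weight mu(y_j)
    return np.sum(kappa(x0, y) * v(y)) * mu

ref = riemann_sum(200_000)                    # fine-sampling reference value
errors = [abs(riemann_sum(n) - ref) for n in (10, 100, 1000)]
print(errors)  # error shrinks as the fill-distance h = 2r/n decreases
```

Each tenfold increase in node density reduces the error by roughly a factor of ten, consistent with the $O(h^p)$, $p = 1$ rate stated above.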

4. Integration with Advanced Neural Operator and Physics-Informed Frameworks

GNOs are frequently combined with complementary neural operator primitives and physics-regularized training:

  • Fourier Neural Operator (FNO) Hybridization: Encoders and decoders based on GNOs allow the projection of irregular or point-cloud data to a regular latent grid, enabling efficient FNO processing for long-range/global effects. GNO decoders return grid outputs to arbitrary coordinates or meshes (Li et al., 2023).
  • Automatic Differentiation and Physics Losses: Mollified GNOs allow end-to-end training with physics-informed loss functions (e.g., PDE residuals, boundary conditions) on irregular meshes by supporting exact spatial derivatives via autograd, sidestepping finite differences or spectral differencing on unstructured data (Lin et al., 11 Apr 2025). This is crucial for inverse design and high-fidelity data-scarce regimes.
  • Multi-resolution and Spectral Fusions: Multi-scale schemes (MGNO) and spatio-spectral fusions (Sp²GNO) further enrich operator approximation capacity, reduce over-smoothing/over-squashing, and provide near-linear scaling with node count for large domains (Migus et al., 2022, Sarkar et al., 2024).
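A toy 1-D sketch of the encoder → spectral core → decoder pattern behind GNO/FNO hybrids; the Gaussian-weighted averaging encoder and the hard mode cutoff are simplifying assumptions, not the GINO implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
pts = np.sort(rng.random(200))                # irregular 1-D point cloud
vals = np.sin(2 * np.pi * pts) + 0.1 * rng.normal(size=pts.size)

# Encoder: kernel-weighted average of point-cloud values onto a regular grid,
# playing the role of a (fixed-kernel) GNO integral layer.
m, r = 64, 0.05
grid = np.linspace(0, 1, m, endpoint=False)
w = np.exp(-((grid[:, None] - pts[None, :]) / r) ** 2)
u_grid = (w @ vals) / np.maximum(w.sum(axis=1), 1e-12)

# Spectral core: keep only the lowest Fourier modes (an FNO-style filter),
# capturing global/long-range structure cheaply on the regular grid.
modes = 4
u_hat = np.fft.rfft(u_grid)
u_hat[modes:] = 0.0
u_smooth = np.fft.irfft(u_hat, n=m)

# Decoder: evaluate the grid output at arbitrary query coordinates.
query = rng.random(50)
u_query = np.interp(query, grid, u_smooth)
print(u_query.shape)  # (50,)
```

The encoder and decoder are mesh-free, so the same pipeline accepts point clouds of any density, while the spectral core handles global effects at FFT cost.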

5. Performance, Benchmarking, and Practical Considerations

Extensive benchmarking on high-dimensional PDE datasets, including large-scale 3D CFD, elasticity, and multiscale lattice mechanics, confirms several key practical findings:

| Model | DrivAerNet | Heat Sink | Darcy (2D) | Super-resolution | Parametric Fusion | Inverse Design |
|---|---|---|---|---|---|---|
| GNO | 45.8% | 5.69% | 2.1e-2 | Yes | + Branch/Concat: ✓ | Not native |
| FNO/Geo-FNO | 9.42% | n/a | 1.1e-2 | Yes (on grid) | Yes (grid) | Not mesh-free |
| Sp²GNO | 9.8e-3 | - | 9.0e-3 | Yes | No data | Not native |
| GOLA | n/a | n/a | 1.6e-1 | Yes (irregular) | Fourier + Attention | Not native |
| LGN | - | - | - | Yes (lattice) | Multi-scale (LGN-ii) | Not native |

  • GNO/EA-GNO achieve discretization- and mesh-invariance but can lag in accuracy compared to grid- or point-based models when long-range effects dominate or for very large (100k–500k) node problems unless deep layers or edge augmentation are used (Zhong et al., 7 Oct 2025, Li et al., 2023).
  • GINO's hybrid GNO/FNO outperforms UNet, MeshGraphNet, and FNO on complex 3D CFD (e.g., 8.31% test error for surface pressure, 2.6×10⁴ speedup vs. OpenFOAM) and achieves up to 0.5% error variation under mesh refinement (super-resolution) (Li et al., 2023).
  • MGNO variants systematically halve test error over single-scale GNO on multiscale PDEs at the cost of increased computation (~1.1s/epoch for GNO, ~40s/epoch for W-MGNO) (Migus et al., 2022).
  • Mollified GNOs (mGNO) support exact physics losses, yielding 20×–1000× lower errors than finite-difference or baseline ML approaches in data-scarce, irregular, or unstructured domains, while delivering 20×–3000× speedups over classical solvers (Lin et al., 11 Apr 2025).
  • Attention-augmented and Fourier-encoded GNOs (GOLA) exhibit superior generalization for PDEs with very limited data and highly irregular sampling (Li et al., 25 May 2025).

6. Advantages, Limitations, and Future Directions

Strengths:

  • Mesh/geometry invariance, universal operator approximation, strong empirical and theoretical error control under mesh refinement.
  • Efficient surrogate modeling for PDEs with complex boundaries, arbitrary or evolving domains, or spatial heterogeneity.
  • Extension to hybrid and multi-scale architectures, supporting grid-graph interoperability and incorporation of physical priors (e.g., SDFs, physics losses).
  • Orders-of-magnitude acceleration over direct numerical solvers, enabling real-time and design optimization workflows (Li et al., 2023, Jain et al., 2024).

Limitations:

  • Local message passing in plain GNO restricts global context propagation; requires deep stacking, multiscale, or edge-augmentation for long-range effects (Zhong et al., 7 Oct 2025).
  • Computational cost per epoch and memory scale with node count and neighborhood size.
  • Static graph construction and choice of hyperparameters (e.g., radius, k, spectral truncation) can limit robustness and expressivity (Sarkar et al., 2024).
  • Formal convergence theorems exist under mild regularity, but task-specific uncertainty quantification remains challenging (e.g., for stability at large strains in lattice mechanics (Jain et al., 2024)).

Ongoing and proposed research avenues include:

  • Joint learning of optimal graph topology and adaptive hierarchies (Migus et al., 2022, Sarkar et al., 2024).
  • Integration of dynamic Fourier and spectral kernels for expressivity.
  • Improved attention and global message-passing schemes for highly nonlocal operators.
  • Rigorous theoretical investigation of stability, regularization, and generalization for nonlinear, nonlocal PDEs and multiphysics systems.

Graph Neural Operators thus constitute a central primitive in modern operator learning, providing flexible, principled, and efficient function-space surrogates for PDE-constrained modeling in science and engineering (Kovachki et al., 2021, Li et al., 2023, Sarkar et al., 2024, Lin et al., 11 Apr 2025).
