Scale-Informed Neural Operator
- Scale-informed neural operators are neural network architectures that learn mappings between function spaces for efficient multiscale PDE simulations.
- They compress high-dimensional PDE operators by encoding both fine-scale oscillatory features and coarse-scale surrogates with local-to-global assembly techniques.
- This approach significantly reduces computational costs compared to classical upscaling, enabling rapid multi-query simulations and uncertainty quantification.
A scale-informed neural operator is a neural network architecture specifically designed to learn mappings between function spaces that account for the multiscale structure of underlying partial differential equations (PDEs), with the explicit goals of operator compression and surrogatization. These models are constructed to encode, compress, and efficiently evaluate the macroscopic or effective behavior of operators whose coefficients exhibit wide scale separation—such as in heterogeneous materials or multiscale diffusion—using neural approximations that inherit both local and global structures from numerical homogenization and finite element assembly. The methodology directly targets the coefficient-to-operator mapping at the surrogate (coarse) scale, facilitating efficient multi-query simulations and dramatically reduced computational costs compared to traditional upscaling. The following sections provide a comprehensive technical summary of the theoretical and practical framework established for scale-informed neural operators, with specific reference to (Kröpfl et al., 2021).
1. Multiscale Operator Compression and Surrogatization
Scale-informed neural operator frameworks aim to compress families of elliptic, heterogeneous PDE operators—such as $\mathcal{A}_A = -\nabla\cdot(A\,\nabla\,\cdot\,)$—whose (possibly high-dimensional) coefficients $A$ oscillate across broad and unresolved scales. The procedure begins by representing the fine-scale operator by a surrogate system matrix $S_H^A$ defined on a specified coarse scale $H$ (the target discretization). The objective is to ensure that $S_H^A$ encapsulates the effective macroscopic response even when $A$ is highly oscillatory or discontinuous at scales significantly below $H$.
The assembly of $S_H^A$ utilizes a spatial (domain) decomposition consistent with standard finite element assembly, yielding
$$S_H^A = \sum_{T \in \mathcal{T}_H} P_T^\top\, S_{H,T}^A\, P_T,$$
where $S_{H,T}^A$ is a local sub-matrix capturing the operator response on a patch/element indexed by $T \in \mathcal{T}_H$, and $P_T$ is the canonical local-to-global embedding (as in the assembly of element matrices into the global stiffness matrix).
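The following is a minimal Python/NumPy sketch of this local-to-global assembly; the helper name `assemble_surrogate`, the dense storage, and the index lists `local_dofs` are illustrative assumptions rather than the reference implementation (in practice a sparse matrix format would be used).

```python
import numpy as np

def assemble_surrogate(local_matrices, local_dofs, n_global):
    """Scatter-add local surrogate matrices S_{H,T} into the global matrix S_H.

    The embedding P_T is never formed explicitly; fancy indexing with the
    global degree-of-freedom list of each patch realizes P_T^T S_{H,T} P_T.
    """
    S_H = np.zeros((n_global, n_global))  # dense for brevity; use sparse in practice
    for S_T, dofs in zip(local_matrices, local_dofs):
        S_H[np.ix_(dofs, dofs)] += S_T
    return S_H

# Toy usage: two 2x2 local matrices sharing one global degree of freedom.
S_T = np.array([[1.0, -1.0], [-1.0, 1.0]])
S_H = assemble_surrogate([S_T, S_T], [[0, 1], [1, 2]], n_global=3)
```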
The multiscale structure—in both physical and coefficient space—enters via:
- Definition of the local neighborhood/patch for each element $T$, which must be sufficiently large to capture the unresolved fine-scale influences of $A$.
- Reduction operators $R_T$ that extract $p$-dimensional localized features from $A$ to characterize its behavior within the patch, encompassing even subgrid oscillations (a minimal sketch of one possible reduction operator follows this list).
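The sketch below illustrates one possible reduction operator $R_T$, assuming the coefficient is supplied as values on a fine 2D grid and the patch is an axis-aligned window; the choice of block averages as features, the function name `reduce_coefficient`, and the parameter `q` are illustrative assumptions, not the paper's specific construction.

```python
import numpy as np

def reduce_coefficient(A_fine, patch_slice, q=4):
    """Return a p = q*q feature vector of local block averages of A on a patch."""
    patch = A_fine[patch_slice]                   # fine-scale coefficient values on the patch
    row_bands = np.array_split(patch, q, axis=0)  # q horizontal bands
    feats = [block.mean()
             for band in row_bands
             for block in np.array_split(band, q, axis=1)]  # q*q block means
    return np.asarray(feats)

# Toy usage: features of a heterogeneous coefficient sample on one patch.
A_fine = 0.1 + np.random.rand(256, 256)                      # strictly positive coefficient field
features = reduce_coefficient(A_fine, np.s_[32:96, 32:96])   # p = 16 features
```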
2. Local Coefficient-to-Surrogate Map via Neural Networks
Rather than approximating the global mapping from $A$ to PDE solutions (the parameter-to-solution map), the framework directly learns the mapping from localized coefficient features to the effective surrogate matrix on patches,
$$\Phi_\theta : R_T(A) \mapsto S_{H,T}^A,$$
where
- $R_T(A) \in \mathbb{R}^p$ is a vectorized representation or local average of $A$ on the patch of element $T$;
- $\Phi_\theta$ is a neural network, typically a fully connected feedforward MLP (architecture described below), parametrized by $\theta$, mapping from $\mathbb{R}^p$ to the space of (vectorized) local matrices, with $p$ determined by the patch resolution and the output dimension by the local matrix size.
This approach explicitly supports multiscale data since the learned map operates on features encoding both macro- and microstructure. Once $\Phi_\theta$ is trained, new coefficients $A$ yield a surrogate $S_H^A$ through simple extraction of the features $R_T(A)$ and forward network evaluation—enabling orders-of-magnitude faster assembly than classical approaches, which require solving (often nonlinear) local cell problems for each sample of $A$.
3. Feedforward Neural Network Architecture and Training
The core network is constructed as
$$\Phi_\theta(x) = W_L\,\sigma\big(W_{L-1}\,\sigma(\cdots\,\sigma(W_1 x + b_1)\,\cdots) + b_{L-1}\big) + b_L,$$
with standard ReLU activations ($\sigma(x) = \max(0, x)$) and layer widths conforming to the size of the local inputs/outputs. Each training sample is a tuple $\big(R_T(A^{(i)}),\, S_{H,T}^{A^{(i)}}\big)$, where $R_T(A^{(i)})$ is the reduced representation on patch $T$ of sample $i$, and $S_{H,T}^{A^{(i)}}$ is the corresponding local matrix found via classical upscaling (e.g., LOD or Petrov–Galerkin local problems).
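A minimal PyTorch sketch of such a fully connected architecture is shown below; the feature dimension `p`, local matrix size `m`, width, and depth are illustrative placeholders, not the settings used in (Kröpfl et al., 2021).

```python
import torch
import torch.nn as nn

p, m = 16, 4            # feature dimension and local matrix size (illustrative)
width, depth = 128, 4   # hidden width and number of hidden layers (illustrative)

layers = [nn.Linear(p, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, m * m)]        # output: vectorized local surrogate matrix
phi = nn.Sequential(*layers)

features = torch.rand(32, p)               # a batch of reduced local coefficients R_T(A)
S_pred = phi(features).view(-1, m, m)      # predicted local matrices S_{H,T}^A
```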
The loss function is
$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \sum_{T \in \mathcal{T}_H} \Big\|\, \Phi_\theta\big(R_T(A^{(i)})\big) - S_{H,T}^{A^{(i)}} \,\Big\|_F^2,$$
with $N$ the number of training samples and $\mathcal{T}_H$ the set of patch indices.
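A sketch of this loss in PyTorch, consistent with the network `phi` above; the tensor layout (samples × patches × features/matrices) is an assumption made for illustration.

```python
import torch

def surrogate_loss(phi, features, targets):
    """features: (N, n_patches, p) reduced coefficients; targets: (N, n_patches, m, m)."""
    N, n_patches, p = features.shape
    m = targets.shape[-1]
    pred = phi(features.reshape(-1, p)).view(N, n_patches, m, m)
    frob_sq = ((pred - targets) ** 2).sum(dim=(-2, -1))  # squared Frobenius norm per patch
    return frob_sq.sum(dim=1).mean()                     # sum over patches, mean over samples
```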
Training is performed solely in an offline phase. The online phase—constructing the global surrogate and solving the coarse PDE—requires only fast local network inference and assembly.
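The following minimal sketch of the online phase reuses the illustrative helpers `reduce_coefficient`, `assemble_surrogate`, and `phi` from above: constructing the surrogate for a new coefficient requires only feature extraction, batched network inference, and assembly, with no fine-scale solves.

```python
import numpy as np
import torch

def online_surrogate(A_fine, patch_slices, local_dofs, phi, n_global, m):
    """Assemble S_H^A for a new coefficient A_fine via network inference only."""
    feats = np.stack([reduce_coefficient(A_fine, s) for s in patch_slices])
    with torch.no_grad():  # inference only; no local cell problems are solved
        S_locals = phi(torch.as_tensor(feats, dtype=torch.float32)).view(-1, m, m)
    return assemble_surrogate(S_locals.numpy(), local_dofs, n_global)
```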
4. Application to Heterogeneous Elliptic Diffusion Operators
The abstract framework is illustrated for second-order heterogeneous elliptic diffusion problems,
$$-\nabla\cdot\big(A(x)\,\nabla u(x)\big) = f(x) \quad \text{in } \Omega,$$
subject to suitable boundary conditions. In modern numerical homogenization (e.g., Localized Orthogonal Decomposition, LOD), the surrogate is constructed as
$$\big(S_H^A\big)_{ij} = \int_\Omega A\,\nabla\big((1-\mathcal{C})\Lambda_j\big)\cdot\nabla\Lambda_i \,\mathrm{d}x,$$
where $\mathcal{C}$ is a localized corrector (computed via local PDE solves on oversampling patches whose size is set by a localization parameter), and $\{\Lambda_i\}$ is the nodal FE basis. $S_H^A$ is then assembled from local matrices over the elements $T \in \mathcal{T}_H$.
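For intuition on what the local training targets $S_{H,T}^A$ encode, the 1D case admits a closed form: the upscaled P1 element matrix on an element of length $H$ is the standard element stiffness matrix scaled by the harmonic mean of $A$ over the element, divided by $H$. The sketch below uses this as a toy stand-in for generating targets; it is not the LOD-based local problems of (Kröpfl et al., 2021), which are required in higher dimensions.

```python
import numpy as np

def local_upscaled_matrix_1d(A_vals, H):
    """Upscaled 1D P1 element matrix for fine-scale samples A_vals on an element of length H."""
    A_harm = 1.0 / np.mean(1.0 / A_vals)   # harmonic mean of A over the element
    return (A_harm / H) * np.array([[1.0, -1.0], [-1.0, 1.0]])

# Toy usage: a highly oscillatory coefficient on one coarse element.
A_vals = 1.0 + 0.9 * np.sin(40 * np.pi * np.linspace(0.0, 1.0, 200)) ** 2
S_T = local_upscaled_matrix_1d(A_vals, H=0.1)
```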
Network-based compression replaces the expensive local solves with evaluations of $\Phi_\theta$, dramatically reducing computational complexity in multi-query or uncertainty quantification settings. Numerical experiments in the paper confirm that the learned operators maintain high accuracy for coarse-grid solutions, even in the presence of complex coefficients $A$ with fine-scale features.
5. Comparison to Classical Upscaling and Homogenization
Traditional approaches for compressing such PDE operators—numerical upscaling, classical homogenization—require, for each new $A$, solving many local corrector problems to obtain $S_H^A$. These local PDE solves are computationally expensive and must be repeated for every instantiation of $A$. This bottleneck is severe in multi-query (Bayesian inversion, optimal design, uncertainty quantification) and online simulation regimes.
The neural operator approach retains accuracy while offering:
- High compression ratio: Each local map replaces a numerically computed local operator, reducing storage and evaluation cost.
- Fast online inference: The entire surrogate can be assembled in parallel by neural net evaluation, eliminating all fine-mesh solves required online.
- Generality and reusability: Once trained (offline), the network can be used for an entire parametrized class of coefficients $A$ (provided the feature extraction covers all relevant local scales).
6. Implementation Considerations and Extensions
- The architecture is modular: For more complex scenarios, the reduction operators $R_T$ and the network $\Phi_\theta$ can be adapted to different PDE types (time-dependent, nonlinear, wave propagation).
- The approach can be generalized to stochastic homogenization by extending $R_T$ to encode relevant statistics of random coefficients $A$.
- Robustness to geometric variation can be improved by constructing reference patches with varying shapes in training.
- Direct learning of the inverse operator—by learning mappings that approximate entries of the inverse $\big(S_H^A\big)^{-1}$ rather than of $S_H^A$ itself—is highlighted as a promising direction, as are theoretical studies of sample complexity and network expressivity for multiscale surrogates.
7. Numerical and Practical Impact
The resulting scale-informed neural operator architecture achieves significant speedups and compression, with numerical results demonstrating that for elliptic diffusion with highly oscillatory coefficients, the relative error of the coarse PDE solution induced by the neural operator is close to that of reference multiscale methods. The architecture supports orders-of-magnitude faster surrogate construction (in the online phase), favorable scaling with coefficient dimension and problem size, and broad applicability to domains demanding efficient multiscale PDE operator evaluation.
This framework thus establishes a generalizable and efficient paradigm for data-driven surrogatization of multiscale PDE operators on arbitrary scales, supporting many-query simulation, uncertainty quantification, and real-time applications where classical multiscale solvers are otherwise infeasible due to computational cost.