Self-Supervised Neural Operators
- Self-supervised neural operators are frameworks that learn mappings between function spaces using physics-informed losses, eliminating the need for paired ground-truth data.
- Key architectures, including DeepONets, Fourier Neural Operators, and Transformer-based models, leverage tailored input representations for diverse scientific computing tasks.
- They deliver scalable, data-efficient solutions for PDE-constrained optimization, simulation, and optimal control, achieving sub-1% errors and real-time performance gains.
Self-supervised neural operator frameworks constitute a rapidly evolving class of methodologies for learning mappings between infinite-dimensional function spaces, particularly those arising as solution operators of partial differential equations (PDEs), dynamical systems, and variational evolution problems—all through losses defined by the underlying physics or optimality criteria, without paired ground-truth data from numerical solvers. These frameworks leverage self-supervised training strategies, physics-informed inductive biases, and advanced neural architectures including operator networks, Transformers, and Fourier neural operators to enable scalable, data-efficient, and generalizable surrogates for a wide range of scientific computing tasks.
1. Foundations of Self-Supervised Neural Operators
Self-supervised neural operators (SNOs) generalize the classical operator-learning problem by approximating solution operators that map parametric or functional inputs (such as boundary, initial, or source functions, geometries, or control signals) to functional outputs (such as solutions to PDEs or optimal controls). The core distinguishing feature is the absence of supervised input-output pairs: instead, the training is driven by self-supervised losses derived from the governing equations or optimality conditions, such as residuals of PDEs, variational principles, or optimal control objectives (You et al., 31 Aug 2025, Wang et al., 2021, Chen et al., 5 Dec 2025, Xu et al., 31 Dec 2025, Feng et al., 9 Jan 2026).
Let $a$ denote an input function (e.g., initial condition, source term), and $u$ its associated solution. The goal is to learn an operator $\mathcal{G}_\theta$ such that $\mathcal{G}_\theta(a)$ approximates the mapping $a \mapsto u$ defined implicitly by a PDE or optimization principle. Training datasets in SNO frameworks thus comprise only collections of admissible $a$'s; for each, the candidate solution $\mathcal{G}_\theta(a)$ is evaluated by its fidelity to the physical or optimality constraint, using a loss that depends on differential, boundary, or variational operators.
2. Core Architecture Types and Data Representations
Several neural architectures have been proposed and deployed for SNOs:
- Deep Operator Networks (DeepONets): These use a branch network to embed sampled input function values at pre-specified sensor locations, and a trunk network to encode the evaluation point for the output function, merging via inner product to obtain the desired mapping (Wang et al., 2021).
- Fourier Neural Operators (FNOs): These implement global convolutional layers in the Fourier domain and are naturally discretization-invariant, which is leveraged for resolution-agnostic generalization (Chen et al., 5 Dec 2025).
- Transformer-based Operator Networks: Transformers operating on point cloud or set-valued representations of functions (e.g., densities or parameter ensembles) are used for highly flexible architectures (Feng et al., 9 Jan 2026).
- Two-stage compositional architectures: For optimal control operator learning, function inputs such as obstacle parameterizations are first embedded via set-encoders (attention, pooling, or symmetries), then combined with state and time encodings for control generation (Xu et al., 31 Dec 2025).
Input representations are tuned to the application: functions may be represented via values at sensors, grid samples, or point clouds (samples plus values), and non-functional parameters (geometries, time, system parameters) are encoded accordingly.
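The branch–trunk composition of a DeepONet can be made concrete with a minimal numpy sketch. The weights here are random, untrained stand-ins, and the sizes `m` (sensors), `p` (latent features), and `hidden` are illustrative assumptions, not values from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: m sensor locations for the input function,
# p latent features shared by branch and trunk.
m, p, hidden = 20, 16, 32

# Randomly initialized one-hidden-layer branch and trunk networks
# (a real DeepONet trains these end to end).
Wb1, bb1 = rng.normal(size=(hidden, m)) / np.sqrt(m), np.zeros(hidden)
Wb2 = rng.normal(size=(p, hidden)) / np.sqrt(hidden)
Wt1, bt1 = rng.normal(size=(hidden, 1)), np.zeros(hidden)
Wt2 = rng.normal(size=(p, hidden)) / np.sqrt(hidden)

def deeponet(u_sensors, y):
    """Evaluate G[u](y): the branch embeds sensor values of the input
    function, the trunk embeds the query point, and their inner product
    gives the output value."""
    b = Wb2 @ np.tanh(Wb1 @ u_sensors + bb1)          # branch features, shape (p,)
    t = Wt2 @ np.tanh(Wt1 @ np.atleast_1d(y) + bt1)   # trunk features, shape (p,)
    return float(b @ t)

# Input function sampled at the m fixed sensors, queried at y = 0.5.
u_sensors = np.sin(np.linspace(0.0, np.pi, m))
value = deeponet(u_sensors, 0.5)
```

Because the trunk takes the query coordinate as an input, the same trained network can evaluate the output function at arbitrary points, independent of any output grid.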
3. Self-Supervised Training Objectives and Losses
Self-supervised neural operator training introduces losses that directly penalize violation of the physical or optimization constraints at sampled collocation points:
- Physics-Informed Losses for PDEs: These sum squared residuals of the governing PDE and boundary/initial conditions, evaluated on $\mathcal{G}_\theta(a)$ over batches of randomly sampled input functions, interior collocation points, and boundary points (Wang et al., 2021):

$$\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{physics}}(\theta) + \mathcal{L}_{\mathrm{bc}}(\theta),$$

with

$$\mathcal{L}_{\mathrm{physics}}(\theta) = \frac{1}{N_r}\sum_{i=1}^{N_r}\big|\mathcal{N}\big[\mathcal{G}_\theta(a)\big](x_i)\big|^2,$$

where $\mathcal{N}$ denotes the governing differential operator and $x_i$ are interior collocation points, and similarly for $\mathcal{L}_{\mathrm{bc}}(\theta)$ on the boundary.
- Variational/Meta-Optimization Losses: For time-integration or control problems, each operator update is obtained by minimizing a physically meaningful energy or optimal cost, often recast as an inner optimization loop over network parameters (Chen et al., 5 Dec 2025).
- JKO Operator Losses for Wasserstein Gradient Flow: The loss enforces one-step optimality of the learned JKO operator using only the proximal energy and Wasserstein metric, evaluated on self-generated density samples (Feng et al., 9 Jan 2026).
- Optimal Control Objectives: The network is trained by direct minimization of the control cost for its predicted trajectory, with differentiable integration of the system dynamics and automatic adjoint backpropagation (Xu et al., 31 Dec 2025).
No ground-truth outputs are needed; all objectives rely exclusively on the mathematical structure of the underlying forward or control problem.
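A discrete physics-informed loss of this kind can be illustrated for the 1-D Poisson problem $-u'' = f$ with zero Dirichlet boundary conditions, using finite differences in place of automatic differentiation. This is a minimal sketch of the general idea, not the loss from any specific cited paper; note that no reference solution appears anywhere in the objective.

```python
import numpy as np

def physics_loss(u, f, h):
    """Self-supervised loss for -u'' = f on a uniform grid with spacing h:
    mean squared PDE residual at interior nodes plus a boundary penalty.
    No ground-truth solution enters the loss."""
    # Second-order finite-difference Laplacian at interior nodes.
    residual = -(u[2:] - 2.0 * u[1:-1] + u[:-2]) / h**2 - f[1:-1]
    bc_penalty = u[0] ** 2 + u[-1] ** 2   # zero Dirichlet conditions
    return np.mean(residual**2) + bc_penalty

n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)      # source term
u_exact = np.sin(np.pi * x)           # candidate that satisfies -u'' = f
u_bad = np.zeros(n)                   # candidate that ignores the source

loss_good = physics_loss(u_exact, f, h)
loss_bad = physics_loss(u_bad, f, h)
```

The loss discriminates sharply between the two candidates: the exact solution incurs only the $O(h^2)$ discretization error of the stencil, while the zero function pays the full squared source term.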
4. Algorithmic Pipelines and Training Procedures
All recent frameworks share a two-stage workflow:
| Stage | Purpose | Key Steps |
|---|---|---|
| Self-supervised Operator Training | Learn $\mathcal{G}_\theta$ as surrogate | Sample input functions; minimize physics/optimality loss |
| Downstream Rapid Inference/Control | Deploy trained $\mathcal{G}_\theta$ as surrogate | Solve new PDE/optimization problems via forward evaluation |
Meta-learning variants, such as FNOpt (Chen et al., 5 Dec 2025), unfold an inner optimization loop at every time step of a dynamics rollout, where the neural operator iteratively proposes corrections toward energy minimization. Learn-to-Evolve JKO operator training (Feng et al., 9 Jan 2026) alternates between generating new trajectories using the current operator and retraining on this self-augmented dataset, gradually improving coverage and generalization.
Training typically employs Adam or similar optimizers, batch/mini-batch sampling of input functions or states, and, where required, differentiable ODE/PDE solvers for backpropagation through dynamical trajectories.
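The two-stage workflow can be sketched end to end on a toy linear problem: learn a surrogate $u = W f$ for the solution of $A u = f$ purely from the self-supervised residual $\|A W f - f\|^2$, then deploy it for fast inference. The system $A$ (an implicit-Euler-style shifted Laplacian), the plain gradient-descent update, and all sizes are illustrative assumptions, far simpler than the architectures and optimizers used in the cited frameworks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1: self-supervised operator training.
# Toy "physics": solve A u = f, with A = I + 0.1 * (1-D discrete Laplacian),
# a well-conditioned implicit-Euler-style system.
n = 16
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = np.eye(n) + 0.1 * lap

F = rng.normal(size=(n, 64))    # batch of sampled source terms f (inputs only)
W = np.zeros((n, n))            # linear surrogate operator u = W f

lr = 0.1
for _ in range(800):
    R = A @ W @ F - F                          # physics residual A u - f; no ground-truth u
    grad = 2.0 * A.T @ R @ F.T / F.shape[1]    # analytic gradient of the mean squared residual
    W -= lr * grad                             # gradient step

# Stage 2: rapid inference on an unseen source term.
f_new = rng.normal(size=n)
u_pred = W @ f_new              # one matrix-vector product, no solver call
rel_residual = np.linalg.norm(A @ u_pred - f_new) / np.linalg.norm(f_new)
```

Training sees only sampled inputs `F`; the residual loss alone drives `W` toward the solution operator, after which each new problem instance costs a single forward evaluation.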
5. Numerical Results, Scalability, and Generalization
Empirical studies across a range of problems demonstrate the effectiveness and efficiency of self-supervised neural operator frameworks:
- Physics-informed DeepONets (Wang et al., 2021) can achieve sub-1% test error for parametric PDE control and optimization tasks, with optimization performed in orders of magnitude less time than classical adjoint or gradient-based PDE solvers, especially as control-parameter dimensions increase.
- FNOpt (Chen et al., 5 Dec 2025) achieves high-fidelity cloth simulation with per-frame runtimes of $32$–$61$ ms (depending on the number of inner meta-optimization steps) and consistently stable rollouts, generalizing zero-shot from coarse training grids to much finer mesh resolutions without retraining, and outperforming supervised baselines in off-distribution scenarios.
- JKO Operator Learning (Feng et al., 9 Jan 2026) achieves accurate, stable approximation of Wasserstein gradient-flow dynamics in high-dimensional or nonlinear settings (e.g., aggregation equations, porous medium equation, high-dimensional Fokker–Planck flows), with explicit mass conservation and error scaling consistent with the step size. The Learn-to-Evolve bootstrapping strategy ensures robust generalization from limited initial conditions.
- Self-supervised Optimal Control Operators (Xu et al., 31 Dec 2025) demonstrate rapid prediction of time-optimal control strategies in domains such as maze navigation, obstacle avoidance, and nonlinear vehicle dynamics. The use of amortized operator learning integrated into model predictive control (MPC) delivers real-time closed-loop adaptation, with compute time per control plan reduced from seconds (NLP solvers) to milliseconds.
The scaling of generalization error and model/sample complexity can be made explicit. As shown in (Xu et al., 31 Dec 2025), the overall learning complexity depends essentially on the sum of intrinsic dimensions ($d_1$ for the initial state, $d_2$ for the environment parameterization) and the (Sobolev or Hölder) regularity of the true operator, giving rise to sample complexity rates of the form

$$n^{-\frac{2\beta}{2\beta + d_1 + d_2}}, \qquad \beta = \min(\beta_1, \beta_2),$$

where $\beta_1, \beta_2$ are the regularity exponents.
6. Limitations, Assumptions, and Practical Guidance
The success of self-supervised neural operator frameworks is conditioned upon several key assumptions:
- The input distributions for parametric functions or states must admit a low-intrinsic-dimensional manifold parametrization, as sample and model complexity are governed by this dimension (Xu et al., 31 Dec 2025).
- The target solution operator must possess sufficient regularity for the chosen neural network architecture to approximate efficiently.
- In high-intrinsic-dimensional settings, the curse of dimensionality is still present despite the function-space learning paradigm, and performance degrades accordingly.
Practical considerations include:
- Proper sensor and collocation point selection for physics-informed loss stability (Wang et al., 2021).
- For problems with symmetries or invariances, ensure that network architectures and input encodings respect this structure (e.g., permutation-invariance for obstacle lists, positional encodings for spatial domains) (Xu et al., 31 Dec 2025).
- For dynamic or partially observed systems, closed-loop (MPC-style) integrations of neural operators compensate for lack of perfect open-loop generalization.
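The permutation-invariance point above can be sketched with a pooling-based set encoder: a shared per-element feature map followed by mean pooling, so that reordering the obstacle list cannot change the encoding. The random weights stand in for learned parameters, and the (x, y, radius) obstacle format is a hypothetical example.

```python
import numpy as np

def encode_obstacles(obstacles):
    """Permutation-invariant set encoding of an obstacle list: a shared
    feature map applied to every element, then mean pooling. Weights are
    fixed random stand-ins for learned parameters."""
    rng = np.random.default_rng(42)               # fixed for reproducibility
    W = rng.normal(size=(8, obstacles.shape[1]))  # shared per-element map
    feats = np.tanh(obstacles @ W.T)              # one feature row per obstacle
    return feats.mean(axis=0)                     # pooling discards ordering

# Three obstacles described by (x, y, radius); shuffling the list must not
# change the encoding.
obs = np.array([[0.2, 0.4, 0.10],
                [0.7, 0.1, 0.20],
                [0.5, 0.9, 0.15]])
code = encode_obstacles(obs)
code_shuffled = encode_obstacles(obs[[2, 0, 1]])
```

Because pooling is symmetric in its arguments, the invariance holds exactly by construction rather than being learned from data, which is the structural bias the guidance above recommends.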
A plausible implication is that these frameworks are best applied to scientific and engineering problems where the parametric variation can be described compactly (e.g., by a few basis functions, mixture components, or low-dimensional geometric descriptors), and the underlying operator is smooth with respect to these parameters.
7. Representative Applications and Future Directions
Self-supervised neural operator frameworks have been demonstrated in diverse domains:
- PDE-constrained optimization, including optimal control of heat and diffusion equations and shape optimization in Stokes flow (Wang et al., 2021).
- Physics-based simulation such as cloth dynamics, where resolution-agnostic learned optimizers surpass supervised simulators in generalization and computational efficiency (Chen et al., 5 Dec 2025).
- Wasserstein gradient flows in high-dimensional density evolution and nonlinear Fokker–Planck dynamics, with operator learning completely avoiding the need for high-fidelity ground-truth trajectories (Feng et al., 9 Jan 2026).
- Amortized optimal control for rapid mapping from environmental/initial conditions to optimal control plans, scalable to real-time closed-loop deployment (Xu et al., 31 Dec 2025).
Current limitations motivate hybrid solutions that combine operator learning with local corrective optimization, and point towards continued improvements in architectural expressivity, data augmentation strategies, and theoretically justified regularization and generalization analyses.
Key References:
- "Self-supervised neural operator for solving partial differential equations" (You et al., 31 Aug 2025)
- "Fast PDE-constrained optimization via self-supervised operator learning" (Wang et al., 2021)
- "FNOPT: Resolution-Agnostic, Self-Supervised Cloth Simulation using Meta-Optimization with Fourier Neural Operators" (Chen et al., 5 Dec 2025)
- "Self-Supervised Amortized Neural Operators for Optimal Control: Scaling Laws and Applications" (Xu et al., 31 Dec 2025)
- "Learn to Evolve: Self-supervised Neural JKO Operator for Wasserstein Gradient Flow" (Feng et al., 9 Jan 2026)