Self-Supervised Neural Operators

Updated 7 January 2026
  • Self-supervised neural operators are models that learn infinite-dimensional mappings using physical constraints as self-supervision without paired input-output data.
  • They are applied to PDE regression, dynamical systems, optimal control, and simulations like cloth dynamics to enable rapid inference and cross-resolution generalization.
  • By leveraging physics-informed loss functions and adaptive NTK-based weighting, SS-NOs achieve order-of-magnitude error reductions, though theoretical analysis highlights challenges such as the curse of dimensionality.

Self-supervised neural operators (SS-NOs) are operator learning architectures in which the learning process proceeds without reliance on explicit paired input–output data, instead enforcing physical constraints or end-task objectives as self-supervision. SS-NO frameworks have been successfully applied to partial differential equations (PDEs), dynamical systems, optimal control, and physics-based simulation, and have been shown to enable rapid inference, improved scalability, and cross-resolution generalization when compared to classical data-driven or numerical methods (Wang et al., 2021, Chen et al., 5 Dec 2025, Xu et al., 31 Dec 2025).

1. Definition and General Framework

Self-supervised neural operators (SS-NOs) are machine learning models that parameterize mappings between infinite-dimensional function spaces, $\mathcal{G}: \mathcal{U} \to \mathcal{S}$, where typical input-output pairs correspond to (possibly parametric) boundary conditions, source terms, environmental configurations, and the corresponding solution fields—often for PDEs or control problems. In contrast to supervised neural operators, SS-NOs do not require ground-truth solution fields $s(u)$ for training. Instead, they leverage physical laws (e.g., enforcing zero PDE residuals or minimal control cost), or solve meta-level energy minimization problems to supply a self-supervision signal. Training is typically conducted through minimization of task-appropriate loss functions, often via automatic differentiation through physical simulation or ODE/PDE solvers (Wang et al., 2021, Chen et al., 5 Dec 2025, Xu et al., 31 Dec 2025).

2. Self-Supervised Operator Learning for PDEs

Physics-informed self-supervised learning for neural operators is a cornerstone approach for PDE operator regression. In this setting, neural operators such as DeepONets are parameterized by two subnetworks—a branch network $\phi_{\text{branch}}$ encoding the function input, and a trunk network $\phi_{\text{trunk}}$ encoding spatial or spatiotemporal query locations. The operator model is defined as

$$G_\theta(u)(y) = \sum_{k=1}^{q} b_k t_k,$$

where $b = \phi_{\text{branch}}(u)$ and $t = \phi_{\text{trunk}}(y)$.
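The branch–trunk factorization above can be sketched in a few lines of NumPy; the layer sizes, the 32-point sensor grid, and the tanh MLPs are illustrative assumptions, not the architecture of any cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    """Tiny tanh MLP: params is a list of (W, b) pairs."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def init_mlp(sizes):
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

q = 16                           # latent dimension shared by branch and trunk
branch = init_mlp([32, 64, q])   # input: u sampled at 32 sensor points
trunk  = init_mlp([1, 64, q])    # input: a query coordinate y

def deeponet(u_sensors, ys):
    b = mlp(branch, u_sensors)   # (q,) coefficients from the input function
    t = mlp(trunk, ys)           # (len(ys), q) basis values at the queries
    return t @ b                 # G_theta(u)(y) = sum_k b_k t_k

u = np.sin(np.linspace(0, np.pi, 32))   # one input function on 32 sensors
ys = np.linspace(0, 1, 50)[:, None]     # 50 query locations
out = deeponet(u, ys)
print(out.shape)                        # (50,)
```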

The self-supervised training objective is expressed as

$$L(\theta) = L_{\text{operator}}(\theta) + L_{\text{physics}}(\theta),$$

where $L_{\text{operator}}$ encodes any available data supervision, and $L_{\text{physics}}$ penalizes deviations from PDE residuals and boundary/initial conditions:

$$L_{\text{physics}}(\theta) = \frac{1}{N P_b} \sum_{i=1}^{N} \sum_{j=1}^{P_b} \left| B\big(u_i(x_j), G_\theta(u_i)(y_j)\big) \right|^2 + \frac{1}{N Q} \sum_{i=1}^{N} \sum_{r=1}^{Q} \left| N\big(u_i(x_{i,r}), G_\theta(u_i)(y_{i,r})\big) \right|^2.$$

Here $B$ and $N$ represent the boundary and (differential) PDE residual operators, respectively. In the fully self-supervised regime, $L_{\text{operator}}$ is set to zero, and learning is driven purely by minimization of $L_{\text{physics}}$ (Wang et al., 2021).
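As a concrete, deliberately simplified instance of such a physics-driven loss, the sketch below evaluates a boundary-plus-residual objective for a 1D Poisson problem, with finite differences standing in for the automatic differentiation used in practice; the problem setup is a hypothetical example:

```python
import numpy as np

# Self-supervised physics loss for a 1D Poisson problem s'' = u on [0, 1]
# with s(0) = s(1) = 0; derivatives are taken by finite differences on the
# query grid (an illustrative stand-in for autodiff residuals).
def physics_loss(s_pred, u, h):
    bc = s_pred[0]**2 + s_pred[-1]**2                        # boundary terms B
    s_xx = (s_pred[2:] - 2*s_pred[1:-1] + s_pred[:-2]) / h**2
    residual = np.mean((s_xx - u[1:-1])**2)                  # residual terms N
    return bc + residual

n = 101
x = np.linspace(0, 1, n)
h = x[1] - x[0]
u = -np.pi**2 * np.sin(np.pi * x)   # source term
s_exact = np.sin(np.pi * x)         # exact solution: s'' = u, s(0) = s(1) = 0
print(physics_loss(s_exact, u, h))  # near zero for the exact solution
```

A candidate far from the solution (e.g., the zero field) incurs a large residual term, which is exactly the signal that drives training when no labeled solution fields exist.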

Adaptive weighting based on Neural Tangent Kernel (NTK) diagonals is used to equalize convergence rates across collocation points, thereby mitigating magnitude bias and vanishing gradient pathologies. Algorithmic details are provided through per-step diagonal NTK computation and dynamic update of per-point loss weights (Wang et al., 2021). These innovations yield order-of-magnitude reductions in error for challenging nonlinear PDE benchmarks.
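One plausible form of such a weighting step, reduced to its core arithmetic, is sketched below; the Jacobian input and the specific formula (mean NTK diagonal over each entry) are simplifying assumptions rather than the exact algorithm of Wang et al. (2021):

```python
import numpy as np

# Hypothetical NTK-diagonal weighting step: K_ii = ||grad_theta r_i||^2 for
# each collocation-point residual r_i; points with small K_ii (slow expected
# convergence) receive larger loss weights.
def ntk_weights(J):
    """J: (num_points, num_params) Jacobian of per-point residuals."""
    K_diag = np.sum(J**2, axis=1)             # diagonal of the NTK, J @ J.T
    w = np.mean(K_diag) / (K_diag + 1e-12)    # equalize per-point rates
    return w / np.mean(w)                     # normalize to mean 1

rng = np.random.default_rng(0)
# 8 collocation points with gradient magnitudes spanning a 20x range:
J = rng.normal(size=(8, 100)) * np.linspace(0.1, 2.0, 8)[:, None]
w = ntk_weights(J)
# weighted loss would then be np.mean(w * residuals**2)
print(w[0] > w[-1])   # True: the small-gradient point is up-weighted
```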

3. Meta-Optimization and Physical Simulation with FNOs

Resolution-agnostic, self-supervised simulation frameworks have been developed by formulating time integration as a meta-optimization problem. In the context of cloth simulation, FNOpt builds an SS-NO architecture where each time-step corresponds to an energy-based minimization that replaces per-step numerical solution (Chen et al., 5 Dec 2025).

Key workflow:

  • Energy-based loss: Each step solves for the cloth state by minimizing a “cloth loss” combining internal elastic energy, external forces, and inertial terms:

$$\mathcal{L}_{\text{cloth}}(x_{t+1}, v_{t+1}; x_t, v_t) = \mathcal{E}_{\text{int}}(x_{t+1}) + \mathcal{E}_{\text{ext}}(x_{t+1}) + \mathcal{E}_{\text{inertia}}(x_{t+1}, v_{t+1}; x_t, v_t).$$

  • Inner loop: A neural optimizer (parametrized as an FNO) performs several gradient- or residual-based updates, learning to drive the physics loss towards zero at each forward step.
  • Outer loop: Training meta-optimizes the FNO parameters through rollout loss accumulation, with gradients propagated through the inner loops.
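The inner/outer structure can be illustrated with a deliberately tiny stand-in: the learned FNO optimizer is replaced by a single learned step size, the cloth energy by a quadratic, and backpropagation through the inner loop by finite differences. Every element here is a toy assumption, not the FNOpt implementation:

```python
import numpy as np

def energy(x, target):                 # stand-in for the per-step physics loss
    return np.sum((x - target)**2)

def inner_loop(eta, x, target, steps=5):
    for _ in range(steps):             # "neural optimizer" drives energy down
        x = x - eta * 2 * (x - target) # gradient step with learned step size
    return x

def rollout_loss(eta, targets):
    x, total = np.zeros(3), 0.0
    for tgt in targets:                # one "time step" per target state
        x = inner_loop(eta, x, tgt)
        total += energy(x, tgt)        # accumulate loss along the rollout
    return total

targets = [np.full(3, c) for c in (1.0, 0.5, -0.25)]
eta, lr, eps = 0.05, 0.01, 1e-4
for _ in range(200):                   # outer loop: meta-optimize eta
    g = (rollout_loss(eta + eps, targets)
         - rollout_loss(eta - eps, targets)) / (2 * eps)
    eta -= lr * g
print(rollout_loss(eta, targets) < rollout_loss(0.05, targets))  # True
```

The point of the toy is the control flow: the inner loop never sees ground-truth states, only the physics loss, while the outer loop shapes the optimizer itself through accumulated rollout loss.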

By operating directly on physics losses and using Fourier Neural Operator backbones, this approach supports zero-shot deployment across resolutions, boundary conditions, and motion regimes without ground-truth data from a physical solver. The architecture employs spectral-domain convolutions, enabling generalization to arbitrary grid sizes (Chen et al., 5 Dec 2025).
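The spectral-domain convolution behind this resolution independence can be sketched as follows; the single-channel 1D layer with random mode weights is an illustrative simplification of a full FNO block:

```python
import numpy as np

# Minimal 1D spectral convolution in the FNO style: transform to Fourier
# space, act on a fixed number of low modes with learned complex weights,
# transform back. Because the weights live on modes rather than grid points,
# the same layer applies unchanged at any resolution.
rng = np.random.default_rng(0)
modes = 8
W = rng.normal(size=modes) + 1j * rng.normal(size=modes)  # "learned" weights

def spectral_conv(u):
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = W * u_hat[:modes]    # truncate to low modes, multiply
    return np.fft.irfft(out_hat, n=len(u))

coarse = spectral_conv(np.sin(2 * np.pi * np.arange(64) / 64))
fine   = spectral_conv(np.sin(2 * np.pi * np.arange(256) / 256))
print(coarse.shape, fine.shape)            # (64,) (256,) -- same layer
```

For this band-limited input, subsampling the fine-grid output recovers the coarse-grid output, which is the discretization invariance the text describes.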

4. Optimal Control and Solution-Operator Learning

Self-supervised neural operators have been extended to open-loop and closed-loop optimal control, where the operator $\mathcal{G}$ maps from environmental conditions and initial states to optimal trajectories or control signals. The self-supervised training objective minimizes the cost incurred by the control sequence produced by $\mathcal{G}_\theta$, with no direct supervision from optimal trajectories:

$$\mathcal{L}(\theta) = \frac{1}{n} \sum_{i=1}^n \left[ \int_0^T L_{B^i}\big(t, x^i_\theta(t), u^i_\theta(t)\big)\,dt + M\big(x^i_\theta(T)\big) \right].$$

Dynamical trajectories $x(t)$ are generated on-the-fly by integrating the system with controls $u_\theta = \mathcal{G}_\theta(B, x_0)$, and loss is backpropagated through the integration process (Xu et al., 31 Dec 2025).
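A minimal sketch of this objective, assuming a toy single-integrator system and a hypothetical linear policy in place of the learned operator (with grid search replacing gradient-based training for brevity):

```python
import numpy as np

# Roll the dynamics forward under a control signal and accumulate running
# plus terminal cost; real SS-NOs backpropagate through this integration.
def rollout_cost(theta, x0, T=1.0, steps=100):
    dt, x, cost = T / steps, x0, 0.0
    for _ in range(steps):
        u = -theta * x                    # placeholder for G_theta(B, x0)
        cost += dt * (x**2 + 0.1 * u**2)  # running cost L_B(t, x, u)
        x = x + dt * u                    # Euler step of dx/dt = u
    return cost + 10.0 * x**2             # terminal cost M(x(T))

# Crude "self-supervised training": pick the gain minimizing rollout cost.
gains = np.linspace(0.0, 10.0, 101)
costs = [rollout_cost(g, x0=1.0) for g in gains]
best = gains[int(np.argmin(costs))]
print(best)   # interior optimum: neither 0 (no control) nor the max gain
```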

For closed-loop control, SS-NOs are embedded within Model Predictive Control (MPC) loops, enabling rapid real-time replanning as the operator predicts optimal controls conditioned on evolving environmental input.
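The receding-horizon pattern can be sketched as below, with a hypothetical placeholder standing in for the trained operator:

```python
# MPC-style loop: at every step a fast surrogate returns a control sequence
# for the current state, only the first control is applied, and the plan is
# recomputed as the state evolves.
def operator_predict(x, horizon=10):
    # Placeholder for G_theta: steer a scalar state toward 0 over the horizon.
    return [-0.5 * x for _ in range(horizon)]

x, dt = 5.0, 0.1
trajectory = [x]
for step in range(50):
    plan = operator_predict(x)    # surrogate replaces a full OCP solve
    x = x + dt * plan[0]          # apply only the first planned control
    trajectory.append(x)
print(abs(trajectory[-1]) < abs(trajectory[0]))   # True: state driven to 0
```

Because the surrogate's inference cost is a single forward pass, replanning at every step becomes affordable, which is what enables real-time closed-loop use.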

Scaling-law theorems have been established, quantifying how sample complexity, network size, and generalization error scale with the intrinsic dimensions of the environment and state, and with the regularity of the operator:

  • To achieve generalization error $\epsilon$, the number of training samples $n$ must satisfy

$$n \gtrsim \epsilon^{-\frac{d+k+2(s+\alpha)}{s+\alpha}},$$

where $d$ is the state dimension, $k$ is the environment dimension, and $s+\alpha$ is the regularity. This result explicitly highlights the curse of dimensionality for high-complexity operator learning in control (Xu et al., 31 Dec 2025).
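Plugging illustrative values into the bound makes the scaling concrete (the chosen $d$, $k$, and $s+\alpha$ are hypothetical):

```python
# Worked instance of the sample-complexity bound: for d = 2 (state),
# k = 2 (environment), s + alpha = 2 (regularity), the exponent is
# (2 + 2 + 4) / 2 = 4, so halving the target error multiplies the
# required sample count by 2**4 = 16.
def sample_exponent(d, k, s_alpha):
    return (d + k + 2 * s_alpha) / s_alpha

exp_ = sample_exponent(d=2, k=2, s_alpha=2.0)   # 4.0
n_eps  = 0.1 ** -exp_    # n at eps = 0.1
n_half = 0.05 ** -exp_   # n at eps = 0.05
print(exp_, n_half / n_eps)   # exponent 4.0, sample ratio ~16
```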

5. Empirical Performance and Generalization Properties

SS-NOs have demonstrated substantial accuracy and generalization improvements in benchmark PDE, simulation, and control tasks. Notable empirical trends include:

  • For PDE operator regression (DeepONet with NTK-based weights and improved architecture), test errors decrease by up to a factor of 50, especially in problems with sharp gradients or multiscale features (Wang et al., 2021).
  • In cloth simulation, FNOpt achieves lower rollout error at higher mesh resolution despite training only at low resolution. Out-of-distribution scenarios—including novel boundary conditions and accelerated time scales—are handled robustly, with rollouts stable and accurate across all tested cases (Chen et al., 5 Dec 2025).
  • In optimal control, SS-NOs yield near-instantaneous solutions with cost and trajectory error within 1% of ground truth for low to moderate intrinsic dimensionality, and deliver two orders of magnitude speedup over direct solvers (Xu et al., 31 Dec 2025). However, error rates grow sharply as control and environment dimensionality increase, in line with theoretical predictions.

A summary of representative benchmarks is provided in the following table:

| Domain | Architecture | In-Domain Test Error | Cross-Regime Generalization |
| --- | --- | --- | --- |
| Parametric PDEs | DeepONet + NTK weighting/improved arch. | 1% (advection), 1.2% (Burgers) | Robust to scale, geometry (Wang et al., 2021) |
| Cloth simulation | FNO (FNOpt) | 6.7 ($32\times32$), 4.8 ($64\times64$) | Stable rollouts, OOD boundaries, speed (Chen et al., 5 Dec 2025) |
| Optimal control | ReLU/attention NO | <1% (2 bumps, maze $5\times5$), <5% (3 bumps, maze $9\times9$) | Near-instant inference, degradation with dimension (Xu et al., 31 Dec 2025) |

6. Significance, Limitations, and Theoretical Insights

Self-supervised neural operators constitute a shift from reliance on ground-truth solutions towards operator learning directly constrained by underlying physical or task-specific structure. Theoretical analyses reveal that while SS-NOs are highly suitable for low- to moderate-dimensional operator learning, fundamental sample-complexity and regularity barriers emerge as intrinsic dimension grows, mirroring the curse of dimensionality (Xu et al., 31 Dec 2025). NTK-guided weighting and improved architectural choices are crucial for stable, unbiased convergence and for mitigating optimization pathologies (Wang et al., 2021).

A plausible implication is that SS-NOs can serve as foundation models or surrogates for expensive simulators in domains where ground-truth data is scarce but physical laws are well-understood, with adaptation possible via lightweight finetuning (Wang et al., 2021, Chen et al., 5 Dec 2025). However, for models involving high-dimensional input manifolds or irregular solution operators, the scalability remains fundamentally limited by sample efficiency constraints.
