Wavelet Neural Operator

Updated 2 May 2026

Wavelet Neural Operator is a framework that leverages wavelet-domain multiresolution analysis to learn nonlinear mappings between infinite-dimensional function spaces, ideal for parametric PDEs and structured data.
Its architecture integrates local lifting, wavelet-domain convolution with learnable kernels, and nonlinear activations to effectively capture both global trends and localized features.
Recent variants, including physics-informed, multi-fidelity, spiking, and vision-transformer extensions, improve accuracy, efficiency, and adaptability across scientific applications.

The Wavelet Neural Operator (WNO) is a neural-operator architecture that leverages wavelet-domain multiresolution analysis to learn nonlinear mappings between infinite-dimensional function spaces, most notably the solution operators associated with parametric partial differential equations (PDEs) and structured data. WNO builds on the operator-learning paradigm by combining the spatial and frequency localization properties of wavelet transforms with learnable integral kernels, resulting in models with enhanced capacity for capturing both global and localized phenomena. Since its inception, WNO has been rapidly extended and applied to scientific machine learning, uncertainty quantification, multi-fidelity modeling, edge deployment, vision transformers, and foundational continual learning.

1. Mathematical Foundations and Operator Parameterization

WNO models the map $\mathcal{G}: \mathcal{A} \to \mathcal{U}$ between Banach spaces of functions by parameterizing a nonlinear integral operator, typically written as

$(\mathcal{G}_\phi[a])(x) = \int_D \kappa_\phi(x, y) a(y) dy,$

where $\kappa_\phi$ is a learnable kernel. Rather than parameterizing $\kappa_\phi$ in the physical domain, WNO projects functions into a wavelet basis $\{ \psi_{j,k} \}$, allowing the integral convolution to be diagonalized or sparsified:

$(\mathcal{K}x)(s) = \mathcal{W}^{-1}\left[ \left( \mathcal{W}x \right) \cdot \left( \mathcal{W}k \right) \right](s)$

where $\mathcal{W}$ and $\mathcal{W}^{-1}$ denote the forward and inverse discrete wavelet transforms (DWT/IDWT), $x$ here denotes the input function (the role played by $a$ above), $k$ the kernel, and $s$ the spatial coordinate. Multiplication is channel- or group-wise on the wavelet coefficient tensors (Tripura et al., 2022, Nekoozadeh et al., 2023). The compact support of wavelets provides localization in both space and frequency, enabling accurate representation of sharp transitions, discontinuities, and multiscale patterns.
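As a minimal sketch of the wavelet-domain product above, the following pure-Python example applies a single-level orthonormal Haar transform to a 1D signal and a kernel, multiplies their coefficients elementwise, and inverts the transform. The function names (`haar_dwt`, `haar_idwt`, `wavelet_kernel_apply`) are illustrative, and Haar is used only for brevity; the cited works typically use Daubechies wavelets and learn the kernel coefficients directly.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """Single-level orthonormal Haar DWT of an even-length sequence.

    Returns (approximation, detail) coefficient lists of half the length.
    """
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(p + q) / SQRT2 for p, q in pairs]
    detail = [(p - q) / SQRT2 for p, q in pairs]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild and interleave the sample pairs."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return signal

def wavelet_kernel_apply(signal, kernel):
    """Compute W^{-1}[(W x) * (W k)] with elementwise coefficient products."""
    xa, xd = haar_dwt(signal)
    ka, kd = haar_dwt(kernel)
    ya = [x * k for x, k in zip(xa, ka)]
    yd = [x * k for x, k in zip(xd, kd)]
    return haar_idwt(ya, yd)

x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
# Assumption: a constant kernel, whose detail coefficients are all zero,
# so the product keeps only the (scaled) approximation part of x.
k = [0.5] * 8
y = wavelet_kernel_apply(x, k)
```

Because the constant kernel has no detail content, the output is piecewise constant over each sample pair, which illustrates how the coefficient-space weights select which scales of the input pass through.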

Each WNO block comprises:

Local linear lifting/projection on input/output fields.
Wavelet-domain convolution: DWT → small learnable kernel in coefficient space → IDWT.
Channel- or group-wise nonlinearities (typically GeLU or ReLU).
Skip/parallel pointwise convolutions for additional local mixing.

Multiresolution analysis is realized by stacking several such blocks, each applying a multi-level wavelet decomposition. Common wavelet families are Daubechies (db4 or db6), or Haar when speed matters.
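The block structure listed above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the reference implementation: it uses a Haar wavelet, applies per-channel weights only to the coarsest-level approximation and detail coefficients, and uses scalar lifting and projection weights. All names (`wavedec`, `waverec`, `wno_block`, `lift`, `project`) are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def dwt(v):
    """Single-level orthonormal Haar DWT."""
    a = [(p + q) / SQRT2 for p, q in zip(v[0::2], v[1::2])]
    d = [(p - q) / SQRT2 for p, q in zip(v[0::2], v[1::2])]
    return a, d

def idwt(a, d):
    """Inverse single-level Haar DWT."""
    out = []
    for p, q in zip(a, d):
        out += [(p + q) / SQRT2, (p - q) / SQRT2]
    return out

def wavedec(v, levels):
    """Multi-level decomposition: (coarsest approx, details ordered fine to coarse)."""
    details = []
    for _ in range(levels):
        v, d = dwt(v)
        details.append(d)
    return v, details

def waverec(approx, details):
    """Invert wavedec, starting from the coarsest level."""
    for d in reversed(details):
        approx = idwt(approx, d)
    return approx

def gelu(z):
    return 0.5 * z * (1.0 + math.erf(z / SQRT2))

def lift(a, weights):
    """Pointwise lifting of a scalar field to len(weights) channels."""
    return [[w * v for v in a] for w in weights]

def project(channels, weights):
    """Pointwise projection of the channels back to a scalar field."""
    n = len(channels[0])
    return [sum(w * ch[i] for w, ch in zip(weights, channels)) for i in range(n)]

def wno_block(channels, w_approx, w_detail, w_skip, levels=2):
    """Wavelet-domain kernel plus pointwise skip path, followed by GeLU."""
    n, c_count = len(channels[0]), len(channels)
    out = []
    for c, signal in enumerate(channels):
        approx, details = wavedec(signal, levels)
        # Learnable weights act on the coarsest coefficients only (assumption).
        approx = [w * v for w, v in zip(w_approx[c], approx)]
        details[-1] = [w * v for w, v in zip(w_detail[c], details[-1])]
        conv = waverec(approx, details)
        # Pointwise (1x1) convolution mixing channels at each grid point.
        skip = [sum(w_skip[c][j] * channels[j][i] for j in range(c_count))
                for i in range(n)]
        out.append([gelu(u + s) for u, s in zip(conv, skip)])
    return out

a = [math.sin(2.0 * math.pi * i / 16) for i in range(16)]
C, levels = 2, 2
coarse = len(a) // 2 ** levels
h0 = lift(a, [1.0, -0.5])
h1 = wno_block(h0, [[1.0] * coarse] * C, [[1.0] * coarse] * C,
               [[0.0] * C] * C, levels=levels)
u = project(h1, [1.0, 1.0])
```

With unit wavelet weights and a zero skip matrix, the block reduces to a pointwise GeLU of its input, which is a convenient sanity check before training any weights.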

2. Architectural Variants and Recent Enhancements

Standard and Bi-fidelity WNO

The canonical WNO structure employs a shallow local network for input lifting, multiple wavelet-integral layers, and a shallow projector network for output (Tripura et al., 2022). The multi-fidelity extension (MF-WNO) supplements this with residual learning from a low-fidelity WNO surrogate, such that

$u_H(x) = \mathcal{L}_L(a)(x) + R(a)(x; \theta),$

with $R$ learned to predict the high-fidelity residual $u_H - \mathcal{L}_L(a)$ (Thakur et al., 2022).
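The residual decomposition can be illustrated with a toy example. In the sketch below, `low_fidelity` and `high_fidelity` are hypothetical stand-ins for a trained low-fidelity surrogate and an expensive high-fidelity model, and the residual model is a one-parameter linear fit rather than a WNO; the point is only to show how a small number of high-fidelity samples corrects a cheap surrogate.

```python
import math

def low_fidelity(a):
    """Stand-in for a trained low-fidelity surrogate L_L (hypothetical)."""
    return [math.sin(v) for v in a]

def high_fidelity(a):
    """Stand-in for the expensive high-fidelity model (hypothetical)."""
    return [math.sin(v) + 0.1 * v for v in a]

def fit_residual(samples):
    """Least-squares fit of R(a) = theta * a to u_H - L_L(a)."""
    num = den = 0.0
    for a in samples:
        lo, hi = low_fidelity(a), high_fidelity(a)
        for v, l, h in zip(a, lo, hi):
            num += v * (h - l)
            den += v * v
    return num / den

def predict_high(a, theta):
    """u_H = L_L(a) + R(a; theta)."""
    return [l + theta * v for l, v in zip(low_fidelity(a), a)]

# Only two high-fidelity samples are used to fit the correction.
train = [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]]
theta = fit_residual(train)
pred = predict_high([0.2, 0.7, 1.2], theta)
```

Since the residual is usually simpler than the full solution map, it can often be learned from far fewer high-fidelity samples than a model trained from scratch.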

Physics-Informed WNO (PIWNO) and Gray-Box Augmentation

PIWNO eliminates the need for labeled data by enforcing a PDE-based residual loss directly. The model minimizes a sum of PDE residuals, boundary/initial-condition losses, and, when available, data supervision (N et al., 2023). Differentiable physics-augmented variants (DPA-WNO) further incorporate a fixed physics solver, with the WNO modeling only the missing physics; the combined model is trained end-to-end (Tushar et al., 2023).
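A physics-informed loss of this kind can be sketched for the 1D Poisson problem $-u'' = f$ on $[0, 1]$ with zero boundary values. The example below is a simplified illustration: derivatives are approximated with central finite differences (an assumption made here for brevity; the cited work may evaluate derivatives differently), and the weights `w_bc` and `w_data` are hypothetical.

```python
import math

def pde_residual(u, f, h):
    """Interior residual of -u'' = f using second-order central differences."""
    return [-(u[i - 1] - 2.0 * u[i] + u[i + 1]) / h**2 - f[i]
            for i in range(1, len(u) - 1)]

def physics_informed_loss(u, f, h, u_left=0.0, u_right=0.0,
                          data=None, w_bc=1.0, w_data=1.0):
    """Mean squared PDE residual + boundary penalty + optional data term."""
    res = pde_residual(u, f, h)
    loss = sum(r * r for r in res) / len(res)
    loss += w_bc * ((u[0] - u_left) ** 2 + (u[-1] - u_right) ** 2)
    if data is not None:
        loss += w_data * sum((p - q) ** 2 for p, q in zip(u, data)) / len(u)
    return loss

n = 101
h = 1.0 / (n - 1)
xs = [i * h for i in range(n)]
f = [math.pi**2 * math.sin(math.pi * x) for x in xs]

u_exact = [math.sin(math.pi * x) for x in xs]   # satisfies the PDE
u_wrong = [x * (1.0 - x) for x in xs]           # satisfies only the boundary values

loss_exact = physics_informed_loss(u_exact, f, h)
loss_wrong = physics_informed_loss(u_wrong, f, h)
```

The loss is small for the true solution and large for a candidate that meets the boundary conditions but violates the PDE, so minimizing it needs no labeled solution data.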

Spiking and Energy-Efficient Variants

Variable Spiking WNO (VS-WNO) replaces neural activations in WNO layers with variable spiking neurons supporting graded output, enabling sparse and event-driven computation. Each VSN integrates a leaky-membrane update, binary thresholding, and a graded output (Garg et al., 2023). Despite algorithmic sparsity, dense GPU deployment does not always reduce inference energy/latency due to lack of sparsity-aware runtimes; net energy savings are realized only on event-driven substrates or specifically optimized pipelines (Yoo et al., 18 Apr 2026).
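The three ingredients of a variable spiking neuron named above (leaky membrane update, binary thresholding, graded output) can be sketched as follows. The parameter names `leak` and `threshold`, their default values, and the hard reset after a spike are assumptions chosen for illustration, not settings taken from the cited work.

```python
def variable_spiking_neuron(inputs, leak=0.9, threshold=1.0):
    """Return the graded outputs of one neuron for a sequence of input currents."""
    membrane = 0.0
    outputs = []
    for x in inputs:
        membrane = leak * membrane + x                   # leaky integration
        spike = 1.0 if membrane >= threshold else 0.0    # binary thresholding
        outputs.append(spike * x)                        # graded output: input passes only on a spike
        if spike:
            membrane = 0.0                               # hard reset (assumption)
    return outputs

currents = [0.3, 0.4, 0.5, 0.1, 1.2, 0.0]
out = variable_spiking_neuron(currents)
sparsity = out.count(0.0) / len(out)
```

Most outputs are exactly zero, which is the algorithmic sparsity that event-driven hardware can exploit; on a dense GPU kernel the zeros are still multiplied through, which is why sparsity alone does not guarantee energy savings.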

Vision Transformers and Multiscale Attention

Multiscale Wavelet Attention (MWA) uses a WNO-inspired block to replace self-attention in vision transformers. MWA applies a 2D DWT to tokens, applies learnable convolutions per subband (emphasizing both edges and smooth regions), and reconstructs with IDWT, achieving linear complexity and improved accuracy on standard computer vision benchmarks compared to global Fourier-based attention (Nekoozadeh et al., 2023).
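A rough sketch of the DWT, per-subband mixing, IDWT pipeline is given below for a small 2D grid of scalar tokens. It is a simplification under stated assumptions: a single-level Haar transform, and one scalar weight per subband standing in for the learnable convolutions. The names `haar2d`, `ihaar2d`, and `wavelet_mixer` are illustrative.

```python
def haar2d(img):
    """Single-level 2D Haar DWT of a grid with even height and width.

    Returns four half-size subbands: the low-pass band and three detail bands.
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * cols for _ in range(rows)]
    lh = [[0.0] * cols for _ in range(rows)]
    hl = [[0.0] * cols for _ in range(rows)]
    hh = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2.0   # local average
            lh[i][j] = (a - b + c - d) / 2.0   # difference across columns
            hl[i][j] = (a + b - c - d) / 2.0   # difference across rows
            hh[i][j] = (a - b - c + d) / 2.0   # diagonal difference
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d."""
    rows, cols = len(ll), len(ll[0])
    img = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, h, v, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + h + v + d) / 2.0
            img[2 * i][2 * j + 1] = (s - h + v - d) / 2.0
            img[2 * i + 1][2 * j] = (s + h - v - d) / 2.0
            img[2 * i + 1][2 * j + 1] = (s - h - v + d) / 2.0
    return img

def wavelet_mixer(img, weights):
    """Scale each subband by its own weight, then reconstruct."""
    bands = haar2d(img)
    scaled = [[[w * v for v in row] for row in band]
              for w, band in zip(weights, bands)]
    return ihaar2d(*scaled)

tokens = [[1.0, 1.0, 5.0, 5.0],
          [1.0, 1.0, 5.0, 5.0],
          [2.0, 4.0, 6.0, 8.0],
          [2.0, 4.0, 6.0, 8.0]]
same = wavelet_mixer(tokens, (1.0, 1.0, 1.0, 1.0))     # identity weights
smooth = wavelet_mixer(tokens, (1.0, 0.0, 0.0, 0.0))   # keep the low-pass band only
```

Each token is read and written a fixed number of times, so the cost grows linearly with the number of tokens, in contrast to the quadratic cost of pairwise self-attention.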

U-WNO and Foundational Operator Learning

U-WNO augments each wavelet layer with a U-Net block for spatial skip connections and an adaptive activation mechanism, enhancing fitting of high-frequency components and yielding up to 93.8% mean $L^2$ error reduction versus standard WNO on challenging PDEs (Lei et al., 2024). Neural Combinatorial WNO (NCWNO) introduces local wavelet experts per layer and a gating mixture-of-experts architecture with memory-based ensembling, establishing a foundational operator model capable of continual learning across multiple physics without catastrophic forgetting (Tripura et al., 2023).

3. Learning, Training Regimes, and Theoretical Properties

WNO models are typically trained by minimizing an empirical $L^2$ or mean-squared-error loss over discretized (input, output) pairs, using the Adam optimizer with standard weight decay. For data-scarce or noisy regimes, ensemble and randomized-prior variants (RP-WNO) provide efficient epistemic uncertainty quantification with negligible mean performance loss (Garg et al., 2023).
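The error metric and the ensemble aggregation can be sketched as follows. The relative form of the $L^2$ error is an assumption (it is a common choice in operator learning), and the ensemble members are hard-coded stand-ins for the predictions of independently trained models; the randomized-prior construction itself is not shown.

```python
import math

def relative_l2(pred, target):
    """Relative L2 error ||pred - target|| / ||target|| on a discretized field."""
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    den = math.sqrt(sum(t ** 2 for t in target))
    return num / den

def ensemble_stats(predictions):
    """Pointwise mean and sample standard deviation across ensemble members."""
    m = len(predictions)
    means, stds = [], []
    for values in zip(*predictions):
        mu = sum(values) / m
        var = sum((v - mu) ** 2 for v in values) / (m - 1)
        means.append(mu)
        stds.append(math.sqrt(var))
    return means, stds

target = [1.0, 2.0, 2.0]
# Hypothetical predictions from three ensemble members on a 3-point grid.
members = [[1.1, 2.0, 1.9], [0.9, 2.0, 2.1], [1.0, 2.0, 2.0]]
mean, std = ensemble_stats(members)
err = relative_l2(mean, target)
```

The pointwise standard deviation is zero where the members agree and grows where they disagree, which is the basis for the epistemic confidence intervals.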

Wavelet domain parameterization underlies several important theoretical and practical strengths:

Spatial/frequency localization gives sharper representation of interfaces and multiscale features than global Fourier kernels.
Stackability and resolution-independence allow WNO to upsample or downsample across grid sizes with little loss of accuracy (modulo discretization-invariance limitations).
Universal approximation follows from the density of wavelet representations and neural operator theorems (Tripura et al., 2022).

A limitation is that standard WNOs trained at a fixed grid resolution lack true discretization invariance, unlike FNOs (Rashid et al., 2023). Training cost is also raised by the wavelet transforms repeated in every layer.

4. Empirical Performance and Comparison

WNO and its variants consistently deliver strong or state-of-the-art performance across a broad range of parametric PDEs and vision tasks. Summarized results include:

Burgers, Darcy, Navier–Stokes, Allen–Cahn, Wave Advection, Poisson problems: WNO matches or exceeds DeepONet, FNO, and MWT in $L^2$ error for most settings (Tripura et al., 2022, Rashid et al., 2023, Lei et al., 2024).
Computer vision: On Tiny-ImageNet, MWA achieves 81.40% Top-1 accuracy (ViT-S/4), vs. AFNO at 79.98% and GFN at 80.32% (Nekoozadeh et al., 2023).
Operator learning: In digital-composite strain prediction, WNO is more data-efficient and accurate around high-gradient interfaces than MWT/FNO (Rashid et al., 2023).
Multi-fidelity: MF-WNO reduces necessary high-fidelity samples by >20× compared to vanilla WNO for the same error (Thakur et al., 2022).
Continual/foundational transfer: NCWNO retains accuracy across multiple PDEs with minimal retraining and outperforms all baselines in both 1D/2D and out-of-distribution transfer (Tripura et al., 2023).
Energy/latency: On edge hardware, VS-WNO attains algorithmic sparsity in its activations, but system-level power/latency savings require sparsity-aware backend optimizations (Garg et al., 2023, Yoo et al., 18 Apr 2026).
UQ: RP-WNO provides pointwise epistemic confidence intervals at the cost of training an ensemble of models, with accuracy matching a single WNO (Garg et al., 2023).

5. Implementation and Hyperparameter Choices

Representative hyperparameters across domains are:

Wavelet choice: Daubechies (db4, db6), Haar (for speed), dual-tree complex wavelets (for frequency orientation).
Decomposition levels: set per domain, with different depths used for CV/ViT models and for PDE problems.
Feature width: tuned per problem within a bounded range.
Layers: typically 4–6 wavelet layers.
Optimizer: Adam, with a decaying learning rate.
Activations: GeLU, ReLU, linear, or adaptive slope (U-WNO).

Training is often run on a single GPU for small- to medium-scale problems. Model parameter counts typically scale with kernel and group sizes; e.g., MWA uses grouped convolutions, and VS-WNO implements VSNs via snnTorch or equivalent (Garg et al., 2023, Nekoozadeh et al., 2023, Lei et al., 2024).

6. Applications and Domains of Deployment

WNO has been deployed and benchmarked in:

Scientific ML: Solution operators for parametric PDEs (Burgers’, Navier–Stokes, Allen–Cahn, Nagumo, Poisson, Darcy flow).
Uncertainty quantification: Surrogate-accelerated Monte Carlo, RP-WNO for epistemic confidence intervals (Garg et al., 2023, Thakur et al., 2022).
Multi-fidelity modeling: Surrogate refinement with minimal high-fidelity data (Thakur et al., 2022).
Computer vision: Multiscale attention in vision transformers, improved expressiveness over Fourier-based global mixers (Nekoozadeh et al., 2023).
Edge/neuromorphic computing: Sparse spiking variants for energy-efficient inference, contingent on hardware/runtimes (Garg et al., 2023, Yoo et al., 18 Apr 2026).
Continual/foundation operator models: Transfer across diverse physics, rapid adaptation without forgetting (Tripura et al., 2023).
Physics-informed learning: PDE-governed systems, “gray-box” DPA-WNO with differentiable solvers (N et al., 2023, Tushar et al., 2023).
Real-time control: Offshore structure response, digital twins for climate/temperature inference (Tripura et al., 2022, Cao et al., 2023).

7. Limitations and Ongoing Directions

WNO's main strengths—modular multiscale design, spatial-frequency localization, and parameter efficiency—are balanced by several open challenges and ongoing research:

Discretization invariance is limited compared to FNO; pixel- and material-grid super-resolution require further refinement (Rashid et al., 2023).
Training cost can be significant due to repeated DWT/IDWT operations.
Choosing optimal wavelet families and decomposition levels is problem-dependent and impacts both expressivity and efficiency.
Hardware deployment: Realizing energy or latency savings with sparse spiking variants depends on backend support for sparsity-aware execution (Yoo et al., 18 Apr 2026).
Integration of physics-informed loss terms, adaptive or learnable wavelet bases, and deployment on irregular or graph-centric meshes remain active research directions (Lei et al., 2024, N et al., 2023, Tripura et al., 2023).

The Wavelet Neural Operator framework constitutes a foundational tool for multiscale operator learning, bridging numerical analysis, scientific computing, and modern deep learning (Tripura et al., 2022, Nekoozadeh et al., 2023, Lei et al., 2024). Its continuing evolution trends toward integration with physics- and data-driven paradigms, continual/foundational learning architectures, and deployment across a broad spectrum of scientific and engineering applications.