Walsh-Hadamard Neural Operators (WHNO)
- WHNO is a spectral neural operator that uses the Walsh-Hadamard basis to capture sharp discontinuities in PDE solutions.
- It employs learnable spectral weights on low-sequency coefficients followed by convolutional decoding to achieve high fidelity in benchmark PDE problems.
- Ensembling WHNO with Fourier operators leverages complementary strengths, yielding up to 35% error reductions for discontinuous phenomena.
The Walsh-Hadamard Neural Operator (WHNO) is a spectral neural operator constructed to approximate solution operators of partial differential equations (PDEs) characterized by discontinuous coefficients or sharp solution features. Unlike standard spectral neural operators based on Fourier transforms, which are highly effective for smooth fields but susceptible to the Gibbs phenomenon around discontinuities, the WHNO leverages the Walsh-Hadamard transform—a basis of orthonormal, piecewise-constant rectangular functions—enabling high-fidelity representation of abrupt jumps and interfaces without spectral ringing. The architecture comprises learnable spectral weights acting on low-sequency Walsh coefficients to capture nonlocal dependencies, followed by a convolutional decoder. Empirical results demonstrate WHNO’s superiority over Fourier-based neural operators when sharp material interfaces are present and further reveal that ensembles combining WHNO and FNO exploit complementary representational properties, achieving substantial error reductions for a suite of benchmark PDEs with discontinuities (Cavallazzi et al., 10 Nov 2025).
1. Mathematical Foundations
1.1 Walsh–Hadamard Basis and Transform
The Walsh functions $\{W_k\}_{k \ge 0}$ constitute an orthonormal basis on $[0, 1)$, each function a rectangular wave taking values in $\{+1, -1\}$. Unlike sinusoids, Walsh functions are sequency-ordered: $W_k$ has $k$ zero-crossings, so low $k$ corresponds to broad, constant regions and high $k$ to rapid alternation.
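For vectors $x \in \mathbb{R}^N$, the (normalized) Hadamard matrix $\tilde{H}_N$ (with $N = 2^m$) underpins the discrete Walsh-Hadamard Transform (WHT). Key definitions:
- $H_1 = [1]$
- $H_{2N} = \begin{pmatrix} H_N & H_N \\ H_N & -H_N \end{pmatrix}$
- $\tilde{H}_N = H_N / \sqrt{N}$ (orthonormalization)
- One-dimensional WHT: $\hat{x} = \tilde{H}_N x$, with $x = \tilde{H}_N \hat{x}$.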
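The WHT for $f$ (continuous, $x \in [0, 1)$), with $W_k$ the $k$-th Walsh function:
$$\hat{f}_k = \int_0^1 f(x)\, W_k(x)\, dx$$
For discrete $f$ on $N$ grid points $x_n = n/N$:
$$\hat{f}_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} f(x_n)\, W_k(x_n)$$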
The two-dimensional (2D) transform applies the WHT along each axis. The Fast Walsh-Hadamard Transform (FWHT) computes this in $O(N \log N)$ time for $N$ grid points.
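The butterfly structure behind the fast transform can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `fwht` and `fwht2` are hypothetical, the output is in natural (Hadamard) order rather than sequency order, and the $1/\sqrt{N}$ scaling is an assumed orthonormal convention.

```python
import numpy as np

def fwht(x, axis=-1):
    """Orthonormal fast Walsh-Hadamard transform along `axis`.

    Output is in natural (Hadamard) order; the length must be a power of two.
    """
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1).copy()
    n = x.shape[-1]
    if n == 0 or n & (n - 1):
        raise ValueError("length along axis must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: split into blocks of width 2h, then replace each
        # pair of half-blocks by their sum and their difference.
        y = x.reshape(x.shape[:-1] + (n // (2 * h), 2, h))
        a = y[..., 0, :] + y[..., 1, :]
        b = y[..., 0, :] - y[..., 1, :]
        x = np.stack([a, b], axis=-2).reshape(x.shape)
        h *= 2
    # Assumed orthonormal scaling, so the transform is its own inverse.
    return np.moveaxis(x / np.sqrt(n), -1, axis)

def fwht2(u):
    """2D transform: apply the 1D transform along each of the last two axes."""
    return fwht(fwht(u, axis=-1), axis=-2)
```

With this scaling, applying `fwht2` twice returns the input, so the same routine serves as forward and inverse transform.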
1.2 Relationship to PDE Discontinuities
Walsh basis functions are well suited to representing piecewise-constant features common in heterogeneous PDEs. Sharp jumps or interfaces yield a sparse Walsh spectrum, so low-sequency truncation can be applied without significant interface distortion. In contrast, the Fourier basis incurs oscillatory artifacts (the Gibbs phenomenon) near discontinuities and requires orders of magnitude more modes for comparable sharpness.
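The contrast can be illustrated on a one-dimensional unit step. The sketch below is a toy comparison under stated assumptions: the jump sits at the dyadic midpoint of the grid (the most favorable case for the Walsh basis), the grid size `n = 64` and the cutoff `keep = 8` are arbitrary choices, and `walsh_matrix` is a hypothetical helper that sorts Hadamard rows by their number of sign changes.

```python
import numpy as np

def walsh_matrix(n):
    """Orthonormal n x n Walsh matrix (n a power of two), rows in sequency order."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    # Sequency = number of sign changes along each row.
    sign_changes = np.count_nonzero(np.diff(H, axis=1), axis=1)
    return H[np.argsort(sign_changes)] / np.sqrt(n)

n, keep = 64, 8
f = (np.arange(n) >= n // 2).astype(float)  # unit step at the dyadic midpoint

# Walsh: keep the `keep` lowest-sequency coefficients, then invert.
W = walsh_matrix(n)
coeffs = W @ f
f_walsh = W.T @ np.where(np.arange(n) < keep, coeffs, 0.0)

# Fourier: keep the `keep` lowest-frequency modes, then invert.
F = np.fft.rfft(f)
F[keep:] = 0.0
f_fourier = np.fft.irfft(F, n)

nonzero_walsh = np.count_nonzero(np.abs(coeffs) > 1e-12)
walsh_err = np.max(np.abs(f_walsh - f))
fourier_err = np.max(np.abs(f_fourier - f))
print(nonzero_walsh, walsh_err, fourier_err)
```

Here the step is carried entirely by the two lowest-sequency coefficients, so the truncated Walsh reconstruction is exact, while the truncated Fourier reconstruction retains a visible error near the jump. A jump placed off the dyadic grid would spread energy across more Walsh coefficients, so the comparison should be read as a best case.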
2. Operator Architecture
2.1 High-Level Pipeline
Given a coefficient field $a(x, y)$ on an $N_x \times N_y$ grid ($N_x$, $N_y$ powers of 2), the WHNO workflow is:
- Input Lifting: Construct 7.
- Spectral Layers (typically two):
- Forward 2D WHT: 8
- Spectral Truncation: Retain only 9 lowest-sequency coefficients: 0
- Learnable Spectral Weights: Affine mixing in spectral domain:
1
- Zero Padding: Expand to 2
- Inverse WHT: 3
- Spatial Mixing & Skip Connections: First layer, no skip; second layer, residual: 4.
- Decoder: Several dilated 2D convolutions act on 5 to yield output 6.
2.2 Spectral-Layer Formulae
Let 7 indicate layer index:
8
2.3 Forward Pass Pseudocode
03
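A forward pass consistent with the pipeline in Section 2.1 might look like the following NumPy sketch. It is illustrative only and rests on several assumptions not confirmed by the source: the lifting concatenates the coefficient field with grid coordinates, the spectral weights mix channels independently per retained mode, the nonlinearity is `tanh`, the truncation size `k = 8` is arbitrary, and a pointwise projection stands in for the dilated-convolution decoder. All function and parameter names are hypothetical. The dense `walsh_matrix` is used for readability; a fast transform would replace it in practice.

```python
import numpy as np

def walsh_matrix(n):
    """Orthonormal Walsh matrix, rows sorted by sequency (sign changes)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    order = np.argsort(np.count_nonzero(np.diff(H, axis=1), axis=1))
    return H[order] / np.sqrt(n)

def spectral_layer(v, weights, bias, Wx, Wy):
    """v: (C_in, Nx, Ny); weights: (C_in, C_out, k, k); bias: (C_out, k, k)."""
    k = weights.shape[-1]
    v_hat = np.einsum("ax,cxy,by->cab", Wx, v, Wy)           # forward 2D WHT
    low = v_hat[:, :k, :k]                                    # low-sequency truncation
    mixed = np.einsum("iab,ioab->oab", low, weights) + bias   # affine mixing per mode
    padded = np.zeros((weights.shape[1],) + v.shape[1:])
    padded[:, :k, :k] = mixed                                 # zero padding to full size
    return np.einsum("ax,cab,by->cxy", Wx, padded, Wy)        # inverse 2D WHT

def init_params(rng, channels=16, k=8):
    """Random parameters; k = 8 is a placeholder, not a value from the source."""
    def spec():
        w = rng.standard_normal((channels, channels, k, k)) / channels
        return w, np.zeros((channels, k, k))
    return {"lift": rng.standard_normal((channels, 3)),
            "spec1": spec(), "spec2": spec(),
            "proj": rng.standard_normal(channels) / channels}

def whno_forward(a, params):
    nx, ny = a.shape
    Wx, Wy = walsh_matrix(nx), walsh_matrix(ny)
    gx, gy = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny), indexing="ij")
    feats = np.stack([a, gx, gy])                             # assumed lifting input
    v = np.einsum("oc,cxy->oxy", params["lift"], feats)
    v = np.tanh(spectral_layer(v, *params["spec1"], Wx, Wy))      # layer 1: no skip
    v = v + np.tanh(spectral_layer(v, *params["spec2"], Wx, Wy))  # layer 2: residual
    return np.einsum("c,cxy->xy", params["proj"], v)          # stand-in for the decoder
```

A call such as `whno_forward(a, init_params(np.random.default_rng(0)))` on a 32 × 32 binary field returns a 32 × 32 output; the weights are random, so the result only demonstrates shapes and data flow.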
2.4 Parameterization
All learnable weights: 9. Typical model: 0 parameters (1 spectral, 2 decoder).
3. Training Regimes and Experimental Setup
3.1 Loss and Optimization
Training minimizes mean squared error (MSE) across the spatial domain:
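$$\mathcal{L}_{\text{MSE}} = \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left( \hat{u}_{ij} - u_{ij} \right)^2$$
where $\hat{u}_{ij}$ and $u_{ij}$ denote the predicted and reference solution values at grid point $(i, j)$ of the $N_x \times N_y$ grid.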
Optimization: AdamW, learning rate 4 (cosine decay/step), weight decay 5, batch size 4 (heat, Darcy), 1–2 (Burgers); 400 epochs.
3.2 Data Generation for Discontinuous PDEs
- Darcy flow: Binary 6 with 4 random rectangles (7). Solve 8 with mixed Dirichlet/Neumann boundary conditions.
- Heat conduction: 9 matrix 0, inclusions 1 or 2. 3 in central 4 region, Dirichlet 5 on boundary, quasi-steady integration.
- 2D Burgers: 6, 7. Three 8 blocks with 9 at 0, periodic boundary, 1, 2 steps, 3.
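As an illustration of the Darcy-type input described above, a binary coefficient field with four random rectangles could be generated as follows. The grid size, the two coefficient values, and the rectangle size range are placeholders chosen for this sketch, not values taken from the source, and `random_rectangles_field` is a hypothetical helper name.

```python
import numpy as np

def random_rectangles_field(rng, n=64, n_rect=4, background=1.0, inclusion=0.1,
                            min_size=8, max_size=20):
    """Binary field: `background` everywhere, `inclusion` inside random rectangles.

    All default values are illustrative assumptions.
    """
    a = np.full((n, n), background)
    for _ in range(n_rect):
        # Random height and width, then a random position that fits in the grid.
        h, w = rng.integers(min_size, max_size + 1, size=2)
        i = rng.integers(0, n - h + 1)
        j = rng.integers(0, n - w + 1)
        a[i:i + h, j:j + w] = inclusion  # rectangles may overlap
    return a

field = random_rectangles_field(np.random.default_rng(0))
```

Each sample takes exactly two values, so the field has sharp axis-aligned interfaces of the kind the Walsh basis is meant to represent.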
3.3 Spectral Truncation and Channelization
Typical spectral truncation: 4 (i.e., 5 low-sequency block), 16 encoder channels, 64 decoder channels.
4. Empirical Evaluation and Benchmarks
4.1 Steady-State Darcy Flow
In binary permeability with four obstacles, WHNO achieves 6 relative error in pressure 7 on the 8 test set, with maximal errors localized at obstacle boundaries.
4.2 Heat Conduction with Discontinuous Conductivity
Under identical architectures and training, WHNO outperforms the Fourier Neural Operator (FNO) in all primary error metrics for heat conduction with discontinuous 9. Summary:
| Method | MSE | Mean Rel. Err. | Max Abs Error |
|---|---|---|---|
| WHNO | 0 | 1 | 2 |
| FNO | 3 | 4 | 5 |
| Advantage | 6 lower | 7 lower | 8 lower |
Weighted ensemble (9) combining WHNO and FNO further reduces MSE by 0 and max error by 1.
4.3 2D Burgers Equation with Discontinuous Initial Conditions
For Burgers’ equation (2), WHNO demonstrates 3 lower MSE and 4 lower mean absolute error versus FNO. Ensemble with 5 realizes 6 MSE and 7 MAE reduction:
| Method | MSE | MAE | Max Error |
|---|---|---|---|
| WHNO | 8 | 9 | 0 |
| FNO | 1 | 2 | 3 |
| Ensemble | 4 | 5 | 6 |
Across all tasks, the WHNO+FNO ensemble consistently achieves 7 lower MSE relative to WHNO alone (up to 8 over FNO), and reduces error variance (Cavallazzi et al., 10 Nov 2025).
5. Discussion and Design Rationale
5.1 Mitigating the Gibbs Phenomenon
The rectangular Walsh basis represents step-like or piecewise-constant functions exactly when the jumps fall on dyadic grid lines, precluding the overshoot or ringing that afflicts Fourier (oscillatory) bases near discontinuities. For a field with such jump discontinuities, the Walsh spectrum is sparse and low-sequency truncation preserves interface sharpness, whereas Fourier truncation requires many retained modes.
5.2 Representation Complementarity
WHNO captures discontinuities and sharp interfaces, while FNO is optimal for smooth oscillatory or gradient-dominated fields. In many physical PDEs, both features coexist—the ensemble leverages the strengths of each: WHNO dominates near interfaces, FNO in smooth interiors. The optimal ensemble weight depends on the proportion of discontinuous versus smooth features (e.g., 9 for heat conduction, 00 for Burgers).
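One simple way to choose such a weight, assuming the ensemble is a convex combination $\alpha\, u_{\text{WHNO}} + (1 - \alpha)\, u_{\text{FNO}}$ (an assumption about its form), is to minimize validation MSE in closed form. The sketch below uses hypothetical function names and is not the selection procedure reported in the source.

```python
import numpy as np

def optimal_ensemble_weight(pred_whno, pred_fno, target):
    """Least-squares weight alpha in [0, 1] for alpha*WHNO + (1 - alpha)*FNO."""
    d = (pred_whno - pred_fno).ravel()
    r = (target - pred_fno).ravel()
    denom = d @ d
    if denom == 0.0:
        return 0.5  # predictions coincide; any weight gives the same output
    # Unconstrained minimizer of ||r - alpha * d||^2, clipped to [0, 1].
    return float(np.clip((d @ r) / denom, 0.0, 1.0))

def ensemble(pred_whno, pred_fno, alpha):
    """Convex combination of the two model outputs."""
    return alpha * pred_whno + (1.0 - alpha) * pred_fno
```

Because the objective is quadratic in $\alpha$, the clipped minimizer never does worse on the fitting data than either model alone. The weight should be fitted on held-out validation data rather than on the test set.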
5.3 Computational Trade-offs
Both WHNO and FNO have $O(N \log N)$ spectral layer complexity for $N$ grid points. While an ensemble roughly doubles inference time, no training beyond that of the two individual models is required, and the 02 error reduction may be warranted in applications with critical requirements for interface resolution (e.g., composite material design, subsurface flow in fractured media).
5.4 Recommended Usage Patterns
- Use WHNO alone when discontinuities predominate and inference speed is a constraint.
- Use WHNO+FNO ensemble for maximal accuracy and broad robustness across discontinuous and smooth regions.
- Use FNO alone for strongly smooth (e.g., Gaussian random field) coefficients or when discontinuities are absent.
6. Summary
The Walsh-Hadamard Neural Operator is a spectral neural operator targeting PDEs with discontinuous coefficients or sharp local features. Its rectangular wave basis and low-sequency spectral weights enable efficient, direct learning of sharp interfaces, eliminating Fourier-induced artifacts. For heterogeneous PDEs, ensembling WHNO with FNO exploits complementary basis properties, delivering state-of-the-art accuracy and robustness with moderate computational overhead (Cavallazzi et al., 10 Nov 2025).