Walsh-Hadamard Neural Operators (WHNO)

Updated 7 April 2026

WHNO is a spectral neural operator that uses the Walsh-Hadamard basis to capture sharp discontinuities in PDE solutions.
It employs learnable spectral weights on low-sequency coefficients followed by convolutional decoding to achieve high fidelity in benchmark PDE problems.
Ensembling WHNO with Fourier operators leverages complementary strengths, yielding up to 35% error reductions for discontinuous phenomena.

The Walsh-Hadamard Neural Operator (WHNO) is a spectral neural operator constructed to approximate solution operators of partial differential equations (PDEs) characterized by discontinuous coefficients or sharp solution features. Unlike standard spectral neural operators based on Fourier transforms, which are highly effective for smooth fields but susceptible to the Gibbs phenomenon around discontinuities, the WHNO leverages the Walsh-Hadamard transform—a basis of orthonormal, piecewise-constant rectangular functions—enabling high-fidelity representation of abrupt jumps and interfaces without spectral ringing. The architecture comprises learnable spectral weights acting on low-sequency Walsh coefficients to capture nonlocal dependencies, followed by a convolutional decoder. Empirical results demonstrate WHNO’s superiority over Fourier-based neural operators when sharp material interfaces are present and further reveal that ensembles combining WHNO and FNO exploit complementary representational properties, achieving substantial error reductions for a suite of benchmark PDEs with discontinuities (Cavallazzi et al., 10 Nov 2025).

1. Mathematical Foundations

1.1 Walsh–Hadamard Basis and Transform

The Walsh functions $\{w_k(x)\}_{k=0}^{\infty}$ constitute an orthonormal basis on $[0,1]$, each function a rectangular wave taking values in $\{\pm1\}$. Unlike sinusoids, which are ordered by frequency, Walsh functions are ordered by sequency: $w_k$ has $k$ zero-crossings, so low $k$ corresponds to broad, constant regions and high $k$ to rapid alternation.
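The sequency ordering can be checked numerically. The sketch below is an illustration, not code from the source: it builds a Hadamard matrix by the Sylvester recursion and sorts its rows by their number of sign changes. The helper names `hadamard` and `walsh_matrix` are hypothetical.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of the n x n Hadamard matrix (n a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def walsh_matrix(n: int) -> np.ndarray:
    """Rows of the Hadamard matrix reordered by sequency (sign changes per row)."""
    H = hadamard(n)
    sign_changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(sign_changes)]

W = walsh_matrix(8)
sequency = (np.diff(W, axis=1) != 0).sum(axis=1)
print(sequency)  # [0 1 2 3 4 5 6 7]
```

Row $k$ of `W` is a discrete sample of $w_k$: it takes values in $\{\pm1\}$, has exactly $k$ sign changes, and is orthogonal to every other row.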

For vectors $f \in \mathbb{R}^n$, the (normalized) Hadamard matrix $H_n \in \{\pm1\}^{n \times n}$ (with $n=2^m$) underpins the discrete Walsh-Hadamard Transform (WHT). Key definitions:

- $H_1 = [1]$
- $H_{2n} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}$
- $\tilde{H}_n = \frac{1}{\sqrt{n}} H_n$ (orthonormalization)
- One-dimensional WHT: $\hat{f} = \tilde{H}_n f$, with $f = \tilde{H}_n \hat{f}$.

The WHT for $f(x)$ (continuous, $x \in [0,1]$):

$\hat{f}_k = \int_0^1 f(x)\, w_k(x)\, dx$

For discrete $f$ on $n$ grid points $x_j$:

$\hat{f}_k = \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} f(x_j)\, w_k(x_j)$

The two-dimensional (2D) transform uses WHT along each axis. The Fast Walsh-Hadamard Transform (FWHT) computes this in $O(n \log n)$ time.
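A minimal sketch of the fast transform, assuming the orthonormal $1/\sqrt{n}$ scaling and natural (Hadamard) rather than sequency ordering of the output. The function name `fwht` is illustrative and does not come from the source.

```python
import numpy as np

def fwht(f: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform of a 1D array (natural order)."""
    a = np.asarray(f, dtype=float).copy()
    n = a.shape[0]
    if a.ndim != 1 or n < 1 or n & (n - 1):
        raise ValueError("input must be 1D with power-of-2 length")
    h = 1
    while h < n:
        # Butterfly stage: split each block of 2h entries into halves x, y
        # and replace them with (x + y, x - y).
        a = a.reshape(-1, 2, h)
        a = np.stack((a[:, 0] + a[:, 1], a[:, 0] - a[:, 1]), axis=1)
        a = a.reshape(n)
        h *= 2
    return a / np.sqrt(n)

x = np.array([1.0, 0.0, 1.0, 0.0])
print(fwht(x))        # [1. 1. 0. 0.]
print(fwht(fwht(x)))  # [1. 0. 1. 0.]
```

There are $\log_2 n$ butterfly stages of $n$ additions each, which gives the $O(n \log n)$ cost. With the orthonormal scaling the transform is its own inverse, so the same routine serves both directions.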

1.2 Relationship to PDE Discontinuities

Walsh basis functions are well suited to representing piecewise-constant features common in heterogeneous PDEs. Sharp jumps or interfaces yield a sparse Walsh spectrum, particularly when they align with the dyadic grid, so low-sequency truncation introduces little interface distortion. In contrast, the Fourier basis incurs oscillatory artifacts (the Gibbs phenomenon) near discontinuities and requires orders of magnitude more modes for comparable sharpness.
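The contrast can be illustrated on a one-dimensional toy problem. The sketch below is not taken from the source; it assumes a single inclusion on $[0.25, 0.5)$, aligned with the dyadic grid, and compares reconstruction after keeping the four lowest-sequency Walsh coefficients against keeping the four lowest Fourier modes.

```python
import numpy as np

n, keep = 64, 4
x = np.arange(n) / n
f = ((x >= 0.25) & (x < 0.5)).astype(float)  # inclusion aligned with the dyadic grid

# Orthonormal, sequency-ordered Walsh matrix (rows sorted by sign changes).
H = np.array([[1.0]])
while H.shape[0] < n:
    H = np.block([[H, H], [H, -H]])
W = H[np.argsort((np.diff(H, axis=1) != 0).sum(axis=1))] / np.sqrt(n)

# Keep only the `keep` lowest-sequency Walsh coefficients.
c = W @ f
c[keep:] = 0.0
f_walsh = W.T @ c

# Keep only the `keep` lowest Fourier modes (wavenumbers 0 .. keep-1).
F = np.fft.rfft(f)
F[keep:] = 0.0
f_fourier = np.fft.irfft(F, n=n)

err_walsh = np.max(np.abs(f_walsh - f))
err_fourier = np.max(np.abs(f_fourier - f))
print(f"max error  Walsh: {err_walsh:.2e}  Fourier: {err_fourier:.2e}")
```

The truncated Walsh reconstruction is exact up to rounding because the inclusion is constant on each quarter of the interval, which is exactly the span of $w_0, \dots, w_3$. The truncated Fourier reconstruction smears the jump. An interface that does not fall on a dyadic grid line would need more Walsh coefficients.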

2. Operator Architecture

2.1 High-Level Pipeline

Given a coefficient field $a(x,y)$ on an $N_x \times N_y$ grid ($N_x$, $N_y$ powers of 2), the WHNO workflow is:

Input Lifting: Construct the lifted multi-channel field $v^{(0)}$ from $a$.
Spectral Layers (typically two):

Forward 2D WHT: $\hat{v} = \mathcal{W}(v)$
Spectral Truncation: Retain only the $K \times K$ lowest-sequency coefficients: $\hat{v}_K = T_K(\hat{v})$
Learnable Spectral Weights: Affine mixing in spectral domain:

$\hat{u}_K = W\, \hat{v}_K + b$
Zero Padding: Expand to $N_x \times N_y$: $\hat{u} = P(\hat{u}_K)$
Inverse WHT: $u = \mathcal{W}^{-1}(\hat{u})$

Spatial Mixing & Skip Connections: First layer, no skip; second layer, residual: $v^{(2)} = v^{(1)} + \mathcal{S}^{(2)}(v^{(1)})$, where $\mathcal{S}^{(2)}$ denotes the second spectral layer.
Decoder: Several dilated 2D convolutions act on $v^{(2)}$ to yield the output field.
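The decoder step relies on dilated convolutions, which enlarge the receptive field without adding weights. The following NumPy sketch shows one such operation under stated assumptions: zero "same" padding, odd kernel size, and cross-correlation as in common deep learning libraries. The source does not specify kernel sizes, dilation rates, or activations, so `dilated_conv2d` and the shapes below are purely illustrative.

```python
import numpy as np

def dilated_conv2d(v, kernel, dilation=1):
    """Zero-padded ('same') dilated 2D cross-correlation.

    v:      (C_in, H, W) feature maps
    kernel: (C_out, C_in, k, k) weights, k odd
    """
    c_out, c_in, k, _ = kernel.shape
    pad = dilation * (k // 2)
    vp = np.pad(v, ((0, 0), (pad, pad), (pad, pad)))
    H, W = v.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            # Shifted window of the padded input seen by kernel tap (i, j).
            r, c = i * dilation, j * dilation
            patch = vp[:, r:r + H, c:c + W]
            out += np.einsum("oc,chw->ohw", kernel[:, :, i, j], patch)
    return out

rng = np.random.default_rng(0)
v = rng.standard_normal((2, 8, 8))
kernel = rng.standard_normal((3, 2, 3, 3))
out = dilated_conv2d(v, kernel, dilation=2)
print(out.shape)  # (3, 8, 8)
```

A 3 x 3 kernel with dilation 2 covers a 5 x 5 neighbourhood using nine weights, so stacking a few such layers lets the decoder sharpen features around interfaces at several scales.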

2.2 Spectral-Layer Formulae

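Let $\ell \in \{1, 2\}$ indicate layer index, $\mathcal{W}$ the 2D WHT, $T_K$ truncation to the $K \times K$ low-sequency block, and $P$ zero padding back to the $N_x \times N_y$ grid:

$\mathcal{S}^{(\ell)}(v) = \mathcal{W}^{-1}\!\left( P\!\left( W^{(\ell)}\, T_K(\mathcal{W} v) + b^{(\ell)} \right) \right), \qquad v^{(1)} = \mathcal{S}^{(1)}(v^{(0)}), \qquad v^{(2)} = v^{(1)} + \mathcal{S}^{(2)}(v^{(1)})$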
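A single spectral layer can be sketched in NumPy as follows. This is an illustration under assumptions that the source does not confirm: the weights and bias act elementwise per channel and per retained mode (no channel mixing), and the transform is applied as dense matrix products rather than the FWHT for brevity. The names `walsh_matrix` and `spectral_layer`, and the choice of 12 retained modes, are hypothetical.

```python
import numpy as np

def walsh_matrix(n):
    """Orthonormal, sequency-ordered Walsh matrix of size n (power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    order = np.argsort((np.diff(H, axis=1) != 0).sum(axis=1))
    return H[order] / np.sqrt(n)

def spectral_layer(v, weight, bias):
    """v: (C, Nx, Ny); weight, bias: (C, K, K) on the K x K low-sequency block."""
    C, Nx, Ny = v.shape
    K = weight.shape[-1]
    Wx, Wy = walsh_matrix(Nx), walsh_matrix(Ny)
    v_hat = Wx @ v @ Wy.T                     # forward 2D WHT, channel by channel
    low = weight * v_hat[:, :K, :K] + bias    # truncate, then affine spectral weights
    padded = np.zeros_like(v_hat)
    padded[:, :K, :K] = low                   # zero-pad back to full resolution
    return Wx.T @ padded @ Wy                 # inverse 2D WHT

rng = np.random.default_rng(0)
v = rng.standard_normal((16, 64, 64))
out = spectral_layer(v, rng.standard_normal((16, 12, 12)), np.zeros((16, 12, 12)))
print(out.shape)  # (16, 64, 64)
```

With unit weights and zero bias the layer reduces to a projection onto the retained low-sequency block, which is a convenient sanity check when wiring up an implementation.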

2.3 Forward Pass Pseudocode

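The forward pass follows the pipeline of Section 2.1: lift the input to a multi-channel field, apply the two spectral layers (forward 2D WHT, low-sequency truncation, learnable spectral weights, zero padding, inverse WHT), use a residual connection on the second layer only, and decode the result with the dilated convolutions.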
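The control flow can be written compactly with the individual components passed in as callables. This sketch covers only the orchestration of the steps described in this section; `whno_forward` and the toy stand-ins are hypothetical and are not trained components or the authors' code.

```python
import numpy as np

def whno_forward(a, lift, layer1, layer2, decode):
    """Control flow of the forward pass; the four callables are supplied by the caller."""
    v0 = lift(a)            # input lifting to a multi-channel field
    v1 = layer1(v0)         # first spectral layer, no skip connection
    v2 = v1 + layer2(v1)    # second spectral layer with residual connection
    return decode(v2)       # convolutional decoder

# Toy stand-ins that only exercise the control flow.
a = np.ones((4, 4))
out = whno_forward(
    a,
    lift=lambda x: np.stack([x, x]),    # 1 -> 2 channels
    layer1=lambda v: 2.0 * v,
    layer2=lambda v: 3.0 * v,
    decode=lambda v: v.mean(axis=0),    # 2 -> 1 channel
)
print(out[0, 0])  # 8.0
```

Keeping the components injectable makes it straightforward to swap the spectral layers for Fourier layers when comparing against an FNO with the same surrounding structure.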

2.4 Parameterization

The learnable parameters comprise the spectral weights of each layer and the decoder convolution weights. Typical model: […] parameters ([…] spectral, […] decoder).

3. Training Regimes and Experimental Setup

3.1 Loss and Optimization

Training minimizes mean squared error (MSE) across the spatial domain:

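$\mathcal{L}(\theta) = \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left( u_\theta(x_i, y_j) - u(x_i, y_j) \right)^2$

where $u_\theta$ is the network prediction and $u$ the reference solution.

Optimization: AdamW, learning rate […] (cosine decay/step), weight decay […], batch size 4 (heat, Darcy), 1–2 (Burgers); 400 epochs.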

3.2 Data Generation for Discontinuous PDEs

Darcy flow: Binary permeability field with 4 random rectangles ([…]). Solve the Darcy equation […] for the pressure with mixed Dirichlet/Neumann boundary conditions.
Heat conduction: Conductivity field with matrix […] and inclusions […] or […]; […] in the central […] region, Dirichlet […] on the boundary, quasi-steady integration.
2D Burgers: […], […]. Three […] blocks with […] at […], periodic boundary, […], […] steps, […].
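The binary Darcy inputs can be mocked up with a few lines of NumPy. Everything numeric in this sketch is a placeholder assumption: the grid size, the two permeability values, and the rectangle size range are hypothetical and do not reproduce the settings of the source, and `random_rectangles_field` is an illustrative name.

```python
import numpy as np

def random_rectangles_field(n=64, n_rect=4, background=1.0, inclusion=0.01,
                            min_size=4, max_size=16, rng=None):
    """Binary field on an n x n grid with n_rect axis-aligned rectangles.

    All default values are illustrative placeholders, not the source's settings.
    """
    rng = np.random.default_rng(rng)
    field = np.full((n, n), background)
    for _ in range(n_rect):
        h, w = rng.integers(min_size, max_size + 1, size=2)  # rectangle height, width
        i = rng.integers(0, n - h + 1)                       # top-left corner, kept in bounds
        j = rng.integers(0, n - w + 1)
        field[i:i + h, j:j + w] = inclusion
    return field

k = random_rectangles_field(rng=0)
print(k.shape)  # (64, 64)
```

Rectangles may overlap, so a sample can show fewer than four visually distinct obstacles. Each generated field would then be passed to a numerical solver to produce the target pressure field for training.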

3.3 Spectral Truncation and Channelization

Typical spectral truncation: […] (i.e., […] low-sequency block), 16 encoder channels, 64 decoder channels.

4. Empirical Evaluation and Benchmarks

4.1 Steady-State Darcy Flow

For binary permeability fields with four obstacles, WHNO achieves […] relative error in pressure on the […] test set, with maximal errors localized at obstacle boundaries.

4.2 Heat Conduction with Discontinuous Conductivity

Under identical architectures and training, WHNO outperforms the Fourier Neural Operator (FNO) in all primary error metrics for heat conduction with discontinuous conductivity. Summary:

Method	MSE	Mean Rel. Err.	Max Abs Error
WHNO	[…]	[…]	[…]
FNO	[…]	[…]	[…]
WHNO advantage	[…] lower	[…] lower	[…] lower
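The three error measures in the table can be computed as in the sketch below. The exact definitions used in the source are not stated here, so this assumes a pointwise relative error with a small stabilizing constant `eps`. The function name `error_metrics` is hypothetical.

```python
import numpy as np

def error_metrics(pred, true, eps=1e-8):
    """MSE, mean pointwise relative error, and maximum absolute error."""
    diff = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return {
        "mse": float(np.mean(diff ** 2)),
        # Assumed definition: |error| / (|reference| + eps), averaged over the grid.
        "mean_rel_err": float(np.mean(np.abs(diff) / (np.abs(true) + eps))),
        "max_abs_err": float(np.max(np.abs(diff))),
    }

m = error_metrics(np.array([1.0, 2.0, 5.0]), np.array([1.0, 2.0, 4.0]))
print(m["max_abs_err"])  # 1.0
```

The maximum absolute error is the most sensitive of the three to interface quality, since errors near discontinuities are typically concentrated at a small number of grid points.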

A weighted ensemble ([…]) combining WHNO and FNO further reduces MSE by […] and max error by […].

4.3 2D Burgers Equation with Discontinuous Initial Conditions

For Burgers' equation ([…]), WHNO demonstrates […] lower MSE and […] lower mean absolute error (MAE) versus FNO. The ensemble with […] realizes a […] MSE and […] MAE reduction:

Method	MSE	MAE	Max Error
WHNO	[…]	[…]	[…]
FNO	[…]	[…]	[…]
Ensemble	[…]	[…]	[…]

Across all tasks, the WHNO+FNO ensemble consistently achieves […] lower MSE relative to WHNO alone (up to […] over FNO), and reduces error variance (Cavallazzi et al., 10 Nov 2025).

5. Discussion and Design Rationale

5.1 Mitigating the Gibbs Phenomenon

The rectangular Walsh basis represents step-like or piecewise-constant functions compactly, and exactly when the jumps fall on dyadic grid lines, avoiding the overshoot and ringing that afflict Fourier (oscillatory) bases near discontinuities. For a field with explicit jump discontinuities, the Walsh spectrum is sparse, so low-sequency truncation preserves interface sharpness, whereas Fourier truncation requires many retained modes.

5.2 Representation Complementarity

WHNO captures discontinuities and sharp interfaces, while FNO is better suited to smooth oscillatory or gradient-dominated fields. In many physical PDEs both features coexist, and the ensemble leverages the strengths of each: WHNO dominates near interfaces, FNO in smooth interiors. The optimal ensemble weight depends on the proportion of discontinuous versus smooth features (e.g., […] for heat conduction, […] for Burgers).
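A weighted ensemble of this kind is a convex combination of the two predictions. The sketch below assumes the form $\alpha\, u_{\mathrm{WHNO}} + (1-\alpha)\, u_{\mathrm{FNO}}$ and picks $\alpha$ by minimizing MSE on held-out data in closed form. The source does not state how its weights were chosen, so both the convention for $\alpha$ and the selection rule are assumptions, and the names `ensemble` and `best_alpha` are hypothetical.

```python
import numpy as np

def ensemble(u_whno, u_fno, alpha):
    """Convex combination; alpha is the weight on the WHNO prediction (assumed convention)."""
    return alpha * u_whno + (1.0 - alpha) * u_fno

def best_alpha(u_whno, u_fno, u_true):
    """Alpha in [0, 1] minimizing the MSE of the blend against held-out targets."""
    d = (u_whno - u_fno).ravel()   # direction along which the blend moves
    r = (u_true - u_fno).ravel()   # residual of the FNO prediction
    denom = d @ d
    if denom == 0.0:
        return 0.5                 # identical predictions: every weight is equivalent
    # Unconstrained least-squares optimum, clipped to the feasible interval.
    return float(np.clip((d @ r) / denom, 0.0, 1.0))

rng = np.random.default_rng(0)
u_w, u_f = rng.standard_normal((2, 32, 32))
u_ref = 0.7 * u_w + 0.3 * u_f
print(round(best_alpha(u_w, u_f, u_ref), 3))  # 0.7
```

Because the blend error is a convex quadratic in $\alpha$, clipping the unconstrained optimum gives the best weight on $[0, 1]$. Fitting the weight needs only stored predictions from the two trained models, not additional training of either network.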

5.3 Computational Trade-offs

Both WHNO and FNO have $O(N \log N)$ spectral layer complexity, where $N$ is the number of grid points. While an ensemble doubles inference time, no retraining of either model is required, and the reported error reduction may be warranted in applications with critical requirements for interface resolution (e.g., composite material design, subsurface flow in fractured media).

5.4 Recommended Usage Patterns

Use WHNO alone when discontinuities predominate and inference speed is a constraint.
Use WHNO+FNO ensemble for maximal accuracy and broad robustness across discontinuous and smooth regions.
Use FNO alone for strongly smooth (e.g., Gaussian random field) coefficients or when discontinuities are absent.

6. Summary

The Walsh-Hadamard Neural Operator is a spectral neural operator targeting PDEs with discontinuous coefficients or sharp local features. Its rectangular wave basis and low-sequency spectral weights enable efficient, direct learning of sharp interfaces, eliminating Fourier-induced artifacts. For heterogeneous PDEs, ensembling WHNO with FNO exploits complementary basis properties, delivering state-of-the-art accuracy and robustness with moderate computational overhead (Cavallazzi et al., 10 Nov 2025).

References

Cavallazzi et al. (2025). Walsh-Hadamard Neural Operators for Solving PDEs with Discontinuous Coefficients.
