Hadamard-Domain Convolution
- Hadamard-domain convolution is a transform-based operation that uses the ±1 Hadamard transform to convert convolution into elementwise multiplication, offering speed and energy efficiency.
- It reduces computational complexity to O(n log n) by diagonalizing convolution operations, making it ideal for low-power hardware and scalable deep neural network implementations.
- Recent studies demonstrate its versatility in enhancing neural network layers, model compression, time series feature extraction, and even fractional calculus computations.
Hadamard-domain convolution refers to a class of convolution-like operations performed using the Hadamard (or Walsh–Hadamard) transform or, in a distributional analysis context, to multiplicative convolution (in contrast to the classical additive convolution). This approach leverages the algebraic and computational properties of the Hadamard transform—an orthogonal linear transform with elements ±1—which enables highly efficient, multiplication-free, and in some cases non-parametric implementations of convolutional and pointwise operators in a variety of domains, including signal processing, deep neural networks, time series analysis, and fractional calculus. Both the foundational aspects and practical algorithmic constructions of Hadamard-domain convolution are rich and technically diverse.
1. Mathematical Foundations and Definitions
The Hadamard transform, denoted $H_N$ for size $N = 2^m$, is a real orthogonal (up to scaling) matrix with entries in $\{+1, -1\}$. It satisfies $H_N H_N^{\top} = N I_N$ and is constructed recursively via $H_1 = (1)$, $H_{2N} = \begin{pmatrix} H_N & H_N \\ H_N & -H_N \end{pmatrix}$. The Walsh–Hadamard transform (WHT) of a vector $x \in \mathbb{R}^N$ is $X = H_N x$, and the inverse is $x = \frac{1}{N} H_N X$.
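As a concrete illustration of the recursion and of the relation $H_N H_N^{\top} = N I_N$, the following minimal NumPy sketch builds $H_N$ by the Sylvester construction and applies the fast Walsh–Hadamard transform using additions only (the function names are illustrative, not taken from the cited works):

```python
import numpy as np

def hadamard_matrix(n: int) -> np.ndarray:
    """Sylvester construction of the n x n Hadamard matrix (n a power of two)."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform (natural/Hadamard order), additions only."""
    y = x.astype(float).copy()
    n, h = y.shape[0], 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                y[j], y[j + h] = y[j] + y[j + h], y[j] - y[j + h]
        h *= 2
    return y

N = 8
H = hadamard_matrix(N)
v = np.random.randn(N)
assert np.allclose(H @ H.T, N * np.eye(N))   # H_N H_N^T = N I_N
assert np.allclose(fwht(v), H @ v)           # fast transform matches the matrix form
assert np.allclose(fwht(fwht(v)) / N, v)     # inverse: x = (1/N) H_N (H_N x)
```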
There are two major senses of "convolution" in the Hadamard domain:
- Multiplicative (Hadamard) Convolution on Temperate Distributions: For $S, T \in \mathcal{S}'(\mathbb{R}^d)$ (the space of temperate distributions), the $\star$-product is defined by duality as
$$\langle T \star S, \varphi \rangle = \big\langle T_x, \langle S_y, \varphi(x \cdot y) \rangle \big\rangle, \qquad \varphi \in \mathcal{S}(\mathbb{R}^d),$$
where $x \cdot y = (x_1 y_1, \ldots, x_d y_d)$ denotes the componentwise product. This operation generalizes classical convolution to multiplicative group actions (Vogt, 2018).
- Transform-Domain Convolution (Signal and Neural Networks): The Hadamard (Walsh) transform diagonalizes the dyadic (Walsh) or circulant convolution:
- Dyadic convolution: For $x, h \in \mathbb{R}^N$,
$$(x \circledast h)[n] = \sum_{m=0}^{N-1} x[m]\, h[n \oplus m],$$
with $\oplus$ being bitwise XOR. By the Hadamard convolution theorem,
$$H_N (x \circledast h) = (H_N x) \odot (H_N h),$$
where $\odot$ denotes elementwise multiplication (Pan et al., 2023); a numerical check of this identity follows the list.
- Pointwise and 2D Convolutional Analogues: By transforming inputs and kernels to the Hadamard domain, convolution becomes elementwise multiplication, analogous to the Fourier convolution theorem but using a real-valued basis (Mannam, 2022, Pan et al., 2022, Pan et al., 2021, Jeong et al., 2019).
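The dyadic convolution theorem above can be checked numerically in a few lines of NumPy; the brute-force `dyadic_conv` helper below is illustrative only, and `scipy.linalg.hadamard` supplies the Sylvester-ordered ±1 matrix:

```python
import numpy as np
from scipy.linalg import hadamard

def dyadic_conv(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Direct dyadic (XOR) convolution: y[n] = sum_m x[m] * h[n XOR m]."""
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        for m in range(n):
            y[i] += x[m] * h[i ^ m]
    return y

N = 8
H = hadamard(N).astype(float)       # Sylvester-ordered +-1 Hadamard matrix
x, h = np.random.randn(N), np.random.randn(N)
lhs = H @ dyadic_conv(x, h)         # WHT of the dyadic convolution
rhs = (H @ x) * (H @ h)             # elementwise product of the WHTs
assert np.allclose(lhs, rhs)        # Hadamard convolution theorem
```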
2. Hadamard-Convolution Operators on Distribution Spaces
Hadamard-type operators are continuous linear maps $L : \mathcal{S}'(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ such that every monomial $x^{\alpha}$ is an eigenvector. Vogt's Theorem establishes that $L$ is of Hadamard type if and only if it is convolution (in the multiplicative sense above) with a unique distribution $T$ in the space of distributions with "θ–rapid decay" both at infinity and near the coordinate hyperplanes (Vogt, 2018). Explicitly, this means $T$ is a finite sum of derivatives $\partial^{\alpha} g_{\alpha}$ of functions $g_{\alpha}$ with
$$\sup_{x} \; \prod_{j=1}^{d} \big( |x_j| + |x_j|^{-1} \big)^{k} \, |g_{\alpha}(x)| < \infty$$
for all multi-indices $\alpha$ and integers $k$. The exponential transform maps the multiplicative convolution structure on the positive quadrant to the usual additive convolution on a Fréchet space of exponentially decreasing functions, revealing deep connections between these operator classes and classical translation-invariant convolution (Vogt, 2018).
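To make this correspondence concrete in the simplest setting (a one-dimensional, function-level illustration rather than the full distributional statement), the multiplicative convolution on $(0, \infty)$,
$$(f \star g)(x) = \int_0^{\infty} f(y)\, g\!\left(\frac{x}{y}\right) \frac{dy}{y},$$
turns, after the substitution $x = e^{s}$, $y = e^{t}$ and writing $F(s) = f(e^{s})$, $G(s) = g(e^{s})$, into the ordinary additive convolution
$$(f \star g)(e^{s}) = \int_{-\infty}^{\infty} F(t)\, G(s - t)\, dt = (F * G)(s).$$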
3. Hadamard-Domain Convolution in Neural Networks
Hadamard-domain convolutional layers arise in several neural network constructions:
- 1D/2D Hadamard Layers: Input tensors are transformed along the channel or spatial axes via the fast WHT (FWHT), a nonlinearity (e.g., smooth-thresholding or soft-thresholding) is applied in the transform domain, and the result is mapped back via the inverse FWHT (Pan et al., 2021, Pan et al., 2022); a minimal sketch of this transform-threshold-inverse pattern follows the lists below.
- Hadamard-Perceptron/HT-Block: In 2D, the WHT is applied across rows and columns, followed by trainable scaling, channel mixing (1×1 convolutions), and soft-thresholding in the transform domain, then the inverse transform (Pan et al., 2023).
- DWHT Pointwise Convolution: Replaces the learned 1×1 convolution by a fixed DWHT of the channel vector, with zero learnable parameters and only $O(C \log_2 C)$ additions per spatial location for $C$ channels (Jeong et al., 2019).
- Block WHT for Arbitrary Channel Sizes: Applies 1D WHTs to blocks of channels with overlapping windows, avoiding wasteful zero padding, and is especially useful when channel counts are not powers of two (Pan et al., 2022).
- Comparison to Fourier-Domain Convolution: Analogous to FFT-based acceleration, spatial convolution becomes trivial elementwise multiplication in the transform domain. For Hadamard, the kernels and activations remain real, and the transform involves no multiplications, yielding significant speed and parameter advantages (Crasmaru, 2018).
These approaches are particularly effective when:
- Hardware requires low latency and energy (e.g., IoT, edge devices) (Mannam, 2022).
- Model compression and runtime are prioritized over marginal accuracy losses.
- Multiplication-free operations are essential, as on low-cost DSP, FPGA, or ASIC platforms (Pan et al., 2021, Jeong et al., 2019).
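A minimal NumPy sketch of the shared pattern behind the layers above, assuming a power-of-two channel count and a fixed (rather than learned) soft-threshold; the function name and the placement of scaling and nonlinearities are illustrative and differ across the cited papers:

```python
import numpy as np
from scipy.linalg import hadamard

def wht_pointwise(x: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Transform-domain stand-in for a 1x1 convolution.

    x: feature map of shape (H, W, C) with C a power of two.
    Pipeline: channel-wise WHT -> soft-thresholding -> inverse WHT.
    In hardware the +-1 matrix products reduce to additions/subtractions via the fast WHT.
    """
    C = x.shape[-1]
    assert C & (C - 1) == 0, "channel count must be a power of two"
    Hc = hadamard(C).astype(x.dtype)                          # symmetric +-1 matrix
    X = x @ Hc                                                # WHT along the channel axis
    X = np.sign(X) * np.maximum(np.abs(X) - threshold, 0.0)  # soft-thresholding
    return (X @ Hc) / C                                       # inverse WHT: (1/C) H_C X

feat = np.random.randn(4, 4, 8).astype(np.float32)
print(wht_pointwise(feat).shape)   # (4, 4, 8)
```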
4. Computational Properties and Empirical Performance
Hadamard-domain convolution leverages the following computational features:
| Convolution Type | Complexity per Location | Parameters | Multiplies |
|---|---|---|---|
| Standard 1×1 conv | $O(C_{\mathrm{in}} C_{\mathrm{out}})$ | $C_{\mathrm{in}} C_{\mathrm{out}}$ | $C_{\mathrm{in}} C_{\mathrm{out}}$ |
| Hadamard/WHT-based 1D | $O(C \log_2 C)$ additions | $O(C)$ thresholds or $0$ | $0$ (adds only) |
| 2D WHT-based | $O(N^2 \log_2 N)$ additions per $N \times N$ block | $O(N)$ or $0$ | $0$ (adds only) |
| FFT-based conv | $O(N^2 \log_2 N)$ per $N \times N$ block | same as spatial kernel | $O(N^2 \log_2 N)$ (complex) |
- Speed and Memory: For typical MobileNet or ResNet blocks, the forward pass with a WHT layer is substantially faster and uses substantially less RAM than the corresponding convolutional layer at the same batch size, e.g., $1.886$ s for the convolutional layer vs. $0.077$ s for the 2D WHT layer (Pan et al., 2022).
- Parameter Reduction: Replacing one third of the 1×1 convolutions in MobileNet-V2 yields a substantial reduction in trainable parameters with only a $1.75$ percentage-point accuracy loss on CIFAR-10 (Pan et al., 2022).
- Energy Consumption: For a given kernel size, energy per pass for the Hadamard method is lower than for direct convolution as long as the kernel is not too large, due to vastly reduced multiplications: only the elementwise product in the transform domain requires multiplies (Mannam, 2022); a back-of-the-envelope operation count follows this list.
- Empirical Results: On MNIST, Hadamard-domain and classical convolution achieve near-identical test accuracy; on CIFAR-10/100 with multi-channel images, the Hadamard method underperforms by 2–9 points depending on architecture and layer replacement ratio (Mannam, 2022, Pan et al., 2021, Jeong et al., 2019).
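For orientation, a back-of-the-envelope operation count per spatial location (illustrative figures assuming 256 input and output channels, not measurements from the cited papers):

```python
import math

# Per spatial location: a 1x1 convolution needs C_in * C_out multiply-accumulates,
# while a channel-axis fast WHT needs roughly C * log2(C) additions and no multiplies.
C_in = C_out = 256
conv1x1_multiplies = C_in * C_out                 # 65,536 multiplies
fwht_additions = int(C_in * math.log2(C_in))      # 2,048 additions, 0 multiplies
print(conv1x1_multiplies, fwht_additions)         # 65536 2048
```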
5. Variant Constructions and Practical Implementations
Several practical variants and algorithmic improvements have been developed:
- Smooth-Thresholding and Denoising: Nonlinearities such as soft-thresholding, $y = \operatorname{sign}(x)\,\max(|x| - T, 0)$, and its smooth variants are applied in the transform domain, often with per-coefficient (but not per-connection) learned thresholds $T$ (Pan et al., 2021, Pan et al., 2022). These serve as efficient, channel-wise regularizers.
- Multiplication-Free Depthwise Convolution: Local block operations built from multiplication-free operators realize depthwise convolution using only sign and addition operations (Pan et al., 2021).
- Learned Permutations and Generalized Bases: One can replace the fixed Hadamard basis by learned orthogonal matrices for data-dependent transforms (Mannam, 2022).
- Quantum Implementation: The Hadamard transform is realizable by Hadamard gates in quantum circuits; hybrid quantum-classical schemes thereby perform some or all of the convolutional transform steps on a quantum processor (Pan et al., 2023).
- Time Series Feature Extraction: Columns of Hadamard matrices are used as orthogonal, ±1-valued convolution kernels in the ROCKET feature-extraction framework, yielding state-of-the-art noise robustness and at least 2× faster training compared to random-kernel approaches (Hao et al., 3 Nov 2025); a minimal sketch follows this list.
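A minimal sketch of the last idea under stated assumptions (kernel length 8, max and proportion-of-positive-values pooling in the spirit of ROCKET; the helper name is hypothetical and the exact recipe in the cited work may differ):

```python
import numpy as np
from scipy.linalg import hadamard

def hadamard_kernel_features(series: np.ndarray, kernel_len: int = 8) -> np.ndarray:
    """Convolve a 1D series with the +-1 columns of a Hadamard matrix and pool.

    Per kernel, keep the maximum response and the proportion of positive values.
    """
    K = hadamard(kernel_len).astype(float)
    feats = []
    for k in range(kernel_len):
        resp = np.convolve(series, K[:, k], mode="valid")
        feats.append(resp.max())
        feats.append((resp > 0).mean())
    return np.array(feats)

ts = np.sin(np.linspace(0, 6 * np.pi, 128)) + 0.1 * np.random.randn(128)
print(hadamard_kernel_features(ts).shape)   # (16,)
```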
6. Extensions, Limitations, and Theoretical Implications
Extensions:
- Fractional Calculus: Hadamard convolution appears in Caputo–Hadamard fractional derivative discretization. Convolution quadrature methods for subdiffusion equations are extended to the logarithmic (Hadamard) case, with high-order correction for initial singularities (Yin et al., 2023); the underlying logarithmic-kernel derivative is recalled after this list.
- Full Frequency-Domain Networks: Expressing all layers as transform-domain operations (e.g., via DFT or DWT) using DFT/Hadamard convolution theorems allows backpropagation entirely in the frequency domain, with specialized activations and loss (Crasmaru, 2018).
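For orientation, the Caputo–Hadamard fractional derivative of order $0 < \alpha < 1$ (standard definition, independent of the particular discretization in the cited work) replaces the power-law kernel of the Caputo derivative with a logarithmic one,
$${}^{CH}D^{\alpha}_{a,t} f(t) = \frac{1}{\Gamma(1-\alpha)} \int_{a}^{t} \left(\log\frac{t}{s}\right)^{-\alpha} \delta f(s)\, \frac{ds}{s}, \qquad \delta f(s) = s\, f'(s),$$
so the convolution-quadrature machinery acts on a convolution in $\log t$ rather than in $t$.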
Limitations:
- Accuracy-Complexity Tradeoff: On high-dimensional, multi-channel, or highly textured inputs, Hadamard-based methods exhibit increased error relative to learned spatial convolution unless more sophisticated transforms or hybrid approaches are employed (Mannam, 2022, Pan et al., 2021).
- Power-of-Two Constraints: Direct (F)WHT requires padding of non-power-of-two channel/spatial sizes, potentially introducing inefficiency or output truncation (Pan et al., 2022).
- Dyadic vs. Spatial Locality: Walsh (dyadic) or circulant convolution differs structurally from spatial sliding-window convolution, affecting feature localization; HT-blocks only approximate true spatial relationships (Pan et al., 2023).
- Backward Compatibility and Layer Cascade: Cascading multiple Hadamard-domain layers often requires repeated forward/inverse transforms, unless staying exclusively in the transform domain (Mannam, 2022, Crasmaru, 2018). A plausible implication is that end-to-end Hadamard networks may benefit from trainable or hybrid frequency-domain operations.
Theoretical context: In the distributional setting, Hadamard-domain convolution provides a complete classification of all continuous linear operators on for which monomials are eigenvectors, revealing connections to both classical analysis and functional-analytic structure (Vogt, 2018).
7. Summary and Significance
Hadamard-domain convolution unifies a broad spectrum of transform-based operations across mathematical analysis, signal processing, neural network architecture, and numerical methods for differential equations. Its defining property—the transformation of convolution or correlation to elementwise operations in the Hadamard basis—enables orders-of-magnitude reductions in memory, energy, and computational complexity, especially in resource-constrained environments. Recent works illustrate both rigorous mathematical classification for operator theory (Vogt, 2018) and diverse algorithmic proposals for deep learning (Pan et al., 2022, Pan et al., 2021, Jeong et al., 2019), low-power vision (Mannam, 2022), quantum-classical hybrid systems (Pan et al., 2023), and robust, efficient time-series feature engineering (Hao et al., 3 Nov 2025). The breadth of methods and empirical successes underline the Hadamard domain's enduring centrality in efficient and theoretically grounded convolutional computation.