Hadamard-Perceptron (HT-Block) for Efficient Neural Networks
- Hadamard-Perceptron/HT-Block is a neural network module that replaces standard convolutions using the Walsh–Hadamard Transform to perform linear, nonlinear, and channel-mixing operations.
- It employs fast, butterfly-style algorithms with trainable spectral scaling and soft-thresholding, significantly reducing arithmetic complexity, parameter count, and memory footprint.
- HT-Blocks have been integrated into CNNs for image classification, medical imaging, and quantum-classical hybrid systems, demonstrating gains in efficiency and suitability for hardware acceleration.
Hadamard-Perceptron / HT-Block
A Hadamard-Perceptron, also widely known as an HT-Block, is a neural network module that replaces conventional convolutional layers with operations based primarily on the Walsh–Hadamard Transform (WHT). These blocks leverage the computational efficiency and orthogonality of the Hadamard basis, inserting nonlinearity and trainable domain-specific filtering to realize full perceptron-like transformations—linear, nonlinear, and channel-mixing—within the Hadamard domain. Modern HT-Block designs achieve significant reductions in parameter count, arithmetic complexity, and memory footprint relative to classic spatial-domain convolutions, with minimal or no loss of accuracy and, in some regimes, accuracy improvements. These blocks have been systematically integrated into CNN architectures for tasks ranging from image classification to complex medical image correction and have also been implemented in quantum-classical hybrid and in-memory compute frameworks (Pan et al., 2022, Pan et al., 2023, Hamdan et al., 27 Sep 2025, Pan et al., 2023, Zhu, 23 Jun 2025).
1. Mathematical Foundations and Transform Architecture
HT-Block design centers on the Walsh–Hadamard Transform, a binary, orthogonal, separable linear transform:
- The Hadamard matrix $H_N$ (for $N = 2^k$) is constructed recursively: $H_1 = [1]$ and $H_{2N} = \begin{bmatrix} H_N & H_N \\ H_N & -H_N \end{bmatrix}$.
- For a vector $x \in \mathbb{R}^N$, the forward transform is $X = H_N x$ and the inverse is $x = \frac{1}{N} H_N X$ (since $H_N H_N = N I$); a small numerical sketch follows below.
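A minimal NumPy sketch of the recursion and the forward/inverse convention above (function names are illustrative, not taken from the cited papers):

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Recursively build the (unnormalized) Sylvester Hadamard matrix H_n, n a power of two."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def wht(x: np.ndarray) -> np.ndarray:
    """Forward WHT: X = H_N x (no normalization)."""
    return hadamard(x.shape[0]) @ x

def iwht(X: np.ndarray) -> np.ndarray:
    """Inverse WHT: x = (1/N) H_N X."""
    N = X.shape[0]
    return (hadamard(N) @ X) / N

x = np.random.randn(8)
assert np.allclose(iwht(wht(x)), x)  # H_N H_N = N I, so the round trip recovers x
```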
Extending to tensors, two main structural variants are encountered:
- 1D (Block) WHT: The channel dimension is partitioned into blocks whose size is a power of two. Each block is transformed independently with a 1D WHT.
- 2D WHT: Each channel's $N \times N$ spatial map $x$ undergoes a 2D WHT via $X = H_N\, x\, H_N$ (no normalization in the forward pass; $\tfrac{1}{N^2}$ scaling in the inverse $x = \tfrac{1}{N^2} H_N X H_N$).
When input shapes are non-power-of-two, spatial or channel padding and/or blockwise overlap schemes are employed to ensure WHT compatibility (Pan et al., 2022, Hamdan et al., 27 Sep 2025, Zhu, 23 Jun 2025).
The convolution theorem for the WHT—critical for quantum and classical acceleration—states that dyadic convolution in the spatial domain maps to pointwise multiplication in the Hadamard domain, up to index ordering (Pan et al., 2023).
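The theorem can be checked numerically in a few lines; the sketch below uses SciPy's Sylvester-ordered Hadamard matrix and a brute-force XOR convolution (helper names are illustrative):

```python
import numpy as np
from scipy.linalg import hadamard  # Sylvester-ordered Hadamard matrix

def dyadic_conv(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Dyadic (XOR) convolution: z[n] = sum_k x[k] * y[n XOR k]."""
    N = len(x)
    return np.array([sum(x[k] * y[n ^ k] for k in range(N)) for n in range(N)])

N = 8
x, y = np.random.randn(N), np.random.randn(N)
H = hadamard(N)
# The WHT of a dyadic convolution equals the pointwise product of the two WHTs.
assert np.allclose(H @ dyadic_conv(x, y), (H @ x) * (H @ y))
```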
2. Internal Block Structure, Nonlinearities, and Trainable Parameters
A standard HT-Block pipeline comprises:
- (Blockwise) forward 1D/2D WHT on input feature maps.
- Spectral filtering: Elementwise multiplication in the Hadamard domain with location-specific trainable scaling maps (an $N \times N$ map for the 2D spatial transform, or a per-block vector for the 1D channel variant).
- Channel mixing: A 1×1 Conv2D (i.e., fully connected across channels) applied to the transformed channels; in some hardware-aware variants, this is replaced by a multiplication-free (MF) in-memory operator (Hamdan et al., 27 Sep 2025).
- Nonlinearity: Soft-thresholding (ST) or semi-soft-thresholding (SST) in the transform domain. Standard ST is $\mathrm{ST}_T(x) = \operatorname{sign}(x)\max(|x| - T, 0)$; advanced variants combine ST with smoothing for better gradient flow (Pan et al., 2022), or use SST to favor smoother shrinkage of sub-threshold frequencies (Zhu, 23 Jun 2025).
- (Optionally, for quantum or classical multi-path expansions) Concatenate or sum over P independent spectral branches/pathways, each with distinct scaling/thresholding.
- Inverse 1D/2D WHT to recover the filtered, nonlinearly processed feature map.
Residual configurations (adding input to block output) are common, especially when replacing convolutions in ResNet bottlenecks or before global average pooling (Pan et al., 2023, Hamdan et al., 27 Sep 2025).
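The pipeline above can be condensed into a short PyTorch sketch. This is an illustrative module under simplifying assumptions (single path, per-channel scaling/threshold maps, power-of-two spatial size, residual output), not the exact layer from any of the cited papers:

```python
import torch
import torch.nn as nn

def hadamard_matrix(n: int) -> torch.Tensor:
    """Sylvester Hadamard matrix H_n for n a power of two."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0)
    return H

class HTBlock2D(nn.Module):
    """2D HT-Block sketch: forward WHT -> spectral scaling -> soft-threshold -> 1x1 mixing -> inverse WHT, plus residual."""
    def __init__(self, channels: int, size: int):
        super().__init__()
        self.register_buffer("H", hadamard_matrix(size))
        self.scale = nn.Parameter(torch.ones(channels, size, size))    # trainable spectral scaling map
        self.thresh = nn.Parameter(torch.zeros(channels, size, size))  # trainable per-frequency thresholds
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)        # channel mixing in the Hadamard domain

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, N, N), N a power of two
        N = x.shape[-1]
        X = self.H @ x @ self.H                                # forward 2D WHT (unnormalized)
        X = X * self.scale                                     # spectral filtering
        X = torch.sign(X) * torch.relu(X.abs() - self.thresh)  # soft-thresholding nonlinearity
        X = self.mix(X)                                        # channel mixing
        y = (self.H @ X @ self.H) / (N * N)                    # inverse 2D WHT
        return x + y                                           # residual configuration

out = HTBlock2D(channels=16, size=32)(torch.randn(2, 16, 32, 32))
```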
Trainable parameters per HT-Block are:
- Spectral scaling maps: one map over the transform coefficients (spatial frequencies, possibly per channel) for each of the $P$ parallel paths.
- Thresholds: elementwise, per frequency and per path (these can be merged with the scaling maps).
- Channel mixers: a 1×1 Conv2D with $C_\text{in} \times C_\text{out}$ weights (or an MF-projection layer).

The total parameter count per block is thus the scaling and threshold entries summed over the $P$ paths plus the channel-mixer weights; a back-of-the-envelope comparison follows below.
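As a rough illustration, assuming scaling/threshold maps shared across channels (per-channel maps would multiply the first term by $C$), the per-block count can be compared against a 3×3 convolution (function names are hypothetical):

```python
def ht_block_params(N: int, P: int, c_in: int, c_out: int) -> int:
    """Approximate HT-Block parameters: P paths x (N*N scaling + N*N thresholds) + 1x1 channel mixer."""
    return P * 2 * N * N + c_in * c_out

def conv3x3_params(c_in: int, c_out: int) -> int:
    """Parameters of a plain 3x3 convolution (bias ignored)."""
    return 9 * c_in * c_out

# Example: a 256-channel stage on (power-of-two padded) 16x16 feature maps
print(ht_block_params(N=16, P=3, c_in=256, c_out=256))  # 67072
print(conv3x3_params(256, 256))                         # 589824
```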
3. Integration into Network Architectures
HT-Blocks have been systematically inserted at multiple points in modern CNNs:
- Replacing 1x1 convolutions: Use 1D BWHT blocks for channel-to-channel mapping, especially effective for non-power-of-two channel sizes (Pan et al., 2022).
- Replacing 3x3 convolutions and Squeeze-and-Excite (SE): Insert 2D FWHT layers with appropriate scaling and thresholding, matching the input and output channelization. In multi-branch designs (P-path), outputs are summed before inverse WHT (Pan et al., 2023, Pan et al., 2023).
- Pre-GAP blocks: Residual HT-Blocks before flattening, providing global spatial mixing to improve classification accuracy (empirically +0.3–0.5 pp) (Pan et al., 2022).
- Encoder/decoder integration in U-Nets: Channelwise 2D-HT-based blocks in encoder for frequency decomposition, inverse-HT transformer blocks in decoder for global, frequency-domain self-attention (Zhu, 23 Jun 2025).
- Quantum-classical hybrid pipelines: Run HT on quantum hardware via Hadamard gates, perform soft-threshold scaling and classical channel mixing afterwards. These designs leverage the dyadic convolution theorem for efficient quantum execution (Pan et al., 2023).
Practical integration strategies involve replacing only a subset (e.g., every second) of the convolutions in the network, placing a single residual HT-Block before dense classifiers, and selecting the number of paths $P$, block sizes, and scaling/threshold initializations to maximize parameter and computation savings while minimizing accuracy loss (Pan et al., 2022, Pan et al., 2023); a naive module-swapping sketch follows below.
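A deliberately simple sketch of the "replace every second eligible convolution" strategy, reusing the HTBlock2D class from the earlier sketch; it assumes all replaced layers see the same power-of-two spatial size, which real integrations handle per stage (and via padding for non-power-of-two shapes):

```python
import torch.nn as nn

def swap_alternate_convs(model: nn.Module, size: int) -> nn.Module:
    """Replace every second stride-1, channel-preserving 3x3 Conv2d with an HTBlock2D (illustrative only)."""
    count = 0
    for parent in list(model.modules()):
        for child_name, child in list(parent.named_children()):
            if (isinstance(child, nn.Conv2d) and child.kernel_size == (3, 3)
                    and child.in_channels == child.out_channels and child.stride == (1, 1)):
                count += 1
                if count % 2 == 0:  # keep one conv, replace the next, and so on
                    setattr(parent, child_name, HTBlock2D(child.in_channels, size))
    return model
```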
4. Computational and Memory Complexity
HT-Blocks exhibit favorable theoretical and empirical computational profiles:
- Arithmetic complexity: Each 2D WHT (per channel/path) requires only additions/subtractions, on the order of $N^2 \log_2 N$ for an $N \times N$ tile, replacing the $K^2 C_\text{in} C_\text{out}$ multiplies per output position of a $K \times K$ Conv2D with elementwise operations (plus $O(N^2)$ multiplies for scaling/thresholding), yielding up to 66% multiply-count savings in the reported configurations (Pan et al., 2023, Pan et al., 2023).
- Parameter efficiency: 3-path HT-Blocks (commonly $P = 3$) trade the $9\,C_\text{in} C_\text{out}$ weights of a 3×3 Conv2D block for $P$ sets of scaling/threshold maps plus a lightweight mixer, so they are advantageous whenever the channel product dominates, and can reduce parameters by up to 77.8% in classification models (Pan et al., 2022).
- Memory requirements: 2D-FWHT blocks have lower peak RAM usage; for example, on Jetson Nano, 2D-FWHT peaks at 1.45 GB vs. 1.49 GB for 3x3 conv (Pan et al., 2022).
- Inference throughput: On embedded hardware, 2D-FWHT layers provide order-of-magnitude acceleration (up to 24x) compared to standard convolutions (Pan et al., 2022).
- Multiplication-avoiding blocks: Designs using SRAM-based in-memory compute eliminate all explicit multiplications in channel mixing, replacing each MAC with two additions plus sign logic, further reducing total arithmetic requirements by over 50% in hybrid HTMA-Nets (HT + MF) (Hamdan et al., 27 Sep 2025).
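For intuition, a toy per-layer multiply count under the same simplifying assumptions as the parameter sketch (the WHT itself contributes no multiplies; whole-network savings reported in the papers are smaller because only some layers are replaced):

```python
def conv2d_multiplies(N: int, k: int, c_in: int, c_out: int) -> int:
    """Multiplies of a k x k convolution over an N x N map (stride 1, 'same' padding)."""
    return k * k * c_in * c_out * N * N

def ht_block_multiplies(N: int, P: int, c_in: int, c_out: int) -> int:
    """HT-Block multiplies: spectral scaling over P paths plus the 1x1 channel mixer; the WHT is add-only."""
    return P * c_in * N * N + c_in * c_out * N * N

N, c = 16, 256
ratio = ht_block_multiplies(N, P=3, c_in=c, c_out=c) / conv2d_multiplies(N, 3, c, c)
print(f"HT-Block uses ~{ratio:.0%} of the Conv2D multiplies for this layer")  # ~11%
```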
5. Empirical Performance and Use-Cases
HT-Blocks have demonstrated competitive or superior results to baseline CNNs across image recognition, quantum-classical computing, and medical imaging:
| Architecture / Task | Params Reduced | Multiplies Reduced | Accuracy Impact | Reference |
|---|---|---|---|---|
| MobileNet-V2 (CIFAR-10, HT-Block) | 77.8% ↓ | — | –1.75 pp | (Pan et al., 2022) |
| MobileNet-V3-L (CIFAR-100) | 48.6% ↓ | — | –0.33 pp | (Pan et al., 2022) |
| ResNet-20 (CIFAR-10, partial HT) | 51.3% ↓ | — | –1.48 pp | (Pan et al., 2022) |
| ResNet-50 (ImageNet, 3-path) | 11.5% ↓ | 12.6% ↓ | +0.30 pp | (Pan et al., 2023) |
| ResNet-18 (CIFAR-10, HTMA-Net) | 11.1M → 9.86M (↓11%) | 555M → 254M (↓54.4%) | –0.02 pp | (Hamdan et al., 27 Sep 2025) |
| VHU-Net (MRI, HT blocks) | N/A | N/A | SOTA bias correction | (Zhu, 23 Jun 2025) |
The inclusion of nonlinearity (ST or SST) in the Hadamard domain is essential; simple ReLU severely underperforms. Weighted thresholds provide further minor gains. In quantum pipelines, the HT implementation is native and efficient, leveraging the Hadamard gate's action on tensor-product Hilbert spaces (Pan et al., 2023).
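The contrast between soft-thresholding and ReLU on Hadamard coefficients, which are signed and roughly zero-centered, is easy to see numerically (a minimal NumPy illustration):

```python
import numpy as np

def soft_threshold(x: np.ndarray, T: float) -> np.ndarray:
    """ST_T(x) = sign(x) * max(|x| - T, 0): shrinks small coefficients of either sign toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - T, 0.0)

coeffs = np.array([-3.0, -0.2, 0.1, 2.5])
print(soft_threshold(coeffs, T=0.5))  # small coefficients zeroed, large ones shrunk, signs preserved
print(np.maximum(coeffs, 0.0))        # ReLU discards every negative coefficient regardless of magnitude
```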
In medical imaging, stacking ConvHTBlocks with semi-soft thresholding suppresses high-frequency bias artifacts during encoding, while inverse-HT transformer blocks in the decoder enable global feature recovery with frequency-aware attention, resulting in superior MRI bias correction (Zhu, 23 Jun 2025).
6. Hardware Acceleration and In-Memory Computing
HT-Blocks are naturally suited to both software and hardware acceleration:
- Fast transform implementation: The WHT admits a butterfly-style fast algorithm, requiring only $O(N \log N)$ additions and subtractions and no multiplications. This property is key to both CPU and embedded inference speedups (Pan et al., 2022).
- Multiplication-avoiding (MA) compute: In HTMA-Nets, all channel mixing in the Hadamard domain is performed via SRAM-based in-memory addition, using MF operators built from additions and sign logic in place of multiplications (see the sketch after this list), eliminating multiplicative logic (Hamdan et al., 27 Sep 2025). This approach yields up to 52% reduction in multiplies with negligible accuracy impact.
- Quantum processing: HT can be implemented on quantum hardware at constant circuit depth per patch, since the transform of a length-$2^n$ block is realized by one Hadamard gate on each of $n$ qubits ($H^{\otimes n}$); all power-of-two spatial blocks are therefore efficiently mapped to quantum circuits (Pan et al., 2023).
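A commonly used multiplication-free operator consistent with the "two additions plus sign logic" description above is $a \oplus b = \operatorname{sign}(a b)(|a| + |b|)$; the exact operator and its in-memory mapping in HTMA-Net may differ, so the following is only a sketch of the general idea:

```python
import numpy as np

def mf_product(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiplication-free surrogate for elementwise a*b: sign(a)*sign(b)*(|a| + |b|)."""
    return np.sign(a) * np.sign(b) * (np.abs(a) + np.abs(b))

def mf_dot(a: np.ndarray, b: np.ndarray) -> float:
    """MF replacement of a dot product: each MAC becomes one add inside mf_product plus one add to accumulate."""
    return float(np.sum(mf_product(a, b)))

a, b = np.random.randn(8), np.random.randn(8)
print(mf_dot(a, b), "vs true dot product", float(a @ b))  # correlated but not equal: MF is a surrogate, not an identity
```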
A plausible implication is that HT-Block–based designs will see increasing hardware specialization, especially for edge and energy-constrained platforms, due to the transform’s binary structure and compatibility with in-memory architectures.
7. Advanced Variants and Domain-Specific Extensions
Recent work has generalized HT-Block design in several directions:
- Hybrid architectures (e.g., ConvHTBlock in VHU-Net): Interleaving convolutional, spectral (Hadamard), and transformer blocks to combine local and global, spatial and spectral, and channel and token interactions (Zhu, 23 Jun 2025).
- Variational and probabilistic objectives: ELBO-based objectives, imposing sparsity on Hadamard coefficients in medical applications for robust bias removal (Zhu, 23 Jun 2025).
- Quantum–classical hybrid models: HT layers run on quantum circuits, enabling dyadic convolution acceleration (Pan et al., 2023).
The Hadamard-Perceptron paradigm continues to gain traction due to its analytical tractability, hardware synergy, and empirical efficiency. Empirical evidence consistently shows that careful integration of HT-Blocks—controlling for block size, branch count, and level of replacement—yields strong accuracy–efficiency trade-offs in both standard and application-specific CNNs.
References
- (Pan et al., 2022) Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural Networks
- (Pan et al., 2023) Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets
- (Hamdan et al., 27 Sep 2025) HTMA-Net: Towards Multiplication-Avoiding Neural Networks via Hadamard Transform and In-Memory Computing
- (Pan et al., 2023) A Hybrid Quantum-Classical Approach based on the Hadamard Transform for the Convolutional Layer
- (Zhu, 23 Jun 2025) VHU-Net: Variational Hadamard U-Net for Body MRI Bias Field Correction