Hybrid QKAN: Quantum-Inspired KANs
- Hybrid QKAN (HQKAN) is a hybrid neural architecture that combines quantum variational activations with classical Kolmogorov–Arnold Networks to achieve exponential expressivity per parameter.
- It utilizes single-qubit DARUAN modules and latent bottlenecking to significantly reduce parameter counts while supporting scalable architectures like Transformers and LSTMs.
- HQKAN offers tractable gradient-based training through analytic gradients and parameter-shift rules, ensuring compatibility with both classical simulations and near-term quantum devices.
Hybrid Quantum-inspired Kolmogorov–Arnold Networks (HQKANs) are a class of hybrid neural architectures that integrate quantum variational activation functions—realized as single-qubit Data Re-Uploading Activation (DARUAN) modules—within classical Kolmogorov–Arnold Networks (KANs). This hybridization aims to combine quantum-inspired spectral expressivity with classical scalability, yielding models that achieve exponential expressivity per parameter, parameter efficiency, and tractable training even at large scale. The HQKAN framework subsumes feedforward, hierarchical, and recurrent architectures, and is implementable on both classical hardware (via analytic simulation) and near-term quantum devices (Jiang et al., 17 Sep 2025, Hsu et al., 4 Dec 2025, Werner et al., 27 Jun 2025).
1. Mathematical Foundation and Motivation
The Kolmogorov–Arnold representation theorem establishes that any continuous multivariate function can be decomposed into sums and compositions of univariate functions. Classical KANs operationalize this by representing layer outputs as
$$x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}(x_{l,i}),$$
where each $\phi_{l,j,i}$ is a learnable univariate activation.
Quantum extensions exploit parameterized variational quantum circuits (VQCs) to realize quantum variational activation functions (QVAFs), which exponentially expand the frequency spectrum accessible for function approximation. The single-qubit Data Re-Uploading Activation (DARUAN) circuit takes a scalar $x$ and, via repeated parameterized rotations and data re-encoding, outputs the expectation value
$$f(x; \boldsymbol{\theta}, \boldsymbol{\omega}) = \langle 0 |\, U^\dagger(x; \boldsymbol{\theta}, \boldsymbol{\omega})\, M\, U(x; \boldsymbol{\theta}, \boldsymbol{\omega})\, | 0 \rangle$$
for a parameterized unitary $U$ and a fixed observable $M$ (e.g., Pauli-$Z$). With trainable Fourier frequencies $\omega_j$ and depth $D$, the number of representable frequencies grows as $O(2^D)$ (with geometric frequencies $\omega_j = 2^{j-1}$, the spectrum fills the integer interval $[-(2^D - 1), 2^D - 1]$). Thus, for target error $\epsilon$, only $O(\log(1/\epsilon))$ circuit parameters are required, an exponential reduction relative to classical Fourier activations (Jiang et al., 17 Sep 2025).
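Because DARUAN uses only a single qubit, its expectation value can be evaluated exactly with 2×2 matrices. The sketch below is a hedged illustration: the specific gate layout (trainable RY angles alternating with RZ data-encoding rotations) and the Pauli-$Z$ readout are assumptions chosen to match the data re-uploading picture, not the papers' exact circuit.

```python
import numpy as np

def ry(a):
    """Single-qubit rotation about Y."""
    return np.array([[np.cos(a / 2), -np.sin(a / 2)],
                     [np.sin(a / 2),  np.cos(a / 2)]], dtype=complex)

def rz(a):
    """Single-qubit rotation about Z."""
    return np.array([[np.exp(-1j * a / 2), 0],
                     [0, np.exp(1j * a / 2)]], dtype=complex)

def daruan(x, thetas, omegas):
    """Depth-D data re-uploading activation (illustrative gate layout):
    alternate a trainable RY(theta_j) with a data-encoding RZ(omega_j * x),
    then return the Pauli-Z expectation value in [-1, 1]."""
    psi = np.array([1.0, 0.0], dtype=complex)        # start in |0>
    for theta, omega in zip(thetas, omegas):
        psi = rz(omega * x) @ (ry(theta) @ psi)      # re-encode x each layer
    z_obs = np.diag([1.0, -1.0])                     # Pauli-Z observable
    return float(np.real(np.conj(psi) @ (z_obs @ psi)))
```

Each additional re-uploading layer enriches the Fourier content of `daruan` as a function of `x`, which is the mechanism behind the exponential spectrum growth.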
Naïve QKAN layers embedded directly in MLP-scale architectures face impractically high parameter counts (scaling quadratically in input/output dimension). HQKAN addresses this via dimensionality bottlenecking combined with a quantum-inspired latent core, maintaining exponential expressivity while drastically reducing total parameterization.
2. HQKAN Architectural Design
The core HQKAN module inserts a low-dimensional (latent) bottleneck between two classical linear maps, sandwiching a compact QKAN (QVAF) core. For input dimension $d_{\mathrm{in}}$, output dimension $d_{\mathrm{out}}$, and latent dimension $d_{\mathrm{latent}} \ll d_{\mathrm{in}}, d_{\mathrm{out}}$, a single HQKAN layer is:
- Down-projection: $z = A x$, with $A \in \mathbb{R}^{d_{\mathrm{latent}} \times d_{\mathrm{in}}}$
- QKAN core: apply a QVAF to each latent coordinate $z_i$ via DARUAN: $\phi_i = f_{\mathrm{DARUAN}}(z_i; \theta_i, \omega_i)$
- Up-projection: $y = B \phi$, with $B \in \mathbb{R}^{d_{\mathrm{out}} \times d_{\mathrm{latent}}}$
Pseudocode:

```python
z = A @ x                                         # down-project to latent space
φ = [QVAF_DARUAN(z[i], θ_Q[i], ω_Q[i])            # per-coordinate DARUAN
     for i in range(d_latent)]
y = B @ φ                                         # up-project to output space
return y
```
For hierarchical and sequence models, the HQKAN latent core can be wrapped within encoder–decoder structures (JHCG Net), supporting stacked or modular abstraction. In recurrent settings such as HQKAN-LSTM, each LSTM gate’s affine map is replaced by a sum of QVAF modules, with optional latent bottlenecking for further compression (Hsu et al., 4 Dec 2025).
3. Training, Initialization, and Optimization
HQKAN models are optimized using standard gradient-based routines (e.g., Adam, L-BFGS), with gradients for quantum circuit parameters obtained via the parameter-shift rule
$$\partial_\theta f = \tfrac{1}{2}\left[ f\!\left(\theta + \tfrac{\pi}{2}\right) - f\!\left(\theta - \tfrac{\pi}{2}\right) \right].$$
When simulated classically, QVAF operations can instead leverage closed-form analytic gradients, yielding compatibility with PyTorch-style autodiff (Hsu et al., 4 Dec 2025). The classical maps $A, B$ are initialized via Glorot procedures; quantum circuit angles are initialized near identity (small-angle rotations) to mitigate barren plateaus; QVAF encoding weights may be set geometrically ($\omega_j = 2^{j-1}$) or trained.
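The parameter-shift rule gives an exact gradient (not a finite-difference approximation) for gates generated by Pauli operators. A minimal self-contained check, using the textbook example $\langle Z \rangle$ after $R_Y(\theta)|0\rangle = \cos\theta$:

```python
import numpy as np

def expectation(theta):
    """<Z> after RY(theta)|0>; analytically this equals cos(theta)."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return float(psi[0] ** 2 - psi[1] ** 2)

def parameter_shift(f, theta):
    """Exact gradient via the parameter-shift rule for Pauli-generated gates."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))

theta = 0.4
grad = parameter_shift(expectation, theta)  # matches d/dtheta cos(theta) = -sin(theta)
```

The same two-evaluation recipe applies per circuit parameter on real quantum hardware, which is what makes HQKAN trainable on NISQ devices without backpropagating through the circuit.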
Losses are task-specific—MSE for regression, cross-entropy for classification, perplexity for autoregressive language modeling. HQKANs offer favorable parameter complexity:
- HQKAN core: $O(d_{\mathrm{latent}} D)$ parameters (for QVAF depth $D$)
- Classical maps: $O(d_{\mathrm{latent}} (d_{\mathrm{in}} + d_{\mathrm{out}}))$ parameters. When $d_{\mathrm{latent}} \ll d_{\mathrm{in}}, d_{\mathrm{out}}$, total parameters are significantly reduced compared to MLPs of similar input/output dimensions.
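A back-of-envelope comparison makes the savings concrete. The per-DARUAN bookkeeping below (one rotation angle plus one encoding weight per re-uploading layer, i.e., $2D$ scalars) is an illustrative assumption for this sketch, not the papers' exact accounting:

```python
# Single-layer parameter counts under assumed bookkeeping (2*D per DARUAN).
d_in, d_out, d_latent, depth = 512, 512, 16, 6

mlp_params = d_in * d_out + d_out                 # dense weight matrix + bias
naive_qkan = d_in * d_out * (2 * depth)           # one QVAF on every edge
hqkan = d_latent * (d_in + d_out) + d_latent * (2 * depth)  # A, B + latent QVAFs

print(mlp_params, naive_qkan, hqkan)
```

Here the naïve QKAN's edge-wise QVAFs make it far larger than the MLP, while the bottlenecked HQKAN layer is over an order of magnitude smaller than both, illustrating why the latent core is essential at MLP scale.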
4. Theoretical Properties and Guarantees
- Exponential spectrum expansion: Given geometric encoding weights $\omega_j = 2^{j-1}$ in a depth-$D$ QVAF, the accessible frequencies in the activation spectrum fill the integer interval $[-(2^D - 1), 2^D - 1]$ (Jiang et al., 17 Sep 2025).
- Parameter efficiency: For a target error $\epsilon$ in the approximation norm, a QVAF (DARUAN) with depth $D = O(\log(1/\epsilon))$ suffices, yielding logarithmic scaling in total quantum parameters and exponentially reducing parameter count compared to Fourier-based KANs (Jiang et al., 17 Sep 2025).
- Hierarchical abstraction and interpretability: HQKANs embedded in encoder–decoder stacks admit explicit frequency-resolved decomposition per layer; in LSTM variants, gate subfunctions can be interpreted as tracking distinct frequency bands (Hsu et al., 4 Dec 2025).
- NISQ-compliance and classical simulability: Single-qubit (non-entangling) circuits allow efficient classical emulation, facilitating deployment on classical hardware or as a backend for future NISQ/QPU hardware (Jiang et al., 17 Sep 2025, Hsu et al., 4 Dec 2025).
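The spectrum-expansion guarantee above can be verified combinatorially: in the Fourier picture of data re-uploading, each encoding layer contributes a frequency in $\{-\omega_j, 0, +\omega_j\}$, and the reachable spectrum is the set of all signed sums. With geometric weights this works like a signed binary expansion:

```python
from itertools import product

depth = 4
omegas = [2 ** (j - 1) for j in range(1, depth + 1)]  # geometric weights 1, 2, 4, 8

# All signed sums sum_j c_j * omega_j with coefficients c_j in {-1, 0, +1}.
spectrum = {sum(c * w for c, w in zip(signs, omegas))
            for signs in product((-1, 0, 1), repeat=depth)}
```

For `depth = 4` this yields every integer in $[-15, 15]$, i.e., $2^D - 1 = 15$ on each side; with uniform weights $\omega_j = 1$ the same construction would reach only $[-D, D]$, which is the exponential-vs-linear gap.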
5. Empirical Results and Benchmarks
Qualitative and quantitative performance of HQKANs is established via a broad set of benchmarks:
| Task/Model | HQKAN | KAN | MLP | QKAN |
|---|---|---|---|---|
| Regression (RMSE, Feynman 66) | Lowest RMSE, ~30% fewer params | Lower RMSE than MLP | – | – |
| Image Classification (CIFAR-10 Top-1, Top-5) | 71.6%, 97.9% (14,370 params) | – | 68.4%, 97.4% (41,802 params) | 68.8%, 97.0% (21,280 params) |
| Language Modeling (GPT-2 Ppl@41M params) | Lower perplexity, faster convergence | – | Ppl@124M params, same training time | – |
| Sequential (LSTM cell, damped SHM, test MSE / R²) | 4.32e-4 / 0.9903 | 1.33e-3 / 0.9701 | – | 1.02e-3 / 0.9771 |
These results consistently show HQKANs either outperforming or matching conventional baselines while utilizing significantly lower parameter counts. Scalability is empirically demonstrated: HQKAN-based models (e.g., KANsformer with Flash-attention) maintain memory usage and iteration times comparable to, or better than, MLP-type architectures when scaling to large batch sizes and distributed hardware (Jiang et al., 17 Sep 2025).
6. Extensions: Hierarchical, Recurrent, and Quantum-Classical Hybrids
- Hierarchical HQKAN/JHCG Net: Implementation of HQKAN as the latent core of encoder–decoder (autoencoder) structures yields stacked abstraction and improved interpretability. Each latent block acts as a KAN with QVAF activations, enabling smooth, additive decompositions of the feature spectrum (Hsu et al., 4 Dec 2025).
- HQKAN-LSTM: Embedding HQKAN gates inside recurrent LSTM structures replaces affine transformations in input, forget, output, and update gates by KAN-style sums of QVAF modules. This achieves lower parameter counts and better R², particularly in regimes dominated by nonlinear temporal dependencies (Hsu et al., 4 Dec 2025).
- Variants (QuKAN, EVQKAN): Related frameworks include QuKAN (Werner et al., 27 Jun 2025), which utilizes quantum circuit Born machines for spline-based KAN residuals within classical networks, and EVQKAN (Wakaura et al., 28 Mar 2025), which constructs variational quantum networks with layerwise tiling for neuromorphic quantum applications.
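The HQKAN-LSTM gate replacement above can be sketched in a few lines. Here a trainable cosine series stands in for the QVAF, matching DARUAN's Fourier-series character as a function of its input; the specific functional form, the averaging normalization, and the parameter shapes are assumptions for illustration:

```python
import numpy as np

def qvaf(x, thetas, omegas):
    """Cosine-series stand-in for a DARUAN activation (illustrative only);
    output lies in [-1, 1]."""
    return sum(np.cos(w * x + t) for t, w in zip(thetas, omegas)) / len(thetas)

def kan_gate(v, thetas, omegas):
    """KAN-style gate pre-activation: a sum of univariate QVAFs, one per
    coordinate of the concatenated [x_t; h_{t-1}] vector v, replacing the
    usual affine map W @ v + b."""
    return sum(qvaf(vi, th, om) for vi, th, om in zip(v, thetas, omegas))

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Example: a forget-gate activation from a 2-dimensional [x_t; h_{t-1}].
v = np.array([0.1, -0.2])
angles = [[0.0, 0.5], [0.3, -0.1]]   # per-coordinate angles (hypothetical values)
freqs = [[1.0, 2.0], [1.0, 2.0]]     # per-coordinate encoding frequencies
f_t = sigmoid(kan_gate(v, angles, freqs))
```

The same substitution applies to the input, output, and update gates; with an optional latent bottleneck in front of `kan_gate`, the per-gate parameter count drops as in the feedforward HQKAN layer.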
7. Interpretability, Scalability, and Hardware Deployment
HQKANs are explicitly designed to maintain the interpretability advantage of KANs: each QVAF activation corresponds to a smooth Fourier-enriched subfunction whose spectral content can be analyzed. Additive decomposition permits frequency tracking per layer and explicit parameter mappings between quantum amplitudes and classical spline coefficients (Werner et al., 27 Jun 2025, Hsu et al., 4 Dec 2025).
Scalability is inherent—non-entangling single-qubit circuits have negligible simulation overhead and are easily vectorized on GPUs. All operations are compatible with classical autodiff frameworks, and swapping in hardware quantum backends (for true quantum estimation) is straightforward at the single-qubit layer level (Jiang et al., 17 Sep 2025, Hsu et al., 4 Dec 2025).
Overall, HQKANs unify the Kolmogorov–Arnold functional decomposition, quantum-inspired variational spectral enrichment, and modern autoencoding hierarchies into a modular and efficient family of architectures. They achieve superior expressivity per parameter and reduced memory and computational requirements, while offering interpretability and extensibility to sequential, hierarchical, and large-scale settings (Jiang et al., 17 Sep 2025, Hsu et al., 4 Dec 2025, Werner et al., 27 Jun 2025).