Quantum-inspired Kolmogorov-Arnold LSTM
- The paper presents QKAN-LSTM, an architecture that integrates quantum-inspired variational activation functions into LSTM gates to enhance temporal modeling and reduce parameters by up to 79%.
- It replaces static activation functions with DARUAN modules that use single-qubit quantum circuits to generate adaptive, frequency-rich representations for improved spectral expressivity.
- Empirical evaluations on tasks like urban telecommunication forecasting and oscillatory systems demonstrate that QKAN-LSTM (and its hybrid extension) outperforms classical LSTMs with enhanced interpretability and scalability.
The Quantum-inspired Kolmogorov–Arnold Long Short-Term Memory (QKAN-LSTM) is a recurrent neural network (RNN) architecture that combines the frequency-adaptive expressivity of quantum-inspired Kolmogorov–Arnold networks with the proven temporal modeling capabilities of the LSTM. At its core, QKAN-LSTM introduces Data Re-Uploading Activation (DARUAN) modules—quantum variational activation functions—into the standard LSTM gating structure, enabling a parameter-efficient, highly expressive representation of complex temporal and nonlinear dependencies. The model retains full compatibility with classical hardware and provides enhanced interpretability, scalability, and performance in real-world sequential modeling tasks (Hsu et al., 4 Dec 2025).
1. Classical LSTM Structure and Limitations
Standard LSTM cells process an input sequence using three gates (forget $f_t$, input $i_t$, output $o_t$) and an internal cell state $C_t$, evolving as follows:

$$\begin{aligned}
f_t &= \sigma\bigl(W_f [h_{t-1}, x_t] + b_f\bigr)\\
i_t &= \sigma\bigl(W_i [h_{t-1}, x_t] + b_i\bigr)\\
\tilde{C}_t &= \tanh\bigl(W_C [h_{t-1}, x_t] + b_C\bigr)\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
o_t &= \sigma\bigl(W_o [h_{t-1}, x_t] + b_o\bigr)\\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}$$

where $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $[h_{t-1}, x_t]$ is the concatenated gate input. Conventional LSTMs exhibit high parameter redundancy across the four weight matrices ($W_f, W_i, W_C, W_o$) and limited spectral expressivity, since both $\sigma$ and $\tanh$ are static nonlinearities. Interpretability with respect to frequency content is also poor.
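The update equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; fusing the four gate weight matrices into one `W` is a common convenience choice, assumed here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One classical LSTM step. W maps the concatenated gate input
    v_t = [h_{t-1}, x_t] to all four gate pre-activations at once."""
    v = np.concatenate([h_prev, x_t])
    z = W @ v + b                      # shape (4*hidden,)
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])                # forget gate
    i = sigmoid(z[H:2*H])              # input gate
    c_tilde = np.tanh(z[2*H:3*H])      # candidate cell state
    o = sigmoid(z[3*H:4*H])            # output gate
    c = f * c_prev + i * c_tilde       # new cell state
    h = o * np.tanh(c)                 # new hidden state
    return h, c

rng = np.random.default_rng(0)
x_dim, h_dim = 3, 4
W = rng.normal(size=(4*h_dim, x_dim + h_dim)) * 0.1
b = np.zeros(4*h_dim)
h, c = lstm_step(rng.normal(size=x_dim), np.zeros(h_dim), np.zeros(h_dim), W, b)
```

Note the parameter count: the single fused matrix already has $4H(H + D)$ entries, which is the redundancy QKAN-LSTM targets.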
2. Data Re-Uploading Activation (DARUAN) and Quantum Variational Activation Functions
The DARUAN module constitutes the quantum variational activation function (QVAF) at the core of QKAN-LSTM. Unlike traditional activations, a DARUAN implements a Kolmogorov–Arnold decomposition using single-qubit data re-uploading, establishing an adaptive spectral basis for activation.
Kolmogorov–Arnold Representation
Classically, the Kolmogorov–Arnold representation writes a multivariate function $f$ of input $x = (x_1, \dots, x_n)$ as a sum of univariate subfunctions:

$$f(x) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$

In QKAN-LSTM, each univariate block $\phi_{q,p}$ is replaced by a quantum subfunction, evaluated as the expected value of an observable after applying a parameterized single-qubit circuit to the input encoding.
Quantum Variational Activation Function Construction
Each QVAF is

$$f(x;\boldsymbol{\theta}) = \langle 0 |\, U^\dagger(x,\boldsymbol{\theta})\, M\, U(x,\boldsymbol{\theta})\, | 0 \rangle$$

where the circuit $U$ is a stack of $L$ data re-uploading blocks,

$$U(x,\boldsymbol{\theta}) = \prod_{l=1}^{L} S(x)\, W(\theta_l), \qquad S(x) = e^{-i x H},$$

with $M$ a Hermitian observable ($M^\dagger = M$), typically $M = \sigma_z$, and $W(\theta_l)$ a trainable single-qubit gate. Stacking $L$ re-uploading blocks introduces expanding frequency content, achieving rich Fourier decompositions.
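Because only one qubit is involved, the whole construction reduces to 2×2 matrix algebra. The sketch below assumes a concrete gate choice (trainable $R_y$ rotations, $\sigma_z/2$ as the encoding generator, $\sigma_z$ as the observable) that the paper may vary; the final FFT check illustrates the Fourier claim, since $L$ blocks with this encoding can only populate integer frequencies $|k| \le L$.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(a):  # trainable rotation exp(-i a Y / 2)
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(a):  # data-encoding gate exp(-i a Z / 2)
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def daruan(x, thetas):
    """Classically simulated single-qubit data re-uploading activation:
    alternate trainable Ry rotations with re-encodings of x, then
    measure <Z>."""
    psi = np.array([1, 0], dtype=complex)
    for th in thetas:
        psi = rz(x) @ (ry(th) @ psi)
    return float(np.real(psi.conj() @ (Z @ psi)))

# With L blocks, f(x) is a truncated Fourier series with |k| <= L.
L = 3
rng = np.random.default_rng(1)
thetas = rng.uniform(0, 2 * np.pi, L)
xs = np.linspace(0, 2 * np.pi, 64, endpoint=False)
ys = np.array([daruan(x, thetas) for x in xs])
coeffs = np.fft.fft(ys) / len(xs)
leakage = max(abs(coeffs[k]) for k in range(L + 1, 64 - L))  # should be ~0
```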
Parameter updates for the quantum parameters employ the parameter-shift rule:

$$\frac{\partial f}{\partial \theta_l} = \frac{1}{2}\left[ f\!\left(\theta_l + \frac{\pi}{2}\right) - f\!\left(\theta_l - \frac{\pi}{2}\right) \right]$$
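The rule is exact (not a finite-difference approximation) for gates generated by involutory operators such as $R_y$. A one-gate sanity check, where the expectation has the closed form $f(\theta) = \cos\theta$:

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]])

def expval(theta):
    """f(theta) = <0| Ry(theta)^dag Z Ry(theta) |0> = cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return psi @ Z @ psi

theta = 0.7
shift_grad = 0.5 * (expval(theta + np.pi / 2) - expval(theta - np.pi / 2))
exact_grad = -np.sin(theta)  # analytic derivative of cos(theta)
```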
3. Integration of DARUAN (QVAF) in LSTM Gates
QKAN-LSTM replaces each affine transformation and activation in the LSTM gates with a KAN sum of QVAFs:

$$\Phi_g(v_t) = \sum_{j=1}^{d} f_{g,j}(v_{t,j}; \boldsymbol{\theta}_{g,j}), \qquad g \in \{f, i, C, o\},$$

where $v_t = [h_{t-1}, x_t] \in \mathbb{R}^d$ and each $f_{g,j}$ is a DARUAN subfunction.
The QKAN-LSTM updates adopt the canonical form:

$$\begin{aligned}
f_t &= \sigma(\Phi_f(v_t))\\
i_t &= \sigma(\Phi_i(v_t))\\
\tilde{C}_t &= \tanh(\Phi_C(v_t))\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
o_t &= \sigma(\Phi_o(v_t))\\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}$$

Each gate thus learns adaptive, frequency-rich nonlinearities, enhancing the model’s ability to represent oscillatory and nonlinear dependencies efficiently.
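Putting the pieces together, one QKAN-LSTM step can be sketched as follows. The per-gate, per-hidden-unit, per-input parameter layout (`(H, H+D, L)` angle tensors) is an assumption made for this illustration; the paper's exact parameter sharing may differ.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(a):
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def qvaf(x, thetas):
    """Single-qubit data re-uploading activation, <Z> after L blocks."""
    psi = np.array([1, 0], dtype=complex)
    for th in thetas:
        psi = rz(x) @ (ry(th) @ psi)
    return float(np.real(psi.conj() @ (Z @ psi)))

def phi(v, gate_thetas):
    """KAN-style gate pre-activation: one QVAF per input dimension, summed."""
    return np.array([sum(qvaf(v[j], gate_thetas[h, j]) for j in range(len(v)))
                     for h in range(gate_thetas.shape[0])])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def qkan_lstm_step(x_t, h_prev, c_prev, params):
    v = np.concatenate([h_prev, x_t])         # v_t = [h_{t-1}, x_t]
    f = sigmoid(phi(v, params['f']))          # forget gate
    i = sigmoid(phi(v, params['i']))          # input gate
    c_tilde = np.tanh(phi(v, params['C']))    # candidate cell state
    o = sigmoid(phi(v, params['o']))          # output gate
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
H, D, Lq = 2, 3, 2
params = {g: rng.uniform(0, 2 * np.pi, (H, H + D, Lq)) for g in 'fiCo'}
h, c = qkan_lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), params)
```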
4. Spectral Expressivity and Parameter Efficiency
The spectral expressivity of QKAN-LSTM derives from stacking re-uploading blocks, each introducing additional Fourier modes through sine and cosine components. The additive KAN form ensures universal approximation, and stacking yields exponentially many accessible Fourier modes. No multiqubit entanglement is involved, which simplifies classical simulation and gradient computation.
Parameter counts on the Urban Telecom task:
| Model | Classical params | Quantum params | Total |
|---|---|---|---|
| LSTM | 277 | — | 277 |
| QLSTM | 5 | 100 | 105 |
| QKAN-LSTM | 26 | 32 | 58 |
| HQKAN-LSTM | 36 | 53 | 89 |
QKAN-LSTM achieves a 79% reduction in trainable parameters relative to a classical LSTM: $\text{Reduction} = 1 - \frac{58}{277} \approx 0.79\;\text{(79%)}$
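The reduction figure follows directly from the table:

```python
# Trainable parameter totals from the Urban Telecom comparison table.
totals = {"LSTM": 277, "QLSTM": 105, "QKAN-LSTM": 58, "HQKAN-LSTM": 89}
reduction = 1 - totals["QKAN-LSTM"] / totals["LSTM"]  # 1 - 58/277 ≈ 0.79
```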
5. Empirical Results
QKAN-LSTM and its hybrid extension HQKAN-LSTM have been evaluated on three synthetic and real-world datasets: damped simple harmonic motion, a Bessel function, and urban telecommunication forecasting.
Damped SHM Test Performance (Epoch 30):
| Model | Test MSE |
|---|---|
| LSTM | 0.9701 |
| QLSTM | 0.9972 |
| QKAN-LSTM | 0.9771 |
| HQKAN-LSTM | 0.9903 |
Bessel Test Performance (Epoch 30):
| Model | Test MSE |
|---|---|
| LSTM | 0.9673 |
| QLSTM | 0.9679 |
| QKAN-LSTM | 0.9861 |
| HQKAN-LSTM | 0.9863 |
Urban Telecom (MAE / MSE) across Sequence Lengths:
| Model | Len 4 | Len 8 | Len 12 | Len 16 | Len 32 | Len 64 |
|---|---|---|---|---|---|---|
| LSTM | 1.0633/4.7135 | 1.0757/4.7011 | 1.0799/4.6085 | 1.0914/4.7020 | 1.1211/4.8381 | 1.1597/4.8853 |
| QLSTM | 1.0322/4.5217 | 1.0324/4.5307 | 1.0466/4.5715 | 1.0456/4.6244 | 1.0634/4.5953 | 1.0933/4.7194 |
| QKAN-LSTM | 1.0292/4.4377 | 1.0399/4.5441 | 1.0443/4.5570 | 1.0418/4.5485 | 1.0534/4.5647 | 1.1103/4.7311 |
| HQKAN-LSTM | 1.0045/4.5471 | 1.0249/4.6166 | 1.0361/4.5241 | 1.0189/4.5985 | 1.0378/4.4970 | 1.0848/4.6749 |
In all cases, QKAN-LSTM and HQKAN-LSTM closely match or outperform classical and quantum-enhanced baselines, despite vastly reduced parameter counts.
6. JHCG Network and Hybrid QKAN Extension
The Jiang–Huang–Chen–Goan (JHCG) network generalizes Kolmogorov–Arnold Networks (KAN) to encoder–decoder architectures with a KAN latent processor. The encoding–decoding process is:

$$\begin{aligned}
\text{Encoder:}\quad & z = E(x) \in \mathbb{R}^p\\
\text{Latent KAN:}\quad & z' = \sum_{j=1}^{p} \phi_j(z_j; \theta_j)\\
\text{Decoder:}\quad & y = D(z')
\end{aligned}$$

Replacing the latent KAN with QKAN gives the Hybrid QKAN (HQKAN), with the mapping $y = D(\mathrm{QKAN}(E(x)))$. HQKAN serves as a drop-in replacement for multilayer perceptrons (MLPs) in deep models, such as Transformers and diffusion models, providing exponential Fourier expressivity with low parameter requirements.
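The encoder → latent KAN → decoder pipeline can be sketched end to end. This is a simplified single-output latent sum under assumed gate choices (Ry trainables, Z-generated encoding); a full model would stack several such sums and train all parameters jointly.

```python
import numpy as np

def qvaf(x, thetas):
    """Single-qubit data re-uploading activation <Z>, simulated as 2x2 algebra."""
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    psi = np.array([1, 0], dtype=complex)
    for th in thetas:
        c, s = np.cos(th / 2), np.sin(th / 2)
        W = np.array([[c, -s], [s, c]], dtype=complex)    # trainable Ry
        S = np.diag([np.exp(-1j * x / 2), np.exp(1j * x / 2)])  # encode x
        psi = S @ (W @ psi)
    return float(np.real(psi.conj() @ (Z @ psi)))

def hqkan(x, We, Wd, bd, latent_thetas):
    """JHCG-style pipeline: classical encoder E, QKAN latent sum,
    classical decoder D."""
    z = np.tanh(We @ x)                               # z = E(x) in R^p
    z_prime = sum(qvaf(z[j], latent_thetas[j])        # z' = sum_j phi_j(z_j)
                  for j in range(len(z)))
    return Wd * z_prime + bd                          # y = D(z')

rng = np.random.default_rng(0)
d_in, p, d_out, Lq = 4, 3, 2, 2
y = hqkan(rng.normal(size=d_in),
          rng.normal(size=(p, d_in)) * 0.3,
          rng.normal(size=d_out) * 0.3, np.zeros(d_out),
          rng.uniform(0, 2 * np.pi, (p, Lq)))
```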
7. Interpretability, Scalability, and Application Domains
QKAN-LSTM and HQKAN-LSTM architectures benefit from the one-dimensional decomposition of the Kolmogorov–Arnold theorem, which yields interpretable contributions from individual activation subfunctions, each controlled by explicit frequency parameters. Because the QVAF modules are single-qubit and require no multiqubit entanglement, each activation admits a closed-form analytic expression and is fully compatible with classical optimization frameworks (e.g., PyTorch autograd).
Application domains demonstrated include urban telecommunication forecasting, oscillatory physical systems (mechanical, electromagnetic), weather prediction, and anomaly detection—any sequential setting where spectral complexity and low-parameter budgets are required.
QKAN-LSTM thus augments conventional LSTM architectures with quantum-inspired spectral representational power, delivering performance advantages and interpretability in parameter-scarce, frequency-rich sequential modeling scenarios (Hsu et al., 4 Dec 2025).