
Quantum-inspired Kolmogorov-Arnold LSTM

Updated 12 December 2025
  • The paper presents QKAN-LSTM, an architecture that integrates quantum-inspired variational activation functions into LSTM gates to enhance temporal modeling and reduce parameters by up to 79%.
  • It replaces static activation functions with DARUAN modules that use single-qubit quantum circuits to generate adaptive, frequency-rich representations for improved spectral expressivity.
  • Empirical evaluations on tasks like urban telecommunication forecasting and oscillatory systems demonstrate that QKAN-LSTM (and its hybrid extension) outperforms classical LSTMs with enhanced interpretability and scalability.

The Quantum-inspired Kolmogorov-Arnold Long Short-Term Memory (QKAN-LSTM) is a recurrent neural network (RNN) architecture designed to combine the frequency-adaptive expressivity of quantum-inspired Kolmogorov-Arnold networks with the proven temporal modeling capabilities of LSTM. At its core, QKAN-LSTM introduces Data Re-Uploading ARtive Activation (DARUAN) modules—quantum variational activation functions—into the standard LSTM gating structure, enabling a parameter-efficient, highly expressive representation of complex temporal and nonlinear dependencies. The model retains full compatibility with classical hardware and provides enhanced interpretability, scalability, and performance in real-world sequential modeling tasks (Hsu et al., 4 Dec 2025).

1. Classical LSTM Structure and Limitations

Standard LSTM cells process an input sequence $\{x_t\}$ using three gates (forget $f_t$, input $i_t$, output $o_t$) and an internal cell state $C_t$, evolving as follows:

$$
\begin{aligned}
f_t &= \sigma\bigl(W_f [h_{t-1}, x_t] + b_f\bigr)\\
i_t &= \sigma\bigl(W_i [h_{t-1}, x_t] + b_i\bigr)\\
\tilde{C}_t &= \tanh\bigl(W_C [h_{t-1}, x_t] + b_C\bigr)\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
o_t &= \sigma\bigl(W_o [h_{t-1}, x_t] + b_o\bigr)\\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

where $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $[h_{t-1}, x_t]$ is the concatenated gate input. Conventional LSTMs carry a large parameter count, since each of the four gates has its own dense weight matrix ($W_{f,i,C,o}$), and offer limited spectral expressivity, as both $\sigma$ and $\tanh$ are static nonlinearities. Interpretability with respect to frequency content is also poor.
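For reference, the update equations above can be sketched as a single NumPy step (a minimal illustration with arbitrary toy shapes and initialization, not the paper's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One classical LSTM step. W maps the concatenated [h_{t-1}, x_t]
    to the four stacked gate pre-activations (f, i, C~, o)."""
    v = np.concatenate([h_prev, x_t])      # gate input [h_{t-1}, x_t]
    H = h_prev.size
    z = W @ v + b                          # shape (4H,)
    f = sigmoid(z[0:H])                    # forget gate
    i = sigmoid(z[H:2*H])                  # input gate
    C_tilde = np.tanh(z[2*H:3*H])          # candidate cell state
    o = sigmoid(z[3*H:4*H])                # output gate
    C = f * C_prev + i * C_tilde           # cell-state update
    h = o * np.tanh(C)                     # hidden state
    return h, C

# toy usage: hidden size 3, input size 2
rng = np.random.default_rng(0)
H_, D_ = 3, 2
W = rng.standard_normal((4 * H_, H_ + D_)) * 0.1
b = np.zeros(4 * H_)
h, C = lstm_step(rng.standard_normal(D_), np.zeros(H_), np.zeros(H_), W, b)
```

Note that all four gates read the same concatenated input, which is exactly where the parameter redundancy discussed above arises.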

2. Data Re-Uploading ARtive Activation (DARUAN) and Quantum Variational Activation Functions

The DARUAN module constitutes the quantum variational activation function (QVAF) at the core of QKAN-LSTM. Unlike traditional activations, a DARUAN implements a Kolmogorov–Arnold decomposition using single-qubit data re-uploading, establishing an adaptive spectral basis for activation.

Kolmogorov–Arnold Representation

Classically, a $K$-node activation for input $x$ is:
$$\phi(x;\Theta) = \sum_{k=1}^{K} a_k(\Theta)\,\sigma\bigl(b_k(\Theta)x + c_k(\Theta)\bigr)$$
In QKAN-LSTM, each $\sigma$ block is replaced by a quantum subfunction $\phi_p(x;\theta_p)$, which is evaluated as the expected value of an observable after applying a parameterized single-qubit circuit to the input encoding.
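The classical $K$-node sum can be written directly as a short sketch (the coefficients below are illustrative, not fitted values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kan_node(x, a, b, c):
    """Classical K-node activation: phi(x) = sum_k a_k * sigma(b_k*x + c_k)."""
    return np.sum(a * sigmoid(b * x + c))

# K = 3 basis terms with arbitrary illustrative coefficients
a = np.array([0.5, -1.0, 0.25])
b = np.array([1.0, 2.0, -0.5])
c = np.zeros(3)
y = kan_node(0.3, a, b, c)
```

Since each $\sigma$ term lies in $(0,1)$, the output is bounded by $\sum_k |a_k|$; the quantum subfunctions below replace exactly these $\sigma$ terms.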

Quantum Variational Activation Function Construction

Each QVAF is:
$$\phi_p(u;\theta_p) = \langle 0|\,U(u;\theta_p)^\dagger\, M\, U(u;\theta_p)\,|0\rangle$$
where $U(u;\theta_p)$ is:
$$U(u;\theta) = W^{(L+1)} \prod_{\ell=1}^{L} \left[\exp\!\bigl(-i(a^{(\ell)}u + b^{(\ell)})H/2\bigr)\, W^{(\ell)}(\theta^{(\ell)})\right]$$
with $H$ a Hermitian generator ($H = \sigma_z$), $M$ typically $\sigma_z$, and $W^{(\ell)}$ a trainable single-qubit gate. Stacking $L$ re-uploading blocks introduces expanding frequency content, achieving rich Fourier decompositions.
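Because only a single qubit is involved, such a QVAF can be simulated classically with 2×2 matrices. The sketch below assumes $RY$ rotations for the trainable gates $W^{(\ell)}$ (the construction allows general single-qubit gates) and $H = M = \sigma_z$; all parameter values are illustrative:

```python
import numpy as np

SZ = np.diag([1.0, -1.0]).astype(complex)          # Pauli-Z (both H and M)

def rz(phi):
    """exp(-i * phi * sigma_z / 2): the data-encoding rotation."""
    return np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])

def ry(phi):
    """Trainable single-qubit gate (RY chosen for simplicity)."""
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def daruan(u, a, b, theta, theta_final):
    """phi_p(u) = <0| U(u)^dagger sigma_z U(u) |0> with L re-uploading
    blocks: each block applies W^(l) = RY(theta_l) then RZ(a_l*u + b_l)."""
    U = np.eye(2, dtype=complex)
    for a_l, b_l, th_l in zip(a, b, theta):
        U = rz(a_l * u + b_l) @ ry(th_l) @ U       # encode after trainable gate
    U = ry(theta_final) @ U                        # final trainable gate W^(L+1)
    psi = U @ np.array([1.0, 0.0], dtype=complex)
    return (psi.conj() @ SZ @ psi).real

# L = 3 re-uploading blocks with arbitrary illustrative parameters
a = [1.0, 2.0, 3.0]
b = [0.1, 0.2, 0.3]
theta = [0.5, 1.0, 1.5]
vals = [daruan(u, a, b, theta, 0.7) for u in np.linspace(-np.pi, np.pi, 5)]
```

Every output is an expectation value of $\sigma_z$, so it lies in $[-1, 1]$, while the stacked encodings make it a nonconstant, frequency-rich function of $u$.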

Parameter updates for the quantum parameters employ the parameter-shift rule:
$$\frac{\partial \phi_p(u;\theta)}{\partial \theta_k} = \frac{1}{2}\left[\phi_p\!\left(u;\theta+\tfrac{\pi}{2}e_k\right) - \phi_p\!\left(u;\theta-\tfrac{\pi}{2}e_k\right)\right]$$
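The rule can be sanity-checked on the simplest one-parameter circuit, $f(\theta) = \langle 0|RY(\theta)^\dagger \sigma_z RY(\theta)|0\rangle = \cos\theta$, whose exact derivative $-\sin\theta$ the two shifted evaluations recover (an illustrative check, not the paper's code):

```python
import numpy as np

def expectation(theta):
    """<0| RY(theta)^dagger sigma_z RY(theta) |0>, which equals cos(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    psi = np.array([c, s])              # RY(theta) |0>
    return psi[0]**2 - psi[1]**2        # <sigma_z>

def parameter_shift_grad(theta):
    # Exact for gates generated as exp(-i*theta*Pauli/2):
    # (1/2) [f(theta + pi/2) - f(theta - pi/2)]
    return 0.5 * (expectation(theta + np.pi / 2)
                  - expectation(theta - np.pi / 2))

theta = 0.8
g_shift = parameter_shift_grad(theta)
g_exact = -np.sin(theta)                # analytic derivative of cos(theta)
```

Unlike finite differences, the two evaluations here give the gradient exactly (up to machine precision), which is why the rule is the standard choice for variational circuits.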

3. Integration of DARUAN (QVAF) in LSTM Gates

QKAN-LSTM replaces each affine transformation and activation in the LSTM gates with a KAN sum of QVAFs:
$$\Phi_g(v_t;\Theta_g) = \sum_{p=1}^{\alpha} \phi_{g,p}(v_t;\theta_{g,p})$$
where $g\in\{f,i,C,o\}$ and $v_t = [h_{t-1}; x_t]$.

The QKAN-LSTM updates adopt the canonical form:

$$
\begin{aligned}
f_t &= \sigma(\Phi_f(v_t))\\
i_t &= \sigma(\Phi_i(v_t))\\
\tilde{C}_t &= \tanh(\Phi_C(v_t))\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
o_t &= \sigma(\Phi_o(v_t))\\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

Each gate thus learns adaptive, frequency-rich nonlinearities, enhancing the model's ability to represent oscillatory and nonlinear dependencies efficiently.
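A toy, hidden-size-1 sketch of these updates, assuming one single-block QVAF per coordinate of $v_t$ summed per gate (a simplification: the paper's DARUAN stacks $L$ blocks, and all shapes and parameters here are illustrative):

```python
import numpy as np

SZ = np.diag([1.0, -1.0]).astype(complex)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(p):
    return np.diag([np.exp(-1j * p / 2), np.exp(1j * p / 2)])

def qvaf(u, prm):
    """Single-qubit QVAF with one re-uploading block (L = 1):
    RY(t1) -> RZ(a*u + b) -> RY(t2), measured in sigma_z."""
    a, b, t1, t2 = prm
    psi = ry(t2) @ rz(a * u + b) @ ry(t1) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ SZ @ psi).real

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def qkan_lstm_step(x_t, h_prev, C_prev, params):
    """Scalar QKAN-LSTM step: each gate pre-activation Phi_g is a KAN sum
    of QVAFs, one subfunction per coordinate of v = [h_{t-1}; x_t]."""
    v = np.concatenate([[h_prev], x_t])
    Phi = {g: sum(qvaf(v[p], params[g][p]) for p in range(v.size))
           for g in ("f", "i", "C", "o")}
    f, i_, o = sigmoid(Phi["f"]), sigmoid(Phi["i"]), sigmoid(Phi["o"])
    C = f * C_prev + i_ * np.tanh(Phi["C"])   # cell-state update
    h = o * np.tanh(C)                        # hidden state
    return h, C

rng = np.random.default_rng(1)
D = 2                                          # input size; hidden size is 1
params = {g: rng.uniform(-1, 1, size=(1 + D, 4)) for g in ("f", "i", "C", "o")}
h, C = qkan_lstm_step(rng.standard_normal(D), 0.0, 0.0, params)
```

Each gate's weight matrix and bias are replaced by a handful of circuit angles per input coordinate, which is the source of the parameter savings quantified in Section 4.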

4. Spectral Expressivity and Parameter Efficiency

The spectral expressivity of QKAN-LSTM derives from stacking $L$ re-uploading blocks, each introducing additional Fourier modes through $\cos(a^{(\ell)}u)$ and $\sin(a^{(\ell)}u)$ components. The additive KAN form ensures universal approximation with exponentially many Fourier modes. No multiqubit entanglement is involved, simplifying classical simulation and gradient computation.

Parameter counts on the Urban Telecom task:

| Model | Classical params | Quantum params | Total |
|---|---|---|---|
| LSTM | 277 | 0 | 277 |
| QLSTM | 5 | 100 | 105 |
| QKAN-LSTM | 26 | 32 | 58 |
| HQKAN-LSTM | 36 | 53 | 89 |

QKAN-LSTM achieves a 79% reduction in trainable parameters relative to a classical LSTM:
$$\text{Reduction} = 1 - \frac{58}{277} \approx 0.79 \;(79\%)$$
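The headline figure follows directly from the totals in the table above, as a quick check:

```python
# Parameter totals from the Urban Telecom comparison table
totals = {"LSTM": 277, "QLSTM": 105, "QKAN-LSTM": 58, "HQKAN-LSTM": 89}

# Reduction of each model relative to the classical LSTM baseline
reduction = {m: 1 - n / totals["LSTM"] for m, n in totals.items()}
# QKAN-LSTM: 1 - 58/277 ~= 0.79, i.e. 79% fewer trainable parameters
```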

5. Empirical Results

QKAN-LSTM and its hybrid extension HQKAN-LSTM have been evaluated on three datasets, two synthetic and one real-world: damped simple harmonic motion, the Bessel function $J_2(x)$, and urban telecommunication forecasting.

Damped SHM Test Performance (Epoch 30):

| Model | Test MSE | $R^2$ |
|---|---|---|
| LSTM | $1.33 \times 10^{-3}$ | 0.9701 |
| QLSTM | $1.24 \times 10^{-4}$ | 0.9972 |
| QKAN-LSTM | $1.02 \times 10^{-3}$ | 0.9771 |
| HQKAN-LSTM | $4.32 \times 10^{-4}$ | 0.9903 |

Bessel $J_2$ Test Performance (Epoch 30):

| Model | Test MSE | $R^2$ |
|---|---|---|
| LSTM | $7.69 \times 10^{-4}$ | 0.9673 |
| QLSTM | $7.53 \times 10^{-4}$ | 0.9679 |
| QKAN-LSTM | $3.27 \times 10^{-4}$ | 0.9861 |
| HQKAN-LSTM | $3.21 \times 10^{-4}$ | 0.9863 |

Urban Telecom (MAE / MSE) across Sequence Lengths:

| Model | Len 4 | Len 8 | Len 12 | Len 16 | Len 32 | Len 64 |
|---|---|---|---|---|---|---|
| LSTM | 1.0633 / 4.7135 | 1.0757 / 4.7011 | 1.0799 / 4.6085 | 1.0914 / 4.7020 | 1.1211 / 4.8381 | 1.1597 / 4.8853 |
| QLSTM | 1.0322 / 4.5217 | 1.0324 / 4.5307 | 1.0466 / 4.5715 | 1.0456 / 4.6244 | 1.0634 / 4.5953 | 1.0933 / 4.7194 |
| QKAN-LSTM | 1.0292 / 4.4377 | 1.0399 / 4.5441 | 1.0443 / 4.5570 | 1.0418 / 4.5485 | 1.0534 / 4.5647 | 1.1103 / 4.7311 |
| HQKAN-LSTM | 1.0045 / 4.5471 | 1.0249 / 4.6166 | 1.0361 / 4.5241 | 1.0189 / 4.5985 | 1.0378 / 4.4970 | 1.0848 / 4.6749 |

In all cases, QKAN-LSTM and HQKAN-LSTM closely match or outperform classical and quantum-enhanced baselines, despite vastly reduced parameter counts.

6. JHCG Network and Hybrid QKAN Extension

The Jiang–Huang–Chen–Goan (JHCG) network generalizes Kolmogorov–Arnold Networks (KAN) to encoder-decoder architectures with a KAN latent processor. The encoding-decoding process is:
$$
E: \mathbb{R}^d \to \mathbb{R}^p, \qquad
z' = \sum_{j=1}^{\alpha} \phi_j(z_j;\theta_j), \;\; z = E(x)\in\mathbb{R}^p, \qquad
D: \mathbb{R}^p \to \mathbb{R}^d
$$
Replacing the latent KAN with QKAN gives the Hybrid QKAN (HQKAN). The mapping is:
$$\hat{x} = D\left(\sum_{j=1}^{\alpha} \langle 0|\, U(z_j)^\dagger\, M\, U(z_j)\,|0\rangle\right)$$
HQKAN serves as a drop-in replacement for multilayer perceptrons (MLPs) in deep models, such as Transformers and diffusion models, providing exponential Fourier expressivity with low parameter requirements.
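A minimal sketch of this forward pass, assuming linear maps stand in for the encoder $E$ and decoder $D$ and one-block single-qubit circuits form the latent QKAN (all names, dimensions, and parameters here are illustrative):

```python
import numpy as np

SZ = np.diag([1.0, -1.0]).astype(complex)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(p):
    return np.diag([np.exp(-1j * p / 2), np.exp(1j * p / 2)])

def qvaf(u, a, b, t1, t2):
    """One-block single-qubit QVAF: <0| U(u)^dagger sigma_z U(u) |0>."""
    psi = ry(t2) @ rz(a * u + b) @ ry(t1) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ SZ @ psi).real

def hqkan_forward(x, We, Wd, qparams):
    """x -> z = E(x) -> per-coordinate QVAFs summed in latent space -> D."""
    z = We @ x                                            # encoder E
    z_prime = sum(qvaf(z[j], *qparams[j]) for j in range(z.size))
    return Wd * z_prime                                   # decoder D

rng = np.random.default_rng(2)
d, p = 4, 3                                   # toy input / latent dimensions
We = rng.standard_normal((p, d)) * 0.5        # encoder weights
Wd = rng.standard_normal(d)                   # decoder weights
qparams = rng.uniform(-1, 1, size=(p, 4))     # (a, b, t1, t2) per latent dim
x_hat = hqkan_forward(rng.standard_normal(d), We, Wd, qparams)
```

The quantum-inspired part touches only the low-dimensional latent sum, which is what lets HQKAN slot in where an MLP block would otherwise sit.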

7. Interpretability, Scalability, and Application Domains

QKAN-LSTM and HQKAN-LSTM architectures benefit from the one-dimensional decomposition of the Kolmogorov-Arnold theorem, leading to interpretable contributions from individual activation subfunctions, each controlled by explicit frequency parameters. The use of single-qubit QVAF modules eliminates the need for multiqubit entanglement, thus each activation admits a closed-form, analytic expression and is fully compatible with classical optimization frameworks (e.g., PyTorch autograd).

Application domains demonstrated include urban telecommunication forecasting, oscillatory physical systems (mechanical, electromagnetic), weather prediction, and anomaly detection—any sequential setting where spectral complexity and low-parameter budgets are required.

QKAN-LSTM thus augments conventional LSTM architectures with quantum-inspired spectral representational power, delivering performance advantages and interpretability in parameter-scarce, frequency-rich sequential modeling scenarios (Hsu et al., 4 Dec 2025).
