Quantum-inspired Kolmogorov-Arnold LSTM
- The paper presents QKAN-LSTM, an architecture that integrates quantum-inspired variational activation functions into LSTM gates to enhance temporal modeling and reduce parameters by up to 79%.
- It replaces static activation functions with DARUAN modules that use single-qubit quantum circuits to generate adaptive, frequency-rich representations for improved spectral expressivity.
- Empirical evaluations on tasks like urban telecommunication forecasting and oscillatory systems demonstrate that QKAN-LSTM (and its hybrid extension) outperforms classical LSTMs with enhanced interpretability and scalability.
The Quantum-inspired Kolmogorov–Arnold Long Short-Term Memory (QKAN-LSTM) is a recurrent neural network (RNN) architecture that combines the frequency-adaptive expressivity of quantum-inspired Kolmogorov–Arnold networks with the proven temporal modeling capabilities of the LSTM. At its core, QKAN-LSTM introduces Data Re-Uploading Activation (DARUAN) modules—quantum variational activation functions—into the standard LSTM gating structure, enabling a parameter-efficient, highly expressive representation of complex temporal and nonlinear dependencies. The model retains full compatibility with classical hardware and provides enhanced interpretability, scalability, and performance in real-world sequential modeling tasks (Hsu et al., 4 Dec 2025).
1. Classical LSTM Structure and Limitations
Standard LSTM cells process an input sequence using three gates (forget $f_t$, input $i_t$, output $o_t$) and an internal cell state $C_t$, evolving as follows:

$$\begin{aligned}
f_t &= \sigma\bigl(W_f [h_{t-1}, x_t] + b_f\bigr)\\
i_t &= \sigma\bigl(W_i [h_{t-1}, x_t] + b_i\bigr)\\
\tilde{C}_t &= \tanh\bigl(W_C [h_{t-1}, x_t] + b_C\bigr)\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
o_t &= \sigma\bigl(W_o [h_{t-1}, x_t] + b_o\bigr)\\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}$$

where $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $[h_{t-1}, x_t]$ is the concatenated gate input. Conventional LSTMs exhibit high parameter redundancy across the four weight matrices ($W_f, W_i, W_C, W_o$) and limited spectral expressivity, since both $\sigma$ and $\tanh$ are static nonlinearities. Interpretability with respect to frequency content is also poor.
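The update equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; fusing the four gate weight matrices into one `W` is a common convenience choice, assumed here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One classical LSTM step. W maps the concatenated gate input
    v_t = [h_{t-1}, x_t] to all four gate pre-activations at once."""
    v = np.concatenate([h_prev, x_t])
    z = W @ v + b                      # shape (4*hidden,)
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])                # forget gate
    i = sigmoid(z[H:2*H])              # input gate
    c_tilde = np.tanh(z[2*H:3*H])      # candidate cell state
    o = sigmoid(z[3*H:4*H])            # output gate
    c = f * c_prev + i * c_tilde       # new cell state
    h = o * np.tanh(c)                 # new hidden state
    return h, c

rng = np.random.default_rng(0)
x_dim, h_dim = 3, 4
W = rng.normal(size=(4*h_dim, x_dim + h_dim)) * 0.1
b = np.zeros(4*h_dim)
h, c = lstm_step(rng.normal(size=x_dim), np.zeros(h_dim), np.zeros(h_dim), W, b)
```

Note the parameter count: the single fused matrix already has $4H(H + D)$ entries, which is the redundancy QKAN-LSTM targets.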
2. Data Re-Uploading Activation (DARUAN) and Quantum Variational Activation Functions
The DARUAN module constitutes the quantum variational activation function (QVAF) at the core of QKAN-LSTM. Unlike traditional activations, a DARUAN implements a Kolmogorov–Arnold decomposition using single-qubit data re-uploading, establishing an adaptive spectral basis for activation.
Kolmogorov–Arnold Representation
Classically, the Kolmogorov–Arnold representation writes a multivariate function $f$ of input $x = (x_1, \dots, x_n)$ as a sum of univariate subfunctions:

$$f(x) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$

In QKAN-LSTM, each univariate block $\phi_{q,p}$ is replaced by a quantum subfunction, evaluated as the expected value of an observable after applying a parameterized single-qubit circuit to the input encoding.
Quantum Variational Activation Function Construction
Each QVAF is

$$f(x;\boldsymbol{\theta}) = \langle 0 |\, U^\dagger(x,\boldsymbol{\theta})\, M\, U(x,\boldsymbol{\theta})\, | 0 \rangle$$

where the circuit $U$ is a stack of $L$ data re-uploading blocks,

$$U(x,\boldsymbol{\theta}) = \prod_{l=1}^{L} S(x)\, W(\theta_l), \qquad S(x) = e^{-i x H},$$

with $M$ a Hermitian observable ($M^\dagger = M$), typically $M = \sigma_z$, and $W(\theta_l)$ a trainable single-qubit gate. Stacking $L$ re-uploading blocks introduces expanding frequency content, achieving rich Fourier decompositions.
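Because only one qubit is involved, the whole construction reduces to 2×2 matrix algebra. The sketch below assumes a concrete gate choice (trainable $R_y$ rotations, $\sigma_z/2$ as the encoding generator, $\sigma_z$ as the observable) that the paper may vary; the final FFT check illustrates the Fourier claim, since $L$ blocks with this encoding can only populate integer frequencies $|k| \le L$.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(a):  # trainable rotation exp(-i a Y / 2)
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(a):  # data-encoding gate exp(-i a Z / 2)
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def daruan(x, thetas):
    """Classically simulated single-qubit data re-uploading activation:
    alternate trainable Ry rotations with re-encodings of x, then
    measure <Z>."""
    psi = np.array([1, 0], dtype=complex)
    for th in thetas:
        psi = rz(x) @ (ry(th) @ psi)
    return float(np.real(psi.conj() @ (Z @ psi)))

# With L blocks, f(x) is a truncated Fourier series with |k| <= L.
L = 3
rng = np.random.default_rng(1)
thetas = rng.uniform(0, 2 * np.pi, L)
xs = np.linspace(0, 2 * np.pi, 64, endpoint=False)
ys = np.array([daruan(x, thetas) for x in xs])
coeffs = np.fft.fft(ys) / len(xs)
leakage = max(abs(coeffs[k]) for k in range(L + 1, 64 - L))  # should be ~0
```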
Parameter updates for the quantum parameters employ the parameter-shift rule:

$$\frac{\partial f}{\partial \theta_l} = \frac{1}{2}\left[ f\!\left(\theta_l + \frac{\pi}{2}\right) - f\!\left(\theta_l - \frac{\pi}{2}\right) \right]$$
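The rule is exact (not a finite-difference approximation) for gates generated by involutory operators such as $R_y$. A one-gate sanity check, where the expectation has the closed form $f(\theta) = \cos\theta$:

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]])

def expval(theta):
    """f(theta) = <0| Ry(theta)^dag Z Ry(theta) |0> = cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return psi @ Z @ psi

theta = 0.7
shift_grad = 0.5 * (expval(theta + np.pi / 2) - expval(theta - np.pi / 2))
exact_grad = -np.sin(theta)  # analytic derivative of cos(theta)
```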
3. Integration of DARUAN (QVAF) in LSTM Gates
QKAN-LSTM replaces each affine transformation and activation in the LSTM gates with a KAN sum of QVAFs:

$$\Phi_g(v_t) = \sum_{j=1}^{d} f_{g,j}(v_{t,j}; \boldsymbol{\theta}_{g,j}), \qquad g \in \{f, i, C, o\},$$

where $v_t = [h_{t-1}, x_t] \in \mathbb{R}^d$ and each $f_{g,j}$ is a DARUAN subfunction.
The QKAN-LSTM updates adopt the canonical form:

$$\begin{aligned}
f_t &= \sigma(\Phi_f(v_t))\\
i_t &= \sigma(\Phi_i(v_t))\\
\tilde{C}_t &= \tanh(\Phi_C(v_t))\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
o_t &= \sigma(\Phi_o(v_t))\\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}$$

Each gate thus learns adaptive, frequency-rich nonlinearities, enhancing the model’s ability to represent oscillatory and nonlinear dependencies efficiently.
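Putting the pieces together, one QKAN-LSTM step can be sketched as follows. The per-gate, per-hidden-unit, per-input parameter layout (`(H, H+D, L)` angle tensors) is an assumption made for this illustration; the paper's exact parameter sharing may differ.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(a):
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def qvaf(x, thetas):
    """Single-qubit data re-uploading activation, <Z> after L blocks."""
    psi = np.array([1, 0], dtype=complex)
    for th in thetas:
        psi = rz(x) @ (ry(th) @ psi)
    return float(np.real(psi.conj() @ (Z @ psi)))

def phi(v, gate_thetas):
    """KAN-style gate pre-activation: one QVAF per input dimension, summed."""
    return np.array([sum(qvaf(v[j], gate_thetas[h, j]) for j in range(len(v)))
                     for h in range(gate_thetas.shape[0])])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def qkan_lstm_step(x_t, h_prev, c_prev, params):
    v = np.concatenate([h_prev, x_t])         # v_t = [h_{t-1}, x_t]
    f = sigmoid(phi(v, params['f']))          # forget gate
    i = sigmoid(phi(v, params['i']))          # input gate
    c_tilde = np.tanh(phi(v, params['C']))    # candidate cell state
    o = sigmoid(phi(v, params['o']))          # output gate
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
H, D, Lq = 2, 3, 2
params = {g: rng.uniform(0, 2 * np.pi, (H, H + D, Lq)) for g in 'fiCo'}
h, c = qkan_lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), params)
```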
4. Spectral Expressivity and Parameter Efficiency
The spectral expressivity of QKAN-LSTM derives from stacking re-uploading blocks, each introducing additional Fourier modes through sine and cosine components. The additive KAN form ensures universal approximation, and stacking yields exponentially many accessible Fourier modes. No multiqubit entanglement is involved, which simplifies classical simulation and gradient computation.
Parameter counts on the Urban Telecom task:
| Model | Classical params | Quantum params | Total |
|---|---|---|---|
| LSTM | 277 | — | 277 |
| QLSTM | 5 | 100 | 105 |
| QKAN-LSTM | 26 | 32 | 58 |
| HQKAN-LSTM | 36 | 53 | 89 |
QKAN-LSTM achieves a 79% reduction in trainable parameters relative to a classical LSTM: $\text{Reduction} = 1 - \frac{58}{277} \approx 0.79\;\text{(79%)}$
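The reduction figure follows directly from the table:

```python
# Trainable parameter totals from the Urban Telecom comparison table.
totals = {"LSTM": 277, "QLSTM": 105, "QKAN-LSTM": 58, "HQKAN-LSTM": 89}
reduction = 1 - totals["QKAN-LSTM"] / totals["LSTM"]  # 1 - 58/277 ≈ 0.79
```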
5. Empirical Results
QKAN-LSTM and its hybrid extension HQKAN-LSTM have been evaluated on three synthetic and real-world datasets: damped simple harmonic motion, a Bessel function, and urban telecommunication forecasting.
Damped SHM Test Performance (Epoch 30):
| Model | Test MSE |
|---|---|
| LSTM | 0.9701 |
| QLSTM | 0.9972 |
| QKAN-LSTM | 0.9771 |
| HQKAN-LSTM | 0.9903 |
Bessel Test Performance (Epoch 30):
| Model | Test MSE |
|---|---|
| LSTM | 0.9673 |
| QLSTM | 0.9679 |
| QKAN-LSTM | 0.9861 |
| HQKAN-LSTM | 0.9863 |
Urban Telecom (MAE / MSE) across Sequence Lengths:
| Model | Len 4 | Len 8 | Len 12 | Len 16 | Len 32 | Len 64 |
|---|---|---|---|---|---|---|
| LSTM | 1.0633/4.7135 | 1.0757/4.7011 | 1.0799/4.6085 | 1.0914/4.7020 | 1.1211/4.8381 | 1.1597/4.8853 |
| QLSTM | 1.0322/4.5217 | 1.0324/4.5307 | 1.0466/4.5715 | 1.0456/4.6244 | 1.0634/4.5953 | 1.0933/4.7194 |
| QKAN-LSTM | 1.0292/4.4377 | 1.0399/4.5441 | 1.0443/4.5570 | 1.0418/4.5485 | 1.0534/4.5647 | 1.1103/4.7311 |
| HQKAN-LSTM | 1.0045/4.5471 | 1.0249/4.6166 | 1.0361/4.5241 | 1.0189/4.5985 | 1.0378/4.4970 | 1.0848/4.6749 |
In all cases, QKAN-LSTM and HQKAN-LSTM closely match or outperform classical and quantum-enhanced baselines, despite vastly reduced parameter counts.
6. JHCG Network and Hybrid QKAN Extension
The Jiang–Huang–Chen–Goan (JHCG) network generalizes Kolmogorov–Arnold Networks (KAN) to encoder–decoder architectures with a KAN latent processor. The encoding–decoding process is:

$$\begin{aligned}
\text{Encoder:}\quad & z = E(x) \in \mathbb{R}^p\\
\text{Latent KAN:}\quad & z' = \sum_{j=1}^{p} \phi_j(z_j; \theta_j)\\
\text{Decoder:}\quad & y = D(z')
\end{aligned}$$

Replacing the latent KAN with QKAN gives the Hybrid QKAN (HQKAN), with the mapping $y = D(\mathrm{QKAN}(E(x)))$. HQKAN serves as a drop-in replacement for multilayer perceptrons (MLPs) in deep models, such as Transformers and diffusion models, providing exponential Fourier expressivity with low parameter requirements.
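The encoder → latent KAN → decoder pipeline can be sketched end to end. This is a simplified single-output latent sum under assumed gate choices (Ry trainables, Z-generated encoding); a full model would stack several such sums and train all parameters jointly.

```python
import numpy as np

def qvaf(x, thetas):
    """Single-qubit data re-uploading activation <Z>, simulated as 2x2 algebra."""
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    psi = np.array([1, 0], dtype=complex)
    for th in thetas:
        c, s = np.cos(th / 2), np.sin(th / 2)
        W = np.array([[c, -s], [s, c]], dtype=complex)    # trainable Ry
        S = np.diag([np.exp(-1j * x / 2), np.exp(1j * x / 2)])  # encode x
        psi = S @ (W @ psi)
    return float(np.real(psi.conj() @ (Z @ psi)))

def hqkan(x, We, Wd, bd, latent_thetas):
    """JHCG-style pipeline: classical encoder E, QKAN latent sum,
    classical decoder D."""
    z = np.tanh(We @ x)                               # z = E(x) in R^p
    z_prime = sum(qvaf(z[j], latent_thetas[j])        # z' = sum_j phi_j(z_j)
                  for j in range(len(z)))
    return Wd * z_prime + bd                          # y = D(z')

rng = np.random.default_rng(0)
d_in, p, d_out, Lq = 4, 3, 2, 2
y = hqkan(rng.normal(size=d_in),
          rng.normal(size=(p, d_in)) * 0.3,
          rng.normal(size=d_out) * 0.3, np.zeros(d_out),
          rng.uniform(0, 2 * np.pi, (p, Lq)))
```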
7. Interpretability, Scalability, and Application Domains
QKAN-LSTM and HQKAN-LSTM architectures benefit from the one-dimensional decomposition of the Kolmogorov–Arnold theorem, which yields interpretable contributions from individual activation subfunctions, each controlled by explicit frequency parameters. Because the QVAF modules are single-qubit and require no multiqubit entanglement, each activation admits a closed-form analytic expression and is fully compatible with classical optimization frameworks (e.g., PyTorch autograd).
Application domains demonstrated include urban telecommunication forecasting, oscillatory physical systems (mechanical, electromagnetic), weather prediction, and anomaly detection—any sequential setting where spectral complexity and low-parameter budgets are required.
QKAN-LSTM thus augments conventional LSTM architectures with quantum-inspired spectral representational power, delivering performance advantages and interpretability in parameter-scarce, frequency-rich sequential modeling scenarios (Hsu et al., 4 Dec 2025).