
LSTM-Enhanced Deep Koopman Model

Updated 13 November 2025
  • The paper introduces a dictionary-free LSTM-enhanced deep Koopman model that learns linear latent representations to capture nonlinear dynamics with input delays.
  • The model integrates LSTM-based encoding and meta-learning to robustly handle history-dependent delays and short time-series data.
  • Empirical benchmarks show reduced prediction errors compared to classical eDMD, achieving performance close to fully-informed models.

The LSTM-Enhanced Deep Koopman Model is a neural architecture for learning linear representations of nonlinear dynamical systems that exhibit input delays or are observed only through short time series. Rather than relying on manually engineered dictionaries as in classic extended Dynamic Mode Decomposition (eDMD), these models leverage deep learning components—most notably Long Short-Term Memory (LSTM) layers—to encode delayed inputs or time-series properties directly into a compact latent space in which the system evolves linearly under an approximate Koopman operator. This facilitates prediction, control, and spectral analysis of systems with unknown or complex nonlinearities.

1. Mathematical Foundations of Koopman Operator Approximation

For a general discrete-time nonlinear control system $x_{k+1}=F(x_k,u_k)$ with $x_k\in\mathbb{R}^n$ and $u_k\in\mathbb{R}^m$, the Koopman operator $\mathcal{K}$ acts linearly on observables: $(\mathcal{K}\phi)(x_k,u_k)=\phi(F(x_k,u_k))$. Since $\mathcal{K}$ is infinite-dimensional, practical applications seek a finite-dimensional approximation by learning mappings $g: (x_k,h_k)\mapsto z_k\in\mathbb{R}^d$ and $g^{-1}: z_k\mapsto x_k$, as well as matrices $A_\mathcal{K}\in\mathbb{R}^{d\times d}$ and $B_\mathcal{K}\in\mathbb{R}^{d\times m}$ so that

$$z_{k+1} \approx A_\mathcal{K}z_k + B_\mathcal{K}u_k.$$

The identification task thus reduces to minimizing reconstruction and prediction errors in this lifted space with respect to the Frobenius norm over dataset snapshots:

$$\min_{A_\mathcal{K},B_\mathcal{K}}\ \sum_{k=0}^{M-1} \left\| z_{k+1} - (A_\mathcal{K}z_k + B_\mathcal{K}u_k) \right\|_F^2.$$

This approach accommodates input delays by defining the lifted state $z_k$ as a function of both the instantaneous state $x_k$ and an encoded history $h_k$.
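As a concrete illustration of this regression step, the sketch below fits $A_\mathcal{K}$ and $B_\mathcal{K}$ by ordinary least squares from already-lifted snapshot matrices; the function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def fit_koopman_matrices(Z, Z_next, U):
    """Least-squares fit of A_K, B_K from lifted snapshots.

    Z, Z_next: (M, d) arrays of lifted states z_k and z_{k+1}.
    U: (M, m) array of inputs u_k.
    Solves min || Z_next - Z A^T - U B^T ||_F^2 via the pseudoinverse.
    """
    ZU = np.hstack([Z, U])              # (M, d + m) regressor matrix
    AB = np.linalg.pinv(ZU) @ Z_next    # (d + m, d) stacked [A^T; B^T]
    d = Z.shape[1]
    return AB[:d].T, AB[d:].T           # A_K (d, d), B_K (d, m)

# Tiny synthetic check: a known linear latent system is recovered exactly.
rng = np.random.default_rng(0)
d, m, M = 4, 1, 500
A_true = 0.9 * np.eye(d) + 0.01 * rng.standard_normal((d, d))
B_true = rng.standard_normal((d, m))
Z = rng.standard_normal((M, d))
U = rng.standard_normal((M, m))
Z_next = Z @ A_true.T + U @ B_true.T
A_K, B_K = fit_koopman_matrices(Z, Z_next, U)
assert np.allclose(A_K, A_true, atol=1e-8)
```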

2. LSTM-Based Encoding for Input Delay and Short Series Dynamics

Input delay and the effect of incomplete state observations are addressed by embedding history windows as fixed-length vectors using an LSTM architecture. For a length-$\eta_H$ history $H_k$ comprising prior states and inputs, the LSTM recursively updates its hidden and cell states:

$$\begin{aligned}
f_t &= \sigma(W_f H_{k,t} + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i H_{k,t} + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o H_{k,t} + U_o h_{t-1} + b_o) \\
\tilde c_t &= \tanh(W_c H_{k,t} + U_c h_{t-1} + b_c) \\
c_t &= f_t\odot c_{t-1} + i_t\odot \tilde c_t \\
h_t &= o_t\odot \tanh(c_t)
\end{aligned}$$

After the history window has been processed, $h_k\equiv h_{\eta_H}$ encodes all delayed and historical effects. In dictionary-free Koopman models, this encoding is concatenated with the current state before being passed to the encoder network that yields the lifted representation $z_k$.
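The gate equations above are exactly what standard LSTM implementations provide, so a history window can be reduced to $h_k$ in a few lines of PyTorch. The following is a minimal sketch with assumed dimensions (2-dimensional state, scalar input, 8-dimensional hidden encoding), not the authors' code.

```python
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Encodes a window H_k of past (state, input) pairs into a fixed-length h_k."""

    def __init__(self, state_dim=2, input_dim=1, hidden_dim=8):
        super().__init__()
        # nn.LSTM implements the f/i/o gate and cell-state updates shown above.
        self.lstm = nn.LSTM(state_dim + input_dim, hidden_dim, batch_first=True)

    def forward(self, history):              # history: (batch, eta_H, state_dim + input_dim)
        _, (h_last, _) = self.lstm(history)  # h_last: (1, batch, hidden_dim)
        return h_last.squeeze(0)             # h_k == h_{eta_H}: (batch, hidden_dim)

# Example: a batch of 16 histories of length eta_H = 20.
hk = HistoryEncoder()(torch.randn(16, 20, 3))
print(hk.shape)  # torch.Size([16, 8])
```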

In meta-learning extensions for short time-series (Iwata et al., 2021), a bidirectional LSTM is used to produce a task-dependent representation $z$ from the support sequence, and all embedding and decoding networks are conditioned on $z$ so as to adapt the Koopman space uniquely for each series.

3. Dictionary-Free Network Architectures

The architecture centers on four principal blocks:

  • LSTM block: Encodes historical state and input information into $h_k$.
  • Encoder block: Processes $[x_k; h_k]$ through two fully connected layers (e.g., dimensions 10 → 60 → 40) to produce $z_k$.
  • Linear Koopman dynamics: Evolves $z_k$ via learned matrices $A_\mathcal{K}\in\mathbb{R}^{40\times 40}$ and $B_\mathcal{K}\in\mathbb{R}^{40\times 1}$ for the benchmark problem.
  • Decoder block: Recovers the predicted measurement $x_{k+1}$ from the lifted $z_{k+1}$ via two mirrored fully connected layers.

No external dictionary of nonlinear observables is required; the lifting and decoding networks learn basis functions end-to-end, resulting in a dictionary-free realization. In meta-learning models for short time-series, the encoder and decoder are further conditioned on the bidirectional LSTM-derived representation $z$.
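A minimal PyTorch sketch of the four blocks follows, assuming the illustrative dimensions quoted above (10 → 60 → 40 encoder, 40-dimensional latent space, scalar input); layer counts, activations, and names are assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn

class DeepKoopmanLSTM(nn.Module):
    """Dictionary-free deep Koopman model: LSTM history encoder, MLP lifting,
    linear latent dynamics (A_K, B_K), and a mirrored MLP decoder."""

    def __init__(self, state_dim=2, input_dim=1, hist_dim=8, lift_dim=40):
        super().__init__()
        self.lstm = nn.LSTM(state_dim + input_dim, hist_dim, batch_first=True)
        # Encoder: [x_k; h_k] -> z_k  (e.g. 10 -> 60 -> 40)
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + hist_dim, 60), nn.ReLU(),
            nn.Linear(60, lift_dim),
        )
        # Linear Koopman dynamics: z_{k+1} = A_K z_k + B_K u_k
        self.A = nn.Linear(lift_dim, lift_dim, bias=False)
        self.B = nn.Linear(input_dim, lift_dim, bias=False)
        # Decoder: z -> x (mirrored fully connected layers)
        self.decoder = nn.Sequential(
            nn.Linear(lift_dim, 60), nn.ReLU(),
            nn.Linear(60, state_dim),
        )

    def lift(self, x, history):
        _, (h_k, _) = self.lstm(history)                       # encode the history window
        return self.encoder(torch.cat([x, h_k.squeeze(0)], dim=-1))

    def forward(self, x, history, u):
        z = self.lift(x, history)
        z_next = self.A(z) + self.B(u)                          # linear latent step
        return self.decoder(z_next), z, z_next

# Example one-step prediction for a batch of 16 samples with history length 20.
model = DeepKoopmanLSTM()
x_pred, z, z_next = model(torch.randn(16, 2), torch.randn(16, 20, 3), torch.randn(16, 1))
print(x_pred.shape)  # torch.Size([16, 2])
```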

4. Training Objectives, Losses, and Meta-Learning

The end-to-end loss for the dictionary-free deep Koopman model combines the following components:

  • Reconstruction loss: $L_{rec} = \| x_k - g^{-1}(g(x_k, h_k)) \|_2^2$
  • One-step prediction loss: $L_{step} = \| x_{k+1} - g^{-1}(A_\mathcal{K}z_k + B_\mathcal{K}u_k) \|_2^2$
  • Multi-step output prediction loss over horizon $N_L$: $L_{pred} = \sum_{i=1}^{N_L} \| x_{k+i} - g^{-1}(\hat{z}_{k+i}) \|_2^2$
  • Latent trajectory loss: $L_{lpred} = \sum_{i=1}^{N_L} \| z_{k+i} - \hat{z}_{k+i} \|_2^2$

These are combined with task-relevant weights to yield the overall objective:

$$L = w_{rec} L_{rec} + w_{step} L_{step} + w_{pred} L_{pred} + w_{lpred} L_{lpred}.$$
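A sketch of how the four terms could be assembled, assuming the DeepKoopmanLSTM sketch above and mini-batches that carry the future inputs and states needed for the multi-step terms; the weights and tensor layouts are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def koopman_loss(model, x, hist, u_seq, x_future, z_future=None,
                 weights=(1.0, 1.0, 1.0, 0.3)):
    """Combined objective L = w_rec L_rec + w_step L_step + w_pred L_pred + w_lpred L_lpred.

    x: (B, n) current states, hist: (B, eta_H, n+m) history windows,
    u_seq: (B, N_L, m) future inputs, x_future: (B, N_L, n) true future states,
    z_future: optional (B, N_L, d) target lifted trajectory; if None, L_lpred is skipped.
    """
    w_rec, w_step, w_pred, w_lpred = weights
    z = model.lift(x, hist)
    loss = w_rec * F.mse_loss(model.decoder(z), x)                      # reconstruction
    z_hat = z
    for i in range(u_seq.shape[1]):                                     # roll latent dynamics forward
        z_hat = model.A(z_hat) + model.B(u_seq[:, i])
        x_hat = model.decoder(z_hat)
        if i == 0:
            loss = loss + w_step * F.mse_loss(x_hat, x_future[:, 0])    # one-step prediction
        loss = loss + w_pred * F.mse_loss(x_hat, x_future[:, i])        # multi-step output
        if z_future is not None:
            loss = loss + w_lpred * F.mse_loss(z_hat, z_future[:, i])   # latent trajectory
    return loss
```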

In meta-learning models for short series, the objective is the average prediction loss over episodically sampled tasks. Hyperparameters (e.g., hidden and output dimensions, LSTM size, learning rate, batch size) are tuned for convergence and generalization.

5. Algorithmic Implementation and Hyperparameter Choices

For systems with input delay, such as the two-tank water system

$$\begin{aligned}
\frac{dh_1}{dt} &= q(t-\tau) - \frac{k_1}{F_1}\sqrt{h_1}, \\
\frac{dh_2}{dt} &= \frac{k_1}{F_2}\sqrt{h_1} - \frac{k_2}{F_2}\sqrt{h_2},
\end{aligned}$$

data are simulated with sampling time $T_s=10\,$s, delay $\tau=20\,T_s$, and measurement noise $\sigma=0.1$, generating $N=4\times 10^5$ samples. The data are split evenly between training and testing.
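A rough data-generation sketch under the stated sampling, delay, and noise settings is given below; the tank constants $k_1, k_2, F_1, F_2$ and the excitation signal are not specified in the text and are placeholder assumptions, and simple forward-Euler integration stands in for whatever solver was actually used.

```python
import numpy as np

def simulate_two_tank(n_steps=400_000, Ts=10.0, delay_steps=20, sigma=0.1, seed=0):
    """Forward-Euler simulation of the delayed two-tank system.

    dh1/dt = q(t - tau) - (k1/F1) sqrt(h1)
    dh2/dt = (k1/F2) sqrt(h1) - (k2/F2) sqrt(h2),  with tau = delay_steps * Ts.
    """
    rng = np.random.default_rng(seed)
    k1, k2, F1, F2 = 0.02, 0.02, 1.0, 1.0            # assumed (placeholder) plant parameters
    h = np.array([1.0, 1.0])                          # initial tank levels
    q_buffer = np.zeros(delay_steps)                  # stores the delayed inflow q(t - tau)
    H, Q = np.zeros((n_steps, 2)), np.zeros(n_steps)
    for k in range(n_steps):
        q = 0.02 * (1.0 + 0.5 * np.sin(2 * np.pi * k / 500))   # assumed excitation input
        q_delayed = q_buffer[0]
        q_buffer = np.roll(q_buffer, -1)
        q_buffer[-1] = q
        dh1 = q_delayed - (k1 / F1) * np.sqrt(max(h[0], 0.0))
        dh2 = (k1 / F2) * np.sqrt(max(h[0], 0.0)) - (k2 / F2) * np.sqrt(max(h[1], 0.0))
        h = np.maximum(h + Ts * np.array([dh1, dh2]), 0.0)
        H[k] = h + sigma * rng.standard_normal(2)     # noisy level measurements
        Q[k] = q
    return H, Q
```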

During training, each history window $H_k$ is processed by the LSTM to produce $h_k$, concatenated with $x_k$ for encoding, and used to optimize the four-term loss $L$ with Adam ($\mathrm{lr}=0.001$, batch size $=100$, epochs $=1500$).
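The corresponding optimization loop, assuming the model and loss sketches above and a data loader yielding (state, history window, future inputs, future states) tuples, might look as follows; this is a schematic, not the reference implementation.

```python
import torch

def train(model, loader, epochs=1500, lr=1e-3):
    """Adam optimization of the combined loss with the hyperparameters stated above."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for x, hist, u_seq, x_future in loader:   # mini-batches of 100 windows
            loss = koopman_loss(model, x, hist, u_seq, x_future)
            opt.zero_grad()
            loss.backward()
            opt.step()
```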

For meta-learning applications to short time-series, datasets include synthetic, Van der Pol, Lorenz, and cylinder-wake systems. The bidirectional LSTM hidden size is $K=32$, with four-layer MLPs of 128 units each used for both embedding and decoding. Training proceeds for up to 10,000 epochs.

6. Enforcement of Linear Latent Evolution and Koopman Spectral Analysis

Linear evolution in latent space is explicitly enforced via the parameterized matrices $A_\mathcal{K}$ and $B_\mathcal{K}$:

$$z_{k+t} = A_\mathcal{K}^t\, z_k + \sum_{i=0}^{t-1} A_\mathcal{K}^{t-1-i} B_\mathcal{K}\, u_{k+i}.$$

Minimizing the multi-step losses $L_{pred}$ and $L_{lpred}$ constrains the learned embedding to reside near an invariant subspace of the Koopman operator. The eigenvalues $\lambda_j$ from the spectral decomposition $A_\mathcal{K} v_j = \lambda_j v_j$ approximate modes of growth, decay, and oscillation in the original system.
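Once $A_\mathcal{K}$ and $B_\mathcal{K}$ have been extracted from a trained model, multi-step rollout and spectral analysis reduce to a few NumPy operations; the sketch below uses a random stable matrix purely as a stand-in for learned dynamics.

```python
import numpy as np

def rollout(A_K, B_K, z0, u_seq):
    """Latent rollout z_{k+t} = A^t z_k + sum_i A^{t-1-i} B u_{k+i}, computed recursively."""
    z, traj = z0, []
    for u in u_seq:
        z = A_K @ z + B_K @ u
        traj.append(z)
    return np.stack(traj)

def koopman_spectrum(A_K, Ts=10.0):
    """Eigenvalues of A_K: |lambda_j| gives growth/decay, arg(lambda_j)/Ts the oscillation rate."""
    lam = np.linalg.eigvals(A_K)
    return lam, np.log(lam) / Ts        # discrete-time modes and their continuous-time equivalents

# Example with a random stable latent system standing in for a trained model.
rng = np.random.default_rng(0)
A_K = 0.95 * np.linalg.qr(rng.standard_normal((40, 40)))[0]   # scaled orthogonal matrix, stable
B_K = rng.standard_normal((40, 1))
traj = rollout(A_K, B_K, np.zeros(40), rng.standard_normal((50, 1)))
lam, lam_ct = koopman_spectrum(A_K)
print(traj.shape, np.abs(lam).max())    # (50, 40) and a spectral radius of about 0.95
```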

In meta-learning models, the Koopman matrix $K(z)$ is estimated by least squares from the embedded time-series via the Moore–Penrose pseudoinverse. Conditioning on the LSTM-derived representation enables the model to adapt its encoding and decoding functions per time-series, supporting robust spectral analysis and future prediction even for short data.

7. Empirical Benchmarks and Performance Assessment

Empirical evaluation uses mean absolute error (MAE) on prediction of system states as the main metric. For the two-tank water system, results are:

| Model | MAE [m] | Relative [%] |
| --- | --- | --- |
| eDMD (true dynamics in dictionary) | 0.175 | 100 |
| LSTM-enhanced Deep Koopman | 0.185 | 106 |
| eDMD (unknown true nonlinearity) | 0.606 | 346 |

When the true nonlinearities are unknown (the typical real-world scenario), the LSTM-enhanced Deep Koopman model reduces MAE by roughly a factor of 3 compared to eDMD without the correct dictionary. Compared to eDMD with full prior knowledge, it incurs only about a 6% higher MAE, while remaining completely dictionary-free and using a more compact 40-dimensional embedding.

Meta-learning LSTM-enhanced models for short time-series (Iwata et al., 2021) demonstrate superior eigenvalue estimation and future prediction RMSE across synthetic, chaotic, and fluid dynamics datasets, with ablation studies showing significant performance declines when bidirectional LSTM representations or meta-episodic training are omitted.

Eigenvalue spectra from the learned $A_\mathcal{K}$ matrices closely match those of the true system, validating the efficacy of the dictionary-free lifting. In practice, models require on the order of 10–50 timesteps for accurate adaptation to each new series and enable immediate spectral analysis and prediction without further fine-tuning.

8. Context and Significance

The LSTM-Enhanced Deep Koopman Model addresses two major limitations in Koopman-based modeling:

  • The need for expert-designed, system-specific dictionaries in eDMD, which are often infeasible when system nonlinearities are unknown.
  • The fragility of neural Koopman methods in applications with input delays or limited data, where history-dependent representations and meta-learning conditionals are essential.

A plausible implication is that the theory can be extended to noisy real-world systems or online learning by adjusting the LSTM history or meta-learning structure. The architecture’s capacity for compact, accurate, and adaptive linearization of nonlinear, delayed, or short time-series systems has implications for control, prediction, and spectral analysis across dynamical systems, robotics, and scientific computing.
