Deep Koopman-Layered Models

Updated 19 March 2026

Deep Koopman-layered models are data-driven surrogate models that use neural network lifts to transform nonlinear system dynamics into a structured latent space with near-linear behavior.
They combine nonlinear encoders, linear latent propagators, and decoders to achieve accurate long-term predictions and efficient control in complex dynamical environments.
Their design, training objectives, and empirical benchmarks demonstrate versatility in handling noise, control inputs, and high-dimensional dynamics.

Deep Koopman-layered models are a class of data-driven surrogate models that use neural-network-parameterized nonlinear lifts to map complex dynamical systems into an (often high-dimensional) latent space where the dynamics are constrained to be finite-dimensional and linear, approximately linear, or, in certain variants, convex or otherwise structurally constrained. The central architectural motif is a composition of a nonlinear encoder, a (typically) linear latent propagator that approximates the Koopman operator, and a nonlinear or linear decoder that reconstructs states or outputs from the latent state. These models are motivated by Koopman operator theory, which represents a (suitably regular) nonlinear dynamical system through an infinite-dimensional linear operator acting on observables; deep Koopman-layered frameworks aim to learn finite-dimensional, expressive parametrizations that capture long-term and global dynamics while enabling fast prediction and control.

1. Architectural Principles and Network Structures

Koopman-layered models typically consist of three “layers”:

Encoder (Lifting map): A neural network φ that maps the physical state x (or possibly input-output history) into a latent coordinate z ∈ ℝ^r, intended to represent a set of observables that, ideally, span a Koopman-invariant subspace. Common choices include fully connected MLPs, ResNets, CNNs for structured data (Yeung et al., 2017, Millard et al., 2024), and, in advanced variants, Kolmogorov–Arnold Networks (KANs) (Nehma et al., 2024).
Latent-space propagator (Koopman layer): In the basic form, the lifted variable is propagated linearly: z_{k+1} = A z_k + B u_k, where (A, B) are trainable (or structured) matrices. Variants include Wiener-type (encoder–linear–decoder), input-affine, bilinear, convex-ICNN, and extended/parametric forms (e.g., with invertible control transforms, time-varying Toeplitz layers, or innovation noise) (Schulze et al., 2022, Spanbauer et al., 2020, Hashimoto et al., 2024, Iacob et al., 13 Jul 2025).
Decoder: Maps z back to the physical or observation space. It may be a linear map, a fully connected network, or a structured operator projecting only the original state components (Zhang et al., 30 Mar 2025, Valábek et al., 6 Nov 2025).

These models can be summarized diagrammatically (with φ as encoder, ψ as decoder, A as Koopman matrix):

xₖ —[φ]→ zₖ —[A,B dynamics]→ zₖ₊₁ —[ψ]→ x̂ₖ₊₁

The key to expressivity and generalization is the selection of the encoder architecture and the precise mathematical constraints imposed on A and/or B during optimization.
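The encoder, propagator, and decoder composition described above can be sketched in a few lines of NumPy. This is a minimal illustration with untrained random weights, not any cited author's implementation; the sizes, the one-hidden-layer tanh encoder, the linear decoder C, and the helper name `rollout` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, r, hidden = 2, 1, 4, 16  # illustrative sizes (assumptions)

# Encoder phi: one-hidden-layer MLP from R^{n_x} to R^r (random, untrained)
W1, b1 = rng.normal(size=(hidden, n_x)), np.zeros(hidden)
W2, b2 = rng.normal(size=(r, hidden)), np.zeros(r)

def phi(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

# Koopman layer: z_{k+1} = A z_k + B u_k
A = 0.9 * np.eye(r)
B = rng.normal(size=(r, n_u))

# Decoder psi: here a plain linear map back to the state space
C = rng.normal(size=(n_x, r))

def psi(z):
    return C @ z

def rollout(x0, U):
    """Encode once, propagate linearly for len(U) steps, decode each latent."""
    z = phi(x0)
    preds = []
    for u in U:
        z = A @ z + B @ u
        preds.append(psi(z))
    return np.stack(preds)

x0 = np.array([1.0, -0.5])
U = np.zeros((5, n_u))       # zero inputs for the demo
X_hat = rollout(x0, U)
print(X_hat.shape)  # (5, 2)
```

Note that the nonlinearity is confined to `phi` (and, in general, `psi`); the rollout loop itself only performs matrix-vector products, which is what makes long-horizon prediction and linear control design cheap.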

2. Training Objectives and Loss Formulations

The prevailing goal in deep Koopman-layered models is to enforce accurate linear latent propagation and (where needed) reconstructability. Typical loss functions include:

Latent linearity loss: Measures the agreement between the predicted next latent and the encoder’s output on the actual next state:

$L_{\mathrm{lin}} = \sum_{k} \|\phi(x_{k+1}) - A \phi(x_k) - B u_k\|^2$

Reconstruction loss: Enforces that the encoder–decoder pair approximates an autoencoder:

$L_{\mathrm{rec}} = \sum_{k} \|x_k - \psi(\phi(x_k))\|^2$

Multi-step prediction loss: Penalizes deviation between simulated (unrolled) model predictions and ground truth over a specified horizon:

$L_{\mathrm{multi}} = \sum_{k} \sum_{h=1}^{T} \left\|x_{k+h} - \psi\!\left(A^h \phi(x_k) + \sum_{j=0}^{h-1} A^{h-1-j} B u_{k+j}\right)\right\|^2$

Application-specific terms: Physics-informed acceleration losses, Lyapunov regularization for latent-stability, innovation-form noise modeling, or information bottleneck objectives for controlling simplicity/expressiveness tradeoff (Zhang et al., 30 Mar 2025, Sun et al., 2024, Cheng et al., 14 Oct 2025).

These losses are typically combined, possibly with weight regularization (||W||_1 or ||W||_2^2), in a weighted sum tailored to the task and dataset.
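As a rough sketch of how the three main losses above could be evaluated on one trajectory, the following NumPy function computes them for the linear latent update z_{k+1} = A z_k + B u_k. The function name `koopman_losses`, the array layout, and the unit weights in the final sum are assumptions for illustration; in practice these terms would be written in an autodiff framework and minimized jointly over the encoder, decoder, A, and B.

```python
import numpy as np

def koopman_losses(X, U, phi, psi, A, B, T):
    """X: (N+1, n_x) states, U: (N, n_u) inputs, T: unroll horizon.
    Returns (L_lin, L_rec, L_multi) for z_{k+1} = A z_k + B u_k."""
    Z = np.stack([phi(x) for x in X])                 # lifted states, (N+1, r)
    # Latent linearity: phi(x_{k+1}) vs. A phi(x_k) + B u_k
    L_lin = np.sum((Z[1:] - Z[:-1] @ A.T - U @ B.T) ** 2)
    # Reconstruction: x_k vs. psi(phi(x_k))
    X_rec = np.stack([psi(z) for z in Z])
    L_rec = np.sum((X - X_rec) ** 2)
    # Multi-step: unroll the latent model T steps from every valid start k
    N = U.shape[0]
    L_multi = 0.0
    for k in range(N - T + 1):
        z = Z[k]
        for h in range(1, T + 1):
            z = A @ z + B @ U[k + h - 1]
            L_multi += np.sum((X[k + h] - psi(z)) ** 2)
    return L_lin, L_rec, L_multi

# Sanity check on a system that is already linear (identity encoder/decoder)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
rng = np.random.default_rng(0)
U = rng.normal(size=(20, 1))
X = np.zeros((21, 2))
X[0] = [1.0, -1.0]
for k in range(20):
    X[k + 1] = A @ X[k] + B @ U[k]

identity = lambda v: v
losses = koopman_losses(X, U, identity, identity, A, B, T=5)
total = 1.0 * losses[0] + 1.0 * losses[1] + 1.0 * losses[2]  # weights are placeholders
```

For this already-linear system all three terms vanish up to floating-point error, which makes the function easy to check before plugging in a learned encoder and decoder.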

3. Structural and Theoretical Extensions

Deep Koopman frameworks are extended in several key directions:

Wiener- and Hammerstein-Block Structures: Linear latent propagation sandwiched between nonlinear encoder/decoder networks; enables extremely low-dimensional (r=1–3) surrogates with high accuracy, especially beneficial for model reduction and control (Schulze et al., 2022, Iacob et al., 2021).
Convex and Extended Koopman Models: Convex dynamics in latent space (implemented via input convex neural networks, ICNNs), and invertible reparameterizations of the control input, lead to improved long-horizon predictivity and robustness to control uncertainty (Spanbauer et al., 2020).
Toeplitz-Matrix and Krylov Subspace Construction: For systems with periodic or nonautonomous structure, deep Koopman models with Toeplitz-structured latent maps and matrix exponentials (computed via Arnoldi iterations) ensure universality and scalability (universality theorems and Rademacher-complexity bounds are provided) (Hashimoto et al., 2024).
Probabilistic and Variational Koopman Models: Incorporate uncertainty via variational inference over the latent, yielding ensembles of confidence-aware, linearized surrogates. The Deep Variational Koopman and Deep Probabilistic Koopman variants employ stochastic autoencoders and parameterize time-varying predictive distributions (Morton et al., 2019, Mallen et al., 2021).
Information-regularized Koopman Networks: Explicitly balance simplicity (mutual information minimization with the input) and expressiveness (maximizing von Neumann entropy to prevent mode collapse) with Lagrangian objectives, empirically improving stability and representational coverage (Cheng et al., 14 Oct 2025).
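To make the von Neumann entropy term in the last item concrete, the sketch below computes it for a batch of latent codes, assuming the entropy is taken over the trace-normalized covariance of the latents. That choice of matrix, the function name `von_neumann_entropy`, and the clipping constant are assumptions for illustration and may differ from the formulation in the cited work.

```python
import numpy as np

def von_neumann_entropy(Z, eps=1e-12):
    """Entropy -sum(l * log l) of the eigenvalues l of the trace-normalized
    covariance of latent codes Z with shape (N, r). Assumes Z is not constant."""
    Zc = Z - Z.mean(axis=0, keepdims=True)
    cov = Zc.T @ Zc / Z.shape[0]
    rho = cov / np.trace(cov)                 # eigenvalues now sum to 1
    eigvals = np.linalg.eigvalsh(rho)         # symmetric matrix, real spectrum
    eigvals = np.clip(eigvals, eps, None)     # guard against log(0)
    return float(-np.sum(eigvals * np.log(eigvals)))

rng = np.random.default_rng(0)
Z_spread = rng.normal(size=(5000, 4))                         # uses all 4 directions
Z_collapsed = np.outer(rng.normal(size=5000), np.ones(4))     # rank-1 latent

H_spread = von_neumann_entropy(Z_spread)        # close to log(4)
H_collapsed = von_neumann_entropy(Z_collapsed)  # close to 0
```

A latent code that spreads variance evenly over all r directions approaches the maximum value log(r), while a collapsed (rank-one) code scores near zero, which is why maximizing this quantity discourages mode collapse.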

4. Modeling with Inputs, Control, and Innovation Noise

Modern deep Koopman-layered surrogates handle controlled and noisy systems by incorporating:

Input-affine and Bilinear terms: Latent update laws such as z_{k+1} = A z_k + ∑_i B^{(i)} z_k u_{k,i}, or more generally B(z_k, u_k) u_k, generalizing the finite-dimensional (control-affine) Koopman theory to practical settings (Iacob et al., 2021, Schulze et al., 2022, Iacob et al., 13 Jul 2025).
Innovation form noise models: Innovation noise is incorporated in the latent update, i.e., z_{k+1} = A z_k + B(z_k, u_k) u_k + K(z_k, u_k, e_k) e_k, where e_k is the innovation (one-step prediction error), and the initial latent state is estimated from the initial I/O history via a deep encoder (Iacob et al., 13 Jul 2025).
Adaptive and online updates: Real-time adaptation of A, B in the lifted space, e.g., via sliding window least squares, robustifies models to parameter drift or unmodeled disturbances without the need to retrain the encoder (Zhang et al., 30 Mar 2025).
Integration with control algorithms: Deep Koopman surrogates are deployed within MPC architectures (both economic and tracking), where they are reported to outperform classical subspace identification methods (such as N4SID) in both prediction error and closed-loop economic objectives (Valábek et al., 6 Nov 2025, Abtahi et al., 4 Mar 2025).
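The sliding-window least-squares idea mentioned under adaptive and online updates can be sketched as follows, assuming the plain linear latent model z_{k+1} = A z_k + B u_k and a frozen encoder that has already produced the latent sequence. The function name `fit_AB_window` and the demo system are hypothetical; a practical scheme would typically add regularization or a recursive update.

```python
import numpy as np

def fit_AB_window(Z, U, window):
    """Least-squares fit of z_{k+1} ~ A z_k + B u_k on the last `window`
    transitions. Z: (N+1, r) latents, U: (N, n_u) inputs, window <= N."""
    Z_now = Z[-window - 1:-1]                  # z_k for the last `window` steps
    Z_next = Z[-window:]                       # matching z_{k+1}
    U_now = U[-window:]                        # matching u_k
    regressors = np.hstack([Z_now, U_now])     # (window, r + n_u)
    theta, *_ = np.linalg.lstsq(regressors, Z_next, rcond=None)
    r = Z.shape[1]
    return theta[:r].T, theta[r:].T            # A: (r, r), B: (r, n_u)

# Demo: recover a known (A, B) from noiseless latent data
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.1],
                   [0.0, 0.0, 0.7]])
B_true = np.array([[0.0], [0.0], [1.0]])
rng = np.random.default_rng(1)
N = 60
U = rng.normal(size=(N, 1))
Z = np.zeros((N + 1, 3))
Z[0] = rng.normal(size=3)
for k in range(N):
    Z[k + 1] = A_true @ Z[k] + B_true @ U[k]

A_hat, B_hat = fit_AB_window(Z, U, window=30)
```

Because only (A, B) are refit, each update is a small linear solve over the window, so the encoder does not have to be retrained when the plant drifts.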

5. Empirical Performance and Benchmarks

Deep Koopman-layered models have been evaluated on diverse tasks:

Model Type	Benchmark Domain	Horizon/Metric	Key Results
Standard Deep Koopman AE	Glycolytic oscillator, Power grid	up to 400 steps	≤1% one-step error, accurate long-horizon
Wiener-type Koopman-AE	Chemical reactor, distillation	NMSE, trajectory tracking	r=1–2, lowest NMSE, strongest reduction
Convex/Extended Koopman	Double-well, quadruped	Trajectory/rollout error	~4–5× lower error vs. linear benchmarks
Physics-informed DK	Autonomous vehicle, CarSim	RMSE (wheel/vel/yaw)	Up to 95% lower error after adaptation
Probabilistic Koopman	Electricity, Chem, NeuroScience	Negative log-likelihood	Outperforms all 177 domain-specific models
Toeplitz-layered Koopman	Van der Pol, time-varying vortex	Eigenvalue estimation, MSE	Unit-circle spectra where appropriate
Information-regularized	Lorenz-63, Kármán vortex	NRMSE/SSIM, mode diversity	Best NRMSE and eigenvalue spread

Across these benchmarks, deep Koopman-layered models are generally reported to outperform shallow, fixed-dictionary, and purely linear control techniques, especially for nonlinear, multi-time-scale, and high-dimensional phenomena (Yeung et al., 2017, Schulze et al., 2022, Iacob et al., 13 Jul 2025, Valábek et al., 6 Nov 2025, Cheng et al., 14 Oct 2025).

6. Implementation Practices and Challenges

Best practices in model development include:

Encoder/decoder design: Prefer moderate-width MLPs (two or three layers, 20–100 units) for generic problems; adopt CNNs or ResNets for spatial or image data; hybridize with Kolmogorov–Arnold Networks (KANs) for parameter efficiency and fast convergence (Nehma et al., 2024, Millard et al., 2024).
Latent dimension selection: Empirical grid search is usually required to select the smallest latent dimension r that preserves accuracy; oversizing risks overfitting, while undersizing can miss critical modes (Iacob et al., 2021).
Multi-step losses and multiple-shooting: Batch-parallel, truncated unrolls (“multiple shooting”) balance memory efficiency and capture long-time structure, especially when combined with early stopping (Iacob et al., 2021, Iacob et al., 13 Jul 2025).
Spectral, Lyapunov, and stability penalties: Enforcing spectral radius, Lyapunov, and orthogonality constraints improves robustness and generalization, especially in long rollout or control settings (Forootani et al., 4 Aug 2025).
Generalization and universality: Theoretical guarantees on expressivity are available for specific structured networks (e.g., Toeplitz-based models with exponentials of banded matrices), but tuning remains empirical and can scale poorly in deep/broad settings (Hashimoto et al., 2024).
Pitfalls: Models can fail when B(z, u) is extrapolated far outside the training data, or when the learning rate or the spectral radius of A is not controlled, leading to latent drift or instability. Regularization, excitation design, and careful validation are essential.
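One simple way to keep the spectral radius of the latent matrix under control is sketched below: a penalty that is zero inside a target radius, plus a post-hoc rescaling of A. Both helpers (`radius_penalty`, `rescale_to_radius`) and the threshold 0.99 are assumptions for illustration, not the specific constraint used in the cited works, which may instead rely on structured parametrizations or Lyapunov-based terms.

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def radius_penalty(A, rho_max=1.0):
    """Quadratic penalty that is zero whenever the spectral radius <= rho_max."""
    return max(0.0, spectral_radius(A) - rho_max) ** 2

def rescale_to_radius(A, rho_max=1.0):
    """Uniformly shrink A so its spectral radius does not exceed rho_max."""
    rho = spectral_radius(A)
    return A if rho <= rho_max else A * (rho_max / rho)

A_unstable = np.array([[1.2, 0.5],
                       [0.0, 0.3]])            # eigenvalues 1.2 and 0.3
A_safe = rescale_to_radius(A_unstable, rho_max=0.99)

print(spectral_radius(A_unstable) > 1.0)       # True
print(spectral_radius(A_safe) <= 0.99 + 1e-9)  # True
```

Uniform rescaling changes all modes at once, so it is best viewed as a safety net for rollouts rather than a substitute for a stability-aware training objective.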

7. Research Directions and Open Questions

Active research frontiers include:

Mode diversity and representation compression: Information-theoretic regularization (mutual information, von Neumann entropy) improves latent code coverage and avoids mode collapse, delivering more interpretable and robust Koopman subspaces (Cheng et al., 14 Oct 2025).
Adaptive, online, and nonautonomous learning: Designs that allow for windowed or sequential updating of the Koopman operator address parameter drift and system nonstationarity.
Probabilistic and uncertainty-aware extensions: Explicitly integrating variational Bayesian inference and distributional prediction into Koopman-layered models, as in DVK/DPK, allows for quantification of epistemic/model uncertainty (Morton et al., 2019, Mallen et al., 2021).
Integration of physical constraints: Time-reversibility, stochasticity, and conservation laws (e.g., detailed balance) are now incorporated through architectural and optimization constraints, providing domain-specific guarantees and helping in data-scarce regimes (Mardt et al., 2019).
Architectures for high-dimensional and structured domains: CNN-, graph-, and Transformer-based encoders are now embedded in Koopman-layered pipelines for environmental, financial, or visual-control contexts, driving performance on synthetic and real-world benchmarks (Forootani et al., 4 Aug 2025, Millard et al., 2024).
Scalability and numerical efficiency: Toeplitz-matrix factorization, Krylov subspace methods, batched or parallel implementation, and compressed spectral layers are opening up applications in high-dimensional, large-scale dynamics while preserving mathematical properties (Hashimoto et al., 2024).