Koopman-Based Models in Data-Driven Dynamics

Updated 14 August 2025
  • Koopman-based models are operator-theoretic techniques that transform nonlinear system dynamics into linear representations in a suitable function space, enabling analysis and control.
  • They use finite-dimensional approximations, including EDMD and neural network feature maps, to capture complex behaviors and support robust control and prediction.
  • Applications span robotics, fluid dynamics, and generative modeling, offering efficient reduced-order models with uncertainty quantification and adaptive control.

Koopman-based models are data-driven dynamical systems models that employ operator-theoretic techniques to represent nonlinear system evolution as a linear process in a suitably defined function space. Central to this approach is the Koopman operator, an infinite-dimensional linear operator acting on observables of the system state, whose approximation yields linear models for analysis, prediction, and control, even when the original system is highly nonlinear. Over the last decade, these models have found applications in control theory, fluid dynamics, robotics, uncertainty quantification, generative modeling, and complex multi-agent systems.

1. Mathematical Foundations of Koopman Operator Theory

The Koopman operator $\mathcal{K}^t$ advances scalar- or vector-valued observables $\psi$ along the flow of a dynamical system. For a nonlinear ordinary differential equation,

$$\dot{x}(t) = f(x(t)), \qquad x \in \mathbb{R}^n,$$

the operator is defined for any function $\psi:\mathbb{R}^n \to \mathbb{C}$ by

$$\mathcal{K}^t \psi(x) = \psi\big(x(t;x)\big),$$

where $x(t;x)$ denotes the solution initialized at $x$ at $t=0$. This mapping is linear in $\psi$, as

$$\mathcal{K}^t (\alpha\psi_1 + \beta\psi_2) = \alpha \mathcal{K}^t\psi_1 + \beta \mathcal{K}^t\psi_2.$$

Eigenfunctions $\phi$ of $\mathcal{K}^t$ with eigenvalues $\lambda$ satisfy

$$\mathcal{K}^t \phi(x) = e^{\lambda t} \phi(x),$$

which connects spectral properties of the operator to system behavior. For autonomous and non-autonomous systems, and for systems with control, related generator forms and lifted representations can systematically represent both the system dynamics and the influence of control inputs, often yielding linear or bilinear models in the observable coordinates (Abraham et al., 2017, Bevanda et al., 2021, Iacob et al., 2022). A spectral decomposition of the operator enables a modal representation,

$$\psi(x,t) = \sum_{j=1}^\infty a_j e^{\lambda_j t} \phi_j(x),$$

providing a foundation for reduced-order modeling and control synthesis.
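The eigenfunction relation above can be checked numerically on a hypothetical one-dimensional example (not drawn from the cited papers): for $\dot{x} = \lambda x$, each monomial $\phi_k(x) = x^k$ is a Koopman eigenfunction with eigenvalue $k\lambda$.

```python
import numpy as np

# Hypothetical 1-D check: for x' = lam*x the flow is x(t; x0) = exp(lam*t)*x0,
# so phi_k(x) = x**k satisfies (K^t phi_k)(x) = exp(k*lam*t) * phi_k(x).
lam, t, x0 = -0.5, 1.3, 2.0

def flow(x, t):
    """Exact flow map of x' = lam * x."""
    return np.exp(lam * t) * x

for k in (1, 2, 3):
    lhs = flow(x0, t) ** k                # (K^t phi_k)(x0): evaluate along the flow
    rhs = np.exp(k * lam * t) * x0 ** k   # eigenvalue relation e^{k lam t} phi_k(x0)
    assert np.isclose(lhs, rhs)
```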

2. Data-Driven Approximation and Model Construction

The infinite-dimensionality of the Koopman operator necessitates finite-dimensional approximations for practical applications. The standard approach involves selecting a dictionary of basis functions $\{\psi_1, \ldots, \psi_N\}$ and using dynamic mode decomposition (DMD) or its extended variant (EDMD) to approximate the operator from trajectory data (Abraham et al., 2017, Bevanda et al., 2021). In EDMD, for snapshot pairs $(x_k, y_k)$, where $y_k$ is the evolution of $x_k$ after time $\delta$, the finite-dimensional operator $K$ satisfies

$$\Psi(y_k) \approx K \Psi(x_k),$$

where $\Psi(x) = [\psi_1(x), \ldots, \psi_N(x)]^\top$. The least-squares solution is

$$K = G^\dagger A,$$

with $G = \sum_k \Psi(x_k)\Psi(x_k)^\top$ and $A = \sum_k \Psi(x_k)\Psi(y_k)^\top$ (Abraham et al., 2017, Bold et al., 2023).
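A minimal EDMD sketch in NumPy (the map, dictionary, and sample size are illustrative choices, not from the cited works); it uses the row-vector convention $\Psi(y_k)^\top \approx \Psi(x_k)^\top K$ that matches $K = G^\dagger A$:

```python
import numpy as np

# Illustrative EDMD fit for the scalar map y = 0.9 x - 0.1 x^3 with the
# monomial dictionary Psi(x) = [1, x, x^2, x^3] (all choices hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 500)
Y = 0.9 * X - 0.1 * X**3

def Psi(x):
    x = np.atleast_1d(x)
    return np.stack([np.ones_like(x), x, x**2, x**3])  # shape (N, n_samples)

PX, PY = Psi(X), Psi(Y)
G = PX @ PX.T               # G = sum_k Psi(x_k) Psi(x_k)^T
A = PX @ PY.T               # A = sum_k Psi(x_k) Psi(y_k)^T
K = np.linalg.pinv(G) @ A   # K = G^† A; prediction: Psi(y)^T ≈ Psi(x)^T K

# The map lies in the dictionary's span, so the lifted model recovers the
# state evolution exactly: the column for the observable psi_2(x) = x.
Y_pred = (PX.T @ K)[:, 1]
assert np.allclose(Y_pred, Y)
```

Because the true map is a cubic polynomial, the degree-3 dictionary contains it exactly; for dynamics outside the span, the fit incurs the projection error discussed in Section 3.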

Recent works utilize neural networks to learn data-driven feature maps for the observables, leading to increased expressive power and potentially approximating Koopman-invariant subspaces over wider operating regions (Uchida et al., 4 Dec 2024, Folkestad et al., 2021). Model averaging and Bayesian ensemble techniques further combine multiple linear models learned from different data subsets, weighting them by predictive performance to improve generalization, smooth modeling errors, and remain robust on unseen data (Uchida et al., 4 Dec 2024).

Bilinear models, which arise naturally for control-affine systems, exploit the structure of the generator:

$$\dot{z} = F z + \sum_{i=1}^m G_i z u_i,$$

with $z$ the lifted state, allowing effective modeling and control of input-coupled nonlinear systems (Folkestad et al., 2021, Folkestad et al., 2021, Otto et al., 2022).
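A minimal simulation sketch (with made-up placeholder matrices) illustrates the key structural property: for a fixed input the lifted dynamics are linear in $z$, while the input enters multiplicatively.

```python
import numpy as np

# Illustrative bilinear lifted model z' = F z + G_1 z u_1, integrated with
# forward Euler. F and G_1 are made-up placeholders, not from the cited papers.
F = np.array([[0.0, 1.0],
              [-1.0, -0.2]])
G1 = np.array([[0.0, 0.0],
               [1.0, 0.0]])

def step(z, u1, dt=0.01):
    """One Euler step of the bilinear lifted dynamics."""
    return z + dt * (F @ z + u1 * (G1 @ z))

z0 = np.array([1.0, 0.0])
# For a fixed input the model is linear in the lifted state:
assert np.allclose(step(2.0 * z0, u1=0.5), 2.0 * step(z0, u1=0.5))
```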

3. Model Selection, Error Analysis, and Theoretical Properties

A critical aspect of Koopman-based modeling is determining appropriate observables ("basis functions") that yield an invariant (or nearly invariant) subspace under the Koopman operator, providing an accurate and efficient finite-dimensional approximation. The invariance proximity metric gives a tight upper bound on the worst-case relative prediction error,

$$I_K(S) = \max_{f\in S,\; \|Kf\|\neq 0} \frac{\|Kf - P_S Kf\|}{\|Kf\|},$$

where $S$ is the model subspace, $K$ the Koopman operator, and $P_S$ the orthogonal projection. This bound can be expressed in closed form as the sine of the largest Jordan principal angle between $S$ and $KS$ (Haseli et al., 2023). This metric allows one to assess and optimize the fidelity of EDMD or neural network–based approximations by subspace learning.
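The closed-form characterization is easy to check on a toy finite-dimensional stand-in for $K$ (the matrices below are random placeholders, not a Koopman approximation from any cited work); SciPy's `subspace_angles` returns the principal angles largest-first.

```python
import numpy as np
from scipy.linalg import subspace_angles

# Toy check: the worst-case relative residual of K f over f in S never
# exceeds sin of the largest principal angle between S and K S.
rng = np.random.default_rng(1)
K = rng.standard_normal((4, 4))          # placeholder operator
V = rng.standard_normal((4, 2))          # columns span the model subspace S

I_KS = np.sin(subspace_angles(V, K @ V)[0])   # angles come largest-first

P = V @ np.linalg.pinv(V)                # orthogonal projector onto S
worst = 0.0
for _ in range(2000):
    f = V @ rng.standard_normal(2)       # random element of S
    g = K @ f
    worst = max(worst, np.linalg.norm(g - P @ g) / np.linalg.norm(g))
assert worst <= I_KS + 1e-9              # sampled errors respect the bound
```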

Other works introduce heuristics for mode selection in reduced-order models, for example by monitoring residual normality (via the Shapiro–Wilk test) and balancing the number of deterministic modes against the requirement that the residuals (modal noise) become approximately Gaussian (Mohr et al., 2022).
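A sketch of the normality check itself (synthetic data, purely illustrative): a residual that still contains deterministic structure fails the Shapiro–Wilk test, while white Gaussian noise generally passes.

```python
import numpy as np
from scipy.stats import shapiro

# Illustrative residual check: a sine wave still contains deterministic
# structure, so Shapiro-Wilk rejects normality; pure noise typically passes.
rng = np.random.default_rng(2)
structured = np.sin(np.linspace(0.0, 10.0, 300)) + 0.05 * rng.standard_normal(300)
noise = 0.05 * rng.standard_normal(300)

_, p_structured = shapiro(structured)   # tiny p-value: reject normality
_, p_noise = shapiro(noise)
# In a mode-selection loop one would add deterministic modes until the
# residual's p-value exceeds a chosen threshold such as 0.05.
```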

Error bounds for approximating linear parameter-varying (LPV) Koopman models with linear time-invariant (LTI) surrogates are quantitatively derived, specifying conditions under which the finite-dimensional approximation is valid and guiding the choice between LTI and LPV model structures (Iacob et al., 2022, Iacob et al., 2022).

4. Applications and Empirical Results

Koopman-based models have been applied successfully in a variety of contexts:

  • Robotics: Data-driven Koopman models have enabled robust model-based control (open- and closed-loop) in robotic systems, including underactuated pendulum systems, mobile differential-drive robots, and quadrotors subjected to complex nonlinearities and challenging terrains (Abraham et al., 2017, Folkestad et al., 2021, Folkestad et al., 2021). Incorporating richer basis functions improves performance, especially for systems with strong periodicity or nonlinear coupling.
  • Fluid Dynamics: Koopman operator-based reduced-order models achieve significant rank compression and computational savings in PDE-constrained problems (e.g., Burgers, Navier–Stokes, Rayleigh–Bénard convection, and Saint-Venant equations) while maintaining prediction and control fidelity (Peitz et al., 2018, Markmann et al., 10 May 2024, Bistrian et al., 5 Sep 2024). The use of advanced architectures (e.g., LRAN) allows accurate modeling even in highly turbulent flows, outperforming fixed-dictionary approaches like KDMD in non-periodic regimes (Markmann et al., 10 May 2024).
  • Stochastic and Agent-Based Systems: In high-dimensional, stochastic ABMs, generator-based Koopman surrogates allow for rapid multi-objective optimization, accurately approximating Pareto optimal controls with a dramatic reduction in computational expense relative to direct simulations (Niemann et al., 2023).
  • Nonlinear State Estimation and Uncertainty: Deep variational Koopman models (DVK) infer distributions over latent observables, yielding ensembles of linear models that enable uncertainty-aware prediction and robust control (Morton et al., 2019).
  • Generative Modeling and Distillation: The Koopman Distillation Model (KDM) distills the multi-step, nonlinear generative trajectories of diffusion models into a single linear step in Koopman latent space, granting substantial speedups and FID improvements without sacrificing semantic fidelity (Berman et al., 19 May 2025). Theoretical results guarantee that proximity in the latent space corresponds to semantic similarity in outputs.
| Application Area | Koopman Model Implementation | Notable Results/Properties |
|---|---|---|
| Robotics | Polynomial/Fourier dictionaries, DNN basis | Improved stabilization/tracking, adaptation |
| Fluid Dynamics | DMD/EDMD, LRAN, KDMD | Rank reduction, accurate prediction |
| ABMs / Socio-dynamics | Koopman generator surrogate (gEDMD) | Efficient multi-objective optimization |
| Generative Models | Koopman-latent distillation (KDM) | One-step generation, FID/IS improvements |
| Uncertainty/Filtering | DVK, ensemble & Bayesian methods | Explicit uncertainty quantification, robust control |

5. Model-Based Control and Adaptation

Koopman-based models are particularly advantageous for model-based control, transforming nonlinear and control-affine systems into linear or bilinear systems in the lifted space, thus enabling the direct application of LQR, MPC, and other linear control design tools (Abraham et al., 2017, Bevanda et al., 2021, Folkestad et al., 2021). Bilinear lifted models, such as those produced by the Koopman canonical transform, are especially effective for highly coupled nonlinear actuation, significantly reducing prediction and trajectory tracking error compared to standard linearizations (Folkestad et al., 2021, Folkestad et al., 2021).
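For instance, once an identified lifted model takes the discrete-time linear form $z^+ = A z + B u$, an LQR gain follows from the standard discrete-time Riccati machinery (the matrices below are an illustrative toy lifted model, not taken from the cited papers):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Toy lifted model z+ = A z + B u (illustrative matrices): with the dynamics
# linear in the lifted state, LQR design applies directly.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

P = solve_discrete_are(A, B, Q, R)
Kgain = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # control law u = -Kgain @ z

# The lifted closed loop is Schur stable:
assert np.all(np.abs(np.linalg.eigvals(A - B @ Kgain)) < 1.0)
```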

Adaptive Koopman models, such as recursive EDMD with online updating and variable forgetting factors, allow tracking of time-varying dynamics and support holistic adaptation for both controller and observer design. In robotic test rigs, such adaptive architectures maintain low control error in the face of large abrupt system changes, significantly outperforming traditional gain-scheduling approaches (Junker et al., 2022).
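The recursive-update idea can be sketched with rank-one updates of the EDMD matrices $G$ and $A$ under a forgetting factor (a constant factor stands in for the variable one; the scalar system, dictionary, and numbers are all illustrative assumptions):

```python
import numpy as np

# Sketch of recursive EDMD: exponentially forgetting rank-one updates of G and
# A let the lifted model track an abrupt change in the scalar map y = a*x.
rng = np.random.default_rng(4)
lam_f = 0.98                       # forgetting factor (constant here)
N = 3
G = 1e-6 * np.eye(N)               # small regularization of the Gram matrix
A = np.zeros((N, N))

def Psi(x):
    return np.array([1.0, x, x * x])

a_true = 0.9
for k in range(400):
    if k == 200:
        a_true = 0.5               # abrupt system change
    x = rng.uniform(-1.0, 1.0)
    p, q = Psi(x), Psi(a_true * x)
    G = lam_f * G + np.outer(p, p)
    A = lam_f * A + np.outer(p, q)

K = np.linalg.solve(G, A)          # row convention: Psi(y)^T ≈ Psi(x)^T K
# The entry mapping the observable x to its image has tracked the new gain:
assert abs(K[1, 1] - 0.5) < 0.05
```

Older data decay geometrically, so within a few time constants of the change the model reflects the new dynamics without an explicit detection step.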

Strategies for online adaptivity and continual learning of the Koopman model from sensor data also support robust operation in changing environments or in the presence of unpredictable disturbances (Peitz et al., 2018, Junker et al., 2022).

6. Recent Architectures, Universality, and Future Directions

Recent works introduce architectures with strong theoretical guarantees, such as deep Koopman-layered models using learnable Toeplitz matrices for universal approximation (i.e., ability to represent any nonlinear dynamical system to arbitrary accuracy in the chosen basis) (Hashimoto et al., 3 Oct 2024). These models leverage the universal property of Toeplitz matrix products and RKHS reproducing properties to ensure both approximation capacity and controlled generalization, even for nonautonomous and temporally switching systems.

Krylov subspace methods enable efficient computation of the action of large matrix exponentials (arising from Koopman layers), bridging operator learning with classical numerical linear algebra for scalable training (Hashimoto et al., 3 Oct 2024). Surrogate modeling advances such as model averaging and ensemble Bayesian approaches address the challenge of learning near-invariant observables and managing epistemic uncertainty, further improving robustness and control reliability (Uchida et al., 4 Dec 2024).
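The computational pattern is to evaluate $e^{L} v$ without ever forming $e^{L}$; a sketch using SciPy's `expm_multiply` (which implements a scaling/Taylor scheme; Krylov methods fill the same role), on a toy diagonal generator chosen so the answer has a closed form:

```python
import numpy as np
from scipy.sparse.linalg import expm_multiply

# Compute the action exp(L) @ v without forming exp(L), as when a Koopman
# layer applies a large generator exponential. The diagonal generator here
# is a placeholder with a closed-form answer for checking.
rng = np.random.default_rng(3)
n = 200
d = -np.abs(rng.standard_normal(n))   # stable diagonal entries
L = np.diag(d)
v = rng.standard_normal(n)

w = expm_multiply(L, v)
assert np.allclose(w, np.exp(d) * v)  # diagonal case: exp(L) acts entrywise
```

For large sparse or structured $L$, this avoids the dense $n \times n$ exponential entirely, which is what makes operator-sized Koopman layers trainable.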

Theoretical advances include explicit quantitative error bounds for subspace approximation (invariance proximity), optimal LTI synthesis from LPV Koopman models (via LMIs for H₂ and ℓ₂-gain minimization), and semantic preservation guarantees in generative modeling (Haseli et al., 2023, Iacob et al., 2022, Berman et al., 19 May 2025).

Active directions include:

  • Joint learning of observable dictionaries and model parameters, especially with neural network parameterizations.
  • Embedding physics-based symmetries directly into mappings and loss functions to accelerate learning and improve extrapolation (Markmann et al., 10 May 2024).
  • Quantitative performance guarantees for data-driven control based on subspace proximity metrics and uncertainty-aware model selection.
  • Extensions to controlled or actuated high-dimensional systems, non-autonomous and switching systems, and real-time control architectures.
  • Development of rigorous heuristics for model order and mode selection in high-dimensional settings.

7. Practical Considerations, Limitations, and Outlook

Koopman-based models offer significant promise for unifying nonlinear system modeling, control, and prediction within an operator-theoretic, data-driven framework. However, several challenges remain:

  • The identification of invariant (or sufficiently expressive) observable subspaces is generally problem-dependent and nontrivial. Data-driven dictionary learning, ensemble methods, and universal architectures are active research directions to ameliorate this (Uchida et al., 4 Dec 2024, Hashimoto et al., 3 Oct 2024).
  • Finite data and finite basis sets inherently approximate only part of the infinite-dimensional dynamics, leading to residual errors that must be carefully quantified and, where possible, bounded (Haseli et al., 2023, Mohr et al., 2022).
  • For systems with strong nonlinearity in control input or high turbulence (e.g., at large Rayleigh numbers), flexibility in observable representation (learned dictionaries, neural architectures) becomes critical for maintaining prediction quality (Markmann et al., 10 May 2024).
  • For model-based control, ensuring computational tractability necessitates compact lifted models or efficient solvers (e.g., warm-started SQP, Krylov methods) compatible with real-time constraints (Folkestad et al., 2021, Hashimoto et al., 3 Oct 2024).
  • Uncertainty quantification, robust adaptation, and safe deployment in real-world systems require careful assessment and integration of epistemic/modeling uncertainties, often handled via Bayesian averaging, variational inference, or confidence-bounded residual modeling (Uchida et al., 4 Dec 2024, Morton et al., 2019, Mohr et al., 2022).

The operator-theoretic paradigm championed in Koopman-based models continues to unify advances in system identification, predictive modeling, estimation, and feedback control, with ongoing theoretical and algorithmic developments expected to broaden the applicability and reliability of these models in complex, high-dimensional, and safety-critical systems.
