Koopman Learning: Data-Driven Dynamics

Updated 28 October 2025
  • Koopman learning is a method that approximates the infinite-dimensional Koopman operator using finite data to linearize nonlinear dynamical systems.
  • Neural autoencoders, convex optimization, and bi-level frameworks are key techniques that extract invariant observables and facilitate multi-step prediction.
  • Practical applications include fluid dynamics, trajectory forecasting, and control systems, with model selection driven by reconstruction and prediction errors.

Koopman learning is the data-driven identification, approximation, and application of Koopman operators for the analysis and prediction of nonlinear dynamical systems. The Koopman operator is an infinite-dimensional linear operator that acts on observables of the system, enabling the study of nonlinear dynamics via linear methods when the system is suitably "lifted" into a higher-dimensional function space. The core challenge in Koopman learning is to construct finite, tractable representations of the Koopman operator from data, with sufficient fidelity to the true system, such that linear techniques can be leveraged for prediction, control, and understanding—even in highly nonlinear regimes.

1. Koopman Operator Theory and Its Role in Learning

The Koopman operator $\mathcal{K}$ associated with a dynamical system $x_{k+1} = f(x_k)$ is the linear operator acting on a space of observables $g: X \to \mathbb{C}$, given in discrete time by

$$\mathcal{K} g(x) := g(f(x))$$

and in continuous time by

$$[\mathcal{K}_t g](x) := g(\phi^t(x))$$

where $\phi^t(x)$ denotes the flow map at time $t$.
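
For example (a standard illustration rather than a result from the cited papers), for the scalar linear map $x_{k+1} = \lambda x_k$, the monomial observables $g_n(x) = x^n$ satisfy

$$\mathcal{K} g_n(x) = g_n(\lambda x) = \lambda^n x^n = \lambda^n g_n(x),$$

so each $g_n$ is a Koopman eigenfunction with eigenvalue $\lambda^n$.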

The main premise of Koopman theory is that, even if $f$ is nonlinear, the evolution of observables $g$ can be described linearly in an appropriately chosen (infinite-dimensional) function space. If a finite-dimensional set of observables spans a subspace that is invariant under $\mathcal{K}$, then the system can be fully linearized in that subspace. In practice, Koopman learning is concerned with discovering an encoder $g:\mathbb{R}^n\to \mathbb{R}^m$ and a finite-dimensional operator $K$ such that $g(x_{k+1}) \approx K g(x_k)$ (or, in continuous time, $dg(x_t)/dt \approx K g(x_t)$).

Koopman learning is thus a form of representation learning, seeking coordinate transformations or embeddings where nonlinear dynamics appear linear, enabling spectral analysis, prediction, and control using linear methods (Mezic, 2020).
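To make the finite-dimensional approximation concrete, the following is a minimal sketch of dictionary-based Koopman learning in the spirit of extended DMD (EDMD). The dictionary, the damped-pendulum example system, and all numerical choices are illustrative assumptions for this sketch, not drawn from the cited papers.

```python
import numpy as np

def lift(x):
    """Illustrative fixed dictionary of observables g: R^2 -> R^5.
    (The choice of dictionary is an assumption for this sketch.)"""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x1 * x2, np.sin(x1)])

def fit_koopman(snapshots):
    """Least-squares fit of K such that g(x_{k+1}) ≈ K g(x_k).

    snapshots: array of shape (T, 2) holding a trajectory x_0..x_{T-1}.
    Returns K of shape (5, 5).
    """
    G = np.stack([lift(x) for x in snapshots[:-1]], axis=1)   # 5 x (T-1)
    Gp = np.stack([lift(x) for x in snapshots[1:]], axis=1)   # 5 x (T-1)
    # Solve min_K ||Gp - K G||_F via the pseudoinverse.
    return Gp @ np.linalg.pinv(G)

# Illustrative system: a damped nonlinear (pendulum-like) oscillator.
def f(x, dt=0.01):
    x1, x2 = x
    return np.array([x1 + dt * x2, x2 + dt * (-np.sin(x1) - 0.1 * x2)])

traj = [np.array([1.0, 0.0])]
for _ in range(1999):
    traj.append(f(traj[-1]))
K = fit_koopman(np.array(traj))

# Multi-step prediction is linear in the lifted space: g_k ≈ K^k g_0.
g = lift(traj[0])
for _ in range(100):
    g = K @ g
print("predicted x1 after 100 steps:", g[0], "true:", traj[100][0])
```

Because the first dictionary element is the state coordinate $x_1$ itself, decoding here is trivial; neural approaches below instead learn both the dictionary (encoder) and the decoder.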

2. Neural and Optimization-based Architectures for Learning Koopman Operators

Several methodological approaches have been developed for data-driven Koopman learning:

  • Deep Koopman Autoencoders: Neural architectures simultaneously learn (i) an encoding $g(x)$ (the observable space), (ii) a decoding $g^{-1}(y)$, and (iii) a linear Koopman operator $K$ in the latent space. During training, reconstruction, linearity, and multi-step prediction losses are jointly optimized (Dey et al., 2022); a minimal sketch of this loss structure appears after this list. In DLKoopman, for example, both state and trajectory prediction modes are supported, with the Koopman matrix learned via singular value decomposition (SVD) or as a linear layer. This enables the modeling of systems previously outside the scope of linear DMD (Dynamic Mode Decomposition) or manually constructed dictionaries.
  • Optimization-based Programmatic Methods: Approaches based on convex optimization, particularly semi-definite programming (SDP), cast Koopman learning as a rank minimization or reduced-rank regression problem. These infer both the embedding and the operator directly from data, determining also the necessary dimension of the lifted space and system memory/order (Sznaier, 2021, Estornell et al., 25 Apr 2025). Chordal sparsity and nuclear-norm relaxations enable scalable computation. In hybrid frameworks, initial structure and Koopman operator extraction are performed using SDP, followed by neural network-based learning for mappings into and out of the observable space.
  • Bi-level Optimization: For systems with control, bi-level optimization frameworks separate the learning of the embedding and the Koopman dynamics. The inner optimization solves, in closed-form, for the best-fit linear operators over a trajectory (often using an integral/continuous-time formulation), while the outer loop updates the autoencoder (Huang et al., 2023). This approach reduces error accumulation and improves training efficiency, especially for long-horizon prediction and low-rate data.
  • Extensions to Hybrid, Distributed, and Online Settings: For hybrid systems or multi-agent contexts, Koopman learning can be extended to allow for distributed or partial observations (Bakker et al., 2020, Hao et al., 17 Sep 2024, Chen et al., 7 Jul 2025). For online/streaming scenarios, novelty detection via Grassmannian distance and adaptive basis updating enable the model to incorporate new dynamical regimes efficiently (Loya et al., 18 Jul 2024).
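
As referenced in the first bullet above, the following is a minimal PyTorch-style sketch of a deep Koopman autoencoder with the three standard loss terms. The layer sizes, prediction horizon, and equal loss weighting are illustrative assumptions; this does not reproduce the exact DLKoopman architecture.

```python
import torch
import torch.nn as nn

class KoopmanAutoencoder(nn.Module):
    """Encoder g, decoder g^{-1}, and linear Koopman operator K in latent space.
    All sizes are illustrative; real implementations differ in detail."""
    def __init__(self, state_dim=2, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, state_dim))
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)  # Koopman matrix

def koopman_loss(model, x_traj, horizon=5):
    """x_traj: (batch, T, state_dim) trajectories with T > horizon.

    Combines reconstruction, one-step latent linearity, and multi-step
    prediction losses; the equal weighting is an arbitrary choice."""
    z = model.encoder(x_traj)                                # (batch, T, latent)
    recon = ((model.decoder(z) - x_traj) ** 2).mean()        # autoencoder error
    linearity = ((model.K(z[:, :-1]) - z[:, 1:]) ** 2).mean()
    z_pred, pred = z[:, 0], 0.0
    for k in range(1, horizon + 1):
        z_pred = model.K(z_pred)                             # advance latent state
        pred = pred + ((model.decoder(z_pred) - x_traj[:, k]) ** 2).mean()
    return recon + linearity + pred / horizon
```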

3. Practical Algorithms and Model Selection

Koopman learning involves both the problem of operator construction and model/hyperparameter selection. Key algorithmic components include:

  • Loss Functions: Typical losses include reconstruction (autoencoder) error, linearity (how well the latent space evolves linearly), and prediction (one-step or multi-step decoding accuracy). DLKoopman introduced the Average Normalized Absolute Error (ANAE), a scale-independent, human-readable metric for prediction and reconstruction accuracy, facilitating model selection (Dey et al., 2022); a hedged sketch of one plausible ANAE formulation follows this list.
  • Hyperparameter Search: Integrated search modules automate selection over embedding size, network architecture, operator rank, and training settings (e.g., in DLKoopman), streamlining the development of robust Koopman models (Dey et al., 2022).
  • Training Regimes: Snapshot-based, trajectory-based, or episodic (with memory) training modalities are supported, enabling versatility and generalization across application domains (Dey et al., 2022, Redman et al., 2023). Bi-level and hybrid frameworks decouple operator estimation from encoding learning, improving computational efficiency and stability (Huang et al., 2023, Estornell et al., 25 Apr 2025).
  • Benchmarking and Evaluation: Case studies (e.g., CFD airfoil pressure prediction, polynomial ODE systems) demonstrate that accurate Koopman models can be learned from snapshot or trajectory data alone, with test prediction errors (e.g., ANAE $\approx 7\%$) competitive against other methods (Dey et al., 2022).
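
As referenced in the first bullet, here is one plausible formulation of the ANAE metric. The elementwise normalization by the magnitude of the ground truth is an assumption of this sketch; consult the DLKoopman documentation for the exact definition.

```python
import numpy as np

def anae(true, pred, eps=1e-8):
    """Average Normalized Absolute Error, in percent.

    One plausible formulation (an assumption of this sketch): the mean of
    |true - pred| / |true| over all elements, scaled to a percentage.
    eps guards against division by zero.
    """
    true, pred = np.asarray(true), np.asarray(pred)
    return 100.0 * np.mean(np.abs(true - pred) / (np.abs(true) + eps))

# Under this formulation, each prediction below is off by 5% of its target.
print(anae([1.0, 2.0, 4.0], [1.05, 1.9, 4.2]))  # ≈ 5.0
```
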
| Architecture/class | Embedding selection | Operator estimation | Model selection |
|---|---|---|---|
| Neural autoencoder | Learned (NN) | Linear/parametric (NN/SVD) | Hyperparameter search |
| SDP/programmatic | Learned (SDP) | Explicit (convex/SDP) | SDP structure, error certificates |
| Hybrid (SDP+NN) | SDP informs AE design | Both | Guided by SDP and AE |
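
To illustrate the "operator estimation" column in a simplified form, the sketch below fits a low-rank Koopman matrix by truncating the SVD of the unconstrained least-squares solution. This is a common heuristic standing in for the nuclear-norm/SDP rank-minimization machinery of the cited programmatic methods, not their actual algorithm.

```python
import numpy as np

def reduced_rank_koopman(G, Gp, rank):
    """Low-rank fit of K in Gp ≈ K G via SVD truncation.

    G, Gp: (m, N) matrices of lifted snapshots g(x_k) and g(x_{k+1}).
    Truncating the SVD of the unconstrained solution is a heuristic
    surrogate (an assumption of this sketch) for the convex rank-
    relaxation formulations in the cited works.
    """
    K_ls = Gp @ np.linalg.pinv(G)                 # full least-squares solution
    U, s, Vt = np.linalg.svd(K_ls)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approx of K_ls
```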

4. Challenges, Tradeoffs, and Extensions

Several issues complicate Koopman learning:

  • Finite-dimensionality and Spectrum: Not all nonlinear systems admit finite-dimensional Koopman-invariant subspaces; for those that do not, the observable space may need to be large, involve memory (history), or incur significant model error; a time-delay embedding sketch illustrating such memory appears after this list. Hybrid approaches use programmatic stages (e.g., SDP) to estimate the requisite embedding dimension and delay/memory order (Estornell et al., 25 Apr 2025).
  • Noise, Partial Observation, and Privacy: When learning from noisy or incomplete data, regularized loss functions and robust optimization are required (Hao et al., 26 May 2024, Chen et al., 7 Jul 2025). Federated learning, combined with privacy-preserving state estimation (e.g., Kalman filtering), allows collaborative linearization without sharing raw data, mitigating data scarcity and privacy issues (Chen et al., 7 Jul 2025).
  • Computational Complexity: Methods relying on SDPs or large neural architectures can be computationally expensive; sparsity, rank relaxation, streaming updates, and bi-level formulations all address computational bottlenecks in various regimes (Sznaier, 2021, Loya et al., 18 Jul 2024, Huang et al., 2023).
  • Model Verification: Existing standard algorithms can be unstable or can produce non-convergent or unverifiable spectra, especially with improper or fixed dictionaries. Algorithms with verifiable error control and optimal limit structure have been devised for rigorous applications (Colbrook et al., 8 Jul 2024).
  • Regime Change and Memory: For non-stationary or episodic data, incorporating memory mechanisms (e.g., episodic memory modules, attention to past spectral modes) can significantly improve prediction and robustness for time series with regime switching or repeated patterns (Redman et al., 2023).
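
As referenced in the first bullet above, augmenting the observable space with history is commonly done via time-delay (Hankel) embedding. The sketch below is a minimal version; the window length is an assumption to be tuned (or estimated, e.g., by the SDP-based order estimation cited above).

```python
import numpy as np

def delay_embed(x, delays=3):
    """Build time-delay (Hankel) observables from a time series.

    x: array of shape (T, n) (a 1-D series is treated as n = 1).
    Returns an array of shape (T - delays + 1, n * delays) whose row k
    stacks [x_k, x_{k-1}, ..., x_{k-delays+1}], supplying the 'memory'
    discussed above as extra observable coordinates.
    """
    x = np.asarray(x)
    if x.ndim == 1:
        x = x[:, None]
    T = x.shape[0]
    rows = [np.concatenate([x[k - d] for d in range(delays)])
            for k in range(delays - 1, T)]
    return np.stack(rows)

# Example: a 100-step scalar series becomes 98 three-lag observable vectors.
series = np.sin(0.1 * np.arange(100))
Z = delay_embed(series, delays=3)
print(Z.shape)  # (98, 3)
```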

5. Applications and Empirical Results across Scientific Domains

Koopman learning has enabled advances in a variety of application domains, including fluid dynamics (e.g., CFD airfoil pressure prediction), trajectory forecasting, and control of nonlinear and hybrid systems, as referenced throughout the preceding sections.

6. Future Directions and Impact

Koopman learning has advanced significantly via the integration of deep learning, optimization, active and distributed learning, robust estimation, and practical software infrastructure. Notable present and future directions include:

  • Hybrid learning frameworks: Further blending programmatic structure discovery (SDP, convex optimization) with neural/machine learning for explicit, scalable, and robust Koopman representations, as demonstrated in recent hybrid methodologies (Estornell et al., 25 Apr 2025).
  • Operator-theoretic control: Expansion of Koopman-based control to high-dimensional, nonlinear, and hybrid systems, with formal guarantees on performance and stability (Ohnishi et al., 2021, Folkestad et al., 2021).
  • Verification-Critical Systems: Use of weighted RKHS and contractive operators in stability and safety-critical applications, including data-driven Lyapunov and Zubov function estimation with probabilistic error certification (Tang, 30 Sep 2024).
  • Limits of Learnability: Clarification of the boundaries of what operator-theoretic properties can and cannot be learned from data, depending on system geometry and complexity, and the necessary algorithmic structure to achieve reliable learning (Colbrook et al., 8 Jul 2024).

Overall, Koopman learning constitutes a central technique in contemporary data-driven dynamical systems analysis, providing a bridge between nonlinear dynamics and the tractable world of linear operator theory. With algorithmic, computational, and practical advances, it is increasingly being adopted in scientific computing, control, and engineering workflows.
