Operator-Theoretic and Data-Driven Methods

Updated 3 April 2026

Operator-theoretic and data-driven methods are a unified framework that employs Koopman and Perron–Frobenius operators to linearize and analyze complex nonlinear dynamics.
These methods leverage techniques like DMD, EDMD, and neural operator architectures to extract key spectral features and causal interactions from data.
They offer practical insights for applications in fluid dynamics, power grids, robotics, and climate modeling, enabling robust predictive control and dimensionality reduction.

Operator-theoretic and data-driven methods form a cohesive framework for understanding, modeling, predicting, controlling, and reducing complex dynamical systems. Operator theory, primarily through the Koopman and Perron–Frobenius formalisms, enables the global linearization of nonlinear dynamics by acting on observables or densities instead of states. Data-driven approaches—especially various forms of dynamic mode decomposition (DMD), extended DMD (EDMD), and neural operator architectures—provide efficient and scalable approximations of these operators directly from empirical or simulated data. Hybridizing these perspectives yields powerful practical algorithms for dynamical systems, causal inference, control, autonomous and partially observed systems, and operator learning.

1. Operator-Theoretic Foundations

The central object in operator-theoretic dynamical systems is the infinite-dimensional Koopman operator, which acts linearly on a space of scalar observables $g$ even for nonlinear state-space dynamics $x_{k+1} = F(x_k)$ (Mezić, 2023, Snyder et al., 2021). Koopman operator evolution is given by

$(\mathcal{K}g)(x) = g(F(x)).$

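The spectral decomposition of $\mathcal{K}$ characterizes the dominant dynamical features through its eigenfunctions $\phi_j$ and eigenvalues $\lambda_j$, which satisfy $\mathcal{K}\phi_j = \lambda_j \phi_j$. For an observable $g$ in the span of the eigenfunctions, with expansion coefficients (Koopman modes) $v_j$, this yields spectral expansions such as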

$g(F^k(x)) = (\mathcal{K}^k g)(x) = \sum_j \lambda_j^k \phi_j(x) v_j.$

Model reduction is achieved by truncating to the dominant (discrete) spectrum, yielding finite-dimensional approximations where possible. The Perron–Frobenius operator provides a dual view, propagating densities under the system flow (Klus et al., 2017).
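
As a minimal numerical sketch of the eigenfunction relation $\mathcal{K}\phi = \lambda\phi$, the block below uses a textbook-style polynomial map whose Koopman eigenfunction is known in closed form. The map, the parameter values `lam` and `mu`, and the function names are illustrative assumptions and do not come from the cited works.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the text).
lam, mu = 0.9, 0.5

def F(x):
    """Nonlinear map with a finite-dimensional Koopman-invariant subspace."""
    x1, x2 = x
    return np.array([lam * x1, mu * x2 + (lam**2 - mu) * x1**2])

def phi(x):
    """Candidate Koopman eigenfunction phi(x) = x2 - x1^2 with eigenvalue mu."""
    return x[1] - x[0] ** 2

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5, 2))

# (K phi)(x) = phi(F(x)) should equal mu * phi(x) at every sampled state.
lhs = np.array([phi(F(x)) for x in points])
rhs = mu * np.array([phi(x) for x in points])
max_residual = np.max(np.abs(lhs - rhs))

# Iterating k steps scales the eigenfunction by mu**k, as in the spectral expansion.
x0, k = points[0], 6
xk = x0
for _ in range(k):
    xk = F(xk)

print("max residual of K phi = mu phi:", max_residual)
print("phi(F^k(x0)) vs mu^k phi(x0):", phi(xk), mu**k * phi(x0))
```

Although the state update is nonlinear, the observable $\phi$ evolves by plain multiplication, which is the sense in which the Koopman picture linearizes the dynamics.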

This formalism generalizes to stochastic, controlled, and delayed systems (Nandanoori et al., 2019, Gutiérrez et al., 2020). In the presence of noise or controls, operator evolution naturally extends to conditional expectations (Koopman) or nonparametric evolution of densities (Perron–Frobenius), encoded in Fokker–Planck or Kolmogorov generators (Vaidya et al., 2022).
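
For concreteness, one standard way to write these two extensions for a discrete-time Markov process is given below. The symbols $\mathcal{P}$ (Perron–Frobenius operator), $\rho$ (a density), and the pairing $\langle \cdot, \cdot \rangle$ (taken here as the $L^2$ inner product) are introduced for illustration and are not notation from the cited works.

$(\mathcal{K}g)(x) = \mathbb{E}\left[ g(x_{k+1}) \mid x_k = x \right], \qquad \langle \mathcal{P}\rho, g \rangle = \langle \rho, \mathcal{K}g \rangle.$

In the deterministic case the expectation collapses and the first identity reduces to $(\mathcal{K}g)(x) = g(F(x))$.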

2. Data-Driven Operator Approximation

Data-driven realization of operator-theoretic models employs time-series or simulation snapshots to approximate the action of the Koopman or transfer operator.

Dynamic Mode Decomposition (DMD): Seeks a best-fit linear operator in state space ($x_{k+1} \approx A x_k$) (Snyder et al., 2021).
Extended DMD (EDMD): Lifts data to a dictionary of observables ($\Psi(x)$), enabling a higher-order closure for Koopman approximation via least squares:

$K = \underset{A}{\arg\min} \| Y' - A Y \|_F^2,$

where $Y$ and $Y'$ consist of the lifted snapshots $\Psi(x_k)$ and their one-step successors $\Psi(x_{k+1})$, respectively (Mezić, 2023, Sharma et al., 2019, Gutiérrez et al., 2020).

Kernel and Neural Extensions: RKHS-based kernels, neural-network feature maps, and invertible neural architectures enhance expressivity and stability (Li et al., 2024, Jin et al., 25 Mar 2025).
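
The least-squares problem for $K$ above can be sketched in a few lines of NumPy. The example system, the dictionary `psi` with entries $(x_1, x_2, x_1^2)$, and the sample sizes are assumptions chosen so that the dictionary is exactly closed under the dynamics. For a general system the fit is only approximate.

```python
import numpy as np

lam, mu = 0.9, 0.5  # illustrative parameters (assumption)

def F(x):
    """Example nonlinear map (hypothetical test system)."""
    x1, x2 = x
    return np.array([lam * x1, mu * x2 + (lam**2 - mu) * x1**2])

def psi(x):
    """Hypothetical dictionary Psi(x) = (x1, x2, x1^2)."""
    return np.array([x[0], x[1], x[0] ** 2])

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))  # sampled states x_k

# Columns are lifted snapshots: Y holds Psi(x_k), Yp holds Psi(F(x_k)).
Y = np.column_stack([psi(x) for x in X])
Yp = np.column_stack([psi(F(x)) for x in X])

# EDMD: minimizer of ||Y' - A Y||_F via the pseudoinverse.
K = Yp @ np.linalg.pinv(Y)
eigvals = np.sort(np.linalg.eigvals(K).real)
edmd_residual = np.linalg.norm(Yp - K @ Y)

# Plain DMD for comparison: the same fit without lifting.
Xs = X.T
Xps = np.column_stack([F(x) for x in X])
A_dmd = Xps @ np.linalg.pinv(Xs)
dmd_residual = np.linalg.norm(Xps - A_dmd @ Xs)

print("EDMD eigenvalues:", eigvals)
print("EDMD residual:", edmd_residual, " DMD residual:", dmd_residual)
```

With this dictionary the recovered eigenvalues are `mu`, `lam**2`, and `lam`, while the unlifted DMD fit leaves a visible residual because a linear map on the raw state cannot represent the quadratic term.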

Robustness is addressed through regularization (e.g., $\ell_1$ sparsity in EDMD) and robust optimization under input/output noise or rank deficiencies (Sharma et al., 2019, Sinha et al., 2020, Sinha et al., 2020). Streaming and recursive approaches (rEDMD) enable efficient real-time updating with per-step complexity $\mathcal{O}(N^2)$ in the dictionary size $N$, well-suited to online monitoring (Sinha et al., 2020, Sinha et al., 2019).
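
The idea behind streaming updates can be illustrated with a generic recursive least-squares step based on the Sherman–Morrison identity. This is a sketch of the general technique under a ridge-regularized initialization, not the rEDMD algorithm of the cited papers. The function name `rls_update`, the regularization `delta`, and the synthetic lifted data are all assumptions.

```python
import numpy as np

def rls_update(K, P, y, y_next):
    """Rank-one update of K and P = (Y Y^T + delta I)^{-1} for one new lifted pair.

    Each call costs O(N^2) operations for a dictionary of size N.
    """
    Py = P @ y
    P_new = P - np.outer(Py, Py) / (1.0 + y @ Py)  # Sherman-Morrison (P symmetric)
    K_new = K + np.outer(y_next - K @ y, P_new @ y)
    return K_new, P_new

rng = np.random.default_rng(2)
N, m, delta = 4, 300, 1e-3

# Synthetic lifted snapshots generated by a hypothetical ground-truth operator.
K_true = 0.5 * rng.standard_normal((N, N))
Y = rng.standard_normal((N, m))
Yp = K_true @ Y + 1e-3 * rng.standard_normal((N, m))

# Stream the pairs one at a time.
K, P = np.zeros((N, N)), np.eye(N) / delta
for k in range(m):
    K, P = rls_update(K, P, Y[:, k], Yp[:, k])

# Batch ridge solution that the recursion should reproduce.
K_batch = Yp @ Y.T @ np.linalg.inv(Y @ Y.T + delta * np.eye(N))
print("max |K_stream - K_batch|:", np.max(np.abs(K - K_batch)))
```

The recursion never refactorizes the full data matrix, which is what makes this style of update attractive for online monitoring.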

3. Operator-Theoretic Causal Analysis

Operator-theoretic causal analysis, such as Linear Operator Causality Analysis (LOCA), leverages the structure of the system generator $A$ (or its approximation) to quantify the direct and indirect propagation of perturbations between state components (Srivastava et al., 9 Jun 2025). The LOCA causality metric at horizon $\tau$, from component $j$ to component $i$, is defined through the squared entries of the matrix exponential,

$C_{j \to i}(\tau) = \left[ \left( e^{A\tau} \right)_{ij} \right]^2, \qquad e^{A\tau} = I + A\tau + \tfrac{1}{2!}(A\tau)^2 + \cdots,$

revealing both immediate (direct, $A_{ij}$) and accumulated (indirect, via the matrix exponential series) causal influences.

LOCA generalizes and clarifies data-driven methods such as Granger causality and transfer entropy: for linear systems, these statistically motivated measures asymptotically approximate the squared matrix exponential weights, but suffer from model-order truncation and spurious correlation detection. The operator-theoretic approach is invariant to mere signal correlation and correctly identifies dynamical pathways, direct and indirect, especially in high-dimensional systems (e.g., linearized Navier–Stokes for fluid flows).
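
The distinction between direct and indirect influence can be made concrete with a small cascade. The sketch below assumes that the influence of component $j$ on component $i$ at horizon $\tau$ is scored by the squared $(i, j)$ entry of $e^{A\tau}$. This is a guess at the metric's form based on the "squared matrix exponential weights" mentioned above, and it may differ from the exact LOCA definition in the cited work. The matrix `A`, the horizon `tau`, and the helper `expm_series` are hypothetical.

```python
import numpy as np

def expm_series(M, terms=40):
    """Matrix exponential by truncated Taylor series (fine for small, well-scaled M)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        result = result + term
    return result

# Hypothetical 3-state cascade: x1 drives x2, x2 drives x3.
# There is no direct x1 -> x3 coupling, so A[2, 0] == 0.
A = np.array([[-1.0, 0.0, 0.0],
              [2.0, -1.0, 0.0],
              [0.0, 3.0, -1.0]])

tau = 1.0
Phi = expm_series(A * tau)  # propagator e^{A tau}
C = Phi**2                  # assumed score: C[i, j] ~ influence of j on i

print("direct coupling A[2, 0]:", A[2, 0])
print("propagated influence of x1 on x3:", C[2, 0])
print("propagated influence of x3 on x1:", C[0, 2])
```

Here the generator has no direct link from $x_1$ to $x_3$, yet the propagator entry is nonzero because the second-order term of the series routes the perturbation through $x_2$. The reverse direction stays exactly zero.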

Reduced-order projection (POD, balanced truncation) is central to extracting and preserving causal structure in practice. In non-normal systems, balanced truncation accomplishes this with higher fidelity than energy-based subspace methods such as POD (Srivastava et al., 9 Jun 2025).
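
For readers unfamiliar with balanced truncation, the following is a minimal square-root implementation for a small, stable, non-normal linear system. It is a generic textbook construction under stated assumptions, not the workflow of the cited paper. The matrices `A`, `B`, `C`, the Kronecker-based Lyapunov solver `lyap`, and the reduced order `r` are illustrative choices that suit only small systems.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 by Kronecker vectorization (small systems only)."""
    n = A.shape[0]
    I = np.eye(n)
    L = np.kron(A, I) + np.kron(I, A)
    return np.linalg.solve(L, -Q.reshape(-1)).reshape(n, n)

# Hypothetical stable, non-normal system with one input and one output.
A = np.array([[-1.0, 5.0, 0.0],
              [0.0, -2.0, 5.0],
              [0.0, 0.0, -3.0]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0]])

Wc = lyap(A, B @ B.T)    # controllability Gramian
Wo = lyap(A.T, C.T @ C)  # observability Gramian

# Square-root balancing: SVD of Lo^T Lc gives the Hankel singular values s.
Lc = np.linalg.cholesky(Wc)
Lo = np.linalg.cholesky(Wo)
U, s, Vt = np.linalg.svd(Lo.T @ Lc)
T = (Lc @ Vt.T) / np.sqrt(s)        # T = Lc V S^{-1/2}
Tinv = (U / np.sqrt(s)).T @ Lo.T    # T^{-1} = S^{-1/2} U^T Lo^T

# Truncate the balanced realization to the r most controllable/observable states.
r = 2
Ar = (Tinv @ A @ T)[:r, :r]
Br = (Tinv @ B)[:r, :]
Cr = (C @ T)[:, :r]

print("Hankel singular values:", s)
print("reduced A:\n", Ar)
```

In balanced coordinates both Gramians equal the diagonal matrix of Hankel singular values, so truncation discards states that are jointly hard to excite and hard to observe rather than states that merely carry little energy.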

4. Data-Driven Operator-Theoretic Control

Embedding operator approximations into control pipelines enables nonlinear predictive control, optimal feedback synthesis, and even differential games.

Koopman-MPC and Predictive Control: Data-driven predictive control (Koopman–DeePC) leverages Willems’ fundamental lemma, the Hankel matrix structure, and nonlinear lifting (EDMD) to synthesize multi-step predictors. A bilevel formulation allows uncertainty-aware (Wasserstein robust) optimization leveraging Bayesian learnable features, enabling both accurate prediction and robust control within a unified framework (Lian et al., 2021).
Operator-theoretic Differential Games: Both continuous-time resolvent-based (global feedback from the Koopman generator via the resolvent operator) and discrete, data-driven EDMDc-MCP (mixed complementarity problem for open-loop saddle points) approaches have been developed for zero-sum games. These methods handle arbitrary nonlinearities, control and state constraints, and are competitive with analytic solutions in benchmark problems (Bakker et al., 2 Jul 2025).
Stochastic Optimal Control: Coupling Perron–Frobenius duals with Koopman-based policy iteration yields convex and iterative algorithms for feedback design in stochastic systems, with finite-dimensional approximations obtained from data via naturally structured DMD and Galerkin projection over nonnegative bases (Vaidya et al., 2022).
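
The control formulations listed above all rest on a lifted predictor that is linear in the lifted state and the input. A minimal sketch of fitting such a predictor by EDMD with control is shown below. It covers only the identification and open-loop prediction step, not the Koopman–DeePC, EDMDc-MCP, or policy-iteration algorithms of the cited works. The controlled system `step`, the dictionary `psi`, and the input sequence are hypothetical.

```python
import numpy as np

lam, mu = 0.9, 0.5  # illustrative parameters (assumption)

def step(x, u):
    """Hypothetical controlled map; the input enters the second state additively."""
    x1, x2 = x
    return np.array([lam * x1, mu * x2 + (lam**2 - mu) * x1**2 + u])

def psi(x):
    """Hypothetical dictionary Psi(x) = (x1, x2, x1^2)."""
    return np.array([x[0], x[1], x[0] ** 2])

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(300, 2))
U = rng.uniform(-1.0, 1.0, size=300)

Z = np.column_stack([psi(x) for x in X])
Zp = np.column_stack([psi(step(x, u)) for x, u in zip(X, U)])
Omega = np.vstack([Z, U[None, :]])  # stacked lifted states and inputs

# Least-squares fit of z_{k+1} ~ A z_k + B u_k.
AB = Zp @ np.linalg.pinv(Omega)
A, B = AB[:, :3], AB[:, 3:]

# Multi-step open-loop prediction with the lifted linear model.
x = np.array([0.8, -0.3])
z = psi(x)
for u in [0.2, -0.1, 0.05, 0.0, 0.3]:
    x = step(x, u)             # true nonlinear system
    z = A @ z + B[:, 0] * u    # lifted linear predictor
prediction_error = np.linalg.norm(z - psi(x))

print("B estimate:", B[:, 0])
print("5-step prediction error:", prediction_error)
```

Because the predictor is linear in `z` and `u`, it can be dropped into a standard linear MPC or game-theoretic solver. The exactness seen here is a property of this particular example, whose dictionary is closed under the controlled dynamics.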

Key trade-offs involve global versus trajectory-wise feedback, scalability, and computational cost. Operator-theoretic methods enable quantitative design and analysis in systems where model-based approaches are infeasible.

5. Operator Learning and Uncertainty Quantification

Learning maps between infinite-dimensional spaces—arising in PDEs and functional regression—admits both neural and Gaussian process operator learning paradigms.

Neural operators (e.g., FNO, DeepONet, IKNO): Parameterize the lifting, evolution, and reconstruction stages to approximate the action of operators between function spaces. Invertible network architectures, such as IKNO, guarantee bijective mappings in latent space and yield reconstruction-free error metrics with resolution invariance due to mode-wise Fourier truncation in the Koopman-inspired latent space (Jin et al., 25 Mar 2025).
Gaussian process operator learning: Operator-valued Gaussian processes approximate the real-valued bilinear forms associated with function-valued operators. Mean functions may be neural-operator parameterizations, and the framework admits robust maximum-likelihood training, uncertainty quantification via posterior variance, and zero-shot prediction through kernel-based interpolation (Mora et al., 2024).
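
The resolution invariance attributed to mode-wise Fourier truncation can be illustrated with a single spectral-convolution layer in the style of FNO. This is an untrained, one-dimensional sketch with random weights, not an implementation of IKNO or any cited architecture. The function names, the number of retained modes, and the test signal are assumptions.

```python
import numpy as np

def spectral_conv(u, weights):
    """Scale the lowest Fourier modes of a periodic signal and discard the rest."""
    n = u.shape[0]
    modes = weights.shape[0]
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = weights * u_hat[:modes]  # mode-wise multiplication
    return np.fft.irfft(out_hat, n=n)

def sample(n):
    """Band-limited test function sampled on an n-point periodic grid."""
    x = np.arange(n) / n
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)

rng = np.random.default_rng(4)
modes = 4
# Random complex weights stand in for learned parameters.
weights = rng.standard_normal(modes) + 1j * rng.standard_normal(modes)

out_coarse = spectral_conv(sample(64), weights)
out_fine = spectral_conv(sample(128), weights)

# The same weights act consistently on both grids at the shared sample points.
mismatch = np.max(np.abs(out_fine[::2] - out_coarse))
print("max mismatch between resolutions:", mismatch)
```

Since the weights are attached to Fourier modes rather than grid points, the same parameters define the same operator at any resolution that resolves the retained modes.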

Operator learning for general Lipschitz classes faces severe lower bounds: the metric $\varepsilon$-entropy of the full class grows exponentially in $\varepsilon^{-1}$, which forces a curse of dimensionality in bit encoding and limits uniform approximability irrespective of neural architecture. Polynomial parametric complexity is achieved only in subspaces of analytic or Barron-type operators with favorable decay or regularity (Lanthaler, 2024).

6. Extensions: Model Reduction, Global Phase Exploration, and Integration with Modern ML

Perturbation expansions of the Koopman operator yield explicit reduced models for slow–fast, weakly coupled, or partially observed systems, connecting with established empirical closure and Markovian SDE frameworks (Gutiérrez et al., 2020).

Operator-theoretic phase space analysis applies spectral features (invariant subspace decomposition, eigenfunction level sets, global phase stitching) to discover, partition, and fuse models from disparate experimental or simulation regimes. Symmetry and topological conjugacy can be leveraged to lift local models to global ones, guiding experimental design and robust identification in large state spaces (Nandanoori et al., 2021, Nandanoori et al., 2019).

The operator-theoretic perspective is also integrally related to sequential modeling and transformer architectures in machine learning, where dictionary lifting parallels embedding, linear updates correspond to attention, and nonlinear layers approximate closure under continuous spectrum or unresolved dynamics (Mezić, 2023).

7. Applications and Impact

Operator-theoretic and data-driven methodologies underpin advances in power grid monitoring, fluid dynamics, neuroscience, molecular dynamics, robotics, and climate modeling (Sinha et al., 2020, Sharma et al., 2019, Klus et al., 2017). They support high-fidelity prediction, online system identification, optimal and robust control, causal inference, and dimensionality reduction, especially in regimes inaccessible to first-principles modeling or classical state-space identification.

By unifying spectral theory, statistical inference, and scalable computation, the operator-theoretic/data-driven synthesis has redefined scientific computation and control of high-dimensional and nonlinear systems, while continued work addresses theoretical tractability, scaling, and integration with modern ML paradigms.