Hankel DMD with Control (HDMDc)
- HDMDc is a method that constructs a linear Koopman operator approximation using Hankel matrices formed from delay-embedded state and input data.
- It enables robust prediction and uncertainty quantification for nonlinear, memory-dependent systems, as demonstrated in marine and fluid-structure applications.
- The approach efficiently balances computational cost with accuracy through SVD truncation, regularized regression, and ensemble-based uncertainty assessments.
Hankel Dynamic Mode Decomposition with Control (HDMDc) is a data-driven, equation-free methodology for the identification and reduced-order modeling of externally forced nonlinear dynamical systems. By embedding time-delayed measurements of both the system state and control/forcing signals, HDMDc constructs a linear approximation of the system’s Koopman operator in an enlarged Hankel space. This approach has enabled robust, computationally efficient prediction and uncertainty quantification for complex systems—including marine vessels in stochastic sea states—without requiring explicit knowledge of governing equations.
1. Mathematical Foundations and Problem Formulation
Consider a continuous-time Markovian system with $n$-dimensional state $x(t)$ and $q$-dimensional input $u(t)$, sampled at discrete times $t_k = k\,\Delta t$ as $x_k$ and $u_k$. Traditional DMDc seeks a best-fit linear mapping $x_{k+1} \approx A x_k + B u_k$ from current-state/input pairs to the subsequent state. HDMDc generalizes this by encoding $d$ delayed states and delayed inputs into an augmented observable:

$$\tilde{x}_k = \begin{bmatrix} x_k^\top & x_{k-1}^\top & \cdots & x_{k-d+1}^\top \end{bmatrix}^\top, \qquad \tilde{u}_k = \begin{bmatrix} u_k^\top & u_{k-1}^\top & \cdots & u_{k-d+1}^\top \end{bmatrix}^\top.$$
The corresponding Hankel matrices aggregate these extended observables across all available sample indices, resulting in data arrays $\tilde{X}, \tilde{X}' \in \mathbb{R}^{nd \times m}$ and $\tilde{U} \in \mathbb{R}^{qd \times m}$, where $m$ is the effective number of snapshots.
The central regression equation in Hankel space is:

$$\tilde{X}' = A\,\tilde{X} + B\,\tilde{U},$$

or, writing $G = [A \;\; B]$ and $\Omega = \begin{bmatrix} \tilde{X} \\ \tilde{U} \end{bmatrix}$,

$$\tilde{X}' = G\,\Omega,$$

so that system trajectories are approximated by a forced linear map in delay-embedded space (Palma et al., 5 Nov 2025, Palma et al., 6 Nov 2025, Palma et al., 17 Feb 2025).
By this construction, HDMDc implicitly approximates a finite-dimensional forced Koopman operator, enabling the representation of nonlinear, memory-dependent, and input–output couplings in a linear reduced-order form.
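The delay embedding above can be sketched in a few lines of NumPy. The function name `hankel_embed` and the toy dimensions are illustrative assumptions, not taken from the cited papers; early time indices are dropped rather than zero-padded for simplicity:

```python
import numpy as np

def hankel_embed(X, d):
    """Stack d delayed copies of a snapshot matrix X (n x m).

    Column k of the result is [x_{k+d-1}; x_{k+d-2}; ...; x_k]
    (most recent snapshot in the top block), giving an (n*d) x (m-d+1)
    Hankel matrix. No padding: the first d-1 indices are dropped.
    """
    n, m = X.shape
    cols = m - d + 1
    H = np.empty((n * d, cols))
    for j in range(d):
        # block j holds the snapshots delayed by j steps
        H[j * n:(j + 1) * n, :] = X[:, d - 1 - j : d - 1 - j + cols]
    return H

# Joint data matrix Omega = [X_tilde; U_tilde] for the Hankel-space regression
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))   # 3 state channels, 50 snapshots
U = rng.standard_normal((1, 50))   # 1 input channel
d = 5
Omega = np.vstack([hankel_embed(X, d), hankel_embed(U, d)])  # (3+1)*5 rows
```

The column count $m - d + 1$ makes explicit the trade-off noted above: deeper embeddings consume snapshots.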
2. Algorithmic Workflow
The canonical HDMDc procedure is as follows:
- Data Preprocessing: Time series for $x$ and $u$ are standardized (z-scored) to zero mean and unit variance, which mitigates ill-conditioning in the subsequent regression.
- Hankel-Stacking: Construct Hankel-embedded matrices for states and inputs as described above, often with a padding strategy (e.g., zero-padding) to address early time indices.
- Data Matrix Construction: Stack all delayed observables into a "joint" matrix,

$$\Omega = \begin{bmatrix} \tilde{X} \\ \tilde{U} \end{bmatrix},$$

whose blocks $x_k, \dots, x_{k-d+1}$ and $u_k, \dots, u_{k-d+1}$ denote the instantaneous and delayed state and input segments.
- Singular Value Decomposition (SVD) and Truncation: Compute the (thin) SVD of $\Omega$, $\Omega = U \Sigma V^\top$ (here $U$ denotes the left singular vectors, not the input), and retain the leading $r$ modes, with rank selection via energy or IQR thresholds.
- Tikhonov-Regularized Least Squares: Solve

$$\min_{G}\; \left\| \tilde{X}' - G\,\Omega \right\|_F^2 + \lambda \left\| G \right\|_F^2$$

for a regularization parameter $\lambda \ge 0$, which is either fixed or tuned by cross-validation.
The closed-form solution is:

$$G = \tilde{X}'\,\Omega^\top \left( \Omega\,\Omega^\top + \lambda I \right)^{-1},$$

or, after SVD truncation and projection, its counterpart in the reduced $r$-dimensional subspace.
- Prediction and Recovery: For a given initial Hankel-embedded state $\tilde{x}_0$, recursively apply

$$\tilde{x}_{k+1} = A\,\tilde{x}_k + B\,\tilde{u}_k,$$

extracting the first $n$ entries of $\tilde{x}_k$ as the system's predicted state.
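The regression and prediction steps above reduce to a short NumPy routine. A minimal sketch with hypothetical names `fit_hdmdc` and `predict`; SVD truncation is omitted for brevity, and the inputs are assumed already Hankel-embedded and standardized:

```python
import numpy as np

def fit_hdmdc(Z, Zp, U, lam=1e-8):
    """Closed-form Tikhonov solve G = Zp Omega^T (Omega Omega^T + lam I)^{-1},
    with Omega stacking the (delay-embedded) states Z and inputs U.
    Splitting G = [A, B] yields the forced linear map z_{k+1} ~ A z_k + B u_k."""
    Omega = np.vstack([Z, U])
    N = Omega.shape[0]
    G = Zp @ Omega.T @ np.linalg.inv(Omega @ Omega.T + lam * np.eye(N))
    return G[:, :Z.shape[0]], G[:, Z.shape[0]:]

def predict(A, B, z0, U):
    """Recursively apply z_{k+1} = A z_k + B u_k from initial condition z0."""
    traj = [np.asarray(z0, dtype=float)]
    for k in range(U.shape[1]):
        traj.append(A @ traj[-1] + B @ U[:, k])
    return np.column_stack(traj)
```

On data generated by an exactly linear forced system this recovers $A$ and $B$ essentially exactly; for nonlinear data the same call operates on the Hankel-embedded matrices, where the linear fit is the Koopman-style approximation.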
This workflow is compactly summarized in several recent studies (Palma et al., 5 Nov 2025, Palma et al., 17 Feb 2025, Palma et al., 6 Nov 2025).
3. Uncertainty Quantification: Bayesian and Frequentist Extensions
HDMDc's original formulation is deterministic, with hyperparameters fixed by manual or cross-validated search. To quantify model-form and training-set uncertainty, two ensemble variants have been introduced:
Bayesian HDMDc (BHDMDc):
Treat hyperparameters as random variables, e.g., the delay depth $d$, the truncation rank $r$, the Tikhonov weight $\lambda$, and the training-window length. For each Monte Carlo sample, a model is fit and used to generate predictions. Ensemble-averaged statistics yield the predictive mean and standard deviation:

$$\mu_k = \frac{1}{M} \sum_{i=1}^{M} \hat{x}_k^{(i)}, \qquad \sigma_k = \left( \frac{1}{M} \sum_{i=1}^{M} \left( \hat{x}_k^{(i)} - \mu_k \right)^2 \right)^{1/2}.$$

In practice, uniform priors over ranges identified via deterministic sweeps are used, with the number of Monte Carlo samples $M$ chosen large enough for robust estimates (Palma et al., 5 Nov 2025, Palma et al., 17 Feb 2025, Palma et al., 6 Nov 2025).
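A minimal Monte Carlo sketch of the Bayesian ensemble, sampling only the Tikhonov weight for brevity; the prior range, sample count, and function name are illustrative assumptions, not the cited papers' settings:

```python
import numpy as np

def bhdmdc_ensemble(Z, Zp, U, u_future, z0, n_samples=50, seed=0):
    """Monte Carlo over a hyperparameter prior (here only the Tikhonov
    weight lam, drawn log-uniformly): refit one model per draw, roll each
    forward, and return the ensemble mean and standard deviation."""
    rng = np.random.default_rng(seed)
    forecasts = []
    for _ in range(n_samples):
        lam = 10.0 ** rng.uniform(-9, -3)   # illustrative log-uniform prior
        Omega = np.vstack([Z, U])
        G = Zp @ Omega.T @ np.linalg.inv(
            Omega @ Omega.T + lam * np.eye(Omega.shape[0]))
        A, B = G[:, :Z.shape[0]], G[:, Z.shape[0]:]
        z = np.asarray(z0, dtype=float)
        traj = [z]
        for k in range(u_future.shape[1]):
            z = A @ z + B @ u_future[:, k]
            traj.append(z)
        forecasts.append(np.column_stack(traj))
    F = np.stack(forecasts)                 # (n_samples, n, horizon + 1)
    return F.mean(axis=0), F.std(axis=0)
```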
Frequentist HDMDc (FHDMDc):
Hold the hyperparameters fixed and train over multiple subsets of the available training data (e.g., distinct experimental runs). Predictions over all ensemble members are then aggregated into an ensemble mean and standard deviation, as in the Bayesian case but with the ensemble indexed by training subset rather than by hyperparameter draw.
This approach provides a frequentist uncertainty estimate reflecting finite-sample/data variability (Palma et al., 6 Nov 2025).
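A matching sketch of the frequentist variant: hyperparameters stay fixed while each member is refit on a random subset of the training snapshot pairs (standing in for distinct runs). Subset fraction, member count, and names are illustrative assumptions:

```python
import numpy as np

def fhdmdc_ensemble(Z, Zp, U, u_future, z0,
                    n_members=20, frac=0.7, lam=1e-8, seed=0):
    """Fixed hyperparameters; each ensemble member is refit on a random
    subset of the training columns, and the member forecasts are
    aggregated into an ensemble mean and standard deviation."""
    rng = np.random.default_rng(seed)
    m = Z.shape[1]
    forecasts = []
    for _ in range(n_members):
        idx = rng.choice(m, size=int(frac * m), replace=False)
        Omega = np.vstack([Z[:, idx], U[:, idx]])
        G = Zp[:, idx] @ Omega.T @ np.linalg.inv(
            Omega @ Omega.T + lam * np.eye(Omega.shape[0]))
        A, B = G[:, :Z.shape[0]], G[:, Z.shape[0]:]
        z = np.asarray(z0, dtype=float)
        traj = [z]
        for k in range(u_future.shape[1]):
            z = A @ z + B @ u_future[:, k]
            traj.append(z)
        forecasts.append(np.column_stack(traj))
    F = np.stack(forecasts)
    return F.mean(axis=0), F.std(axis=0)
```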
Both methods facilitate construction of uncertainty bands, empirical probability density functions (via moving-block bootstrap), and statistical distances (e.g., Jensen–Shannon divergence) to compare predicted vs. observed distributions.
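The distributional comparison can be illustrated with a small histogram-based Jensen–Shannon divergence estimator; the shared-bin construction and bin count are assumptions for illustration, not the papers' exact procedure:

```python
import numpy as np

def jsd(samples_p, samples_q, bins=30):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two sample
    sets, estimated via histograms over a shared range."""
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    p, _ = np.histogram(samples_p, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_q, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler term; bins with a == 0 contribute nothing,
        # and b > 0 wherever a > 0 since b is the mixture m
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical samples give 0, fully disjoint distributions give 1, so values near 0.01 indicate close distributional agreement.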
4. Practical Implementation: Computational and Modeling Considerations
The dimension of the Hankel-embedded space grows quickly with the number of delays ($nd$ state rows plus $qd$ input rows), so practical choices of the delay depth and rank are governed by memory, overfitting risk, and available data. For typical engineering applications the SVD and regression are computationally cheap, with snapshot counts $m$ in the hundreds yielding sub-second training times on commodity CPUs (Palma et al., 17 Feb 2025).
Regularization ($\lambda > 0$) is essential for stabilization, given the increased collinearity and noise sensitivity at large delay depths and embedded dimensions. SVD truncation improves conditioning and filters noise, with the rank $r$ set via validation. Data should be standardized prior to embedding.
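Energy-based rank selection amounts to a one-line cumulative threshold on the singular values; the 0.99 default below is an illustrative choice, not a value from the cited papers:

```python
import numpy as np

def energy_rank(S, energy=0.99):
    """Smallest rank r whose leading singular values capture the given
    fraction of total energy (sum of squared singular values)."""
    e = np.cumsum(S ** 2) / np.sum(S ** 2)
    return int(np.searchsorted(e, energy) + 1)
```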
No eigenvalue stabilization is enforced in standard HDMDc; naive application may yield spurious unstable dynamics with poor hyperparameter selection. Extensions with explicit spectral radius constraints, as in cOptDMDc or variants inspired by (Sakib et al., 13 Aug 2024), offer improved long-term stability and are the subject of ongoing research.
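A crude post-hoc illustration of a spectral-radius constraint, distinct from the constrained-optimization formulations such as cOptDMDc, simply rescales unstable eigenvalues back onto the unit circle:

```python
import numpy as np

def stabilize(A, tol=1.0):
    """Eigendecompose A, shrink any eigenvalue with |mu| > tol back to
    magnitude tol, and reassemble. Assumes A is diagonalizable; this is
    an illustration of the idea, not the cOptDMDc algorithm."""
    mu, V = np.linalg.eig(A)
    mag = np.abs(mu)
    mu = np.where(mag > tol, mu * (tol / np.maximum(mag, 1e-300)), mu)
    A_stab = V @ np.diag(mu) @ np.linalg.inv(V)
    # conjugate eigenvalue pairs are scaled by the same real factor,
    # so the result is real (up to round-off) when A is real
    return A_stab.real if np.isrealobj(A) else A_stab
```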
A plausible implication is that delay embedding enables nonparametric capture of nonlinear system responses (memory effects, non-instantaneous couplings) within a tractable, interpretable linear framework, provided embedding and rank-selection are performed judiciously.
5. Benchmark Applications
HDMDc has been applied in several recent studies for identification and forecasting in fluid-structure interaction and marine engineering contexts:
- System Identification of Ships in Irregular and Regular Waves:
Application to the Codevintec CK-14e ASV under moored conditions demonstrated that HDMDc and its Bayesian variant can accurately forecast surge, heave, and pitch motions (ANRMSE $\approx$ 0.1–0.2), with BHDMDc modestly reducing the dispersion and providing statistically valid uncertainty intervals (Palma et al., 5 Nov 2025).
- Prediction of High-Speed Catamaran Seakeeping:
For the Delft 372 model, an ensemble-based FHDMDc yielded predictions whose probability density functions matched both experimental and URANS-based estimates (small Jensen–Shannon divergence, JSD, for the kinematic variables). BHDMDc did not outperform deterministic models for this test case, suggesting that the effectiveness of a UQ strategy is context-dependent (Palma et al., 6 Nov 2025).
- Free-Running 5415M Ship in Irregular Sea States:
Both deterministic and Bayesian HDMDc robustly identified the 6-dof ship motion system using only 1–2 wave-periods of data. Forecasts over 15-encounter-horizon windows in severe sea states (near roll-resonance) remained stable and accurate, with moving-block bootstrap JSD compared to CFD below 0.01 per variable (Palma et al., 17 Feb 2025).
Typical errors, computational costs, and ensemble-predicted uncertainty intervals are summarized in the following table:
| System/Study | Mean NRMSE | JSD (PDF match) | Training Time |
|---|---|---|---|
| Moored ASV (Palma et al., 5 Nov 2025) | 0.1–0.2 (kinem.) | — | seconds (CPU) |
| Catamaran (Palma et al., 6 Nov 2025) | — | 0.01 | seconds |
| 5415M hull (Palma et al., 17 Feb 2025) | 0.07 | 0.01 | seconds |
Such small JSD values indicate a strong statistical match between predicted and observed/CFD distributions.
6. Limitations and Extensions
HDMDc’s strengths lie in its fully data-driven, nonparametric, and equation-free nature, its effective handling of memory and nonlinearity via delay embedding, and its efficient training. Limitations include sensitivity to hyperparameter selection (delay depth $d$, training-window length, model rank $r$), the lack of built-in eigenvalue constraints (potentially admitting spurious unstable modes), and increased computational cost at very large embedded dimensions. High-dimensional systems may also present challenges if data are insufficient to resolve all relevant dynamical modes.
Potential extensions comprise:
- Constrained DMDc formulations that directly enforce spectral radius stability,
- Integration with EDMD for richer, possibly nonlinear, dictionary-based embeddings,
- Use of parametric or interpolative ROM techniques to capture regime dependence,
- Coupling with real-time digital twins and sensor networks (e.g., wave radar) for forecasting in operational scenarios (Palma et al., 17 Feb 2025, Palma et al., 6 Nov 2025).
7. Context Within the Koopman Operator Research Landscape
HDMDc offers a systematic, theoretically grounded method for approximating the action of the forced Koopman operator via a data-driven, linear-in-delay construction. Unlike black-box neural surrogates, HDMDc yields compact ROMs interpretable in terms of dynamic modes and eigenvalues, and is amenable to uncertainty quantification via ensemble (Bayesian or Frequentist) methods.
Connections to recent research on noise-robust, stable Koopman operator learning with Polyflow observables (Sakib et al., 13 Aug 2024) suggest that Hankel-style delay coordinates and dictionary-informed embeddings are converging themes in robust data-driven modeling and control of nonlinear systems. Extensions introducing stabilized operator parameterizations (eigenvalue placement), progressive rollout loss, and iterative data augmentation further enhance model fidelity in finite and noisy data regimes.
A plausible implication is that, for externally forced, partially observed, and strongly nonlinear systems (such as marine vessels in stochastic sea states), HDMDc and its Bayesian/Frequentist variants currently represent a leading class of nonparametric, uncertainty-aware reduced-order modeling tools.