Data-Driven Transient Growth Analysis

Updated 13 November 2025

Data-driven transient growth analysis is a methodology that quantifies the finite-time amplification of disturbances in fluid flows using empirical data in place of traditional operator-based methods.
Core algorithms include snapshot-based Rayleigh quotient optimization, adjoint-free POD/SVD projection, and statistical frameworks to assess mean amplification under noise.
Practical implementations demonstrate high accuracy and scalability in applications such as bypass transition prediction, flow control, and reduced-order modeling in complex fluid dynamics.

Data-driven transient growth analysis encompasses a suite of methodologies for quantifying and interpreting non-modal disturbance amplification in fluid flows and dynamical systems using empirical data, from either high-fidelity simulations or physical experiments, without direct reliance on the governing linearized operators or their adjoints. This paradigm has accelerated the analysis of flows where model-based approaches are infeasible due to high dimensionality, unknown operators, experimental constraints, or significant stochastic effects. It underpins current research on bypass transition, flow control, reduced-order modeling, and system identification in computation- or data-limited regimes.

1. Mathematical and Conceptual Background

Transient growth refers to the finite-time amplification of disturbances due to the non-normality of linearized (typically Navier–Stokes) evolution operators, even in systems that are asymptotically stable by eigenvalue analysis. The canonical (operator-based) formulation for a discretized dynamical system,

$\frac{dq}{dt} = A\,q,$

yields over a time horizon $t$ the propagator $M(t) = \exp(At)$ and the maximal energy amplification,

$G(t) = \max_{q(0)\neq0} \frac{\|q(t)\|^2}{\|q(0)\|^2} = \sigma_{\max}^2(M(t)),$

where the norm may be the kinetic-energy norm or any other norm induced by an inner product. Traditionally, one computes $G(t)$ and the associated optimal input/output structures via direct SVD of $M(t)$ or iterative (often adjoint-based) methods. Because a non-normal operator has non-orthogonal eigenvectors, a superposition of individually decaying eigenmodes can produce large finite-time gain even when all eigenvalues are stable.
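As a minimal illustration of the operator-based definition, the sketch below evaluates $G(t)=\sigma_{\max}^2(M(t))$ for a two-state toy system. The matrix `A`, the parameter `Re`, and the helper names `propagator` and `optimal_gain` are illustrative assumptions rather than anything taken from the cited works; the matrix exponential is formed by eigendecomposition, which assumes `A` is diagonalizable, and the norm is taken to be Euclidean.

```python
import numpy as np

def propagator(A, t):
    """Return M(t) = exp(A t) via eigendecomposition (assumes A is diagonalizable)."""
    lam, V = np.linalg.eig(A)
    M = (V * np.exp(lam * t)) @ np.linalg.inv(V)
    # For a real A the result is real up to round-off.
    return M.real if np.isrealobj(A) else M

def optimal_gain(A, t):
    """G(t): largest squared singular value of the propagator (Euclidean norm)."""
    return np.linalg.norm(propagator(A, t), 2) ** 2

Re = 100.0  # hypothetical Reynolds-number-like parameter
A = np.array([[-1.0 / Re, 1.0],
              [0.0, -2.0 / Re]])  # stable but non-normal toy operator

times = np.linspace(0.0, 400.0, 801)
gains = np.array([optimal_gain(A, t) for t in times])

print("eigenvalues:", np.linalg.eigvals(A))
print("peak gain:", gains.max(), "at t =", times[gains.argmax()])
```

For this toy operator both eigenvalues are negative, yet the gain rises well above one at intermediate times before decaying, which is the behavior the definition above is meant to capture.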

The data-driven framework recasts this into empirical optimization over observed input/output (I/O) snapshot data, potentially regularized for noise or nonlinearity. This approach can bypass explicit operator assembly, adjoint construction, and linearization constraints.

2. Core Data-driven Algorithms

Several algorithmic strategies for data-driven transient growth analysis are now established:

a. Input-Output Rayleigh Quotient over Snapshots

Given snapshot matrices $Q_0$ (inputs at $t=0$) and $Q_t$ (outputs after time $t$), whose columns are paired initial and evolved states, the data-driven transient gain,

$G_{DD}(t) = \max_{v\neq 0} \frac{v^* Q_t^*WQ_t v}{v^* Q_0^*WQ_0 v},$

equals the largest squared singular value of $L Q_t B^{-1}$, where $Q_0^*WQ_0 = B^* B$ is a Cholesky factorization (which requires the columns of $Q_0$ to be linearly independent) and $W = L^* L$ defines the norm (Kai et al., 3 Jul 2025). This optimization is directly analogous to operator-based singular value problems but is performed entirely in the column space of the observed snapshots. Adding a Tikhonov term $\mu I$ to $Q_0^*WQ_0$ keeps the factorization well-conditioned and improves robustness to measurement or process noise and to nonlinearity in non-ideal data.
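The Rayleigh-quotient formulation can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation of the cited work: the function name `data_driven_gain`, the synthetic propagator `M`, and the snapshot construction are all hypothetical, and the data are noise-free and exactly linear.

```python
import numpy as np

def data_driven_gain(Q0, Qt, W=None, mu=0.0):
    """Snapshot-based gain: max_v (v* Qt* W Qt v) / (v* (Q0* W Q0 + mu I) v).

    Q0, Qt : (n, m) arrays whose columns are paired initial/evolved states.
    W      : (n, n) Hermitian positive-definite weight (identity if None).
    mu     : Tikhonov parameter added to the input Gram matrix.
    """
    n, m = Q0.shape
    W = np.eye(n) if W is None else W
    L = np.linalg.cholesky(W).conj().T            # W = L* L
    gram0 = Q0.conj().T @ W @ Q0 + mu * np.eye(m)
    B = np.linalg.cholesky(gram0).conj().T        # gram0 = B* B
    T = L @ Qt @ np.linalg.inv(B)
    _, s, Vh = np.linalg.svd(T, full_matrices=False)
    v = np.linalg.solve(B, Vh[0].conj())          # optimal snapshot weights
    return s[0] ** 2, Q0 @ v, Qt @ v              # gain, optimal input, its response

# Synthetic check: snapshots generated by a known linear map M (illustrative only).
rng = np.random.default_rng(0)
n = 5
M = np.triu(rng.standard_normal((n, n)))          # hypothetical non-normal propagator
Q0 = np.eye(n) + 0.3 * rng.standard_normal((n, n))
Qt = M @ Q0

gain, q_in, q_out = data_driven_gain(Q0, Qt)
print(gain, np.linalg.norm(M, 2) ** 2)
```

In this synthetic setting the snapshots span the full state space, so the data-driven gain should agree with the operator-based value. With fewer independent snapshots than state dimensions, and with noise-free linear data and $\mu = 0$, the optimization is restricted to the span of $Q_0$ and the result is a lower bound on the operator-based gain.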

b. Adjoint-free POD/SVD Projection Methods

Alternatively, one constructs a subspace basis via proper orthogonal decomposition (POD) from multiple forward-integrated initial fields (e.g., Hermite or random seeds). The propagator $M(\tau)$ is projected onto this subspace, and transient gain optimization is performed within this reduced space, fully bypassing adjoint integrations and operator construction (Wang et al., 23 Sep 2025).

The adjoint-free POD-SVD approach proceeds as follows:

Integrate $K$ independent initial fields, storing $S$ snapshots per run.
Perform a rank-$R$ SVD on each trajectory and aggregate the dominant singular vectors to form a basis $X$.
Orthonormalize $X$ to obtain $V$, then evolve $V$ forward under $M(\tau)$.
Project the energy norm onto the subspace, compute an SVD, and extract the optimal gain and modes.
Reconstruct the global optimal input/output vectors.

The cost is amortized: once the forward runs are complete, many time horizons $\tau$ can be explored efficiently using only small-matrix SVDs.
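The steps above can be sketched as follows for a small toy problem. This is a hedged illustration rather than the cited authors' code: the function `pod_svd_gain`, the one-step map `P`, and all parameter values are assumptions, the forward solver is represented by a generic `step` callable, and the energy norm is taken to be Euclidean so that the projected norm reduces to the identity.

```python
import numpy as np

def pod_svd_gain(step, seeds, n_snap, rank, n_tau, tol=1e-10):
    """Adjoint-free gain estimate from forward runs only (Euclidean energy norm).

    step   : callable advancing a state by one time step (black-box solver stand-in).
    seeds  : (n, K) array of initial fields.
    n_snap : snapshots stored per run (S); rank : modes kept per run (R).
    n_tau  : number of steps defining the horizon tau.
    """
    blocks = []
    for q in seeds.T:
        traj = [q]
        for _ in range(n_snap - 1):
            traj.append(step(traj[-1]))
        U, _, _ = np.linalg.svd(np.column_stack(traj), full_matrices=False)
        blocks.append(U[:, :rank])                 # dominant POD modes of this run
    X = np.hstack(blocks)

    # Orthonormalize X, discarding numerically dependent directions.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    V = U[:, s > tol * s[0]]

    # Evolve each basis vector forward over the horizon: Y = M(tau) V.
    Y = V.copy()
    for _ in range(n_tau):
        Y = np.column_stack([step(y) for y in Y.T])

    # V is orthonormal, so the reduced gain problem is an SVD of Y.
    Uy, sy, Vyh = np.linalg.svd(Y, full_matrices=False)
    q_in = V @ Vyh[0].conj()                       # optimal input in the full space
    q_out = Uy[:, 0]                               # normalized optimal output
    return sy[0] ** 2, q_in, q_out

# Toy stable, non-normal one-step map (illustrative; not from the cited study).
n = 6
P = 0.9 * np.eye(n) + 0.5 * np.diag(np.ones(n - 1), k=1)
rng = np.random.default_rng(1)
seeds = rng.standard_normal((n, n))

gain, q_in, q_out = pod_svd_gain(lambda q: P @ q, seeds, n_snap=5, rank=3, n_tau=10)
print(gain, np.linalg.norm(np.linalg.matrix_power(P, 10), 2) ** 2)
```

Only forward applications of `step` are used. In this sketch the basis evolution is repeated for each horizon; storing the intermediate states of the evolved basis would let several horizons be examined from a single set of forward runs.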

c. Statistical and Stochastic Generalizations

Recognizing that optimal disturbances are statistically rare for practical inputs, a statistical framework evaluates the mean amplification $\bar{G}(t)$ for random initial conditions with specified two-point covariance $C_0$,

$\bar{G}(t) = \frac{\operatorname{Tr}\left(e^{A t}C_0 e^{A^T t} \right)}{\operatorname{Tr}(C_0)},$

and further derives the probability density function (PDF) of the instantaneous gain (Frame et al., 2023). An exponential ansatz for the right tail of the gain PDF is supported,

$p(G) \approx \gamma e^{-\gamma G}, \;\; G \geq 0,$

where moments of the PDF are analytically matched to empirical snapshots, yielding confidence intervals for realistic, non-optimal disturbances.
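To make the trace formula for $\bar{G}(t)$ concrete, the sketch below compares the mean gain with the optimal gain for a two-state toy model and checks the formula against a Monte Carlo average. The model, the parameters `Re` and `t`, and the helper `mean_gain` are illustrative assumptions; the propagator is written in closed form for this particular real-valued toy operator.

```python
import numpy as np

def mean_gain(M, C0):
    """Expected energy ratio Tr(M C0 M^T) / Tr(C0) for initial covariance C0 (real M)."""
    return np.trace(M @ C0 @ M.T) / np.trace(C0)

Re, t = 100.0, 70.0   # hypothetical parameters of a two-state toy model
# Closed-form propagator exp(A t) for A = [[-1/Re, 1], [0, -2/Re]].
e1, e2 = np.exp(-t / Re), np.exp(-2.0 * t / Re)
M = np.array([[e1, Re * (e1 - e2)],
              [0.0, e2]])

C0 = np.eye(2)                                   # isotropic, uncorrelated initial conditions
G_opt = np.linalg.norm(M, 2) ** 2                # optimal (worst-case) gain
G_mean = mean_gain(M, C0)

# Monte Carlo: per-realization gains for Gaussian initial conditions drawn from C0.
# For isotropic C0 the average of these ratios coincides with the trace formula;
# for a general C0 the two quantities need not agree.
rng = np.random.default_rng(2)
q0 = rng.multivariate_normal(np.zeros(2), C0, size=20000).T
sample_gains = np.sum((M @ q0) ** 2, axis=0) / np.sum(q0 ** 2, axis=0)

print("optimal:", G_opt, "mean:", G_mean, "sampled mean:", sample_gains.mean())
```

The mean gain never exceeds the optimal gain, and it reaches the optimum only when $C_0$ concentrates all variance on the optimal initial condition.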

Extensions of dynamic mode decomposition (DMD) fit time-varying amplitudes to DMD modes,

$X \approx \Phi A, \;\; A = [a(t_0),a(t_1),\ldots,a(t_M)],$

with $\ell_1$ sparsity and smoothness regularization to isolate transient modal activations (Tanaka et al., 14 Aug 2025). This framework interprets stagewise transitions and transient structure emergence in high-dimensional datasets.
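One way to realize such a regularized amplitude fit is a proximal-gradient (ISTA) iteration, sketched below. This is an assumed formulation for illustration, not necessarily the solver or objective of the cited work: the objective, the function `fit_amplitudes`, and the weights `alpha` and `beta` are hypothetical, and the modes are taken as real and orthonormal stand-ins, whereas DMD modes are in general complex and non-orthogonal.

```python
import numpy as np

def fit_amplitudes(X, Phi, alpha=1e-3, beta=1e-2, n_iter=500):
    """Proximal-gradient (ISTA) fit of time-varying amplitudes A in X ~ Phi A.

    Minimizes 0.5*||X - Phi A||_F^2 + 0.5*beta*||A D^T||_F^2 + alpha*||A||_1,
    where D is the first-difference operator in time. Real-valued sketch.
    """
    r, m = Phi.shape[1], X.shape[1]
    D = np.diff(np.eye(m), axis=0)               # (m-1, m) temporal differences
    DtD = D.T @ D
    # Lipschitz bound for the gradient of the smooth part of the objective.
    lip = np.linalg.norm(Phi, 2) ** 2 + beta * np.linalg.norm(DtD, 2)
    step = 1.0 / lip
    A = np.zeros((r, m))
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ A - X) + beta * A @ DtD
        Z = A - step * grad
        A = np.sign(Z) * np.maximum(np.abs(Z) - step * alpha, 0.0)  # soft threshold
    return A

# Synthetic data: one transient burst, one inactive mode, one persistent mode.
rng = np.random.default_rng(3)
n, r, m = 40, 3, 100
Phi, _ = np.linalg.qr(rng.standard_normal((n, r)))   # stand-in modes (orthonormal)
tgrid = np.arange(m)
A_true = np.zeros((r, m))
A_true[0] = np.exp(-0.5 * ((tgrid - 30) / 8.0) ** 2)  # transient activation
A_true[2] = 0.5                                       # persistent mode
X = Phi @ A_true

A_fit = fit_amplitudes(X, Phi)
rel_err = np.linalg.norm(Phi @ A_fit - X) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
```

The $\ell_1$ term drives the amplitudes of inactive modes to zero, while the difference penalty discourages rapid sample-to-sample fluctuations in the recovered activations.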

3. Practical Implementation and Validation

The data-driven transient growth framework has been validated in a range of canonical and applied settings, as detailed below.

Flow/Model	Key Features	Notable Results/Performance
Ginzburg-Landau	Linear/nonlinear, operator and noisy data	Error $<1\%$ vs. operator-based, robust to $>3\%$ measurement noise (Kai et al., 3 Jul 2025, Wang et al., 23 Sep 2025)
Backward-facing step	2D NS, $N\sim5\times10^4$	Gain and modes $<1\%$ error vs. adjoint-based; uses $n\sim4500$ POD modes (Wang et al., 23 Sep 2025)
Batchelor vortex	3D, $N\sim2.3\times10^5$	Captures "anti-lift-up" mechanisms, converges within $n\sim4500$ (Wang et al., 23 Sep 2025)
Boundary layer	JHTDB, 3D+t, non-modal growth	Identifies TS and streaks, $G/R_{ex}$ peaks at correct $\beta$, matches literature (Kai et al., 3 Jul 2025)
Airfoil wakes, cylinder	Experimental and DNS data	Accurately captures transient, saturation, and modal structure (Tanaka et al., 14 Aug 2025, Nakamura et al., 26 Mar 2025)

The canonical computational pipeline consists of assembling input/output snapshot pairs or trajectory ensembles, pre-processing and possibly reducing dimensionality (POD), constructing empirical propagators, and performing Rayleigh-quotient or SVD-based optimization to extract transient gain and associated optimal modes. Regularization (e.g., Tikhonov, sparsity, smoothness) should be tuned via cross-validation on held-out data or error metrics such as modal overlap and gain error.

The intrinsic value of data-driven approaches extends beyond merely reconstructing operator-based optimal growth. Data-driven energy budget and transfer analysis leverages Galerkin projections onto bi-orthogonal operator eigenmodes and their adjoints, yielding coupled ODEs for modal amplitudes

$\frac{da_k}{dt} = \sum_{ij}F_{ijk}a_i a_j + \sum_i G_{ik}a_i + H_k,$

and modal energy budgets,

$\frac{dE_k}{dt} = \sum_j T_{kj} + D_k,$

with $T_{kj}$ as energy transfer and $D_k$ as linear production/dissipation. Time-varying DMD with phase control (tDMDpc) extracts transient, non-stationary modes and enables tracking of instantaneous growth rates, modal energies, and spatial energy transfer fields through nonlinear evolution (Nakamura et al., 26 Mar 2025).
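The amplitude equations and the modal energy budget can be evaluated directly with tensor contractions, as in the sketch below. Several choices here are assumptions made for illustration and are not specified in the text: real amplitudes, the energy definition $E_k = a_k^2/2$, the attribution $T_{kj} = a_k \sum_i F_{ijk} a_i a_j$, and a separate forcing term $a_k H_k$; the coefficient arrays are random placeholders.

```python
import numpy as np

def galerkin_rhs(a, F, G, H):
    """da_k/dt = sum_ij F_ijk a_i a_j + sum_i G_ik a_i + H_k (real amplitudes)."""
    return np.einsum("ijk,i,j->k", F, a, a) + G.T @ a + H

def energy_budget(a, F, G, H):
    """Split dE_k/dt for E_k = a_k^2 / 2 into transfer, linear, and forcing parts.

    T[k, j] = a_k * sum_i F_ijk a_i a_j   (one possible attribution to mode j)
    D[k]    = a_k * sum_i G_ik a_i        (linear production/dissipation)
    S[k]    = a_k * H_k                   (constant forcing contribution)
    """
    T = np.einsum("ijk,i,j,k->kj", F, a, a, a)
    D = a * (G.T @ a)
    S = a * H
    return T, D, S

# Random placeholder coefficients for a three-mode model (illustrative only).
rng = np.random.default_rng(4)
r = 3
F = 0.1 * rng.standard_normal((r, r, r))
G = -np.eye(r) + 0.2 * rng.standard_normal((r, r))
H = np.zeros(r)
a = rng.standard_normal(r)

T, D, S = energy_budget(a, F, G, H)
dE_dt = T.sum(axis=1) + D + S
print(dE_dt, a * galerkin_rhs(a, F, G, H))
```

With $H_k = 0$ the budget reduces to the form written above, $dE_k/dt = \sum_j T_{kj} + D_k$, and the two printed vectors should coincide because $dE_k/dt = a_k \, da_k/dt$ under the assumed energy definition.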

For strongly nonlinear or high-amplitude regimes, latent (autoencoder-based) dimensionality reduction, sparse identification of nonlinear dynamics (SINDy), and phase-amplitude reduction approaches quantify and control transient growth in low-dimensional manifolds, including nonlinear phase entrainment and attenuation of vortex-induced load transients (Fukami et al., 1 Mar 2024).

5. Interpretation: Statistics, Real Flow Disturbances, and Limitations

A recurring finding is that optimal transient growth substantially overstates the typical or expected energy amplification observed for stochastic disturbances. Statistical frameworks reveal that the mean gain $\bar{G}$ for isotropically correlated initial conditions can scale only linearly with Reynolds number, $\bar{G}_{\max}\sim Re^1$, versus the $Re^2$ scaling of optimal growth. The PDFs of amplification are well approximated by exponential laws, implying an extremely low probability of realizing near-optimal growth in real flows with spatially broadband or small-scale initial conditions (Frame et al., 2023). The length scale of spatial correlation, the disturbance orientation, and inhomogeneity in the data all critically influence the realized amplification and the resulting transition thresholds.

A limitation in data-driven approaches is the coverage and quality of input/output datasets: for high accuracy, the snapshot subspace must span the dominant growing structures; random-noise seeds may require up to twice as many independent runs as structured (e.g., Hermite or global mode-based) seeds (Wang et al., 23 Sep 2025). Nonlinear regimes and extremely low inertia in aeroelastic applications may require history-dependent or more sophisticated machine learning models rather than static snapshot-based regression (Zhu et al., 2023).

6. Applications, Scalability, and Future Directions

Data-driven transient growth analysis has found utility in:

Bypass transition prediction in shear flows and boundary layers, mapping of non-modal amplification surfaces, and identification of transition-inducing structures (Kai et al., 3 Jul 2025, Jovanović, 2020).
Experimental validation of operator-based theory directly from physical (noisy) measurements (Kai et al., 3 Jul 2025).
Flow-control: disturbance structure targeting and open-loop control waveform screening (Jovanović, 2020).
Nonlinear aeroelastic response and phase/amplitude sensitive mitigation strategies for lift/vibration transients in airfoil/gust interactions (Zhu et al., 2023, Fukami et al., 1 Mar 2024).
Real-time diagnostics and model reduction in turbulent or strongly time-varying environments (Tanaka et al., 14 Aug 2025).

Scaling is favorable: after initial snapshot collection, the computational cost of evaluating optimal gain at multiple time horizons, spatial locations, or parameter values is dominated by small matrix operations (SVD, Cholesky), and all forward simulations are embarrassingly parallelizable. Adjoint-free methods substantially lower the barrier to entry for large, complex, or experimental systems.

Extensions to flows with richer spectra, more complex boundary conditions, feedback control, or in the presence of colored stochastic excitation are straightforward: the core algorithms generalize to richer bases, block or tensor snapshot structures, or multi-input/multi-output settings.

Statistical generalizations, mode selectivity/regularization, and latent-manifold phase reduction are active research areas, promising to further unify model-based and data-driven perspectives and to enable rigorous transient growth analysis across the full spectrum of applied fluid and dynamical systems science.