Sparse Identification of Nonlinear Dynamics with Control
- SINDYc is a data-driven framework that uses sparse regression to identify parsimonious, interpretable models of nonlinear dynamical systems with explicit control inputs.
- It constructs an augmented library of nonlinear functions for both state and control, employing methods like LASSO and STLS to select active terms and mitigate noise.
- The approach integrates seamlessly with advanced control techniques such as Model Predictive Control, demonstrating improved tracking accuracy and computational efficiency in real-world applications.
Sparse Identification of Nonlinear Dynamics with Control (SINDYc) is a class of data-driven system identification methods designed to discover parsimonious, interpretable models of nonlinear dynamical systems with explicit control or exogenous input terms. SINDYc generalizes the original SINDy framework to include control inputs, thereby enabling model discovery for controlled systems and their subsequent integration into advanced control loops such as Model Predictive Control (MPC). This framework, relying on sparse regression over a candidate library of nonlinear functions of state and input, has demonstrated high data efficiency, interpretability, and computational performance in challenging real-world settings, particularly where first-principles models are unavailable or intractable (Wei et al., 9 Mar 2025, Brunton et al., 2016, Fasel et al., 2021).
1. Mathematical Foundations and Sparse Regression Formulation
The SINDYc framework assumes that the true system evolves according to a controlled nonlinear dynamical law
$$\dot{x}(t) = f(x(t), u(t)),$$
where $x \in \mathbb{R}^n$ is the state and $u \in \mathbb{R}^q$ is the control or forcing input (Brunton et al., 2016, Fasel et al., 2021). The objective is to recover a parsimonious model that expresses the dynamics as a sparse linear combination of a pre-specified library of candidate basis functions involving both state and control:
$$\dot{X} = \Theta(X, U)\,\Xi.$$
Here, $\Theta(X, U)$ is a library matrix whose columns are nonlinear functions (e.g., monomials, cross terms, trigonometric functions) of the state and input, evaluated on the measured snapshots $X$ and $U$; $\Xi$ is a sparse coefficient matrix, with active entries corresponding to the selected terms in the model (Wei et al., 9 Mar 2025, Kaiser et al., 2017). The formulation extends naturally to the discrete-time regime by regressing $x_{k+1}$ on $\Theta(x_k, u_k)$.
Sparsity is critical: the regression problem is regularized or thresholded so that only a minimal set of terms is retained, conferring interpretability and mitigating overfitting in the presence of noise or limited data (Wei et al., 9 Mar 2025). Common algorithms for solving the sparse regression include LASSO (ℓ₁) minimization and iterative methods such as Sequentially Thresholded Least Squares (STLS) (Brunton et al., 2016, Fasel et al., 2021). The LASSO form reads
$$\Xi = \arg\min_{\Xi'} \; \|\dot{X} - \Theta(X, U)\,\Xi'\|_2^2 + \lambda \|\Xi'\|_1,$$
with λ controlling the regularization level. Alternatively, STLS applies hard thresholding to the least-squares coefficients, which acts as a proxy for an ℓ₀-norm penalty rather than its ℓ₁ relaxation (Wei et al., 9 Mar 2025, Yahagi et al., 7 Mar 2025).
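The STLS iteration mentioned above can be illustrated with a minimal NumPy sketch. The toy system (dx/dt = -2x + u), the noise level, the threshold value, and the function name `stls` are all assumptions chosen for illustration; they are not taken from the cited works.

```python
import numpy as np

def stls(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares for theta @ xi ≈ dxdt."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # hard-threshold small coefficients
        for j in range(dxdt.shape[1]):       # refit each state equation separately
            active = ~small[:, j]
            if active.any():
                xi[active, j] = np.linalg.lstsq(
                    theta[:, active], dxdt[:, j], rcond=None
                )[0]
    return xi

# Toy data (assumed): dx/dt = -2 x + u, sampled at random states and inputs
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
u = rng.uniform(-1, 1, size=(200, 1))
dxdt = -2.0 * x + u + 0.01 * rng.standard_normal((200, 1))

# Candidate library with state, input, and mixed terms
names = ["1", "x", "u", "x^2", "x*u", "u^2"]
theta = np.hstack([np.ones_like(x), x, u, x**2, x * u, u**2])

xi = stls(theta, dxdt, threshold=0.1)
for name, coeff in zip(names, xi[:, 0]):
    if coeff != 0.0:
        print(f"{name}: {coeff:.3f}")
```

The threshold plays the role of the sparsity knob here: terms whose coefficients fall below it are removed, and the remaining terms are refit without them.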
2. Library Construction and Treatment of Control Inputs
A central feature of SINDYc is the explicit augmentation of the regression library to include input and mixed state-input terms (Brunton et al., 2016, Kaiser et al., 2017). The typical polynomial library up to degree two includes:
$$\Theta(X, U) = \begin{bmatrix} 1 & X & U & X \otimes X & X \otimes U & U \otimes U \end{bmatrix},$$
where "⊗" denotes all pairwise products, thus producing terms such as $x_i x_j$, $x_i u_j$, etc. Advanced implementations may augment with higher-order polynomials, trigonometric functions, or more problem-specific nonlinearities (Fasel et al., 2021, Abdelsalam et al., 24 Dec 2025, López et al., 25 Apr 2026). For high-dimensional problems or partially observed systems, time-delay embeddings and parameter libraries are incorporated to capture unmeasured states or parameterized bifurcations (Yahagi et al., 7 Mar 2025, Nicolaou et al., 2023).
SINDYc can accommodate both exogenous forcing (treating the input as an external signal) and feedback control (treating the input as a state-dependent function), although special care (e.g., injection of perturbation signals) is required in the latter case to disentangle input effects from state-only dynamics (Brunton et al., 2016).
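As a rough illustration of the degree-two library with state, input, and mixed state-input terms, the following sketch stacks the state and input columns and forms all pairwise products. The function name `poly2_library` and the column labels are hypothetical and exist only for this example.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly2_library(X, U):
    """Degree-2 polynomial library over the stacked variables [x, u].

    X has shape (samples, n_states); U has shape (samples, n_inputs).
    Returns the library matrix and a human-readable label per column.
    """
    Z = np.hstack([X, U])
    names = [f"x{i}" for i in range(X.shape[1])]
    names += [f"u{i}" for i in range(U.shape[1])]

    cols, labels = [np.ones(Z.shape[0])], ["1"]
    for i in range(Z.shape[1]):                      # linear terms
        cols.append(Z[:, i])
        labels.append(names[i])
    for i, j in combinations_with_replacement(range(Z.shape[1]), 2):
        cols.append(Z[:, i] * Z[:, j])               # pairwise products
        labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), labels

# Two states and one input give 1 + 3 + 6 = 10 candidate terms
X = np.arange(10.0).reshape(5, 2)
U = np.ones((5, 1))
theta, labels = poly2_library(X, U)
print(labels)
```

Because the input columns are treated like additional variables, the mixed terms (for example `x0*u0`) appear automatically alongside the state-only and input-only terms.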
3. Identification Algorithms and Practical Implementations
The sparse regression is typically accomplished via one of:
- LASSO (ℓ₁-regularized least squares): Convex, widely applicable.
- Sequentially Thresholded Least Squares (STLS): Iterative, combining least squares with thresholding (Wei et al., 9 Mar 2025, Brunton et al., 2016, Kaiser et al., 2017).
- Weak-form SINDYc (WSINDYc): Integral formulation using wavelet/test functions, which avoids numerical differentiation and confers noise robustness (López et al., 25 Apr 2026).
The selection of λ (regularization) and γ (threshold) hyperparameters is data-driven, frequently employing grid search, cross-validation, or information criteria. Ensemble learning (library bagging, clustering, elite selection) is deployed to enhance reliability in high-noise or limited-data settings, ensuring stable multi-step prediction (Yahagi et al., 7 Mar 2025).
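One simple way to picture the data-driven hyperparameter selection is a grid search over the threshold with a held-out validation set, keeping the sparsest model whose validation error stays close to the best one. The sketch below is only one such heuristic under assumed settings: the toy system, the threshold grid, and the 1.5x tolerance factor are illustrative choices, not values from the cited studies.

```python
import numpy as np

def stls(theta, y, threshold, max_iter=10):
    """Sequentially thresholded least squares for a single state equation."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            xi[~small] = np.linalg.lstsq(theta[:, ~small], y, rcond=None)[0]
    return xi

# Toy data (assumed): dx/dt = -2 x + u + 0.5 x u, with small measurement noise
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-1, 1, n)
u = rng.uniform(-1, 1, n)
y = -2.0 * x + u + 0.5 * x * u + 0.01 * rng.standard_normal(n)
theta = np.column_stack([np.ones(n), x, u, x**2, x * u, u**2])
train, val = slice(0, 200), slice(200, None)

results = []
for thr in [0.001, 0.01, 0.1, 0.8, 3.0]:
    xi = stls(theta[train], y[train], thr)
    mse = np.mean((theta[val] @ xi - y[val]) ** 2)   # held-out derivative error
    results.append((thr, xi, mse, np.count_nonzero(xi)))

# Keep models within 1.5x of the best validation error, then take the sparsest
best_mse = min(r[2] for r in results)
admissible = [r for r in results if r[2] <= 1.5 * best_mse]
chosen_thr, chosen_xi, chosen_mse, chosen_nnz = min(admissible, key=lambda r: r[3])
print(chosen_thr, chosen_nnz, chosen_xi)
```

Thresholds that are too large delete genuine terms and are rejected by the validation error, while thresholds that are too small tend to keep spurious terms and lose on sparsity.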
Extensions such as SINDyCP (Nicolaou et al., 2023) separate feature and parameter libraries for systematic discovery of parameter dependencies, and integrate weak-form regression for spatiotemporal–bifurcation problems.
4. Integration with Model Predictive Control and Control Applications
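The sparse, analytic form of SINDYc models enables direct integration with Model Predictive Control (MPC) schemes (Wei et al., 9 Mar 2025, Fasel et al., 2021, Kaiser et al., 2017). Once identified, the model $\dot{x} = \Xi^{\top}\Theta(x, u)^{\top}$ is used as the predictive plant model within the MPC optimizer, subject to constraints and costs typical in applied control; a representative finite-horizon formulation is
$$\min_{u_0, \dots, u_{N-1}} \; \sum_{k=0}^{N-1} \left( \|x_{k+1} - x^{\mathrm{ref}}_{k+1}\|_Q^2 + \|u_k\|_R^2 \right) \quad \text{s.t.} \quad x_{k+1} = F(x_k, u_k), \;\; x_k \in \mathcal{X}, \;\; u_k \in \mathcal{U},$$
where $F$ denotes the discretized SINDYc model, $N$ the prediction horizon, and $Q$, $R$ weighting matrices. Key reported benefits include model compactness (few active terms), computational efficiency in online optimization, and improved prediction/tracking accuracy relative to DMDc or black-box neural models, especially in the low-data regime (Fasel et al., 2021, Kaiser et al., 2017).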
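To make the role of the identified model inside the MPC loop concrete, here is a deliberately small receding-horizon sketch. Everything in it is an assumption for illustration: the scalar plant (dx/dt = x - x^3 + u), the "identified" coefficients in `XI`, the forward-Euler discretization, and the brute-force search over a coarse grid of input sequences, which stands in for the numerical optimizer a practical MPC implementation would use.

```python
import itertools
import numpy as np

# Hypothetical identified coefficients for the library [x, u, x^3]
XI = np.array([0.98, 1.01, -1.02])

def model_rhs(x, u):
    """Sparse model dx/dt = Theta(x, u) @ XI, vectorized over candidate rollouts."""
    theta = np.stack([x, u, x**3], axis=-1)
    return theta @ XI

def plant_rhs(x, u):
    """Stand-in 'true' plant (assumed): dx/dt = x - x^3 + u."""
    return x - x**3 + u

def mpc_step(x0, x_ref, seqs, dt=0.1, r=1e-3):
    """Return the first input of the lowest-cost candidate input sequence."""
    x = np.full(seqs.shape[0], x0)
    cost = np.zeros(seqs.shape[0])
    for k in range(seqs.shape[1]):
        u = seqs[:, k]
        x = x + dt * model_rhs(x, u)           # forward-Euler model prediction
        cost += (x - x_ref) ** 2 + r * u**2    # tracking error plus input penalty
    return seqs[np.argmin(cost), 0]

# Input constraint |u| <= 1 is enforced by construction of the candidate grid
candidates = np.linspace(-1.0, 1.0, 9)
seqs = np.array(list(itertools.product(candidates, repeat=4)))  # horizon of 4

x, x_ref = -1.0, 0.5
xs, us = [], []
for _ in range(60):
    u = mpc_step(x, x_ref, seqs)       # optimize over the horizon, apply first input
    x = x + 0.1 * plant_rhs(x, u)      # advance the plant one step
    xs.append(x)
    us.append(u)

print(f"final state: {xs[-1]:.3f}, reference: {x_ref}")
```

Because the model has only three active terms, each prediction step is a handful of multiplications, which is the kind of compactness that keeps the online optimization cheap.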
Demonstrated real-world applications include coordinated ramp metering in large traffic networks (achieving occupancy deviation of 7.51% vs. 8.31–20.64% for prior methods; throughput improvement of +1,999 vph) (Wei et al., 9 Mar 2025), SEIR infectious disease control, diesel engine airpath control, drone and plasma boundary MPC, and hybrid reinforcement learning frameworks (e.g., SINDy-TD3 for sample-efficient policy training) (Abdelsalam et al., 24 Dec 2025, López et al., 25 Apr 2026, Yahagi et al., 7 Mar 2025).
5. Robustness, Extensions, and Performance Metrics
Noise is a significant practical concern in identification. Weak-form SINDYc (WSINDYc) and its ensemble variants employ test-function convolutions or wavelets to filter measurement noise, showing superior performance in large-scale and high-noise regimes. For instance, in tokamak plasma boundary tracking, WSINDYc achieves a median tracking error of ≈2% at noise levels where the error of standard SINDYc exceeds 5% (López et al., 25 Apr 2026).
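The idea behind the weak form can be sketched in a few lines: multiplying the dynamics by a compactly supported test function φ and integrating by parts moves the time derivative from the noisy state onto φ, so that ∫ φ Θ dt · Ξ = -∫ φ' x dt. The example below is a simplified illustration under assumed settings (a scalar toy system dx/dt = -2x + u with u = sin t, a polynomial bump as test function, overlapping windows, plain least squares followed by one hard threshold); it is not the WSINDYc algorithm of the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01
t = np.arange(0.0, 10.0 + dt / 2, dt)
u = np.sin(t)
# Exact solution of dx/dt = -2 x + sin(t) with x(0) = 1, plus measurement noise
x_clean = 1.2 * np.exp(-2.0 * t) + (2.0 * np.sin(t) - np.cos(t)) / 5.0
x = x_clean + 0.01 * rng.standard_normal(t.size)

# Candidate library [1, x, u, x^2] evaluated on the noisy measurements
theta = np.column_stack([np.ones_like(x), x, u, x**2])

half = 100                                   # window half-width in samples
s = np.arange(-half, half + 1) / half        # local coordinate in [-1, 1]
phi = (1.0 - s**2) ** 2                      # test function, zero at both ends
dphi = -4.0 * s * (1.0 - s**2) / (half * dt) # d(phi)/dt via the chain rule

rows_G, rows_b = [], []
for c in range(half, t.size - half, 20):     # overlapping windows
    win = slice(c - half, c + half + 1)
    rows_G.append(dt * phi @ theta[win])     # approximates the integral of phi * Theta
    rows_b.append(-dt * dphi @ x[win])       # approximates minus the integral of phi' * x
G, b = np.array(rows_G), np.array(rows_b)

xi = np.linalg.lstsq(G, b, rcond=None)[0]
xi[np.abs(xi) < 0.1] = 0.0                   # single hard threshold for sparsity
print(dict(zip(["1", "x", "u", "x^2"], np.round(xi, 3))))
```

No finite-difference derivative of the noisy signal is ever formed; the noise enters only through integrals, where it is averaged over each window.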
Ensemble methods, such as library bagging with elite selection (R² > 0.9), clustering, and convex averaging, further improve multi-step robustness in noisy industrial systems (e.g., diesel airpath models reliable up to 20% additive measurement noise) (Yahagi et al., 7 Mar 2025).
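A stripped-down version of library bagging with elite selection might look as follows. Each ensemble member sees a random subset of library columns, members are kept only if their held-out R² exceeds 0.9 (the elite criterion quoted above), and the surviving coefficient vectors are averaged with equal weights as a simple instance of convex averaging. The toy system, ensemble size, subset size, and equal weighting are assumptions for illustration, and the clustering step is omitted.

```python
import numpy as np

def stls(theta, y, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares for a single state equation."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            xi[~small] = np.linalg.lstsq(theta[:, ~small], y, rcond=None)[0]
    return xi

def r2_score(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Toy data (assumed): dx/dt = -2 x + 1.5 u with 0.05 additive noise
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-1, 1, n)
u = rng.uniform(-1, 1, n)
y = -2.0 * x + 1.5 * u + 0.05 * rng.standard_normal(n)
theta = np.column_stack([np.ones(n), x, u, x**2, x * u, u**2])
train, val = slice(0, 300), slice(300, None)

members = []
for _ in range(50):
    # Library bagging: each member only sees 4 of the 6 candidate columns
    cols = np.sort(rng.choice(theta.shape[1], size=4, replace=False))
    xi = np.zeros(theta.shape[1])
    xi[cols] = stls(theta[train][:, cols], y[train])
    # Elite selection on held-out data
    if r2_score(y[val], theta[val] @ xi) > 0.9:
        members.append(xi)

xi_ens = np.mean(members, axis=0)    # equal-weight convex average of elite members
print(len(members), np.round(xi_ens, 3))
```

Members that happen to miss an essential term fit the validation data poorly and are filtered out, so the average is taken only over structurally plausible models.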
Quantitative performance criteria include coefficient of determination R² on both derivative and multi-step rollout prediction, tracking error, constraint compliance, throughput, convergence rate, and computational time per MPC iteration (Wei et al., 9 Mar 2025, López et al., 25 Apr 2026).
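The distinction between derivative-level and multi-step rollout R² can be made concrete with a small sketch. The scalar system, the slightly perturbed "identified" coefficients, and the forward-Euler rollout are assumptions chosen so the example stays self-contained.

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rollout(xi, x0, u, dt):
    """Forward-Euler rollout of dx/dt = xi[0]*x + xi[1]*u under recorded inputs."""
    x = np.empty(u.size + 1)
    x[0] = x0
    for k in range(u.size):
        x[k + 1] = x[k] + dt * (xi[0] * x[k] + xi[1] * u[k])
    return x

dt = 0.01
t = np.arange(0.0, 5.0, dt)
u = np.sin(t)

# Reference trajectory from an assumed "true" system dx/dt = -2 x + u
xi_true = np.array([-2.0, 1.0])
x_meas = rollout(xi_true, 1.0, u, dt)
dx_meas = xi_true[0] * x_meas[:-1] + xi_true[1] * u

# Hypothetical identified model with a small coefficient error
xi_model = np.array([-1.9, 1.0])

# One-step (derivative) fit: evaluate the model on measured states
dx_pred = xi_model[0] * x_meas[:-1] + xi_model[1] * u
r2_derivative = r2_score(dx_meas, dx_pred)

# Multi-step fit: integrate the model from the initial condition only
x_roll = rollout(xi_model, 1.0, u, dt)
r2_rollout = r2_score(x_meas, x_roll)

print(f"R2 (derivative): {r2_derivative:.4f}, R2 (rollout): {r2_rollout:.4f}")
```

The rollout score is the stricter of the two in general, because prediction errors accumulate along the trajectory instead of being reset to the measured state at every sample.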
6. Comparative Analysis and Limitations
SINDYc's core advantages include data efficiency, interpretability, and runtime performance. However, recent benchmarking shows that under moderate to high measurement noise, ARGOSc outperforms SINDYc across multiple metrics (MSE, R²) on forced nonlinear systems, a result attributed to explicit denoising, adaptive LASSO weighting, thresholding, and bootstrap inference absent in canonical SINDYc pipelines (Javadi et al., 11 Sep 2025).
Limitations of SINDYc pertain to noise sensitivity (especially with direct numerical differentiation), potential overfitting in high-dimensional libraries, and challenges in automatic model selection for partially observed or highly nonlinear environments (López et al., 25 Apr 2026, Yahagi et al., 7 Mar 2025).
7. Outlook and Future Directions
Emerging directions involve fully online identification (adaptive SINDYc/WSINDYc), systematic approaches to PDE and bifurcation-system identification (e.g., SINDyCP), and robustification for high-noise, high-dimensional, and partially observed settings (Nicolaou et al., 2023, López et al., 25 Apr 2026). Integrations with learning-based or hybrid reinforcement learning architectures (e.g., Dyna-style SINDy–TD3) are active areas enhancing sample efficiency and control reliability in nonlinear systems (Abdelsalam et al., 24 Dec 2025).
Continued methodological developments focus on the automation of library selection, improved noise handling (e.g., weak formulations, ensemble methods), and rigorous validation of extrapolation capabilities across parameter regimes relevant to bifurcation theory and pattern formation (Nicolaou et al., 2023, Yahagi et al., 7 Mar 2025, López et al., 25 Apr 2026).