Data-Driven SLS Approaches
- Data-driven SLS is a set of methods that harness empirical data to synthesize closed-loop control policies without explicit system modeling.
- It utilizes Hankel matrix constructions and convex optimization to directly enforce performance and safety constraints from observed trajectories.
- These approaches are scalable for distributed, output-feedback, and safety-critical applications, matching model-based performance under rich data.
Data-driven SLS (System Level Synthesis) approaches are a family of methods that formulate and solve control, estimation, optimization, or detection problems by working directly with empirical data (typically closed- or open-loop trajectories), without requiring explicit parametric identification of system models. These strategies apply to distributed and centralized control, output feedback under perception uncertainty, scheduling, safety-critical scenario generation, and even deep neural representation selection. Common to all data-driven SLS variants are: (i) parameterizations of closed-loop behaviors (e.g., response maps or policy operators), (ii) constraints and performance metrics imposed directly on these parameterizations, and (iii) data-centric algorithms (often convex or quasi-convex) that yield statistically or deterministically certified guarantees.
1. Foundational Principles of Data-Driven SLS
Classical SLS is a model-based framework that parameterizes all achievable closed-loop system responses (e.g., in a finite-horizon LTI setting) subject to causality, structural, or locality constraints. The transition to data-driven SLS rests on theoretical results such as Willems’s Fundamental Lemma, which establishes that any finite-length trajectory of an unknown controllable linear system can be expressed as a linear combination of time-shifted windows of a previously measured trajectory, provided the inputs are persistently exciting of sufficiently high order (Xue et al., 2020). This allows the model-based achievability constraints, which are expressed in terms of unknown system matrices, to be replaced by affine relations on Hankel matrices constructed from data (Schüepp et al., 2 Apr 2025, Alonso et al., 2021, Xue et al., 2020).
In the nominal, noise-free case, data-driven SLS coincides exactly with model-based SLS both in feasibility and optimality: the solution set of the data-driven convex program is identical to the solution set of the model-based problem under sufficient data richness and excitation (Xue et al., 2020, Schüepp et al., 2 Apr 2025). In the stochastic or adversarial noise regime, robustification is essential, and sample complexity results are explicit in terms of noise level, state/input dimensions, and the excitation order (Xue et al., 2020, Li et al., 2020).
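The span property behind the Fundamental Lemma can be checked numerically. The following is a minimal sketch, assuming a small hypothetical two-state system (the matrices `A` and `B`, the horizon `L`, and the helper names `block_hankel` and `simulate` are illustrative choices, not taken from the cited papers). It builds block Hankel matrices from one noise-free rollout, tests persistency of excitation through a rank condition, and confirms that a fresh trajectory lies in the column span of the data.

```python
import numpy as np

def block_hankel(signal, depth):
    """Stack `depth` time-shifted copies of a (T, m) signal into a (depth*m, T-depth+1) matrix."""
    T, _ = signal.shape
    cols = T - depth + 1
    return np.vstack([signal[i:i + cols].T for i in range(depth)])

def simulate(A, B, x0, u):
    """Noise-free rollout x[t+1] = A x[t] + B u[t]; returns states aligned with the inputs."""
    x = np.zeros((u.shape[0], A.shape[0]))
    x[0] = x0
    for t in range(u.shape[0] - 1):
        x[t + 1] = A @ x[t] + B @ u[t]
    return x

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])  # hypothetical plant, used only to generate data
B = np.array([[0.0], [1.0]])
n, m, L, T = 2, 1, 4, 40

u_data = rng.standard_normal((T, m))
x_data = simulate(A, B, np.zeros(n), u_data)

# Persistency of excitation of order n + L: the depth-(n+L) input Hankel has full row rank.
is_pe = np.linalg.matrix_rank(block_hankel(u_data, n + L)) == m * (n + L)

# Columns of H are length-L input/state windows taken from the recorded data.
H = np.vstack([block_hankel(u_data, L), block_hankel(x_data, L)])

# A fresh length-L trajectory from a different initial state and input sequence.
u_new = rng.standard_normal((L, m))
x_new = simulate(A, B, rng.standard_normal(n), u_new)
w_new = np.concatenate([u_new.reshape(-1), x_new.reshape(-1)])

# The lemma predicts an exact representation w_new = H g, up to round-off.
g, *_ = np.linalg.lstsq(H, w_new, rcond=1e-10)
residual = np.linalg.norm(H @ g - w_new)
print("persistently exciting:", bool(is_pe), "| span residual:", residual)
```

The residual should sit at round-off level whenever the rank test passes. With noisy data the representation is only approximate, which is where the robust variants discussed in this article come in.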
2. Data-Driven SLS Workflow and Methodologies
The canonical data-driven SLS workflow for linear or switched-linear systems proceeds as follows:
- Data Collection: Gather open-loop or closed-loop trajectories under persistently exciting (often random) inputs for the required horizon, with or without measurement noise (Xue et al., 2020, Li et al., 2020, Alonso et al., 2021).
- Hankel Construction: Assemble block Hankel matrices of prescribed lag/length (Xue et al., 2020, Schüepp et al., 2 Apr 2025).
- Trajectory Parameterization: Parameterize candidate closed-loop responses as linear combinations of the Hankel matrix columns, where the combination weights obey linear “matching” constraints (Xue et al., 2020, Alonso et al., 2021).
- Optimization: Pose the control or estimation problem as a convex (QP/LP/SOCP) program over these weights or their block-structured variants, subject to performance objectives and structural or locality constraints (Schüepp et al., 2 Apr 2025, Alonso et al., 2021).
- Robustification: In the presence of disturbance or process noise, robustify the responses via worst-case (distribution-free) or high-probability bounds, often relying on concentration inequalities for empirical estimates and on quasi-convex surrogates for tractability (Xue et al., 2020, Li et al., 2020, Sarkar et al., 2019).
An archetypal pseudocode instance is provided in (Li et al., 2020), blending ridge regression identification, statistical confidence radii, and robust control synthesis using convex optimization.
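The first four workflow steps can be sketched end to end in the noise-free case. This is an illustrative sketch rather than the algorithm of any cited paper. The plant, horizon, and quadratic cost (identity state and input weights) are hypothetical. The matching constraint is assumed to take the form "first block row of the state Hankel matrix times the weight matrix `G` equals the identity", so that each response column starts from a unit initial state. The optimization is solved with a null-space least-squares step instead of a general convex solver. The result is compared against a model-based finite-horizon Riccati recursion.

```python
import numpy as np

def block_hankel(signal, depth):
    """Stack `depth` time-shifted copies of a (T, m) signal into a (depth*m, T-depth+1) matrix."""
    T, _ = signal.shape
    cols = T - depth + 1
    return np.vstack([signal[i:i + cols].T for i in range(depth)])

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.2], [0.0, 0.8]])  # hypothetical plant, used only to simulate data
B = np.array([[0.0], [1.0]])
n, m, L, T = 2, 1, 5, 60

# 1. Data collection: one noise-free open-loop rollout under random inputs.
u = rng.standard_normal((T, m))
x = np.zeros((T, n))
for t in range(T - 1):
    x[t + 1] = A @ x[t] + B @ u[t]

# 2. Hankel construction.
Hx, Hu = block_hankel(x, L), block_hankel(u, L)
M = np.vstack([Hx, Hu])  # maps weights G to the stacked responses [Phi_x; Phi_u]
C = Hx[:n]               # first block row: initial state of every data window

# 3. Parameterization: every G with C @ G = I can be written as G0 + N @ Z.
G0 = np.linalg.pinv(C)
_, _, Vt = np.linalg.svd(C)
N = Vt[n:].T             # orthonormal basis of the null space of C

# 4. Optimization: minimize ||Phi_x||_F^2 + ||Phi_u||_F^2, a least-squares problem in Z.
Z = -np.linalg.pinv(M @ N, rcond=1e-10) @ (M @ G0)
G = G0 + N @ Z
Phi_x = (Hx @ G).reshape(L, n, n)
Phi_u = (Hu @ G).reshape(L, m, n)
data_cost = np.sum(Phi_x**2) + np.sum(Phi_u**2)

# Model-based reference: finite-horizon Riccati recursion with Q = I and R = I.
P = np.eye(n)
for _ in range(L - 1):
    K = np.linalg.solve(np.eye(m) + B.T @ P @ B, B.T @ P @ A)
    P = np.eye(n) + A.T @ P @ (A - B @ K)
model_cost = np.trace(P)

print("data-driven cost:", data_cost, "| model-based cost:", model_cost)
```

In this sketch the plant matrices are used only to generate data and to compute the reference cost. The synthesis itself touches only `Hx` and `Hu`. Structural or locality constraints would enter as additional linear constraints on `Hx @ G` and `Hu @ G`, at which point a general QP solver becomes the more natural tool.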
3. Applications and Theoretical Guarantees
Data-driven SLS realizes a broad array of applications:
- Model Predictive Control (MPC): Both centralized and distributed/localized MPC can be fully recast in a data-driven SLS form, with closed-loop maps optimized over data libraries. Extensions to time-varying affine policies yield convexity and exact equivalence to model-based MPC with sufficient sample length (Schüepp et al., 2 Apr 2025, Alonso et al., 2021).
- Distributed Control: By imposing d-locality constraints on the SLS response maps, distributed data-driven SLS supports scalable synthesis for large-scale interconnected systems. Notably, required trajectory length and data complexity depend only on local neighborhood size and not on global system dimension (Alonso et al., 2021).
- Output-Feedback and Perception: VISION-SLS integrates learned low-dimensional visual representations (with calibrated error envelopes) into a robust SLS parameterization to deliver certified-safe control from RGB images in nonlinear, partially observed domains (Leeman et al., 27 Apr 2026).
- Switched and Nonlinear Systems: OLS and Hankel-type formulations with data-driven order selection and balanced truncation extend data-driven SLS to switched linear systems of unknown order, with explicit finite-sample error bounds and convergence guarantees (Sarkar et al., 2019).
- Non-control Domains: Sensitive-Layer-Select (SLS) classifiers learn data-driven layer-weightings in transformer-based speech deepfake detectors, yielding significant performance gains in singing voice deepfake detection (SVDD) tasks (Zhang et al., 2024). SmartLLMs Scheduler employs a data-driven SLS principle for dynamic task–model assignments in LLM serving (Liu et al., 5 Aug 2025).
- Safety-Critical Scenario Generation: BridgeGen’s SLS module operates as a data-driven optimizer (including RL-based solvers) over parameter spaces defined by data and knowledge, mining for high-criticality ADV test scenarios (Hao et al., 2023).
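For the switched and unknown-order setting listed above, the role of the Hankel matrix in order selection can be illustrated with a much simpler stand-in than the cited procedure: form a Hankel matrix from (noisy) Markov parameters and count the singular values above a threshold. The system, noise level, and threshold below are hypothetical, and plain thresholding is a simplification of the data-driven order selection and balanced truncation described in the literature.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical second-order SISO system; its order is treated as unknown below.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

# Markov parameters h_k = C A^(k-1) B, perturbed to mimic estimation error.
depth = 5
markov = []
Ak = np.eye(2)
for _ in range(2 * depth - 1):
    markov.append((C @ Ak @ B).item())
    Ak = A @ Ak
markov = np.array(markov) + 1e-4 * rng.standard_normal(2 * depth - 1)

# Hankel matrix of Markov parameters: entry (i, j) holds h_{i+j+1}.
hankel = np.array([[markov[i + j] for j in range(depth)] for i in range(depth)])

# Order selection by thresholding singular values (the threshold is an illustrative choice).
singular_values = np.linalg.svd(hankel, compute_uv=False)
threshold = 1e-2
estimated_order = int(np.sum(singular_values > threshold))
print("singular values:", singular_values)
print("estimated order:", estimated_order)
```

The gap between the leading singular values and the noise floor is what makes the order identifiable. Finite-sample analyses of this kind of procedure quantify how much data is needed for that gap to be reliable.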
Performance and stability theorems are explicit: robust data-driven SLS can guarantee closed-loop stabilization with a suboptimality bound that depends on the data-driven uncertainty radius, which is obtained from explicit matrix concentration or empirical bounds. Feasibility and optimality coincide with their model-based analogs under persistency of excitation (PE) and sufficient data (Li et al., 2020, Schüepp et al., 2 Apr 2025, Xue et al., 2020).
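To give a feel for how an uncertainty radius shrinks with data, the sketch below ridge-regresses the dynamics from noisy rollouts of two different lengths and measures the spectral-norm estimation error. Everything here is a hypothetical illustration: the plant, noise level, and regularization weight are made up, and the radius is computed against the true matrices, which is possible only in simulation. In practice the radius would come from concentration or empirical bounds of the kind cited above.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0.9, 0.2], [0.0, 0.8]])  # hypothetical plant, known only to the simulator
B = np.array([[0.0], [1.0]])
n, m = 2, 1
noise_std, ridge = 0.05, 1e-3            # illustrative noise level and regularization weight

def estimate(T):
    """Ridge-regress x[t+1] on [x[t]; u[t]] from one noisy rollout of length T."""
    u = rng.standard_normal((T, m))
    x = np.zeros((T + 1, n))
    for t in range(T):
        x[t + 1] = A @ x[t] + B @ u[t] + noise_std * rng.standard_normal(n)
    Z = np.hstack([x[:-1], u])            # (T, n+m) regressors
    theta = np.linalg.solve(Z.T @ Z + ridge * np.eye(n + m), Z.T @ x[1:]).T
    return theta[:, :n], theta[:, n:]     # (A_hat, B_hat)

def radius(T):
    """Spectral-norm error of [A_hat, B_hat]; uses the true model, so simulation only."""
    A_hat, B_hat = estimate(T)
    return np.linalg.norm(np.hstack([A_hat - A, B_hat - B]), 2)

def mean_radius(T, trials=20):
    return float(np.mean([radius(T) for _ in range(trials)]))

eps_short = mean_radius(20)
eps_long = mean_radius(2000)
print("mean radius, T=20:", eps_short, "| T=2000:", eps_long)
```

A robust synthesis step would then treat the radius as the size of a model-error set and design against the worst case inside it, so a smaller radius translates into a less conservative controller.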
4. Computation, Scalability, and Locality
Data-driven SLS approaches are highly scalable due to two factors: (i) convex/quasi-convex optimization structures that leverage locality, structure, or separability; and (ii) parametrizations with variables whose number scales with either trajectory length or local subsystem dimension, but not global system size (Alonso et al., 2021, Li et al., 2020). Distributed ADMM algorithms, as in D³LMPC, exploit problem separability for practical implementation.
In robust versions, matrix concentration inequalities and trajectory averaging further reduce the amount of data needed for a desired level of statistical robustness. For networked MPC, the per-step runtime per subsystem remains nearly independent of the network size when d-locality is enforced, and distributed implementations communicate only with neighboring nodes (Alonso et al., 2021, Li et al., 2020).
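The locality argument can be made concrete with a small sketch that computes which nodes lie within d hops of each other on a chain and checks that the largest neighborhood does not grow with the network. The chain topology, the scalar-node assumption, and the helper names `chain_adjacency` and `d_hop_mask` are illustrative.

```python
import numpy as np

def chain_adjacency(num_nodes):
    """Adjacency matrix of an undirected chain 0 - 1 - ... - (num_nodes - 1)."""
    adj = np.zeros((num_nodes, num_nodes), dtype=int)
    idx = np.arange(num_nodes - 1)
    adj[idx, idx + 1] = 1
    adj[idx + 1, idx] = 1
    return adj

def d_hop_mask(adjacency, d):
    """Boolean mask whose (i, j) entry is True when node j is within d hops of node i."""
    n = adjacency.shape[0]
    one_hop = (adjacency != 0) | np.eye(n, dtype=bool)
    reach = np.eye(n, dtype=bool)
    for _ in range(d):
        reach = (reach.astype(int) @ one_hop.astype(int)) > 0
    return reach

# Largest 2-hop neighborhood for a small and a large chain.
max_neighborhood = {}
for num_nodes in (8, 64):
    mask = d_hop_mask(chain_adjacency(num_nodes), d=2)
    max_neighborhood[num_nodes] = int(mask.sum(axis=1).max())

print(max_neighborhood)  # {8: 5, 64: 5}
```

In a localized synthesis, entries of the response maps outside such a mask are constrained to zero, so each subsystem's subproblem involves only the variables and data of its own neighborhood.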
For high-dimensional learning (e.g., SLS-based classifier heads or vision-based output-feedback), computational tractability is maintained via efficient reductions (e.g., Riccati recursions in VISION-SLS (Leeman et al., 27 Apr 2026)) and block-structured QP/LP solvers (Schüepp et al., 2 Apr 2025).
5. Empirical Performance and Case Studies
Empirical studies corroborate the efficacy of data-driven SLS:
- Power Grid Example: For a layered SLS/MPC controller over a 25-node swing network, robust data-driven SLS was assessed on model estimate error, per-node complexity, and closed-loop cost relative to the ideal centralized MPC, with a model-based design as the point of comparison. Stability was robust to actuator saturation and held across 30 randomized trials (Li et al., 2020).
- Distributed MPC: D³LMPC on a 64-subsystem chain showed state and cost trajectories identical to model-based DLMPC. Runtime per subsystem remained nearly invariant as the number of subsystems increased (Alonso et al., 2021).
- Vision-based Control: VISION-SLS demonstrated safe operation on light-dark and quadrotor benchmarks, large reductions in constraint violation rate compared to non-robust baselines, and real-time execution in hardware (Leeman et al., 27 Apr 2026).
- Transformer Layer Selection (SLS Classifier): Data-driven SLS classifier reduced EER from 16.10% (baseline) to 2.32% in singing voice deepfake detection (Zhang et al., 2024).
- LLM Scheduling: On log parsing, SLS scheduling cut both average processing cost and processing time while improving accuracy, demonstrating sample- and cost-efficiency (Liu et al., 5 Aug 2025).
- Scenario Generation: BridgeGen’s PPO-based data-driven SLS improved the fraction of critical scenarios in CARLA, reducing minimum inter-vehicle distance and converging faster than random or PSO baselines (Hao et al., 2023).
6. Limitations, Extensions, and Open Problems
Current limitations include the dependence of robust guarantees on PE and data richness, the need for (possibly labeled) inspection intervals for updating caches or retraining predictors in scheduling, and the increased computational cost of handling large trajectory or parameter spaces for high-dimensional or hybrid systems (Li et al., 2020, Liu et al., 5 Aug 2025, Hao et al., 2023). Large-scale cache management, nearest-neighbor lookup, and sample complexity for outputs with complex nonlinearity remain active research areas.
Notable extensions include output-feedback SLS with learned state abstractions, integration with nonconvex perception systems, and hybrid symbolic–data approaches bridging knowledge-driven design and empirical optimization (Leeman et al., 27 Apr 2026, Hao et al., 2023). Recent advances in data-driven SLS for affine control policies have demonstrated full equivalence with model-based MPC given sufficient data (Schüepp et al., 2 Apr 2025). As data-driven SLS continues to generalize, further unification with learning-based, behavioral, and distributionally robust frameworks is anticipated.
7. Summary Table of Major Data-Driven SLS Approaches
| Domain | Method/Data Ref. | Highlights |
|---|---|---|
| LTI/LTV control | (Xue et al., 2020, Li et al., 2020) | Exact/robust SLS via Hankel, model-free, explicit bounds |
| Affine policies/MPC | (Schüepp et al., 2 Apr 2025) | Data-driven affine SLS, full equivalence to MPC |
| Distributed MPC | (Alonso et al., 2021) | D³LMPC, localizes trajectory demand, distributed ADMM |
| Vision-based output feedback | (Leeman et al., 27 Apr 2026) | VISION-SLS, safe nonlinear control from high-dim. perception |
| Switched systems | (Sarkar et al., 2019) | Data-driven Hankel+truncation, unknown-order, statistical bounds |
| LLM scheduling | (Liu et al., 5 Aug 2025) | SmartLLMs Scheduler, data-driven cache+predict+schedule |
| Transformer classifier | (Zhang et al., 2024) | Sensitive-Layer-Select, layer weighting, SOTA in SVDD |
| Scenario generation | (Hao et al., 2023) | BridgeGen, PPO/PSO optimization, critical scenario mining |
Data-driven SLS approaches thus constitute a flexible, scalable, and theoretically supported paradigm for synthesizing high-performance closed-loop, optimization, or inference solutions directly from data, with systematic robustness and localization principles and wide applicability across control, perception, and learning domains.