Data-Driven Receding Horizon Control
- Data-Driven Receding Horizon Control is a method that synthesizes control laws using data-driven estimators and receding horizon strategies to manage constraints in stochastic systems.
- It leverages affine saturated feedback policies and convex finite-horizon optimization to ensure feasibility and robust performance under hard input bounds and disturbances.
- The approach employs drift constraints and extended Kalman filtering to guarantee mean-square boundedness and maintain computational tractability for real-time applications.
Data-Driven Receding Horizon Control (DDRHC) refers to a class of control methodologies for constrained dynamical systems in which the control law is synthesized in a receding (i.e., moving finite) horizon fashion, leveraging explicit representations or estimators built directly from data, rather than from first-principles models. This paradigm is particularly powerful for stochastic discrete-time systems with hard input constraints and partial state observation, as it enables tractable on-line computation, robustness to persistent disturbances, and formal closed-loop stability and boundedness guarantees even under nonideal measurement conditions. DDRHC frameworks, as exemplified by (Hokayem et al., 2010), combine convex finite-horizon optimization over affine parameterizations of feedback policies, mean-square-optimal state estimation via extended Kalman filters, and principled constraint and stability enforcement, thereby yielding systems whose performance and feasibility properties are certified rigorously under standard controllability and observability assumptions.
1. Control Policy Architecture and Optimization
DDRHC restricts the class of admissible feedback policies to those with affine “finite-memory” dependence on the recent history of (output) measurements, encoded as

$$u_\ell \;=\; \eta_\ell \;+\; \sum_{i=0}^{\ell-1} \theta_{\ell,i}\,\varphi\big(y_i - C\hat{x}_{i\mid i-1}\big), \qquad \ell = 0,\dots,N-1,$$

where $\ell$ indexes the prediction horizon, $\eta_\ell$ is an open-loop affine term, each $\theta_{\ell,i}$ is a feedback gain matrix, and $\varphi$ is a (possibly saturating) nonlinearity applied elementwise to the innovation $y_i - C\hat{x}_{i\mid i-1}$. This “lifted” structure renders the mapping from the decision variables $(\eta_\ell, \theta_{\ell,i})$ to the entire control sequence affine, provided $\varphi$ is fixed (e.g., as a bounded piecewise-linear function). Combined with a convex quadratic cost function and polytopic (e.g., max-norm) input constraints,

$$\|u_\ell\|_\infty \;\le\; U_{\max}, \qquad \ell = 0,\dots,N-1,$$
the resulting finite-horizon control problem is a convex quadratic program (QP) or quadratically constrained QP (QCQP), ensuring tractability and successive feasibility for on-line implementation.
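A minimal sketch of this finite-horizon convex program is given below, assuming a known linear model, a fixed elementwise saturation level, and cvxpy as the modeling layer; all matrices, dimensions, and names (`eta`, `Theta`, `Umax`, `phi_max`) are illustrative rather than the paper's notation, and the expected-cost terms that couple the feedback gains to the second moments of the saturated innovations are omitted for brevity, so only the structural ingredients (affine decision variables, convex quadratic cost, hard input bounds) appear.

```python
import numpy as np
import cvxpy as cp

# Illustrative model and problem data (not taken from the paper).
n, m, p, N = 3, 1, 2, 8                      # state/input/output dims, horizon
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5]])              # marginally stable + stable block
B = np.array([[0.0], [1.0], [1.0]])
Q, R = np.eye(n), 0.1 * np.eye(m)
Umax, phi_max = 2.0, 1.0                     # hard input bound, saturation level
xhat0 = np.array([1.0, 0.5, -0.2])           # current state estimate

# Decision variables: open-loop terms eta_l and innovation-feedback gains Theta_{l,i}.
eta = [cp.Variable(m) for _ in range(N)]
Theta = [[cp.Variable((m, p)) for _ in range(l)] for l in range(N)]

cost, constraints = 0, []
x = xhat0                                    # predicted mean trajectory
for l in range(N):
    # Mean-trajectory cost only; the full formulation also charges the gains
    # through the (offline-precomputed) second moments of the saturated innovations.
    cost += cp.quad_form(x, Q) + cp.quad_form(eta[l], R)
    # Worst-case input magnitude: |phi(.)| <= phi_max elementwise, so the feedback
    # term is bounded by phi_max times the induced inf-norm (max row sum) of each gain.
    fb_bound = sum(phi_max * cp.max(cp.sum(cp.abs(Theta[l][i]), axis=1))
                   for i in range(l))
    constraints.append(cp.norm(eta[l], "inf") + fb_bound <= Umax)
    x = A @ x + B @ eta[l]                   # saturated innovations are zero-mean
cost += cp.quad_form(x, Q)

prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
u0 = eta[0].value                            # apply the first input, then recede
```

In a receding horizon implementation only the first input is applied, the estimate is updated with the new measurement, and the program is re-solved at the next step.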
Successive feasibility and stability are enforced via a “drift” or “recursive feasibility” constraint that mandates the existence (within the admissible set) of a candidate control sequence capable of driving the marginal (potentially unstable or non-contracting) subsystems toward a Lyapunov decrease, even as noise persists. The explicit form of such drift constraints (see (Hokayem et al., 2010), eq. (C4)) ties the ability to recover feasibility and boundedness directly to input authority and the structural properties of the system’s Jordan decomposition.
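Schematically, and in illustrative notation rather than a verbatim restatement of (C4), the requirement is a conditional negative-drift condition on the marginal sub-state $x^{o}$ (the component associated with the unit-circle eigenvalues) over some recovery horizon $\kappa$:

$$\mathbb{E}\big[\|x^{o}_{t+\kappa}\| \;\big|\; \mathcal{F}_t\big] \;\le\; \|x^{o}_{t}\| - \zeta \qquad \text{whenever } \|x^{o}_{t}\| \ge r,$$

for some margin $\zeta > 0$ and radius $r$ determined by the available input authority $U_{\max}$; combined with bounded increments of $\|x^{o}_t\|$, this is the standard negative-drift mechanism that yields mean-square boundedness of the non-contracting block.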
2. State Estimation under Nonlinear Feedback
Despite the induced nonlinearity of the control law (owing to the innovation saturation $\varphi$), the state estimator architecture is shown to admit a decoupled extension of the classical Kalman filter. Given the output sequence $y_0,\dots,y_t$, the conditional state estimate remains Gaussian and is propagated via

$$\hat{x}_{t+1\mid t} \;=\; A\,\hat{x}_{t\mid t-1} \;+\; B\,u_t \;+\; K_t\big(y_t - C\hat{x}_{t\mid t-1}\big),$$

where $u_t$ is the (nonlinear) control function of the measurement history and $K_t$ is the Kalman gain; because $u_t$ is a known function of past outputs, the error covariance recursion (and hence $K_t$) is in fact independent of the optimizer's decision variables. This crucial separation allows the feedback design to be performed over convex parameterizations, while the estimator remains mean-square optimal.
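A minimal sketch of this recursion, assuming Gaussian process and measurement noise with covariances `W` and `V` and illustrative names throughout:

```python
import numpy as np

def kalman_step(xhat, P, u, y, A, B, C, W, V):
    """One predict/update step with the control treated as a known input.

    u may be any (possibly saturated, nonlinear) function of past
    measurements; since it is known to the estimator it only shifts the
    predicted mean.  The covariance recursion below never involves u,
    which is the separation the receding-horizon design exploits.
    """
    # Predict
    x_pred = A @ xhat + B @ u
    P_pred = A @ P @ A.T + W
    # Measurement update with innovation e = y - C x_pred
    S = C @ P_pred @ C.T + V
    K = P_pred @ C.T @ np.linalg.inv(S)
    e = y - C @ x_pred
    x_new = x_pred + K @ e
    P_new = (np.eye(P.shape[0]) - K @ C) @ P_pred
    return x_new, P_new, e
```

Because the covariance recursion never touches `u`, the gains `K` (and any second moments derived from `P`) can be computed once, offline, exactly as exploited in Section 5.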
3. Enforcing Hard Input Bounds in Stochastic Settings
Systems subject to unbounded stochastic noise (Gaussian w, v) require careful treatment of control signal magnitude. Even modest amplification of the state via linear feedback may, in the presence of large but rare disturbances, violate any prespecified bound. DDRHC incorporates this consideration at the policy design stage by:
- Saturating innovations through the function $\varphi$ to ensure control policies cannot amplify large estimation errors or outliers indefinitely,
- Imposing convex hard bounds on the sum of open-loop and feedback policy contributions, accounting for the worst-case amplification due to saturation (i.e., bounding $\|\eta_\ell\|_\infty + \varphi_{\max}\sum_{i<\ell}\|\theta_{\ell,i}\|_\infty$ by $U_{\max}$, where $\varphi_{\max}$ bounds the elementwise magnitude of $\varphi$; see the bound chain sketched after this list),
- Expressing the maximum input constraint equivalently in terms of a lifted linear function of decision variables, preserving problem convexity.
This robust constraint handling ensures the controller always computes an admissible input sequence, irrespective of the online realization of innovation signals.
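The second bullet above rests on a simple worst-case bound chain; in illustrative notation, with $\varphi_{\max}$ the elementwise saturation level and $\|\theta\|_\infty$ the induced max-row-sum norm,

$$\|u_\ell\|_\infty \;\le\; \|\eta_\ell\|_\infty + \sum_{i<\ell}\big\|\theta_{\ell,i}\,\varphi(e_i)\big\|_\infty \;\le\; \|\eta_\ell\|_\infty + \varphi_{\max}\sum_{i<\ell}\|\theta_{\ell,i}\|_\infty \;\le\; U_{\max},$$

so enforcing the right-most inequality, which is convex (indeed linear after a standard epigraph reformulation) in the decision variables, guarantees the hard bound for every realization of the innovations.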
4. Stability: Mean-Square Boundedness under Noise
In stochastically forced systems, classical asymptotic stability is typically unattainable due to persistent excitation by noise. Instead, the framework guarantees mean-square boundedness: there exists $\gamma < \infty$ such that

$$\sup_{t \ge 0}\; \mathbb{E}\big[\|x_t\|^2\big] \;\le\; \gamma,$$

under four core assumptions: (i) process and measurement noise are Gaussian, (ii) the system matrix $A$ is discrete Lyapunov-stable (modulus of eigenvalues at most $1$, with equality only for semisimple eigenstructure), (iii) $(A, B)$ is stabilizable, and (iv) $(A, C)$ is observable.
A drift or Lyapunov decrease constraint is imposed on the marginally stable component (block with eigenvalues on the unit circle), ensuring that, for large norm, the bounded controller can “return” the state with sufficiently high probability, thus precluding state blowup. The feedback policy is constructed to satisfy this drift condition, providing formal guarantees of boundedness even when control authority is limited by $U_{\max}$.
5. Computational Tractability: Offline and Online Phases
To ensure that the stochastic convex QP/QCQP remains scalable and feasible for use in applications with fast sampling or high dimensionality, the method capitalizes on the time-invariance of key terms. Critical second-moment matrices (such as the innovation covariance and the second moments of the saturated innovations) depend only on the (eventually time-invariant) estimation error covariance, which, under the standard Riccati recursion, converges to a unique limit $P_\infty$. Hence, these matrices can be precomputed or tabulated offline as functions of $P_\infty$ (or replaced with their steady-state forms), greatly reducing the online computational cost. The result is a QP or QCQP whose dimension is determined only by the horizon length and the memory in the policy, readily solved with standard convex solvers.
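A minimal sketch of the offline phase, assuming scipy is available; the helper name is illustrative, and the additional moments of the saturated innovations that the paper tabulates are not reproduced here:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def offline_estimator_terms(A, C, W, V):
    """Precompute steady-state estimator quantities for the online QP.

    Uses the control/filtering duality: the filter Riccati limit is the
    stabilizing solution of the DARE for the pair (A^T, C^T).  None of these
    quantities depends on the control policy, so they are computed once.
    """
    P_inf = solve_discrete_are(A.T, C.T, W, V)    # limiting error covariance
    S_inf = C @ P_inf @ C.T + V                   # steady-state innovation covariance
    K_inf = P_inf @ C.T @ np.linalg.inv(S_inf)    # steady-state (filter) Kalman gain
    return P_inf, S_inf, K_inf
```

Online, only the finite-horizon convex program of Section 1 remains to be solved at each step, with these cached quantities substituted for their time-varying counterparts.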
6. Practical Demonstrations and Numerical Performance
A numerical example involving a three-dimensional system with both stable and marginally stable dynamics, additive Gaussian process and measurement noise, and saturated innovation feedback illustrates the theoretical results. With hard-bounded actuator constraints, innovation saturation, and quadratic cost, the receding horizon controller maintains state trajectories within bounded regions for all simulated noise realizations (sample size: 200 trajectories, 200 steps each). Empirical metrics of performance (mean and deviation of state norm, per-step cost evolution) confirm that the closed-loop system exhibits the predicted mean-square boundedness, and the computational tractability of the method is verified via standard solvers.
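An empirical check of this kind can be scripted as below; `run_closed_loop` is a hypothetical helper assumed to simulate one closed-loop trajectory and return it as an array, and the trajectory/step counts mirror the example's sample size:

```python
import numpy as np

def empirical_mean_square_norm(run_closed_loop, n_traj=200, n_steps=200, seed=0):
    """Monte-Carlo estimate of sup_t E[||x_t||^2] over simulated trajectories.

    run_closed_loop(rng, n_steps) is a hypothetical helper assumed to simulate
    one closed-loop run and return the states as an (n_steps, n) array.
    """
    rng = np.random.default_rng(seed)
    sq_norms = np.empty((n_traj, n_steps))
    for k in range(n_traj):
        traj = run_closed_loop(rng, n_steps)
        sq_norms[k] = np.sum(traj ** 2, axis=1)   # ||x_t||^2 along one run
    per_step_mean = sq_norms.mean(axis=0)         # estimate of E[||x_t||^2] per step
    return per_step_mean.max(), per_step_mean, sq_norms.std(axis=0)
```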
7. Significance in Data-Driven and Stochastic RHC Research
The framework in (Hokayem et al., 2010) integrates expectation-based and data-driven techniques with receding horizon control for stochastic discrete-time systems subject to process/measurement noise and tight control bounds. Key features include:
- Affine saturated feedback policy parameterizations leading to convex optimization,
- Extended Kalman filtering compatible with the saturated (nonlinear) feedback structure,
- Explicit handling of hard control constraints,
- Formal mean-square boundedness guarantees under mild assumptions,
- Efficient offline/online computational separation for practical implementation,
- Numerical validation supporting scalability and robust constraint satisfaction.
These results substantially advance the application of data-driven and expectation-informed strategies in stochastic RHC, offering a rigorous basis and efficient algorithms for robust, constrained optimal control in real-world uncertain environments.