Dual Linear-Quadratic Regulator
- Dual LQR is a reformulation of the classical LQR that uses dual variables, RKHS, and convex optimization to improve regularity, robustness, and adaptive control.
- It employs reproducing kernel Hilbert space dynamics and linear-conic duality to derive Riccati equations and feedback laws, ensuring performance via semidefinite programming.
- The framework unifies model-free algorithms, stochastic adaptive control, and Koopman operator methods, offering clear trade-offs between exploration and exploitation.
The dual Linear-Quadratic Regulator (dual LQR) encompasses a spectrum of theory and techniques that reinterpret the classical linear-quadratic optimal control problem through the lens of duality, reproducing kernel Hilbert spaces, convex optimization, and exploration-exploitation trade-offs. This perspective unifies primal and dual methods, illuminates fundamental regularity and robustness properties, and bridges modern advances in machine learning, robust control, and adaptive regulation.
1. Duality in Linear-Quadratic Regulation
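The classical (primal) finite-horizon LQ optimal control problem considers the linear time-varying system

$$\dot{x}(t) = A(t)\,x(t) + B(t)\,u(t), \qquad x(t_0) = x_0, \quad t \in [t_0, T],$$

with quadratic cost

$$J(u) = x(T)^\top P_T\, x(T) + \int_{t_0}^{T} \big[\, x(t)^\top Q(t)\, x(t) + u(t)^\top R(t)\, u(t) \,\big]\, dt,$$

where $Q(t) \succeq 0$, $R(t) \succ 0$, $P_T \succeq 0$. The optimal value is quadratic in $x_0$, $V(t_0, x_0) = x_0^\top P(t_0)\, x_0$, with $P(\cdot)$ solving the Riccati differential equation

$$-\dot{P}(t) = A(t)^\top P(t) + P(t) A(t) - P(t) B(t) R(t)^{-1} B(t)^\top P(t) + Q(t), \qquad P(T) = P_T.$$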
Dual LQR refers to alternative formulations, both continuous- and discrete-time, in which dual variables (multipliers, kernel values, covariance matrices) encode structural constraints or dualize the optimality system, and the Riccati equation emerges as a KKT or extremality condition of a dual semidefinite program or kernel evolution (Aubin-Frankowski, 2020, Bamieh, 2024, Watanabe et al., 14 Mar 2025, Lee et al., 2018).
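The primal Riccati differential equation $-\dot{P} = A^\top P + P A - P B R^{-1} B^\top P + Q$, $P(T) = P_T$, can be integrated backward from the terminal time. The following is a minimal sketch, assuming time-invariant matrices, a hand-written RK4 integrator, and a hypothetical double-integrator example that does not come from the cited works.

```python
import numpy as np

def riccati_rhs(P, A, B, Q, R_inv):
    """dP/ds = A'P + PA - P B R^{-1} B' P + Q, written in time-to-go s = T - t."""
    return A.T @ P + P @ A - P @ B @ R_inv @ B.T @ P + Q

def solve_riccati(A, B, Q, R, P_T, horizon, steps=2000):
    """Integrate the Riccati ODE from t = T back to t = t0 with classical RK4."""
    h = horizon / steps
    R_inv = np.linalg.inv(R)
    P = P_T.astype(float).copy()
    for _ in range(steps):
        k1 = riccati_rhs(P, A, B, Q, R_inv)
        k2 = riccati_rhs(P + 0.5 * h * k1, A, B, Q, R_inv)
        k3 = riccati_rhs(P + 0.5 * h * k2, A, B, Q, R_inv)
        k4 = riccati_rhs(P + h * k3, A, B, Q, R_inv)
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return P

# Hypothetical example data: double integrator with unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P_T = np.zeros((2, 2))

P0 = solve_riccati(A, B, Q, R, P_T, horizon=20.0)
gain = np.linalg.solve(R, B.T @ P0)   # optimal feedback at t0: u = -gain @ x
x0 = np.array([1.0, 0.0])
print("P(t0) =\n", P0)
print("optimal cost from x0:", x0 @ P0 @ x0)
```

For this example the horizon is long compared with the closed-loop time constants, so $P(t_0)$ should be close to the stationary solution $\begin{bmatrix}\sqrt{3} & 1\\ 1 & \sqrt{3}\end{bmatrix}$ of the algebraic Riccati equation.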
2. Reproducing Kernel Hilbert Space View: Dual Riccati Dynamics
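Aubin-Frankowski (Aubin-Frankowski, 2020) provides an RKHS-based dual formulation. Here, the set $\mathcal{S}_{[t_0,T]}$ of all admissible trajectories $x(\cdot)$ for which there exists a control $u(\cdot)$ with $\dot{x}(t) = A(t)x(t) + B(t)u(t)$ and $\int_{t_0}^{T} u(t)^\top R(t)\, u(t)\, dt < \infty$ is endowed with the inner product

$$\langle x_1, x_2 \rangle = x_1(T)^\top P_T\, x_2(T) + \int_{t_0}^{T} \big[\, x_1(t)^\top Q(t)\, x_2(t) + u_1(t)^\top R(t)\, u_2(t) \,\big]\, dt,$$

where $u_i$ denotes the control generating $x_i$, and $Q$, $R$, $P_T$ are the weights of the primal quadratic cost.

This makes $\mathcal{S}_{[t_0,T]}$ into a reproducing kernel Hilbert space; its matrix-valued reproducing kernel $K_{t_0}(s,t)$ encodes the evolution of the trajectory values $x(t)$. The diagonal value $K_{t_0}(t_0,t_0)$ is shown to be $P(t_0)^{-1}$, the inverse of the primal Riccati solution.

As the initial time changes, the RKHS and kernel naturally evolve; differentiation yields the dual Riccati equation

$$\frac{d}{dt_0} M(t_0) = A(t_0)\, M(t_0) + M(t_0)\, A(t_0)^\top - B(t_0) R(t_0)^{-1} B(t_0)^\top + M(t_0)\, Q(t_0)\, M(t_0),$$

where $M(t_0) := K_{t_0}(t_0,t_0)$. This dual Riccati equation governs the evolution of the kernel's diagonal and suggests offline solution representations, sparsification benefits, and connections to Gaussian covariance kernels (Aubin-Frankowski, 2020).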
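The relation between the primal and dual Riccati equations can be checked numerically. If $P$ solves $-\dot{P} = A^\top P + P A - P B R^{-1} B^\top P + Q$ and is invertible, then $M = P^{-1}$ satisfies $\dot{M} = A M + M A^\top - B R^{-1} B^\top + M Q M$. The sketch below integrates both equations backward and compares $M(t_0) P(t_0)$ with the identity. It assumes time-invariant matrices, an invertible terminal weight, and hypothetical example data; whether this is exactly the form used in the cited work is an assumption.

```python
import numpy as np

def rk4(rhs, Y, h, steps):
    """Classical RK4 for the matrix ODE dY/ds = rhs(Y)."""
    for _ in range(steps):
        k1 = rhs(Y)
        k2 = rhs(Y + 0.5 * h * k1)
        k3 = rhs(Y + 0.5 * h * k2)
        k4 = rhs(Y + h * k3)
        Y = Y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return Y

# Hypothetical example data (double integrator, unit weights).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
S = B @ np.linalg.inv(R) @ B.T
P_T = np.eye(2)                      # must be invertible so that M(T) = P_T^{-1} exists

# Both equations are written in time-to-go s = T - t, so d/ds = -d/dt.
def primal_rhs(P):
    return A.T @ P + P @ A - P @ S @ P + Q

def dual_rhs(M):
    return -(A @ M + M @ A.T - S + M @ Q @ M)

horizon, steps = 1.0, 1000
h = horizon / steps
P0 = rk4(primal_rhs, P_T.copy(), h, steps)
M0 = rk4(dual_rhs, np.linalg.inv(P_T), h, steps)

residual = np.linalg.norm(M0 @ P0 - np.eye(2))
print("|| M(t0) P(t0) - I || =", residual)
```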
3. Linear-Conic Duality and Covariance-Based Dual LQR
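The linear-conic duality approach (Bamieh, 2024) represents the LQR problem as an infinite-dimensional SDP over the state-control Gramian (outer product) $\Sigma(t) = \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix}^\top$, $t \in [t_0, T]$, with dynamics and positivity constraints. The dual variable is a time-varying matrix arc $\Lambda(t)$. The dual program maximizes $x_0^\top \Lambda(t_0)\, x_0$ subject to the differential linear matrix inequality (DLMI)

$$\begin{bmatrix} \dot{\Lambda} + A^\top \Lambda + \Lambda A + Q & \Lambda B \\ B^\top \Lambda & R \end{bmatrix} \succeq 0, \qquad \Lambda(T) \preceq P_T,$$

where $A$, $B$, $Q$, $R$, and the terminal weight $P_T$ are the data of the primal problem.

The Riccati equation is recovered as the extremal point of the DLMI constraint, where the Schur complement of $R$ vanishes, while complementary slackness yields the state feedback law $u(t) = -R^{-1} B^\top \Lambda(t)\, x(t)$. This perspective unifies feedback synthesis and cost-to-go as optimal dual variables and supports extensions to integral quadratic constraints (IQC) and robust synthesis (Bamieh, 2024).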
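The extremality statement can be illustrated in a simplified setting. The sketch below assumes the stationary, time-invariant case (so the derivative of the dual variable is zero) and checks the block $\begin{bmatrix} A^\top \Lambda + \Lambda A + Q & \Lambda B \\ B^\top \Lambda & R \end{bmatrix}$ for two candidates: the Riccati solution, where the block is positive semidefinite and rank-deficient, and a scaled-down copy, which is strictly feasible but gives a smaller dual objective. The example data and the closed-form Riccati solution are specific to this hypothetical double integrator.

```python
import numpy as np

# Hypothetical example data: double integrator with unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Closed-form stabilizing solution of A'P + PA - P B R^{-1} B' P + Q = 0 for this data.
P = np.array([[np.sqrt(3.0), 1.0], [1.0, np.sqrt(3.0)]])

def lmi_block(Lam):
    """Stationary version of the DLMI block (time derivative of Lam set to zero)."""
    return np.block([[A.T @ Lam + Lam @ A + Q, Lam @ B],
                     [B.T @ Lam, R]])

eig_opt = np.linalg.eigvalsh(lmi_block(P))         # extremal point: PSD with zero eigenvalues
eig_half = np.linalg.eigvalsh(lmi_block(0.5 * P))  # interior point: strictly positive

gain = np.linalg.solve(R, B.T @ P)                 # feedback u = -gain @ x
x0 = np.array([1.0, 0.0])
print("eigenvalues at Riccati solution:", eig_opt)
print("eigenvalues at 0.5 * P:", eig_half)
print("dual objective:", x0 @ P @ x0, "vs", x0 @ (0.5 * P) @ x0)
```

At the Riccati solution the block has rank equal to the number of inputs, which is the numerical signature of the vanishing Schur complement.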
4. Dual Control: Exploration, Adaptation, and Robust Learning
Modern dual LQR theory includes the adaptive, learning, and robust regulation settings, where uncertainty in the system matrices $(A, B)$ necessitates online identification and controlled exploration.
Rantzer (Rantzer, 2023) formulates the adaptive LQR as a data-driven control problem: the certainty-equivalent policy $u_t = -K_t x_t$ is updated online via a data-driven Riccati equation whose coefficients are empirical sample covariance matrices of the observed data, and the gain $K_t$ is recomputed as the minimizer. Explicit spectral bounds yield certified margins for stability and robustness, enabling precise trade-offs between excitation (for exploration) and closed-loop stability (Rantzer, 2023).
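The certainty-equivalence idea can be sketched generically: estimate the model from empirical covariances of the data, then solve a Riccati equation for the estimated model. The code below is a plain least-squares variant in discrete time, not the specific data-driven Riccati equation of the cited work. The system matrices, noise level, and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true system x+ = A x + B u + w, unknown to the controller.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
n, m, T = 2, 1, 500

# Accumulate empirical covariances of z = [x; u] and of (x+, z) under random excitation.
x = np.zeros(n)
S_zz = np.zeros((n + m, n + m))
S_xz = np.zeros((n, n + m))
for _ in range(T):
    u = rng.standard_normal(m)
    x_next = A_true @ x + B_true @ u + 0.01 * rng.standard_normal(n)
    z = np.concatenate([x, u])
    S_zz += np.outer(z, z) / T
    S_xz += np.outer(x_next, z) / T
    x = x_next

# Least-squares model estimate: [A_hat B_hat] = S_xz S_zz^{-1}.
theta = np.linalg.solve(S_zz, S_xz.T).T
A_hat, B_hat = theta[:, :n], theta[:, n:]

def dare_gain(A, B, Q, R, iters=500):
    """Fixed-point iteration of the discrete-time Riccati recursion."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K, P

K_ce, _ = dare_gain(A_hat, B_hat, Q, R)      # certainty-equivalent gain, u = -K_ce x
rho = max(abs(np.linalg.eigvals(A_true - B_true @ K_ce)))
print("closed-loop spectral radius on the true system:", rho)
```

The smallest eigenvalue of `S_zz` measures how well the data are excited. It is the kind of quantity on which spectral bounds for stability and robustness can be stated.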
Lu et al. (Lu et al., 2021) provide almost-sure performance guarantees for online dual-control algorithms by employing a risk-shielded switched controller: exploration with decaying noise is interrupted by a conservative fallback controller to prevent destabilization. Performance and estimation errors decay at nearly optimal rates, and parameter updates are based on cross-correlation estimation of Markov parameters. This eliminates any fixed probability of catastrophic failure and is robust to non–sub-Gaussian disturbances (Lu et al., 2021).
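The cross-correlation step can be illustrated on its own. In closed loop with $u_t = -k x_t + \eta_t$, where $\eta_t$ is zero-mean exploration noise with variance $\sigma_t^2$ independent of the past, one has $\mathbb{E}[x_{t+1}\eta_t] = b\,\sigma_t^2$, so the first Markov parameter $b$ can be estimated by correlating the next state with the injected noise. The sketch below assumes a scalar system, a fixed conservative gain, and a noise schedule $\sigma_t = t^{-1/4}$. All numbers are hypothetical, and the switching logic of the cited algorithm is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar system x+ = a x + b u + w (open-loop unstable).
a, b = 1.2, 1.0
k_safe = 1.0          # conservative gain, a - b * k_safe = 0.2 is stable
sigma_w = 0.1
T = 20000

x = 0.0
num, den = 0.0, 0.0
max_abs_x = 0.0
for t in range(1, T + 1):
    sigma_t = t ** -0.25                       # decaying exploration noise level
    eta = sigma_t * rng.standard_normal()
    u = -k_safe * x + eta
    x_next = a * x + b * u + sigma_w * rng.standard_normal()
    num += x_next * eta                        # cross-correlation of next state and noise
    den += sigma_t ** 2                        # accumulated noise variance
    x = x_next
    max_abs_x = max(max_abs_x, abs(x))

b_hat = num / den
print("estimated first Markov parameter:", b_hat)
```

Because the exploration variance decays, the accumulated excitation grows only like $\sqrt{T}$, which is why the estimate converges slowly compared with constant-variance probing.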
5. Primal-Dual Algorithms and Model-Free Dual LQR
The primal-dual perspective treats the LQR as a nonconvex program (over policy gains $K$, with $u = -Kx$), whose hidden convexity is revealed by Lyapunov lifting and strong duality via semidefinite programming (Lee et al., 2018, Watanabe et al., 14 Mar 2025, Li et al., 2021).
Formulations such as:
- The block-partitioned dual SDP: maximizing a linear trace objective over block matrices subject to an LMI,
- KKT conditions recovering the Bellman and Riccati equations,
- Model-free dual updates (parameterizing the Q-function or value function by a symmetric matrix, updating via temporal-difference errors),
embed the LQR in the landscape of saddle-point optimization, primal-dual Q-learning, and robust controller synthesis. The convergence of these algorithms is certified by stochastic approximation and strong duality; the unique KKT points correspond to the stabilizing Riccati solution and optimal linear controller (Lee et al., 2018, Li et al., 2021, Watanabe et al., 14 Mar 2025).
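The symmetric-matrix parameterization of the Q-function mentioned in the list above can be sketched as follows. With $z = [x; u]$ and $Q(x,u) = z^\top H z$, the greedy policy is $u = -H_{uu}^{-1} H_{ux}\, x$ and minimizing over $u$ returns the value matrix $P = H_{xx} - H_{xu} H_{uu}^{-1} H_{ux}$. For brevity this sketch performs the Bellman backup with a known model. A model-free method would instead fit $H$ from temporal-difference errors on observed transitions. The system data are hypothetical.

```python
import numpy as np

# Hypothetical discrete-time system x+ = A x + B u with stage cost x'Qx + u'Ru.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
n, m = B.shape

AB = np.hstack([A, B])                                   # maps z = [x; u] to x+
stage = np.block([[Q, np.zeros((n, m))],
                  [np.zeros((m, n)), R]])                # stage-cost weight on z

H = stage.copy()
for _ in range(500):
    H_xx, H_xu, H_uu = H[:n, :n], H[:n, n:], H[n:, n:]
    P = H_xx - H_xu @ np.linalg.solve(H_uu, H_xu.T)      # min over u of z'Hz equals x'Px
    H = stage + AB.T @ P @ AB                            # Bellman backup in Q-matrix form

K = np.linalg.solve(H[n:, n:], H[n:, :n])                # greedy policy u = -K x
P = H[:n, :n] - H[:n, n:] @ K

# At the fixed point, P satisfies the discrete-time algebraic Riccati equation.
riccati_residual = (Q + A.T @ P @ A
                    - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
                    - P)
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print("Riccati residual norm:", np.linalg.norm(riccati_residual))
print("closed-loop spectral radius:", rho)
```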
6. Dual LQR in Stochastic Adaptive and Koopman-Lifted Control
Extensions to stochastic and nonlinear settings leverage dual LQR for tractable, probing-aware regulation. Koopman operator theory is used to lift the nonlinear stochastic optimal control problem into a finite-dimensional space where a standard LQR is solved. The optimal feedback is explicitly dual: it penalizes both the state mean and the estimation error covariance (encoding both caution and active probing), which yields substantial performance improvement over certainty equivalence in scenarios with varying observability (Ramadan et al., 2024). Because the control inputs are designed to steer the system toward regions of high observability, the exploration-exploitation trade-off is optimized within a single dual LQR synthesis.
7. Theoretical Regularity: Strong Duality and Gradient Dominance
Primal-dual and lifted SDP analyses (Watanabe et al., 14 Mar 2025) confirm that the LQR, despite its nonconvex parametrization in the feedback gain $K$, possesses strong duality: the primal policy optimization, its convex SDP relaxation, and the associated dual SDP all attain the same value, with the Riccati solution matching the dual optimizer. Under additional regularity conditions, the parameter-space cost is also gradient-dominated (it satisfies a Polyak-Łojasiewicz (PL) inequality), which ensures rapid convergence of local optimization and recovers convexity-like global properties through extended convex lifting (ECL).
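The practical consequence of gradient dominance is that plain gradient descent on the gain reaches the Riccati-optimal controller. A minimal sketch for a hypothetical scalar system $x_{t+1} = a x_t + b u_t$ with $u_t = -k x_t$ and $x_0 = 1$, where the cost has the closed form $J(k) = (q + r k^2)/(1 - (a - bk)^2)$ on the stabilizing set, is given below. The scalar landscape is much simpler than the matrix case, so this only illustrates the claim. The step size and iteration count are ad hoc choices.

```python
import math

# Hypothetical scalar problem data.
a, b, q, r = 0.9, 1.0, 1.0, 1.0

def cost(k):
    """Infinite-horizon cost from x0 = 1 under u = -k x (requires |a - b k| < 1)."""
    c = a - b * k
    assert abs(c) < 1.0, "gain must be stabilizing"
    return (q + r * k * k) / (1.0 - c * c)

def grad(k):
    """Analytic derivative of cost(k) via the quotient rule."""
    c = a - b * k
    num, den = q + r * k * k, 1.0 - c * c
    return (2.0 * r * k * den - num * 2.0 * b * c) / (den * den)

# Gradient descent from the stabilizing initial gain k = 0.
k = 0.0
for _ in range(5000):
    k -= 0.005 * grad(k)

# Reference: positive root of the scalar discrete-time Riccati equation
# b^2 p^2 + (r - b^2 q - a^2 r) p - q r = 0, and the corresponding gain.
beta = r - b * b * q - a * a * r
p = (-beta + math.sqrt(beta * beta + 4.0 * b * b * q * r)) / (2.0 * b * b)
k_star = a * b * p / (r + b * b * p)

print("gradient-descent gain:", k, " Riccati gain:", k_star)
print("cost gap:", cost(k) - p)
```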
In summary, the dual LQR framework unifies and generalizes a broad range of classic, modern, and adaptive linear-quadratic regulation results. It reveals structural connections to RKHS, kernel and covariance methods, robust and adaptive control, kernel regression, machine learning, and strong duality theory, with implications for algorithm design, numerical conditioning, and optimality certification (Aubin-Frankowski, 2020, Bamieh, 2024, Watanabe et al., 14 Mar 2025, Lee et al., 2018, Lu et al., 2021, Rantzer, 2023, Ramadan et al., 2024, Li et al., 2021).