Lyapunov-Guided Controller

Updated 22 November 2025

A Lyapunov-guided controller is a control paradigm that embeds Control Lyapunov Functions to enforce stability in nonlinear systems.
It integrates Lyapunov conditions as architectural inductive biases, transforming multiple constraints into a single residual risk loss for streamlined training.
Empirical results demonstrate that this approach enhances stability, sample efficiency, and robustness compared to classical control methods.

A Lyapunov-guided controller is a control paradigm in which the stability of closed-loop nonlinear systems is explicitly encoded and enforced through the learning or construction of Control Lyapunov Functions (CLFs) or Lyapunov-like certificates. Unlike classical model-based synthesis or black-box reinforcement learning, the Lyapunov-guided framework leverages the mathematical equivalence between Lyapunov decrease and asymptotic stability, embedding this equivalence directly in the controller's design, optimization, and verification process. Recent developments—particularly in deep learning and neural control—place Lyapunov conditions as inductive biases in policy and certificate architectures, resulting in stable, sample-efficient, and certifiable control policies applicable to highly nonlinear or partially unknown systems.

1. Mathematical Formulation of Lyapunov-Guided Control

The core principle for Lyapunov-guided control is the specification of a CLF $V(x)$ for a nonlinear control-affine system: $\dot x = f(x) + g(x)\,u(x), \qquad x\in\Omega\subset\mathbb R^n,\quad u\in[\underline u,\bar u]\subset\mathbb R^m$ with equilibrium $x^*$ . $V(x)$ must satisfy:

$V(x^*) = 0$ , $V(x) > 0$ for all $x \neq x^*$
$\dot V(x,u) = \nabla V(x)^\top (f(x) + g(x)u(x)) < 0$ for all $x \neq x^*$
The set $\{x \mid V(x)\leq \rho\}$ defines an inner estimate of the region of attraction (ROA) (Lu et al., 3 Nov 2025).

For systems that are not control-affine or in discrete time, analogous conditions apply: a candidate Lyapunov function must be positive definite, radially unbounded, and its (Lie/discrete) derivative along closed-loop trajectories must be negative definite or negative semi-definite outside equilibrium.

2. Incorporating Lyapunov Conditions as Inductive Biases

Modern Lyapunov-guided controller design eliminates optimization over hard non-convex constraints imposed by Lyapunov conditions by embedding these properties architecturally. Typical approaches include:

Enforcing positivity via sum-of-squares or squared-norm neural structures: $V_\theta(x) = \|\phi_\theta(x) - \phi_\theta(x^*)\|_2^2 + k\log(1+(\mathbf{1}^\top(x-x^*))^2)$ ensures $V_\theta(x^*) = 0$ and $V_\theta(x) > 0$ elsewhere (Lu et al., 3 Nov 2025).
Inducing negative definiteness of the time-derivative via coupled controller mechanisms: Controllers are designed as explicit functions of $\nabla V_\theta$ (e.g., $u(x) = -\frac{\nabla V_\theta(x)^\top f(x)}{\nabla V_\theta(x)^\top g(x) g(x)^\top \nabla V_\theta(x)} g(x)^\top \nabla V_\theta(x) + u_2(x)$ , with $u_2$ shaping the residual (Lu et al., 3 Nov 2025)).
Reducing the learning objective to a single 'residual risk' loss for Lyapunov violation, facilitating tractable and stable end-to-end gradient-based optimization (Lu et al., 3 Nov 2025).

Contrast this with prior soft-constraint methods, which commonly require balancing multiple loss components for positivity and negativity over sampled states, often leading to non-convex and ill-conditioned optimization landscapes.

3. Architecture and Training Procedure

The typical Lyapunov-guided controller-learning workflow is:

(Optional) Learn or approximate the system dynamics if they are unknown or partially known.
Initialize parameters for the CLF (e.g., multi-layer perceptron with structured output) and the control shaping vector parameters.
For each batch of sampled states $x_i$ $x_{i}$ :
- Compute $V_\theta(x_i)$ and $\nabla V_\theta(x_i)$ .
- Compute $u(x_i)$ using the Lyapunov-guided law tied to $V_\theta$ .
- Evaluate the Lyapunov derivative $\dot V_\theta(x_i)$ .
- Compute the Lyapunov violation loss $\mathcal L_{\mathrm{lya}} = \frac{1}{N}\sum_i \lambda_1 [\max(\dot V_\theta(x_i) + b, 0)]^2$ , with $b$ accommodating modeling errors (Lu et al., 3 Nov 2025).
- Optionally, verify over a dense grid and augment the dataset with discovered counterexamples.
Update parameters using a first-order method (e.g., Adam).
Repeat the cycle, with the design ensuring that positivity and derivative negativity are inherently satisfied except for measure-zero violations handled by the residual loss.

This unified treatment—where inductive biases enforce nearly all Lyapunov conditions—converts a previously multi-constraint problem into what is effectively a "zero-constraint" setting, retaining only a single smooth regularizer in the training loop (Lu et al., 3 Nov 2025).

4. Theoretical Stability and Robustness Guarantees

The Lyapunov-guided approach provides direct theoretical guarantees under standard Lyapunov analysis:

Local Asymptotic Stability (LAS): If $V_\theta$ satisfies the aforementioned conditions and the implemented controller ensures $\dot V_\theta(x) < 0$ over some $\mathcal D$ , then all closed-loop trajectories starting in $\mathcal D$ converge to $x^*$ . The level set $\{x \mid V_\theta(x)\leq \rho\}$ defines the certified ROA (Lu et al., 3 Nov 2025).
Robustness to Model Error: If the control law is synthesized relative to learned (possibly imperfect) dynamics $\hat f$ , and a margin $b\geq M(K_{\mathrm{dyn}}\delta + \epsilon + K_\phi\delta)$ is enforced in the Lyapunov decrease, then stability holds as long as the modeling error is bounded, with $M$ the supremum of $\|\nabla V\|$ and $K_*$ Lipschitz constants (Lu et al., 3 Nov 2025).

These guarantees permit direct, certificate-based claims on local/global stability and quantifiable robustness margins.

5. Experimental Efficacy and Comparative Analysis

Empirical results demonstrate the strict superiority of inductive-bias Lyapunov-guided controllers versus classical soft-constraint neural Lyapunov controller (NLC) and unconstrained learning controller (ULC) baselines:

On the benchmark inverted pendulum: $90$– $99\%$ success rate vs. $12$– $54\%$ (NLC/ULC); region of attraction (ROA) area improved by up to $5\times$ (e.g., $31.3$ vs. $11.3$ for ULC) (Lu et al., 3 Nov 2025).
For path-following: $\sim$ 96\% success compared to $\sim$ 44\% for baseline methods; with the inductive-bias approach delivering a ROA area $>2\times$ baseline.
Baseline failures are attributed to optimizer "cheating" within overlooked subregions where Lyapunov constraints are loose, and instability introduced by poorly weighted multiple soft constraints.

The Lyapunov-guided framework, with its single-term loss and architectural hard-coding of major Lyapunov properties, achieves stable, fast convergence, end-to-end GPU compatibility, and consistency across random initializations and hyperparameter choices (Lu et al., 3 Nov 2025).

6. Comparison to Alternative Lyapunov-Guided Paradigms and Extensions

The principle of guiding controller learning via structural induction of Lyapunov conditions is reflected in several alternative state-of-the-art methodologies:

Counterexample-guided Lyapunov learning and demonstration-driven CLF synthesis: Iteratively expand the set of points where Lyapunov conditions are enforced, via counterexample verification and demonstrator-supplied control at problematic states (Ravanbakhsh et al., 2018, Ravanbakhsh et al., 2017).
Actor-critic and penalty-based neural optimization: Simultaneously train certificates and policies, often alternating between Lyapunov decrease enforcement and cost-to-go (or region-of-attraction) maximization, sometimes leveraging Zubov PDEs (Wang et al., 13 Mar 2024, Mehrjou et al., 2020).
Lyapunov-guided exploration and distributed RL: For high-dimensional or distributed subsystems, the Lyapunov decrease is used as a guiding constraint in actor-critic or distributed learning loops, often with local critics or regional certificates (Yao et al., 14 Dec 2024, Zhang et al., 2023).
Integration with diffusion models and trajectory-level sampling: Recently, guided diffusion processes use Lyapunov energy shaping to bias trajectory sampling and policy optimization, further obviating the need for explicit quadratic programming or control-affine assumptions (Cheng et al., 29 Sep 2025, Mukherjee et al., 26 Mar 2024).

These approaches, whether leveraging architectural inductive bias or iterative verification-guided refinement, converge on the paradigm that Lyapunov-guided controller synthesis is the preferred scalable, certifiable, and performance-maximizing route for nonlinear and learning-enabled control systems.

References:

(Lu et al., 3 Nov 2025, Ravanbakhsh et al., 2018, Ravanbakhsh et al., 2017, Wang et al., 13 Mar 2024, Mehrjou et al., 2020, Yao et al., 14 Dec 2024, Zhang et al., 2023, Cheng et al., 29 Sep 2025, Mukherjee et al., 26 Mar 2024)