Global convergence of policy gradient methods for LQG with noise via IOH parameterization

Establish global convergence guarantees for policy gradient methods applied to linear quadratic Gaussian (LQG) dynamic output-feedback control under input-output-history (IOH) parameterization when Gaussian process and measurement noise are present. Doing so would extend the existing noise-free convergence results for IOH-based policy gradient methods to the noisy LQG setting.

Background

The paper studies policy gradient methods (PGMs) for LQG output‑feedback control using an input–output‑history representation. Prior IOH‑based convergence results were obtained for noise‑free systems, while the stochastic noise inputs in LQG make the optimization landscape more challenging (e.g., non‑coerciveness and numerous stationary points).

The paper proves convergence to O(ε)-stationary points by way of a relaxed problem that adds small process noise to ensure coerciveness, but it does not establish global convergence in the original noisy LQG setting. Extending the noise-free global convergence guarantees to systems with process and measurement noise remains an open problem.
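
For orientation, the setting can be sketched in generic notation. This is a minimal illustration under assumed conventions, not the paper's exact formulation: the symbols $A$, $B$, $C$, $W$, $V$, $Q$, $R$, the history length $L$, the gain $K$, and the step size $\alpha$ are all assumptions introduced here, and the paper may differ in details such as the history window or the form of the cost.

```latex
% Assumed generic discrete-time LQG setup with an IOH-parameterized policy.
% All symbols are illustrative; the paper's notation may differ.
\begin{align*}
x_{t+1} &= A x_t + B u_t + w_t, & w_t &\sim \mathcal{N}(0, W), \\
y_t &= C x_t + v_t, & v_t &\sim \mathcal{N}(0, V), \\
z_t &= \begin{bmatrix} y_{t-1}^\top & \cdots & y_{t-L}^\top & u_{t-1}^\top & \cdots & u_{t-L}^\top \end{bmatrix}^\top, & u_t &= K z_t, \\
J(K) &= \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=0}^{T-1} \left( y_t^\top Q y_t + u_t^\top R u_t \right) \right], \\
K_{k+1} &= K_k - \alpha \nabla J(K_k).
\end{align*}
```

In this notation, an $\varepsilon$-stationary point is a gain $K$ with $\|\nabla J(K)\| \le \varepsilon$, whereas global convergence would require $J(K_k) \to \inf_K J(K)$ over stabilizing gains. The open problem concerns the second, stronger property when $W$ and $V$ are nonzero.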

References

However, proving the global convergence of PGMs to LQG control problems with noise inputs by extending the result in is not straightforward (as discussed in Section~\ref{Sec3C}) and remains an open challenge.

— Policy Gradient Method for LQG Control via Input-Output-History Representation: Convergence to $O(ε)$-Stationary Points (2510.19141 - Sadamoto et al., 22 Oct 2025) in Related Works (Introduction)
