- The paper presents an approximate dynamic programming method for optimal trajectory tracking in continuous nonlinear systems using a system transformation.
- The approach uses neural networks to approximate the value function and the controller; a Lyapunov analysis proves that the tracking error remains ultimately bounded and that the policy converges to a neighborhood of the optimal one.
- This research has practical implications for robotics, aerospace, and autonomous systems that require precise tracking, and it lays groundwork for future learning-based control research.
Overview of Trajectory Tracking Control for Continuous Time Nonlinear Systems
The research paper titled "Approximately Optimal Trajectory Tracking for Continuous Time Nonlinear Systems" tackles a significant challenge in optimal control: extending Approximate Dynamic Programming (ADP) methods to trajectory tracking for continuous time nonlinear systems. The work is situated within reinforcement learning (RL) and ADP, and it develops a control strategy that guarantees ultimately bounded tracking of a desired trajectory while keeping the controller close to an approximate optimal policy.
Problem and Approach
The primary focus of the paper is the adaptation of RL algorithms, which traditionally rely on Generalized Policy Iteration (GPI) for discrete time systems, to trajectory tracking for continuous time nonlinear systems. Classical RL and ADP approaches have concentrated on the Bellman equation (BE) and the Hamilton-Jacobi-Bellman (HJB) equation for continuous dynamics, predominantly for regulation problems. Moving to trajectory tracking, however, introduces complications because the control objective becomes inherently time-varying.
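For context, the infinite horizon regulation setting that classical ADP targets can be written in the standard textbook form below; the notation (Q, R, V) is generic and not taken verbatim from the paper.

```latex
\dot{x} = f(x) + g(x)\,u, \qquad
J\big(x_0, u(\cdot)\big) = \int_{0}^{\infty} \Big( Q(x) + u^{\top} R\, u \Big)\, dt,
```

```latex
0 = \min_{u} \Big[ \nabla V(x)^{\top} \big( f(x) + g(x)\,u \big) + Q(x) + u^{\top} R\, u \Big],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V(x).
```

For regulation, the value function V depends only on the state x; the difficulty the paper addresses is that, for tracking, the corresponding value function would also depend on time.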
To address these challenges, the paper introduces a transformation that converts the time-varying tracking problem into a time-invariant optimal control problem. This matters because the value function of a time-varying infinite horizon problem depends explicitly on time, and neural network universal approximation guarantees hold only on compact sets, which the time axis is not. By augmenting the system state with the desired trajectory and embedding it in the HJB framework, the authors obtain a stationary formulation amenable to neural network approximation and ADP learning algorithms.
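As a concrete illustration of this idea, the sketch below forms the augmented state from the tracking error and the desired trajectory, assuming control-affine dynamics x' = f(x) + g(x)u and a desired trajectory generated by x_d' = hd(x_d). The function names, the specific stacking, and the steady-state control term are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def augmented_dynamics(zeta, mu, f, g, hd):
    """Time-invariant dynamics of the augmented state zeta = [e; x_d].

    e    : tracking error, e = x - x_d
    x_d  : desired trajectory, assumed to satisfy x_d' = hd(x_d)
    mu   : shifted control input (control relative to the steady-state
           input that keeps x on the desired trajectory)
    f, g : drift and input-gain terms of the original dynamics
           x' = f(x) + g(x) u
    """
    n = zeta.size // 2
    e, x_d = zeta[:n], zeta[n:]
    x = e + x_d
    # Writing u = u_ss + mu, where u_ss is the steady-state input along
    # the desired trajectory, absorbs the explicit time dependence: the
    # right-hand side below depends only on zeta and mu.
    # (Assumes g(x_d) has full column rank so the pseudoinverse is valid.)
    u_ss = np.linalg.pinv(g(x_d)) @ (hd(x_d) - f(x_d))
    e_dot = f(x) + g(x) @ (u_ss + mu) - hd(x_d)   # e' = x' - x_d'
    xd_dot = hd(x_d)
    return np.concatenate([e_dot, xd_dot])
```

Because the right-hand side has no explicit time argument, the associated value function is defined on the compact sets of zeta where neural network approximation results apply.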
Key Contributions and Results
The paper presents several critical contributions:
- System Transformation: By defining a new system state that combines the tracking error and the desired trajectory, the authors reformulate the time-varying optimal control problem into a time-invariant form. This allows for the use of neural networks to approximate the value function and the controller.
- ADP-Based Controller: A control policy is developed using neural network approximations of the value function, with the weights identified through least-squares updates within a policy iteration scheme (a minimal sketch appears after this list). The resulting policy keeps the tracking error ultimately bounded.
- Theoretical Foundations: Through a rigorous Lyapunov-based analysis, the paper establishes the stability and convergence properties of the proposed control strategy, showing that the controller converges to a neighborhood of an approximate optimal policy and reliably tracks the desired trajectory.
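To illustrate the least-squares flavor of the critic update, here is a minimal batch sketch under assumed names: `phi` is a feature basis for the value function V(zeta) ≈ Wᵀ phi(zeta), `grad_phi` is its Jacobian, `dynamics` is the augmented system above, and `reward` is the instantaneous cost. This is not the paper's exact algorithm, which updates the weights online and interleaves an actor update derived from the HJB minimizer.

```python
import numpy as np

def critic_least_squares(zetas, mus, dynamics, reward, grad_phi):
    """One least-squares critic update: fit weights W so that the
    continuous-time Bellman error
        delta = W^T grad_phi(zeta) @ zeta_dot + r(zeta, mu)
    is minimized over the sampled points (a minimal ADP sketch).
    """
    A, b = [], []
    for zeta, mu in zip(zetas, mus):
        zeta_dot = dynamics(zeta, mu)
        A.append(grad_phi(zeta) @ zeta_dot)   # regressor: d/dt phi(zeta(t))
        b.append(-reward(zeta, mu))           # target so that W^T A + r = 0
    W, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return W
```

In the paper's setting, a persistence-of-excitation condition on the regressor keeps this least-squares problem well conditioned, which is part of what the Lyapunov analysis relies on.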
Implications and Future Directions
The implications of this research are twofold, spanning theory and practice. Theoretically, the work extends ADP methods to the more complex problem of trajectory tracking for continuous time systems, introducing a framework that can potentially be adapted to other domains requiring real-time control adjustments. Practically, such controllers are well suited to robotics, aerospace, and autonomous systems, where precise trajectory tracking is crucial.
Future developments could focus on relaxing some of the paper's assumptions, such as exact model knowledge and persistent excitation. Basis functions beyond simple polynomials could also be explored, for instance deep architectures from deep reinforcement learning that offer greater capacity and generalization (a concrete example of a polynomial basis appears below).
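To make "simple polynomials" concrete, here is a minimal quadratic basis of the kind commonly paired with the critic update sketched earlier; the paper's exact basis may differ, and these helper names are illustrative.

```python
import numpy as np

def phi(zeta):
    """Quadratic polynomial basis: all monomials zeta_i * zeta_j, i <= j."""
    n = zeta.size
    return np.array([zeta[i] * zeta[j] for i in range(n) for j in range(i, n)])

def grad_phi(zeta):
    """Jacobian of phi with respect to zeta (num_features x n)."""
    n = zeta.size
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = np.zeros(n)
            row[i] += zeta[j]   # d(zeta_i * zeta_j)/d zeta_i
            row[j] += zeta[i]   # d(zeta_i * zeta_j)/d zeta_j (adds 2*zeta_i when i == j)
            rows.append(row)
    return np.array(rows)
```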
In summary, the paper advances the capability of ADP methods to handle continuous time trajectory tracking problems by innovatively transforming the problem dynamics. It lays the groundwork for further research into autonomous control systems and their ability to learn optimally in real-time, presenting a robust solution under certain system conditions.