An Expert Overview of Learning Time-Optimized Path Tracking with or without Sensory Feedback
Introduction
The focus of this paper is on the development of a learning-based methodology for time-optimized path tracking in robotic systems, with a novel capability to adapt in real time using sensory feedback. Unlike conventional approaches that require predefined paths, this method allows for reactive adjustments, enhancing applicability in dynamic environments and tasks involving balance, such as humanoid robotics.
Methodology
The proposed approach formulates path tracking as a Markov Decision Process (MDP). The robot is controlled by a neural network trained via reinforcement learning (RL) to follow reference paths while respecting kinematic limits on position, velocity, acceleration, and jerk. Because the framework is model-free, the robot's trajectories can be optimized without requiring a differentiable model of the dynamics or of the reward function.
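The MDP loop described above can be sketched as follows. The environment class, limit values, waypoint-advance rule, and reward shaping below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

class PathTrackingEnv:
    """Minimal sketch of a path-tracking MDP (illustrative, not the paper's code).

    State: current joint kinematics plus up to K upcoming waypoints of the path.
    Action: normalized acceleration command in [-1, 1] per joint.
    """
    def __init__(self, waypoints, dt=0.01, a_max=2.0, v_max=1.5):
        self.waypoints = waypoints  # (N, n_joints) knots of the reference path
        self.dt, self.a_max, self.v_max = dt, a_max, v_max
        self.reset()

    def reset(self):
        self.q = self.waypoints[0].copy()  # joint positions
        self.v = np.zeros_like(self.q)     # joint velocities
        self.idx = 0                       # index of the current target waypoint
        return self._state()

    def _state(self):
        # Up to K upcoming waypoints give the network limited path foresight;
        # the path end is repeated when fewer than K waypoints remain.
        K = 9
        ahead = self.waypoints[self.idx:self.idx + K]
        pad = np.repeat(self.waypoints[-1:], K - len(ahead), axis=0)
        return np.concatenate([self.q, self.v, np.vstack([ahead, pad]).ravel()])

    def step(self, action):
        a = np.clip(action, -1.0, 1.0) * self.a_max  # scale to allowed accelerations
        self.v = np.clip(self.v + a * self.dt, -self.v_max, self.v_max)
        self.q = self.q + self.v * self.dt
        # Advance along the path once the current target waypoint is reached.
        if np.linalg.norm(self.q - self.waypoints[self.idx]) < 0.05:
            self.idx = min(self.idx + 1, len(self.waypoints) - 1)
        done = self.idx == len(self.waypoints) - 1
        # Penalize both tracking error and elapsed time (encourages fast traversal).
        reward = -np.linalg.norm(self.q - self.waypoints[self.idx]) - self.dt
        return self._state(), reward, done
```

A model-free RL algorithm (e.g., PPO) can then be trained directly against `step`, since no gradient of the dynamics or reward is required.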
Technical Insights
- State and Action Representation: The state incorporates up to nine waypoint knots of the cubic-spline reference path, giving the network sufficient context while requiring only limited look-ahead along the path.
- Action Space and Constraint Handling: Actions are acceleration commands mapped into the range of joint accelerations that keep the motion within the kinematic limits, a critical feature for preventing damage to the robot hardware.
- Reward Function: The reward is multi-objective, balancing fast path traversal against accurate tracking of the reference, which lets the network additionally prioritize task objectives such as stability in bipedal robots.
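The constraint handling and multi-objective reward outlined above can be sketched in simplified form. The single-step limit checks, function names, and reward weights below are assumptions for illustration; the paper's method additionally accounts for position limits over a braking horizon:

```python
import numpy as np

def feasible_acceleration_range(v, a_prev, dt, v_max, a_max, j_max):
    """Acceleration interval that keeps velocity, acceleration, and jerk within
    their limits for the next control step (simplified single-step check)."""
    lo = max(-a_max, a_prev - j_max * dt, (-v_max - v) / dt)
    hi = min(a_max, a_prev + j_max * dt, (v_max - v) / dt)
    return lo, hi

def map_action(action, v, a_prev, dt, v_max, a_max, j_max):
    """Map a network output in [-1, 1] onto the feasible acceleration interval,
    so every action the policy can emit is safe for the hardware."""
    lo, hi = feasible_acceleration_range(v, a_prev, dt, v_max, a_max, j_max)
    return lo + 0.5 * (np.clip(action, -1.0, 1.0) + 1.0) * (hi - lo)

def reward(tracking_error, step_time, progress, w_track=1.0, w_time=0.1):
    """Multi-objective reward: reward path progress, penalize deviation and time.
    The weights are illustrative; tuning them trades speed against accuracy."""
    return progress - w_track * tracking_error - w_time * step_time
```

Because the mapping always lands inside the feasible interval, constraint satisfaction is guaranteed by construction rather than learned through penalties.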
Evaluation and Results
The evaluation encompassed simulations with the KUKA iiwa and humanoids like ARMAR-4 and ARMAR-6, examining tasks both with and without sensory feedback objectives (e.g., path tracking and balance maintenance). The results showed that complex time-optimized trajectories were learned successfully while remaining compliant with the predefined kinematic constraints. Notably, the sim-to-real transfer, a key challenge in robotics, was demonstrated effectively with a ball-on-plate task.
Comparison with Traditional Techniques
The comparison to offline approaches like TOPP-RA highlights a trade-off: offline methods achieve slightly faster trajectories due to complete foresight over path constraints but lack the adaptability offered by online adjustments of the reference path and sensory-informed control.
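This foresight trade-off can be illustrated with a 1-DoF toy model (an assumption for illustration, not the paper's benchmark): an offline planner with complete foresight can brake at the last possible moment, while an online planner that only sees a short horizon ahead must cap its speed so it can always stop within that horizon:

```python
import math

def offline_time(d, a_max):
    """Full foresight: bang-bang profile that accelerates for half the
    distance d and brakes for the other half at the acceleration limit."""
    return 2.0 * math.sqrt(d / a_max)

def online_time(d, a_max, horizon):
    """Limited lookahead: speed must stay low enough to stop within the
    visible horizon, capping the cruise speed at sqrt(2 * a_max * horizon)."""
    v_cap = math.sqrt(2.0 * a_max * horizon)
    v_peak = math.sqrt(a_max * d)  # peak speed of the bang-bang profile
    if v_cap >= v_peak:
        return offline_time(d, a_max)  # horizon long enough; no penalty
    # Trapezoidal profile: accelerate to v_cap, cruise, then brake.
    return v_cap / a_max + d / v_cap
```

For example, traversing 1 m with a 2 m/s² limit takes about 1.41 s offline but about 1.57 s with a 0.2 m lookahead, quantifying the slight slowdown that buys online adaptability.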
Practical and Theoretical Implications
Practically, this research underlines the potential for enhanced robustness in industrial and service robots, such as those operating in unstructured environments where path obstructions or task deviations are frequent. Theoretically, the work lays foundational insights that could inform subsequent explorations of multi-objective RL in robotics, particularly in systems with high degrees of freedom.
Conclusion and Future Directions
The contributions of this paper set the stage for broader applicability and on-the-fly adaptability in robotics, with future directions potentially investigating more granular control over path speed and integration with more complex sensory modalities or environments. This exploration into model-free learning with real-time path parameterization marks a significant advancement in robotic autonomy and adaptability.