Joint Space Control via Deep Reinforcement Learning

Published 12 Nov 2020 in cs.RO (arXiv:2011.06332v2)

Abstract: The dominant way to control a robot manipulator uses hand-crafted differential equations leveraging some form of inverse kinematics / dynamics. We propose a simple, versatile joint-level controller that dispenses with differential equations entirely. A deep neural network, trained via model-free reinforcement learning, is used to map from task space to joint space. Experiments show the method capable of achieving similar error to traditional methods, while greatly simplifying the process by automatically handling redundancy, joint limits, and acceleration / deceleration profiles. The basic technique is extended to avoid obstacles by augmenting the input to the network with information about the nearest obstacles. Results are shown both in simulation and on a real robot via sim-to-real transfer of the learned policy. We show that it is possible to achieve sub-centimeter accuracy, both in simulation and the real world, with a moderate amount of training.

Summary

  • The paper introduces a deep RL-based JAiLeR system that bypasses differential equations to achieve sub-centimeter accuracy in joint space control.
  • It employs curriculum learning and Proximal Policy Optimization to manage joint constraints and integrate obstacle avoidance in dynamic environments.
  • The approach demonstrates robust sim-to-real transfer, simplifying robotic control architectures while enhancing precision and safety in unstructured settings.

Introduction

The paper "Joint Space Control via Deep Reinforcement Learning" introduces a novel approach to robot control which leverages deep reinforcement learning (DRL) to replace traditional control systems that depend heavily on differential equations. Traditional robotic systems predominantly use predefined equations, employing inverse kinematics and dynamics for operational space control. These methods, despite their efficacy, struggle in dynamic and unstructured environments, primarily due to their inability to navigate joint space constraints effectively.

Methodology

Deep Reinforcement Learning Controller

The researchers propose a joint-level controller that dispenses with differential equations entirely, relying instead on a deep neural network trained via model-free reinforcement learning. This network maps task-space inputs directly to joint-space outputs, handling tasks traditionally managed by complex analytical modeling.
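For context, the sketch below shows the kind of hand-crafted differential mapping (damped least-squares inverse kinematics) that the learned policy replaces; `jacobian` is a hypothetical robot-specific kinematics routine, and the damping constant is an illustrative default.

```python
import numpy as np

def damped_ls_ik_step(jacobian, q, dx, damping=0.05):
    """One step of classical damped-least-squares differential IK.

    jacobian: callable q -> (6, n) task Jacobian (hypothetical robot-specific routine)
    q:        current joint configuration, shape (n,)
    dx:       desired task-space displacement, shape (6,)
    Returns a joint-space step dq. JAiLeR replaces this hand-crafted
    mapping with a learned policy pi(state) -> joint velocities.
    """
    J = jacobian(q)
    # Damped pseudo-inverse: J^T (J J^T + lambda^2 I)^{-1} dx
    JJt = J @ J.T
    dq = J.T @ np.linalg.solve(JJt + damping**2 * np.eye(JJt.shape[0]), dx)
    return dq
```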

The proposed method, Joint Action-space Learned Reacher (JAiLeR), automatically handles redundancy, joint limits, and acceleration/deceleration profiles. It further supports obstacle avoidance by augmenting the network's input with information about the nearest obstacles (Figure 1).

Figure 1: Our deep RL-based JAiLeR system is able to learn a policy that enables a robot manipulator to reach any goal position (red sphere) in the workspace with sub-centimeter average error, while avoiding obstacles (silver spheres). The green points on the robot links are used for distance computations for collision avoidance.

Architecture

JAiLeR's architecture comprises a policy network that operates on state inputs such as end-effector error, current joint states, and optional obstacle-related inputs. The network outputs desired joint velocities, which a Proportional-Derivative (PD) controller tracks to produce motor commands. Training uses Proximal Policy Optimization (PPO), a model-free DRL algorithm known for its stability and efficiency (Figure 2).

Figure 2: Overview of the proposed JAiLeR approach. The goal is provided externally (at run time) or by curriculum learning (during training). The robot can be either simulated or real.
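A minimal sketch of this pipeline is shown below, assuming PyTorch; the layer sizes, PD gains, and PPO clip range are illustrative defaults, not values reported in the paper.

```python
import torch
import torch.nn as nn

class JointVelocityPolicy(nn.Module):
    """Maps the state (end-effector error, joint positions/velocities,
    optional obstacle vectors) to desired joint velocities.
    Hidden sizes are illustrative, not the paper's exact architecture."""
    def __init__(self, state_dim: int, n_joints: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_joints),  # desired joint velocities
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def pd_command(q_dot_des, q, q_dot, dt=0.01, kp=50.0, kd=5.0):
    """PD controller tracking the policy's velocity command.
    Gains and timestep are placeholder assumptions."""
    q_des = q + q_dot_des * dt  # integrate the commanded velocity
    return kp * (q_des - q) + kd * (q_dot_des - q_dot)

def ppo_clip_loss(log_p_new, log_p_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective used for training."""
    ratio = torch.exp(log_p_new - log_p_old)
    return -torch.min(ratio * advantages,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * advantages).mean()
```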

Curriculum Learning

To improve training efficiency, the researchers employ curriculum learning: the workspace is divided into sub-regions of increasing difficulty, and the policy masters each one before the region expands, eventually covering the full workspace. This significantly improves the policy's robustness and generalization.
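One way such a curriculum can be realized, shown here as a hedged sketch: goals are sampled from a region around a home pose whose radius grows once the recent success rate clears a threshold. The threshold, window size, and growth increment are assumptions for illustration, not the paper's values.

```python
import numpy as np

class WorkspaceCurriculum:
    """Samples reach goals from a sphere around a home position whose
    radius grows once the recent success rate exceeds a threshold.
    All thresholds and increments are illustrative assumptions."""
    def __init__(self, home, r_init=0.1, r_max=0.8, grow=0.05, threshold=0.9):
        self.home = np.asarray(home, dtype=float)
        self.radius, self.r_max = r_init, r_max
        self.grow, self.threshold = grow, threshold
        self.results = []

    def sample_goal(self, rng):
        # Uniform sample inside the current sphere around the home pose
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)
        return self.home + direction * self.radius * rng.uniform() ** (1 / 3)

    def record(self, success):
        self.results.append(success)
        if len(self.results) >= 100:  # evaluate over a fixed window
            if np.mean(self.results) > self.threshold:
                self.radius = min(self.radius + self.grow, self.r_max)
            self.results.clear()
```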

Experimental Evaluation

Simulation and Real-World Testing

The JAiLeR system was evaluated through extensive simulations and real-world experiments using robotic arms such as the Kinova Jaco and Baxter. Results demonstrated that JAiLeR could reach arbitrary target positions with sub-centimeter accuracy across large workspaces (Figure 3).

Figure 3: By feeding the network additional inputs (vectors pointing from links to the nearest obstacles), the policy learns to avoid obstacles. This is without scene-specific training, because the policy is trained only on spherical obstacles. The Jaco robot is reaching for a target (large red sphere).

Obstacle Avoidance

By augmenting the input with vectors pointing from the robot's links to the nearest obstacles, JAiLeR avoids collisions without additional scene-specific training. This capability held even in cluttered environments, showcasing the policy's adaptability and effectiveness.
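A minimal sketch of how these extra inputs might be computed, assuming spherical obstacles and a set of reference points on the links (the green points in Figure 3); the function name and array shapes are illustrative.

```python
import numpy as np

def nearest_obstacle_vectors(link_points, obstacle_centers, obstacle_radii):
    """For each link point, return the vector to the closest point on the
    surface of the nearest spherical obstacle (points assumed outside).
      link_points: (L, 3), obstacle_centers: (K, 3), obstacle_radii: (K,)
    Returns (L, 3); these vectors are concatenated onto the policy's state."""
    vecs = []
    for p in link_points:
        diffs = obstacle_centers - p              # (K, 3) vectors to centers
        dists = np.linalg.norm(diffs, axis=1)     # distances to centers
        surface = dists - obstacle_radii          # distances to sphere surfaces
        k = np.argmin(surface)
        # Scale the center vector down to end on the nearest sphere's surface
        vecs.append(diffs[k] * (surface[k] / dists[k]))
    return np.stack(vecs)
```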

Sim-to-Real Transfer

The policy was successfully transferred from simulation to physical robotic systems without significant performance degradation, demonstrating the robustness of the learned policies against the sim-to-real gap (Figure 4).

Figure 4: Step response of the real Baxter robot reaching for a goal, showing the speed, accuracy, smoothness, relatively small overshoot, and stability of our learned policy. This policy was not fine-tuned in the real world and comes directly from simulation.
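The summary does not specify how the sim-to-real gap was bridged; one common recipe, shown below purely as an illustrative assumption, is to randomize simulator dynamics during training so the policy cannot overfit a single parameter set. The `sim` hooks here are hypothetical.

```python
import numpy as np

def randomize_dynamics(sim, rng):
    """Illustrative domain randomization (an assumption; the summary does
    not detail the paper's sim-to-real recipe). Perturbs simulator dynamics
    each episode. All `sim` attributes/methods are hypothetical hooks."""
    sim.set_joint_damping(rng.uniform(0.8, 1.2) * sim.nominal_damping)
    sim.set_link_masses(rng.uniform(0.9, 1.1, size=sim.n_links) * sim.nominal_masses)
    sim.set_control_latency(rng.uniform(0.0, 0.02))  # seconds of actuation delay
```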

Discussion

The shift from theoretically rigorous control methods to data-driven models like JAiLeR simplifies implementation while matching, and sometimes surpassing, the performance of traditional systems. Integrating perception-related inputs directly into the control policy is a significant step forward that streamlines the control architecture of robotic systems. The proposed approach not only handles precise reaching tasks but also responds dynamically to environmental changes, an improvement over traditional methods.

Conclusion

The work presents an effective alternative to classical robotic control methods through the application of deep reinforcement learning. JAiLeR's success in both simulated and real-world environments, its comprehensive handling of redundancy, and its adaptability to unstructured environments signify a substantial advancement in robotic control systems. Future developments aim to integrate real-time perception, enhancing the robot's ability to operate autonomously in dynamic contexts.
