Learning a Decentralized Multi-arm Motion Planner: An Overview
Introduction
The paper "Learning a Decentralized Multi-arm Motion Planner" addresses the challenge of efficiently planning motions for multi-arm robotic systems in dynamic environments. Traditional centralized motion planners often struggle with scalability due to their exponential runtime growth as the number of robotic arms increases. This paper proposes a decentralized approach using Multi-Agent Reinforcement Learning (MARL) and combines this with expert demonstrations to enhance efficiency and scalability.
Methodology
The proposed system uses MARL to train decentralized policies for the individual arms of a multi-arm system. Each arm's policy receives observations of its workspace and a target end-effector pose. The policies are optimized with Soft Actor-Critic (SAC), and expert demonstrations generated by the BiRRT algorithm guide the learning process. This hybrid approach lets the arms learn through exploration while mitigating the sparse-reward problem typical of MARL settings.
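To make the demonstration-guided training more concrete, below is a minimal sketch of one common way to combine off-policy RL with expert data: mixing expert transitions into the SAC replay buffer at a fixed ratio. This is an illustrative assumption rather than the authors' implementation; the class name `MixedReplayBuffer`, the `expert_ratio` parameter, and the fixed mixing scheme are all hypothetical.

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Replay buffer that mixes the agents' own exploration transitions
    with expert (BiRRT) demonstration transitions at a fixed ratio.
    Hypothetical sketch; the paper's exact scheme may differ."""

    def __init__(self, capacity=100_000, expert_ratio=0.25):
        self.agent_buf = deque(maxlen=capacity)
        self.expert_buf = deque(maxlen=capacity)
        self.expert_ratio = expert_ratio  # fraction of each batch drawn from demos

    def add_agent(self, transition):
        self.agent_buf.append(transition)

    def add_expert(self, transition):
        self.expert_buf.append(transition)

    def sample(self, batch_size):
        # Draw expert transitions first, then fill the rest from exploration data.
        n_expert = min(int(batch_size * self.expert_ratio), len(self.expert_buf))
        n_agent = min(batch_size - n_expert, len(self.agent_buf))
        batch = random.sample(list(self.expert_buf), n_expert)
        batch += random.sample(list(self.agent_buf), n_agent)
        random.shuffle(batch)
        return batch
```

Under a scheme like this, each SAC gradient step draws a batch from `sample()`, so a fixed fraction of every update is grounded in motion-planner behavior even when the task reward is sparse.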
To support scaling, the authors employ an LSTM-based state encoder, which lets the system handle a variable number of arms without retraining. This flexibility is crucial for real-world applications where the number and configuration of arms vary. All agents share policy weights, a homogeneous design that supports the cooperation needed for arms to avoid collisions and achieve team goals.
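A minimal PyTorch sketch of this idea follows: an LSTM consumes per-arm observations as a sequence and returns a fixed-size embedding no matter how many arms are present. The observation dimension, hidden size, and input layout here are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LSTMStateEncoder(nn.Module):
    """Encodes a variable-length sequence of per-arm observations into a
    fixed-size embedding, so one shared policy can serve any team size.
    Dimensions and input layout are illustrative assumptions."""

    def __init__(self, obs_dim=13, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)

    def forward(self, arm_obs):
        # arm_obs: (batch, num_arms, obs_dim); num_arms may vary per call.
        _, (h_n, _) = self.lstm(arm_obs)
        return h_n[-1]  # (batch, hidden_dim), independent of num_arms

encoder = LSTMStateEncoder()
print(encoder(torch.randn(1, 2, 13)).shape)   # torch.Size([1, 128]) for 2 arms
print(encoder(torch.randn(1, 10, 13)).shape)  # torch.Size([1, 128]) for 10 arms
```

Because the final hidden state has a fixed size, the downstream policy head never needs to change when arms are added or removed.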
Results
The system performs strongly, achieving success rates above 90% on tasks with dynamic targets and team sizes of 5 to 10 arms. Notably, the policy was trained only on 1-4 arm tasks with static targets, yet it generalizes to environments with dynamic targets and larger teams. The approach also substantially outperforms centralized BiRRT in computation speed, generating trajectories up to 15 times faster in 10-arm setups, which directly addresses the scalability challenge inherent in multi-arm motion planning.
The evaluation highlights several key numerical results:
- The decentralized policy handles dynamic targets moving at speeds between 1 and 15 cm/s.
- The policy's runtime efficiency supports closed-loop motion planning at 920 Hz on a single CPU thread (see the timing sketch after this list).
- Task success rates above 90% for team sizes not present in the training dataset underscore the system's adaptability and robustness.
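To illustrate how a closed-loop planning rate like 920 Hz can be measured, here is a benchmarking sketch that times repeated policy inference on one CPU thread. The stand-in network, its dimensions, and the step count are assumptions, not the paper's architecture or evaluation protocol.

```python
import time
import torch

torch.set_num_threads(1)  # restrict inference to a single CPU thread

# Stand-in policy network; NOT the paper's architecture (sizes are assumptions).
policy = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 6)
)
obs = torch.randn(1, 128)  # placeholder encoded observation

n_steps = 1000
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(n_steps):
        policy(obs)  # one closed-loop step: observation -> joint action
    elapsed = time.perf_counter() - start

print(f"closed-loop steps per second: {n_steps / elapsed:.0f}")
```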
Implications
Theoretically, this research advances knowledge in scalable motion planning for robotic systems. Decentralization introduces efficiency while maintaining high success rates across varied and complex environments. Practically, the approach provides a path to deploy large-scale robotic systems efficiently, where coordination among many arms is critical for tasks such as assembly, manufacturing, and manipulation in cluttered spaces.
Future Directions
Future work on decentralized multi-arm motion planning could integrate visual inputs to move beyond joint-state measurements, enhancing real-world applicability. Generating synthetic training data might further improve generalization, while denser reward structures, or RL paradigms better suited to sparse rewards, could sidestep current exploration challenges. More sophisticated task-level planners could also complement these low-level policies, giving robots greater autonomy and decision-making capability across varied operations.
Overall, the paper offers a comprehensive framework for decentralized motion planning in multi-arm robotic systems, providing the scalability, efficiency, and adaptability critical for dynamic environments.