Reinforcement Learning Aided Sequential Optimization for Unsignalized Intersection Management of Robot Traffic (2302.05082v3)

Published 10 Feb 2023 in cs.RO, cs.MA, and eess.SY

Abstract: We consider the problem of optimal unsignalized intersection management, wherein we seek to obtain safe and optimal trajectories, for a set of robots that arrive randomly and continually. This problem involves repeatedly solving a mixed integer program (with robot acceleration trajectories as decision variables) with different parameters, for which the computation time using a naive optimization algorithm scales exponentially with the number of robots and lanes. Hence, such an approach is not suitable for real-time implementation. In this paper, we propose a solution framework that combines learning and sequential optimization. In particular, we propose an algorithm for learning a shared policy that given the traffic state information, determines the crossing order of the robots. Then, we optimize the trajectories of the robots sequentially according to that crossing order. This approach inherently guarantees safety at all times. We validate the performance of this approach using extensive simulations and compare our approach against $5$ different heuristics from the literature in $9$ different simulation settings. Our approach, on average, significantly outperforms the heuristics from the literature in various metrics like objective function, weighted average of crossing times and computation time. For example, in some scenarios, we have observed that our approach offers up to $150\%$ improvement in objective value over the first come first serve heuristic. Even on untrained scenarios, our approach shows a consistent improvement (in objective value) of more than $30\%$ over all heuristics under consideration. We also show through simulations that the computation time for our approach scales linearly with the number of robots (assuming all other factors are constant). Learnt policies are implemented on physical robots with slightly modified framework to address real-world challenges.

Summary

The paper introduces a reinforcement learning policy that determines the crossing order of robots at unsignalized intersections; safety is guaranteed by the sequential trajectory optimization that follows that order.
The policy is trained with a modified centralized Multi-Agent DDPG algorithm, and the resulting framework outperforms heuristics from the literature in objective value, weighted average crossing time, and computation time.
Real-world lab experiments using low-level PID controllers validate the framework's practical implementation despite tracking errors and communication delays.

Introduction

The paper "Reinforcement Learning Aided Sequential Optimization for Unsignalized Intersection Management of Robot Traffic" (2302.05082) addresses the management of unsignalized intersections in environments such as autonomous warehouses, where robots arrive randomly and continually and must cross safely and efficiently. It combines reinforcement learning (RL) with sequential optimization so that the coordination problem can be solved fast enough for real-time use.

Problem Setup and Framework

The key challenge in unsignalized intersection management is computing safe and optimal trajectories for a continual stream of randomly arriving robots. This requires repeatedly solving a mixed integer program, with robot acceleration trajectories as decision variables, under different parameters. With a naive optimization algorithm, the computation time scales exponentially with the number of robots and lanes, which makes that approach unsuitable for real-time implementation.
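To make the scaling concern concrete, here is a toy sketch (not the paper's mixed integer program) that brute-forces the crossing order for a single shared conflict zone. The function names, the fixed `occupancy` time, and the delay-based cost are assumptions made for illustration only. Because it enumerates every permutation, the number of candidate orders grows factorially with the number of robots.

```python
from itertools import permutations
from math import factorial


def total_crossing_time(order, earliest, occupancy):
    """Sum of (exit time - earliest arrival) when robots cross one at a time."""
    zone_free_at = 0.0
    total = 0.0
    for i in order:
        entry = max(earliest[i], zone_free_at)  # wait until the zone is free
        zone_free_at = entry + occupancy
        total += zone_free_at - earliest[i]
    return total


def best_order_brute_force(earliest, occupancy):
    """Try every crossing order; feasible only for a handful of robots."""
    candidates = permutations(range(len(earliest)))
    return min(candidates, key=lambda o: total_crossing_time(o, earliest, occupancy))


if __name__ == "__main__":
    earliest = [0.0, 0.5, 3.0]  # hypothetical arrival times in seconds
    print(best_order_brute_force(earliest, occupancy=1.0))
    print(factorial(10))  # number of orders to check for 10 robots
```

Even at ten robots the search space is in the millions, which is why a method that picks one order directly is attractive.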

The paper introduces a learning-based solution framework in which a shared policy determines the crossing order of the robots, and the robots' trajectories are then optimized sequentially according to that order. Because each trajectory is optimized subject to those already planned, the approach inherently guarantees safety at all times. In extensive simulations it outperforms existing heuristics, and its computation time scales linearly with the number of robots when all other factors are held constant.
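The order-then-optimize idea can be sketched with a heavily simplified scheduler. This is an assumption-laden illustration, not the paper's trajectory optimization: each robot is reduced to an earliest entry time and a fixed occupancy time, and `Robot`, `schedule_sequentially`, and the `conflicts` set of lane pairs are hypothetical names. Robots are processed one at a time in the given order, and each one enters only after every previously scheduled conflicting robot has left, so conflicting occupancy intervals never overlap.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    name: str
    lane: int
    earliest_entry: float  # earliest time the robot can reach the intersection
    occupancy: float       # time it spends inside the intersection


def schedule_sequentially(robots, order, conflicts):
    """Assign (entry, exit) times one robot at a time, following `order`.

    `conflicts` is a set of frozensets of lane pairs that cannot be
    occupied at the same time. Robots in the same lane always conflict.
    """
    by_name = {r.name: r for r in robots}
    scheduled = {}  # name -> (entry, exit)
    for name in order:
        robot = by_name[name]
        entry = robot.earliest_entry
        for other_name, (_, other_exit) in scheduled.items():
            other = by_name[other_name]
            same_lane = other.lane == robot.lane
            crossing = frozenset((robot.lane, other.lane)) in conflicts
            if same_lane or crossing:
                entry = max(entry, other_exit)  # wait for the earlier robot
        scheduled[name] = (entry, entry + robot.occupancy)
    return scheduled


if __name__ == "__main__":
    robots = [
        Robot("A", lane=0, earliest_entry=0.0, occupancy=2.0),
        Robot("B", lane=1, earliest_entry=1.0, occupancy=2.0),
        Robot("C", lane=2, earliest_entry=1.5, occupancy=2.0),
    ]
    conflicts = {frozenset((0, 1))}  # lanes 0 and 1 cross; lane 2 is independent
    print(schedule_sequentially(robots, ["A", "B", "C"], conflicts))
```

Each robot is handled once against the robots already scheduled, so the work depends on the chosen order rather than on a search over all orders.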

Figure 1: A schematic of an example intersection and the region of interest (RoI) with 8 lanes.

Learning Algorithm

The proposed framework uses reinforcement learning to derive a policy that decides the robots' crossing order based on traffic state features. The policy is trained with a modified, centralized multi-agent variant of Deep Deterministic Policy Gradient, abbreviated MAJA-DDPG, which learns a single policy shared by all robots.

The feature vector for each robot includes information readily available through local measurements or communication with neighboring robots. The algorithm learns to optimize robot coordination by minimizing travel times, effectively addressing the combinatorial nature of the intersection safety constraints.
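One way to picture a shared policy is as a single scoring function applied to every robot's feature vector, with the crossing order obtained by sorting the scores. The sketch below assumes a linear score and a two-element feature vector (distance to the intersection, current speed); the feature choice, the weights, and both function names are hypothetical and stand in for the learned network described in the paper.

```python
def shared_policy_score(features, weights):
    """Score one robot; the same weights are used for every robot."""
    return sum(w * f for w, f in zip(weights, features))


def crossing_order(features_by_robot, weights):
    """Rank robots by descending score, breaking ties by name."""
    scores = {
        name: shared_policy_score(features, weights)
        for name, features in features_by_robot.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


if __name__ == "__main__":
    # Hypothetical features: [distance to intersection (m), speed (m/s)]
    features_by_robot = {
        "A": [2.0, 1.0],
        "B": [0.5, 1.5],
        "C": [1.0, 0.2],
    }
    weights = [-1.0, 0.5]  # closer and faster robots get higher priority
    print(crossing_order(features_by_robot, weights))
```

Because the same function is applied to each robot, the policy does not need to change when the number of robots in the region of interest changes.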

Implementation and Results

The paper validates the proposed algorithm through simulations, comparing it against 5 heuristics from the literature in 9 simulation settings that include homogeneous and heterogeneous traffic streams. The Collect-Merge-Learn (CML) approach allows the policy to generalize across different robot arrival densities and time-varying traffic conditions.

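Empirical results show that, on average, the RL-aided sequential optimization significantly outperforms the heuristics from the literature on metrics such as objective value, weighted average time to cross (TTC), and computation time. In some scenarios the improvement in objective value over the first come first serve heuristic reaches 150%, and on untrained scenarios the improvement in objective value remains above 30% over all heuristics considered.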
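As a small illustration of one of these metrics, a weighted average of crossing times can be computed as below. The function name, the per-robot weights, and the sample values are assumptions; the paper's exact weighting is not reproduced here.

```python
def weighted_average_crossing_time(crossing_times, weights):
    """Weighted mean of per-robot crossing times."""
    if len(crossing_times) != len(weights):
        raise ValueError("crossing_times and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(t * w for t, w in zip(crossing_times, weights))
    return weighted_sum / total_weight


if __name__ == "__main__":
    times = [4.0, 6.0, 10.0]   # hypothetical crossing times in seconds
    weights = [1.0, 1.0, 2.0]  # hypothetical priorities
    print(weighted_average_crossing_time(times, weights))
```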

Figure 2: Computation time per-robot for combined optimization, BESTSEQ and CML trained sequential optimization.

Practical Implementability

The paper implements the learned policies on physical robots in a lab setting, using low-level PID controllers for real-time trajectory tracking. The framework is slightly modified to cope with tracking errors and communication delays, so that the learned policies remain usable on robots with limited computation and communication capabilities.
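For readers unfamiliar with PID tracking, a minimal discrete-time sketch follows. It is a generic textbook controller, not the controller used in the paper: the gains, the time step, and the idealized plant in `simulate_tracking` (velocity changes directly with the commanded acceleration) are all assumptions.

```python
class PID:
    """Discrete PID controller with a fixed time step."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first step

    def step(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_tracking(controller, reference, steps):
    """Track a constant reference velocity with an idealized robot model."""
    velocity = 0.0
    for _ in range(steps):
        acceleration = controller.step(reference, velocity)
        velocity += acceleration * controller.dt  # assumed plant: pure integrator
    return velocity


if __name__ == "__main__":
    pid = PID(kp=2.0, ki=0.5, kd=0.01, dt=0.05)  # hypothetical gains
    print(simulate_tracking(pid, reference=1.0, steps=400))
```

On a real robot the measured velocity would come from sensors and would lag the command, which is one source of the tracking errors the paper's modifications are meant to absorb.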

Figure 3: A schematic of the layout used for implementation on robots. The black lines represent the paths for the robots to follow.

Conclusion

The research contributes a scalable solution for real-time intersection management in multi-robot systems that combines reinforcement learning with sequential optimization. Future work includes extending the framework to networks of intersections, accommodating lane changes, and handling dynamic disturbances.

The approach provides a foundation for learning-based coordination of autonomous robots, with potential applications in automated traffic systems and industrial automation.