
Multi-Agent Reinforcement Learning Based on Representational Communication for Large-Scale Traffic Signal Control

Published 3 Oct 2023 in cs.MA and cs.AI (arXiv:2310.02435v1)

Abstract: Traffic signal control (TSC) is a challenging problem within intelligent transportation systems and has been tackled using multi-agent reinforcement learning (MARL). While centralized approaches are often infeasible for large-scale TSC problems, decentralized approaches provide scalability but introduce new challenges, such as partial observability. Communication plays a critical role in decentralized MARL, as agents must learn to exchange information using messages to better understand the system and achieve effective coordination. Deep MARL has been used to enable inter-agent communication by learning communication protocols in a differentiable manner. However, many deep MARL communication frameworks proposed for TSC allow agents to communicate with all other agents at all times, which can add to the existing noise in the system and degrade overall performance. In this study, we propose a communication-based MARL framework for large-scale TSC. Our framework allows each agent to learn a communication policy that dictates "which" part of the message is sent "to whom". In essence, our framework enables agents to selectively choose the recipients of their messages and exchange variable length messages with them. This results in a decentralized and flexible communication mechanism in which agents can effectively use the communication channel only when necessary. We designed two networks, a synthetic $4 \times 4$ grid network and a real-world network based on the Pasubio neighborhood in Bologna. Our framework achieved the lowest network congestion compared to related methods, with agents utilizing $\sim 47-65 \%$ of the communication channel. Ablation studies further demonstrated the effectiveness of the communication policies learned within our framework.


Summary

  • The paper introduces a novel decentralized MARL framework that adapts its communication strategy to reduce noise and congestion in traffic networks.
  • It integrates representational message compression with deep Q-learning to optimize traffic signal decisions, achieving lower queue lengths in simulations.
  • Selective recipient identification and dynamic message length adjustment enable efficient bandwidth use, proving scalable in both synthetic and real-world networks.

Multi-Agent Reinforcement Learning for Traffic Signal Control

Introduction

The advancement of urban intelligent transportation systems (ITS) necessitates sophisticated approaches to address traffic congestion, a critical issue exacerbated by increased urbanization and growth of ride-hailing services. Multi-Agent Reinforcement Learning (MARL) frameworks have shown promise in controlling traffic signals dynamically. However, centralized methods face scalability challenges, while decentralized protocols introduce complications such as partial observability. Communication between agents within decentralized MARL systems is pivotal for coordination, yet excessive communication can lead to noise and performance degradation. This paper proposes a novel communication-oriented MARL approach, enhancing large-scale Traffic Signal Control (TSC) by selectively determining message recipients and adjusting message content length to optimize communication.

Communication Framework

The study introduces a decentralized MARL framework incorporating communication policies that allow agents to dynamically decide both the recipients of their messages and which parts of each message to transmit. This selective communication protocol aims to reduce noise and optimize bandwidth usage in large-scale settings. The proposed method builds upon existing frameworks such as QMIX and NDQ, enhancing them with representational message compression and selective recipient identification. Implementation within synthetic and real-world traffic networks demonstrates significant efficiency improvements (Figure 1).

Figure 1: The highlighted circles represent the communication range of each traffic light, i.e., each traffic light can communicate with its immediate neighbors within a 500-meter range.
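
To make the selective protocol concrete, the following is a minimal PyTorch sketch of per-recipient, per-segment message gating in the spirit of the framework described above. All class and variable names, layer sizes, and the straight-through gating estimator are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SelectiveMessenger(nn.Module):
    """Sketch: learn which message segments go to which neighbor."""

    def __init__(self, obs_dim: int, n_segments: int = 4,
                 seg_dim: int = 8, max_neighbors: int = 4):
        super().__init__()
        self.n_segments, self.seg_dim = n_segments, seg_dim
        self.max_neighbors = max_neighbors
        self.encoder = nn.Linear(obs_dim, n_segments * seg_dim)
        # One gate logit per (neighbor, segment) pair.
        self.gate = nn.Linear(obs_dim, max_neighbors * n_segments)

    def forward(self, obs: torch.Tensor):
        msg = self.encoder(obs).view(-1, self.n_segments, self.seg_dim)
        logits = self.gate(obs).view(-1, self.max_neighbors, self.n_segments)
        probs = torch.sigmoid(logits)
        # Hard 0/1 send decisions; the straight-through estimator keeps
        # the communication policy differentiable during training.
        gates = (probs > 0.5).float() + probs - probs.detach()
        # Zero out ungated segments per recipient -> variable-length messages.
        return gates.unsqueeze(-1) * msg.unsqueeze(1), gates

obs = torch.randn(2, 16)                # 2 agents, toy 16-dim observations
msgs, gates = SelectiveMessenger(obs_dim=16)(obs)
print(msgs.shape, gates.mean().item()) # per-neighbor messages; channel usage
```

Averaging the hard gates over an episode gives the fraction of the communication channel actually used, the quantity the abstract reports as roughly 47-65%.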

Methodology

Observation and Action Representation

Each traffic signal in the network acts as an autonomous agent. It uses observations within a limited range of 50 meters, gathering data on incoming vehicles through sensory inputs such as lane detectors. Actions consist of selecting traffic phases to minimize congestion (Figure 2).

Figure 2: An example of the phases available for an intersection in a $4 \times 4$ grid network from the SUMO simulator. The colored lines (red, yellow, and green) together indicate the phase of the traffic signal. The first phase (from the left) is an all-green phase, where vehicles are allowed to go straight and/or make turns.
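
As a concrete illustration of the agent interface, the sketch below assembles an observation vector from per-lane detector readings plus a one-hot encoding of the current phase. The detector names, feature choices, and phase set are hypothetical, not the paper's exact encoding.

```python
from dataclasses import dataclass

PHASES = ["NS_green", "NS_left", "EW_green", "EW_left"]  # assumed phase set

@dataclass
class LaneReading:
    queue_len: int    # halted vehicles on the lane's last 50 m
    occupancy: float  # fraction of the detector covered by vehicles

def build_observation(readings: dict[str, LaneReading],
                      current_phase: int) -> list[float]:
    """Flatten per-lane detector readings plus a one-hot of the phase."""
    obs: list[float] = []
    for lane in sorted(readings):  # fixed lane ordering for a stable layout
        r = readings[lane]
        obs += [float(r.queue_len), r.occupancy]
    obs += [1.0 if i == current_phase else 0.0 for i in range(len(PHASES))]
    return obs

readings = {"N_in": LaneReading(3, 0.4), "S_in": LaneReading(1, 0.1),
            "E_in": LaneReading(5, 0.7), "W_in": LaneReading(0, 0.0)}
print(build_observation(readings, current_phase=2))
```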

Reinforcement Learning Approach

The agents employ Deep Q-Networks (DQN) to approximate action-value functions, learning individual policies that map observation histories to actions via recurrent neural networks. The proposed QRC-TSC framework leverages variational inference and mutual information maximization rather than heuristic-based filters for more effective message processing.
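
A minimal sketch of such a recurrent Q-network is given below, assuming a GRU that summarizes the observation history (concatenated with incoming messages) under partial observability. The layer sizes and exact input composition are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """DRQN-style sketch: a GRU tracks history; the head outputs one
    Q-value per traffic phase."""

    def __init__(self, obs_dim: int, msg_dim: int, n_phases: int,
                 hidden: int = 64):
        super().__init__()
        self.fc = nn.Linear(obs_dim + msg_dim, hidden)
        self.gru = nn.GRUCell(hidden, hidden)
        self.q_head = nn.Linear(hidden, n_phases)

    def forward(self, obs, incoming_msg, h):
        x = torch.relu(self.fc(torch.cat([obs, incoming_msg], dim=-1)))
        h = self.gru(x, h)           # carry hidden state across timesteps
        return self.q_head(h), h

net = RecurrentQNet(obs_dim=12, msg_dim=8, n_phases=4)
h = torch.zeros(1, 64)
q, h = net(torch.randn(1, 12), torch.randn(1, 8), h)
action = q.argmax(dim=-1)            # greedy phase selection at test time
```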

Experimental Setup

Experiments were conducted using the SUMO simulator on a synthetic $4 \times 4$ grid network and a real-world network from the Pasubio district in Bologna. The experimental configurations involved varying flow scenarios designed to test the robustness of communication strategies under fluctuating traffic conditions (Figure 3).

Figure 3: (a) and (b) show the flow scenarios for the $4 \times 4$ grid network, (c) shows the flow in the Pasubio network, and (d) shows the hourly flow distribution for both networks. Dotted lines represent flow from the opposite direction whenever bidirectional flows are simulated; the red and blue lines represent the outer and inner network flow, respectively.
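
For readers aiming to reproduce a comparable setup, the following is a bare-bones SUMO/TraCI control loop that logs per-intersection queue observations each step. The config filename and the fixed horizon are placeholder assumptions; the TraCI calls themselves are standard.

```python
import traci

traci.start(["sumo", "-c", "grid4x4.sumocfg"])  # assumed config file
tls_ids = traci.trafficlight.getIDList()
queue_log = []                                   # per-step queue observations
for step in range(3600):                         # one simulated hour at 1 s/step
    step_queues = []
    for tls in tls_ids:
        lanes = traci.trafficlight.getControlledLanes(tls)
        q = sum(traci.lane.getLastStepHaltingNumber(l) for l in lanes)
        step_queues.append(q)
        # A learned policy would call traci.trafficlight.setPhase(tls, a)
        # here, conditioned on these observations; omitted in this sketch.
    queue_log.append(step_queues)
    traci.simulationStep()
traci.close()
```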

Results and Analysis

Performance Metrics

QRC-TSC exhibited superior performance in minimizing congestion compared to baseline methods, including traditional fixed-time algorithms, achieving the lowest average queue lengths across both network types (Figure 4).

Figure 4: Average queue length throughout training (lower is better). The x-axis represents simulation steps (in millions); solid lines show the mean over 5 runs, and the shaded regions represent the 95% CI.
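
A simple way to compute this evaluation metric from a queue log like the one gathered in the TraCI loop above, assuming queue lengths are averaged over intersections and steps (the paper's exact aggregation may differ):

```python
def average_queue_length(queues_per_step: list[list[int]]) -> float:
    """queues_per_step[t][i] = halted vehicles at intersection i, step t."""
    total = sum(sum(step) for step in queues_per_step)
    count = sum(len(step) for step in queues_per_step)
    return total / count if count else 0.0

print(average_queue_length([[3, 1, 5, 0], [2, 2, 4, 1]]))  # -> 2.25
```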

Communication Efficiency

By reducing unnecessary communication, QRC-TSC maintained high efficiency within the communication channel, using only the bandwidth necessary for effective coordination (Figure 5).

Figure 5: Comparison of the performance of communication policies averaged across 100 test episodes. QRC-TSC (in blue) represents the performance of the communication policies learned by our framework.
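
One plausible way to quantify channel utilization is the fraction of available send slots the learned gates actually use. This definition is an assumption, offered only to match the ~47-65% figure from the abstract in spirit:

```python
def channel_utilization(gates: list[list[int]]) -> float:
    """gates[a] = 0/1 send decisions made by agent a over an episode."""
    sent = sum(sum(g) for g in gates)
    capacity = sum(len(g) for g in gates)
    return sent / capacity if capacity else 0.0

print(f"{channel_utilization([[1, 0, 1, 1], [0, 1, 0, 0]]):.0%}")  # -> 50%
```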

Conclusion

This paper introduces a scalable MARL communication framework tailored for large-scale traffic signal control, demonstrating significant efficacy in controlling network-wide congestion without overwhelming communication channels. Future research should focus on integrating additional constraints, such as computational overhead, into the communication policies. Moreover, because the framework relies on predefined message segment lengths, methods that determine message lengths adaptively could add further robustness and facilitate more intricate dynamic coordination across application domains.
