
Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning (1704.08883v2)

Published 28 Apr 2017 in cs.LG

Abstract: Recent advances in combining deep neural network architectures with reinforcement learning techniques have shown promising potential results in solving complex control problems with high dimensional state and action spaces. Inspired by these successes, in this paper, we build two kinds of reinforcement learning algorithms: deep policy-gradient and value-function based agents which can predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce control signals. The policy-gradient based agent maps its observation directly to the control signal, however the value-function based agent first estimates values for all legal control signals. The agent then selects the optimal control action with the highest value. Our methods show promising results in a traffic network simulated in the SUMO traffic simulator, without suffering from instability issues during the training process.

Citations (292)

Summary

  • The paper introduces dual deep reinforcement learning agents—a policy-gradient agent and a value-function-based agent—to optimize adaptive traffic signal timing.
  • Simulation results with the SUMO simulator show reductions in cumulative delay and queue length of approximately 67-73% relative to a shallow-neural-network baseline.
  • The methodology highlights the potential of deep RL for urban traffic management, offering benefits like cost savings, reduced emissions, and improved vehicular flow.

Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning

The paper presents two advanced reinforcement learning (RL) models for traffic light control, leveraging the integration of deep neural network architectures with policy-gradient and value-function-based reinforcement learning methods. The goal is to optimize traffic flow by determining the best traffic signal timing at an intersection using adaptive control.

Reinforcement Learning Framework

The authors employ two types of RL agents to control the traffic lights: a deep policy-gradient agent and a value-function-based agent.

  1. Policy-Gradient Agent: This agent directly maps observed states to an action distribution, aiming to find policies without estimating the action-value function. It addresses the instability and oscillations often seen during training by employing a deep network to form hierarchical representations of the state space, thus mitigating variance issues traditionally associated with policy gradient methods.
  2. Value-Function-Based Agent: This model estimates the action-value function and selects the control action with the highest estimated value. It relies on deep Q-networks (DQNs) to manage the exploration-exploitation trade-off, and stabilizes learning by introducing a target network and experience replay, which reduces the correlation between sequential observations.
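The two update rules described above can be sketched in a few lines. This is a minimal illustration with a toy linear "network" and made-up dimensions (the paper's agents use deep convolutional networks); the function names and sizes here are assumptions for demonstration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_ACTIONS = 8, 2   # toy state size; 2 signal phases (NS green / EW green)

# --- Policy-gradient agent: softmax policy with a REINFORCE-style update ---
theta = np.zeros((N_STATE, N_ACTIONS))

def softmax_policy(state):
    """Map a state directly to a probability distribution over actions."""
    logits = state @ theta
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def reinforce_update(state, action, discounted_return, lr=0.01):
    """Ascend the log-likelihood of the taken action, weighted by its return."""
    global theta
    probs = softmax_policy(state)
    grad_log = -probs            # d log pi / d logits, for all actions
    grad_log[action] += 1.0      # plus the indicator of the taken action
    theta += lr * discounted_return * np.outer(state, grad_log)

# --- Value-function agent: DQN-style target using a frozen target network ---
w_online = rng.normal(size=(N_STATE, N_ACTIONS))
w_target = w_online.copy()       # periodically synced copy stabilizes training

def dqn_target(reward, next_state, gamma=0.99, done=False):
    """y = r if the episode ended, else r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * np.max(next_state @ w_target)
```

Experience replay then amounts to storing (state, action, reward, next_state) tuples in a buffer and sampling minibatches uniformly at random, so that consecutive, highly correlated simulator frames are not trained on back-to-back.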

Methodology and Simulations

The paper uses the SUMO traffic simulator to test the proposed models on a four-way intersection, where each action selects which traffic flow to serve (North/South or East/West). Snapshots of the simulator, rendered as images, are fed into the convolutional layers of a deep neural network, allowing the model to extract visual information such as vehicle count and movement directly from the scene. This demonstrates an innovative approach that goes beyond traditional sensor-based methods.
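One common way to turn simulator snapshots into such an image-like state is a binary occupancy grid over discretized road cells; the exact encoding, cell size, and grid shape below are illustrative assumptions, not the paper's precise parameters.

```python
import numpy as np

def occupancy_grid(vehicle_positions, cell_size=5.0, grid_shape=(60, 60)):
    """Map continuous (x, y) vehicle coordinates from the simulator into a
    binary matrix: 1 where a cell contains a vehicle, 0 elsewhere.
    A stack of such frames can then be fed to convolutional layers."""
    grid = np.zeros(grid_shape, dtype=np.float32)
    for x, y in vehicle_positions:
        i, j = int(y // cell_size), int(x // cell_size)
        if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
            grid[i, j] = 1.0
    return grid

# Example: three vehicles queued on one approach lane
state = occupancy_grid([(150.0, 10.0), (150.0, 15.0), (150.0, 21.0)])
```

Because queue length and vehicle movement are directly visible in such a grid (and in differences between consecutive grids), the convolutional layers can learn these features without hand-engineered loop-detector inputs.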

Results and Evaluation

The proposed adaptive signal control methods significantly outperformed a baseline fixed-time traffic controller, producing higher mean rewards and lower average queue lengths and cumulative delays. Empirical results further showed reductions of approximately 67-73% in delay and queue length compared to a shallow neural network (SNN), confirming the superior learning capacity and stability of the deep RL approaches.
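The reported improvement is a relative reduction of the metric against the comparison method. As a quick sanity check of the formula, with purely hypothetical numbers (not values from the paper):

```python
def relative_reduction(baseline, achieved):
    """Fractional reduction of a metric (e.g. delay, queue length) vs. a baseline."""
    return (baseline - achieved) / baseline

# Hypothetical: a baseline cumulative delay of 1000 s cut to 300 s
# corresponds to a 70% reduction, inside the reported 67-73% band.
assert abs(relative_reduction(1000.0, 300.0) - 0.70) < 1e-9
```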

Implications and Future Directions

The implications of this work are crucial for urban traffic management, suggesting an efficient method for reducing congestion and improving vehicular flow at intersections—contributing to cost savings, reduced emissions, and improved urban mobility. The pathways for future research include scaling the models to larger networks with multiple intersections and deploying multi-agent learning systems to address coordination challenges within interconnected traffic systems. Enhancements in state representations, such as integrating more complex traffic patterns, could further increase the practical applicability of these models.

This research not only advances traffic management solutions but also contributes to the broader discourse on applying RL and deep learning to complex control environments, showcasing the potential for AI-driven optimization in real-world scenarios.