- The paper introduces two deep reinforcement learning agents, a policy-gradient agent and a value-function-based agent, to optimize adaptive traffic signal timing.
- Simulation results with the SUMO simulator show reductions in delay and queue length of approximately 67-73% compared to a shallow neural network baseline, along with clear improvements over a traditional fixed-time controller.
- The methodology highlights the potential of deep RL for urban traffic management, offering benefits like cost savings, reduced emissions, and improved vehicular flow.
Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning
The paper presents two deep reinforcement learning (RL) models for traffic light control, combining deep neural network architectures with policy-gradient and value-function-based RL methods. The goal is to optimize traffic flow by adaptively selecting the best signal timing at an intersection.
Reinforcement Learning Framework
The authors employ two types of RL agents to control the traffic lights: a deep policy-gradient agent and a value-function-based agent.
- Policy-Gradient Agent: This agent maps observed states directly to a distribution over actions, learning a policy without estimating the action-value function. A deep network forms hierarchical representations of the state space, which helps mitigate the high variance, instability, and oscillations traditionally seen when training policy-gradient methods (a minimal update sketch follows this list).
- Value-Function-Based Agent: This agent estimates the action-value function and selects control actions with respect to it, managing the exploration-exploitation trade-off. It uses a deep Q-network (DQN), stabilized with a target network and experience replay, which reduce the correlation between sequential observations (see the second sketch below).
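As a concrete illustration of the policy-gradient side, here is a minimal REINFORCE-style update in PyTorch. The flattened 64x64 input size, network widths, discount factor, and normalization trick are assumptions made for the sketch, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

OBS_DIM = 64 * 64  # assumed size of a flattened intersection snapshot

class PolicyNet(nn.Module):
    """Deep network mapping an observed state to a distribution over actions."""
    def __init__(self, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, states):
        return torch.distributions.Categorical(logits=self.net(states))

def reinforce_update(policy, optimizer, states, actions, rewards, gamma=0.99):
    """One policy-gradient step over a finished episode."""
    # Discounted returns G_t, accumulated backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Normalizing returns is a standard variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    log_probs = policy(states).log_prob(actions)
    loss = -(log_probs * returns).mean()  # gradient ascent on expected return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```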
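For the value-based side, a sketch of a DQN-style update with experience replay and a periodically synchronized target network. Buffer capacity, batch size, learning rate, and sync schedule are illustrative assumptions, and transitions are assumed to be stored as tensors.

```python
import copy
import random
from collections import deque
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM = 64 * 64  # assumed flattened snapshot size

q_net = nn.Sequential(nn.Flatten(), nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, 2))
target_net = copy.deepcopy(q_net)  # frozen copy, synced periodically
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # experience replay buffer (assumed capacity)

def dqn_update(batch_size=32, gamma=0.99):
    """One Q-learning step on a minibatch of past transitions."""
    if len(replay) < batch_size:
        return
    # Uniform sampling breaks the correlation between sequential observations.
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target from the slowly moving target network.
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    """Hard update of the target network (call every N training steps)."""
    target_net.load_state_dict(q_net.state_dict())
```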
Methodology and Simulations
The paper uses the SUMO traffic simulator to test the proposed models on a four-way intersection, where the actions switch the right-of-way between the North/South and East/West directions. Snapshot images taken from the simulator serve as the real-time input and are fed into the convolutional layers of a deep neural network, allowing the model to capture crucial visual information such as vehicle positions and movement. This image-based state representation is an innovative step beyond traditional sensor-based methods; the sketches below illustrate the simulation loop and the encoder.
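A rough sketch of the simulation side through SUMO's TraCI Python API might look as follows. The config file, traffic-light ID, lane IDs, phase indices, and the queue-based reward are all placeholders; the paper's exact action and reward definitions may differ.

```python
import traci  # SUMO's Python API (requires SUMO's tools/ dir on PYTHONPATH)

# Hypothetical IDs; replace with the ones defined in your SUMO network files.
TLS_ID = "center"
INCOMING_LANES = ["n_in_0", "s_in_0", "e_in_0", "w_in_0"]
NS_GREEN, EW_GREEN = 0, 2  # assumed phase indices in the tls program

def total_queue():
    """Halting vehicles on all incoming lanes: a simple congestion measure."""
    return sum(traci.lane.getLastStepHaltingNumber(l) for l in INCOMING_LANES)

traci.start(["sumo", "-c", "intersection.sumocfg"])  # placeholder config
try:
    for step in range(3600):
        # Stand-in for the learned policy: alternate every 30 steps.
        phase = NS_GREEN if (step // 30) % 2 == 0 else EW_GREEN
        traci.trafficlight.setPhase(TLS_ID, phase)
        traci.simulationStep()
        reward = -total_queue()  # e.g., penalize queued vehicles
finally:
    traci.close()
```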
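The image snapshots can then be encoded by a small convolutional network. The sketch below assumes single-channel 64x64 snapshots and two actions; layer sizes are illustrative rather than the paper's reported architecture.

```python
import torch
import torch.nn as nn

class SnapshotEncoder(nn.Module):
    """CNN mapping an intersection snapshot to one score per signal action."""
    def __init__(self, n_actions: int = 2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),   # 64 -> 15
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 15 -> 6
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):  # x: (batch, 1, 64, 64) grayscale snapshots
        return self.head(self.conv(x))

scores = SnapshotEncoder()(torch.zeros(1, 1, 64, 64))  # shape (1, 2)
```

The convolutional stack plays the role of a learned detector: rather than relying on hand-crafted loop-detector counts, vehicle positions and densities are extracted directly from pixels.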
Results and Evaluation
The proposed adaptive signal control methods significantly outperformed a baseline fixed-time traffic controller, producing higher mean rewards and lower average queue lengths and cumulative delays. Empirical results showed reductions of approximately 67-73% in delay and queue length compared to a shallow neural network (SNN) agent, confirming the superior learning capability and stability of the deep RL approaches.
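For concreteness, the reported percentages correspond to the usual relative-reduction calculation; the numbers below are hypothetical, chosen only to land in the reported range.

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Percentage reduction of a metric (delay, queue length) vs. a baseline."""
    return 100.0 * (baseline - model) / baseline

# Hypothetical values: a cumulative delay of 1000 s under the comparison
# controller falling to 300 s under the learned policy is a 70% reduction,
# in the same sense as the reported 67-73% figures.
print(relative_reduction(1000.0, 300.0))  # -> 70.0
```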
Implications and Future Directions
This work has important implications for urban traffic management, suggesting an efficient method for reducing congestion and improving vehicular flow at intersections, with attendant cost savings, reduced emissions, and improved urban mobility. Future research directions include scaling the models to larger networks with multiple intersections and deploying multi-agent learning systems to address coordination challenges within interconnected traffic systems. Richer state representations, such as integrating more complex traffic patterns, could further increase the practical applicability of these models.
This research not only advances traffic management solutions but also contributes to the broader discourse on applying RL and deep learning to complex control environments, showcasing the potential for AI-driven optimization in real-world scenarios.