MetaVIM: Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control (2101.00746v5)
Abstract: Traffic signal control aims to coordinate traffic signals across intersections to improve the traffic efficiency of a district or a city. Deep reinforcement learning (RL), in which each traffic signal is regarded as an agent, has recently been applied to traffic signal control and has demonstrated promising performance. However, several challenges still limit its large-scale application in the real world. To make a policy learned in a training scenario generalizable to new, unseen scenarios, a novel Meta Variationally Intrinsic Motivated (MetaVIM) RL method is proposed to learn a decentralized policy for each intersection that considers neighbor information in a latent way. Specifically, policy learning is formulated as a meta-learning problem over a set of related tasks, where each task corresponds to traffic signal control at an intersection whose neighbors are regarded as the unobserved part of the state. A learned latent variable is then introduced to represent task-specific information and is incorporated into the policy. In addition, to stabilize policy learning, a novel intrinsic reward is designed to encourage each agent's received rewards and observation transitions to be predictable conditioned only on its own history. Extensive experiments conducted on CityFlow demonstrate that the proposed method substantially outperforms existing approaches and shows superior generalizability.
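The following is a minimal sketch of the predictability-based intrinsic reward idea described in the abstract, written in PyTorch under simplifying assumptions: the agent has an own-history encoding `h_t` (e.g., from a recurrent network over its past observations and actions), and small learned heads predict the received reward and the next observation from that history. The module and parameter names (`PredictabilityBonus`, `reward_head`, `obs_head`, `beta`) are illustrative placeholders, not the paper's actual components.

```python
import torch
import torch.nn as nn

class PredictabilityBonus(nn.Module):
    """Sketch: intrinsic bonus that is larger when the agent's reward and
    observation transition are predictable from its own history alone."""

    def __init__(self, hist_dim, act_dim, obs_dim, beta=0.1):
        super().__init__()
        self.reward_head = nn.Linear(hist_dim + act_dim, 1)      # predicts r_t
        self.obs_head = nn.Linear(hist_dim + act_dim, obs_dim)   # predicts o_{t+1}
        self.beta = beta                                          # bonus scale (assumed)

    def forward(self, h_t, a_t, r_t, o_next):
        x = torch.cat([h_t, a_t], dim=-1)
        r_err = (self.reward_head(x).squeeze(-1) - r_t).pow(2)
        o_err = (self.obs_head(x) - o_next).pow(2).mean(dim=-1)
        # Negative prediction error: higher when the agent's own history suffices.
        intrinsic = -self.beta * (r_err + o_err)
        # Return the bonus for the agent and the error for training the predictors.
        return intrinsic, r_err + o_err
```

In use, the intrinsic bonus would simply be added to the environment reward for each agent at each step, while the summed prediction error is minimized to train the two heads; this is a sketch of the general mechanism rather than the paper's exact formulation.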