Generalized Multi-Objective Reinforcement Learning with Envelope Updates in URLLC-enabled Vehicular Networks (2405.11331v3)

Published 18 May 2024 in cs.LG, cs.AI, and cs.NI

Abstract: We develop a novel multi-objective reinforcement learning (MORL) framework to jointly optimize wireless network selection and autonomous driving policies in a multi-band vehicular network operating on conventional sub-6 GHz spectrum and terahertz (THz) frequencies. The proposed framework is designed to (i) maximize traffic flow and minimize collisions by controlling the vehicle's motion dynamics (i.e., speed and acceleration), and (ii) enhance ultra-reliable low-latency communication (URLLC) while minimizing handoffs (HOs). We cast this problem as a multi-objective Markov decision process (MOMDP) and develop solutions for both predefined and unknown preferences over the conflicting objectives. Specifically, we develop a novel envelope MORL solution that learns policies addressing multiple objectives whose preferences are unknown to the agent. While this approach reduces reliance on scalar rewards, maintaining policy effectiveness across different preferences is a challenge. To address this, we apply a generalized version of the Bellman equation and optimize the convex envelope of multi-objective Q-values to learn a unified parametric representation capable of generating optimal policies across all possible preference configurations. Following an initial learning phase, our agent can execute optimal policies under any specified preference or infer preferences from minimal data samples. Numerical results validate the efficacy of the envelope-based MORL solution and reveal interesting insights into the interdependency of vehicle motion dynamics, HOs, and the communication data rate. The proposed policies enable autonomous vehicles (AVs) to adopt safe driving behaviors with improved connectivity.
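
The envelope update at the heart of this framework can be sketched as follows. The snippet below is a minimal, illustrative sketch (not the authors' implementation) of the generalized Bellman target used in envelope MORL: vector-valued Q-values Q(s, a, w) are conditioned on a preference vector w, scalarized with each target preference, and backed up through the convex-envelope maximum taken jointly over actions and candidate preferences. The network call signature, tensor shapes, and the helper name envelope_target are assumptions made for this illustration.

    import torch

    def envelope_target(q_net, rewards, next_states, prefs, gamma=0.99):
        """rewards: (B, M) vector rewards; next_states: (B, S) state batch;
        prefs: (P, M) sampled preference vectors.
        Returns a (B, P, M) vector-valued envelope Bellman target."""
        B, M = rewards.shape
        P = prefs.shape[0]
        with torch.no_grad():
            # Assumed q_net signature: returns Q(s', a', w_j) for every action
            # and every candidate preference, shape (B, P, A, M).
            q_next = q_net(next_states, prefs)
            A = q_next.shape[2]
            # Scalarize with every target preference w_i, then take the
            # envelope maximum jointly over actions a' and preferences w_j.
            util = torch.einsum("im,bjam->bija", prefs, q_next)  # (B, P, P, A)
            idx = util.reshape(B, P, -1).argmax(dim=-1)           # (B, P)
            q_all = q_next.reshape(B, P * A, M)                   # flatten (w_j, a') pairs
            q_best = q_all[torch.arange(B).unsqueeze(1), idx]     # (B, P, M)
            # One-step generalized (vector-valued) Bellman backup.
            return rewards.unsqueeze(1) + gamma * q_best

In the full envelope algorithm this target typically feeds both a vector-valued regression loss and a scalarized (w^T Q) auxiliary loss blended via a homotopy schedule; only the backup step is shown here.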

