Multi-Objective Optimization Using Adaptive Distributed Reinforcement Learning (2403.08879v1)
Abstract: The Intelligent Transportation System (ITS) environment is known to be dynamic and distributed, where participants (vehicle users, operators, etc.) have multiple, changing, and possibly conflicting objectives. Although Reinforcement Learning (RL) algorithms are commonly applied to optimize ITS applications such as resource management and offloading, most RL algorithms focus on a single objective. In many situations, converting a multi-objective problem into a single-objective one is impossible, intractable, or insufficient, making such RL algorithms inapplicable. We propose a multi-objective, multi-agent reinforcement learning (MARL) algorithm with high learning efficiency and low computational requirements, which automatically triggers adaptive few-shot learning in a dynamic, distributed, and noisy environment with sparse and delayed rewards. We test our algorithm in an ITS environment with edge cloud computing. Empirical results show that the algorithm adapts quickly to new environments and outperforms the state-of-the-art benchmark on all individual and system metrics. Our algorithm also addresses various practical concerns with its modularized and asynchronous online training method. In addition to the cloud simulation, we test our algorithm on a single-board computer and show that it can perform inference within 6 milliseconds.
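The abstract names two ideas without detailing them: keeping the reward objectives separate rather than collapsing them into a fixed scalar up front, and automatically triggering few-shot re-adaptation when the environment drifts. As a rough illustration only (the paper's actual method is not specified in this excerpt), the following is a minimal tabular sketch of both ideas; the class name, the EMA-based drift heuristic, and all thresholds are assumptions made for the example, not the authors' algorithm.

```python
import numpy as np

class MultiObjectiveQLearner:
    """Illustrative tabular multi-objective Q-learner with drift-triggered
    few-shot adaptation. A sketch, not the paper's implementation."""

    def __init__(self, n_states, n_actions, n_objectives,
                 alpha=0.1, gamma=0.95, drift_threshold=0.5):
        # One Q-vector per (state, action): objectives stay separate.
        self.q = np.zeros((n_states, n_actions, n_objectives))
        self.base_alpha = self.alpha = alpha
        self.gamma = gamma
        self.drift_threshold = drift_threshold  # assumed heuristic cutoff
        self.reward_ema = np.zeros(n_objectives)  # running reward average

    def act(self, state, weights):
        # Scalarize only at decision time, with preference weights that
        # may change as objectives or the environment change.
        return int(np.argmax(self.q[state] @ weights))

    def update(self, s, a, reward_vec, s_next, weights):
        # Component-wise Bellman update on the Q-vector; the greedy
        # successor action is chosen under the current weights.
        reward_vec = np.asarray(reward_vec, dtype=float)
        a_next = self.act(s_next, weights)
        target = reward_vec + self.gamma * self.q[s_next, a_next]
        self.q[s, a] += self.alpha * (target - self.q[s, a])

        # Crude drift detector: a reward far from its running average
        # triggers a short few-shot phase with a boosted learning rate,
        # which then decays back toward the base rate.
        if np.linalg.norm(reward_vec - self.reward_ema) > self.drift_threshold:
            self.alpha = min(0.9, self.alpha * 3.0)
        else:
            self.alpha = max(self.base_alpha, self.alpha * 0.99)
        self.reward_ema = 0.99 * self.reward_ema + 0.01 * reward_vec
```

Deferring scalarization to action-selection time is what sidesteps the fixed-weight conversion problem the abstract points to; a real implementation along the paper's lines would replace the table with function approximators, the EMA heuristic with a principled change detector, and run the per-agent updates asynchronously.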