Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks (2404.16920v1)
Abstract: We study the data packet transmission problem (mmDPT) in dense cell-free millimeter wave (mmWave) networks, i.e., users sending data packet requests to access points (APs) via uplinks and APs transmitting the requested data packets to users via downlinks. Our objective is to minimize the average delay in the system, which arises from the APs' limited service capacity and the unreliable wireless channels between APs and users. This problem can be formulated as a restless multi-armed bandit problem with a fairness constraint (RMAB-F). Since finding the optimal policy for RMAB-F is intractable, existing learning algorithms are computationally expensive and not suitable for practical dynamic dense mmWave networks. In this paper, we propose a structured reinforcement learning (RL) solution for mmDPT that exploits the inherent structure encoded in RMAB-F. To achieve this, we first design a low-complexity and provably asymptotically optimal index policy for RMAB-F. We then leverage this structural information to develop a structured RL algorithm called mmDPT-TS, which provably achieves an \tilde{O}(\sqrt{T}) Bayesian regret. More importantly, mmDPT-TS is computationally efficient and thus amenable to practical implementation, as it fully exploits the structure of the index policy when making decisions. Extensive emulations based on data collected from realistic mmWave networks demonstrate significant gains of mmDPT-TS over existing approaches.
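The abstract describes the algorithmic pattern only at a high level: sample a model from a posterior over the unknown network dynamics, then make scheduling decisions via a structured index policy under a service-capacity budget. The sketch below is a minimal illustration of that pattern under simplifying assumptions, not the authors' mmDPT-TS: it assumes a Bernoulli reward model with a Beta posterior, and the names `num_arms`, `budget`, and `compute_index` are hypothetical placeholders. In the paper, the index would be derived from the structure of the RMAB-F index policy rather than from the raw sampled probabilities.

```python
import numpy as np

# Minimal sketch of structured Thompson sampling for a restless-bandit-style
# scheduling problem: each epoch, sample model parameters from the posterior,
# compute an index for every arm (user) under the sampled model, and activate
# the top-`budget` arms. All quantities here are illustrative stand-ins.

rng = np.random.default_rng(0)

num_arms = 10      # users competing for AP service (hypothetical)
budget = 3         # APs' limited service capacity per decision epoch (hypothetical)
horizon = 1000     # number of decision epochs

# Beta posterior over each arm's unknown success probability. The paper's
# unknowns are channel/transition statistics; we abstract them as Bernoulli.
alpha = np.ones(num_arms)
beta = np.ones(num_arms)

def compute_index(p_hat):
    # Placeholder index: in the paper this would be the structured index
    # obtained from the RMAB-F relaxation; here we simply rank by the
    # sampled success probability.
    return p_hat

true_p = rng.uniform(0.2, 0.9, size=num_arms)  # unknown ground truth (simulation only)

for t in range(horizon):
    # Thompson sampling: draw a model from the current posterior.
    sampled_p = rng.beta(alpha, beta)
    # Structured decision: activate the arms with the largest indices.
    indices = compute_index(sampled_p)
    active = np.argsort(indices)[-budget:]
    # Observe outcomes for the activated arms and update their posteriors.
    outcomes = rng.random(budget) < true_p[active]
    alpha[active] += outcomes
    beta[active] += ~outcomes
```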