Cascade Reinforcement Learning with State Space Factorization for O-RAN-based Traffic Steering (2312.01970v3)
Abstract: The Open Radio Access Network (O-RAN) architecture enables intelligent and automated optimization of the RAN through applications deployed on the RAN Intelligent Controller (RIC) platform, providing capabilities beyond those of traditional RAN solutions. Within this paradigm, Traffic Steering (TS) emerges as a pivotal RIC application that optimizes cell-level mobility settings in near-real-time to significantly improve network spectral efficiency. In this paper, we design a novel TS algorithm based on a Cascade Reinforcement Learning (CaRL) framework. We propose state space factorization and policy decomposition to reduce the need for large models and well-labeled datasets. For each sub-state space, an RL sub-policy is trained to learn an optimized mapping onto the action space. To apply CaRL to new network regions, we propose a knowledge transfer approach that initializes a new sub-policy from the knowledge learned by already-trained policies. To evaluate CaRL, we build a data-driven and scalable RIC digital twin (DT) modeled on real-world data from a tier-1 mobile operator in the US, including network configuration, user geo-distribution, and traffic demand. We evaluate CaRL on two DT scenarios representing network clusters in two different cities and compare its performance with the business-as-usual (BAU) policy and with competing heuristic and Q-table optimization approaches. Benchmarking results show that CaRL performs the best, improving the average cluster-aggregated downlink throughput over the BAU policy by 24% and 18% in the two scenarios, respectively.
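To make the core idea concrete, here is a minimal Python sketch of state space factorization with per-sub-space policies and warm-start knowledge transfer. It is an illustrative simplification, not the paper's implementation: the class names (`SubPolicy`, `CascadePolicy`), region labels, tabular Q-learning sub-policies, and the copy-based transfer step are all assumptions made for this example; the actual CaRL sub-policies and transfer scheme are described in the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)


class SubPolicy:
    """Tabular Q-learning sub-policy for one factorized sub-state space.

    state_bins: number of discretized states in this sub-space (assumed here);
    n_actions: size of the shared action space (e.g., candidate mobility settings).
    """

    def __init__(self, state_bins, n_actions, lr=0.1, gamma=0.9, eps=0.1):
        self.q = np.zeros((state_bins, n_actions))
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, s):
        # Epsilon-greedy action selection over the shared action space.
        if rng.random() < self.eps:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[s]))

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update within this sub-space.
        td_target = r + self.gamma * np.max(self.q[s_next])
        self.q[s, a] += self.lr * (td_target - self.q[s, a])


class CascadePolicy:
    """Routes each observation to the sub-policy of its network region."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.sub_policies = {}

    def add_region(self, region, state_bins, init_from=None):
        p = SubPolicy(state_bins, self.n_actions)
        if init_from is not None:
            # Knowledge transfer (simplified): warm-start the new sub-policy
            # from a trained region's Q-table instead of learning from scratch.
            src = self.sub_policies[init_from]
            n = min(state_bins, src.q.shape[0])
            p.q[:n] = src.q[:n]
        self.sub_policies[region] = p

    def act(self, region, s):
        return self.sub_policies[region].act(s)

    def update(self, region, s, a, r, s_next):
        self.sub_policies[region].update(s, a, r, s_next)


# Toy usage: two hypothetical clusters, each discretized into 8 load levels,
# sharing an action space of 4 candidate mobility settings.
cascade = CascadePolicy(n_actions=4)
cascade.add_region("cluster_A", state_bins=8)
# A new region is warm-started from cluster_A's learned table.
cascade.add_region("cluster_B", state_bins=8, init_from="cluster_A")

s = 3
a = cascade.act("cluster_A", s)
cascade.update("cluster_A", s, a, r=1.0, s_next=4)
```

The design point the sketch tries to capture is that each sub-state space keeps its own small policy rather than one monolithic model over the full state space, and a new region reuses an existing sub-policy as its starting point rather than training from zero.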