Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report (2404.04106v1)
Abstract: Deep Reinforcement Learning (DRL) offers a powerful approach to training neural network control policies for stochastic queuing networks (SQNs). However, traditional DRL methods rely on offline simulations or static datasets, which limits their real-world application in SQN control. This work proposes Online Deep Reinforcement Learning-based Controls (ODRLC) as an alternative, in which an intelligent agent interacts directly with a real environment and learns an optimal control policy from these online interactions. SQNs pose a challenge for ODRLC because the queues within the network are unbounded, which results in an unbounded state space. An unbounded state space is particularly challenging for neural network policies, as neural networks are notoriously poor at extrapolating to unseen states. To address this challenge, we propose an intervention-assisted framework that leverages strategic interventions from known stable policies to ensure that the queue sizes remain bounded. This framework combines the learning power of neural networks with the guaranteed stability of classical control policies for SQNs. We introduce a method to design these intervention-assisted policies so that they ensure strong stability of the network. Furthermore, we extend foundational DRL theorems to intervention-assisted policies and develop two practical algorithms specifically for ODRLC of SQNs. Finally, we demonstrate through experiments that our proposed algorithms outperform both classical control approaches and prior ODRLC algorithms.
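The intervention idea can be illustrated with a small sketch. The rule below is an assumption for illustration, not the paper's construction: the learned policy acts while every queue is at or below a hypothetical `threshold`, and a classical stable policy takes over once any queue exceeds it. MaxWeight (serve the queue with the largest product of queue length and service rate) is used here only as one example of a classical throughput-optimal policy. The single-server model, the names `max_weight_action` and `make_intervention_policy`, and the threshold test are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

Action = int
State = Sequence[int]  # hypothetical encoding: one queue length per queue


def max_weight_action(queues: State, service_rates: Sequence[float]) -> Action:
    """MaxWeight for an assumed single server that serves one queue per step:
    pick the queue with the largest (queue length x service rate)."""
    return max(range(len(queues)), key=lambda i: queues[i] * service_rates[i])


def make_intervention_policy(
    learned_policy: Callable[[State], Action],
    service_rates: Sequence[float],
    threshold: int,
) -> Callable[[State], Tuple[Action, bool]]:
    """Hypothetical intervention rule: the learned policy acts while every
    queue is <= threshold; otherwise the stable MaxWeight policy intervenes.
    Returns (action, intervened) so a training loop can tell the two apart."""

    def policy(queues: State) -> Tuple[Action, bool]:
        if max(queues) > threshold:
            # Outside the assumed learning region: fall back to the stable policy.
            return max_weight_action(queues, service_rates), True
        # Inside the learning region: the (neural) learned policy chooses.
        return learned_policy(queues), False

    return policy


if __name__ == "__main__":
    # Stand-in for a neural policy: always serve queue 0.
    policy = make_intervention_policy(lambda q: 0, [1.0, 0.5, 2.0], threshold=10)
    print(policy([3, 4, 5]))   # learner acts
    print(policy([3, 30, 5]))  # queue 1 exceeds the threshold, so MaxWeight acts
```

With the stand-in learned policy `lambda q: 0`, the state `[3, 4, 5]` stays inside the learning region, so the learner's action is used. In the state `[3, 30, 5]`, queue 1 exceeds the threshold and MaxWeight serves it, since its weight 30 x 0.5 = 15 is the largest. Returning the intervention flag is a design choice of this sketch. The abstract does not specify how the paper's algorithms treat transitions generated by the intervening policy.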