On Joint Convergence of Traffic State and Weight Vector in Learning-Based Dynamic Routing with Value Function Approximation (2404.09188v1)

Published 14 Apr 2024 in eess.SY and cs.SY

Abstract: Learning-based approaches are increasingly popular for traffic control problems. However, these approaches are typically applied as black boxes with limited theoretical guarantees and interpretability. In this paper, we consider the theory of dynamic routing over parallel servers, a representative traffic control task, using a semi-gradient on-policy control algorithm, a representative reinforcement learning method. We consider a linear value function approximation on an infinite state space; a Lyapunov function is also derived from the approximator. In particular, the structure of the approximator naturally allows idling policies, which is an interesting and useful advantage over existing dynamic routing schemes. We show that the convergence of the approximation weights is coupled with the convergence of the traffic state. We show that if the system is stabilizable, then (i) the weight vector converges to a bounded region, and (ii) the traffic state is bounded in the mean. We also empirically show that the proposed algorithm is computationally efficient with an insignificant optimality gap.
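
The abstract describes an on-policy semi-gradient control algorithm (SARSA-style) with a linear value-function approximation over an infinite traffic state space. Below is a minimal sketch of that general idea for routing jobs to two parallel servers, including an explicit idling action. The arrival and service rates, feature map (queue lengths plus a bias), step size, and epsilon-greedy exploration are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Minimal sketch: semi-gradient SARSA with a linear value-function
# approximator for routing arrivals to parallel servers.
# All numerical choices below are illustrative assumptions.

rng = np.random.default_rng(0)
n_servers = 2
mu = np.array([1.0, 0.8])          # assumed service rates
lam = 1.2                           # assumed arrival rate (stabilizable: lam < sum(mu))
alpha, eps, gamma = 0.01, 0.1, 0.99  # step size, exploration rate, discount

def features(x, a):
    """Assumed linear features: queue lengths after applying action a, plus a bias."""
    x_next = x.copy()
    if a < n_servers:               # a < n_servers routes the arrival to server a; a == n_servers idles
        x_next[a] += 1
    return np.append(x_next.astype(float), 1.0)

w = np.zeros(n_servers + 1)         # weight vector being learned

def q_value(x, a):
    return features(x, a) @ w

def policy(x):
    """Epsilon-greedy with respect to the (cost-minimizing) linear Q estimate."""
    if rng.random() < eps:
        return int(rng.integers(n_servers + 1))
    return int(np.argmin([q_value(x, a) for a in range(n_servers + 1)]))

x = np.zeros(n_servers, dtype=int)  # traffic state: queue lengths
a = policy(x)
for t in range(10_000):
    # Simulate one uniformized transition: either an arrival or a service completion.
    total = lam + mu.sum()
    u = rng.random() * total
    x_next = x.copy()
    if u < lam:
        if a < n_servers:
            x_next[a] += 1          # route the arriving job to server a
        # else: idling action, the arrival is not admitted (illustrative choice)
    else:
        k = int(np.searchsorted(lam + np.cumsum(mu), u))
        x_next[k] = max(x_next[k] - 1, 0)   # server k completes a job
    cost = x_next.sum()             # per-step cost: total queue length
    a_next = policy(x_next)
    # Semi-gradient SARSA update on the linear approximator.
    td_err = cost + gamma * q_value(x_next, a_next) - q_value(x, a)
    w += alpha * td_err * features(x, a)
    x, a = x_next, a_next
```

As a sketch, this only illustrates the coupling the paper analyzes: the weight update depends on the visited traffic states, while the routing (and idling) decisions that drive the traffic state depend on the current weights.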

