Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems (2402.01147v2)

Published 2 Feb 2024 in cs.LG and cs.PF

Abstract: We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike in homogeneous systems, a threshold policy, which routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. An optimal policy for the multi-server system, however, is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient based algorithm with a low-dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and, despite the low-dimensional parameterization, prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ~30% over the greedy policy that routes to the fastest available server.
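To make the soft threshold idea concrete, here is a minimal Python sketch of one plausible parameterization. The sigmoid form, the `beta` temperature, and the per-server `thresholds` vector are illustrative assumptions rather than ACHQ's exact parameterization, which the abstract does not spell out.

```python
import numpy as np

def soft_threshold_probs(queue_length, thresholds, beta=1.0):
    """Routing probabilities over servers for the job at the head of the queue.

    Servers are ordered from fastest to slowest; thresholds[k] is the backlog
    at which server k starts to look attractive. The sigmoid turns each hard
    threshold into a smooth score, so the policy is differentiable in its
    low-dimensional parameters and amenable to policy-gradient updates.
    """
    scores = 1.0 / (1.0 + np.exp(-beta * (queue_length - np.asarray(thresholds, dtype=float))))
    return scores / scores.sum()

# Hypothetical example: three servers, the fastest always eligible
# (threshold 0), the slower ones favoured only once the queue builds up.
probs = soft_threshold_probs(queue_length=7, thresholds=[0.0, 5.0, 10.0])
server = np.random.default_rng(0).choice(len(probs), p=probs)
print(probs, server)
```

As `beta` grows, this softened rule approaches a hard threshold policy like the one known to be optimal in the two-server case; the quantities learned by a policy-gradient method would then be the few threshold-like parameters rather than a per-state table, which is what keeps the parameterization low-dimensional.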
