The RESET and MARC Techniques, with Application to Multiserver-Job Analysis (2310.01621v1)

Published 2 Oct 2023 in cs.PF

Abstract: Multiserver-job (MSJ) systems, where jobs need to run concurrently across many servers, are increasingly common in practice. The default service ordering in many settings is First-Come First-Served (FCFS) service. Virtually all theoretical work on MSJ FCFS models focuses on characterizing the stability region, with almost nothing known about mean response time. We derive the first explicit characterization of mean response time in the MSJ FCFS system. Our formula characterizes mean response time up to an additive constant, which becomes negligible as arrival rate approaches throughput, and allows for general phase-type job durations. We derive our result by utilizing two key techniques: REduction to Saturated for Expected Time (RESET) and MArkovian Relative Completions (MARC). Using our novel RESET technique, we reduce the problem of characterizing mean response time in the MSJ FCFS system to an M/M/1 with Markovian service rate (MMSR). The Markov chain controlling the service rate is based on the saturated system, a simpler closed system which is far more analytically tractable. Unfortunately, the MMSR has no explicit characterization of mean response time. We therefore use our novel MARC technique to give the first explicit characterization of mean response time in the MMSR, again up to constant additive error. We specifically introduce the concept of "relative completions," which is the cornerstone of our MARC technique.
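The abstract names "relative completions" as the cornerstone of MARC but does not define them. The sketch below is a hedged illustration, not the authors' construction. It assumes the MMSR's service rate is driven by a finite, irreducible continuous-time Markov chain with generator Q, where the chain completes jobs at rate mu[s] in state s. Under that assumption it computes the stationary distribution pi, the long-run completion rate mu_bar = sum_s pi[s] * mu[s] (the throughput that the arrival rate approaches in the abstract's limiting regime), and a relative-completions vector Delta. Delta is taken as the solution of the Poisson equation Q Delta = mu_bar * 1 - mu, so Delta[s] is the expected excess of completions when starting in state s. The normalization pi . Delta = 0 is an assumption. Only differences Delta[i] - Delta[j] carry meaning, and the paper may normalize differently. The function names and the two-state example are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def stationary(q: Matrix) -> List[float]:
    """Stationary distribution pi of an irreducible CTMC: pi Q = 0, sum(pi) = 1."""
    n = len(q)
    # Transpose Q so pi is the unknown column vector, then replace the last
    # (redundant) balance equation with the normalization sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(a, b)


def relative_completions(q: Matrix, mu: List[float]) -> Tuple[List[float], float, List[float]]:
    """Return (pi, mu_bar, Delta) for a Markov-modulated completion rate.

    Delta solves the Poisson equation Q Delta = mu_bar * 1 - mu. The
    normalization pi . Delta = 0 is an assumption; only differences matter.
    """
    pi = stationary(q)
    mu_bar = sum(p * r for p, r in zip(pi, mu))
    # One row of Q is redundant, so swap it for the normalization pi . Delta = 0.
    a = [row[:] for row in q]
    a[-1] = pi[:]
    b = [mu_bar - r for r in mu[:-1]] + [0.0]
    return pi, mu_bar, solve_linear(a, b)


# Hypothetical 2-state chain: a "fast" state completing at rate 2 and a
# "blocked" state completing at rate 0, switching at rate 1 in each direction.
q = [[-1.0, 1.0], [1.0, -1.0]]
pi, mu_bar, delta = relative_completions(q, [2.0, 0.0])
print(pi, mu_bar, delta)  # prints [0.5, 0.5] 1.0 [0.5, -0.5]
```

In the example, starting in the fast state yields one more expected completion in the long run than starting in the blocked state (Delta[0] - Delta[1] = 1). This matches the two-state closed form (mu0 - mu1) / (alpha + beta) = 2 / 2. Plain Gaussian elimination keeps the sketch dependency-free. A real analysis would more likely use a numerical linear-algebra library.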
