
Zero-Waiting Load Balancing with Heterogeneous Servers in Heavy Traffic (2509.23918v1)

Published 28 Sep 2025 in math.PR

Abstract: We study the steady-state delay performance of load balancing in large-scale systems with heterogeneous servers in the heavy-traffic regimes. The system consists of $N$ servers, each with a local buffer of size $b-1$, serving jobs in the first-in-first-out (FIFO) order. Jobs arrive according to a Poisson process with rate $\lambda N$, where $\lambda = 1 - N^{-\alpha}$ for any $\alpha \in (0,1)$. Service times are assumed to be exponentially distributed with fully heterogeneous rates, where the service rate of each server can differ and may scale with the system size $N$. We study a queue length aware and service rate aware load balancing policy, Join-the-Fastest-Shortest-Queue (JFSQ), and demonstrate that it achieves asymptotic zero waiting time and probability under the heavy traffic regimes, including both the Sub-Halfin-Whitt ($\alpha \in (0,0.5)$) and Super-Halfin-Whitt ($\alpha \in [0.5,1)$) regimes. The performance bounds of waiting time and probability explicitly capture the convergence rate w.r.t. the system size $N$ and show the negative effect of server heterogeneity. Our analysis builds on the general framework of Stein's method with iterative state-space peeling, where we design a sequence of Lyapunov functions to analyze the high-dimensional heterogeneous system without assuming exchangeability and monotonicity. Our analysis shows that JFSQ efficiently utilizes servers with higher capacities, and the steady-state system can be coupled with a single-server queue via Stein's method. To the best of our knowledge, this is the first work to establish delay performance bounds of a load-balancing system with size $N$ and fully heterogeneous servers in heavy traffic.
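To make the model in the abstract concrete, here is a minimal simulation sketch in Python. It is not the authors' code, and several details are assumptions because the abstract does not spell them out: JFSQ is read as "among servers with room, pick the shortest queue and break ties by the highest service rate"; a job that finds every server full is dropped; and the service rates are normalized so that they sum to $N$. The names `jfsq_pick` and `simulate` are hypothetical helpers introduced only for illustration.

```python
import random


def jfsq_pick(queue_lengths, service_rates, b):
    """Pick a server under an ASSUMED reading of JFSQ.

    queue_lengths[i] counts all jobs at server i, including the one in
    service, so each server holds at most b jobs (buffer of size b - 1).
    Returns the index of the chosen server, or None if every server is full.
    """
    candidates = [i for i, q in enumerate(queue_lengths) if q < b]
    if not candidates:
        return None
    # Shortest queue first; among equally short queues, the fastest server.
    return min(candidates, key=lambda i: (queue_lengths[i], -service_rates[i]))


def simulate(service_rates, alpha, b, num_events, seed=0):
    """Simulate the jump chain of the Markovian system described above.

    Arrivals are Poisson with rate lambda * N, where lambda = 1 - N**(-alpha).
    Service times are exponential with the given per-server rates.
    """
    rng = random.Random(seed)
    n = len(service_rates)
    arrival_rate = (1.0 - n ** (-alpha)) * n
    queues = [0] * n
    arrivals = waited = blocked = 0

    for _ in range(num_events):
        busy = [i for i in range(n) if queues[i] > 0]
        busy_rates = [service_rates[i] for i in busy]
        total_rate = arrival_rate + sum(busy_rates)

        # The next event is an arrival with probability arrival_rate / total_rate.
        if rng.random() * total_rate < arrival_rate:
            arrivals += 1
            target = jfsq_pick(queues, service_rates, b)
            if target is None:
                blocked += 1  # assumption: jobs finding all buffers full are dropped
            else:
                if queues[target] > 0:
                    waited += 1  # the job joins a busy server and must wait
                queues[target] += 1
        else:
            # A departure occurs at a busy server chosen in proportion to its rate.
            leaving = rng.choices(busy, weights=busy_rates)[0]
            queues[leaving] -= 1

    return {"arrivals": arrivals, "waited": waited, "blocked": blocked}


if __name__ == "__main__":
    # Hypothetical example: 100 servers, half fast and half slow, rates sum to N.
    rates = [1.5] * 50 + [0.5] * 50
    stats = simulate(rates, alpha=0.3, b=3, num_events=200_000)
    print("empirical waiting probability:", stats["waited"] / stats["arrivals"])
```

The ratio `waited / arrivals` is an empirical estimate of the waiting probability. Rerunning with larger $N$ (and rates rescaled to sum to $N$) gives a rough numerical feel for the asymptotic behavior the abstract claims, though a simulation of this kind does not verify the stated bounds.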


Authors (2)
