JFSQ: Load Balancing for Heterogeneous Systems
- JFSQ is a load balancing policy that routes each job to a server with the shortest queue, breaking ties in favor of the highest service rate, to optimize performance in heterogeneous systems.
- Analytical studies using Lyapunov functions and Stein’s method show that JFSQ achieves near-zero waiting time and tightly controlled queue lengths in heavy-traffic regimes.
- JFSQ outperforms traditional queue assignment protocols by balancing load and maximizing resource utilization, making it ideal for large-scale systems with diverse server capabilities.
The Join-the-Fastest-Shortest-Queue (JFSQ) policy is a queue assignment protocol where each arriving job is routed to a server with the shortest queue length, breaking ties by favoring the server with the highest service rate. The JFSQ discipline directly generalizes the classic Join-the-Shortest-Queue (JSQ) paradigm to account for server heterogeneity, aiming to optimize delay and system utilization in large-scale, heterogeneous queueing systems.
1. Definition and Queueing Model
In a JFSQ system, servers may differ in their processing capabilities—formally, each server i has service rate μ_i and maintains a local buffer of size b. Job arrivals are typically modeled as a Poisson process with rate Nλ, where N is the number of servers and λ the mean load per server. Upon arrival, the dispatcher observes the current queue lengths Q_i(t) (for i = 1, …, N) and assigns the new job according to the following rule: route the job to a server attaining the minimal queue length, breaking ties in favor of the highest service rate μ_i.
If multiple servers share the minimal queue length and the maximal service rate, the assignment is resolved uniformly at random among those servers. The system operates in heavy-traffic regimes, commonly parametrized as λ = 1 − N^(−α) for α ∈ (0, 1), with α < 1/2 and α > 1/2 distinguishing the Sub-Halfin-Whitt and Super-Halfin-Whitt limits (Liu et al., 28 Sep 2025).
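The assignment rule above can be sketched in a few lines of Python; the names below (`Server`, `jfsq_route`) are illustrative, not from any cited implementation.

```python
# Minimal sketch of the JFSQ assignment rule: shortest queue first,
# ties broken by highest service rate, remaining ties broken at random.
import random
from dataclasses import dataclass

@dataclass
class Server:
    rate: float         # service rate mu_i
    queue_len: int = 0  # current number of jobs at this server

def jfsq_route(servers):
    """Return the index of the server chosen by JFSQ."""
    min_q = min(s.queue_len for s in servers)
    shortest = [i for i, s in enumerate(servers) if s.queue_len == min_q]
    max_rate = max(servers[i].rate for i in shortest)
    fastest = [i for i in shortest if servers[i].rate == max_rate]
    return random.choice(fastest)

servers = [Server(rate=1.0, queue_len=2),
           Server(rate=2.0, queue_len=1),
           Server(rate=3.0, queue_len=1)]
print(jfsq_route(servers))  # → 2 (shortest queue, fastest among the tied)
```

Note that with homogeneous rates the speed-aware filter is a no-op, so the rule degenerates to classic JSQ with random tie-breaking, consistent with the reduction discussed in Section 5.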
2. Delay and Heavy-Traffic Performance
The JFSQ policy is analytically shown to achieve asymptotically zero waiting probability and expected waiting time as N → ∞, under both Sub-Halfin-Whitt and Super-Halfin-Whitt scaling (Liu et al., 28 Sep 2025). The key performance bounds, which hold uniformly in heavy-traffic regimes, show that the average queue length concentrates tightly around its minimal achievable value,
with constants governed by a threshold depending on the total capacity of "fast" servers and by a prefactor that depends explicitly on the degree of server heterogeneity (e.g., the ratio of the largest to the smallest service rate).
For fixed buffer size b, queue occupancy above a threshold vanishes in the large-N limit: the fraction of servers with jobs waiting tends to zero, and the number of servers holding more than two jobs is asymptotically negligible (Liu et al., 2019), confirming the near-complete elimination of queueing delay.
3. Impact of Server Heterogeneity
JFSQ balances load while aggressively exploiting available high-capacity servers. Classic JSQ and power-of-d policies without speed awareness can overload slow servers, resulting in delay and utilization inefficiencies (Bhambay et al., 2022), especially as heterogeneity increases. In JFSQ, tie-breaking in favor of the server with the maximal service rate μ_i ensures that jobs are preferentially directed toward units with higher capacity. The analytical bounds on delay contain a prefactor reflecting the overall spread in service rates; as heterogeneity increases, the gap between JFSQ and non-speed-aware policies widens, but JFSQ continues to ensure asymptotically vanishing delay, subject to appropriate scaling of load and buffer sizes (Liu et al., 28 Sep 2025).
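To make this effect concrete, the following self-contained simulation (illustrative parameters and function names, not taken from the cited papers) contrasts plain JSQ with random tie-breaking against JFSQ's speed-aware tie-breaking on a pool of a few fast and many slow FIFO servers; the speed-aware rule typically yields a noticeably lower mean waiting time.

```python
# Hedged illustration: event-driven simulation of a heterogeneous pool under
# JSQ (random tie-breaking) vs. JFSQ (fastest among the shortest queues).
import heapq, random

def simulate(speed_aware, rates, lam, n_jobs, seed=0):
    rng = random.Random(seed)
    n = len(rates)
    q = [0] * n            # queue length at each server (incl. job in service)
    free_at = [0.0] * n    # time each FIFO server finishes all queued work
    done = []              # min-heap of (completion_time, server)
    t, total_wait = 0.0, 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)            # next Poisson arrival epoch
        while done and done[0][0] <= t:      # drain jobs completed by time t
            _, i = heapq.heappop(done)
            q[i] -= 1
        min_q = min(q)
        tied = [i for i in range(n) if q[i] == min_q]
        if speed_aware:                      # JFSQ: fastest among the shortest
            best = max(rates[i] for i in tied)
            tied = [i for i in tied if rates[i] == best]
        i = rng.choice(tied)
        start = max(t, free_at[i])           # FIFO service start at server i
        total_wait += start - t              # waiting time excludes service
        free_at[i] = start + rng.expovariate(rates[i])
        heapq.heappush(done, (free_at[i], i))
        q[i] += 1
    return total_wait / n_jobs

rates = [4.0] * 2 + [0.5] * 8                # 2 fast servers, 8 slow servers
lam = 0.9 * sum(rates)                       # ~90% of pooled capacity
for name, aware in [("JSQ (random ties)", False), ("JFSQ", True)]:
    print(f"{name}: mean wait = {simulate(aware, rates, lam, 100_000):.3f}")
```

Each server is modeled as a single FIFO queue with exponential service times, so a job's wait is the residual workload at its chosen server on arrival; queue lengths are refreshed lazily at each arrival epoch, which is sufficient for the routing decision.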
4. Methodological Advances: Lyapunov and Stein’s Method
The performance analysis of JFSQ in fully heterogeneous settings leverages non-trivial probabilistic tools:
- Sequence of Lyapunov Functions with State-Space Peeling: To handle the lack of exchangeability and monotonicity, the state space is decomposed iteratively, showing that the occupancies of fast servers exhibit vanishing queueing delay while slow servers’ occupancies remain bounded and well-behaved (Liu et al., 28 Sep 2025).
- Stein’s Method for High-Dimensional Systems: The drift of an appropriately constructed test function under the Markov generator is related to the residual (e.g., ). Solving the associated Stein equation yields moment inequalities for the queueing process, which, together with the Lyapunov bounds, guarantee that the steady-state distribution of the mean queue length is tightly concentrated around optimal values (Liu et al., 28 Sep 2025, Hurtado-Lange et al., 2020).
- State-Space Collapse and Mean-Field Approximations: The queue length process under JFSQ can be coupled to a single-server G/M/1 queue in the heavy-traffic limit, effectively reducing a high-dimensional stochastic network to a tractable one-dimensional process (Liu et al., 28 Sep 2025).
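The generator-comparison form of Stein's method underlying the second bullet can be sketched generically (a standard template, not the exact construction of the cited work): let G be the generator of the JFSQ Markov chain with stationary state X, and G_Y the generator of the tractable limit (e.g., the coupled G/M/1 queue) with stationary state Y. For a performance functional h, one solves the Stein equation

```latex
G_Y f(x) = h(x) - \mathbb{E}[h(Y)],
\qquad\text{so that, using } \mathbb{E}[G f(X)] = 0 \text{ in steady state,}
\]
\[
\mathbb{E}[h(X)] - \mathbb{E}[h(Y)]
  = \mathbb{E}[G_Y f(X)]
  = \mathbb{E}\bigl[(G_Y - G) f(X)\bigr].
```

The approximation error thus reduces to bounding the generator difference against gradient bounds on the solution f; the Lyapunov (state-space peeling) estimates supply exactly the moment bounds needed to control this right-hand side.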
5. Comparative Analysis and Related Policies
JFSQ extends the JSQ and GJSQ paradigms by explicitly incorporating service rates into the routing rule (Selen et al., 2015). In the homogeneous case, JFSQ reduces to JSQ, recovering known doubly/exponential tail decay results for queue lengths (Bramson et al., 2011). In heterogeneous systems, JFSQ outperforms both classical JSQ and power-of-d algorithms when those lack service rate awareness, leading to lower mean response times and improved buffer utilization (Liu et al., 28 Sep 2025, Weng et al., 2020). Analytical results and simulation studies confirm that even in constrained environments—such as bipartite graph load-balancing or in the presence of access limitations—JFSQ is asymptotically optimal as long as suitable connectivity or traffic slack persists (Weng et al., 2020).
Speed-aware tie-breaking in JFSQ is closely related to the speed-aware JSQ (SA-JSQ) policy studied in fluid-limit regimes. Under SA-JSQ, a job joining the queue with the fewest jobs favors the server with the highest service rate among those tied, leading to a unique, globally attractive fixed point in the fluid limit that matches the lower bound on average delay given by an equivalent pooled system (Bhambay et al., 2022).
6. Extensions, Generalizations, and System Constraints
Variants and extensions of the JFSQ policy account for practical constraints:
- Partial or Noisy Information: When only a subset of queue lengths or instantaneous server rates is observable, dynamic threshold mechanisms or pull-based assignment (e.g., JBT-d and JFIQ) have been proposed as extensions that maintain near-optimality at much lower communication cost (Zhou et al., 2017, Weng et al., 2020).
- Constrained Load Balancing: When subject to bandwidth or queue utilization requirements, JFSQ-like rules can be made constraint-safe by integrating virtual queues, arrival memory (JSED-), or target-level occupancy policies (JSSQ). These ensure compliance with per-server operational constraints while retaining low-delay properties (Fox et al., 3 Feb 2025).
- Retrial and Orbit Queue Models: Load balancing rules that include “join the shortest orbit queue” or generalized retrial models also admit JFSQ-type tie-breaking, with geometric tail asymptotics and stability reflecting the interplay between arrival flows, retrial rates, and server speeds (Dimitriou, 2020, Dimitriou, 2021).
7. Practical Implications and System Design
The theoretical guarantees established for JFSQ offer strong guidance for architecting high-throughput, low-latency dispatchers in heterogeneous environments:
- Zero-Delay Regime: For sufficiently large N and appropriate heavy-traffic scaling (e.g., λ = 1 − N^(−α)), JFSQ achieves a regime where the fraction of waiting jobs and the waiting time vanish, even as the overall workload approaches system capacity (Liu et al., 28 Sep 2025).
- Guided Capacity Planning: The explicit convergence rates, as functions of system size N, traffic intensity, and service rate heterogeneity, inform buffer sizing and resource scaling decisions.
- Heterogeneity Resiliency: By always selecting the fastest among the shortest queues, JFSQ ensures that high-capacity servers are efficiently utilized, preventing underuse of resources and avoiding starvation of slow servers.
- Generality: The policy is applicable not just in canonical queueing networks, but also in settings with connectivity constraints (e.g., data locality in bipartite graph models), retrial dynamics, or under dynamic constraints (Weng et al., 2020, Fox et al., 3 Feb 2025).
In summary, JFSQ provides a robust, theoretically-validated load-balancing policy for heterogeneous server environments that guarantees system stability, tight tail control for queue lengths, and vanishing delay in the large-scale heavy-traffic regime, leveraging advanced probabilistic techniques such as iterative Lyapunov drift and Stein's method to establish performance guarantees under broad modeling assumptions (Liu et al., 28 Sep 2025).