
Multiserver Job Queuing Model (MJQM)

Updated 4 February 2026
  • MJQM is a queueing model where each job occupies multiple servers simultaneously, reflecting demands of data centers and cloud computing.
  • Analytical techniques such as mean-field limits, heavy-traffic scaling, and saturated-system methods are used to derive key performance metrics.
  • Scheduling policies—from simple FCFS to advanced size-aware strategies—are critical for mitigating queue delays and optimizing resource utilization.

The multiserver-job queuing model (MJQM) encompasses a broad and mathematically rich class of queueing systems in which each job, upon entering the system, simultaneously occupies multiple servers for the duration of its service. MJQMs, and their variants such as systems with synchronization, job component splitting, and adaptive and nonpreemptive scheduling, are foundational in modeling contemporary data center, high-performance computing, and cloud workloads, where computational tasks often require co-allocation of multiple processing units. Analysis of MJQM has progressed via multiple approaches, including mean-field limits, saturated-system techniques, priority policies, loss models, and advanced heavy-traffic scaling. MJQM unifies several practical settings, providing insight into delay, throughput, and resource wastage in large-scale stochastic resource-sharing systems.

1. Core Model Structure and Variants

The canonical MJQM is specified by a pool of n (or k) identical servers, each with unit or normalized capacity. Jobs (or “tasks”) arrive at the system according to a Poisson process (either global, with rate λ, or class-split with rates {λ_i}). Each job belongs to a class, indexed by i ∈ {1, ..., C}, and requires exactly k_i servers (the “server-need”) for execution. The service time d_i or service requirement s_j may depend on both the class and the individual job, possibly with arbitrary correlation.

For multicomponent variants, a job may be split into k_j subcomponents distributed across chosen servers. Assignment may be random or based on system state information (“least-load”, “water-filling”); synchronization disciplines range from strict fork-join (all pieces must start together) to cancel-on-completion/arrival redundancy (Shneer et al., 2020, Olvera-Cravioto et al., 2014).

Upon arrival, jobs are either queued if insufficient resources are available, lost (blocking systems), or adaptively split across available servers up to a maximum (adaptive MJQM) (Ghanbarian et al., 2023). The most common scheduling regimes are nonpreemptive FCFS, priority by job-size or server-need, or more advanced size-based and index policies.

2. Stability, Scaling Regimes, and Performance Metrics

A central object in MJQM analysis is the system load:

\rho = \frac{\lambda \sum_{i=1}^{C} \alpha_i d_i n_i}{s}

or, in resource-pooled normalization, ρ = λE[S], with S the normalized job size (Grosof et al., 2022, Grosof et al., 2021). Stability (positive recurrence of the system’s Markov chain) requires ρ < 1. However, MJQM systems with head-of-line blocking, synchronization, or large job classes may have much tighter stability regions (Grosof et al., 2020, Grosof et al., 7 May 2025).
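The load formula above can be evaluated directly; a minimal sketch, where the class fractions, durations, and server-needs below are illustrative values, not drawn from any cited paper:

```python
def system_load(lam, alpha, d, n, s):
    """Offered load rho = lam * sum_i alpha_i * d_i * n_i / s.

    lam: total Poisson arrival rate; alpha: class fractions alpha_i;
    d: mean service durations d_i; n: server-needs n_i; s: server count.
    """
    return lam * sum(a * di * ni for a, di, ni in zip(alpha, d, n)) / s

# Two classes: 70% of jobs need 1 server for 1.0s, 30% need 4 servers for 2.0s.
rho = system_load(lam=2.0, alpha=[0.7, 0.3], d=[1.0, 2.0], n=[1, 4], s=10)
print(round(rho, 4))   # 0.62, so the necessary condition rho < 1 holds
```

Note that ρ < 1 is only necessary; as discussed above, blocking and synchronization can shrink the actual stability region well below this bound.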

MJQM theoretical analyses focus on:

  • Queueing probability: probability that an arriving job must wait, P{wait}.
  • Mean response time: R = E[sojourn time].
  • Steady-state throughput and system capacity.
  • Resource wastage: idle servers despite non-empty queues due to non-fit constraints.
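These metrics can be estimated by a small discrete-event simulation of the FCFS MJQM with head-of-line blocking; a sketch under assumed toy parameters (exponential service times, the particular server-need distribution below is hypothetical):

```python
import heapq
import random

def fcfs_wait_prob(s, lam, need_dist, mean_st, n_jobs, seed=0):
    """Estimate P{wait} for an FCFS multiserver-job queue with
    head-of-line blocking: the queue head starts service only when its
    full server-need fits in the currently free servers.

    s: total servers; lam: Poisson arrival rate; need_dist: list of
    (probability, server_need) pairs; mean_st: mean of the exponential
    service times; n_jobs: number of arrivals to simulate.
    """
    rng = random.Random(seed)
    free, waited = s, 0
    queue = []                 # FIFO of server-needs of waiting jobs
    departures = []            # min-heap of (finish_time, server_need)
    arrivals_left = n_jobs
    next_arrival = rng.expovariate(lam)

    def draw_need():
        u, acc = rng.random(), 0.0
        for p, k in need_dist:
            acc += p
            if u <= acc:
                return k
        return need_dist[-1][1]

    def start_head_of_line(now):
        nonlocal free
        while queue and queue[0] <= free:   # admit in FCFS order only
            k = queue.pop(0)
            free -= k
            heapq.heappush(departures,
                           (now + rng.expovariate(1.0 / mean_st), k))

    while arrivals_left > 0 or departures:
        if arrivals_left and (not departures
                              or next_arrival <= departures[0][0]):
            t = next_arrival
            arrivals_left -= 1
            next_arrival = t + rng.expovariate(lam)
            k = draw_need()
            if queue or k > free:           # arriving job must wait
                waited += 1
            queue.append(k)
            start_head_of_line(t)
        else:
            t, k = heapq.heappop(departures)
            free += k
            start_head_of_line(t)
    return waited / n_jobs

# Sanity check: when every job needs all s servers the system behaves
# like M/M/1, so P{wait} should be close to rho = lam * mean_st = 0.5.
print(fcfs_wait_prob(s=10, lam=0.5, need_dist=[(1.0, 10)],
                     mean_st=1.0, n_jobs=5000))
```

Extending the event loop to accumulate idle-server time while the queue is nonempty would estimate the resource-wastage metric as well.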

Key scaling regimes include the Halfin-Whitt (critical heavy-traffic), multilevel scaling (when large jobs become rare), and mean-field/large-n limits (Grosof et al., 7 May 2025, Hong et al., 2021). Sufficient and necessary scaling conditions for zero-queueing, vanishing waiting probability, or delay optimality depend on both growth rates of job sizes and server pool size (Wang et al., 2020).

3. Scheduling Policies and Delay Optimality

Classical FCFS, under head-of-line blocking, can induce substantial performance degradation: small jobs can be indefinitely blocked by earlier large jobs, leading to queue buildup and server wastage (Grosof et al., 2020, Hong et al., 2021). Several policy classes fundamental to MJQM are:

  • Size-blind policies: e.g., FCFS, Most-Servers-First (MSF), and nonadaptive packing. Simple and implementable, but suboptimal in the presence of heterogeneity.
  • Size-aware policies: Shortest Remaining Processing Time (SRPT), ServerFilling-SRPT, Smallest-Need-First (SNF), and Gittins-based scheduling, typically preemptive, minimize response time asymptotically (Grosof et al., 2022, Grosof et al., 2021, Scully et al., 2020). ServerFilling-SRPT is proven heavy-traffic mean-delay optimal, achieving E[T] ~ E[T_SRPT-1] as ρ → 1 (Grosof et al., 2022).
  • Nonpreemptive, job-size oblivious policies: Balanced-Splitting (BSF), MSF-QuickSwap (MSFQ). BSF partitions servers by job class to isolate interference, achieving vanishing queueing in many-server limits without needing job size knowledge or preemption (Anselmi et al., 2024). MSFQ addresses performance variability by periodically prioritizing other jobs, giving strong empirical delay improvements (Chen et al., 2 Sep 2025).
  • Adaptive splitting: Jobs adapt to system state, e.g., splitting into as many components as are idle at arrival, achieving asymptotic optimality under modest system observation (Ghanbarian et al., 2023).
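The head-of-line blocking that motivates these policies is easy to see in a toy snapshot; the queue contents below are hypothetical:

```python
def admitted(queue, free):
    """Server-needs that can start now, scanning the queue in order and
    stopping at the first job that does not fit (head-of-line blocking)."""
    started = []
    for k in queue:
        if k > free:
            break          # the head of the line blocks everything behind it
        started.append(k)
        free -= k
    return started

queue = [8, 1, 1]          # a need-8 job arrived before two need-1 jobs
print(admitted(queue, free=4))          # FCFS: [] (small jobs are stuck)
print(admitted(sorted(queue), free=4))  # Smallest-Need-First: [1, 1]
```

Under FCFS the four free servers sit idle behind the need-8 head job, wasting capacity; reordering by server-need lets the small jobs through, which is exactly the effect the size- and need-aware policies above exploit.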

Empirical studies confirm the analytic results: policies exploiting workload structure (size, class, or resource need) dramatically outperform FCFS or naive packing, particularly in heterogeneous or high-traffic settings (Grosof et al., 2022, Hong et al., 2021, Chen et al., 2 Sep 2025).

4. Analytical Techniques and Limit Theorems

Analysis leverages:

  • Lyapunov drift methods: For stability regions, zero-wait conditions, and explicit bounds for P{wait} (Wang et al., 2020).
  • Mean-field and fluid limits: Yield deterministic evolution and fixed-point equations for large-scale systems, capturing asymptotic workload distributions and independence across servers (Shneer et al., 2020).
  • Saturated-system (“backpressure”) methods: Closed-form stability boundaries for multi-class FCFS with blocking, product-form solutions for saturated Markov chains (Grosof et al., 2020, Grosof et al., 7 May 2025).
  • Queueing theory connections: Equivalence with the M/GI/s/s loss system for isolated server partitions (Erlang B formula), heavy-traffic limits matched to M/G/1 (universal curves for response time), and Cramér–Lundberg-type results for synchronized systems (Anselmi et al., 2024, Olvera-Cravioto et al., 2014, Grosof et al., 2021).
  • Stein’s method, coupling, and state-space collapse: To rigorously bound blocking and response time, especially in adaptive MJQM settings (Ghanbarian et al., 2023, Hong et al., 2021).
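For the M/GI/s/s connection above, the Erlang B blocking probability admits a standard numerically stable recursion; a minimal sketch:

```python
def erlang_b(c, a):
    """Erlang B blocking probability for M/GI/c/c with offered load
    a = lam * E[S], via the recursion
    B(0) = 1,  B(k) = a*B(k-1) / (k + a*B(k-1))."""
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b

# 5 servers in an isolated partition, offered load a = 2.0 Erlangs:
print(round(erlang_b(5, 2.0), 4))   # 0.0367
```

Under the Balanced-Splitting style of partitioning discussed below, each class-dedicated partition behaves as such a loss system, so this formula gives its per-class blocking directly.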

Heavy-traffic and zero-wait limit theorems quantify precisely when queueing vanishes and establish the rate of decay for delay as system size grows, with sharp thresholds dictated by interaction of service requirement scaling and job-class prevalence.

5. Synchronization Constraints and Redundancy

In fork-join and parallelization models, job components may be assigned independently and require synchronization (all sub-jobs start together). The high-order Lindley recursion analytically characterizes the stationary waiting time:

W \stackrel{d}{=} \max\bigg\{0, \, \max_{1 \le i \le N} (\chi_i - \tau_i + W_i)\bigg\},

where N is the job size (in pieces), χ_i the service times, τ_i the interarrival times, and W_i i.i.d. copies of W (Olvera-Cravioto et al., 2014). The generalized Cramér–Lundberg result gives exponential decay of delay tails, and—under branching process constructions—unique mean-field limiting distributions.
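The distribution of W can be approximated by population Monte Carlo, iterating the recursion with the i.i.d. copies W_i resampled from the previous population. The distributions below are illustrative assumptions (not from the cited paper), chosen so that a Cramér-type moment condition holds and the stationary W is finite:

```python
import random

def fixed_point_population(n_pop=10000, n_iter=25, seed=1):
    """Population Monte Carlo for W =d max(0, max_{i<=N}(chi_i - tau_i + W_i)).

    Assumed toy distributions: N uniform on {1, 2, 3},
    chi_i ~ Exp(rate 8), tau_i ~ Exp(rate 1).
    Each iteration resamples the whole population, drawing the
    i.i.d. copies W_i from the previous population.
    """
    rng = random.Random(seed)
    pop = [0.0] * n_pop
    for _ in range(n_iter):
        pop = [max(0.0, max(rng.expovariate(8.0) - rng.expovariate(1.0)
                            + rng.choice(pop)
                            for _ in range(rng.randint(1, 3))))
               for _ in range(n_pop)]
    return pop

w = fixed_point_population()
print(sum(w) / len(w))   # estimated mean stationary waiting time E[W]
```

With heavier service-time tails or larger E[N] the moment condition fails and the iterates drift upward, mirroring the instability the theory predicts for supercritical synchronization.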

Synchronization can significantly increase waiting times and server idleness compared to resource-pooled models, especially under strict FCFS and lack of queue-length information for assignment.

6. Design Insights, Wastage, and Practical Guidelines

Key design contributions of MJQM studies include:

  • Partitioning servers by job class (e.g., Balanced-Splitting) is extremely effective when inter-class variability is high and intra-class variability is low—this isolates large jobs, preventing starvation of small jobs (Anselmi et al., 2024).
  • Server wastage is significant under naïve FCFS with blocking—idle servers can approach the largest server-need in the system or become a vanishing fraction only under suitable scaling (Grosof et al., 2020).
  • Scaling for zero delay: To achieve vanishing queueing, one requires sublinear maximal job sizes and/or load levels below explicit thresholds (e.g., 2α + γ < 1 for system scaling indices α, γ (Wang et al., 2020)).
  • Policy selection: Preemptive, size-aware policies offer order-of-magnitude delay reductions when implementable. Nonpreemptive, size-blind assignment, if structured (e.g., Balanced-Splitting, Adaptive Quickswap), can still deliver near-optimal delay in regimes of high heterogeneity.
  • Saturated-system analysis enables practical capacity planning: explicit stability boundaries allow data center designers to balance provisioning against server wastage, especially as job-size diversity increases (Grosof et al., 2020, Grosof et al., 7 May 2025).

7. Connections, Extensions, and Open Directions

MJQM encompasses and extends several classical models:

  • Redundancy models: cancel-on-start, cancel-on-completion, and join-the-shortest-queue can be analyzed as special MJQM cases via component splitting and assignment strategies (Shneer et al., 2020).
  • Work-conserving finite-skip (WCFS) framework: MJQM under ServerFilling lies in the WCFS class, attaining the universal heavy-traffic limit

\lim_{\rho \to 1} E[T](1-\rho) = \frac{E[S^2]}{2E[S]},

with explicit additive bounds for all ρ (Grosof et al., 2021).

  • Heavy-traffic optimality of Gittins-type policies: M-Gittins and monotonic SERPT extend size-index policies to multiserver settings with unknown job sizes, with tight 2-approximation bounds (Scully et al., 2020).
  • Mean-field and fluid analysis: MJQM is amenable to large-scale asymptotics, capturing asymptotic independence and enabling direct calculation of limiting delay distributions (Shneer et al., 2020).
  • Open problems include analysis under dynamic server assignment (e.g., with data locality), non-Poissonian arrivals, correlated server speeds, and resilience against stragglers or server failures.
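As a quick numeric check, the universal WCFS heavy-traffic constant E[S²]/(2E[S]) can be evaluated for any normalized job-size distribution; the two-point distribution below is illustrative:

```python
def wcfs_constant(sizes, probs):
    """Heavy-traffic constant lim_{rho->1} E[T](1-rho) = E[S^2] / (2 E[S])
    for a discrete normalized job-size distribution."""
    es = sum(s * p for s, p in zip(sizes, probs))
    es2 = sum(s * s * p for s, p in zip(sizes, probs))
    return es2 / (2 * es)

# Job sizes 1 and 4 with equal probability: E[S] = 2.5, E[S^2] = 8.5.
print(wcfs_constant([1.0, 4.0], [0.5, 0.5]))   # 1.7
```

The quadratic dependence on E[S²] makes the size-variability penalty explicit: doubling the spread of job sizes at fixed mean inflates the limiting delay constant accordingly.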

MJQM continues to be a central modeling framework for modern parallel computation, with ongoing advances in scheduling theory, stochastic process analysis, and real-system validation.
