Queue Scheduling: Models, Policies, and Analysis

Updated 1 June 2026

Queue scheduling is the systematic allocation of service opportunities to backlogged tasks, with the aim of keeping queues stable and throughput high in diverse systems.
Research in queue scheduling encompasses rigorous mathematical models, classical policies (like LQS and LDQS), and randomized methods that balance delay and computational cost.
Emerging techniques integrate learning-based algorithms and quantum-classical scheduling to optimize multi-resource operations in cloud computing and IoT environments.

Queue scheduling is the systematic assignment of service opportunities to backlogged jobs, packets, or tasks awaiting service in a queueing system, subject to constraints imposed by processing resources, network topologies, performance objectives, or application-level QoS requirements. The design, analysis, and implementation of queue scheduling policies represent a central theme in operations research, communications, computer systems, and industrial control, encompassing a wide spectrum of mathematical models and systems architectures. Rigorous stability, optimality, and complexity analyses inform the engineering of scheduling mechanisms in both classical and emerging multi-resource environments.

1. Fundamental Models and Stability Criteria

The canonical model considers an open multiclass queueing network comprising $M$ queues with exogenous arrivals $\lambda_i$ and service rates $\mu_i$, organized into $G$ disjoint “service groups” $\mathcal{G}_1,\dots,\mathcal{G}_G$ in which at most one queue per group can be served simultaneously. Job departures may be routed according to an arbitrary stochastic matrix $P$ (Pedarsani et al., 2012). The queue-length process $Q(t)$ is Markovian; stability is formalized as positive Harris recurrence of $Q(t)$. The network’s capacity region is

$\Lambda = \left\{\,\lambda \geq 0 : \forall j,\, \sum_{i\in\mathcal{G}_j} \frac{\nu_i}{\mu_i} \leq 1\,\right\},$

where $\nu = R\lambda$ is the vector of effective arrival rates and $R = (I - P^{\top})^{-1}$. Throughput-optimality requires stabilizing every $\lambda$ in the interior of $\Lambda$.
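As a minimal sketch of the capacity-region test, the snippet below assumes the usual traffic equations $\nu = \lambda + P^{\top}\nu$ for an open network and solves them by fixed-point iteration, then checks the per-group load condition. The function names, the list-of-lists encoding of $P$, and the tolerance are illustrative choices, not taken from the cited work.

```python
from typing import List, Sequence


def effective_rates(lam: Sequence[float], P: Sequence[Sequence[float]],
                    iters: int = 10_000, tol: float = 1e-12) -> List[float]:
    """Solve nu = lam + P^T nu by fixed-point iteration.

    P[k][i] is the probability that a job leaving queue k joins queue i.
    Convergence is assumed (open network: every job eventually leaves).
    """
    m = len(lam)
    nu = list(lam)
    for _ in range(iters):
        new = [lam[i] + sum(P[k][i] * nu[k] for k in range(m)) for i in range(m)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    raise ValueError("traffic equations did not converge; is the network open?")


def in_capacity_region(lam, mu, P, groups) -> bool:
    """Check sum_{i in G_j} nu_i / mu_i <= 1 for every service group G_j."""
    nu = effective_rates(lam, P)
    return all(sum(nu[i] / mu[i] for i in group) <= 1.0 for group in groups)


# Two queues in tandem (0 feeds 1) that share one service group.
P_tandem = [[0.0, 1.0], [0.0, 0.0]]
light = in_capacity_region([0.3, 0.0], [1.0, 1.0], P_tandem, [[0, 1]])
heavy = in_capacity_region([0.6, 0.0], [1.0, 1.0], P_tandem, [[0, 1]])
```

In the tandem example every job visits both queues, so the group load is twice the exogenous arrival rate: an arrival rate of 0.3 is admissible, while 0.6 is not.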

Alternative models include large-scale parallel buffered systems with many queues and a single shared server (Dieker et al., 2013), finite-capacity queues with hard deadlines (0807.2694), processor sharing disciplines (Bor et al., 2024), and systems with resource pooling under complicated interference graphs and schedule sets (Shah et al., 2011, Shin et al., 2014, Mohan et al., 2020).

2. Classical and Contemporary Queue Scheduling Policies

2.1 Longest-Queue and Longest-Dominating-Queue Policies

Longest-Queue Scheduling (LQS): In each group, allocate service to the queue(s) with maximal queue length. For two groups of two queues each, LQS is proved throughput-optimal via a fluid-model Lyapunov analysis (Pedarsani et al., 2012).
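The LQS selection rule can be sketched in a few lines. The tie-breaking rule (lowest index wins) and the convention of returning `None` for an empty group are assumptions made for this illustration; the source does not specify them.

```python
from typing import List, Optional, Sequence


def longest_queue_schedule(queue_lengths: Sequence[int],
                           groups: Sequence[Sequence[int]]) -> List[Optional[int]]:
    """For each service group, pick the index of a longest queue.

    Ties are broken toward the lowest index (an arbitrary choice).
    A group whose queues are all empty gets None (no service).
    """
    schedule: List[Optional[int]] = []
    for group in groups:
        best = max(group, key=lambda i: (queue_lengths[i], -i))
        schedule.append(best if queue_lengths[best] > 0 else None)
    return schedule


# Two groups of two queues each, as in the setting discussed above.
choice = longest_queue_schedule([3, 5, 2, 2], [[0, 1], [2, 3]])
```

Here `choice` serves queue 1 in the first group and queue 2 in the second, where the tie between queues 2 and 3 goes to the lower index.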

Longest-Dominating-Queue Scheduling (LDQS): Define the “dominating” queues as those not feeding any strictly larger queue among global maxima. In acyclic networks, LDQS is throughput-optimal for all group sizes and topologies—proven using a max-queue Lyapunov function on the induced fluid subnetwork (Pedarsani et al., 2012).

2.2 Randomized Longest-Queue Sampling

In large systems with $N$ queues, the Randomized Longest-Queue-First (RLQF) scheduler samples $d$ queues uniformly at each service opportunity, selects the queue with the maximum length among the sample, and serves it. As $N \to \infty$, RLQF admits a mean-field fluid limit, from which the scaling of the stationary average queue length is derived (Dieker et al., 2013). This achieves near-centralized performance with $O(d)$ computational complexity per decision and admits a trade-off curve between delay and computational cost.
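A minimal sketch of one RLQF decision follows. The sample size is called `d` here purely for illustration, and sampling without replacement is an assumption of this sketch; the cited analysis may use a different convention.

```python
import random
from typing import List


def rlqf_pick(queue_lengths: List[int], d: int, rng: random.Random) -> int:
    """Sample d distinct queues uniformly and return the longest sampled one."""
    sampled = rng.sample(range(len(queue_lengths)), d)
    return max(sampled, key=lambda i: queue_lengths[i])


def serve_once(queue_lengths: List[int], d: int, rng: random.Random) -> int:
    """Serve one job from the picked queue (if nonempty); return its index."""
    i = rlqf_pick(queue_lengths, d, rng)
    if queue_lengths[i] > 0:
        queue_lengths[i] -= 1
    return i


rng = random.Random(0)
queues = [1, 7, 3, 0]
served = serve_once(queues, d=len(queues), rng=rng)  # d = N recovers exact LQF
```

Each decision inspects only `d` queue lengths, which is where the trade-off comes from: a larger `d` moves the policy closer to exact longest-queue-first at a higher per-decision cost.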

2.3 Scheduling with Deadlines and QoS

For finite-capacity queues with online-arriving packets, each carrying a weight and a hard deadline, competitive analysis of deterministic and randomized memoryless “virtual deadline” algorithms yields a deterministic 3-competitive policy together with a randomized competitive policy, both of which outperform Earliest Deadline First and greedy policies in the bounded-buffer setting (0807.2694).

2.4 Priority and Programmable Scheduling

Modern high-speed switches and operating systems require programmable, line-rate packet schedulers capable of implementing hierarchical and custom policies. The push-in first-out (PIFO) abstraction provides a unifying priority queue model: each packet is inserted at a position determined by a rank computed on enqueue, and packets always leave from the head. This supports hierarchies, shaping, and differentiation via arbitrary rank computations. Many conventional scheduling algorithms (WFQ, SP, EDF, stop-and-go, DRR, CBQ) can be expressed as PIFO programs, and a PIFO-mesh design for a 64-port, 10 Gbit/s shared-memory switch incurs less than 4% chip area overhead (Sivaraman et al., 2016).
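The PIFO idea can be illustrated in software with a binary heap. This is only a sketch of the abstraction, not the hardware design from the cited work; the class name, the packet dictionaries, and the two rank functions are hypothetical.

```python
import heapq
import itertools
from typing import Any, Dict


class PIFO:
    """Push-in first-out queue: insert by rank, always dequeue from the head."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()  # preserves FIFO order among equal ranks

    def push(self, rank: float, item: Any) -> None:
        heapq.heappush(self._heap, (rank, next(self._seq), item))

    def pop(self) -> Any:
        _, _, item = heapq.heappop(self._heap)
        return item

    def __len__(self) -> int:
        return len(self._heap)


# A policy is expressed only through the rank computed at enqueue time.
def strict_priority_rank(pkt: Dict[str, Any]) -> float:
    return pkt["prio"]          # lower number means higher priority


def edf_rank(pkt: Dict[str, Any]) -> float:
    return pkt["deadline"]      # earliest deadline first


q = PIFO()
for pkt in ({"id": "a", "deadline": 30}, {"id": "b", "deadline": 10},
            {"id": "c", "deadline": 10}):
    q.push(edf_rank(pkt), pkt)
order = [q.pop()["id"] for _ in range(3)]
```

Swapping `edf_rank` for `strict_priority_rank` changes the policy without touching the queue itself, which is the point of the abstraction. The sequence counter keeps equal-rank packets in arrival order.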

3. Algorithmic Complexity, Distributed, and Learning-Based Queue Scheduling

3.1 Interactive Oracle and Approximate Optimization

In constrained queueing networks (e.g., wireless or switched systems), computing a maximum-weight schedule is often NP-hard. Approximate and distributed policies that interact with optimization oracles, such as randomized search, MCMC Glauber dynamics, belief propagation, and primal-dual methods (PDM), provide throughput-optimality under mild mixing-time and function-growth conditions (Shin et al., 2014). MCMC-based oracles enable fully distributed, local-information policies with delay scaling polynomially in backlog, while random search and PDM approaches guarantee performance for centralized or matching-constrained systems.
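For reference, the exact maximum-weight rule that these oracles approximate can be written as a brute-force search over an explicit list of feasible schedules. The sketch below assumes 0/1 service vectors and unit service per slot; enumerating the schedule set is exponential in general, which is the cost the approximate methods avoid.

```python
from typing import List, Sequence, Tuple

Schedule = Tuple[int, ...]


def max_weight_schedule(queue_lengths: Sequence[int],
                        schedules: Sequence[Schedule]) -> Schedule:
    """Return a feasible schedule s maximizing sum_i Q_i * s_i."""
    return max(schedules,
               key=lambda s: sum(q * x for q, x in zip(queue_lengths, s)))


def step(queue_lengths: Sequence[int], schedules: Sequence[Schedule],
         arrivals: Sequence[int]) -> List[int]:
    """One slot: serve the max-weight schedule, then add the new arrivals."""
    s = max_weight_schedule(queue_lengths, schedules)
    return [max(q - x, 0) + a for q, x, a in zip(queue_lengths, s, arrivals)]


# 2x2 input-queued switch; queues ordered (1->1, 1->2, 2->1, 2->2).
matchings = [(1, 0, 0, 1), (0, 1, 1, 0)]
next_state = step([5, 1, 2, 3], matchings, arrivals=[0, 1, 0, 0])
```

With backlogs `[5, 1, 2, 3]` the first matching has weight 8 against 3 for the second, so it is served before the arrival is added.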

3.2 Adversarial and Bandit Learning Schedulers

Dynamic, non-stationary systems with time-varying and unknown arrival and service rates can be stabilized by learning-augmented queue scheduling algorithms. Key examples include SoftMaxWeight (SoftMW) and Sliding-Window SoftMaxWeight (SSMW), which combine Lyapunov-drift minimization with bandit learning using mirror descent (EXP3.S+) to select actions based only on observed rewards. These algorithms come with provable backlog bounds under piecewise stabilizability and bounded-variation conditions, without knowledge of instantaneous network state (Huang et al., 2023).

3.3 Deep Reinforcement Learning for Multi-Objective Scheduling

Hierarchical DRL frameworks (e.g., MERLIN) decompose multi-objective queue scheduling into independent “inner” and “outer” policy networks, permitting scalable training and execution on queues orders-of-magnitude larger than the policy input width. Modular separation enables robust, near-optimal completion time performance in settings where tasks are themselves complex subproblems (Birman et al., 2020).

4. Analytical and Numerical Methods for Performance and Delay

4.1 Queueing Analysis and Tail Bounds

For buffer-aware scheduling under stringent delay constraints (e.g., URLLC), hybrid queue analysis methods combine censored Markov chain augmentation in the small-queue regime with large-deviations theory (LDT) and extreme value theory (EVT) in the large-queue regime. Explicit error bounds for stationary distribution truncation and stitched LDT/EVT approximations enable accurate and computationally efficient analysis of cross-layer wireless scheduling policies (Li et al., 2024).

In processor-sharing systems with join-the-shortest-queue (JSQ) routing, Laplace-Stieltjes transforms of the response time are characterized via coupled functional PDEs, which are solved numerically via operator-matrix discretization and complex contour integration, yielding full response-time distributions and moments in heavy-traffic and asymmetric regimes (Bor et al., 2024).

4.2 Queue Scheduling Under Resource, Interference, and Machine Constraints

Switched networks with complex service constraints (convex polytopes of allowed schedules) can attain the optimal scaling of mean queue size via emulation of continuous-time Store-and-Forward Allocation (SFA) policies. The key is to track the lag between the service delivered by the discrete system and by SFA, decompose it onto extreme points of the service region, and schedule as aggressively as possible consistent with that convex region. This resolves long-standing conjectures in input-queued switching and wireless network scaling (Shah et al., 2011).

5. Practical Implementations and Industry Applications

5.1 Deterministic and Industry-Scale Schedulers

In deterministic networking, hardware-programmable cycle-specified queues (PCSQ) implement microsecond-precision cyclic queue rotation, per-flow resource reservation, queue-cycle mapping, and bounded-deviation dequeue to provide mathematically guaranteed delay and jitter bounds: per-hop jitter bounded in terms of the cycle size, ms-scale end-to-end delay, and demonstrated scalability to tens of thousands of flows in long-distance WAN testbeds (Huang et al., 2024).

5.2 Parallel Priority Queues

High-throughput parallel task schedulers (e.g., Stealing Multi-Queue, SMQ) achieve provable bounds on the expected rank of extracted tasks using queue affinity, probabilistic stealing, and task batching, validated theoretically via balls-into-bins coupling and empirically across graph benchmarks. NUMA-aware and cache-optimal implementations outperform hand-tuned heuristics in fine-grained task environments (Postnikova et al., 2021).
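The combination of queue affinity and probabilistic stealing can be illustrated with a simplified, single-threaded sketch. It omits task batching, concurrency control, and NUMA awareness, and it is not the authors' implementation; the class name, the `p_steal` parameter, and the rule of stealing from a random nonempty queue are assumptions made here.

```python
import heapq
import random
from typing import List, Optional


class StealingMultiQueueSketch:
    """One local min-heap per worker; occasionally steal a better task."""

    def __init__(self, num_workers: int, p_steal: float,
                 rng: random.Random) -> None:
        self.queues: List[List[int]] = [[] for _ in range(num_workers)]
        self.p_steal = p_steal
        self.rng = rng

    def push(self, worker: int, priority: int) -> None:
        # Affinity: a worker inserts into its own queue.
        heapq.heappush(self.queues[worker], priority)

    def pop(self, worker: int) -> Optional[int]:
        own = self.queues[worker]
        others = [q for i, q in enumerate(self.queues) if i != worker and q]
        # Steal when the local queue is empty, or with probability p_steal.
        if others and (not own or self.rng.random() < self.p_steal):
            victim = self.rng.choice(others)
            if not own or victim[0] < own[0]:
                return heapq.heappop(victim)
        return heapq.heappop(own) if own else None  # None: all queues empty


smq = StealingMultiQueueSketch(num_workers=4, p_steal=0.25,
                               rng=random.Random(0))
for task in range(20):
    smq.push(task % 4, task)
extracted = []
turn = 0
while (task := smq.pop(turn % 4)) is not None:
    extracted.append(task)
    turn += 1
```

Extraction order is relaxed rather than strictly sorted: a worker usually takes its own best task and only sometimes checks another queue, which is what keeps contention low. Every task is still extracted exactly once.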

5.3 Wireless and Sensor Networks

Decentralized scheduling decisions using only single-bit queue nonemptiness feedback—from local sensing or channel-state detection—are sufficient to guarantee throughput-optimality in path and cluster-of-cliques (CoC) conflict graphs. Policy design via “policy splicing” and local tie-breaking yields scalable, low-complexity approaches suitable for low-power IoT and sensor networks, approaching delay-optimality under mild assumptions (Mohan et al., 2020).

6. Emerging Frontiers: Predictions, Robustness, and Quantum-Classical Workflows

6.1 Prediction-Driven Scheduling

Queue scheduling with machine-learning predictions of service times introduces performance improvements but raises new analytical questions. Policies such as Shortest Predicted Remaining Processing Time (SPRPT) and “bounce” ranked variants deliver consistency and robustness guarantees (response time scaling with prediction error) under multiplicative prediction error models. Applications to LLM inference require hybrid memory-aware preemption, dynamic resource constraints, and complex two-phase workload models, motivating new competitive analyses and algorithmic approaches (Mitzenmacher et al., 10 Mar 2025).
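A minimal single-server simulation shows how SPRPT ranks jobs. It assumes integer time slots, preemption at slot boundaries, and a rank equal to the predicted size minus the service already received; the job tuple layout and function name are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Job = Tuple[int, int, int]  # (arrival_time, true_size, predicted_size)


def sprpt_schedule(jobs: Sequence[Job]) -> List[Optional[int]]:
    """Simulate preemptive SPRPT; return each job's completion time.

    True sizes must be positive integers. The scheduler only sees the
    predicted size; the true size determines when a job actually finishes.
    """
    n = len(jobs)
    served = [0] * n
    done: List[Optional[int]] = [None] * n
    t = 0
    while any(d is None for d in done):
        ready = [i for i in range(n) if done[i] is None and jobs[i][0] <= t]
        if ready:
            # Predicted remaining time may go negative after an underestimate.
            i = min(ready, key=lambda j: (jobs[j][2] - served[j], j))
            served[i] += 1
            if served[i] == jobs[i][1]:
                done[i] = t + 1
        t += 1
    return done


accurate = sprpt_schedule([(0, 5, 5), (1, 2, 2)])   # perfect predictions
mispredicted = sprpt_schedule([(0, 5, 1), (1, 2, 2)])  # long job predicted short
```

With perfect predictions the short job preempts the long one and finishes at time 3, while the long job finishes at time 7. When the long job is predicted to have size 1, it keeps the server until time 5 and the short job is delayed to time 7, a small instance of how prediction error degrades response time.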

6.2 Quantum-Classical Queue Scheduling

Operation scheduling in the quantum cloud—where tasks contend for access to heterogeneous quantum devices with variable queue times and calibration-induced fidelity—is modeled as joint queue-fidelity optimization over dynamic DAGs. The Qurator scheduler algebraically unifies provider calibration data, queue-time estimation, circuit cutting/merging, and synchronization constraints, providing bounded queue time reductions under fidelity constraints and supporting dynamically adaptive scheduling across providers and technologies (Pehlivanoglu et al., 7 Apr 2026).

Queue scheduling is a multifaceted domain in which foundational mathematical analysis, theoretical performance bounds, algorithmic complexity, and practical system implementation converge. Continued advances in queue scheduling will be central to the design of scalable, efficient, and robust systems in cloud computing, communication networks, manufacturing, and quantum-classical integration.