Virtual-Time Fair Queuing Algorithm

Updated 26 October 2025

Virtual-time based fair queuing is a scheduling approach that assigns virtual finish times to work units in order to approximate Generalized Processor Sharing (GPS) and allocate resources equitably.
It is applied in packet scheduling, multi-server queues, and cloud systems to maintain fairness across diverse network flows and resource demands.
Algorithms in this family offer provable delay bounds and fairness guarantees, which has led to their use in networking, OFDMA systems, and LLM serving infrastructures.

Virtual-time based fair queuing algorithms are a class of network and resource scheduling mechanisms designed to approximate Generalized Processor Sharing (GPS) by assigning a “virtual” finish time to each unit of work (e.g., packet, job, or flow) and making queuing decisions based on these virtual times. These algorithms aim to give every flow or application a fair share of resources over time, regardless of arrival patterns, flow sizes, or resource demands, by relying on a simulated "virtual time" that tracks idealized progress in a fair service regime. Virtual-time based algorithms have been deployed for packet scheduling, bandwidth sharing, work scheduling in multi-server clusters, and fair resource allocation in LLM serving infrastructures.

1. Principles of Virtual-Time Based Scheduling

Virtual-time based fair queuing algorithms simulate the fluid GPS discipline, under which all active flows are served simultaneously and proportionally to their weights. Since GPS is impossible to implement directly in discrete, packetized systems, these algorithms instead compute a virtual finish time for each work unit based on its weight, arrival time, and the amount of resource it should fairly receive.

For instance, in classical fair queuing:

Each packet of flow $i$ arriving at time $a_{i,k}$ is assigned a virtual finish time $F_{i,k}$ based on its size $L_{i,k}$ and its weight $w_i$.
The virtual time $V(t)$, representing cumulative fair resource progress, is updated at discrete events, typically advancing faster when fewer flows are present (each receives a larger instantaneous share) and slower as more flows contend.
Packets (or jobs) are scheduled in the order of their virtual finish times, aligning actual resource usage with the GPS ideal.
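
The list above does not spell out the tag rule itself. One standard textbook formulation of the WFQ tags is sketched below; the start tag $S_{i,k}$, the link rate $R$, and the backlogged set $B(t)$ of the reference GPS system are symbols introduced here for illustration and do not appear elsewhere in this article:

$S_{i,k} = \max\{V(a_{i,k}),\ F_{i,k-1}\},\qquad F_{i,k} = S_{i,k} + \frac{L_{i,k}}{w_i},\qquad \frac{dV(t)}{dt} = \frac{R}{\sum_{j \in B(t)} w_j}$

As a worked instance under these assumptions, a packet of size $L_{i,k} = 1000$ from a flow with $w_i = 2$ that arrives when $V(a_{i,k}) = 500$ and whose predecessor had $F_{i,k-1} = 400$ receives $S_{i,k} = \max\{500, 400\} = 500$ and $F_{i,k} = 500 + 1000/2 = 1000$.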

In resource-constrained domains, such as OFDMA wireless resource allocation (0711.1269), virtual times can be generalized via dual parameters that track fair share adjustments and history-adapted metrics (such as exponentially smoothed rate) to reflect true service progression.

2. Algorithmic Frameworks

Classical and Generalized Models

Several frameworks have been developed under the virtual-time paradigm:

Weighted Fair Queuing (WFQ): Maintains per-flow queues, assigns each packet a virtual finish time, and dequeues the packet with the earliest one; the resulting shares hold regardless of packet sizes and arrival times.
Start-Time Fair Queuing / Self-Clocked Fair Queuing: Variants that reduce implementation cost by deriving virtual time from the tag of the packet in service, so that it advances only at packet service events.
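
The self-clocked variant can be illustrated with a minimal Python sketch. It assumes a single output link, static per-flow weights, and the common simplification that virtual time is set to the finish tag of the packet entering service. The class and method names are hypothetical, and the sketch omits details a real scheduler needs, such as resetting virtual time when the system goes idle.

```python
import heapq
from collections import defaultdict


class SelfClockedFairQueue:
    """Illustrative self-clocked fair queue (names are hypothetical)."""

    def __init__(self, weights):
        self.weights = dict(weights)           # flow -> weight w_i
        self.virtual_time = 0.0                # tag of the packet last put in service
        self.last_finish = defaultdict(float)  # flow -> finish tag of its previous packet
        self._heap = []                        # (finish_tag, seq, flow, size)
        self._seq = 0                          # arrival counter used to break tag ties

    def enqueue(self, flow, size):
        # Start tag: the later of current virtual time and the flow's previous finish tag.
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + size / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self._heap, (finish, self._seq, flow, size))
        self._seq += 1
        return finish

    def dequeue(self):
        if not self._heap:
            return None
        finish, _, flow, size = heapq.heappop(self._heap)
        # Self-clocking: virtual time only moves when a packet enters service.
        self.virtual_time = finish
        return flow, size


q = SelfClockedFairQueue({"a": 1.0, "b": 2.0})
for flow in ("a", "a", "b", "b"):
    q.enqueue(flow, 100)
order = [q.dequeue()[0] for _ in range(4)]
```

With equal packet sizes, the flow with twice the weight accumulates finish tags half as fast, so its packets are interleaved ahead of the lighter flow's later packets.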

Multi-Resource and Hierarchical Extensions

Hierarchical Dominant Resource Fair Queueing (H-DRFQ) algorithms (You et al., 2022) adapt virtual-time scheduling to settings where flows are grouped hierarchically and contend for multiple types of resources, such as CPU, bandwidth, and cache. The virtual time is recursively computed either on a collapsed flat tree (collapsed H-DRFQ) or dovetailed through each layer (dove-tailing H-DRFQ), ensuring hierarchical share guarantees.
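
As a point of reference for the multi-resource case, the sketch below shows a flat (non-hierarchical) dominant-resource tag rule: a packet is charged by its largest per-resource processing time. This is an assumption-laden simplification for illustration only. It does not reproduce the collapsed or dove-tailing constructions of You et al. (2022), and the function name and the `proc_times` layout are hypothetical.

```python
def dominant_finish_tag(virtual_time, prev_finish, proc_times, weight=1.0):
    """Return (start_tag, finish_tag) for one packet.

    proc_times maps a resource name to the time this packet needs on that
    resource (assumed to be already normalized so resources are comparable).
    """
    start = max(virtual_time, prev_finish)
    dominant = max(proc_times.values())  # the packet's dominant resource cost
    return start, start + dominant / weight


tags = dominant_finish_tag(10.0, 12.0, {"cpu": 4.0, "link": 1.0}, weight=2.0)
```

Here the packet is CPU-dominant, so its finish tag advances by the CPU time divided by the flow weight, independent of its smaller link cost.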

A representative virtual time evolution for Justitia (Yang et al., 19 Oct 2025), an LLM scheduler, is defined by the differential equation:

$V(0) = 0,\qquad \frac{dV(t)}{dt} = \frac{M}{N_t}$

where $M$ is the total memory resource and $N_t$ is the number of competing applications at time $t$. Applications' virtual finish times are computed as $\bar{F}_j = V(a_j) + C_j$, with $C_j$ as predicted service cost.
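
The virtual-time equation above can be integrated piecewise between events. The Python sketch below does this under two assumptions that the text does not state: an application remains "competing" in the idealized system until $V(t)$ reaches its virtual finish time, and $V(t)$ is held constant while no application is active. The function name is hypothetical and this is not the Justitia implementation.

```python
import heapq


def virtual_finish_times(apps, M):
    """apps: list of (arrival_time a_j, predicted cost C_j). Returns F_j per app."""
    order = sorted(range(len(apps)), key=lambda j: apps[j][0])
    V, t = 0.0, 0.0
    active = []  # min-heap of finish tags of apps still active in the ideal system
    F = [None] * len(apps)
    for j in order:
        a, C = apps[j]
        # Advance V from time t to the arrival time a, with dV/dt = M / N_t.
        while active and t < a:
            rate = M / len(active)
            t_done = t + (active[0] - V) / rate  # when V reaches the smallest tag
            if t_done <= a:
                V, t = active[0], t_done
                heapq.heappop(active)            # that app leaves the ideal system
            else:
                V += rate * (a - t)
                t = a
        t = a                                    # assumption: V is frozen while idle
        F[j] = V + C                             # F_j = V(a_j) + C_j
        heapq.heappush(active, F[j])
    return F


tags = virtual_finish_times([(0, 4), (0, 4), (1, 2)], M=2)
```

In this example two applications share the system from time 0, so $V$ grows at rate $M/2 = 1$ and equals 1 when the third application arrives; its smaller cost gives it the earliest virtual finish time.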

3. Fairness Guarantees and Delay Bounds

Virtual-time scheduling seeks not just mean fairness but strong worst-case delay bounds. By tracking each flow's or application's ideal finish time under GPS, these algorithms can offer strict guarantees of maximal delay or slowdown, often proven analytically.

For instance, in Justitia (Yang et al., 19 Oct 2025), the actual finish time $f_j$ versus ideal $\bar{F}_j$ is bounded:

$f_j - \bar{F}_j \leq 2c_{\max} + \frac{C_{\max}}{M}$

where $c_{\max}$ is the maximum service cost of any single inference and $C_{\max}$ is the largest overall service cost among all applications.

Hierarchical DRFQ extensions explicitly guarantee that every group and every individual flow gets at least its prescribed dominant resource share, regardless of demand inflation or aggregation.

4. Resource Allocation Methodologies

Resource allocation under virtual-time based fair queuing adapts dynamically to varying demands, resource types, and system constraints:

OFDMA Wireless Systems: Allocation of power and bandwidth follows virtual-time analogs computed by binary search over dual variables, with rates tracked through exponential smoothing and QoS constraints strictly enforced (0711.1269).
Packet Elections: Fair scheduling in constrained networks models packet decisions as elections, where virtual departures from a shadow GPS system determine weights for maximum-weight scheduling (0808.2530).
Multi-server Queues: Service tags representing the cumulative (normalized) work per user are treated as the virtual finish time to admit packets to servers under eligibility constraints, offering provable max-min fairness (Khamse-Ashari et al., 2016).
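
The multi-server case can be made concrete with a short Python sketch: when a server becomes free, it serves the backlogged user that is eligible for it and has the smallest service tag, and that user's tag then grows by the normalized work served. This is a simplified illustration, not the algorithm of Khamse-Ashari et al. (2016); the function name, the dictionary layout, and the tie-break by user name are assumptions.

```python
from collections import deque


def serve_next(server, backlog, tags, eligible, weights):
    """Pick and serve one packet on `server`; return (user, size) or None."""
    candidates = [u for u, q in backlog.items() if q and server in eligible[u]]
    if not candidates:
        return None
    # Smallest cumulative normalized service first; user name breaks ties.
    user = min(candidates, key=lambda u: (tags[u], u))
    size = backlog[user].popleft()
    tags[user] += size / weights[user]  # the service tag acts as a virtual finish time
    return user, size


backlog = {"u1": deque([10, 10]), "u2": deque([10, 10])}
tags = {"u1": 0.0, "u2": 0.0}
eligible = {"u1": {"s1"}, "u2": {"s1", "s2"}}
weights = {"u1": 1.0, "u2": 1.0}
first = serve_next("s1", backlog, tags, eligible, weights)
second = serve_next("s2", backlog, tags, eligible, weights)
```

Because `u1` can only use `s1`, the second server falls to `u2`, and both users end up with equal tags after one packet each.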

5. Implementation in Networked and Cloud Systems

Virtual-time based fair queuing algorithms are extensively implemented in diverse environments:

Routers and Switches: WFQ and its variants are foundational for IP routers, ensuring isolation and fairness among flows in scenarios such as VoIP traffic (Mohammed et al., 2013).
Datacenter Traffic Management: Scheduling schemes that combine virtual-time logic with deficit round robin (e.g., RL-SP-DRR) handle both latency-sensitive and bulk traffic and can be reconfigured on programmable switch architectures (Tokmakov et al., 2020).
LLM Serving: For LLM inference frameworks, virtual-time scheduling orders applications for “saturated serving”, in which each application uses the full resources in turn, with theoretical fairness guarantees and reported improvements in average completion time (Yang et al., 19 Oct 2025).

The computational complexity is kept manageable with simple arithmetic, credit counters, and round-robin traversals, favoring hardware implementation in high-speed devices (Roberts et al., 2022).
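
The credit-counter and round-robin style mentioned above can be illustrated with classic deficit round robin (DRR), which avoids sorting by tags altogether. The Python sketch below is a generic DRR loop, not the RL-SP-DRR or VFS designs cited in this article; the function name and argument layout are hypothetical.

```python
from collections import deque


def deficit_round_robin(queues, quantum, rounds):
    """queues: flow -> deque of packet sizes; quantum: flow -> credit per round."""
    deficit = {flow: 0 for flow in queues}
    sent = []
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                continue                   # idle flows earn no credit
            deficit[flow] += quantum[flow]
            while q and q[0] <= deficit[flow]:
                size = q.popleft()
                deficit[flow] -= size      # spend credit on the head packet
                sent.append((flow, size))
            if not q:
                deficit[flow] = 0          # an emptied queue forfeits leftover credit
    return sent


queues = {"a": deque([300, 300]), "b": deque([100, 100, 100, 100])}
sent = deficit_round_robin(queues, {"a": 200, "b": 200}, rounds=3)
```

Each visit costs a constant amount of arithmetic per packet, which is the property that makes this style attractive for hardware; the trade-off is coarser fairness than tag-based ordering over short time scales.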

6. Comparison with Other Scheduling Approaches

Virtual-time based algorithms are distinct from size-based schedulers such as SRPT (Shortest Remaining Processing Time):

SRPT is optimal for mean flow completion time under Poisson arrivals but can starve stragglers in batch/bursty environments.
Virtual fair scheduling (VFS) (Roberts et al., 2022) approximates PS fairness via credit updates and threshold-based packet dropping, outperforming SRPT for batch fairness and keeping active flow state sizes scalable.

Balanced fairness algorithms in computer clusters (Bonald et al., 2016) can produce ideal resource sharing but eschew explicit virtual time computation. Instead, they achieve fairness through frequent job interruptions and queue re-routings, indirectly mimicking the equilibrium produced by virtual-time disciplines.

7. Impact and Future Directions

Virtual-time based fair queuing algorithms continue to influence congestion control, resource scheduling, and fairness in heterogeneous AI serving and compute substrates.

Newer designs such as Justitia (Yang et al., 19 Oct 2025) and hierarchical DRFQ (You et al., 2022) extend virtual time concepts to GPU-bound and multi-resource workloads, providing theoretical guarantees and practical scalability.
Systematic evaluations in performance-sensitive settings (OFDMA wireless (0711.1269), VoIP routing (Mohammed et al., 2013), LLM inference (Yang et al., 19 Oct 2025)) report consistent improvements in fairness, average and worst-case completion times, and application-level QoS.
Emergent applications include dynamic buffer allocation for congestion-controlled flows, adaptive per-flow and per-tenant policies, and flexible integration with programmable network fabrics.

A plausible implication is that network and cloud architectures will increasingly embed virtual-time based fair queuing logic at multiple levels (link, application, cluster) due to its strong theoretical foundations, generalizability across resource types, and provable fairness/delay properties.