Program-Level Attained Service (PLAS)

Updated 15 January 2026

Program-Level Attained Service (PLAS) is a load balancing paradigm that uses the elapsed service time of the head-of-line job to inform dispatch decisions.
It integrates coarse-grained service measurements into policies that significantly reduce mean waiting times, especially under high job size variability.
PLAS-based approaches demonstrate up to 80% waiting time reduction with minimal overhead, validated by both mean-field analysis and simulation studies.

Program-Level Attained Service (PLAS) is a load balancing paradigm for large-scale systems with multiple dispatchers and FCFS servers, in which servers report not only the classical queue length but also a coarse-grained indicator of "attained service": the elapsed service time of the head-of-line job. For workloads with high job size variability, a head-of-line job that has already received a lot of service is likely to be large, and PLAS exploits this correlation to reduce blocking and mean waiting times at minimal communication and implementation cost. Integrating this second-order metric into the dispatcher's decision logic yields a spectrum of load balancing policies that surpass classical queue-length-based schemes such as SQ(d), both in mean-field analysis and in simulation (Hellemans et al., 2020).

1. Definition of Attained Service and Reporting Layers

The attained service of the head-of-line job at a server at time $t$ is defined as $S(t) = t - \tau$, where $\tau$ is the time at which the head-of-line job started service. To reduce measurement overhead and communication cost, servers report the attained service in discrete layers determined by a coarsening parameter $\Delta>0$. The reporting thresholds are $c_k = k \Delta$ for $k=0,1,\ldots,r$ (with $c_{r+1} = \infty$). A server with $c_{k-1} < S(t) \leq c_k$ reports its state as layer $k$. This discretization keeps reporting cheap: with $r\leq 15$ there are at most 16 layers ($k=1,\ldots,r+1$), so the layer index fits in 4 bits (Hellemans et al., 2020).
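The layer rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `attained_service_layer` is hypothetical, and placing a job that has just started ($S(t)=0$) in layer 1 is an assumption, since the definition only covers $S(t)>0$.

```python
import math

def attained_service_layer(s: float, delta: float, r: int) -> int:
    """Return the layer k with c_{k-1} < s <= c_k, where c_k = k * delta
    for k = 0..r and c_{r+1} = infinity."""
    if delta <= 0 or r < 1:
        raise ValueError("delta must be positive and r at least 1")
    if s <= 0:
        # Assumption: a job that has only just started is put in layer 1.
        return 1
    # ceil(s / delta) is the smallest k with s <= k * delta;
    # anything beyond c_r falls into the open-ended layer r + 1.
    return min(math.ceil(s / delta), r + 1)
```

With `delta=1.0` and `r=15`, an attained service of exactly 1.0 still reports layer 1, while anything above 15.0 reports the final layer 16.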

2. Classes of Load Balancing Policies Using PLAS

All PLAS-based policies share one decision structure: the dispatcher probes $d$ servers, each of which reports a tuple $(k,\ell)$, where $k$ is the attained-service layer and $\ell$ the queue length. The dispatcher computes an aversion metric $\xi(k,\ell)\in\mathbb{R}^n$ for each probed server and sends the job to the server whose $\xi$ is smallest in lexicographic order.

a. Joint Queue-Length & Attained-Service Policies:

SQ(d)-RTB (“runtime-tie-break”): Breaks queue-length ties in favour of servers with smaller head-of-line attained service: $\xi(k,\ell) = (\ell,k)$.
SQ(d)-RE(T) (“runtime-exclusion”): Excludes servers whose attained service exceeds the threshold $T$ (that is, $k>1$ when $\Delta=T$); among the non-excluded servers, dispatches to the shortest queue: $\xi(k,\ell) = (k,\ell)$.
SQ(d)-RTB-RE(T): First excludes servers with $k>T$, then applies SQ(d)-RTB to the remainder: $\xi(k,\ell) = (1_{k>T},\ell,k)$.
LEW(d) (“least-expected-workload”): When the job-size distribution is known, dispatches by expected residual workload. It serves as a benchmark for how close the distribution-unaware PLAS policies come to a distribution-aware approach:

$\xi(k,\ell) = (\ell-1)\mathbb{E}[X] + \mathbb{E}[X|X\geq c_k]-c_k$
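To make the LEW(d) metric concrete, the sketch below evaluates it for a hyperexponential job-size distribution (a mixture of exponentials), for which the mean residual life $\mathbb{E}[X\mid X\geq c]-c$ has a simple closed form. The distribution, its parameters, and the function names are illustrative assumptions, as is the choice to assign an idle server ($\ell=0$) a metric of zero.

```python
import math

def mean_residual(c, probs, rates):
    """E[X - c | X >= c] for a hyperexponential X with branch
    probabilities `probs` and exponential rates `rates`."""
    # Tail probability P(X >= c) and the integral of the tail from c to infinity.
    tail = sum(p * math.exp(-mu * c) for p, mu in zip(probs, rates))
    integral = sum(p * math.exp(-mu * c) / mu for p, mu in zip(probs, rates))
    return integral / tail

def lew_metric(k, ell, delta, probs, rates):
    """xi(k, ell) = (ell - 1) E[X] + E[X | X >= c_k] - c_k, with c_k = k * delta."""
    if ell == 0:
        # Assumption: an idle server carries no expected workload.
        return 0.0
    mean_x = sum(p / mu for p, mu in zip(probs, rates))
    return (ell - 1) * mean_x + mean_residual(k * delta, probs, rates)
```

For a highly variable mix such as `probs=[0.9, 0.1]`, `rates=[2.0, 0.2]`, the residual grows with the layer, so a server holding two jobs with a young head-of-line job (`k=1`) can score lower than a server holding a single job in layer `k=5`. For a plain exponential distribution the residual is constant and the layer carries no information.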

b. Attained-Service-Only Policies:

LAS(d) (“least-attained-service”): Dispatches to the server with the lowest $k$: $\xi(k,\ell)=k$.
LAS(d)-QTB: Breaks ties in $k$ using queue length: $\xi(k,\ell)=(k,\ell)$.
RE(d,T): Equivalent to LAS(d) with $r=1$, which flags the head-of-line job as "small" ($k=1$) or "large" ($k=2$).
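Because Python tuples compare lexicographically, the aversion metrics listed above translate almost directly into code. The following is a minimal sketch under stated assumptions: all function names are hypothetical, probed servers are assumed busy so that every report has a layer, and ties in $\xi$ are resolved in favour of the first probed server, a choice the text does not specify.

```python
from typing import Callable, Sequence, Tuple

Report = Tuple[int, int]  # (k, ell): attained-service layer, queue length

def sq_rtb(k: int, ell: int) -> tuple:
    return (ell, k)            # shortest queue, ties broken by layer

def las(k: int, ell: int) -> tuple:
    return (k,)                # lowest layer only

def las_qtb(k: int, ell: int) -> tuple:
    return (k, ell)            # lowest layer, ties broken by queue length

def sq_rtb_re(threshold: int) -> Callable[[int, int], tuple]:
    def xi(k: int, ell: int) -> tuple:
        # The leading 0/1 flag pushes servers with k > threshold to the back.
        return (int(k > threshold), ell, k)
    return xi

def dispatch(reports: Sequence[Report], xi: Callable[[int, int], tuple]) -> int:
    """Index of the probed server with the lexicographically smallest xi.
    Assumption: ties go to the first such server in probe order."""
    return min(range(len(reports)), key=lambda i: xi(*reports[i]))
```

For the probe responses `[(4, 1), (2, 2), (1, 3), (2, 1)]`, `sq_rtb` selects index 3 (queue length 1, younger head-of-line job), whereas `las` selects index 2 (lowest layer, despite the longest queue). SQ(d)-RE(T) corresponds to `las_qtb` applied to reports generated with $r=1$ and $\Delta=T$.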

3. Analytical Framework and Mean-Field Cavity Analysis

The performance of PLAS policies is characterized through mean-field (asymptotic independence) analysis as the system size $N\to\infty$. The analysis focuses on a representative “cavity” server with state $(j,a,\ell)$: service phase $j$, age $a$, and queue length $\ell$. Potential arrivals occur at Poisson rate $\lambda d$, and a potential arrival actually joins only if the cavity server has the minimal $\xi(k,\ell)$ among the $d$ probed candidates. The cavity's steady state is described by $\pi_{k,\ell,j}$, the probability of being in layer $k$ with queue length $\ell$ and service phase $j$.

Fixed-point equations express the actual arrival rate into each $(k,\ell)$ as

$\lambda_{act}(k,\ell) = \lambda\, d\, \sum_{s=0}^{d-1} \frac{1}{s+1} \binom{d-1}{s} w_{k,\ell}^s v_{k,\ell}^{d-1-s} = \lambda\, \frac{u_{k,\ell}^d - v_{k,\ell}^d}{w_{k,\ell}},$

where $u_{k,\ell}, v_{k,\ell}, w_{k,\ell}$ are functions of the steady-state distribution over $(k,\ell)$, and the second equality follows from the binomial theorem with $u_{k,\ell} = v_{k,\ell} + w_{k,\ell}$. A coupled “queue map” then yields the steady-state distribution by treating the cavity as a state-dependent $M/PH/1$ queue with FCFS discipline and phase-type service.
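The binomial-sum expression and the closed-form ratio for $\lambda_{act}$ agree when $u = v + w$, which can be checked numerically. The snippet below is only a sanity check with hypothetical function names and arbitrary input values; the relation $u = v + w$ is taken as an assumption here, since it is what makes the two expressions coincide.

```python
from math import comb, isclose

def rate_sum_form(lam: float, d: int, v: float, w: float) -> float:
    """lambda * d * sum_{s=0}^{d-1} C(d-1, s) w^s v^(d-1-s) / (s + 1)."""
    return lam * d * sum(
        comb(d - 1, s) * w**s * v**(d - 1 - s) / (s + 1) for s in range(d)
    )

def rate_closed_form(lam: float, d: int, v: float, w: float) -> float:
    """lambda * (u^d - v^d) / w, assuming u = v + w."""
    u = v + w
    return lam * (u**d - v**d) / w

# Arbitrary illustrative values, not taken from the source.
assert isclose(rate_sum_form(0.6, 5, 0.3, 0.2), rate_closed_form(0.6, 5, 0.3, 0.2))
```

The closed form is cheaper to evaluate and avoids the explicit sum when $d$ is large, but it divides by $w$ and therefore needs care for states with vanishing probability.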

Uniqueness and Solution: Iterating the mapping $\pi = T(H(\pi))$ (with $H$ the fixed-point equation and $T$ the queue map) converges to a unique solution under mild technical conditions.
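The iteration $\pi \leftarrow T(H(\pi))$ has the shape of a standard fixed-point loop. The sketch below shows only that loop; the maps $H$ and $T$ are not reproduced, and `toy_step` is a deliberately simple stand-in contraction used to exercise the code, not the PLAS queue map. The function names, tolerance, and iteration cap are all assumptions.

```python
from typing import Callable, List

def iterate_fixed_point(
    step: Callable[[List[float]], List[float]],
    pi0: List[float],
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> List[float]:
    """Repeat pi <- step(pi) until successive iterates differ by less
    than `tol` in L1 norm."""
    pi = list(pi0)
    for _ in range(max_iter):
        nxt = step(pi)
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("no convergence within max_iter iterations")

def toy_step(pi: List[float]) -> List[float]:
    # Stand-in for T(H(pi)): a contraction with fixed point [0.5, 0.3, 0.2].
    target = [0.5, 0.3, 0.2]
    return [0.5 * p + 0.5 * t for p, t in zip(pi, target)]

solution = iterate_fixed_point(toy_step, [1.0, 0.0, 0.0])
```

In an actual implementation `step` would compute the arrival rates from the current distribution and then solve the resulting state-dependent queue.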

4. Performance Analysis and Closed-form Metrics

Performance metrics derive directly from the steady-state:

Mean queue size: $\mathbb{E}[Q] = \sum_{k,\ell,j} \ell\, \pi_{k,\ell,j}$
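Mean sojourn time: $\mathbb{E}[R] = \mathbb{E}[Q]/\lambda$ (by Little's law)
Mean waiting time: $\mathbb{E}[W] = \mathbb{E}[R] - \mathbb{E}[X]$, which equals $\mathbb{E}[R] - 1$ when job sizes are normalized to unit mean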
Waiting-time distribution:

$\bar{F}_W(w) = \sum_{\ell\geq 1}\sum_j J_{\ell, j} \cdot \bar{F}_{X_{\ell, j}}(w),$

with $J_{\ell, j}$ the probability that an arrival sees $(\ell, j)$ and $X_{\ell, j}$ the sum of remaining service and queued jobs.
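The mean-value metrics above reduce to a weighted sum over the steady-state distribution. The following sketch assumes $\pi$ is stored as a dictionary keyed by $(k,\ell,j)$ and that job sizes have mean `mean_job_size` (1 by default); the function name, the data layout, and the toy distribution are illustrative, not taken from the source.

```python
from typing import Dict, Tuple

def performance_metrics(
    pi: Dict[Tuple[int, int, int], float],
    lam: float,
    mean_job_size: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (E[Q], E[R], E[W]) from a steady-state distribution pi[(k, ell, j)]."""
    mean_q = sum(ell * p for (_k, ell, _j), p in pi.items())
    mean_r = mean_q / lam            # Little's law
    mean_w = mean_r - mean_job_size  # sojourn time minus own service time
    return mean_q, mean_r, mean_w

# Toy distribution for illustration only; idle state stored as layer 0 by assumption.
toy_pi = {(0, 0, 0): 0.4, (1, 1, 0): 0.3, (2, 1, 0): 0.1, (1, 2, 0): 0.2}
mean_q, mean_r, mean_w = performance_metrics(toy_pi, lam=0.6)
```

In the toy example the server is busy with probability 0.6, matching the load $\lambda\,\mathbb{E}[X] = 0.6$, and the mean queue size is 0.8.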

Simulations with $d=5$ and phase-type or mixed-Erlang job-size distributions (SCV 10–30) show:

SQ(5)-RTB achieves a $\sim$50% reduction in mean waiting time at load $\lambda=0.6$.
SQ(5)-RTB-RE(2) achieves $\sim$65%; LEW(5) achieves $\sim$70%.
LAS(5), using only attained service, still achieves a 30% improvement.
For SCV 30, best policies reach 60–80% reduction, with joint policies approaching LEW(d) within 5%.
Finite $N$ simulation up to 2,000 servers aligns with mean-field predictions to within 1–3% (Hellemans et al., 2020).

5. Implementation, Overhead, and Policy Robustness

PLAS requires negligible overhead. Attained-service layers can be encoded in 4 bits with suitable $\Delta$ granularity ($r\leq 15$). Updating the attained service requires only a local clock per server, and reporting incurs minimal communication overhead, since queue length and attained-service layer are needed only when the dispatcher probes. Policies degrade gracefully under coarse $\Delta$, so exact measurement is unnecessary for efficacy.
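As an illustration of how small a probe response can be, the sketch below packs a layer index and a queue length into a single byte. The one-byte wire format, the cap of 15 on the reported queue length, and the function names are all hypothetical; the source only states that the layer fits in 4 bits.

```python
from typing import Tuple

def pack_report(k: int, ell: int) -> int:
    """Pack layer k (1..16) into the high 4 bits and the queue length,
    capped at 15 (an assumption), into the low 4 bits."""
    if not 1 <= k <= 16:
        raise ValueError("layer must be in 1..16")
    if ell < 0:
        raise ValueError("queue length must be non-negative")
    return ((k - 1) << 4) | min(ell, 15)

def unpack_report(byte: int) -> Tuple[int, int]:
    """Inverse of pack_report: return (k, ell)."""
    return (byte >> 4) + 1, byte & 0x0F
```

Capping the queue length loses information only for very long queues, which a shortest-queue style policy would avoid anyway.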

The method’s applicability extends to heterogeneous servers by tagging probes with server capacity parameters and adapting the same analytical techniques; to alternative scheduling disciplines (e.g., Processor Sharing, SRPT) by reporting attained service for all queue positions; and to stateful dispatchers, which leverage historical probe responses to reduce probe delay effects.

6. Relation to Prior Art and Theoretical Position

PLAS positions attained service as a “second-order” metric—finer than first-moment queue length (classical SQ(d)), but coarser than full workload (LL(d)), achieving near-optimal performance in systems with mixed job sizes. Unlike size-aware scheduling, it does not require any prior knowledge of the job size distribution. The mechanism functions as a lightweight extension: with minimal changes to existing FCFS infrastructure, it enables performance improvements nearly matching complex, information-rich schemes, thus effectively bridging the gap between purely agnostic and fully distribution-aware load balancing strategies (Hellemans et al., 2020).

7. Empirical Observations and Limit Behaviors

Across simulation regimes:

PLAS policies confer 30–75% waiting time reductions for light to moderate system loads and highly variable workloads.
All attained-service-based policies converge in the low-traffic limit to a relative gain $\lim_{\lambda \to 0} E_{W,rel} \approx 1-u^d$, with $u$ the probability that a probe targets a busy server.
Performance gains increase for higher job size SCV.
The simulation underscores minimal sensitivity to layer granularity and validates the theoretical mean-field predictions for sizable, realistic systems.

PLAS, by combining elementary local measurement with simple policy design, offers a robust, analytically tractable, and high-impact augmentation to existing load balancing frameworks in large-scale, variable-workload systems (Hellemans et al., 2020).

References

Hellemans et al. (2020). Improved Load Balancing in Large Scale Systems using Attained Service Time Reporting.
