
Program-Level Attained Service (PLAS)

Updated 15 January 2026
  • Program-Level Attained Service (PLAS) is a load balancing paradigm that uses the elapsed service time of the head-of-line job to inform dispatch decisions.
  • It integrates coarse-grained service measurements into policies that significantly reduce mean waiting times, especially under high job size variability.
  • PLAS-based approaches demonstrate up to 80% waiting time reduction with minimal overhead, validated by both mean-field analysis and simulation studies.

Program-Level Attained Service (PLAS) is a load balancing paradigm for large-scale systems with multiple dispatchers and FCFS servers, in which servers report not only the classical queue length but also a coarse-grained indicator of "attained service": the elapsed service time of the head-of-line job. For workloads with high job size variability, PLAS exploits the strong empirical correlation between high attained service and large jobs to reduce blocking and mean waiting times, at minimal communication and implementation cost. Integrating this second-order metric into the dispatcher's decision logic yields a spectrum of new load balancing policies that surpass classical queue-length-based schemes, such as SQ(d), in both theoretical performance and simulation results (Hellemans et al., 2020).

1. Definition of Attained Service and Reporting Layers

The attained service of the head-of-line job at a server at time $t$ is defined as $S(t) = t - \tau$, where $\tau$ is the start time of service for the head-of-line job. To reduce measurement overhead and communication cost, servers report the attained service in discrete layers determined by a coarsening parameter $\Delta > 0$. The reporting thresholds are $c_k = k\Delta$ for $k = 0, 1, \ldots, r$ (with $c_{r+1} = \infty$). A server experiencing $c_{k-1} < S(t) \leq c_k$ reports its state as being in layer $k$. This discretization ensures practical feasibility and compresses the attained service to a small number of layers ($r \leq 15$), requiring only 4 bits for transmission (Hellemans et al., 2020).
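The layer rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `attained_service_layer` is hypothetical, and returning layer 0 when no service has been attained yet is an assumption made here for convenience.

```python
import math

def attained_service_layer(now: float, service_start: float,
                           delta: float, r: int) -> int:
    """Return the reported layer k for the head-of-line job.

    Layer k (1 <= k <= r) means c_{k-1} < S(t) <= c_k with c_k = k * delta.
    Layer r + 1 collects everything above c_r, since c_{r+1} is infinite.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    s = now - service_start        # attained service S(t) = t - tau
    if s <= 0:
        return 0                   # assumption: no service attained yet
    k = math.ceil(s / delta)       # smallest k with s <= k * delta
    return min(k, r + 1)           # cap at the open-ended top layer

# A job that started at time 7.0, probed at time 10.0 with delta = 1.0,
# has S(t) = 3.0 and falls in layer 3 because c_2 < 3.0 <= c_3.
print(attained_service_layer(10.0, 7.0, delta=1.0, r=15))
```

Only the server's local clock and the service start time are needed, which matches the low-overhead claim made for the reporting scheme.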

2. Classes of Load Balancing Policies Using PLAS

All PLAS-based policies fall into decision structures where the dispatcher probes $d$ servers, each reporting a tuple $(k,\ell)$: $k$ is the attained-service layer, $\ell$ the queue length. The dispatcher computes an aversion metric $\xi(k,\ell)\in\mathbb{R}^n$ for each server, dispatching the job to the server minimizing $\xi$ in lexicographic order.

a. Joint Queue-Length & Attained-Service Policies:

  • SQ(d)-RTB (“runtime-tie-break”): Breaks queue-length ties by favouring servers with smaller head-of-line attained service: $\xi(k,\ell) = (\ell,k)$.
  • SQ(d)-RE(T) (“runtime-exclusion”): Excludes servers whose attained service exceeds threshold $T$ ($k>1$ for $\Delta=T$); within non-excluded servers, dispatches to the shortest queue: $\xi(k,\ell) = (k,\ell)$.
  • SQ(d)-RTB-RE(T): First eliminates servers with $k>T$, then applies SQ(d)-RTB on the remainder: $\xi(k,\ell) = (1_{k>T},\ell,k)$.
  • LEW(d) (“least-expected-workload”): When the job-size distribution is known, computes expected residual workload; benchmarks proximity of distribution-unaware PLAS policies to size-aware approaches:

$$\xi(k,\ell) = (\ell-1)\,\mathbb{E}[X] + \mathbb{E}[X \mid X\geq c_k] - c_k$$
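To make the LEW(d) metric concrete, the sketch below evaluates the formula exactly as stated, for a hypothetical two-point job-size distribution. The function name `lew_metric`, the discrete distribution, and the convention that an idle server has zero expected workload are all assumptions introduced for illustration.

```python
from typing import Sequence

def lew_metric(k: int, ell: int, delta: float,
               sizes: Sequence[float], probs: Sequence[float]) -> float:
    """xi(k, ell) = (ell - 1) * E[X] + E[X | X >= c_k] - c_k, with c_k = k * delta."""
    if ell == 0:
        return 0.0                                    # assumption: idle server holds no work
    mean = sum(x * p for x, p in zip(sizes, probs))   # E[X]
    c_k = k * delta
    tail = [(x, p) for x, p in zip(sizes, probs) if x >= c_k]
    tail_mass = sum(p for _, p in tail)
    if tail_mass == 0:
        raise ValueError("threshold c_k exceeds the largest job size")
    cond_mean = sum(x * p for x, p in tail) / tail_mass   # E[X | X >= c_k]
    return (ell - 1) * mean + cond_mean - c_k

# Hypothetical workload: 90% of jobs have size 1, 10% have size 11 (mean 2).
sizes, probs = [1.0, 11.0], [0.9, 0.1]
short_queue_old_job = lew_metric(k=2, ell=1, delta=1.0, sizes=sizes, probs=probs)
long_queue_new_job = lew_metric(k=1, ell=3, delta=1.0, sizes=sizes, probs=probs)
print(short_queue_old_job, long_queue_new_job)
```

In this toy case a server holding a single job in layer 2 carries more expected work than a server holding three jobs whose head-of-line job is still in layer 1, which is the intuition behind using attained service as a signal for large jobs.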

b. Attained-Service-Only Policies:

  • LAS(d) (“least-attained-service”): Dispatches to the server with the lowest $k$: $\xi(k,\ell)=k$.
  • LAS(d)-QTB: In case of a tie in $k$, resolves using queue length: $\xi(k,\ell)=(k,\ell)$.
  • RE(d,T): Equivalent to LAS(d) with $r=1$, flagging the head-of-line job as "small" ($k=1$) or "large" ($k=2$).
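Because every policy above reduces to minimizing a tuple in lexicographic order, a dispatcher can be sketched with Python's built-in tuple comparison. The names `make_policies` and `dispatch` are hypothetical, the threshold `T` is treated as a layer index as the indicator is written, and breaking remaining ties uniformly at random is an assumption.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Report = Tuple[int, int]   # (k, ell): attained-service layer, queue length

def make_policies(T: int) -> Dict[str, Callable[[int, int], tuple]]:
    """Aversion metrics xi(k, ell) expressed as tuples."""
    return {
        "SQ(d)-RTB": lambda k, ell: (ell, k),
        "SQ(d)-RE(T)": lambda k, ell: (k, ell),        # presumes delta = T, r = 1
        "SQ(d)-RTB-RE(T)": lambda k, ell: (int(k > T), ell, k),
        "LAS(d)": lambda k, ell: (k,),
        "LAS(d)-QTB": lambda k, ell: (k, ell),
    }

def dispatch(reports: Sequence[Report],
             xi: Callable[[int, int], tuple],
             rng: random.Random) -> int:
    """Return the position, within the probed set, of the chosen server."""
    keys = [xi(k, ell) for k, ell in reports]
    best = min(keys)                                   # tuples compare lexicographically
    candidates = [i for i, key in enumerate(keys) if key == best]
    return rng.choice(candidates)                      # assumption: uniform tie-breaking

rng = random.Random(0)
policies = make_policies(T=2)
probed = [(3, 1), (1, 2)]   # server 0: short queue, old job; server 1: longer queue, fresh job
print(dispatch(probed, policies["SQ(d)-RTB"], rng))        # queue length decides first
print(dispatch(probed, policies["SQ(d)-RTB-RE(T)"], rng))  # exclusion flag decides first
```

Swapping the order of the tuple components is all that separates the queue-length-first policies from the attained-service-first ones.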

3. Analytical Framework and Mean-Field Cavity Analysis

The performance of PLAS policies is characterized through mean-field (asymptotic independence) analysis as the system size $N\to\infty$. The analysis focuses on a representative “cavity” server with state $(j,a,\ell)$: service phase $j$, age $a$, and queue length $\ell$. Potential arrivals occur at this server at Poisson rate $\lambda d$; a potential arrival actually joins only if the server's $\xi(k,\ell)$ is minimal among the $d$ probed candidates. The cavity's steady state is described by $\pi_{k,\ell,j}$, the probability of being in layer $k$ with queue length $\ell$ and service phase $j$.

  • Fixed-point equations express the actual arrival rate into each $(k,\ell)$ as

$$\lambda_{act}(k,\ell) = \lambda\, d \sum_{s=0}^{d-1} \frac{1}{s+1} \binom{d-1}{s} w_{k,\ell}^{s}\, v_{k,\ell}^{d-1-s}$$

or

$$\lambda_{act}(k,\ell) = \lambda\, \frac{u_{k,\ell}^{d} - v_{k,\ell}^{d}}{w_{k,\ell}},$$

where $u_{k,\ell}, v_{k,\ell}, w_{k,\ell}$ are functions of the steady-state distribution over $(k,\ell)$. A coupled “queue map” then yields the steady distribution by treating the cavity as a state-dependent $M/PH/1$ queue with FCFS discipline and phase-type service.

  • Uniqueness and Solution: Iterating the mapping $\pi = T(H(\pi))$ (with $H$ the fixed-point equation and $T$ the queue map) converges to a unique solution under mild technical conditions.
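The two expressions for the actual arrival rate agree by the binomial theorem whenever $u = v + w$. The text does not spell out this relation, so it is an assumption here; one natural reading is that $w$ is the probability that another probed server ties with the cavity server, $v$ the probability that it is strictly worse, and the factor $1/(s+1)$ reflects uniform tie-breaking among $s+1$ tied servers. The hypothetical helpers below check the identity numerically.

```python
from math import comb

def lambda_act_sum(lam: float, d: int, v: float, w: float) -> float:
    """Summation form: s of the other d - 1 probes tie, the rest are worse."""
    return lam * d * sum(
        comb(d - 1, s) * w**s * v**(d - 1 - s) / (s + 1) for s in range(d)
    )

def lambda_act_closed(lam: float, d: int, v: float, w: float) -> float:
    """Closed form, assuming u = v + w (not stated explicitly in the text)."""
    u = v + w
    return lam * (u**d - v**d) / w

# Illustrative values only; v and w would come from the steady-state distribution.
lam, d, v, w = 0.6, 5, 0.3, 0.2
print(lambda_act_sum(lam, d, v, w), lambda_act_closed(lam, d, v, w))
```

The closed form avoids the sum over $s$, which is convenient when the fixed-point iteration has to evaluate the arrival rate for every state $(k,\ell)$.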

4. Performance Analysis and Closed-form Metrics

Performance metrics derive directly from the steady-state:

  • Mean queue size: $\mathbb{E}[Q] = \sum_{k,\ell,j} \ell\, \pi_{k,\ell,j}$
  • Mean sojourn time: $\mathbb{E}[R] = \mathbb{E}[Q]/\lambda$
  • Mean waiting time: $\mathbb{E}[W] = \mathbb{E}[R] - 1$
  • Waiting-time distribution:

$$\bar{F}_W(w) = \sum_{\ell\geq 1}\sum_j J_{\ell, j} \cdot \bar{F}_{X_{\ell, j}}(w),$$

with $J_{\ell, j}$ the probability that an arrival sees $(\ell, j)$ and $X_{\ell, j}$ the sum of remaining service and queued jobs.
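The mean-value metrics above follow mechanically from the steady-state distribution. The sketch below uses a hypothetical dictionary `pi[(k, ell, j)]` with made-up probabilities, and it assumes the mean job size is normalized to 1, which is how the subtraction of 1 in the waiting-time formula is read here.

```python
from typing import Dict, Tuple

State = Tuple[int, int, int]   # (k, ell, j): layer, queue length, service phase

def mean_metrics(pi: Dict[State, float], lam: float) -> Tuple[float, float, float]:
    """Return (E[Q], E[R], E[W]) from a steady-state distribution pi."""
    if abs(sum(pi.values()) - 1.0) > 1e-9:
        raise ValueError("pi must sum to 1")
    mean_queue = sum(ell * p for (_, ell, _), p in pi.items())   # E[Q]
    mean_sojourn = mean_queue / lam      # Little's law: E[R] = E[Q] / lambda
    mean_wait = mean_sojourn - 1.0       # assumption: mean job size equals 1
    return mean_queue, mean_sojourn, mean_wait

# Toy distribution with a single service phase; the busy probability is 0.6.
pi = {(0, 0, 0): 0.4, (1, 1, 0): 0.3, (2, 1, 0): 0.1, (1, 2, 0): 0.2}
print(mean_metrics(pi, lam=0.6))
```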

Simulation with $d=5$ and phase-type or mixed-Erlang size distributions (SCV 10–30) demonstrates:

  • The SQ(5)-RTB policy achieves a $\sim$50% reduction in mean waiting time at load $\lambda=0.6$.
  • SQ(5)-RTB-RE(2) achieves $\sim$65%; LEW(5) achieves $\sim$70%.
  • LAS(5), using only attained service, still achieves a 30% improvement.
  • For SCV 30, the best policies reach 60–80% reduction, with joint policies approaching LEW(d) within 5%.
  • Finite-$N$ simulation up to 2,000 servers aligns with mean-field predictions to within 1–3% (Hellemans et al., 2020).
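A small finite-N simulation of this kind can be sketched as follows. It is a toy under stated assumptions, not a reproduction of the reported experiments: the function `simulate`, the hyperexponential job sizes (mean 1, SCV about 10.5), the default parameters, and the uniform tie-breaking are all choices made here for illustration.

```python
import math
import random
from collections import deque

def simulate(policy, n_servers=50, d=5, lam=0.6, delta=1.0, r=15,
             n_jobs=20000, seed=1):
    """Mean waiting time with FCFS servers and probe-based dispatch."""
    arrivals = random.Random(seed)       # arrival times and job sizes
    probes = random.Random(seed + 1)     # probe sets and tie-breaking
    servers = [deque() for _ in range(n_servers)]   # (start, end) of each job
    t, total_wait = 0.0, 0.0
    for _ in range(n_jobs):
        t += arrivals.expovariate(lam * n_servers)  # Poisson arrivals, rate lam per server
        # Assumed sizes: 95% exponential with mean 0.5, 5% exponential with mean 10.5.
        mean_size = 0.5 if arrivals.random() < 0.95 else 10.5
        size = arrivals.expovariate(1.0 / mean_size)
        best_key, best = None, []
        for i in probes.sample(range(n_servers), d):
            q = servers[i]
            while q and q[0][1] <= t:    # discard jobs that have already departed
                q.popleft()
            ell = len(q)
            # Layer of the head-of-line job; an idle server reports layer 0.
            k = min(math.ceil((t - q[0][0]) / delta), r + 1) if q else 0
            key = policy(k, ell)
            if best_key is None or key < best_key:
                best_key, best = key, [i]
            elif key == best_key:
                best.append(i)
        q = servers[probes.choice(best)]
        start = max(t, q[-1][1]) if q else t   # FCFS: wait for the last queued job
        total_wait += start - t
        q.append((start, start + size))
    return total_wait / n_jobs

sq_d = simulate(lambda k, ell: (ell,))       # classical SQ(d)
sq_rtb = simulate(lambda k, ell: (ell, k))   # SQ(d)-RTB
print(f"SQ(5): {sq_d:.3f}  SQ(5)-RTB: {sq_rtb:.3f}")
```

Both runs see the same arrival and job-size sequence because those draws come from a separate generator, so differences in the printed means come from the dispatch rule and the probe draws. The numbers depend on the seed and the assumed size distribution and should not be compared directly with the figures quoted above.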

5. Implementation, Overhead, and Policy Robustness

PLAS requires negligible overhead. Attained-service layers can be encoded in 4 bits with suitable $\Delta$ granularity ($r\leq 15$). The updating of attained service requires only a local clock per server; reporting incurs minimal communication overhead, as both queue length and attained-service layer are needed only during probing by the dispatcher. Policies degrade gracefully under coarse $\Delta$, so exact measurement is unnecessary for efficacy.
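As an illustration of how small a probe response can be, the sketch below packs a layer index and a queue length into a single byte. The wire format is hypothetical: the text only states that the layer fits in 4 bits, so capping the queue length at 15 to fit the other 4 bits, and assuming the layer index has already been mapped into the range 0 to 15, are assumptions made here.

```python
from typing import Tuple

def pack_report(layer: int, queue_len: int) -> int:
    """Pack a 4-bit layer index and a capped 4-bit queue length into one byte."""
    if not 0 <= layer <= 15:
        raise ValueError("layer index must fit in 4 bits")
    if queue_len < 0:
        raise ValueError("queue length must be non-negative")
    return (layer << 4) | min(queue_len, 15)   # assumption: queue length saturates at 15

def unpack_report(byte: int) -> Tuple[int, int]:
    """Recover (layer, queue_len) from a packed byte."""
    return byte >> 4, byte & 0x0F

packed = pack_report(layer=3, queue_len=2)
print(packed, unpack_report(packed))
```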

The method’s applicability extends to heterogeneous servers by tagging probes with server capacity parameters and adapting the same analytical techniques; to alternative scheduling disciplines (e.g., Processor Sharing, SRPT) by reporting attained service for all queue positions; and to stateful dispatchers, which leverage historical probe responses to reduce probe delay effects.

6. Relation to Prior Art and Theoretical Position

PLAS positions attained service as a “second-order” metric—finer than first-moment queue length (classical SQ(d)), but coarser than full workload (LL(d)), achieving near-optimal performance in systems with mixed job sizes. Unlike size-aware scheduling, it does not require any prior knowledge of the job size distribution. The mechanism functions as a lightweight extension: with minimal changes to existing FCFS infrastructure, it enables performance improvements nearly matching complex, information-rich schemes, thus effectively bridging the gap between purely agnostic and fully distribution-aware load balancing strategies (Hellemans et al., 2020).

7. Empirical Observations and Limit Behaviors

Across simulation regimes:

  • PLAS policies confer 30–75% waiting time reductions for light to moderate system loads and highly variable workloads.
  • All attained-service-based policies converge in low-traffic to a relative gain $\lim_{\lambda \to 0} E_{W,rel} \approx 1-u^d$, with $u$ the probability a probe targets a busy server.
  • Performance gains increase for higher job size SCV.
  • The simulation underscores minimal sensitivity to layer granularity and validates the theoretical mean-field predictions for sizable, realistic systems.

PLAS, by combining elementary local measurement with simple policy design, offers a robust, analytically tractable, and high-impact augmentation to existing load balancing frameworks in large-scale, variable-workload systems (Hellemans et al., 2020).
