Age of Job Completion Analysis

Updated 13 November 2025

Age of Job Completion (AoJC) is defined as the elapsed time from a job's arrival to its completion, capturing end-to-end latency in distributed and stochastic systems.
Recent research introduces both optimal and heuristic scheduling algorithms, such as OBTA, Water-Filling, and Replica-Deletion, to minimize AoJC while balancing throughput and stability.
Optimization strategies utilize methodologies like MILP, MDP, and Markovian models to jointly address resource allocation, data locality constraints, and sampling costs in job scheduling.

The age of job completion is a metric and analytical framework for quantifying, optimizing, and stabilizing the delay between the arrival and completion of jobs in networked and distributed systems. Related to classical metrics such as response time and makespan, it measures the time elapsed from job arrival to completion and is applied both as an objective and as a constraint in online scheduling, data-locality-constrained task assignment, queueing systems with nontrivial machine dynamics, and throughput optimization. Recent studies of distributed execution with data locality (Zhao et al., 11 Jul 2024) and of stochastic job assignment with Markovian server states (Mitrolaris et al., 6 Nov 2025) have developed rigorous definitions, problem formulations, and both optimal and heuristic policies for minimizing this age while accounting for constraints such as stability, sampling cost, and heterogeneous service capabilities.

1. Formal Definition and System Contexts

The age of job completion (AoJC), often denoted $age_c = C_c - a_c$ or $\Phi_c$ for job $c$, is the interval between the job's arrival time $a_c$ and its estimated completion time $C_c$. In single-server queueing systems with multiple users, the instantaneous age of user $i$ at slot $t$ is $v_i^\phi(t) = t - \sup\{t'<t: b_i^\phi(t')=1\}$, where $b_i^\phi(t')=1$ indicates that a job of user $i$ was completed in slot $t'$ under policy $\phi$; thus $v_i^\phi(t)$ is the time since user $i$'s most recent completion (Mitrolaris et al., 6 Nov 2025). For batch scheduling in data-locality-constrained environments, $C_c$ is the latest finishing time across all servers assigned tasks of job $c$, which accounts for their outstanding backlogs and server-specific service rates (Zhao et al., 11 Jul 2024). Taking long-run time averages yields $\Delta_i^\phi$ and $\Delta^\phi$, the average age per user and system-wide, respectively.
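The definitions above can be made concrete with a short Python sketch. It computes per-job ages $C_c - a_c$, the per-slot age $v_i(t)$ of a single user from a 0/1 completion record $b_i(t')$, and a finite-horizon time average as a stand-in for $\Delta_i^\phi$. The function names and the convention that slot 0 counts as a completion (so the supremum is always defined) are assumptions made for illustration, not taken from the cited papers.

```python
from typing import List, Sequence


def job_ages(arrivals: Sequence[int], completions: Sequence[int]) -> List[int]:
    """Per-job AoJC: age_c = C_c - a_c."""
    if len(arrivals) != len(completions):
        raise ValueError("need one completion time per arrival")
    return [c - a for a, c in zip(arrivals, completions)]


def instantaneous_ages(done: Sequence[int]) -> List[int]:
    """v(t) = t - sup{t' < t : b(t') = 1} for t = 1..T.

    done[t'] is the completion indicator b(t') for slot t'.
    Assumption: slot 0 counts as a completion, so the sup is always defined.
    """
    last = 0
    ages = []
    for t in range(1, len(done) + 1):
        if done[t - 1]:  # slot t-1 is the latest candidate t' < t
            last = t - 1
        ages.append(t - last)
    return ages


def time_average(values: Sequence[int]) -> float:
    """Finite-horizon estimate of a long-run average age such as Delta_i."""
    return sum(values) / len(values)


ages = instantaneous_ages([1, 0, 0, 1, 0])
print(job_ages([0, 2, 5], [4, 3, 9]))  # [4, 1, 4]
print(ages, time_average(ages))         # [1, 2, 3, 1, 2] 1.8
```

The age resets to 1 in the slot right after each completion and grows by one per slot otherwise, which is the sawtooth that the long-run averages $\Delta_i^\phi$ integrate.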

AoJC is employed to:

Capture end-to-end latency for jobs or users.
Directly align scheduling with throughput maximization (completed jobs per time unit).
Guide trade-offs between quick completions and resource stability in dynamic systems.

2. Optimization Formulations

Minimization of AoJC is framed in two principal models:

a. Distributed Data-Locality-Aware Scheduling (Zhao et al., 11 Jul 2024):

Variables: For each job $c$, tasks with identical server-availability sets are grouped into $K_c$ groups. Each server $m\in\mathcal{M}$ has a profiled service rate $\mu_m^c$ (tasks per slot) and an instantaneous backlog $o_m^c$; $b_m^c$ below denotes the resulting busy time of server $m$ when job $c$ arrives.
Objective: On every job arrival, solve

$\min \Phi_c$

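subject to per-server and per-task-group constraints:
(1) $\sum_k n_m^k \leq \max\{\Phi_c - b_m^c, 0\}$ for every server $m$, so each server finishes its share of job $c$ by $\Phi_c$;
(2) $\sum_{m\in\mathcal{S}_c^k} n_m^k \mu_m^c \geq |\mathcal{T}_c^k|$ for every group $k$, so the eligible servers $\mathcal{S}_c^k$ cover all tasks $\mathcal{T}_c^k$ of the group;
where the integer $n_m^k$ is the number of time slots server $m$ devotes to group $k$.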
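For a fixed assignment $n_m^k$, the smallest feasible $\Phi_c$ follows directly from the two constraints: the coverage constraint must hold for each group, and every server with $s > 0$ assigned slots needs $\Phi_c \ge b_m^c + s$, while a server with no slots satisfies its constraint for any $\Phi_c$. The sketch below checks this, assuming integer slot counts and busy times; the name `min_phi` and the dictionary layout are hypothetical. Choosing the assignment that minimizes this value is the hard part addressed by the algorithms in Section 3.

```python
from typing import Dict, Set, Tuple


def min_phi(n: Dict[Tuple[str, int], int],
            busy: Dict[str, int],
            mu: Dict[str, int],
            groups: Dict[int, Tuple[Set[str], int]]) -> int:
    """Smallest Phi_c satisfying both constraints for a fixed assignment n.

    n[(m, k)]: slots server m devotes to group k; busy[m]: b_m^c;
    mu[m]: mu_m^c in tasks per slot; groups[k]: (S_c^k, |T_c^k|).
    """
    for k, (servers, tasks) in groups.items():
        if any(m not in servers for (m, g) in n if g == k):
            raise ValueError(f"group {k!r} uses a server without its data")
        served = sum(n.get((m, k), 0) * mu[m] for m in servers)
        if served < tasks:  # coverage constraint violated
            raise ValueError(f"group {k!r} is under-served")
    slots_per_server: Dict[str, int] = {}
    for (m, _), s in n.items():
        slots_per_server[m] = slots_per_server.get(m, 0) + s
    # Per-server constraint: s > 0 slots need Phi - b_m >= s;
    # a server with no slots is feasible for any Phi.
    return max((busy[m] + s for m, s in slots_per_server.items() if s > 0),
               default=0)
```

For example, with busy times $b_a = 0$, $b_b = 2$, rates $\mu_a = 1$, $\mu_b = 2$, and one group of 5 tasks split as 3 slots on `a` and 1 slot on `b`, coverage is $3 + 2 = 5$ and $\Phi_c = \max(0+3, 2+1) = 3$.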

b. Job Assignment with Markovian Machine States (Mitrolaris et al., 6 Nov 2025):

Variables: multiple user queues $Q_i(t)$, Bernoulli job arrivals $a_i(t)$, a binary machine state that can be sampled at cost $L$, and stochastic external job assignment.
Objective:

$\min_\phi (\Delta^\phi + S^\phi)$

subject to queue stability and action constraints, where $S^\phi$ is the long-term sampling cost.

Constrained optimization is typically solved via MDP/stochastic control approaches, or by parameterizing randomized or round-robin scheduling policies and tuning adaptive sampling frequencies.

3. Algorithms and Policies for AoJC Minimization

A range of algorithms have been proposed and analyzed:

A. OBTA (Optimal Balanced Task Assignment) (Zhao et al., 11 Jul 2024):

Decomposes the nonlinear integer program into MILP subproblems, using lower and upper bounds $\Phi_c^-$ and $\Phi_c^+$ to restrict the search space.
Iteratively examines the intervals delimited by the sorted distinct server busy times; within each interval the term $\max\{\Phi_c - b_m^c, 0\}$ is linear in $\Phi_c$, so each subproblem is a tractable MILP.
Optimality is guaranteed when service profiles are exact; the algorithm requires $O(K_c + M)$ MILP solves per job.
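The interval decomposition behind OBTA can be sketched as follows. Splitting $[\Phi_c^-, \Phi_c^+]$ at the distinct busy times fixes, within each piece, which servers have $\max\{\Phi_c - b_m^c, 0\} = \Phi_c - b_m^c$, so the per-server constraint becomes linear there. This is only the decomposition step under stated assumptions (integer busy times, bounds supplied by the caller); the per-piece MILP solve is omitted, and the helper name `obta_intervals` is hypothetical. Because a larger $\Phi_c$ only relaxes the per-server constraint, scanning pieces in ascending order and stopping at the first feasible one is a natural search order, though the paper's exact iteration may differ.

```python
from typing import Dict, List, Tuple


def obta_intervals(busy: Dict[str, int], phi_lo: int,
                   phi_hi: int) -> List[Tuple[int, int, List[str]]]:
    """Split [phi_lo, phi_hi] at the distinct server busy times.

    For Phi inside a returned piece (lo, hi, active), max{Phi - b_m, 0}
    equals Phi - b_m for each server in `active` and 0 for every other
    server, so the per-server constraint is linear in Phi on that piece.
    """
    cuts = sorted({b for b in busy.values() if phi_lo < b < phi_hi})
    points = [phi_lo] + cuts + [phi_hi]
    pieces = []
    for lo, hi in zip(points, points[1:]):
        active = sorted(m for m, b in busy.items() if b <= lo)
        pieces.append((lo, hi, active))
    return pieces


# Hypothetical usage: hand each piece, in ascending order, to a MILP solver
# (not shown) and stop at the first feasible one.
for lo, hi, active in obta_intervals({"a": 0, "b": 3, "c": 5}, 1, 6):
    print(lo, hi, active)
```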

B. Water-Filling (WF) Heuristic (Zhao et al., 11 Jul 2024):

Assigns tasks group by group, raising the busy times of each group's eligible servers to a common level (the “pouring water” analogy) and using binary search to find the minimum level that accommodates the group's tasks.
Computational cost is $O(K_c M \log |\mathcal{T}_c|)$.
The approximation ratio is proven to be $K_c$, and the bound is tight: worst-case instances attain it.
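A minimal water-filling sketch in the spirit of the description above follows. It assumes integer busy times, positive integer service rates $\mu_m^c$ (tasks per slot), a given group order, and a nonempty eligible server set per group. For each group it binary-searches the lowest common level $h$ with $\sum_{m\in\mathcal{S}_c^k} \mu_m^c \max\{h - b_m, 0\} \ge |\mathcal{T}_c^k|$ and then raises those servers to $h$. The function name and data layout are illustrative, and details such as how spare capacity at the final level is handled may differ from the published algorithm.

```python
from typing import Dict, List, Set, Tuple


def water_filling(busy: Dict[str, int],
                  mu: Dict[str, int],
                  groups: List[Tuple[Set[str], int]]) -> Tuple[int, Dict[Tuple[str, int], int]]:
    """Sketch of water-filling for one job.

    busy[m]: current busy time b_m (slots); mu[m]: tasks per slot (positive int);
    groups: (eligible servers, number of tasks) per group, in processing order.
    Returns (Phi, slots) with slots[(m, k)] = n_m^k.
    """
    b = dict(busy)  # work on a copy of the busy times
    slots: Dict[Tuple[str, int], int] = {}
    used: Set[str] = set()
    for k, (servers, tasks) in enumerate(groups):
        if tasks <= 0:
            continue
        # capacity(lo) == 0 < tasks; capacity(hi) >= tasks, since one server
        # alone can process the group by b_m + ceil(tasks / mu_m).
        lo = min(b[m] for m in servers)
        hi = min(b[m] + -(-tasks // mu[m]) for m in servers)
        while lo < hi:  # find the smallest level with enough capacity
            mid = (lo + hi) // 2
            if sum(mu[m] * max(mid - b[m], 0) for m in servers) >= tasks:
                hi = mid
            else:
                lo = mid + 1
        for m in servers:  # "pour water": raise eligible servers to level lo
            extra = lo - b[m]
            if extra > 0:
                slots[(m, k)] = extra
                b[m] = lo
                used.add(m)
    phi = max((b[m] for m in used), default=0)
    return phi, slots
```

For two unit-rate servers with busy times 0 and 2 and one group of 4 tasks, the level settles at 3 (3 slots on the first server, 1 on the second), so $\Phi_c = 3$.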

C. Replica-Deletion (RD) Heuristic (Zhao et al., 11 Jul 2024):

Initially assigns every possible task replica, then iteratively deletes excess assignments from the most loaded servers, prioritizing tasks with many alternatives.
In experiments it achieves ages close to those of OBTA, at a computational cost of $O(M^2 n \log n)$ per job.
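The replica-deletion idea can be sketched as below. This is a simplified reading, not the published algorithm: every task starts replicated on all servers holding its data, a server's projected finish time is assumed to be $b_m + \lceil \mathrm{load}_m / \mu_m \rceil$, and each step removes, from the most loaded server that still holds a removable replica, the replica of the task with the most remaining alternatives, until every task has exactly one copy. The name `replica_deletion` and the data layout are hypothetical.

```python
from typing import Dict, Set, Tuple


def replica_deletion(busy: Dict[str, int], mu: Dict[str, int],
                     avail: Dict[str, Set[str]]) -> Tuple[int, Dict[str, str]]:
    """Simplified replica deletion for one job.

    avail[t]: servers holding task t's data (nonempty, keys of busy).
    Returns (Phi, assignment task -> server).
    """
    replicas = {t: set(servers) for t, servers in avail.items()}
    load = {m: 0 for m in busy}
    for servers in replicas.values():  # start with every possible replica
        for m in servers:
            load[m] += 1

    def finish(m: str) -> int:  # projected finish time of server m
        return busy[m] + -(-load[m] // mu[m])

    while True:
        multi = [t for t, s in replicas.items() if len(s) > 1]
        if not multi:  # every task has exactly one copy
            break
        holders = {m for t in multi for m in replicas[t]}
        heavy = max(sorted(holders), key=finish)
        # On the most loaded server, drop the task with the most alternatives.
        victim = max((t for t in multi if heavy in replicas[t]),
                     key=lambda t: len(replicas[t]))
        replicas[victim].discard(heavy)
        load[heavy] -= 1
    assignment = {t: next(iter(s)) for t, s in replicas.items()}
    phi = max((finish(m) for m in busy if load[m] > 0), default=0)
    return phi, assignment
```

Restricting the choice to servers that still hold a removable replica guarantees progress: each iteration deletes one replica, so the loop ends after at most the total number of surplus replicas.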

D. Job Reordering via Shortest-Estimated-Time-First (OCWF-ACC) (Zhao et al., 11 Jul 2024):

Maintains the outstanding job set $O$ and builds a new execution order $Q$ by repeatedly selecting the job with the smallest WF-estimated remaining age.
Uses lower bounds on the age estimates to exit early, avoiding full WF evaluations for jobs that cannot be the next choice.
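The reordering loop can be sketched as a greedy shortest-estimate-first pass. In this hypothetical version, the estimate of each pending job is its water-filling completion time given the busy times left by the jobs already placed, and pruning uses the simple bound $\min_m b_m + \lceil \sum_k |\mathcal{T}_c^k| / \sum_m \mu_m \rceil$. That bound never exceeds the WF estimate, because WF adds at least $|\mathcal{T}_c^k|$ units of capacity per group. Candidates are scanned in increasing order of the bound, so the scan stops as soon as the bound reaches the best estimate found. The bound, the names, and the data layout are assumptions; the bounds used by OCWF-ACC may be tighter.

```python
from typing import Dict, FrozenSet, List, Tuple

Group = Tuple[FrozenSet[str], int]  # (eligible servers, number of tasks)


def wf_estimate(busy: Dict[str, int], mu: Dict[str, int],
                groups: List[Group]) -> Tuple[int, Dict[str, int]]:
    """Water-filling completion estimate; returns (Phi, busy times afterwards)."""
    b = dict(busy)
    used = set()
    for servers, tasks in groups:
        if tasks <= 0:
            continue
        lo = min(b[m] for m in servers)
        hi = min(b[m] + -(-tasks // mu[m]) for m in servers)
        while lo < hi:
            mid = (lo + hi) // 2
            if sum(mu[m] * max(mid - b[m], 0) for m in servers) >= tasks:
                hi = mid
            else:
                lo = mid + 1
        for m in servers:
            if lo > b[m]:
                b[m] = lo
                used.add(m)
    return max((b[m] for m in used), default=0), b


def lower_bound(busy: Dict[str, int], mu: Dict[str, int],
                groups: List[Group]) -> int:
    """Cheap bound that never exceeds wf_estimate's Phi (positive integer mu)."""
    total = sum(tasks for _, tasks in groups if tasks > 0)
    if total == 0:
        return 0
    return min(busy.values()) + -(-total // sum(mu.values()))


def reorder(busy: Dict[str, int], mu: Dict[str, int],
            jobs: Dict[str, List[Group]]) -> List[str]:
    """Greedy shortest-estimate-first order over the outstanding jobs."""
    b = dict(busy)
    pending = dict(jobs)
    order: List[str] = []
    while pending:
        ranked = sorted((lower_bound(b, mu, g), name) for name, g in pending.items())
        best_name, best_phi, best_b = None, None, None
        for bound, name in ranked:
            if best_phi is not None and bound >= best_phi:
                break  # later jobs have even larger bounds: prune
            phi, after = wf_estimate(b, mu, pending[name])
            if best_phi is None or phi < best_phi:
                best_name, best_phi, best_b = name, phi, after
        order.append(best_name)
        b = best_b  # commit the chosen job's assignment
        del pending[best_name]
    return order
```

With two idle unit-rate servers, a one-task job restricted to server `a` and a six-task job that may use both, the short job is placed first; its bound of 1 then prunes the large job (bound 3) without running WF on it.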

E. Centralized Policies in Markov Machine Setting (Mitrolaris et al., 6 Nov 2025):

Adaptive randomized scheduling and sampling: for every active subset $\mathcal{S}$, the sampling probabilities $\mu^*(\mathcal{S})$ and scheduling distributions $\pi^*(\mathcal{S})$ are precomputed by solving convex nonlinear programs built on the closed-form age and cost expressions.
Max-age scheduling (round-robin among active users with highest age) combined with stationary optimized sampling $\bar{\mu}(\mathcal{S})$ .
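The two centralized policies can each be illustrated with a per-slot decision function. This is a sketch under stated assumptions: the tables `mu_star`, `pi_star`, and `mu_bar` (keyed by the active set) are taken as already computed, the sampling decision and the user choice are drawn independently in each slot, and ties in max-age scheduling are broken round-robin by user index. Queue dynamics, machine-state transitions, and sampling-cost accounting are omitted, and all names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Optional, Set, Tuple

Decision = Tuple[bool, Optional[int]]  # (sample machine state?, user to serve)


def randomized_step(active: Set[int],
                    mu_star: Dict[FrozenSet[int], float],
                    pi_star: Dict[FrozenSet[int], Dict[int, float]],
                    rng: random.Random) -> Decision:
    """Adaptive randomized policy: sample w.p. mu*(S), pick user i w.p. pi*_i(S)."""
    S = frozenset(active)
    if not S:
        return False, None
    sample = rng.random() < mu_star[S]
    users = sorted(S)
    user = rng.choices(users, weights=[pi_star[S][i] for i in users])[0]
    return sample, user


def max_age_step(active: Set[int], ages: Dict[int, int],
                 mu_bar: Dict[FrozenSet[int], float], rng: random.Random,
                 last_served: int = -1) -> Decision:
    """Max-age policy: serve an active user of highest age, ties round-robin."""
    S = frozenset(active)
    if not S:
        return False, None
    sample = rng.random() < mu_bar[S]
    top = max(ages[i] for i in S)
    tied = sorted(i for i in S if ages[i] == top)
    # Round-robin among ties: first tied user after the one served last.
    user = next((i for i in tied if i > last_served), tied[0])
    return sample, user
```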

4. Stability Conditions and Analytical Age Expressions

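Queue stability under these policies is guaranteed by sufficient conditions requiring the total arrival rate to fall strictly below the effective service rate for every nonempty active set $\mathcal{S}$: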

Condition	Scheduling policy	Requirement (for all nonempty $\mathcal{S}$)
Adaptive randomized (Prop. 1)	Randomized scheduling	$\sum_j p_j - \mu(\mathcal{S})[1-\chi(q,s)] \sum_{i \in \mathcal{S}} \pi_i(\mathcal{S}) q_i \le -\epsilon$
Max-age (Prop. 2)	Round-robin scheduling	$\sum_j p_j - \mu(\mathcal{S})[1-\chi(q,s)]\, q_{\min (\mathcal{S})} \le -\epsilon$

Here $\chi(q,s)$ encodes the Markov machine’s idle/busy transition structure, and $\epsilon > 0$ is a fixed margin that makes the inequality strict, as required for positive recurrence of the queues.
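Because there are finitely many nonempty active sets, a fixed $\epsilon > 0$ exists exactly when every left-hand side in the table is strictly negative, so the conditions can be checked by enumeration. The sketch below does this, treating $\chi(q,s)$ as a given scalar and reading $q_{\min(\mathcal{S})}$ as $\min_{i\in\mathcal{S}} q_i$; both are assumptions about notation defined in the source. It returns the largest left-hand side, whose negation is the largest admissible $\epsilon$. Enumeration visits $2^N - 1$ subsets, so it is practical only for a handful of users; the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Sequence, Tuple


def worst_drift(p: Sequence[float], q: Sequence[float], chi: float,
                mu: Dict[FrozenSet[int], float],
                pi: Optional[Dict[FrozenSet[int], Dict[int, float]]] = None
                ) -> Tuple[float, Optional[FrozenSet[int]]]:
    """Largest left-hand side of the sufficient condition over nonempty S.

    p[i]: arrival probability of user i; q[i]: q_i; chi: the scalar chi(q, s);
    mu[S]: sampling probability for active set S;
    pi[S][i]: scheduling probability (Prop. 1), or None for max-age (Prop. 2).
    The condition holds iff the returned value is negative (then eps = -value).
    """
    total_arrival = sum(p)
    worst, worst_set = float("-inf"), None
    for r in range(1, len(p) + 1):
        for subset in combinations(range(len(p)), r):
            S = frozenset(subset)
            if pi is None:
                service = min(q[i] for i in S)  # reading q_min(S) as min over S
            else:
                service = sum(pi[S][i] * q[i] for i in S)
            drift = total_arrival - mu[S] * (1 - chi) * service
            if drift > worst:
                worst, worst_set = drift, S
    return worst, worst_set
```

For two users with $p = (0.1, 0.1)$, $q = (0.5, 0.8)$, $\chi = 0.2$, and always-on sampling, the max-age condition is tightest at $\mathcal{S} = \{0\}$ with left-hand side $0.2 - 0.8 \cdot 0.5 = -0.2$, so any $\epsilon \le 0.2$ works.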

Analytical expressions for the long-run average age are derived for each policy, enabling the policy parameters to be optimized for each active set. For example, for adaptive randomized scheduling:

$\Delta_k(\mathcal{S}) = \frac{1}{\left( \frac{s}{q} + 2(\frac{1}{\mu} - 1) + \bar{\eta} \right)} \Bigl( \frac{\psi_k^2}{\pi_k} + (\frac{1}{q_k} + \frac{1-s}{q} - 2)\psi_k + \frac{1}{q}[(1-s)(1-\pi_k - \eta_k) - \frac{1}{\mu}] - \frac{\pi_k(1-q_k)}{q_k} + \sum_{i \in \mathcal{S}} \frac{\pi_i(1-q_i)}{q_i^2} \Bigr) + 1$

where the auxiliary quantities are as defined in (Mitrolaris et al., 6 Nov 2025).
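For numerical work, the expression can be transcribed directly. The sketch below evaluates it term by term exactly as printed above. The parameter names are ours: the per-user quantities ($\pi_i$, $q_i$, $\psi_k$, $\eta_k$) and the scalars ($s$, $q$, $\mu$, $\bar{\eta}$) must be supplied from their definitions in (Mitrolaris et al., 6 Nov 2025), and nothing here checks that the inputs are consistent with the model.

```python
from typing import Dict, Iterable


def delta_k(k: int, S: Iterable[int], pi: Dict[int, float], q_user: Dict[int, float],
            psi: Dict[int, float], eta: Dict[int, float],
            s: float, q: float, mu: float, eta_bar: float) -> float:
    """Evaluate Delta_k(S) as printed; q is the machine-level parameter,
    q_user[i] the per-user q_i (both as defined in the source)."""
    denom = s / q + 2 * (1 / mu - 1) + eta_bar
    inner = (psi[k] ** 2 / pi[k]
             + (1 / q_user[k] + (1 - s) / q - 2) * psi[k]
             + (1 / q) * ((1 - s) * (1 - pi[k] - eta[k]) - 1 / mu)
             - pi[k] * (1 - q_user[k]) / q_user[k]
             + sum(pi[i] * (1 - q_user[i]) / q_user[i] ** 2 for i in S))
    return inner / denom + 1
```

With $s = q = \mu = 0.5$, $\bar{\eta} = 0.1$, and a single user with $\pi_k = q_k = 0.5$, $\psi_k = 2$, $\eta_k = 0.2$, the bracketed sum is $8 + 2 - 3.7 - 0.5 + 1 = 6.8$ and the prefactor's denominator is $3.1$, giving $6.8/3.1 + 1 \approx 3.19$.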

Under stationary randomized sampling, Theorem 2 of (Mitrolaris et al., 6 Nov 2025) bounds the sampling cost as $S^\phi(\mathcal{S}) \le S^\phi_{\mathrm{ub}} (\mathcal{S}) = \frac{(L+1)\mu}{p^* (1 / (\mu p^* + \bar{\eta}))}$, with $p^* = q / [1 - (1-2q)(1-\mu)]$.

A plausible implication is that system designers must tune the scheduling policy and the sampling frequency jointly to ensure stability and minimize AoJC.

5. Empirical Evaluation and Practical Implications

Trace-driven and simulation studies validate the theoretical results:

For distributed scheduling with data locality (Zhao et al., 11 Jul 2024), OBTA achieves optimal ages; WF obtains ages within a few percent of OBTA's at roughly $100\times$ lower computational cost; and RD closes the gap by a further $1$ to $2\%$ with moderate overhead.
Job reordering (OCWF-ACC) further reduces mean age and maintains its performance even under highly skewed workloads.
In Markov machine systems (Mitrolaris et al., 6 Nov 2025), round-robin (max-age) scheduling with optimized sampling outperforms adaptive randomized policies, particularly under high traffic. Both age and cost decrease as the transition rate $q$ increases.
The sufficient stability conditions are conservative: queues can remain stable outside the proven sufficient region.
Higher system utilization increases absolute ages, but the performance ranking of the policies is preserved.

These results underscore that AoJC-centric scheduling provides a robust, throughput-aligned, and verifiable method for job assignment in environments ranging from distributed compute clusters to dynamic central-server queueing systems.

6. Significance and Interpretative Remarks

Adopting AoJC as a primary metric implies an operational focus on completed-job rates, online tractability, and fairness among users and jobs. Its applicability to both deterministic MILP-based scheduling (Zhao et al., 11 Jul 2024) and stochastic controlled queueing (Mitrolaris et al., 6 Nov 2025) indicates methodological generality. The analytical forms for age and cost enable explicit policy tuning, unlike black-box simulation methods. A plausible implication is that future extensions may integrate AoJC within broader resource optimization (energy, reliability, SLA), or generalize to job priorities and dependencies.

Furthermore, these frameworks reveal that:

Simple heuristics (water-filling, replica deletion, round-robin max-age) can yield near-optimal AoJC with orders of magnitude less computation than exact optimal assignment.
Carefully designed sampling policies are essential in systems with non-work-conserving machine states: undersampling leaves service capacity idle, while oversampling incurs unnecessary cost.
Data locality and job structure must be explicitly incorporated into task assignment to avoid worst-case approximation factors.

7. Relation to Contemporary Research and Outlook

Current research (Zhao et al., 11 Jul 2024, Mitrolaris et al., 6 Nov 2025) studies AoJC in diverse system architectures, including distributed clusters with partial data replication, FIFO queues, and Markov-modulated service capabilities, and establishes complexity and optimality results together with practical efficiency. Validation against production traces (e.g., the Alibaba batch trace) supports the practical relevance of the findings.

The AoJC perspective complements and enhances existing metrics such as average response time, makespan, and freshness age, suggesting further directions in multi-resource and multi-criteria scheduling. Adoption in operational systems will require integration with workload forecasting, adaptive policy deployment, and resilience against adversarial arrival and failure scenarios.


References

Zhao et al. Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions. 2024.

Mitrolaris et al. Age of Job Completion Minimization with Stable Queues. 2025.
