
Optimal Control Policy in Queueing Systems

Updated 27 January 2026
  • Optimal control policy is a set of actions defined for each system state that minimizes a performance metric such as average holding cost, completion time, or AoI.
  • These policies are formulated using the Markov Decision Process framework and solved via dynamic programming techniques like value or policy iteration.
  • The approach highlights tradeoffs like computation-transmission balance, peak versus average freshness, and resource allocation versus throughput in networked systems.

An optimal control policy in queueing systems specifies, for every system state, the admissible action (or set of actions) that minimizes a pre-defined performance criterion—such as long-run average holding cost, total expected completion time, or, in networks relevant to edge-computing, specialized notions like Age-of-Information (AoI). The structure of such policies, the methodologies for deriving and analyzing them, and the operational tradeoffs they induce are central concerns in the stochastic control of tandem and multi-stage queueing networks.

1. Mathematical Formulation of the Optimal Control Policy

A rigorous mathematical definition of an optimal control policy in a queueing system leverages the Markov Decision Process (MDP) framework. Consider, for concreteness, a two-stage tandem queueing network:

  • System state: The configuration is often represented as $(n_1, n_2)$ for customer counts at each node, possibly augmented with information about servers, buffer occupancies, or job phases.
  • Action space: At each state, the controller selects an action vector (e.g., resource allocation, routing choice, server assignment).
  • Transition dynamics: System evolves according to the queueing discipline (arrival, service, routing), with rates possibly modulated by the selected actions.
  • Performance metric: The control objective can be to minimize total expected cost, e.g., $g(f) = \lim_{T\to\infty}\frac{1}{T}\,\mathbb{E}^f\!\left[\int_0^T C(X_t, a_t)\,dt\right]$, or a specialized functional such as expected AoI.

The solution is characterized by the average-cost optimality equation (ACOE)

$$g + h(x) = \min_{a\in \mathcal{A}(x)} \Big\{ \sum_{x'} Q_{xx'}(a)\,[h(x') - h(x)] + C(x, a) \Big\}$$

where $h(x)$ is the relative value function, $g$ is the optimal average cost, $Q_{xx'}(a)$ is the action-dependent generator, and $C(x,a)$ is the instantaneous cost (Zaiming et al., 2015).

The optimal policy $f^*$ is a stationary mapping from system state to action, attaining the minimum on the right-hand side of the ACOE.
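Under suitable ergodicity assumptions, the ACOE can be solved numerically by relative value iteration on the uniformized chain. The sketch below is illustrative only: the three-state transition matrices and costs are invented for demonstration, not drawn from any cited model.

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-9, max_iter=100_000):
    """Solve the ACOE  g + h = min_a { c(., a) + P[a] h }  by relative
    value iteration, pinning h at reference state 0.
    Returns (g, h, policy)."""
    n_states, n_actions = c.shape
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman backup for every action, then pointwise minimization.
        q = np.stack([c[:, a] + P[a] @ h for a in range(n_actions)], axis=1)
        h_new = q.min(axis=1)
        g = h_new[0]          # current estimate of the optimal average cost
        h_new = h_new - g     # renormalize so that h(0) = 0
        if np.max(np.abs(h_new - h)) < tol:
            return g, h_new, q.argmin(axis=1)
        h = h_new
    raise RuntimeError("relative value iteration did not converge")

# Illustrative 3-state, 2-action uniformized MDP (numbers are invented).
P = [np.array([[0.6, 0.4, 0.0],
               [0.2, 0.5, 0.3],
               [0.0, 0.4, 0.6]]),
     np.array([[0.9, 0.1, 0.0],
               [0.5, 0.4, 0.1],
               [0.1, 0.5, 0.4]])]
c = np.array([[0.0, 0.5],
              [1.0, 1.5],
              [3.0, 3.5]])   # c[x, a]: holding plus action cost
g, h, policy = relative_value_iteration(P, c)
```

The returned `policy` is exactly the stationary mapping described above: one action per state, attaining the minimum in the ACOE.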

2. Queueing Models and Controlled System Structures

Optimal control policy development is sensitive to the characteristics of the underlying queueing structure. Notable examples from the literature include:

  • Classical tandem queues with infinite or finite buffers (Krivulin et al., 2012, Wu et al., 2014): Focus is on blocking effects, buffer-induced interruptions, and bottleneck analysis, with system evolution governed by coupled recurrences or max-plus algebraic models (Krivulin, 2012).
  • Edge computing/IoT update systems (Zou et al., 2019): The control variable is typically the allocation between local computation/preprocessing and transmission, optimizing AoI in a two-stage tandem.
  • Resource-allocation in collaborative or flexible-server systems (Zaiming et al., 2015, Papachristos et al., 2019, Lu et al., 20 Jan 2026): Actions correspond to allocating servers or deciding between parallel and serial processing, subject to constraints (e.g., only one flexible server, partial or full collaboration).

A summary table distinguishes core mathematical features:


| Model Reference | State Representation | Control Action | Primary Objective |
|---|---|---|---|
| (Zou et al., 2019) | (queue occupancies, phase) | Preprocessing vs. transmission time | Minimize AoI, peak AoI |
| (Zaiming et al., 2015) | $(n_1, n_2)$ | Allocation of server resources | Minimize long-run average cost |
| (Papachristos et al., 2019) | $(n_1, n_2)$, server status | Assignment of collaborative server | Minimize total expected cost |
| (Lu et al., 20 Jan 2026) | $(i,j,k,\ell)$ | Routing after stage 1 | Minimize total holding cost |

3. Canonical Solution Methods

Dynamic programming, specifically value or policy iteration on the ACOE, is the primary computational approach. Structural properties of the optimal policy are often derived by analyzing monotonicity, convexity, and coupling in the cost and transition structures:

  • Monotonicity: Under mild conditions on costs and resource increments, optimal resource allocation is non-decreasing in local queue length. For each node, more jobs never result in fewer allocated server units in the optimal solution (Zaiming et al., 2015).
  • Threshold/Bang–Bang Structures: In many cases the action set is discrete and the optimal policy reduces to threshold-type or "bang-bang" forms: for each queue length, the controller assigns either the maximum or the minimum permissible action (e.g., all servers or none, always route to the parallel node or never), with switching curves determined by finite-dimensional optimization (Zaiming et al., 2015, Papachristos et al., 2019, Lu et al., 20 Jan 2026).
  • Coupling and Dominance Relationships: The optimal actions at downstream nodes may dominate those at upstream nodes if cost and service rate sensitivities are aligned accordingly (Zaiming et al., 2015).
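The threshold structure can be observed numerically even in a single-node special case. The following sketch, in which all parameter values and the buffer truncation are illustrative assumptions rather than values from the cited papers, controls the service rate of one queue (slow/cheap vs. fast/expensive) and recovers a monotone, threshold-type policy:

```python
import numpy as np

lam = 0.5                          # arrival rate (assumed)
mu = np.array([0.3, 1.2])          # service rate: action 0 (slow), 1 (fast)
c_rate = np.array([0.1, 1.0])      # operating cost per unit time per action
h_cost = 1.0                       # holding cost per job per unit time
N = 60                             # buffer truncation (assumed)
Lam = lam + mu.max()               # uniformization constant

n = np.arange(N + 1)
up = np.minimum(n + 1, N)          # arrival transition (blocked at the cap)
down = np.maximum(n - 1, 0)        # service-completion transition
h = np.zeros(N + 1)
for _ in range(300_000):
    q = np.empty((N + 1, 2))
    for a in (0, 1):
        srv = np.where(n > 0, mu[a], 0.0)   # serving requires a job present
        q[:, a] = (h_cost * n + c_rate[a]
                   + lam * h[up] + srv * h[down]
                   + (Lam - lam - srv) * h) / Lam
    h_new = q.min(axis=1)
    g = h_new[0]                   # relative value iteration: pin h(0) = 0
    h_new = h_new - g
    if np.abs(h_new - h).max() < 1e-9:
        break
    h = h_new
policy = q.argmin(axis=1)          # optimal action for each queue length
g_time = Lam * g                   # average cost per unit time of the CTMC
```

Since the one-step holding cost is convex in the queue length, `policy` comes out nondecreasing: the fast rate, once optimal, stays optimal at all larger queue lengths.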

Sharp analytic characterization of the policy is often possible in modular two-stage systems, while networks with feedback, blocking, or breakdowns (as in (Reddy et al., 2010, Dimitriou, 2019)) frequently necessitate spectral-analytic or matrix approaches.

4. Tradeoffs and Structural Insights

Optimal control induces key tradeoffs in performance metrics:

  • Computation–Transmission tradeoff: In edge-computing tandem queues, increased local preprocessing shortens transmission time but adds preprocessing delay to each packet. The induced AoI is convex in the mean preprocessing time, so a unique nontrivial minimizer exists in many regimes (Zou et al., 2019).
  • Peak vs. average freshness: Simultaneous optimization of average and peak AoI can exhibit Pareto-fronts: minimizing one metric worsens the other, with region boundaries determined by preemption and buffer discipline (Zou et al., 2019).
  • Resource Collocation vs. Distribution: In the presence of collaborative servers, partially additive rates and holding-cost imbalance can make idling optimal, in contrast to classical non-idling results for fully additive systems (Papachristos et al., 2019).
  • Blocking/Starvation vs. Throughput: Finite buffers, blocking, and feedback mechanisms restrict optimal system throughput below that given by naive bottleneck capacity, with optimal policies balancing buffer utilization against blocking-induced delays (Wu et al., 2014, Reddy et al., 2010).

These tradeoffs are formally quantifiable via explicit performance formulas (mean cycle time, holding cost rate, effective service rate).

5. Representative Example: Dynamic Resource Allocation in Tandem Networks

A prototypical result is from the study of dynamic resource allocation (Zaiming et al., 2015):

  • System: Two-stage tandem queue, Poisson arrivals, resource units allocated dynamically to each node.
  • Objective: Minimize average holding plus resource cost.
  • MDP description: State $(n_1, n_2)$, controls $(a, b)$ representing allocated server units.
  • Optimal policy: If resource cost functions and service rates satisfy monotonicity and convexity, the allocation to each node is nondecreasing in its queue. Under strong convexity, the policy is bang-bang; i.e., one always allocates either zero or the maximal permitted servers to each node.
  • Further, if the cost and rate increments at node 2 dominate those at node 1, optimal allocation to node 2 always matches or exceeds that to node 1.

Explicitly, the ACOE is

$$g + h(n_1,n_2) = \min_{a,b} \Big\{ \lambda\,h(n_1+1,n_2) + \mu_1(a)\,h(n_1-1,\,n_2+1) + \mu_2(b)\,h(n_1,n_2-1) + \big[1-\lambda-\mu_1(a)-\mu_2(b)\big]\,h(n_1,n_2) + c_1(a) + c_2(b) + h_1 n_1 + h_2 n_2 \Big\}$$

where $h(\cdot)$ is the bias, $g$ the average cost, and $c_i(\cdot)$ and $\mu_i(\cdot)$ are the cost and rate functions per node (Zaiming et al., 2015).
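This ACOE can be iterated directly on a truncated state grid. The sketch below uses linear rate and cost functions $\mu_i(a) = a\,\mu_i$ and $c_i(a) = a\,c_i$ with invented parameter values (not from the cited paper); with linear rates and costs the inner minimization is linear in $(a, b)$, so an extreme-point, i.e. bang-bang, allocation is always optimal, matching the structural result above.

```python
import numpy as np

lam = 0.2                      # arrival rate (assumed)
mu1, mu2 = 0.12, 0.10          # service rate per allocated server unit
c1, c2 = 0.05, 0.04            # cost per allocated server unit
h1, h2 = 1.0, 1.5              # holding cost rates
A = 3                          # maximum server units per node
N = 25                         # queue truncation (assumed)
assert lam + A * (mu1 + mu2) < 1.0   # rates usable directly as probabilities

I, J = np.meshgrid(np.arange(N + 1), np.arange(N + 1), indexing="ij")
hold = h1 * I + h2 * J
acts = [(a, b) for a in range(A + 1) for b in range(A + 1)]
r1s = [np.where(I > 0, a * mu1, 0.0) for a in range(A + 1)]  # node 1 serves
r2s = [np.where(J > 0, b * mu2, 0.0) for b in range(A + 1)]  # only if nonempty
h = np.zeros((N + 1, N + 1))
q = np.empty((len(acts), N + 1, N + 1))
for _ in range(100_000):
    arrive = h[np.minimum(I + 1, N), J]                   # arrival to node 1
    move = h[np.maximum(I - 1, 0), np.minimum(J + 1, N)]  # node 1 -> node 2
    depart = h[I, np.maximum(J - 1, 0)]                   # departure, node 2
    for k, (a, b) in enumerate(acts):
        r1, r2 = r1s[a], r2s[b]
        q[k] = (lam * arrive + r1 * move + r2 * depart
                + (1.0 - lam - r1 - r2) * h
                + a * c1 + b * c2 + hold)
    h_new = q.min(axis=0)
    g = h_new[0, 0]            # average cost; pin h(0, 0) = 0
    h_new = h_new - g
    if np.abs(h_new - h).max() < 1e-7:
        break
    h = h_new
best = q.argmin(axis=0)
a_star = np.array([acts[k][0] for k in best.ravel()]).reshape(best.shape)
b_star = np.array([acts[k][1] for k in best.ravel()]).reshape(best.shape)
```

In the computed policy, each node's allocation takes only the extreme values $0$ or $A$, and an empty node receives no servers, consistent with the bang-bang characterization.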

6. Application: Age-of-Information–Optimal Policies in Edge Computing

In two-stage edge-computing queues for information update systems, the optimal policy concerns the selection of computational intensity $E[P]$ so as to minimize AoI at the monitor. Key findings include:

  • Explicit optimality condition: The average AoI is a convex function of $E[P]$ along the tradeoff curve $1/\mu = g(E[P])$. Optimization selects an $E[P]^*$ attaining the minimum (Zou et al., 2019).
  • Management discipline: Preemptive schemes can be superior or inferior to non-preemptive depending on the variance of PP; lower variance suits non-preemptive systems, but in preemptive variants, higher service time variability can in some regimes decrease AoI.
  • Buffer and preemption choices: Packet management matters: GI/M/1/2* with preemption consistently yields the lowest AoI for moderate-tailed service times, while for heavy-tailed service times preemption at the source may perform better, at the expense of a higher peak AoI.

Quantitative formulas for AoI and peak AoI under four management schemes can be written in closed-form using renewal-reward arguments, MGF calculus, and conditioning on two-state Markov chain occupation probabilities (Zou et al., 2019).
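The convexity along the tradeoff curve can be visualized numerically. The sketch below is not the paper's exact model: it combines the classical M/M/1 FCFS average-AoI formula with a hypothetical convex tradeoff `total_service(p)` standing in for $1/\mu = g(E[P])$, where more preprocessing $p$ shortens transmission time at the cost of added preprocessing delay.

```python
import numpy as np

lam = 0.4                                   # update generation rate (assumed)

def aoi_mm1(lam, mu):
    """Average AoI of an M/M/1 FCFS source-monitor link."""
    rho = lam / mu
    return (1.0 / mu) * (1.0 + 1.0 / rho + rho ** 2 / (1.0 - rho))

def total_service(p):
    """Assumed tradeoff: preprocessing time p plus transmission
    time 1/(1 + 2p); the functional form is illustrative only."""
    return p + 1.0 / (1.0 + 2.0 * p)

ps = np.linspace(0.0, 1.5, 301)             # candidate mean preprocessing times
mus = 1.0 / total_service(ps)               # effective service rate
valid = lam < mus                           # stability region check
ages = aoi_mm1(lam, mus)
p_star = ps[np.argmin(ages)]                # interior minimizer
```

Because the assumed tradeoff is convex and the M/M/1 AoI is convex and increasing in the mean service time, the composed age curve is convex in $p$ and attains its minimum at an interior point, mirroring the unique-minimizer finding above.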

7. Implementation and Computational Considerations

Optimal policies often admit threshold or bang-bang characterization, leading to significant computational simplification. For example:

  • Threshold parameters: In collaborative-server and parallel/single-server control (see (Papachristos et al., 2019, Lu et al., 20 Jan 2026)), the allocation or routing choice as a function of queue lengths is determined by the sign of a state-dependent index (e.g., $d(n_1, n_2) = \mu_1 f(n_1, n_2) + \mu_2 g(n_1, n_2)$), leading to monotone switching curves or explicit state partitions.
  • Mathematical tools: Proofs and policy verification utilize coupling arguments, monotonicity induction, value function difference comparisons, and Riemann–Hilbert techniques in spectral-analytic settings (Dimitriou, 2019).
  • Numerical validation: Performance gains of optimal policies over heuristic or static strategies are quantifiable—improvements in holding cost or response time can be substantial, as demonstrated in dedicated numerical studies (Zaiming et al., 2015, Lu et al., 20 Jan 2026, Papachristos et al., 2019).

In summary, the optimal control policy for a queueing system, particularly tandem and edge-centric models, is characterized by explicit structural results grounded in the dynamic programming framework and validated via analytic and numerical techniques, and it exhibits tight coupling among arrival, service, buffer, and action-induced dynamics. The policy's structure and performance depend critically on the system's stochastic primitives, cost structure, and allowed management actions (Zou et al., 2019, Zaiming et al., 2015, Papachristos et al., 2019, Lu et al., 20 Jan 2026).
