Average-Cost Optimality Equation (ACOE)

Updated 23 February 2026
  • ACOE is a fundamental condition in Markov decision processes that defines optimality by balancing immediate costs with future value through a dynamic programming equation.
  • It is derived from discounted-cost models by taking the limit of relative value functions under conditions like weak continuity and inf-compact cost criteria.
  • ACOE underpins applications in queueing, inventory, and mean-field systems, enabling the computation of stationary deterministic optimal policies.

The average-cost optimality equation (ACOE) is a central object in the theory of Markov decision processes (MDPs) and stochastic control, providing a necessary and sufficient condition for a policy to be optimal with respect to long-run average (per-stage) cost. The ACOE connects dynamic programming with ergodic control, is foundational for the structure and computation of optimal policies, and provides the backbone for applications across queueing, inventory control, mean-field systems, and beyond.

1. Formal Statement of the ACOE

Let $X$ be a Borel subset of a Polish space (the state space), $A(x) \subset A$ a Borel action set (possibly noncompact) for each $x \in X$, $c: X \times A \to [0,\infty]$ a one-step cost, and $P(\cdot\,|\,x,a)$ a transition probability kernel on $X$. The average-cost optimality equation is

$$\lambda + h(x) = \min_{a \in A(x)} \left\{ c(x,a) + \int_X h(y)\,P(dy|x,a) \right\}, \quad \forall x\in X$$

where $\lambda \in \mathbb{R}$ is the optimal average cost per unit time, and $h: X \to \mathbb{R}$ is a "differential" or "relative" value function. The pair $(\lambda,h)$ solves the ACOE under the following properties:

  • $h(x)$ is measurable (typically lower-semicontinuous),
  • $a \mapsto c(x,a)$ is inf-compact or $c$ is $K$-inf-compact over $X \times A$,
  • The minimum is attained, so measurable selectors $\varphi(x)\in \arg\min_{a\in A(x)}\{\cdots\}$ exist,
  • The integral $\int |h(y)|\,P(dy|x,a)$ is finite when $a\in A(x)$.

The ACOE governs the structure of stationary deterministic optimal policies and provides the threshold between Bellman-type optimality inequalities and actual equations. Solutions $(\lambda,h)$ are unique up to an additive constant in $h$ (Feinberg et al., 2024).
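On a finite state and action space the integral in the ACOE reduces to a sum, so a candidate pair $(\lambda,h)$ can be checked numerically. The following is a minimal sketch under that finite-model assumption. The two-state example, the array layout `c[x][a]` and `P[a][x][y]`, and the helper names `bellman_rhs` and `acoe_residual` are illustrative and do not come from the cited papers.

```python
def bellman_rhs(h, c, P):
    """Return min_a { c[x][a] + sum_y P[a][x][y] * h[y] } for every state x."""
    n_states, n_actions = len(c), len(c[0])
    return [
        min(
            c[x][a] + sum(P[a][x][y] * h[y] for y in range(n_states))
            for a in range(n_actions)
        )
        for x in range(n_states)
    ]


def acoe_residual(lam, h, c, P):
    """Largest absolute gap between lam + h(x) and the right-hand side of the ACOE."""
    rhs = bellman_rhs(h, c, P)
    return max(abs(lam + h[x] - rhs[x]) for x in range(len(h)))


# Hypothetical example: action 0 moves to state 0, action 1 moves to state 1.
c = [[1.0, 0.0],   # costs in state 0 for actions 0 and 1
     [3.0, 2.0]]   # costs in state 1 for actions 0 and 1
P = [
    [[1.0, 0.0], [1.0, 0.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]

lam, h = 1.0, [0.0, 2.0]  # candidate solution worked out by hand for this example
print(acoe_residual(lam, h, c, P))                      # gap for the candidate pair
print(acoe_residual(lam, [v + 5.0 for v in h], c, P))   # gap after shifting h by a constant
```

In this example both residuals vanish, which illustrates that shifting $h$ by a constant leaves the equation intact, while changing $\lambda$ alone does not.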

2. Sufficient Conditions and Modern Existence Results

Contemporary results systematically weaken classical requirements on compactness, continuity, and uniform integrability. Key conditions ensuring the validity of the ACOE (see especially Feinberg et al., 2024, Feinberg et al., 2012, Feinberg et al., 2016, Feinberg et al., 2017) are:

  • Continuity/Compactness:
    • Weak model (W*): $c$ is $K$-inf-compact on $X\times A$, $P(\cdot|x,a)$ is weakly continuous in $(x,a)$.
    • Setwise (S*): For each $x$, $a\mapsto c(x,a)$ is inf-compact; $P(B|x,\cdot)$ is setwise continuous for all $B$.
  • Relative Value Boundedness:
    • $m_n := \inf_{x\in X} v_{\alpha_n}(x)$, where $v_{\alpha_n}$ is the discounted value function for discount factors $\alpha_n \uparrow 1$.
    • $u_{\alpha_n}(x) := v_{\alpha_n}(x) - m_n$.
    • $w_* := \liminf_{n\to\infty} (1-\alpha_n) m_n < \infty$.
    • Boundedness condition $B'$: $\liminf_{n\to\infty} u_{\alpha_n}(x) < \infty$ for each $x$.
    • Stronger $B$: $\sup_{x,n} u_{\alpha_n}(x) < \infty$.
  • Equicontinuity/Integrability:
    • (EC) Uniform equicontinuity and a uniform integrable envelope for $u_{\alpha_n}$.
    • (LEC) Lower-semi-equicontinuity in $x$, pointwise limit existence of $u_{\alpha_n}(x)$, and uniform integrability w.r.t. $P(\cdot|x,a)$ in $(x,a)$.

Under W* or S*, $B'$, and LEC, the ACOE is satisfied; $(\lambda,h)$ with $h(x) := \liminf_{n\to\infty} u_{\alpha_n}(x)$ is measurable, and policies selecting minimizers solve the average-cost control problem (Feinberg et al., 2024).

3. Derivation from Discounted to Average Cost, and Proof Techniques

The transition from the discounted-cost optimality equation to the ACOE is critical:

  • The value function $v_\alpha(x)$ for $0<\alpha<1$ solves

$$v_\alpha(x) = \min_a \left\{ c(x,a) + \alpha \int v_\alpha(y)\,P(dy|x,a) \right\}$$

  • Define $u_\alpha(x) := v_\alpha(x) - m_\alpha$ and $w_\alpha := (1-\alpha)m_\alpha$, where $m_\alpha := \inf_{x\in X} v_\alpha(x)$.
  • Under the boundedness assumptions, $\{u_{\alpha_n}\}$ is pointwise bounded; diagonal/lower-semicontinuity arguments yield $h(x) = \liminf_{n\to\infty} u_{\alpha_n}(x)$.
  • Weak continuity and inf-compactness permit passage of $\liminf$ through $P$ and $\min$, so the limiting function satisfies the ACOE with $\lambda = w_*$:

$$w_* + h(x) = \min_{a\in A(x)} \left\{ c(x,a) + \int h(y)\,P(dy|x,a) \right\}$$
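The vanishing-discount steps above can be illustrated on a finite model, where every continuity and compactness condition holds trivially. The following is a sketch only: the two-state example and the function names `discounted_value` and `vanishing_discount` are hypothetical, and plain value iteration is used as a convenient way to approximate $v_\alpha$, not as the method of the cited papers.

```python
def discounted_value(c, P, alpha, tol=1e-10, max_iter=200_000):
    """Approximate v_alpha by value iteration on a finite MDP."""
    n_states, n_actions = len(c), len(c[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = [
            min(
                c[x][a] + alpha * sum(P[a][x][y] * v[y] for y in range(n_states))
                for a in range(n_actions)
            )
            for x in range(n_states)
        ]
        if max(abs(p - q) for p, q in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def vanishing_discount(c, P, alphas):
    """For each alpha return (alpha, w_alpha, u_alpha) as defined in the text."""
    rows = []
    for alpha in alphas:
        v = discounted_value(c, P, alpha)
        m = min(v)                      # m_alpha = inf_x v_alpha(x)
        w = (1.0 - alpha) * m           # w_alpha = (1 - alpha) * m_alpha
        u = [vx - m for vx in v]        # u_alpha = v_alpha - m_alpha
        rows.append((alpha, w, u))
    return rows


# Hypothetical example: action 0 moves to state 0, action 1 moves to state 1.
c = [[1.0, 0.0], [3.0, 2.0]]
P = [
    [[1.0, 0.0], [1.0, 0.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]

for alpha, w, u in vanishing_discount(c, P, [0.9, 0.99, 0.999]):
    print(alpha, round(w, 6), [round(ux, 6) for ux in u])
```

For this particular example $w_\alpha$ stays close to 1 and $u_\alpha$ close to $(0, 2)$ for each discount factor, so the limits $w_* = 1$ and $h = (0, 2)$ can be read off directly. In general the convergence is only asymptotic as $\alpha \uparrow 1$.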

Alternate approaches, such as occupation measure convex-analytic methods (employing ergodic occupation measures), Poisson/relative value iteration (RVI) schemes, and reduction to discounted MDPs (e.g., HV–AG transformation), also appear as foundational derivations (Arapostathis et al., 2019, Feinberg et al., 2015, Feinberg et al., 2017). The vanishing discount approach remains the standard, but split-chain constructions or Lyapunov drift stability hypotheses allow further generalization.
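As an illustration of the relative value iteration idea mentioned above, here is a minimal finite-state sketch. It assumes the classical scheme that subtracts the value at a fixed reference state after each Bellman update, and it assumes a model whose transition probabilities are all positive so that the iteration settles. The three-state data and the names `bellman_rhs`, `relative_value_iteration`, and `acoe_residual` are hypothetical.

```python
def bellman_rhs(h, c, P):
    """Return min_a { c[x][a] + sum_y P[a][x][y] * h[y] } for every state x."""
    n_states, n_actions = len(c), len(c[0])
    return [
        min(
            c[x][a] + sum(P[a][x][y] * h[y] for y in range(n_states))
            for a in range(n_actions)
        )
        for x in range(n_states)
    ]


def relative_value_iteration(c, P, ref=0, tol=1e-12, max_iter=10_000):
    """Iterate h <- T h - (T h)(ref); return the estimates (lam, h)."""
    h = [0.0] * len(c)
    lam = 0.0
    for _ in range(max_iter):
        th = bellman_rhs(h, c, P)
        lam = th[ref]                       # current estimate of the average cost
        new_h = [t - lam for t in th]       # normalise so that h[ref] = 0
        converged = max(abs(p - q) for p, q in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    return lam, h


def acoe_residual(lam, h, c, P):
    """Largest absolute gap between lam + h(x) and the right-hand side of the ACOE."""
    rhs = bellman_rhs(h, c, P)
    return max(abs(lam + h[x] - rhs[x]) for x in range(len(h)))


# Hypothetical three-state, two-action model with strictly positive transitions.
c = [[2.0, 0.5], [1.0, 3.0], [4.0, 2.5]]
P = [
    [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]],  # action 0
    [[0.2, 0.2, 0.6], [0.6, 0.2, 0.2], [0.3, 0.5, 0.2]],  # action 1
]

lam, h = relative_value_iteration(c, P)
print("average cost estimate:", lam)
print("relative values:", h)
print("ACOE residual:", acoe_residual(lam, h, c, P))
```

Fixing `h[ref] = 0` selects one representative from the family of solutions that differ by an additive constant.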

4. Policy Structure and Uniqueness

The ACOE under stated conditions admits solutions where for each $x$,

$$A^*(x) = \arg\min_{a \in A(x)} \left\{ c(x,a) + \int h(y)\,P(dy|x,a) \right\}$$

A measurable selector $\varphi: X \to A$ choosing a minimizer at each $x$ defines a deterministic stationary policy that is average-cost optimal. Any such policy solves both the average-cost optimality inequality and the equality, and achieves $\lambda$ from every initial state.
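In a finite model the selector is simply an argmin per state, and the claim that the resulting policy attains $\lambda$ can be checked by averaging expected costs over a long horizon. The sketch below assumes the hypothetical two-state example with $\lambda = 1$ and $h = (0, 2)$; the names `greedy_policy` and `average_cost` are illustrative.

```python
def greedy_policy(h, c, P):
    """Pick, in each state, an action attaining the minimum in the ACOE."""
    n_states, n_actions = len(c), len(c[0])
    return [
        min(
            range(n_actions),
            key=lambda a: c[x][a] + sum(P[a][x][y] * h[y] for y in range(n_states)),
        )
        for x in range(n_states)
    ]


def average_cost(policy, c, P, start, horizon):
    """Expected cost per stage over a finite horizon under a stationary policy."""
    n_states = len(c)
    dist = [0.0] * n_states
    dist[start] = 1.0                       # initial state distribution
    total = 0.0
    for _ in range(horizon):
        total += sum(dist[x] * c[x][policy[x]] for x in range(n_states))
        dist = [
            sum(dist[x] * P[policy[x]][x][y] for x in range(n_states))
            for y in range(n_states)
        ]
    return total / horizon


# Hypothetical example: action 0 moves to state 0, action 1 moves to state 1.
c = [[1.0, 0.0], [3.0, 2.0]]
P = [
    [[1.0, 0.0], [1.0, 0.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
h = [0.0, 2.0]  # relative value function assumed for this example

policy = greedy_policy(h, c, P)
print(policy)
for start in range(2):
    print(start, average_cost(policy, c, P, start, horizon=10_000))
```

Starting from state 1 the policy pays a one-time cost of 3 before settling in state 0, so the finite-horizon average is slightly above 1 and approaches 1 as the horizon grows. That transient is what $h$ records.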

The solution $(\lambda,h)$ to the ACOE is unique up to a constant shift in $h$; i.e., if $(\lambda,h)$ and $(\lambda',h')$ both solve the ACOE and $h,h'$ are bounded below (e.g., lower semicontinuous), then $\lambda=\lambda'$ and $h' = h + C$. This is a generalization of the classical uniqueness theorem for the Bellman equation in ergodic control (Feinberg et al., 2024).

5. Comparison with Classical and Alternative Conditions

Classically, average-cost optimality analysis relied on:

  • Communicating or unichain structure for finite-state models,
  • Lyapunov (drift) conditions to ensure positive recurrence,
  • Uniform compactness of action sets, and strong Feller continuity of transitions,
  • Uniform equicontinuity of discounted value functions.

Recent advances weaken these requirements, replacing them by:

  • $K$-inf-compactness or inf-compactness of costs rather than action set compactness,
  • Weak or setwise continuity instead of strong-Feller continuity,
  • One-sided boundedness in the limit of discounted relative values,
  • Lower-semi-equicontinuity and uniform integrability (LEC) instead of full equicontinuity.

This encompasses a broader range of stochastic control models, such as queueing or inventory systems with noncompact action sets and weak continuity properties, which often fall outside the reach of classical assumptions (Feinberg et al., 2024, Feinberg et al., 2016, Feinberg et al., 2012).

6. Illustrative Examples and Applications

The wide applicability of the ACOE is demonstrated by explicit examples in (Feinberg et al., 2024):

  • Single-Action Indicator–Cost: $X=[0,1]$, $A=\{a_0\}$, $P(0|x,a_0)=1$, $c(x,a_0)=\mathbf{1}_{x\neq 0}$. The relative value function $h(x)=\mathbf{1}_{x\neq 0}$ is lower-semicontinuous, and $\lambda=0$.
  • Dirichlet–Cost MDP: $X=[0,1]$, $A=\{a_1\}$, $P(0|x,a_1)=1$, $c(x,a_1)=D(x)$ (Dirichlet function). The bias function $h(x)=D(x)$ fails to be lower-semicontinuous but the ACOE still holds under the weaker integrability and limit assumptions.
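Both examples can be checked by direct substitution, since the single action sends every state to 0 and the integral collapses to $h(0)$. The short derivation below is a sketch of that check. The value of $\lambda$ in the Dirichlet case is not stated above; it follows from the substitution, and under the usual convention that $D$ is the indicator of the rationals (an assumption here) it equals $D(0) = 1$.

```latex
% Single action, kernel concentrated at 0: \int h(y)\,P(dy|x,a) = h(0),
% so the ACOE reduces to
\lambda + h(x) = c(x,a) + h(0), \qquad x \in [0,1].

% Indicator cost: h(x) = \mathbf{1}_{x \neq 0}, hence h(0) = 0.
\lambda + \mathbf{1}_{x \neq 0} = \mathbf{1}_{x \neq 0} + 0
\quad\Longrightarrow\quad \lambda = 0.

% Dirichlet cost: h(x) = D(x).
\lambda + D(x) = D(x) + D(0)
\quad\Longrightarrow\quad \lambda = D(0).
```

In both cases the equation holds pointwise for every $x$, regardless of the regularity of $h$, which is the point of the second example.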

Broader applications include the derivation of optimal $(s,S)$ policies in inventory systems, mean-field game limit problems, and models with state- or action-dependent control constraints. Computationally, the ACOE provides the foundation for value iteration, policy iteration, and linear programming methods for average-cost control (Feinberg et al., 2024, Feinberg et al., 2016, Arapostathis et al., 2019).

7. Impact and Extensions

The ACOE is essential for the theoretical and computational treatment of Markov control problems under the average-cost criterion. Its solution structure and existence theory underpin modern direct algorithms and facilitate the analysis of ergodic control for noncompact, weakly continuous, and complex stochastic dynamic models. The recent generalizations to weaker boundedness and continuity, as established in (Feinberg et al., 2024), have expanded its reach to previously intractable classes of queueing, inventory, and stochastic network models. Furthermore, its connections with ergodic occupation measures, split-chain Poisson equations, and mean-field limits continue to drive advances in both theory and large-system applications.
