Average-Cost Optimality Equation (ACOE)
- ACOE is a fundamental condition in Markov decision processes that defines optimality by balancing immediate costs with future value through a dynamic programming equation.
- It is derived from discounted-cost models by taking the limit of relative value functions under conditions like weak continuity and inf-compact cost criteria.
- ACOE underpins applications in queueing, inventory, and mean-field systems, enabling the computation of stationary deterministic optimal policies.
The average-cost optimality equation (ACOE) is a central object in the theory of Markov decision processes (MDPs) and stochastic control, providing a necessary and sufficient condition for a policy to be optimal with respect to long-run average (per-stage) cost. The ACOE connects dynamic programming with ergodic control, is foundational for the structure and computation of optimal policies, and provides the backbone for applications across queueing, inventory control, mean-field systems, and beyond.
1. Formal Statement of the ACOE
Let $\mathbb{X}$ be a Borel subset of a Polish space (the state space), let $A(x)$ be a Borel action set (possibly noncompact) for each $x \in \mathbb{X}$, let $c$ be a one-step cost on the graph $\mathrm{Gr}(A) = \{(x,a) : x \in \mathbb{X},\ a \in A(x)\}$, and let $q(\,\cdot \mid x,a)$ be a transition probability kernel on $\mathbb{X}$. The average-cost optimality equation is
$$w + u(x) \;=\; \min_{a \in A(x)} \Big[\, c(x,a) + \int_{\mathbb{X}} u(y)\, q(dy \mid x,a) \Big], \qquad x \in \mathbb{X},$$
where $w$ is the optimal average cost per unit time, and $u$ is a "differential" or "relative" value function. The pair $(w,u)$ solves the ACOE under the following properties:
- $u$ is measurable (typically lower semicontinuous),
- $c(x,\cdot)$ is inf-compact on $A(x)$, or $c$ is $\mathbb{K}$-inf-compact on $\mathrm{Gr}(A)$,
- The minimum over $A(x)$ is attained, so measurable selectors exist,
- The integral $\int_{\mathbb{X}} u(y)\, q(dy \mid x,a)$ is finite when $c(x,a) < +\infty$.
The ACOE governs the structure of stationary deterministic optimal policies and marks the divide between Bellman-type optimality inequalities and genuine equations. Solutions are unique up to an additive constant in $u$ (Feinberg et al., 2024).
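For intuition, the ACOE can be verified numerically on a small finite model. The sketch below uses a hypothetical two-state, two-action MDP (all numbers are illustrative, not from the cited works), computes a candidate pair $(w, u)$ by relative value iteration, and checks that the equation holds up to numerical tolerance.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
# c[x, a] is the one-step cost; P[a, x, y] plays the role of q(y | x, a).
c = np.array([[1.0, 3.0], [2.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])

def solve_acoe(c, P, iters=10_000):
    """Relative value iteration: returns (w, u) with u normalized so u[0] = 0."""
    u = np.zeros(c.shape[0])
    for _ in range(iters):
        q = c + np.einsum('axy,y->xa', P, u)   # Q(x, a) = c(x, a) + E[u(next)]
        v = q.min(axis=1)
        u = v - v[0]
    return v[0], u                              # v[0] converges to the average cost w

w, u = solve_acoe(c, P)
# ACOE residual: w + u(x) - min_a [ c(x, a) + sum_y q(y | x, a) u(y) ] should vanish.
residual = w + u - (c + np.einsum('axy,y->xa', P, u)).min(axis=1)
print(w, u, abs(residual).max())
```

The unichain, aperiodic structure of this toy model guarantees convergence of the iteration; the residual check is exactly the ACOE evaluated at the computed pair.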
2. Sufficient Conditions and Modern Existence Results
Contemporary results systematically weaken classical requirements on compactness, continuity, and uniform integrability. Key conditions ensuring the validity of the ACOE (see especially (Feinberg et al., 2024, Feinberg et al., 2012, Feinberg et al., 2016, Feinberg et al., 2017)) are:
- Continuity/Compactness:
- Weak model (W*): $c$ is $\mathbb{K}$-inf-compact on $\mathrm{Gr}(A)$, and $q(\,\cdot \mid x,a)$ is weakly continuous in $(x,a)$.
- Setwise (S*): for each $x \in \mathbb{X}$, $c(x,\cdot)$ is inf-compact on $A(x)$, and $a \mapsto q(B \mid x,a)$ is setwise continuous for every Borel set $B \subseteq \mathbb{X}$.
- Relative Value Boundedness:
- $v_\alpha(x) < +\infty$ for the discounted value $v_\alpha$, $\alpha \in [0,1)$.
- $m_\alpha := \inf_{x \in \mathbb{X}} v_\alpha(x)$.
- $u_\alpha(x) := v_\alpha(x) - m_\alpha$.
- Boundedness condition (B): $\sup_{\alpha \in [0,1)} u_\alpha(x) < +\infty$ for each $x \in \mathbb{X}$.
- Stronger condition: $\sup_{\alpha \in [0,1)} \sup_{x \in \mathbb{X}} u_\alpha(x) < +\infty$.
- Equicontinuity/Integrability:
- (EC) Uniform equicontinuity of the family $\{u_\alpha\}$ together with a uniformly integrable envelope for $\{u_\alpha\}$.
- (LEC) Lower semi-equicontinuity of $\{u_\alpha\}$ in $x$, pointwise existence of the limit of $u_\alpha$ as $\alpha \uparrow 1$, and uniform integrability with respect to $q(\,\cdot \mid x,a)$ in $(x,a)$.
Under (W*) or (S*), condition (B), and (LEC), the ACOE is satisfied; the limiting function $u$ is measurable, and stationary policies selecting minimizers on the right-hand side solve the average-cost control problem (Feinberg et al., 2024).
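The boundedness conditions above can be observed numerically along the vanishing-discount route. The sketch below (a hypothetical two-state, two-action MDP with illustrative numbers) computes discounted values $v_\alpha$ for $\alpha \to 1$ and checks that $(1-\alpha)\,m_\alpha$ stabilizes while $u_\alpha = v_\alpha - m_\alpha$ stays bounded, in the spirit of condition (B).

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
c = np.array([[1.0, 3.0], [2.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])

def discounted_value(c, P, alpha, iters=20_000):
    """Value iteration for the alpha-discounted criterion:
    v = min_a [ c + alpha * E v(next) ]."""
    v = np.zeros(c.shape[0])
    for _ in range(iters):
        v = (c + alpha * np.einsum('axy,y->xa', P, v)).min(axis=1)
    return v

ws, us = [], []
for alpha in (0.9, 0.99, 0.999):
    v = discounted_value(c, P, alpha)
    m = v.min()                    # m_alpha = inf_x v_alpha(x)
    ws.append((1 - alpha) * m)     # (1 - alpha) * m_alpha should approach w
    us.append(v - m)               # u_alpha = v_alpha - m_alpha should stay bounded
print(ws)
print(us)
```

The successive entries of `ws` approach the optimal average cost, while the `us` arrays remain uniformly bounded, exactly the behavior the boundedness conditions formalize.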
3. Derivation from Discounted to Average Cost, and Proof Techniques
The transition from the discounted-cost optimality equation to the ACOE is critical:
- The value function $v_\alpha$ for discount factor $\alpha \in [0,1)$ solves the discounted-cost optimality equation
$$v_\alpha(x) \;=\; \min_{a \in A(x)} \Big[\, c(x,a) + \alpha \int_{\mathbb{X}} v_\alpha(y)\, q(dy \mid x,a) \Big].$$
- Define $m_\alpha := \inf_{x \in \mathbb{X}} v_\alpha(x)$ and $u_\alpha(x) := v_\alpha(x) - m_\alpha$.
- Under the boundedness assumptions, $\{u_\alpha\}$ is pointwise bounded; diagonal/lower-semicontinuity arguments yield a limit function $u$ and the value $w = \lim_{\alpha \uparrow 1} (1-\alpha)\, m_\alpha$.
- Weakly continuous/inf-compact conditions permit passage of the limit through the minimum and the integral, so the limiting pair $(w,u)$ satisfies
$$w + u(x) \;=\; \min_{a \in A(x)} \Big[\, c(x,a) + \int_{\mathbb{X}} u(y)\, q(dy \mid x,a) \Big].$$
- Measurable selection yields the existence of deterministic stationary minimizing controls (Feinberg et al., 2024, Feinberg et al., 2012, Feinberg et al., 2017).
Alternate approaches, such as occupation measure convex-analytic methods (employing ergodic occupation measures), Poisson/relative value iteration (RVI) schemes, and reduction to discounted MDPs (e.g., HV–AG transformation), also appear as foundational derivations (Arapostathis et al., 2019, Feinberg et al., 2015, Feinberg et al., 2017). The vanishing discount approach remains the standard, but split-chain constructions or Lyapunov drift stability hypotheses allow further generalization.
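As one concrete instance of the Poisson-equation route, the sketch below runs standard average-cost policy iteration on a hypothetical two-state, two-action MDP (illustrative numbers): each step solves the Poisson (policy-evaluation) equation $w + h = c_\pi + P_\pi h$ exactly as a linear system, then improves the policy greedily.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
c = np.array([[1.0, 3.0], [2.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])

def poisson_solve(c_pi, P_pi):
    """Solve the Poisson equation w + h = c_pi + P_pi h with h[0] = 0."""
    n = len(c_pi)
    A = np.zeros((n, n))
    A[:, 0] = 1.0                                # coefficient of the unknown w
    A[:, 1:] = np.eye(n)[:, 1:] - P_pi[:, 1:]    # (I - P_pi) acting on h_1..h_{n-1}
    sol = np.linalg.solve(A, c_pi)
    return sol[0], np.concatenate(([0.0], sol[1:]))   # (w, h)

def policy_iteration(c, P):
    nx, na = c.shape
    pi = np.zeros(nx, dtype=int)
    while True:
        c_pi = c[np.arange(nx), pi]
        P_pi = P[pi, np.arange(nx), :]
        w, h = poisson_solve(c_pi, P_pi)
        q = c + np.einsum('axy,y->xa', P, h)     # Q(x, a) = c(x, a) + E[h(next)]
        new_pi = q.argmin(axis=1)
        # keep the old action on ties so the iteration terminates
        keep = q[np.arange(nx), pi] <= q[np.arange(nx), new_pi] + 1e-10
        new_pi = np.where(keep, pi, new_pi)
        if np.array_equal(new_pi, pi):
            return w, h, pi
        pi = new_pi

w, h, pi = policy_iteration(c, P)
print(w, h, pi)
```

Each evaluation step is an exact Poisson-equation solve, so the returned $w$ is the exact average cost of the final policy rather than an iterative approximation.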
4. Policy Structure and Uniqueness
The ACOE under the stated conditions admits solutions $(w,u)$ where, for each $x \in \mathbb{X}$,
$$w + u(x) \;=\; \min_{a \in A(x)} \Big[\, c(x,a) + \int_{\mathbb{X}} u(y)\, q(dy \mid x,a) \Big].$$
A measurable selector choosing a minimizer at each $x$ defines a deterministic stationary policy that is average-cost optimal. Any such policy solves both the average-cost optimality inequality and the equality, and achieves the optimal average cost $w$ from every initial state.
The solution to the ACOE is unique up to a constant shift in $u$; i.e., if $(w_1, u_1)$ and $(w_2, u_2)$ both solve the ACOE with $u_1, u_2$ bounded below (e.g., lower semicontinuous), then $w_1 = w_2$ and $u_1 - u_2$ is constant. This is a generalization of the classical uniqueness theorem for the Bellman equation in ergodic control (Feinberg et al., 2024).
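This uniqueness can be observed numerically: running relative value iteration with two different reference states on a hypothetical two-state model (illustrative numbers) yields the same average cost, and relative value functions that differ only by a constant shift.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
c = np.array([[1.0, 3.0], [2.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])

def rvi(c, P, ref, iters=10_000):
    """Relative value iteration normalized at a reference state `ref`."""
    u = np.zeros(c.shape[0])
    for _ in range(iters):
        v = (c + np.einsum('axy,y->xa', P, u)).min(axis=1)
        u = v - v[ref]
    return v[ref], u   # (average cost, relative value function)

w0, u0 = rvi(c, P, ref=0)
w1, u1 = rvi(c, P, ref=1)
shift = u0 - u1
print(w0, w1, shift)  # same average cost; shift is constant across states
```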
5. Comparison with Classical and Alternative Conditions
Classically, average-cost optimality analysis relied on:
- Communicating or unichain structure for finite-state models,
- Lyapunov (drift) conditions to ensure positive recurrence,
- Uniform compactness of action sets, and strong Feller continuity of transitions,
- Uniform equicontinuity of discounted value functions.
Recent advances weaken these requirements, replacing them by:
- $\mathbb{K}$-inf-compactness or inf-compactness of costs rather than compactness of action sets,
- Weak or setwise continuity instead of strong-Feller continuity,
- One-sided boundedness in the limit of discounted relative values,
- Lower-semi-equicontinuity and uniform integrability (LEC) instead of full equicontinuity.
This encompasses a broader range of stochastic control models, such as queueing or inventory systems with noncompact action sets and weak continuity properties, which often fall outside the reach of classical assumptions (Feinberg et al., 2024, Feinberg et al., 2016, Feinberg et al., 2012).
6. Illustrative Examples and Applications
The wide applicability of the ACOE is demonstrated by explicit examples in (Feinberg et al., 2024):
- Single-Action Indicator–Cost: a model with a single available action and an indicator one-step cost, for which the relative value function is lower semicontinuous and the ACOE holds with the stated optimal average cost.
- Dirichlet–Cost MDP: a model whose one-step cost is given by the Dirichlet function; the bias function fails to be lower semicontinuous, but the ACOE still holds under the weaker integrability and limit assumptions.
Broader applications include the derivation of optimal policies in inventory systems, mean-field game limit problems, and models with state- or action-dependent control constraints. Computationally, the ACOE provides the foundation for value iteration, policy iteration, and linear programming methods for average-cost control (Feinberg et al., 2024, Feinberg et al., 2016, Arapostathis et al., 2019).
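As an illustration of the queueing applications mentioned above, the sketch below builds a toy admission-control queue (all parameters hypothetical: Bernoulli arrivals with probability `p`, service completions with probability `q`, linear holding cost, fixed rejection penalty) and solves the average-cost problem by relative value iteration; the resulting optimal policy has a threshold structure.

```python
import numpy as np

# Toy admission-control queue (parameters are hypothetical/illustrative):
# state x = queue length in {0, ..., N}; action 0 = reject arrivals, 1 = admit.
# Per stage, an arrival occurs w.p. p and a service completes w.p. q (p + q <= 1).
N, p, q = 10, 0.4, 0.5
holding, rejection = 1.0, 5.0        # holding cost per customer, rejection penalty

nx, na = N + 1, 2
c = np.zeros((nx, na))
P = np.zeros((na, nx, nx))
for x in range(nx):
    c[x, 0] = holding * x + rejection * p                       # expected penalty
    c[x, 1] = holding * x + (rejection * p if x == N else 0.0)  # buffer full: blocked
    for a in range(na):
        up = p if (a == 1 and x < N) else 0.0
        down = q if x > 0 else 0.0
        P[a, x, min(x + 1, N)] += up
        P[a, x, max(x - 1, 0)] += down
        P[a, x, x] += 1.0 - up - down

# Relative value iteration on the average-cost criterion.
u = np.zeros(nx)
for _ in range(20_000):
    v = (c + np.einsum('axy,y->xa', P, u)).min(axis=1)
    u = v - v[0]
w = v[0]
policy = (c + np.einsum('axy,y->xa', P, u)).argmin(axis=1)
print(w, policy)   # threshold policy: admit only when the queue is short
```

With these particular numbers, admitting is optimal only at an empty queue and rejecting elsewhere, which is exactly the threshold structure the ACOE delivers for such birth–death control models.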
7. Impact and Extensions
The ACOE is essential for the theoretical and computational treatment of Markov control problems under the average-cost criterion. Its solution structure and existence theory underpin modern direct algorithms and facilitate the analysis of ergodic control for noncompact, weakly continuous, and complex stochastic dynamic models. The recent generalizations to weaker boundedness and continuity, as established in (Feinberg et al., 2024), have expanded its reach to previously intractable classes of queueing, inventory, and stochastic network models. Furthermore, its connections with ergodic occupation measures, split-chain Poisson equations, and mean-field limits continue to drive advances in both theory and large-system applications.