Feinberg-Piunovskiy Theorem in MDPs

Updated 7 October 2025

The Feinberg-Piunovskiy Theorem states that deterministic stationary policies (or chattering policies in non-atomless settings) achieve all attainable performance vectors in uniformly absorbing Markov Decision Processes.
The theorem uses occupation measure analysis, convexity arguments, and Young measure theory to reduce the complexity of policy search in multi-criteria MDPs.
It allows researchers and practitioners in stochastic control to restrict policy search to structured stationary policies in constrained environments.

The Feinberg-Piunovskiy Theorem establishes the sufficiency of deterministic stationary policies (or, in non-atomless settings, chattering stationary policies) for solving discrete-time, uniformly absorbing Markov Decision Processes (MDPs) with multiple criteria on measurable state spaces. The theorem, originally established for atomless models with Borel state spaces and later extended to general measurable spaces, reduces the complexity of policy search by showing that every achievable performance vector can be attained by a deterministic or chattering stationary policy. Occupation measure analysis, convexity arguments, and Young measure theory underpin this result, which identifies how much policy structure is needed for optimal control in constrained MDPs with absorbing states.

1. Mathematical Setting and Definitions

Consider a discrete-time MDP on a measurable state space $X$ and Borel action space $A$. An absorbing set $\Delta\subset X$ is defined such that once the process enters $\Delta$, it remains there and rewards cease to accumulate. Uniform absorption requires that the expected entrance time to $\Delta$ be finite under every admissible policy and that the tails of the series defining it vanish uniformly over policies; in particular, the expected entrance time is uniformly bounded over all admissible policies. For each policy $\pi$, the performance is quantified by a vector-valued reward function $r:X\times A\to\mathbb{R}^d$, and the total expected performance is given via the occupation measure:

$\mathcal{R}(\pi) = \int_{X\times A} r(x, a)\ \mu_\pi(dx, da)$

where $\mu_\pi$ is the occupation measure under $\pi$, which records the expected number of visits to each set of state-action pairs before absorption (discounted models can be recast in this absorbing form).
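In a finite model the occupation measure and the performance vector can be computed directly, which may help make the definitions concrete. The following is a minimal sketch on a hypothetical three-state toy model; the transition table `P`, the reward table `r`, the policy `phi`, and the truncation horizon are all illustrative assumptions, not part of the theorem's setting.

```python
# Hypothetical toy model: states 0 and 1 are transient, state 2 plays the role of Delta.
TRANSIENT = [0, 1]
ABSORBING = {2}

# P[(x, a)] maps next states to probabilities (illustrative numbers).
P = {
    (0, "a"): {1: 0.5, 2: 0.5},
    (0, "b"): {2: 1.0},
    (1, "a"): {0: 0.5, 2: 0.5},
    (1, "b"): {2: 1.0},
}

# r[(x, a)] is a reward vector in R^2, i.e. d = 2 criteria.
r = {
    (0, "a"): (1.0, 0.0),
    (0, "b"): (0.0, 2.0),
    (1, "a"): (0.0, 1.0),
    (1, "b"): (3.0, 0.0),
}


def occupation_measure(policy, init, horizon=200):
    """Expected number of visits to each (state, action) pair before absorption.

    `policy[x]` maps actions to probabilities (a stationary kernel).
    The infinite sum over time is truncated at `horizon` steps.
    """
    mu = {key: 0.0 for key in P}
    dist = {x: init.get(x, 0.0) for x in TRANSIENT}  # law of the state at time t
    for _ in range(horizon):
        nxt = {x: 0.0 for x in TRANSIENT}
        for x, px in dist.items():
            for a, pa in policy[x].items():
                mass = px * pa
                mu[(x, a)] += mass
                for y, q in P[(x, a)].items():
                    if y not in ABSORBING:  # mass entering Delta is dropped
                        nxt[y] += mass * q
        dist = nxt
    return mu


def performance(mu, d=2):
    """Integrate the reward vector against the occupation measure."""
    return tuple(sum(mu[k] * r[k][i] for k in mu) for i in range(d))


# A deterministic stationary policy: always play "a".
phi = {0: {"a": 1.0}, 1: {"a": 1.0}}
mu = occupation_measure(phi, init={0: 1.0})
perf = performance(mu)
```

In this toy model the total mass of `mu` equals the expected time to absorption, which is why uniform absorption keeps occupation measures finite.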

Atomless models are those for which both the initial distribution and the transition kernels are non-atomic; i.e., they do not assign positive probability to any singleton.

A deterministic stationary policy is a measurable function $\phi:X\to A$ mapping each state to a specific action. A chattering stationary policy (of order $n$) mixes, at each state, among at most $n$ deterministic stationary policies; i.e., for each $x$, the action is randomized with finite support:

$\gamma(\cdot | x) = \sum_{i=1}^n \beta_i(x)\, \delta_{\phi_i(x)}$

where the weights $\beta_i$ are measurable with $\beta_i(x)\ge 0$ and $\sum_{i=1}^n \beta_i(x)=1$, and $\delta_{\phi_i(x)}$ is the Dirac measure at $\phi_i(x)$.

2. Statement of the Feinberg-Piunovskiy Theorem

The core assertion for atomless uniformly absorbing MDPs with Borel state space and multiple criteria is: $\mathcal{R}(\mathcal{D}) = \mathcal{R}(\Pi)$ where $\mathcal{D}$ is the class of deterministic stationary policies, and $\Pi$ is the set of all admissible policies (including randomized and history-dependent).

This equality implies that every achievable performance vector can be matched by a deterministic stationary policy; search over all admissible policies is unnecessary in the atomless context.

When the atomless condition is omitted, the theorem extends the sufficiency property to chattering stationary policies: $\mathcal{R}(\mathcal{C}_{d+1}) = \mathcal{R}(\Pi)$ where $\mathcal{C}_{d+1}$ denotes the set of chattering stationary policies of order $d+1$ and $d$ is the dimension of rewards. This establishes that performance vectors can be obtained by mixing among at most $d+1$ deterministic stationary policies at each state.
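A one-state example may illustrate why mixing is needed once atoms are present. The sketch below assumes a hypothetical model with a single transient state (an atom), one criterion ($d=1$), and two actions: `stay` earns reward 1 and returns to the state with probability 1/2, while `quit` earns 0 and absorbs immediately. All names and numbers are illustrative assumptions.

```python
from fractions import Fraction as F

# Probability of returning to the transient state after playing "stay" (assumed).
STAY_CONTINUE = F(1, 2)


def occupation(beta):
    """Occupation measure of the stationary policy that plays "stay" w.p. beta."""
    # Expected number of visits to the state: geometric series with
    # per-step return probability beta * STAY_CONTINUE.
    visits = 1 / (1 - beta * STAY_CONTINUE)
    return {"stay": visits * beta, "quit": visits * (1 - beta)}


def performance(beta):
    """Total expected reward: reward 1 on "stay", reward 0 on "quit"."""
    mu = occupation(beta)
    return mu["stay"] * 1 + mu["quit"] * 0


def weight_for(alpha):
    """Mixing weight beta solving performance(beta) == alpha."""
    return alpha / (1 + alpha * STAY_CONTINUE)


# The two deterministic stationary policies reach only the endpoints.
deterministic_values = {performance(F(0)), performance(F(1))}

# An intermediate target requires a chattering policy of order d + 1 = 2.
target = F(1)
beta = weight_for(target)
```

Here the deterministic policies attain only the values 0 and 2, whereas the target 1 is attained with weight 2/3 on `stay`. The weight is not the naive interpolation 1/2, because the mixing weight also changes how often the state is visited.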

3. Occupation Measure and Extreme Point Analysis

The central technical innovation is the characterization of the set of attainable occupation measures: $O(g, \alpha) = \{ \mu\in O : \int_{X\times A} g(x,a)\, \mu(dx,da) = \alpha \}$ where $g$ is (typically) the vector-valued reward function and $\alpha\in \mathbb{R}^d$ is the target performance vector.
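For orientation, occupation measures in absorbing models are commonly described through a linear balance equation. The form below is a sketch under assumed notation that does not appear in the text above: $\nu$ denotes the initial distribution, $Q(\cdot | x,a)$ the transition kernel, and $\mu$ is taken to be concentrated on $(X\setminus\Delta)\times A$. Every occupation measure then satisfies, for each measurable $B\subset X\setminus\Delta$,

$\mu(B\times A) = \nu(B) + \int_{X\times A} Q(B | x,a)\, \mu(dx,da)$

Both this equation and the constraint $\int_{X\times A} g\, d\mu = \alpha$ are linear in $\mu$, which is consistent with treating $O(g, \alpha)$ as a convex set and studying its extreme points.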

Convex analysis is employed with the following principal results:

For a uniformly absorbing MDP, the extreme points of $O(g, \alpha)$ are occupation measures corresponding to chattering stationary policies of order $d+1$:

$\mathrm{ext}(O(g, \alpha)) \subset O_{\mathcal{C}_{d+1}}(g, \alpha)$

For atomless models, these extreme points coincide precisely with those induced by deterministic stationary policies:

$\mathrm{ext}(O(g, \alpha)) = O_\mathcal{D}(g, \alpha)$

Every measure in the convex set $O(g, \alpha)$ attains the performance vector $\alpha$ by definition. Hence, whenever $\alpha$ is attainable, any extreme point of $O(g, \alpha)$ supplies a deterministic (or chattering, depending on atomlessness) stationary policy with exactly that performance, which yields the sufficiency result.

4. Chattering Policies, Young Measures, and Convex Combinations

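A chattering policy is structured by selecting a finite ensemble of deterministic policies and specifying measurable weights at each state. Under such a policy the occupation measure disintegrates as $\mu(dx, da) = \mu^X(dx)\, \gamma(da | x)$, with $\mu^X$ its state marginal, so that at each state the conditional action distribution is, by construction, a convex combination of the Dirac measures $\delta_{\phi_i(x)}$ of the deterministic components.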

The formalism mirrors Young measure theory, where kernels with finitely supported measures at each state provide the necessary structure to generate the extreme points of the achievable set. In models with $d$ criteria, chattering policies of order $d+1$ suffice because, by Carathéodory’s theorem, every point in the convex hull of a set in $\mathbb{R}^d$ is a convex combination of at most $d+1$ points of that set.
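The finite-dimensional fact behind the bound $d+1$ can be sketched in code. The example below is an assumption-laden illustration of Carathéodory's theorem only, not of the measure-theoretic argument: given a convex combination of more than $d+1$ points in $\mathbb{R}^d$, it repeatedly removes one point while preserving the barycenter. The function names `null_vector` and `caratheodory` are hypothetical helpers.

```python
from fractions import Fraction as F


def null_vector(rows, ncols):
    """Return a nonzero vector v with rows @ v == 0 (requires ncols > rank)."""
    M = [row[:] for row in rows]
    pivots = []  # pivots[i] is the pivot column of row i
    rank = 0
    for c in range(ncols):
        if rank == len(M):
            break
        piv = next((i for i in range(rank, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        lead = M[rank][c]
        M[rank] = [v / lead for v in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][c] != 0:
                f = M[i][c]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[rank])]
        pivots.append(c)
        rank += 1
    free = next(c for c in range(ncols) if c not in pivots)
    vec = [F(0)] * ncols
    vec[free] = F(1)
    for row_idx, pc in enumerate(pivots):
        vec[pc] = -M[row_idx][free]
    return vec


def caratheodory(points, weights):
    """Reduce a convex combination in R^d to at most d + 1 points."""
    d = len(points[0])
    pts = [tuple(F(v) for v in p) for p in points]
    w = [F(x) for x in weights]
    while len(pts) > d + 1:
        # Affine dependence: sum_i c_i p_i = 0 and sum_i c_i = 0.
        rows = [[p[k] for p in pts] for k in range(d)] + [[F(1)] * len(pts)]
        c = null_vector(rows, len(pts))
        # Largest step that keeps all weights nonnegative.
        t = min(w[i] / c[i] for i in range(len(pts)) if c[i] > 0)
        w = [wi - t * ci for wi, ci in zip(w, c)]
        keep = [i for i, wi in enumerate(w) if wi > 0]
        pts = [pts[i] for i in keep]
        w = [w[i] for i in keep]
    return pts, w


def barycenter(pts, w):
    return tuple(sum(wi * p[k] for p, wi in zip(pts, w)) for k in range(len(pts[0])))


# Five points in the plane (d = 2) with equal weights.
points = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
weights = [F(1, 5)] * 5
reduced_pts, reduced_w = caratheodory(points, weights)
```

Exact rational arithmetic is used so that a weight reaching zero is detected exactly; with floating-point numbers a tolerance would be needed instead.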

5. Extensions and Additional Results

The theorem’s modern development (Dufour et al., 6 Oct 2025) generalizes prior work (cf. Feinberg et al., 2018) by proving the sufficiency of chattering policies when the atomless assumption is dropped and by relaxing the Borel requirement to general measurable state spaces. The occupation measure approach provides a unifying perspective that avoids induction on the number of criteria and reduces reliance on Borel structure.

The result is derived via analysis of the extreme points of convex sets of occupation measures. These sets are “linearly closed” and “linearly bounded” (their intersection with every line is closed and bounded, respectively), properties that are critical for applying convex analysis and Young measure techniques.

6. Significance in Stochastic Control and Applications

The impact of the Feinberg-Piunovskiy Theorem is substantial:

For researchers and practitioners, policy search in uniformly absorbing, multi-criteria discrete-time MDPs can be restricted to deterministic stationary policies or, in non-atomless models, structured mixtures (chattering policies), greatly simplifying theory and computations.
The theorem clarifies the boundaries in policy complexity required for optimality, facilitating advanced numerical methods and symbolic computation for MDP optimization.
The extension to non-atomless and non-Borel spaces widens applicability in economics, engineering, and stochastic optimization, especially in settings involving constraints and multiple objectives.
The occupation measure formalism coupled with convex analysis bridges stochastic control and modern mathematical methods, providing a template for analogous results in broader decision-theoretic models.

7. Summary Table: Sufficiency of Policy Classes

Model Property	Sufficient Policy Class	Maximum Number of Mixtures
Atomless, uniformly absorbing	Deterministic stationary policies	1
Non-atomless, uniformly absorbing	Chattering stationary policies (order $d+1$)	$d+1$

The table encapsulates the core classification: deterministic stationary policies suffice for atomless models, while chattering stationary policies (mixing among $d+1$ deterministic policies) suffice in the more general case.

The Feinberg-Piunovskiy Theorem is thus a foundational result in the theory of discrete-time MDPs, establishing sharp sufficiency results for optimal policy structures and illuminating the geometry of attainable performance via occupation measures and convexity.