Generic uniqueness of the bias vector of finite stochastic games with perfect information (1610.09651v2)
Abstract: Mean-payoff zero-sum stochastic games can be studied by means of a nonlinear spectral problem. When the state space is finite, the latter consists in finding an eigenpair $(u,\lambda)$ solution of $T(u)=\lambda e + u$, where $T:\mathbb{R}^n \to \mathbb{R}^n$ is the Shapley (or dynamic programming) operator, $\lambda$ is a scalar, $e$ is the unit vector, and $u \in \mathbb{R}^n$. The scalar $\lambda$ yields the mean payoff per time unit, and the vector $u$, called the bias, allows one to determine optimal stationary strategies. The existence of the eigenpair $(u,\lambda)$ is generally related to ergodicity conditions. A basic issue is to understand for which classes of games the bias vector is unique (up to an additive constant). In this paper, we consider perfect-information zero-sum stochastic games with finite state and action spaces, thinking of the transition payments as variable parameters, transition probabilities being fixed. We show that the bias vector, thought of as a function of the transition payments, is generically unique (up to an additive constant). The proof uses techniques of max-plus (or tropical) algebra and nonlinear Perron-Frobenius theory. As an application of our results, we obtain a perturbation scheme allowing one to solve degenerate instances of stochastic games by policy iteration.
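To make the spectral problem $T(u)=\lambda e + u$ concrete, here is a minimal Python sketch on a hypothetical two-state perfect-information game. The payments, the transition probabilities, and the names `GAME`, `shapley`, and `solve_eigenpair` are illustrative assumptions, not taken from the paper. The solver is plain relative value iteration, not the policy iteration or perturbation scheme mentioned in the abstract, and it is only expected to converge here because every transition probability in the toy game is positive.

```python
# Hypothetical toy game: state 0 is controlled by Max, state 1 by Min.
# Each action is (transition payment, transition probabilities over states).
# All numbers are made up for illustration.
GAME = [
    ("max", [(1.0, [0.5, 0.5]), (0.0, [0.9, 0.1])]),
    ("min", [(2.0, [0.3, 0.7]), (4.0, [0.8, 0.2])]),
]


def shapley(u):
    """Apply the Shapley operator T of the toy game to the vector u."""
    out = []
    for player, actions in GAME:
        values = [
            payment + sum(p * x for p, x in zip(probs, u))
            for payment, probs in actions
        ]
        # Perfect information: exactly one player chooses in each state.
        out.append(max(values) if player == "max" else min(values))
    return out


def solve_eigenpair(tol=1e-12, max_iter=10_000):
    """Relative value iteration, normalised so that u[0] == 0.

    Returns (u, lam) with T(u) approximately equal to lam * e + u.
    """
    u = [0.0] * len(GAME)
    for _ in range(max_iter):
        t = shapley(u)
        lam = t[0]                     # current estimate of the mean payoff
        new_u = [x - lam for x in t]   # subtract lam * e to fix the additive constant
        if max(abs(a - b) for a, b in zip(new_u, u)) < tol:
            return new_u, lam
        u = new_u
    raise RuntimeError("relative value iteration did not converge")


u, lam = solve_eigenpair()
# Sup-norm residual of the eigenpair equation T(u) = lam * e + u.
residual = max(abs(t - lam - x) for t, x in zip(shapley(u), u))
print("mean payoff:", lam, "bias:", u, "residual:", residual)
```

For these made-up numbers the iteration settles at approximately $\lambda = 1.625$ and $u = (0, 1.25)$. The normalisation $u_0 = 0$ selects one representative of the bias, which is all that can be pinned down when the bias is unique only up to an additive constant.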