A Counterexample and a Corrective to the Vector Extension of the Bellman Equations of a Markov Decision Process

Published 29 Jun 2023 in math.OC (arXiv:2306.16937v3)

Abstract: Under the expected total reward criterion, the optimal value of a finite-horizon Markov decision process can be determined by solving the Bellman equations. The equations were extended by D. J. White to processes with vector rewards in 1982. Using a counterexample, we show that the assumptions underlying this extension fail to guarantee its validity. Analysis of the counterexample leads us to articulate a sufficient condition for White's functional equations to be valid. The condition is shown to hold when the policy space has been refined to include a special class of non-Markovian policies, or when the dynamics of the model are deterministic, or when the decision-making horizon does not exceed three time steps. The paper demonstrates that, in general, the solutions to White's equations are sets of Pareto-efficient policy returns over the refined policy space. Our results are illustrated with an example.
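To make the object under discussion concrete, here is a minimal Python sketch of a set-valued backward recursion of the kind usually associated with vector-reward Bellman equations. It is not taken from the paper. It assumes the recursion keeps, for each state, the Pareto-efficient subset of all vectors of the form "immediate reward plus expected continuation vector", with one continuation vector chosen per successor state. The names `pareto_filter`, `vector_backup`, `P`, and `R` and the toy model are hypothetical. The toy model is deterministic, which is one of the cases the abstract lists as satisfying the sufficient condition.

```python
from itertools import product


def pareto_filter(vectors):
    """Return the vectors not dominated by any other one (maximization)."""
    unique = list(dict.fromkeys(vectors))  # drop duplicates, keep order

    def dominates(u, v):
        return all(a >= b for a, b in zip(u, v)) and u != v

    return [v for v in unique if not any(dominates(u, v) for u in unique)]


def vector_backup(horizon, states, actions, P, R):
    """Set-valued backward recursion (assumed form, see lead-in).

    P[s][a] maps successor states to probabilities.
    R[s][a] is the reward vector (a tuple) for taking action a in state s.
    Returns a dict mapping each state to a list of non-dominated return vectors.
    """
    dim = len(next(iter(next(iter(R.values())).values())))
    V = {s: [(0.0,) * dim] for s in states}  # terminal values: zero vector
    for _ in range(horizon):
        new_V = {}
        for s in states:
            candidates = []
            for a in actions:
                succ = list(P[s][a].items())
                # choose one continuation vector for every successor state
                for choice in product(*(V[s2] for s2, _ in succ)):
                    total = tuple(
                        R[s][a][k]
                        + sum(p * v[k] for (_, p), v in zip(succ, choice))
                        for k in range(dim)
                    )
                    candidates.append(total)
            new_V[s] = pareto_filter(candidates)
        V = new_V
    return V


# Hypothetical one-state, two-action, two-objective model, deterministic.
states = ["s0"]
actions = ["a", "b"]
P = {"s0": {"a": {"s0": 1.0}, "b": {"s0": 1.0}}}
R = {"s0": {"a": (1, 0), "b": (0, 1)}}

V = vector_backup(2, states, actions, P, R)
print(V["s0"])
```

With stochastic transitions, the inner `product` picks a continuation vector separately for each successor state. Per the abstract, the sets produced by such a recursion are in general Pareto-efficient returns over a refined policy space that includes certain non-Markovian policies, rather than over Markovian policies alone.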
