Partial and Conditional Expectations in Markov Decision Processes with Integer Weights

Published 12 Feb 2019 in cs.LO | (1902.04538v2)

Abstract: The paper addresses two variants of the stochastic shortest path problem ('optimize the accumulated weight until reaching a goal state') in Markov decision processes (MDPs) with integer weights. The first variant optimizes partial expected accumulated weights, where paths not leading to a goal state are assigned weight 0, while the second variant considers conditional expected accumulated weights, where the probability mass is redistributed to paths reaching the goal. Both variants constitute useful approaches to the analysis of systems without guarantees on the occurrence of an event of interest (reaching a goal state), but have only been studied in structures with non-negative weights. Our main results are as follows. There are polynomial-time algorithms to check the finiteness of the supremum of the partial or conditional expectations in MDPs with arbitrary integer weights. If finite, then optimal weight-based deterministic schedulers exist. In contrast to the setting of non-negative weights, optimal schedulers can need infinite memory and their value can be irrational. However, the optimal value can be approximated up to an absolute error of $ε$ in time exponential in the size of the MDP and polynomial in $\log(1/ε)$.

Summary

The paper introduces polynomial-time algorithms to check the finiteness of the supremum of partial and conditional expectations in integer-weighted MDPs.
It shows that, when the supremum is finite, optimal weight-based deterministic schedulers exist, although, in contrast to MDPs with non-negative weights, they can need infinite memory.
The study leverages LP techniques and supermartingale concepts to approximate optimal values with an absolute error of ε.

Introduction

The paper "Partial and Conditional Expectations in Markov Decision Processes with Integer Weights" (1902.04538) studies two variants of the stochastic shortest path (SSP) problem in Markov decision processes (MDPs) with integer weights. SSP problems ask to optimize the weight accumulated until a goal state is reached, extending classic shortest path problems on weighted graphs. The two variants are partial expected accumulated weights and conditional expected accumulated weights. Both are useful for analyzing systems that offer no guarantee that the goal is reached, but they had previously been studied only for non-negative weights; the paper examines how arbitrary integer weights change the picture, in particular for optimal schedulers.

Main Contributions

The paper gives polynomial-time algorithms for checking whether the supremum of the partial or conditional expectations is finite in MDPs with arbitrary integer weights. When the supremum is finite, optimal weight-based deterministic schedulers exist. In contrast to the setting of non-negative weights, optimal schedulers can need infinite memory and the optimal value can be irrational. Even so, the optimal value can be approximated up to an absolute error of $\epsilon$ in time exponential in the size of the MDP and polynomial in $\log(1/\epsilon)$.

Results

Partial and Conditional Expectations

Partial expectations assign weight 0 to every path that does not reach the goal state, whereas conditional expectations redistribute the probability mass to the paths that do reach the goal, that is, they condition on reaching the goal. The paper establishes that the supremum is finite if the MDP has no positively weight-divergent end components. It also extends linear programming techniques known from MDPs with non-negative weights so that they can be used to approximate optimal values in the integer-weighted setting.
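The two quantities can be written out as follows. This is a sketch based only on the verbal definitions above; the symbols $S$ (a scheduler), $\Pi_{goal}$ (the finite paths that end on their first visit to the goal), $\mathit{wgt}(\pi)$ (the accumulated weight of $\pi$), and $\Diamond\, goal$ (the event of reaching the goal) are notation assumed here and may differ from the paper's, and the sum is assumed to be well-defined.

$$
\mathrm{PE}^{S} = \sum_{\pi \in \Pi_{goal}} \Pr^{S}(\pi) \cdot \mathit{wgt}(\pi),
\qquad
\mathrm{CE}^{S} = \frac{\mathrm{PE}^{S}}{\Pr^{S}(\Diamond\, goal)} \quad \text{if } \Pr^{S}(\Diamond\, goal) > 0 .
$$

Paths that never reach the goal contribute 0 to $\mathrm{PE}^{S}$, and $\mathrm{CE}^{S}$ renormalizes by the probability of reaching the goal.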

Optimal Schedulers

A central result of the paper is the existence of optimal weight-based deterministic schedulers, that is, schedulers whose choice depends only on the current state and the weight accumulated so far. The proof first converts randomized schedulers into deterministic ones without loss of expected value. The set of such schedulers is then treated as a compact metric space on which the expectation is an upper semi-continuous function, so the supremum is attained. An optimal scheduler therefore exists, although it can need infinite memory in the form of an unbounded weight counter, because accumulated weights can both increase and decrease when weights are arbitrary integers.
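To illustrate why a scheduler may want to look at the accumulated weight, here is a minimal Python sketch on a hypothetical toy MDP that is not taken from the paper. From `s0` a fair coin adds +1 or -1 and moves to `s1`; in `s1` the scheduler picks `"stop"` (go to the goal, weight 0) or `"quit"` (go to a non-goal sink). All names (`COIN`, `evaluate`, the three schedulers) are assumptions made for this example.

```python
from fractions import Fraction

# Hypothetical toy MDP (illustration only, not from the paper):
#   s0 --(+1 or -1, probability 1/2 each)--> s1
#   s1: "stop" -> goal (weight 0), "quit" -> non-goal sink (weight 0)
COIN = [(Fraction(1, 2), +1), (Fraction(1, 2), -1)]


def evaluate(scheduler):
    """Return (partial expectation, conditional expectation).

    `scheduler` maps the weight accumulated on arrival in s1
    to the action "stop" or "quit".
    """
    partial = Fraction(0)
    p_goal = Fraction(0)
    for prob, weight in COIN:
        if scheduler(weight) == "stop":
            partial += prob * weight  # path reaches the goal
            p_goal += prob
        # Paths that quit never reach the goal and count as weight 0.
    conditional = partial / p_goal if p_goal > 0 else None
    return partial, conditional


def always_stop(weight):
    return "stop"


def always_quit(weight):
    return "quit"


def weight_based(weight):
    # Only go to the goal when the accumulated weight is positive.
    return "stop" if weight > 0 else "quit"


if __name__ == "__main__":
    for s in (always_stop, always_quit, weight_based):
        print(s.__name__, evaluate(s))
```

In this toy model both schedulers that ignore the weight reach a partial expectation of 0, while the weight-based one reaches 1/2 (and a conditional expectation of 1). The example is finite and acyclic, so it does not show the need for infinite memory; it only shows why the decision can depend on the accumulated weight.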

Computational Feasibility

The paper also addresses how to compute these values. The finiteness of the supremum can be checked in polynomial time. For the optimal value itself, the analysis draws on established theoretical tools, such as Hordijk and Kallenberg’s supermartingale concept, to constrain the growth of weights within end components. This yields bounds on the accumulated weights that make approximation feasible: both partial and conditional expectations can be approximated up to an absolute error of $\epsilon$ in time exponential in the size of the MDP and polynomial in $\log(1/\epsilon)$.

Implications and Future Directions

The results extend the analysis of MDPs to systems in which probabilistic and nondeterministic behavior combine and the event of interest is not guaranteed to occur. They are relevant to MDP model transformations, the design of optimal schedulers, and performance evaluation in stochastic models. Future work may address whether the actions chosen by optimal schedulers eventually become periodic in the accumulated weight, and whether the computational bounds for approximation can be improved.

Conclusion

The paper contributes to the optimization of MDPs with integer weights by working through the theory of partial and conditional expectations: finiteness of the supremum can be checked in polynomial time, optimal weight-based deterministic schedulers exist whenever the supremum is finite, and optimal values can be approximated in exponential time. By extending methods from the non-negative setting and adding new techniques, the study characterizes optimal scheduling and prepares the ground for implementations that analyze stochastic decision processes.
