Specifying Non-Markovian Rewards in MDPs Using LDL on Finite Traces (Preliminary Version) (1706.08100v1)

Published 25 Jun 2017 in cs.AI

Abstract: In Markov Decision Processes (MDPs), the reward obtained in a state depends on the properties of the last state and action. This state dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle such non-Markovian reward functions was the subject of two previous lines of work, both using variants of LTL to specify the reward function and then compiling the new model back into a Markovian model. Building upon recent progress in the theories of temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
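The core idea sketched in the abstract, tracking a temporally extended reward with an automaton and taking its product with the MDP so the reward becomes Markovian over the augmented state, can be illustrated with a toy example. The sketch below is not the paper's LDLf construction; it hand-codes a small deterministic automaton for the abstract's "coffee only following a request" example, with made-up propositions `req` and `cof`, to show why a reward on the product state suffices.

```python
# Hedged illustration, not the paper's construction: a non-Markovian reward
# ("serve coffee only after a request") made Markovian by pairing each MDP
# observation with the state of a small tracking automaton.

# Automaton states: 0 = no request pending, 1 = request pending.
def dfa_step(q, obs):
    """Advance the reward-tracking automaton on one observation."""
    if q == 0:
        return 1 if obs.get("req", False) else 0
    # q == 1: a pending request is discharged by serving coffee.
    return 0 if obs.get("cof", False) else 1

def product_reward(q, obs):
    """Markovian reward over the product state (q, obs):
    +1 for coffee served while a request is pending,
    -1 for unrequested coffee, 0 otherwise."""
    if obs.get("cof", False):
        return 1 if q == 1 else -1
    return 0

def total_reward(trace):
    """Fold a finite trace through the automaton, summing product rewards."""
    q, total = 0, 0
    for obs in trace:
        total += product_reward(q, obs)
        q = dfa_step(q, obs)
    return total
```

On the trace request-then-coffee the accumulated reward is +1, while serving coffee with no prior request yields -1; the reward at each step depends only on the current product state, which is the point of the compilation.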

Authors (3)
  1. Ronen Brafman (3 papers)
  2. Giuseppe De Giacomo (41 papers)
  3. Fabio Patrizi (13 papers)
Citations (4)