Inferring Probabilistic Reward Machines from Non-Markovian Reward Processes for Reinforcement Learning (2107.04633v2)

Published 9 Jul 2021 in cs.LG, cs.FL, and stat.ML

Abstract: The success of reinforcement learning in typical settings is predicated on Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process and prove results establishing its correctness and convergence.
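To make the abstract's core idea concrete, here is a minimal Python sketch of a PRM under a simple formalization suggested by the abstract: a finite automaton that reads high-level event labels and, for each (machine state, label) pair, emits a reward drawn from a distribution rather than a fixed value. The class name, `step` method, and the two-state example are illustrative assumptions, not the paper's exact definitions.

```python
import random

class ProbabilisticRewardMachine:
    """Minimal PRM sketch (hypothetical structure, not the paper's formalization).

    Like an ordinary reward machine, a PRM is a finite automaton driven by
    high-level event labels; unlike one, each (state, label) pair carries a
    reward *distribution* instead of a deterministic reward value.
    """

    def __init__(self, states, initial_state, transitions, reward_dists):
        # transitions: dict mapping (state, label) -> next machine state
        # reward_dists: dict mapping (state, label) -> list of (reward, prob)
        self.states = states
        self.state = initial_state
        self.transitions = transitions
        self.reward_dists = reward_dists

    def step(self, label):
        """Consume one event label; sample a reward, then advance the state."""
        key = (self.state, label)
        rewards, probs = zip(*self.reward_dists[key])
        reward = random.choices(rewards, weights=probs, k=1)[0]
        self.state = self.transitions[key]
        return reward

# Hypothetical two-state PRM: the first "goal" event pays 1.0 with
# probability 0.9 (0.0 otherwise); later "goal" events pay nothing.
# The reward depends on event history, not just the current
# environment state -- i.e., it is non-Markovian and stochastic.
prm = ProbabilisticRewardMachine(
    states={"u0", "u1"},
    initial_state="u0",
    transitions={("u0", "goal"): "u1", ("u0", "other"): "u0",
                 ("u1", "goal"): "u1", ("u1", "other"): "u1"},
    reward_dists={("u0", "goal"): [(1.0, 0.9), (0.0, 0.1)],
                  ("u0", "other"): [(0.0, 1.0)],
                  ("u1", "goal"): [(0.0, 1.0)],
                  ("u1", "other"): [(0.0, 1.0)]},
)
print(prm.step("goal"))  # 1.0 with probability 0.9
```

Running the agent on the product of the environment state and the machine state is what restores the Markov property, which is the state-space augmentation the abstract refers to.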

Authors (6)
  1. Taylor Dohmen (5 papers)
  2. Noah Topper (3 papers)
  3. George Atia (31 papers)
  4. Andre Beckus (13 papers)
  5. Ashutosh Trivedi (76 papers)
  6. Alvaro Velasquez (56 papers)
Citations (12)