Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov (2401.11325v3)
Abstract: Many Reinforcement Learning algorithms assume a Markov reward function to guarantee optimality. However, not all reward functions are Markov. This paper proposes a framework for mapping non-Markov reward functions into equivalent Markov ones by learning specialized reward automata, Reward Machines. Unlike the general practice of learning Reward Machines, we do not require a set of high-level propositional symbols from which to learn. Rather, we learn, directly from data, the hidden triggers that construct them. We demonstrate the importance of learning Reward Machines over their Deterministic Finite-State Automaton counterparts, given the former's ability to model reward dependencies, and we formalize this distinction in our learning objective. Our mapping process is posed as an Integer Linear Programming problem. We prove that our mappings form a suitable proxy for maximizing reward expectations. We empirically validate our approach by learning black-box, non-Markov reward functions in the Officeworld domain. Additionally, we demonstrate the effectiveness of learning reward dependencies in a new domain, Breakfastworld.
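To make the core idea concrete: a Reward Machine renders a non-Markov reward Markov by tracking a finite automaton state alongside the environment state, so reward depends on the pair (environment state, machine state). The Python sketch below is a minimal illustration of that mechanism only; the class name, field names, and the coffee/goal triggers are hypothetical, Officeworld-style placeholders, and the paper itself learns the triggers and transitions via Integer Linear Programming rather than hand-specifying them as done here.

```python
from dataclasses import dataclass

# Minimal Reward Machine sketch (illustrative names, not the paper's code).
# A Reward Machine is a tuple (U, u0, delta_u, delta_r): transitions fire
# on observed triggers, and reward depends on both the trigger and the
# current machine state, which lets it encode reward dependencies that a
# plain DFA acceptor cannot.

@dataclass
class RewardMachine:
    u0: int          # initial machine state
    delta_u: dict    # (machine state, trigger) -> next machine state
    delta_r: dict    # (machine state, trigger) -> reward

    def step(self, u: int, trigger: str):
        """Advance the machine on one observed trigger; return (u', r)."""
        r = self.delta_r.get((u, trigger), 0.0)
        u_next = self.delta_u.get((u, trigger), u)  # self-loop otherwise
        return u_next, r

# Toy non-Markov reward: +1 for reaching the goal, but only after a
# "coffee" trigger has already fired (an Officeworld-style task).
rm = RewardMachine(
    u0=0,
    delta_u={(0, "coffee"): 1, (1, "goal"): 2},
    delta_r={(1, "goal"): 1.0},
)

u = rm.u0
for trigger in ["goal", "coffee", "goal"]:  # example trigger sequence
    u, r = rm.step(u, trigger)
    print(f"{trigger} -> u = {u}, r = {r}")
```

Running the sketch yields reward 1.0 only on the second goal visit, after the coffee trigger has fired. Over the product state (environment state, machine state) the reward is Markov, which is the equivalence the paper's ILP-based mapping targets.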