Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure
Abstract: We present counting reward automata, a finite-state machine variant capable of modelling any reward function expressible as a formal language. Unlike previous approaches, which are limited to expressing tasks as regular languages, our framework allows for tasks described by unrestricted grammars. We prove that an agent equipped with such an abstract machine can solve a larger set of tasks than agents utilising current approaches. We show that this increase in expressive power does not come at the cost of increased automaton complexity. We present learning algorithms that exploit automaton structure to improve sample efficiency. We show that the state machines required in our formulation can be specified from natural language task descriptions using large language models (LLMs). Empirical results demonstrate that our method outperforms competing approaches in terms of sample efficiency, automaton complexity, and task completion.
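To make the formalism concrete, the sketch below shows one way a counting reward automaton could be represented in code: a finite set of states augmented with a counter, where transitions fire on high-level events guarded by counter tests and emit rewards. The class name, transition encoding, and the pickup/delivery task are illustrative assumptions, not the paper's implementation. The example encodes the non-regular counter language pickup^n deliver^n, which a plain reward machine (an ordinary finite-state machine) cannot express.

```python
# Minimal sketch of a counting reward automaton (CRA) with a single counter.
# All names here (CountingRewardAutomaton, the transition format, the
# pickup/delivery events) are hypothetical illustrations, not the paper's API.

class CountingRewardAutomaton:
    def __init__(self, initial_state, transitions, final_states):
        # transitions maps (state, event, counter_test) to
        # (next_state, counter_delta, reward), where counter_test is a
        # predicate over the current counter value.
        self.initial_state = initial_state
        self.transitions = transitions
        self.final_states = final_states
        self.reset()

    def reset(self):
        self.state = self.initial_state
        self.counter = 0

    def step(self, event):
        """Advance the machine on one high-level event and return the reward."""
        for (state, ev, test), (nxt, delta, reward) in self.transitions.items():
            if state == self.state and ev == event and test(self.counter):
                self.state = nxt
                self.counter += delta
                return reward
        return 0.0  # no matching transition: remain in place, no reward

    @property
    def done(self):
        return self.state in self.final_states


# Example task: "pick up n items, then deliver all n" (the counter
# language pickup^n deliver^n, for n >= 1), which is not regular.
cra = CountingRewardAutomaton(
    initial_state="u0",
    transitions={
        ("u0", "pickup",  lambda c: True):   ("u0", +1, 0.0),  # count each pickup
        ("u0", "deliver", lambda c: c > 1):  ("u1", -1, 0.0),  # begin delivering
        ("u0", "deliver", lambda c: c == 1): ("uF", -1, 1.0),  # n = 1: done
        ("u1", "deliver", lambda c: c > 1):  ("u1", -1, 0.0),  # keep delivering
        ("u1", "deliver", lambda c: c == 1): ("uF", -1, 1.0),  # last delivery: reward
    },
    final_states={"uF"},
)

for event in ["pickup", "pickup", "deliver", "deliver"]:
    reward = cra.step(event)
print(cra.done)  # True: two pickups matched by two deliveries
```

The counter is what lifts the machine beyond regular languages: the automaton's finite states track task phase, while the counter tracks the unbounded quantity (items held), so a single small machine covers every n.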