
Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure

Published 18 Dec 2023 in cs.AI (arXiv:2312.11364v2)

Abstract: We present counting reward automata, a finite state machine variant capable of modelling any reward function expressible as a formal language. Unlike previous approaches, which are limited to the expression of tasks as regular languages, our framework allows for tasks described by unrestricted grammars. We prove that an agent equipped with such an abstract machine is able to solve a larger set of tasks than those utilising current approaches. We show that this increase in expressive power does not come at the cost of increased automaton complexity. A selection of learning algorithms is presented which exploit automaton structure to improve sample efficiency. We show that the state machines required in our formulation can be specified from natural language task descriptions using LLMs. Empirical results demonstrate that our method outperforms competing approaches in terms of sample efficiency, automaton complexity, and task completion.


Summary

  • The paper introduces Counting Reward Automata, a novel state machine that models diverse reward functions beyond regular languages.
  • It presents a tailored Q-learning variant that leverages the automaton structure to significantly boost sample efficiency in long-horizon tasks.
  • Empirical and theoretical analyses demonstrate that CRAs enable the integration of symbolic structure, leading to improved task performance and convergence to optimal policies.

Introduction to Counting Reward Automata

Reinforcement learning (RL) has made strides in a range of complex domains, but its successes have largely been confined to short-horizon tasks. End-to-end learning, while powerful, prevents agents from exploiting problem structure known to the system designers, leading to sample-inefficient learning. Recognizing this, researchers have worked to integrate neural and symbolic methods, aiming to combine the strengths of both. Hierarchical reinforcement learning (HRL) and state machine-based approaches suggest promising directions; however, these methods often fall short in long-horizon settings because of inherent limits on the tasks they can express.

The Innovation of Counting Reward Automata

The paper introduces a novel state machine variant, the Counting Reward Automaton (CRA), capable of modelling any reward function expressible as a formal language. This transcends the expressive limits of existing approaches, which are constrained to tasks describable as regular languages. Agents equipped with a CRA abstract machine can therefore tackle a more diverse set of tasks, and can use learning algorithms that improve sample efficiency by exploiting the CRA's structure. The paper also shows that these learning algorithms converge to optimal policies.
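To make the idea concrete, the sketch below shows a minimal counter automaton that emits rewards, in the spirit of the CRA described above. All names and the exact interface are hypothetical; the paper's formal definition differs in its details. The example task ("pick up n items, then deliver all n") corresponds to a context-free language, which a plain reward machine over a regular language cannot capture.

```python
from collections import defaultdict

class CountingRewardAutomaton:
    """Illustrative sketch of a counting reward automaton: finite states,
    integer counters, and transitions guarded by counter conditions that
    update counters and emit rewards. Hypothetical API, not the paper's."""

    def __init__(self, initial_state, initial_counters):
        self.state = initial_state
        self.counters = dict(initial_counters)
        # (state, event) -> list of (guard, counter_update, next_state, reward)
        self.transitions = defaultdict(list)

    def add_transition(self, state, event, guard, update, next_state, reward):
        self.transitions[(state, event)].append((guard, update, next_state, reward))

    def step(self, event):
        """Advance on a labelled event; return the emitted reward."""
        for guard, update, next_state, reward in self.transitions[(self.state, event)]:
            if guard(self.counters):
                self.counters = update(self.counters)
                self.state = next_state
                return reward
        return 0.0  # no enabled transition: stay put, no reward

# Task: pick up n items, then drop all n (reward 1.0 on the last drop).
cra = CountingRewardAutomaton("u0", {"c": 0})
cra.add_transition("u0", "pick", lambda c: True,
                   lambda c: {"c": c["c"] + 1}, "u0", 0.0)
cra.add_transition("u0", "drop", lambda c: c["c"] > 1,
                   lambda c: {"c": c["c"] - 1}, "u0", 0.0)
cra.add_transition("u0", "drop", lambda c: c["c"] == 1,
                   lambda c: {"c": 0}, "u_acc", 1.0)

rewards = [cra.step(e) for e in ["pick", "pick", "drop", "drop"]]
# rewards == [0.0, 0.0, 0.0, 1.0]; cra.state == "u_acc"
```

The single counter plays the role of an unbounded memory: the machine's finite state set stays small regardless of n, which reflects the paper's claim that extra expressive power need not increase automaton complexity.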

Enhancing Reinforcement Learning

The presented method outperforms competing approaches in sample efficiency, automaton complexity, and task completion. CRAs allow a task to be specified from a natural language description, leveraging the capacity of LLMs to generate intuitive state machines. The learning algorithms exploit automaton structure, for example through counterfactual reasoning, to boost sample efficiency and task performance. The paper includes a variant of Q-learning tailored to the CRA framework, highlighting its utility in complex, long-horizon domains.
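One way automaton structure supports counterfactual reasoning, in the spirit of counterfactual-experience methods for reward machines, is to replay each observed environment transition through every reachable machine configuration, yielding many Q-updates per real step. The sketch below assumes a hypothetical `machine_step(config, label)` interface and a bounded enumeration of configurations; the paper's actual algorithm may differ.

```python
from collections import defaultdict

def counterfactual_q_update(Q, configs, machine_step, s, a, s_next, label,
                            actions, alpha=0.1, gamma=0.99):
    """Replay one environment transition (s, a, s_next) under every automaton
    configuration q, updating Q over the product (env state, machine config).
    `machine_step(q, label)` -> (q_next, reward) is a hypothetical interface;
    `configs` is a bounded enumeration of machine configurations."""
    for q in configs:
        q_next, r = machine_step(q, label)
        best_next = max(Q[(s_next, q_next, b)] for b in actions)
        td_target = r + gamma * best_next
        Q[(s, q, a)] += alpha * (td_target - Q[(s, q, a)])
    return Q

# Toy usage: two machine configurations, reward only in configuration 1.
Q = defaultdict(float)
ms = lambda q, lbl: (q, 1.0 if q == 1 else 0.0)
counterfactual_q_update(Q, configs=[0, 1], machine_step=ms,
                        s="s0", a="pick", s_next="s1", label="lbl",
                        actions=("pick", "drop"))
```

A single real interaction thus produces one update per configuration, which is the main source of the sample-efficiency gains such structure-exploiting methods report.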

Conclusion and Applications

In conclusion, the paper argues that Counting Reward Automata facilitate the modelling of diverse reward functions for RL, substantiated by both theoretical proofs and empirical evaluation. It also demonstrates how expert knowledge can be incorporated into a CRA by using natural language task descriptions and LLMs to specify tasks succinctly. These advances represent a significant step toward solving long-horizon RL tasks, enabling agents to execute more complex strategies with improved efficiency and a clearer representation of the task at hand.
