
Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure

Published 18 Dec 2023 in cs.AI (arXiv:2312.11364v2)

Abstract: We present counting reward automata, a finite state machine variant capable of modelling any reward function expressible as a formal language. Unlike previous approaches, which are limited to the expression of tasks as regular languages, our framework allows for tasks described by unrestricted grammars. We prove that an agent equipped with such an abstract machine is able to solve a larger set of tasks than those utilising current approaches. We show that this increase in expressive power does not come at the cost of increased automaton complexity. A selection of learning algorithms is presented which exploit automaton structure to improve sample efficiency. We show that the state machines required in our formulation can be specified from natural language task descriptions using LLMs. Empirical results demonstrate that our method outperforms competing approaches in terms of sample efficiency, automaton complexity, and task completion.


Summary

  • The paper introduces Counting Reward Automata, a novel state machine that models diverse reward functions beyond regular languages.
  • It presents a tailored Q-learning variant that leverages the automaton structure to significantly boost sample efficiency in long-horizon tasks.
  • Empirical and theoretical analyses demonstrate that CRAs enable the integration of symbolic structure, leading to improved task performance and convergence to optimal policies.

Introduction to Counting Reward Automata

Reinforcement learning (RL) has made strides in a range of complex domains, but its successes have largely been confined to short-horizon tasks. End-to-end learning, while powerful, prevents agents from exploiting problem structure known to the system designers, leading to sample-inefficient learning. Recognizing this, researchers have worked to integrate neural and symbolic methods, aiming to combine the strengths of both. Hierarchical reinforcement learning (HRL) and state machine-based approaches suggest promising directions; however, these methods often fall short in long-horizon settings because of inherent limits on the tasks they can express.

The Innovation of Counting Reward Automata

The paper introduces a novel state machine variant, the Counting Reward Automaton (CRA), capable of modelling any reward function expressible as a formal language. This transcends the expressive limits of existing approaches, which are constrained to tasks describable as regular languages. Agents equipped with a CRA abstract machine can therefore tackle a more diverse set of tasks, and can use learning algorithms that improve sample efficiency by exploiting the CRA's structure. The paper also shows that these learning algorithms converge to optimal policies.
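To make the idea concrete, the sketch below shows a minimal counter automaton that emits rewards, in the spirit of the CRA described above. All names and the exact interface are hypothetical; the paper's formal definition differs in its details. The example task ("pick up n items, then deliver all n") corresponds to a context-free language, which a plain reward machine over a regular language cannot capture.

```python
from collections import defaultdict

class CountingRewardAutomaton:
    """Illustrative sketch of a counting reward automaton: finite states,
    integer counters, and transitions guarded by counter conditions that
    update counters and emit rewards. Hypothetical API, not the paper's."""

    def __init__(self, initial_state, initial_counters):
        self.state = initial_state
        self.counters = dict(initial_counters)
        # (state, event) -> list of (guard, counter_update, next_state, reward)
        self.transitions = defaultdict(list)

    def add_transition(self, state, event, guard, update, next_state, reward):
        self.transitions[(state, event)].append((guard, update, next_state, reward))

    def step(self, event):
        """Advance on a labelled event; return the emitted reward."""
        for guard, update, next_state, reward in self.transitions[(self.state, event)]:
            if guard(self.counters):
                self.counters = update(self.counters)
                self.state = next_state
                return reward
        return 0.0  # no enabled transition: stay put, no reward

# Task: pick up n items, then drop all n (reward 1.0 on the last drop).
cra = CountingRewardAutomaton("u0", {"c": 0})
cra.add_transition("u0", "pick", lambda c: True,
                   lambda c: {"c": c["c"] + 1}, "u0", 0.0)
cra.add_transition("u0", "drop", lambda c: c["c"] > 1,
                   lambda c: {"c": c["c"] - 1}, "u0", 0.0)
cra.add_transition("u0", "drop", lambda c: c["c"] == 1,
                   lambda c: {"c": 0}, "u_acc", 1.0)

rewards = [cra.step(e) for e in ["pick", "pick", "drop", "drop"]]
# rewards == [0.0, 0.0, 0.0, 1.0]; cra.state == "u_acc"
```

The single counter plays the role of an unbounded memory: the machine's finite state set stays small regardless of n, which reflects the paper's claim that extra expressive power need not increase automaton complexity.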

Enhancing Reinforcement Learning

The presented method outperforms competing approaches in sample efficiency, automaton complexity, and task completion. CRAs allow a task to be specified from a natural language description, leveraging the capacity of LLMs to generate intuitive state machines. The learning algorithms exploit automaton structure, for example through counterfactual reasoning, to boost sample efficiency and task performance. The paper includes a variant of Q-learning tailored to the CRA framework, highlighting its utility in complex, long-horizon domains.
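One way automaton structure supports counterfactual reasoning, in the spirit of counterfactual-experience methods for reward machines, is to replay each observed environment transition through every reachable machine configuration, yielding many Q-updates per real step. The sketch below assumes a hypothetical `machine_step(config, label)` interface and a bounded enumeration of configurations; the paper's actual algorithm may differ.

```python
from collections import defaultdict

def counterfactual_q_update(Q, configs, machine_step, s, a, s_next, label,
                            actions, alpha=0.1, gamma=0.99):
    """Replay one environment transition (s, a, s_next) under every automaton
    configuration q, updating Q over the product (env state, machine config).
    `machine_step(q, label)` -> (q_next, reward) is a hypothetical interface;
    `configs` is a bounded enumeration of machine configurations."""
    for q in configs:
        q_next, r = machine_step(q, label)
        best_next = max(Q[(s_next, q_next, b)] for b in actions)
        td_target = r + gamma * best_next
        Q[(s, q, a)] += alpha * (td_target - Q[(s, q, a)])
    return Q

# Toy usage: two machine configurations, reward only in configuration 1.
Q = defaultdict(float)
ms = lambda q, lbl: (q, 1.0 if q == 1 else 0.0)
counterfactual_q_update(Q, configs=[0, 1], machine_step=ms,
                        s="s0", a="pick", s_next="s1", label="lbl",
                        actions=("pick", "drop"))
```

A single real interaction thus produces one update per configuration, which is the main source of the sample-efficiency gains such structure-exploiting methods report.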

Conclusion and Applications

In conclusion, the paper argues that Counting Reward Automata facilitate the modelling of diverse reward functions for RL, substantiated by both theoretical proofs and empirical evaluation. It also demonstrates how expert knowledge can be incorporated into a CRA by using natural language task descriptions and LLMs to specify tasks succinctly. These advances represent a significant step toward solving long-horizon RL tasks, enabling agents to execute more complex strategies with improved efficiency and a clearer representation of the task at hand.
