- The paper introduces a novel framework that recasts human goal generation as the creation of reward-producing programs, expressed in a domain-specific language and explored with quality-diversity evolutionary search.
- The paper pairs a contrastively learned fitness function with the MAP-Elites algorithm to generate diverse, human-like goal programs, validated by human raters.
- The paper discusses practical and theoretical implications for AI and cognitive science, including the prospect of more flexible, goal-oriented systems.
Goals as Reward-Producing Programs: An Overview
To better understand human goal-setting and creativity, the paper "Goals as Reward-Producing Programs" by Davidson, Todd, Togelius, Gureckis, and Lake proposes conceptualizing goals as reward-producing programs: executable specifications of what counts as success and how it is scored. This representation captures the richness and complexity of human-generated goals, which often lie beyond the scope of traditional reinforcement learning models built around simpler, task-specific goals.
Key Components and Methodology
The research involves several key components:
- Dataset Collection: The authors first collect a dataset of goals from human participants, who were asked to create single-player games within AI2-THOR, a simulated 3D environment resembling a child's bedroom filled with toys and other objects. The dataset consists of 98 games, each described in natural language and then manually translated into a domain-specific language (DSL).
- Domain-Specific Language (DSL): The DSL is inspired by the Planning Domain Definition Language (PDDL) and includes constructs for specifying goals, preferences, and scoring rules. It represents goals as structured programs that can be executed to evaluate an agent's progress, which makes the semantics of goals explicit and facilitates compositional reuse of goal components (a schematic example follows this list).
- Generative Model: To generate novel, human-like goals, the authors devise the Goal Program Generator (GPG) model. The model learns a fitness function over the space of possible goal programs, guided by features that capture cognitive aspects such as common sense and compositionality. The fitness function is trained with a contrastive objective that learns to score human-generated goals above corrupted variants of them (see the second sketch after this list).
- Quality-Diversity Algorithm: Generation relies on MAP-Elites, a quality-diversity evolutionary algorithm that maintains an archive of high-fitness samples across distinct behavioral niches. This ensures that the generated goals cover a broad spectrum of goal types rather than converging on a narrow set of solutions (see the final sketch after this list).
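To make the program view concrete, below is a minimal Python sketch of a goal represented as a machine-readable program. The s-expression is illustrative only, written in the spirit of a PDDL-inspired, Lisp-like DSL; the predicate names (`agent_holds`, `in_motion`, `in`) and the overall grammar are assumptions for exposition, not the paper's actual syntax.

```python
# A hypothetical goal in the spirit of a PDDL-like DSL (NOT the paper's
# actual grammar), plus a tiny s-expression parser showing that such goals
# are ordinary, machine-readable programs.

HYPOTHETICAL_GOAL = """
(define (game throw-balls-into-bin)
  (:preference ballInBin
    (exists (?b - ball ?h - bin)
      (then (once (agent_holds ?b))
            (hold (in_motion ?b))
            (once (in ?h ?b)))))
  (:scoring (* 5 (count ballInBin))))
"""

def tokenize(src: str) -> list:
    """Split an s-expression into parenthesis and atom tokens."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list):
    """Recursively build nested lists from the token stream."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # consume the closing ")"
        return node
    return token

program = parse(tokenize(HYPOTHETICAL_GOAL))
print(program[1])  # ['game', 'throw-balls-into-bin']
```

Because the goal is data, pieces like the `ballInBin` preference can be lifted out and recombined into new games, which is precisely the compositional reuse the behavioral findings below describe.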
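The learned fitness function can be pictured as follows: a minimal NumPy sketch, assuming a linear fitness over fixed-length program features and a logistic contrastive loss in which each human program should outscore a corrupted variant of itself. The feature dimensionality, corruption scheme, and exact objective here are stand-ins, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16                                     # assumed feature dimensionality

# Stand-ins for featurized programs: each row is the feature vector of one
# human-written goal (98 games, as in the dataset); "corruptions" swap or
# delete program pieces, which here just perturbs the features.
human = rng.normal(size=(98, DIM))
corrupted = human + rng.normal(size=(98, DIM))

w = np.zeros(DIM)                            # weights of a linear fitness

def fitness(x: np.ndarray) -> np.ndarray:
    return x @ w

lr = 0.1
for _ in range(200):
    # Contrastive objective: each human program should outscore its own
    # corrupted variant; this is a logistic loss on the fitness margin.
    margin = fitness(human) - fitness(corrupted)
    p = 1.0 / (1.0 + np.exp(-margin))        # P(human ranked above corrupt)
    grad = ((p - 1.0)[:, None] * (human - corrupted)).mean(axis=0)
    w -= lr * grad                           # gradient step on -log p

print("mean fitness gap:", (fitness(human) - fitness(corrupted)).mean())
```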
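MAP-Elites itself is compact enough to sketch in full. The toy version below assumes a numeric genome, a random-perturbation mutator, and a two-dimensional niche grid; in the paper, genomes are goal programs, mutation edits program structure, and niches are defined over program features, so everything here except the archive logic is a placeholder.

```python
import random

random.seed(0)

def mutate(x):
    """Toy mutation; in the paper this would edit a goal program's tree."""
    return [g + random.gauss(0, 0.3) for g in x]

def fitness(x):
    """Toy fitness; the paper uses the learned fitness over goal programs."""
    return -sum(g * g for g in x)

def descriptor(x, bins=8):
    """Map a genome to a discrete niche from its first two genes; the paper
    bins programs by behavioral/structural features instead."""
    clamp = lambda g: max(-1.0, min(1.0, g))
    return (int((clamp(x[0]) + 1) / 2 * (bins - 1)),
            int((clamp(x[1]) + 1) / 2 * (bins - 1)))

archive = {}                                 # niche -> (fitness, elite)
for _ in range(5000):
    if archive and random.random() < 0.9:
        parent = random.choice(list(archive.values()))[1]   # mutate an elite
        child = mutate(parent)
    else:
        child = [random.uniform(-1, 1) for _ in range(4)]   # random restart
    niche, f = descriptor(child), fitness(child)
    # Keep the child only if it beats the current elite of its niche.
    if niche not in archive or f > archive[niche][0]:
        archive[niche] = (f, child)

print(f"{len(archive)} niches filled; best fitness "
      f"{max(f for f, _ in archive.values()):.3f}")
```

The key design choice is the archive: each niche retains its own elite, so unusual but high-fitness goals survive rather than being out-competed by a single dominant solution, which is what yields broad coverage of goal types.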
Behavioral Findings
The behavioral component of the paper reveals several insights into human goal generation:
- Common Sense: Participants exhibit an intuitive understanding of physical properties when creating goals. For example, balls are commonly used in throwing games, while blocks are used in stacking games.
- Compositionality: Participants often reuse common structural elements across different goals, indicating a compositional approach to goal creation. However, there is also a significant amount of creativity, as evidenced by the long tail of unique goal structures.
Modeling Results
The GPG model successfully generates novel, diverse goals that score highly on the learned fitness function. Human raters judge goals sampled from regions of program space close to the human examples to be nearly indistinguishable from human-created games. Moreover, the model's internal fitness scores correlate with human judgments of how fun and human-like the games are.
Implications and Future Directions
This research has significant implications for both practical and theoretical aspects of AI and cognitive science:
- Practical Implications: The framework and model can be applied to create more sophisticated goal-oriented AI agents, particularly in fields such as automated game design and reinforcement learning. By incorporating human-like goals, AI systems could achieve more flexible and generalizable behaviors.
- Theoretical Implications: The program-based representation of goals aligns with cognitive theories suggesting that human goal generation is a combinatorial process involving the reuse of smaller components. This perspective could inform future cognitive models that aim to capture the richness of human goal-directed behavior.
Conclusion
"Goals as Reward-Producing Programs" presents a comprehensive and innovative framework for understanding and modeling human goal generation. By leveraging symbolic representations and evolutionary algorithms, the paper bridges the gap between the simplicity of current reinforcement learning goals and the complexity of human-generated goals. The insights and methodologies proposed in this paper pave the way for future research that could further enhance our understanding of human cognition and improve the capabilities of goal-oriented AI systems.