- The paper introduces C-BET, a new intrinsic reward method combining agent- and environment-centric exploration strategies.
- It demonstrates that transferring exploration knowledge across diverse environments lets agents reach goal states even before any extrinsic reward is provided.
- Experimental results in MiniGrid and Habitat show that C-BET outperforms traditional techniques in enhancing exploration efficiency.
Intelligent Object Exploration: Task-Agnostic Exploration Paradigms
This paper proposes a new paradigm for Reinforcement Learning (RL) agents: task-agnostic exploration across multiple environments. It challenges the conventional single-environment setup, in which an agent explores an isolated environment from scratch, with no prior knowledge, in a tabula-rasa state. The authors argue that this setup fails to reflect the lifelong learning process inherent to human exploration, and they propose a more realistic alternative in which agents leverage prior experience to explore novel environments more efficiently.
Methodology
The authors distinguish two components of exploration, combining agent-centric and environment-centric signals (a count-based sketch of both follows the list):
- Agent-Centric Exploration: rewards visiting states that are novel relative to the agent's own experience, i.e., areas this particular agent has rarely seen or interacted with.
- Environment-Centric Exploration: rewards interaction with inherently interesting components of the environment, a signal that is relevant to any agent rather than tied to one agent's history.
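To make the distinction concrete, here is a minimal count-based sketch (our illustration, not the authors' code). It assumes grid observations that can be hashed via their bytes, and treats the cell-wise difference between consecutive observations as the "change":

```python
from collections import defaultdict

import numpy as np

# Agent-centric signal: how often has *this agent* visited a state?
state_counts = defaultdict(int)
# Environment-centric signal: how often has *this change* occurred at all?
change_counts = defaultdict(int)


def update_counts(obs, next_obs):
    """Update both tables for one transition and return the new counts."""
    s_key = next_obs.tobytes()           # key for the visited state
    c_key = (next_obs != obs).tobytes()  # key for which cells changed
    state_counts[s_key] += 1
    change_counts[c_key] += 1
    return state_counts[s_key], change_counts[c_key]


# Toy usage: two 3x3 grid observations that differ in one cell.
obs = np.zeros((3, 3), dtype=np.int64)
next_obs = obs.copy()
next_obs[1, 1] = 1  # e.g. a door toggled open
print(update_counts(obs, next_obs))  # -> (1, 1)
```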
Two key components enable this setup:
- Change-Based Exploration Transfer (C-BET): an intrinsic reward that combines counts of rare environment changes with counts of rarely visited states, so that both novel states and novel interactions keep the agent motivated to explore (see the reward sketch after this list).
- Random Reset of Counts: the count tables are occasionally reset at random to prevent intrinsic rewards from decaying to zero over long exploration runs, ensuring the agent keeps learning from diverse experiences.
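The two counts feed a single intrinsic reward, sketched below. The exact form of the reward (here, the inverse of the summed counts) and the reset probability are assumptions for illustration, not the paper's exact formulation or hyperparameters:

```python
import random
from collections import defaultdict

state_counts = defaultdict(int)
change_counts = defaultdict(int)

RESET_PROB = 1e-3  # hypothetical per-step reset probability


def intrinsic_reward(s_key, c_key):
    """C-BET-style reward: rare changes in rarely visited states pay the most."""
    state_counts[s_key] += 1
    change_counts[c_key] += 1
    # Assumed form: inverse of the summed counts; the paper's exact
    # normalization may differ.
    return 1.0 / (change_counts[c_key] + state_counts[s_key])


def maybe_reset_counts():
    """Random reset: occasionally forget counts so rewards never decay to zero."""
    if random.random() < RESET_PROB:
        state_counts.clear()
        change_counts.clear()


# Per environment step: compute the reward from hashed state/change keys,
# then roll the reset dice.
r = intrinsic_reward(s_key=b"state-42", c_key=b"door-opened")
maybe_reset_counts()
print(r)  # -> 0.5 on the first occurrence (1 / (1 + 1))
```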
Experimental Framework
Experiments are conducted in MiniGrid, a procedurally generated gridworld with interactive objects such as keys, doors, and boxes, and in Habitat, a photorealistic 3D simulator. Each environment is modeled as a Markov Decision Process (MDP). Two transfer settings, SingleEnv (one-to-many) and MultiEnv (many-to-many), measure how well exploration policies trained with the C-BET intrinsic reward transfer and generalize compared to traditional approaches; a structural sketch of this protocol follows.
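The sketch below shows only the pretrain-then-transfer loop; the environment names are stand-ins and the training and fine-tuning internals are stubbed out, so this is not the paper's implementation:

```python
import random

# Assumed environment pools; the actual MiniGrid/Habitat task lists differ.
PRETRAIN_ENVS = ["MultiRoom", "KeyCorridor", "ObstructedMaze"]
TEST_ENVS = ["DoorKey", "Unlock"]


def pretrain_exploration(policy, envs, steps):
    """Intrinsic-reward-only pretraining, sampling an environment per rollout."""
    for _ in range(steps):
        env_name = random.choice(envs)
        # ... collect a rollout in env_name and update `policy` with the
        #     C-BET intrinsic reward only (stubbed out here) ...
        print(f"pretraining rollout in {env_name}")
    return policy


def evaluate_transfer(policy, envs):
    """Reuse the pretrained exploration policy when learning extrinsic tasks."""
    for env_name in envs:
        # ... bootstrap / fine-tune a task policy in env_name (stubbed) ...
        print(f"transferring exploration policy to {env_name}")


# MultiEnv (many-to-many): pretrain on a pool, transfer to unseen environments.
# SingleEnv (one-to-many) is the same loop with a single-element pretrain pool.
policy = {}  # placeholder for an actual policy object
policy = pretrain_exploration(policy, PRETRAIN_ENVS, steps=3)
evaluate_transfer(policy, TEST_ENVS)
```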
Results and Insights
Results show that C-BET agents perform more unique interactions (e.g., opening doors and picking up keys) and reach goal states during intrinsic-only pretraining, before any extrinsic reward is introduced, which lets them solve downstream tasks more efficiently. C-BET outperforms the baselines across environments, with the largest gains on complex tasks and on environments whose interactive components are hard to discover, demonstrating that exploration experience gathered in diverse pretraining environments transfers effectively.
Implications and Future Directions
The implications extend to both theoretical advancements and practical applications in AI:
- Theoretical: Disentangling exploration from exploitation represents a significant shift in RL methodology, inviting further research into multi-environment learning and continual exploration.
- Practical: Deploying agents capable of lifelong learning could advance applications in domains like robotics, autonomous systems, and dynamic environments requiring adaptive exploration capabilities.
Future research could extend the count-based rewards to continuous state spaces, refine environment-centric exploration signals, and tune the random resetting of counts, while addressing stochasticity in the environment and aligning exploration strategies with safety constraints in real-world scenarios. Such developments would push autonomous systems beyond simplistic tabula-rasa exploration toward more robust, reliable, and efficient behavior.