- The paper introduces ELLA (Exploration through Learned Language Abstraction), a method enhancing sample efficiency in sparse reward RL by using language instruction hierarchies to provide intermediate rewards.
- ELLA employs a Termination Classifier to detect task completion and a Relevance Classifier to link low-level tasks to high-level goals, enabling effective reward shaping.
- Empirical results in BabyAI environments show ELLA significantly improves exploration and performance compared to baselines, suggesting its potential for complex tasks and human-AI interaction.
Exploration through Learned Language Abstraction (ELLA)
The paper presents ELLA, an approach for improving sample efficiency in reinforcement learning settings characterized by sparse rewards and long-horizon tasks. ELLA encourages efficient exploration by exploiting the structure inherent in language instructions: it learns which low-level tasks support a given high-level command and provides intermediate reward signals when those low-level tasks are completed. For example, a high-level BabyAI instruction such as "put the red ball next to the green box" implicitly entails a low-level instruction such as "go to the red ball".
Key Components of ELLA
ELLA is built around two primary classifiers:
- Termination Classifier: This classifier determines when the agent has completed a given low-level task. It is trained offline on pairs of low-level instructions and terminal states; because these simple tasks are short, such data is cheap to collect, and it provides a basis for reward shaping without requiring demonstrations of the complex high-level tasks themselves.
- Relevance Classifier: Unlike the termination classifier, this component is trained online. It predicts which low-level tasks are pertinent to a given high-level goal. Whenever the agent succeeds at a high-level task, the low-level instructions completed along the way (as detected by the termination classifier) are recorded as relevant to that goal, and this growing dataset is used to retrain the classifier. A minimal sketch of both classifiers follows this list.
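The sketch below illustrates one way the two classifiers could be structured, assuming fixed-size vector embeddings for instructions and states. The class names, network sizes, 0.5 decision threshold, and the simple labeling scheme in relabel_successful_trajectory are illustrative assumptions, not the paper's exact architecture or training procedure.

```python
import torch
import torch.nn as nn

class TerminationClassifier(nn.Module):
    """Predicts whether a state satisfies a low-level instruction.
    Trained offline on (instruction, state) pairs labeled with
    whether the state completes the instruction."""
    def __init__(self, instr_dim: int, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(instr_dim + state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, instr: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(torch.cat([instr, state], dim=-1)))

class RelevanceClassifier(nn.Module):
    """Predicts whether a low-level instruction is relevant to a
    high-level instruction; trained online from successful episodes."""
    def __init__(self, instr_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * instr_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, high_instr: torch.Tensor, low_instr: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(torch.cat([high_instr, low_instr], dim=-1)))

def relabel_successful_trajectory(states, high_instr, low_instrs, term_clf):
    """Builds relevance training pairs from one successful high-level
    episode: a low-level instruction completed somewhere along the
    trajectory is labeled relevant to the high-level goal."""
    examples = []
    for low in low_instrs:
        completed = any(term_clf(low, s).item() > 0.5 for s in states)
        examples.append((high_instr, low, 1.0 if completed else 0.0))
    return examples
```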
Reward Shaping Approach
The reward shaping mechanism in ELLA introduces additional rewards when relevant low-level tasks are completed during the pursuit of the high-level task. This mechanism is governed by:
- A bounded reward bonus, kept small relative to the sparse task reward, so that subtask bonuses cannot distract the agent from the high-level goal.
- Neutralization of successful trajectories: the accumulated bonuses are subtracted from the final reward, so the episode return matches that of the original sparse-reward environment and policy invariance is maintained. The wrapper sketch below illustrates both mechanisms.
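The following is a minimal sketch of how this shaping could be wrapped around an environment, assuming a gym-style API whose observations expose the instruction under a "mission" key (as in BabyAI/MiniGrid), a sparse reward that is positive only on success, and boolean-returning classifier callables. The class name, the once-per-episode bonus bookkeeping, and the default bonus value are illustrative assumptions rather than the paper's implementation.

```python
class ELLAShapingWrapper:
    """Sketch of ELLA-style reward shaping around a sparse-reward env:
    adds a bonus the first time each relevant low-level task is
    completed, then neutralizes the bonuses on success."""

    def __init__(self, env, term_clf, rel_clf, low_instrs, bonus=0.05):
        self.env = env
        self.term_clf = term_clf      # term_clf(low_instr, obs) -> bool
        self.rel_clf = rel_clf        # rel_clf(high_instr, low_instr) -> bool
        self.low_instrs = low_instrs  # candidate low-level instructions
        self.bonus = bonus            # small, bounded per-subtask bonus

    def reset(self):
        obs = self.env.reset()
        self.high_instr = obs["mission"]  # BabyAI exposes the goal here
        self.rewarded = set()             # subtasks already bonused this episode
        self.total_bonus = 0.0
        return obs

    def step(self, action):
        obs, env_reward, done, info = self.env.step(action)
        shaped = env_reward
        for low in self.low_instrs:
            if (low not in self.rewarded
                    and self.rel_clf(self.high_instr, low)
                    and self.term_clf(low, obs)):
                shaped += self.bonus
                self.total_bonus += self.bonus
                self.rewarded.add(low)
        # Neutralization: on success (positive sparse reward), subtract
        # the accumulated bonuses so the episode return matches the
        # original environment.
        if done and env_reward > 0:
            shaped -= self.total_bonus
        return obs, shaped, done, info
```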
Empirical Evaluation
The effectiveness of ELLA is demonstrated in several BabyAI environments, where it consistently improves sample efficiency over both standard sparse-reward reinforcement learning and prior language-based reward-shaping techniques. Its gains are largest in environments with spatial sparsity and reward bottlenecks, where the targeted subtask bonuses guide exploration most effectively.
Implications and Future Directions
The implications of this research span both practical applications and theoretical advancements in AI. The approach exploits language hierarchies without needing explicit instruction decompositions, making it flexible and potentially applicable to more complex tasks outside synthetic environments. Future developments could include extending ELLA to work with natural language instructions and integrating intrinsic motivation methods to reward different aspects of exploration more effectively.
In conclusion, ELLA represents a significant step toward using language abstraction to drive exploration in sparse-reward environments, and it points to a promising direction for human-AI collaboration in practical settings: agents that interpret and execute complex language instructions more effectively.