
Grounding Language to Autonomously-Acquired Skills via Goal Generation (2006.07185v3)

Published 12 Jun 2020 in cs.AI, cs.LG, and stat.ML

Abstract: We are interested in the autonomous acquisition of repertoires of skills. Language-conditioned reinforcement learning (LC-RL) approaches are great tools in this quest, as they allow to express abstract goals as sets of constraints on the states. However, most LC-RL agents are not autonomous and cannot learn without external instructions and feedback. Besides, their direct language condition cannot account for the goal-directed behavior of pre-verbal infants and strongly limits the expression of behavioral diversity for a given language input. To resolve these issues, we propose a new conceptual approach to language-conditioned RL: the Language-Goal-Behavior architecture (LGB). LGB decouples skill learning and language grounding via an intermediate semantic representation of the world. To showcase the properties of LGB, we present a specific implementation called DECSTR. DECSTR is an intrinsically motivated learning agent endowed with an innate semantic representation describing spatial relations between physical objects. In a first stage (G -> B), it freely explores its environment and targets self-generated semantic configurations. In a second stage (L -> G), it trains a language-conditioned goal generator to generate semantic goals that match the constraints expressed in language-based inputs. We showcase the additional properties of LGB w.r.t. both an end-to-end LC-RL approach and a similar approach leveraging non-semantic, continuous intermediate representations. Intermediate semantic representations help satisfy language commands in a diversity of ways, enable strategy switching after a failure and facilitate language grounding.

Authors (5)
  1. Ahmed Akakzia (6 papers)
  2. Cédric Colas (27 papers)
  3. Pierre-Yves Oudeyer (95 papers)
  4. Mohamed Chetouani (36 papers)
  5. Olivier Sigaud (56 papers)
Citations (3)

Summary

  • The paper presents a novel LGB architecture that decouples skill learning from language grounding to foster autonomous, self-supervised exploration.
  • It employs semantic environment representations and a c-VAE within the DECSTR framework to map language commands to semantic goals.
  • Experiments demonstrate improved behavioral diversity and success compared to traditional end-to-end language-conditioned RL approaches.

Decoupling Skill Learning and Language Grounding in Reinforcement Learning

The paper, entitled "Grounding Language to Autonomously-Acquired Skills via Goal Generation," introduces a novel architectural framework known as the Language-Goal-Behavior (LGB) architecture. This framework addresses the challenge of autonomous skill acquisition in agents by decoupling skill learning from language grounding processes. The LGB architecture leverages a semantic representation of the environment to facilitate this decoupling, offering a fresh perspective on the integration of language-conditioned reinforcement learning (RL) in developmental robotics.

The central motivation behind this work is to construct agents capable of self-supervised exploration and skill acquisition in open-ended environments while simultaneously learning to follow language commands from a tutor. Traditional language-conditioned RL approaches, such as end-to-end systems, often entangle skill learning and language understanding in ways that limit behavioral diversity and adaptability. In contrast, the LGB architecture separates these two facets through an innate semantic representation, enhancing the agent's ability to satisfy language commands in diverse ways and to adapt its strategy after a failure.
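The decoupling described above can be pictured as two independently learned mappings composed at execution time: a language-to-goal generator (L -> G) and a goal-conditioned policy (G -> B). The sketch below is purely illustrative, not the authors' code; both modules are stubbed with toy logic (the instruction table, tuple-encoded goals, and function names are all assumptions) to show the interface and why returning a *set* of admissible goals per command yields behavioral diversity:

```python
import random

# Toy composition of the Language-Goal-Behavior (LGB) pipeline. In the paper
# the two modules are trained separately; here both are stubbed to show how
# one instruction can map to several admissible semantic goals.

def goal_generator(instruction, current_config, rng):
    """L -> G: sample one semantic goal satisfying the instruction.

    Sampling from a set of admissible goals is what lets a single command
    be satisfied in diverse ways."""
    admissible = {
        "put red close to green": [(1, 0, 0), (1, 1, 0)],
        "stack red on green":     [(1, 0, 1)],
    }[instruction]
    return rng.choice(admissible)

def policy(observation, goal_config):
    """G -> B: a goal-conditioned policy would emit low-level actions;
    this stub just records which semantic goal it is pursuing."""
    return {"pursuing": goal_config}

rng = random.Random(0)
goal = goal_generator("stack red on green", (0, 0, 0), rng)
action = policy(observation=None, goal_config=goal)
print(action)  # {'pursuing': (1, 0, 1)}
```

Because the policy only ever sees semantic goals, it can be trained entirely without language, which is the core of the decoupling.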

The paper demonstrates the efficacy of the LGB architecture through an implementation named DECSTR (DEep sets and Curriculum with SemanTic goal Representations). DECSTR is an intrinsically motivated learning agent with an innate semantic representation describing spatial relations among objects. The agent undergoes two distinct phases: an initial exploration phase to build a repertoire of skills, and a subsequent language grounding phase to map language inputs to semantic goals. The results indicate that this architectural strategy allows for greater behavioral diversity and adaptability compared to traditional approaches.
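The semantic representation in DECSTR is, roughly, a binary vector of spatial predicates evaluated over object pairs: a symmetric "close" predicate and an asymmetric "above" predicate, giving a 9-dimensional configuration for three objects. The sketch below illustrates this idea; the threshold values and helper names are assumptions, not the paper's exact definitions:

```python
import itertools
import math

# Sketch of a DECSTR-style semantic configuration: binary spatial predicates
# ("close", symmetric over pairs; "above", asymmetric over ordered pairs).
# Threshold values below are illustrative assumptions.

CLOSE_THRESHOLD = 0.1  # assumed distance threshold (e.g. metres)

def is_close(pos_a, pos_b):
    return math.dist(pos_a, pos_b) < CLOSE_THRESHOLD

def is_above(pos_a, pos_b):
    # "a above b": horizontally aligned and strictly higher along z.
    horizontal = math.dist(pos_a[:2], pos_b[:2])
    return horizontal < CLOSE_THRESHOLD / 2 and pos_a[2] > pos_b[2]

def semantic_configuration(positions):
    """Map object positions to a binary predicate vector.

    With 3 objects: 3 symmetric 'close' pairs + 6 ordered 'above' pairs
    yields a 9-dimensional binary configuration."""
    objs = range(len(positions))
    close = [int(is_close(positions[i], positions[j]))
             for i, j in itertools.combinations(objs, 2)]
    above = [int(is_above(positions[i], positions[j]))
             for i, j in itertools.permutations(objs, 2)]
    return tuple(close + above)

# Example: red stacked on green, blue far away.
config = semantic_configuration([(0.0, 0.0, 0.05),   # red
                                 (0.0, 0.0, 0.0),    # green
                                 (0.5, 0.5, 0.0)])   # blue
print(config)  # (1, 0, 0, 1, 0, 0, 0, 0, 0)
```

A discrete configuration space like this is what makes every reachable goal enumerable, which in turn supports the exhaustive skill mastery reported in the experiments.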

The experiments conducted validate the premises of the LGB architecture. During the skill learning phase, DECSTR was able to autonomously discover and master all reachable semantic configurations in a manipulation environment. The semantic representations, based on spatial predicates, facilitated a more opportunistic approach to goal achievement, reducing the need for domain-specific knowledge and engineered curricula. In the language grounding phase, the use of a Conditional Variational Auto-Encoder (C-VAE) allowed DECSTR to successfully generate semantic goals from language commands, showcasing its ability to generalize and adapt to novel instructions and configurations.

A critical analysis of DECSTR against a constructed baseline—language-conditioned RL without the semantic intermediary—reveals that the LGB approach significantly enhances performance in terms of both success rates and behavioral diversity. Furthermore, by incorporating semantic representations, DECSTR achieved competitive results in skill learning while maintaining superior adaptability when integrating language commands.
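The strategy-switching behavior mentioned above follows directly from having an intermediate goal distribution: if a sampled semantic goal turns out to be unreachable, the agent can resample a different goal that still satisfies the same instruction. A minimal sketch of that retry loop (the failure model and all names are illustrative assumptions):

```python
import random

# Why an intermediate goal distribution enables strategy switching: on
# failure, resample a *different* admissible semantic goal for the same
# instruction instead of repeating the same behavior.

def attempt(goal, reachable):
    """Stand-in for executing the goal-conditioned policy."""
    return goal in reachable

def follow_with_retries(admissible_goals, reachable, rng, max_tries=5):
    tried = []
    for _ in range(max_tries):
        remaining = [g for g in admissible_goals if g not in tried]
        if not remaining:
            break
        goal = rng.choice(remaining)
        tried.append(goal)
        if attempt(goal, reachable):
            return goal, tried
    return None, tried

rng = random.Random(1)
# Two configurations satisfy the command; suppose only one is reachable.
admissible = [(1, 0, 0), (1, 1, 0)]
success_goal, attempts = follow_with_retries(admissible, {(1, 1, 0)}, rng)
print(success_goal, len(attempts))
```

An end-to-end language-conditioned policy has no such explicit goal to resample, which is one intuition for the diversity gap reported against the baseline.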

The introduction of intermediate semantic representations holds profound implications for the field of developmental robotics and AI. By separating skill learning from language acquisition, the LGB architecture embodies a more human-like developmental process, aligning with principles observed in pre-verbal infants. This decoupling fosters an agent's proactive goal setting and goal-directed reasoning, essential for advancing autonomous systems capable of complex task execution and human-robot interaction.

Future advancements may focus on extending the LGB framework to accommodate varying object quantities and types, potentially employing graph neural networks to handle complexities arising in different domains. Moreover, integrating simultaneous skill and language learning phases represents a promising avenue for creating more robust developmental sequences in robotic systems.

In conclusion, this paper presents a compelling argument and methodology for redefining how autonomous agents can learn and ground language through intrinsic motivation and semantic abstraction. The LGB architecture not only paves the way for more adaptive and diversified behaviors but also suggests a paradigm shift in the design of intelligent systems capable of flexible and resilient learning trajectories.
