Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks (2210.04476v2)

Published 10 Oct 2022 in cs.RO, cs.CL, and cs.LG

Abstract: Demonstrations and natural language instructions are two common ways to specify and teach robots novel tasks. However, for many complex tasks, a demonstration or language instruction alone contains ambiguities, preventing tasks from being specified clearly. In such cases, a combination of both a demonstration and an instruction more concisely and effectively conveys the task to the robot than either modality alone. To instantiate this problem setting, we train a single multi-task policy on a few hundred challenging robotic pick-and-place tasks and propose DeL-TaCo (Joint Demo-Language Task Conditioning), a method for conditioning a robotic policy on task embeddings composed of two components: a visual demonstration and a language instruction. By allowing these two modalities to mutually disambiguate and clarify each other during novel task specification, DeL-TaCo (1) substantially decreases the teacher effort needed to specify a new task and (2) achieves better generalization performance on novel objects and instructions over previous task-conditioning methods. To our knowledge, this is the first work to show that simultaneously conditioning a multi-task robotic manipulation policy on both demonstration and language embeddings improves sample efficiency and generalization over conditioning on either modality alone. See additional materials at https://deltaco-robot.github.io/
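
As a rough illustration of what "conditioning a policy on a task embedding with two components" can mean, the toy sketch below joins a demonstration embedding and a language embedding by plain concatenation and feeds the result, together with an observation, to a linear policy. This is not the authors' architecture: the function names `make_task_embedding` and `linear_policy`, the concatenation scheme, the embedding sizes, and the weights are all assumptions chosen only to keep the example small.

```python
from typing import List, Sequence


def make_task_embedding(demo_emb: Sequence[float],
                        lang_emb: Sequence[float]) -> List[float]:
    """Join the two modality embeddings into one task embedding.

    Assumption: plain concatenation. The abstract does not say how the
    two components are combined.
    """
    return list(demo_emb) + list(lang_emb)


def linear_policy(obs: Sequence[float],
                  task_emb: Sequence[float],
                  weights: Sequence[Sequence[float]]) -> List[float]:
    """Toy task-conditioned policy: action = W @ [obs; task_emb]."""
    x = list(obs) + list(task_emb)
    for row in weights:
        if len(row) != len(x):
            raise ValueError(
                "each weight row must have len(obs) + len(task_emb) entries")
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Hypothetical values, for illustration only.
demo_emb = [0.5, -1.0]   # stand-in for an encoded visual demonstration
lang_emb = [2.0]         # stand-in for an encoded language instruction
obs = [1.0, 0.0]         # stand-in for observation features

task_emb = make_task_embedding(demo_emb, lang_emb)
weights = [
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 1.0],
]
action = linear_policy(obs, task_emb, weights)
print(task_emb, action)
```

In this toy setup, changing either `demo_emb` or `lang_emb` changes the action for the same observation, which is the sense in which both modalities jointly specify the task.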


Authors: Albert Yu, Raymond J. Mooney
