"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy (2301.02555v1)

Published 6 Jan 2023 in cs.RO, cs.AI, cs.CL, cs.HC, and cs.LG

Abstract: Systems for language-guided human-robot interaction must satisfy two key desiderata for broad adoption: adaptivity and learning efficiency. Unfortunately, existing instruction-following agents cannot adapt, lacking the ability to incorporate online natural language supervision, and even if they could, require hundreds of demonstrations to learn even simple policies. In this work, we address these problems by presenting Language-Informed Latent Actions with Corrections (LILAC), a framework for incorporating and adapting to natural language corrections - "to the right," or "no, towards the book" - online, during execution. We explore rich manipulation domains within a shared autonomy paradigm. Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot: language is an input to a learned model that produces a meaningful, low-dimensional control space that the human can use to guide the robot. Each real-time correction refines the human's control space, enabling precise, extended behaviors - with the added benefit of requiring only a handful of demonstrations to learn. We evaluate our approach via a user study where users work with a Franka Emika Panda manipulator to complete complex manipulation tasks. Compared to existing learned baselines covering both open-loop instruction following and single-turn shared autonomy, we show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users because of its reliability, precision, and ease of use.


Online Language Corrections for Robotic Manipulation via Shared Autonomy

The paper "No, to the Right: Online Language Corrections for Robotic Manipulation via Shared Autonomy" presents Language-Informed Latent Actions with Corrections (LILAC), a framework that targets two requirements of language-guided human-robot interaction: adaptivity and learning efficiency. The authors argue that existing instruction-following agents cannot incorporate online natural language supervision and, even setting that aside, require hundreds of demonstrations to learn simple policies.

Key Contributions:

Shared Autonomy and Online Corrections: LILAC extends shared autonomy principles, allowing humans to impart real-time language corrections such as "to the right" or "no, towards the book" during task execution. This approach fosters increased adaptability, enabling robots to refine their control strategies dynamically.
Sample Efficiency: The paper emphasizes LILAC's small data requirements: it learns complex manipulation tasks from only a handful of demonstrations (10-20), whereas existing instruction-following agents require hundreds of demonstrations to learn even simple policies.
Empirical Evaluation: In a user study with a Franka Emika Panda manipulator, LILAC achieves higher task completion rates than learned baselines covering both open-loop instruction following and single-turn shared autonomy. Users also preferred LILAC for its reliability, precision, and ease of use.

Technical Approach:

Latent Action Space: LILAC constructs a meaningful, low-dimensional control space informed by language inputs and refined by corrections. The actual implementation uses learned basis vectors in robot action space orthonormalized using a modified Gram-Schmidt process, ensuring the user's control interface remains intuitive and effective.
Gating Mechanism with GPT-3: A pivotal component of the architecture is the language-derived gating mechanism. Employing GPT-3 for gating allows the model to discern varying levels of state dependence for task instructions versus corrections, thus optimizing the fusion of language and state information during task execution.
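
The two components above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (modified_gram_schmidt, decode_action, gated_fusion), the use of plain Python lists, and the convex-blend form of the gate are all assumptions made for illustration. The sketch shows how a set of basis vectors in robot action space could be orthonormalized with modified Gram-Schmidt, how a low-dimensional user input could then be mapped to a full robot action as a linear combination of that basis, and how a scalar gate in [0, 1] could weight a state-dependent embedding against a language-only one.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def modified_gram_schmidt(vectors: List[Vector], eps: float = 1e-10) -> List[Vector]:
    """Orthonormalize `vectors` with the modified Gram-Schmidt procedure.

    Each projection is subtracted from the running residual `w` rather than
    from the original vector, which is what distinguishes the modified
    variant from classical Gram-Schmidt.
    """
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(q, w)  # uses the current residual, not the original v
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < eps:
            raise ValueError("input vectors are (numerically) linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def decode_action(basis: List[Vector], z: Vector) -> Vector:
    """Map a low-dimensional user input z to a robot action.

    Hypothetical decoder: the action is the linear combination
    sum_k z[k] * basis[k] of the orthonormal basis vectors.
    """
    if len(z) != len(basis):
        raise ValueError("need one input coordinate per basis vector")
    dim = len(basis[0])
    return [sum(zk * b[i] for zk, b in zip(z, basis)) for i in range(dim)]


def gated_fusion(alpha: float, lang_only: Vector, lang_and_state: Vector) -> Vector:
    """Blend two embeddings with a scalar gate alpha in [0, 1].

    Assumed form of the gate (not confirmed by the source): alpha = 1 uses
    the state-dependent embedding only, alpha = 0 uses language only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * s + (1.0 - alpha) * l for l, s in zip(lang_only, lang_and_state)]
```

For example, calling modified_gram_schmidt on two non-parallel vectors of length 7 (one entry per joint of a 7-DoF arm, an assumed dimensionality) yields two unit vectors with zero inner product, so each coordinate of a two-dimensional user input moves the robot along an independent direction. In the paper the gate value is derived from GPT-3; here it is simply passed in as a number.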

Implications and Future Directions:

The implications of LILAC extend beyond robotic manipulation to other AI systems in which adaptability to human input is a priority. The framework could inspire systems that perform robustly from minimal data while incorporating human feedback in real time, improving both user experience and system efficiency.

Challenges and Areas for Improvement:

The framework as it stands focuses primarily on directional and referential corrections, but more complex language phenomena such as anaphora and context-specific instructions remain difficult to handle. Future work could expand the capabilities of the system to accommodate these nuances, ensuring robustness across diverse linguistic inputs.

In conclusion, the advancements brought forward by LILAC in the domain of human-robot interaction open avenues for scalable, adaptive, and efficient robotic systems capable of collaborating with humans in complex environments. By addressing adaptivity through shared autonomy and leveraging minimal data for learning, the paper sets a precedent for the evolution of language-guided robotics and interactive AI technologies.


Authors

Yuchen Cui
Siddharth Karamcheti
Raj Palleti
Nidhya Shivakumar
Percy Liang
Dorsa Sadigh
