Toward Zero-Shot User Intent Recognition in Shared Autonomy (2501.08389v1)

Published 14 Jan 2025 in cs.RO and cs.HC

Abstract: A fundamental challenge of shared autonomy is to use high-DoF robots to assist, rather than hinder, humans by first inferring user intent and then empowering the user to achieve their intent. Although successful, prior methods either rely heavily on a priori knowledge of all possible human intents or require many demonstrations and interactions with the human to learn these intents before being able to assist the user. We propose and study a zero-shot, vision-only shared autonomy (VOSA) framework designed to allow robots to use end-effector vision to estimate zero-shot human intents in conjunction with blended control to help humans accomplish manipulation tasks with unknown and dynamically changing object locations. To demonstrate the effectiveness of our VOSA framework, we instantiate a simple version of VOSA on a Kinova Gen3 manipulator and evaluate our system by conducting a user study on three tabletop manipulation tasks. The performance of VOSA matches that of an oracle baseline model that receives privileged knowledge of possible human intents while also requiring significantly less effort than unassisted teleoperation. In more realistic settings, where the set of possible human intents is fully or partially unknown, we demonstrate that VOSA requires less human effort and time than baseline approaches while being preferred by a majority of the participants. Our results demonstrate the efficacy and efficiency of using off-the-shelf vision algorithms to enable flexible and beneficial shared control of a robot manipulator. Code and videos available here: https://sites.google.com/view/zeroshot-sharedautonomy/home.

Summary

  • The paper introduces VOSA, a zero-shot, vision-only framework that infers user intent without pretraining, enabling shared autonomy in dynamic environments.
  • User studies show that VOSA matches an oracle baseline and significantly reduces human effort relative to unassisted teleoperation, with users preferring it in dynamic settings.
  • This research contributes a novel, efficient, and adaptive shared autonomy framework applicable to real-world dynamic environments, such as assistive robotics.

Toward Zero-Shot User Intent Recognition in Shared Autonomy

This paper addresses a significant challenge in the field of shared autonomy: enabling high-degree-of-freedom (DoF) robotic systems to assist users without extensive pretraining or demonstrations. The authors propose a zero-shot, vision-only shared autonomy (VOSA) framework to infer user intent and facilitate task completion, specifically in scenarios with unknown or dynamically changing object locations.

Current shared autonomy paradigms often rely either on a predefined set of possible human intents or on substantial human-robot interaction and demonstration to learn those intents, which can be impractical in dynamic or unforeseen environments. To mitigate these issues, the VOSA framework leverages end-effector vision for zero-shot intent estimation and employs a blended control strategy to assist users in manipulation tasks.
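The paper's exact inference rule is not reproduced here, but the core idea can be sketched. A common zero-shot formulation scores each visually detected candidate goal by how well the user's commanded end-effector motion points toward it, then normalizes the scores into confidences. The Python sketch below is a minimal illustration under that assumption; the function name `infer_intent_confidences`, the cosine-similarity score, and the softmax temperature are hypothetical choices, not the authors' published implementation.

```python
import numpy as np

def infer_intent_confidences(ee_pos, user_vel, goal_positions, beta=5.0):
    """Score each visually detected candidate goal by how well the user's
    commanded end-effector velocity points toward it, then softmax the
    scores into confidences. Hypothetical sketch; VOSA's exact rule may differ.

    ee_pos, user_vel: np.ndarray of shape (3,)
    goal_positions: iterable of np.ndarray of shape (3,)
    """
    scores = []
    for g in goal_positions:
        to_goal = g - ee_pos
        dist = np.linalg.norm(to_goal)
        if dist < 1e-6 or np.linalg.norm(user_vel) < 1e-6:
            # No informative motion: contribute a neutral score.
            scores.append(0.0)
            continue
        # Cosine similarity between the user's motion and the goal direction.
        scores.append(np.dot(user_vel, to_goal) / (np.linalg.norm(user_vel) * dist))
    scores = np.asarray(scores)
    # Softmax with inverse temperature beta turns scores into confidences.
    exp = np.exp(beta * (scores - scores.max()))
    return exp / exp.sum()
```

In a formulation like this, goals the user actively steers toward accumulate confidence while detections the user moves away from are suppressed, which is what lets the set of intents remain open-ended rather than fixed in advance.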

The authors validate the efficacy of their method through empirical evaluation on a Kinova Gen3 manipulator and a user study involving three tabletop manipulation tasks. The results indicate that VOSA not only matches the performance of an oracle baseline, which has privileged access to knowledge about possible human intents, but also requires significantly less human effort than traditional teleoperation.

Key outcomes of the study include VOSA's competitive performance in realistic settings where the possible human intents are unknown. Participants in the user study preferred VOSA over baseline approaches in such scenarios, citing its adaptability and its reduced demand on human effort. The findings suggest that VOSA effectively uses off-the-shelf vision algorithms for enhanced shared control, with potential impact on practical applications in assistive robotics and beyond.

The paper also touches upon the broader implications of this research, highlighting potential practical and theoretical advancements in shared autonomy systems. VOSA's methodology represents a step toward scalable, adaptive robotics that can operate with minimal prior knowledge, paving the way for more intuitive human-robot collaborations. Future developments could incorporate more sophisticated vision algorithms and arbitration strategies to optimize user intent inference and robotic assistance.
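To make the blended-control and arbitration ideas concrete: a linear arbitration rule of the kind widely used in shared autonomy weights the robot's assistive command by its confidence in the inferred intent. The sketch below is one plausible instantiation only; the gain, threshold, and function names are assumptions, not the paper's published scheme.

```python
import numpy as np

def blend_commands(user_vel, ee_pos, goal_positions, confidences,
                   assist_gain=0.5, conf_threshold=0.4):
    """Linearly blend the user's velocity command with an autonomous command
    toward the most likely goal. Illustrative only; VOSA's arbitration may differ.

    user_vel, ee_pos: np.ndarray of shape (3,)
    goal_positions: sequence of np.ndarray of shape (3,)
    confidences: per-goal confidences summing to 1 (e.g., from a softmax)
    """
    best = int(np.argmax(confidences))
    # Arbitration weight: zero until confidence crosses the threshold,
    # then grows with confidence in the inferred intent.
    alpha = float(confidences[best]) if confidences[best] >= conf_threshold else 0.0
    to_goal = goal_positions[best] - ee_pos
    norm = np.linalg.norm(to_goal)
    robot_vel = assist_gain * to_goal / norm if norm > 1e-6 else np.zeros_like(user_vel)
    return (1.0 - alpha) * user_vel + alpha * robot_vel
```

Under such a rule the robot stays passive while intent is ambiguous, and a user who changes their mind smoothly regains full control as the inferred confidence drops.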

This research contributes significantly to the discourse on shared autonomy by presenting a novel framework that promises efficiency and adaptability without extensive a priori knowledge or pretraining, making it pertinent for real-world applications in dynamic environments.
