Generative Expressive Robot Behaviors using Large Language Models (2401.14673v2)

Published 26 Jan 2024 in cs.RO

Abstract: People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich social context available from LLMs and their ability to generate motion based on instructions or user preferences, to generate expressive robot motion that is adaptable and composable, building upon each other. Our approach utilizes few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users found to be competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.


Summary

  • The paper introduces GenEM, whose main contribution is using LLMs to translate language instructions into socially expressive robot motions.
  • The methodology employs chain-of-thought prompting and modular LLM components to iteratively refine behaviors with user feedback.
  • Empirical evaluations show that GenEM-generated behaviors, particularly after user feedback, were well received and in some cases preferred over animator-crafted motions, and that the approach transfers across robot platforms.

Introduction

In human-robot interaction, a robot's ability to display human-like expressive behaviors is central to effective communication. Building on this observation, the paper proposes a method that uses LLMs to generate expressive robot motions that are adaptable and composable. The approach is realized in a system called Generative Expressive Motion (GenEM), which draws on the broad social context captured by LLMs to produce nuanced, legible expressions on robotic platforms through few-shot prompting and social reasoning.

Prior Work

The related work shows that rule-based and template-based approaches give structure to behavior generation but scale poorly in expressivity and adaptability across modalities and human preferences. Data-driven techniques, in contrast, offer flexibility but depend on large, specialized datasets for each social context in which the robot is deployed. GenEM sidesteps both constraints by drawing on the social context embedded in LLMs and their ability to generate motion from instructions.

Generative Expressive Motion

GenEM translates language into motion in stages. Starting from a language instruction that encodes the desired behavior or social context, the system uses chain-of-thought prompting to reason about how a human would express that behavior. It then maps this description onto robot-specific expressive motions using the skills exposed by the robot's API. The transformation from instruction to motion proceeds through a sequence of prompts, each handled by a separate LLM module responsible for one aspect of the translation, which breaks generation into manageable, logically ordered stages. A key feature of the system is that it accepts iterative human feedback and refines the generated behavior accordingly.
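
As an illustration of this staged prompting, the following Python sketch shows one way such a pipeline could be wired together. It is not the authors' implementation: the function names, prompt wording, and the call_llm placeholder (standing in for whatever LLM client is used) are assumptions made for illustration.

    # Illustrative sketch of a GenEM-style staged prompting pipeline (not the
    # authors' implementation). `call_llm` is a placeholder for any text-in,
    # text-out LLM client; prompt wording and skill names are hypothetical.
    from typing import Callable

    def generate_expressive_behavior(
        instruction: str,
        robot_api_doc: str,
        call_llm: Callable[[str], str],
    ) -> str:
        # Stage 1: chain-of-thought reasoning about how a human would
        # express the instruction (few-shot examples omitted for brevity).
        human_motion = call_llm(
            "Describe, step by step, how a person would expressively act on: "
            f"{instruction}"
        )
        # Stage 2: map the human description onto the robot's available
        # skills, as documented in robot_api_doc.
        robot_plan = call_llm(
            "Rewrite these human expressive motions as steps a robot can "
            f"perform using only these skills:\n{robot_api_doc}\n\n"
            f"Motions:\n{human_motion}"
        )
        # Stage 3: emit parametrized control code that composes those skills.
        return call_llm(
            "Write control code implementing this plan with the given API:\n"
            f"{robot_api_doc}\n\nPlan:\n{robot_plan}"
        )

    def refine_with_feedback(
        control_code: str,
        feedback: str,
        call_llm: Callable[[str], str],
    ) -> str:
        # Iterative refinement: fold user feedback back into a prompt and
        # regenerate the control code.
        return call_llm(
            "Revise this robot control code to address the user feedback.\n"
            f"Feedback: {feedback}\n\nCode:\n{control_code}"
        )

In the paper's setup, few-shot chain-of-thought prompting translates instructions into parametrized control code over the robot's available and learned skills; re-prompting with the user's comments, as in refine_with_feedback above, is one plausible way the iterative feedback step could be realized.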

Empirical Evaluation

The authors ran user studies comparing behaviors generated by GenEM against behaviors produced by professional animators. GenEM-derived behaviors, especially when refined with user feedback, were received positively and in some cases preferred over the animator-crafted ones. Further experiments on a mobile robot and a simulated quadruped across several tasks corroborated GenEM's efficacy, showing that it yields behaviors that align with social norms and can be modified based on user feedback. The experiments also demonstrated GenEM's versatility across robot embodiments: the generated skills were composable and adaptable to multiple platforms, pointing to the approach's scalability and broad applicability.

Conclusion

The paper concludes by highlighting GenEM's use of LLMs to speed up the creation of social, adaptable robot behaviors. It also outlines limitations, including the need for studies with physical interaction and for a mechanism that learns and adapts to individual user preferences over time. As future work, the authors discuss extending behavior generation to multi-turn interactions and a broader action space. GenEM illustrates how combining robotics with advances in LLMs, can shift how expressive human-robot interaction is designed, moving toward more natural integration of robots into everyday social settings.
