Generative Expressive Robot Behaviors using Large Language Models (2401.14673v2)

Published 26 Jan 2024 in cs.RO

Abstract: People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich social context available from LLMs and their ability to generate motion based on instructions or user preferences, to generate expressive robot motion that is adaptable and composable, building upon each other. Our approach utilizes few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users found to be competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.


Summary

  • The paper introduces GenEM, whose main contribution is using LLMs to translate language instructions into socially expressive robot motions.
  • The methodology employs chain-of-thought prompting and modular LLM components to iteratively refine behaviors with user feedback.
  • Empirical evaluations show that GenEM-generated behaviors, particularly after user feedback, were well received and in some cases preferred over animator-crafted motions, and that the approach transfers across robot platforms.

Introduction

In human-robot interaction, a robot's ability to display human-like expressive behaviors is central to effective communication. Building on this observation, the paper proposes a method that uses LLMs to generate expressive robot motions that are adaptable and composable. The approach is realized in a system called Generative Expressive Motion (GenEM), which draws on the broad social context captured by LLMs to produce nuanced, legible expressions on robotic platforms through few-shot prompting and social reasoning.

Prior Work

The related work shows that rule-based and template-based approaches give structure to behavior generation but scale poorly in expressivity and adaptability across modalities and human preferences. Data-driven techniques, in contrast, offer flexibility but depend on large, specialized datasets for each social context in which the robot is deployed. GenEM sidesteps both constraints by drawing on the social context embedded in LLMs and their ability to generate motion from instructions.

Generative Expressive Motion

GenEM translates language into motion in stages. Starting from a language instruction that encodes the desired behavior or social context, the system uses chain-of-thought prompting to reason about how a human would express that behavior. It then maps this description onto robot-specific expressive motions using the skills exposed by the robot's API. The transformation from instruction to motion proceeds through a sequence of prompts, each handled by a separate LLM module responsible for one aspect of the translation, which breaks generation into manageable, logically ordered stages. A key feature of the system is that it accepts iterative human feedback and refines the generated behavior accordingly.
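
As an illustration of this staged prompting, the following Python sketch shows one way such a pipeline could be wired together. It is not the authors' implementation: the function names, prompt wording, and the call_llm placeholder (standing in for whatever LLM client is used) are assumptions made for illustration.

    # Illustrative sketch of a GenEM-style staged prompting pipeline (not the
    # authors' implementation). `call_llm` is a placeholder for any text-in,
    # text-out LLM client; prompt wording and skill names are hypothetical.
    from typing import Callable

    def generate_expressive_behavior(
        instruction: str,
        robot_api_doc: str,
        call_llm: Callable[[str], str],
    ) -> str:
        # Stage 1: chain-of-thought reasoning about how a human would
        # express the instruction (few-shot examples omitted for brevity).
        human_motion = call_llm(
            "Describe, step by step, how a person would expressively act on: "
            f"{instruction}"
        )
        # Stage 2: map the human description onto the robot's available
        # skills, as documented in robot_api_doc.
        robot_plan = call_llm(
            "Rewrite these human expressive motions as steps a robot can "
            f"perform using only these skills:\n{robot_api_doc}\n\n"
            f"Motions:\n{human_motion}"
        )
        # Stage 3: emit parametrized control code that composes those skills.
        return call_llm(
            "Write control code implementing this plan with the given API:\n"
            f"{robot_api_doc}\n\nPlan:\n{robot_plan}"
        )

    def refine_with_feedback(
        control_code: str,
        feedback: str,
        call_llm: Callable[[str], str],
    ) -> str:
        # Iterative refinement: fold user feedback back into a prompt and
        # regenerate the control code.
        return call_llm(
            "Revise this robot control code to address the user feedback.\n"
            f"Feedback: {feedback}\n\nCode:\n{control_code}"
        )

In the paper's setup, few-shot chain-of-thought prompting translates instructions into parametrized control code over the robot's available and learned skills; re-prompting with the user's comments, as in refine_with_feedback above, is one plausible way the iterative feedback step could be realized.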

Empirical Evaluation

The authors ran user studies comparing behaviors generated by GenEM against behaviors produced by professional animators. GenEM-derived behaviors, especially when refined with user feedback, were received positively and in some cases preferred over the animator-crafted ones. Further experiments on a mobile robot and a simulated quadruped across several tasks corroborated GenEM's efficacy, showing that it yields behaviors that align with social norms and can be modified based on user feedback. The experiments also demonstrated GenEM's versatility across robot embodiments: the generated skills were composable and adaptable to multiple platforms, pointing to the approach's scalability and broad applicability.

Conclusion

The paper concludes by highlighting GenEM's use of LLMs to speed up the creation of social, adaptable robot behaviors. It also outlines limitations, including the need for studies with physical interaction and for a mechanism that learns and adapts to individual user preferences over time. As future work, the authors discuss extending behavior generation to multi-turn interactions and a broader action space. GenEM illustrates how combining robotics with advances in LLMs, can shift how expressive human-robot interaction is designed, moving toward more natural integration of robots into everyday social settings.
