
EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning (2410.23234v1)

Published 30 Oct 2024 in cs.RO and cs.AI

Abstract: This paper introduces a framework, called EMOTION, for generating expressive motion sequences in humanoid robots, enhancing their ability to engage in humanlike non-verbal communication. Non-verbal cues such as facial expressions, gestures, and body movements play a crucial role in effective interpersonal interactions. Despite the advancements in robotic behaviors, existing methods often fall short in mimicking the diversity and subtlety of human non-verbal communication. To address this gap, our approach leverages the in-context learning capability of LLMs to dynamically generate socially appropriate gesture motion sequences for human-robot interaction. We use this framework to generate 10 different expressive gestures and conduct online user studies comparing the naturalness and understandability of the motions generated by EMOTION and its human-feedback version, EMOTION++, against those by human operators. The results demonstrate that our approach either matches or surpasses human performance in generating understandable and natural robot motions under certain scenarios. We also provide design implications for future research to consider a set of variables when generating expressive robotic gestures.


Summary

  • The paper introduces EMOTION, a novel framework leveraging in-context learning of LLMs and VLMs to generate expressive, contextually appropriate motion sequences for humanoid robots.
  • User studies show that EMOTION performs comparably to human-operated baselines in understandability and naturalness, with the human-feedback version EMOTION++ exceeding the human baseline in specific scenarios.
  • Integrating iterative human feedback mechanisms in EMOTION++ proves vital for significantly improving the adaptability, precision, and perceived naturalness of generated robotic gestures.

Expressive Motion Sequence Generation for Humanoid Robots via In-Context Learning

This paper presents a novel framework, EMOTION, engineered to generate expressive motion sequences for humanoid robots, leveraging the in-context learning capacity of LLMs. The primary goal is to enrich humanoid robots with human-like non-verbal communication skills to facilitate more intuitive human-robot interactions.

Conceptual Framework

EMOTION employs LLMs to dynamically generate gesture motion sequences tailored to specific human-robot interaction scenarios. It interprets the social context from visual and linguistic cues and then produces contextually appropriate motion sequences. Central to its operation is the pairing of LLMs with vision-language models (VLMs), which take social cues as input, whether images or textual prompts, and dynamically produce expressive robot gestures. An extended version, EMOTION++, additionally incorporates human feedback to refine the generated sequences so they better align with human-perceived naturalness and understandability.
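
As a concrete illustration of this input-to-gesture flow, the minimal Python sketch below represents a generated gesture as joint-angle keyframes and builds an LLM prompt from a social cue. The data format, function names, and prompt wording are hypothetical assumptions for illustration, not the paper's published interface.

```python
import json
from dataclasses import dataclass

@dataclass
class Keyframe:
    time_s: float                    # time offset within the gesture (seconds)
    joint_angles: dict[str, float]   # joint name -> target angle (radians)

def build_prompt(social_cue: str) -> str:
    # Ask the model to express an appropriate gesture as structured keyframes.
    return (
        "You control the upper body of a humanoid robot.\n"
        f"Social context: {social_cue}\n"
        "Respond with a JSON list of keyframes, each containing 'time_s' and "
        "'joint_angles' (a mapping from joint name to angle in radians)."
    )

def parse_gesture(llm_output: str) -> list[Keyframe]:
    # Convert the model's JSON response into keyframes a controller could execute.
    frames = json.loads(llm_output)
    return [Keyframe(f["time_s"], f["joint_angles"]) for f in frames]
```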

Methodological Approaches

The EMOTION framework capitalizes on the in-context learning capabilities of LLMs to reproduce human-like expressiveness on a robot. It consists of two main stages: first, a VLM analyzes the social context, extracting actionable cues from visual and textual data; second, an LLM generates the motion sequence itself, guided by a small amount of human demonstration data for calibration.
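
Continuing the sketch above, the fragment below outlines this two-stage flow. Here `vlm_describe` and `llm_complete` are placeholder stubs standing in for whichever VLM/LLM API is used, and prepending demonstrations to the prompt is an assumption about how the in-context examples might be injected.

```python
def vlm_describe(image_path: str, question: str) -> str:
    # Placeholder for a vision-language model call (stage 1); swap in a real API.
    raise NotImplementedError

def llm_complete(prompt: str) -> str:
    # Placeholder for an LLM completion call (stage 2); swap in a real API.
    raise NotImplementedError

def generate_gesture(image_path: str, demos: list[str]) -> list[Keyframe]:
    # Stage 1: summarize the social context from the scene image.
    context = vlm_describe(image_path, question="What social response is appropriate here?")
    # Stage 2: generate keyframes via in-context learning, prepending a few
    # human demonstration examples to the prompt.
    prompt = "\n\n".join(demos) + "\n\n" + build_prompt(context)
    return parse_gesture(llm_complete(prompt))
```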

The architecture is designed to accept iterative improvements through human feedback, admitting both high-level suggestions and explicit command adjustments that further refine the motion sequences. This adaptability matters because expressing nuanced gestures on a humanoid robot involves considerable physical and mechanical complexity.
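
A feedback loop of the kind EMOTION++ adds could look roughly like the following. The prompt text is an assumption, and `llm_complete` is the placeholder completion call from the sketch above.

```python
def refine_with_feedback(gesture_json: str, feedback: str) -> str:
    # Ask the LLM to revise its own output in light of human feedback,
    # which may be a high-level suggestion or an explicit command.
    prompt = (
        "Here is a robot gesture as a JSON list of keyframes:\n"
        f"{gesture_json}\n"
        f"Human feedback: {feedback}\n"
        "Return a revised JSON keyframe list that addresses the feedback while "
        "keeping all joint angles within their original range."
    )
    return llm_complete(prompt)

# Example usage with a high-level adjustment (hypothetical feedback):
#   revised = refine_with_feedback(raw_json, "Slow the wave down and raise the elbow higher.")
```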

Experimental Results and User Studies

The efficacy of EMOTION, and the effect of the human feedback incorporated in EMOTION++, were evaluated in online user studies. The findings show that EMOTION performs comparably to human-operated sequences in understandability and naturalness. Notably, EMOTION++ not only matches the human baseline but surpasses it in certain scenarios, producing more engaging, nuanced gestures. The results also reveal considerable variance in perception across gesture types, underscoring the need for contextual and gestural specificity when designing expressive robot behaviors.

Implications and Future Directions

The research identifies key design considerations for future robotic systems that aim to integrate seamlessly into social contexts; a parameter sketch follows the list below. Key insights include:

  • Hand Position and Movement Patterns: Ensuring that hand positioning and movement trajectories are intuitive and reflective of human expressions significantly impacts perceived naturalness.
  • Finger Articulation and Gesture Speed: Attention to finger movements and the pacing of gestures can profoundly influence the clarity and natural perception of expressed emotions.
  • User-Centric Feedback Mechanisms: The integration of iterative human feedback substantially enhances the adaptability and precision of generated motion sequences, indicating a path forward for personalization in robotic interactions.
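
To make these variables concrete, a minimal Python sketch of how they might be exposed as tunable gesture-style parameters is shown below. The class, field names, units, and defaults are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass

# Hypothetical parameterization of the design variables discussed above.
@dataclass
class GestureStyle:
    hand_height_offset_m: float = 0.0  # raise/lower hand targets (meters)
    path_smoothness: float = 0.5       # 0 = linear interpolation, 1 = heavily smoothed
    finger_curl: float = 0.3           # 0 = open hand, 1 = closed fist
    speed_scale: float = 1.0           # multiplier on keyframe timing (>1 = faster)

# Example: a slower, more open-handed greeting gesture.
calm_wave = GestureStyle(hand_height_offset_m=0.15, finger_curl=0.1, speed_scale=0.8)
```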

Proposed future directions include improving model efficiency to support real-time interaction and expanding the contextual understanding capabilities of LLMs. The exploration of human feedback mechanisms also points to a research trajectory aimed at minimizing biases and enabling personalized human-robot communication.

Overall, the paper bridges a critical gap in humanoid robot interaction design by transcending pre-defined motion sequences and venturing into more fluid and contextually relevant gesture generation, potentially altering the landscape of social robotics.
