EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning (2410.23234v1)
Abstract: This paper introduces EMOTION, a framework for generating expressive motion sequences in humanoid robots, enhancing their ability to engage in human-like non-verbal communication. Non-verbal cues such as facial expressions, gestures, and body movements play a crucial role in effective interpersonal interaction. Despite advances in robotic behaviors, existing methods often fall short of mimicking the diversity and subtlety of human non-verbal communication. To address this gap, our approach leverages the in-context learning capability of large language models (LLMs) to dynamically generate socially appropriate gesture motion sequences for human-robot interaction. We use this framework to generate 10 different expressive gestures and conduct online user studies comparing the naturalness and understandability of the motions generated by EMOTION and its human-feedback version, EMOTION++, against those produced by human operators. The results demonstrate that our approach matches or surpasses human performance in generating understandable and natural robot motions in certain scenarios. We also provide design implications for future research, identifying a set of variables to consider when generating expressive robotic gestures.
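To make the in-context learning setup concrete, the sketch below shows one plausible way such a pipeline could be structured: a few worked examples pair a social cue with a keyframe sequence of joint targets, the prompt appends a new cue, and the LLM's JSON output is parsed for downstream execution. All identifiers here (joint names, `build_prompt`, `query_llm`, the example values) are hypothetical illustrations rather than the paper's actual prompts, robot configuration, or API; `query_llm` is a stub standing in for a real chat-completion call.

```python
import json

# Few-shot examples: each maps a social cue to a keyframe sequence of joint
# targets (radians). Joint names and values are illustrative placeholders,
# not the paper's actual robot configuration.
EXAMPLES = [
    {
        "cue": "wave hello",
        "keyframes": [
            {"t": 0.0, "right_shoulder_pitch": -1.2, "right_elbow": 0.3},
            {"t": 0.5, "right_shoulder_pitch": -1.2, "right_elbow": 1.0},
            {"t": 1.0, "right_shoulder_pitch": -1.2, "right_elbow": 0.3},
        ],
    },
    {
        "cue": "nod in agreement",
        "keyframes": [
            {"t": 0.0, "neck_pitch": 0.0},
            {"t": 0.4, "neck_pitch": 0.35},
            {"t": 0.8, "neck_pitch": 0.0},
        ],
    },
]


def build_prompt(new_cue: str) -> str:
    """Assemble an in-context prompt: instructions, worked examples, new query."""
    lines = [
        "You control a humanoid robot. Given a social cue, output a JSON list of",
        "keyframes, each with a time 't' (seconds) and joint targets in radians.",
        "",
    ]
    for ex in EXAMPLES:
        lines.append(f"Cue: {ex['cue']}")
        lines.append("Keyframes: " + json.dumps(ex["keyframes"]))
        lines.append("")
    lines.append(f"Cue: {new_cue}")
    lines.append("Keyframes:")
    return "\n".join(lines)


def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion API).

    Returns a canned response here so the sketch runs offline; a real system
    would send `prompt` to the model and return its text completion.
    """
    return json.dumps([
        {"t": 0.0, "right_shoulder_pitch": 0.0},
        {"t": 0.6, "right_shoulder_pitch": -0.9},
        {"t": 1.2, "right_shoulder_pitch": 0.0},
    ])


if __name__ == "__main__":
    prompt = build_prompt("shrug to express uncertainty")
    keyframes = json.loads(query_llm(prompt))
    # Downstream, each keyframe would be interpolated and sent to the robot's
    # joint controllers; that execution layer is omitted here.
    print(json.dumps(keyframes, indent=2))
```

Requesting keyframes as JSON keeps the model's output machine-parseable, which is one common way to bridge free-form LLM generation and joint-level motion execution.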