Driving Animatronic Robot Facial Expression From Speech (2403.12670v3)

Published 19 Mar 2024 in cs.RO and cs.CV

Abstract: Animatronic robots hold the promise of enabling natural human-robot interaction through lifelike facial expressions. However, generating realistic, speech-synchronized robot expressions poses significant challenges due to the complexities of facial biomechanics and the need for responsive motion synthesis. This paper introduces a novel, skinning-centric approach to drive animatronic robot facial expressions from speech input. At its core, the proposed approach employs linear blend skinning (LBS) as a unifying representation, guiding innovations in both embodiment design and motion synthesis. LBS informs the actuation topology, facilitates human expression retargeting, and enables efficient speech-driven facial motion generation. This approach demonstrates the capability to produce highly realistic facial expressions on an animatronic face in real-time at over 4000 fps on a single Nvidia RTX 4090, significantly advancing robots' ability to replicate nuanced human expressions for natural interaction. To foster further research and development in this field, the code has been made publicly available at: https://github.com/library87/OpenRoboExp
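The abstract's central idea is linear blend skinning (LBS) as the shared representation linking embodiment design, expression retargeting, and speech-driven motion synthesis. As a rough illustration of what an LBS forward pass computes, here is a minimal NumPy sketch: a speech-driven model would predict per-frame bone (actuation-site) transforms, and the skinning step blends them to deform the face mesh. This is not the paper's implementation; the function name, array shapes, and NumPy usage are assumptions for illustration only.

```python
# Minimal linear blend skinning (LBS) sketch -- illustrative only,
# not the OpenRoboExp implementation. Shapes and names are assumptions.
import numpy as np

def linear_blend_skinning(rest_vertices, bone_transforms, skin_weights):
    """Deform a mesh by blending per-bone rigid transforms.

    rest_vertices:   (V, 3)    vertex positions in the rest pose
    bone_transforms: (B, 4, 4) homogeneous transform of each bone / actuation site
    skin_weights:    (V, B)    per-vertex weights, each row summing to 1
    returns:         (V, 3)    deformed vertex positions
    """
    V = rest_vertices.shape[0]
    # Lift vertices to homogeneous coordinates: (V, 4)
    homo = np.concatenate([rest_vertices, np.ones((V, 1))], axis=1)
    # Apply every bone transform to every vertex: (B, V, 4)
    per_bone = np.einsum('bij,vj->bvi', bone_transforms, homo)
    # Blend the per-bone results with the skinning weights: (V, 4)
    blended = np.einsum('vb,bvi->vi', skin_weights, per_bone)
    return blended[:, :3]
```

In a speech-driven pipeline of the kind the abstract describes, only the small set of bone transforms changes per audio frame, so the heavy mesh data stays fixed and the per-frame cost is a pair of batched matrix products, which is consistent with the reported real-time throughput.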

Authors (3)
  1. Boren Li (6 papers)
  2. Hang Li (277 papers)
  3. Hangxin Liu (32 papers)

