The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents (2309.15311v2)
Abstract: Previous studies on the perception of emotions in embodied virtual agents have shown that virtual characters can effectively convey emotions through interactions with humans. However, creating an autonomous embodied conversational agent with expressive behaviors presents two major challenges. The first is the difficulty of synthesizing conversational behaviors for each modality that are as expressive as real human behaviors. The second is that affect is modeled independently for each modality, which makes it difficult to generate multimodal responses with a consistent emotion across all modalities. In this work, we propose a conceptual framework, ACTOR (Affect-Consistent mulTimodal behaviOR generation), which aims to increase the perception of affects by generating multimodal behaviors conditioned on a consistent driving affect. We conducted a user study with 199 participants to assess how the average person judges the affects perceived from multimodal behaviors that are consistent or inconsistent with respect to a driving affect. The results show that, among all model conditions, our affect-consistent framework receives the highest Likert scores for the perception of driving affects. Our statistical analysis suggests that making a modality affect-inconsistent significantly decreases the perception of the driving affect. We also observe that multimodal behaviors conditioned on consistent affects are more expressive than behaviors with inconsistent affects. We therefore conclude that multimodal emotion conditioning and affect consistency are vital to enhancing the perception of affects in embodied conversational agents.
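To make the core idea concrete, here is a minimal conceptual sketch of affect-consistent conditioning in Python. This is not the authors' implementation: the abstract describes ACTOR only at a framework level, so every class, function, and the affect label set below are illustrative placeholders. The point the sketch captures is that speech, face, and gesture generators all receive the same driving affect, so no modality can contradict the intended emotion.

```python
"""Conceptual sketch (not the authors' code) of affect-consistent
multimodal behavior generation: each per-modality generator is a stub,
but all three are conditioned on the SAME driving affect."""
from dataclasses import dataclass

AFFECTS = {"joy", "anger", "sadness", "neutral"}  # illustrative label set


@dataclass
class MultimodalBehavior:
    speech: str    # stands in for expressive TTS audio
    face: str      # stands in for facial-animation curves
    gesture: str   # stands in for body-gesture motion


def synth_speech(text: str, affect: str) -> str:
    # Placeholder for an affect-conditioned TTS model.
    return f"speech({text!r}, affect={affect})"


def synth_face(speech: str, affect: str) -> str:
    # Placeholder for speech-driven facial animation with emotion control.
    return f"face(driven_by={speech}, affect={affect})"


def synth_gesture(speech: str, affect: str) -> str:
    # Placeholder for affect-conditioned co-speech gesture synthesis.
    return f"gesture(driven_by={speech}, affect={affect})"


def generate(text: str, driving_affect: str) -> MultimodalBehavior:
    """Condition every modality on one driving affect, so the emotion
    perceived from speech, face, and body stays consistent."""
    if driving_affect not in AFFECTS:
        raise ValueError(f"unknown affect: {driving_affect}")
    speech = synth_speech(text, driving_affect)
    return MultimodalBehavior(
        speech=speech,
        face=synth_face(speech, driving_affect),
        gesture=synth_gesture(speech, driving_affect),
    )


if __name__ == "__main__":
    print(generate("Nice to meet you!", "joy"))
```

An affect-inconsistent condition, as studied in the user study, would correspond to passing a different affect label to exactly one of the three stubs.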
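The abstract also reports a significance analysis of Likert ratings across conditions. The exact pipeline is not given in the abstract; the paper's bibliography cites Welch's t-test, so the sketch below shows how such a between-condition comparison could be run with SciPy. The ratings here are fabricated placeholders, not the study's data.

```python
"""Illustrative sketch of a consistent-vs-inconsistent Likert comparison.
The scores are randomly generated stand-ins, NOT the study's data.
Welch's t-test (equal_var=False) does not assume equal group variances."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 7-point Likert ratings of how strongly the driving affect
# was perceived under each condition (values in 1..7).
consistent = rng.integers(4, 8, size=60)    # all modalities share the driving affect
inconsistent = rng.integers(2, 6, size=60)  # one modality conditioned on a different affect

t, p = stats.ttest_ind(consistent, inconsistent, equal_var=False)  # Welch's t-test
print(f"Welch's t = {t:.2f}, p = {p:.4f}")
```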