
Fusion with Hierarchical Graphs for Multimodal Emotion Recognition (2109.07149v1)

Published 15 Sep 2021 in cs.MM, cs.AI, cs.LG, and cs.SD

Abstract: Automatic emotion recognition (AER) based on enriched multimodal inputs, including text, speech, and visual cues, is crucial in the development of emotionally intelligent machines. Although complex modality relationships have been proven effective for AER, they are still largely underexplored because previous works predominantly relied on various fusion mechanisms with simply concatenated features to learn multimodal representations for emotion classification. This paper proposes a novel hierarchical fusion graph convolutional network (HFGCN) model that learns more informative multimodal representations by considering the modality dependencies during the feature fusion procedure. Specifically, the proposed model fuses multimodal inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation. We verified the interpretability of the proposed method by projecting the emotional states to a 2D valence-arousal (VA) subspace. Extensive experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets, IEMOCAP and MELD.
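
The abstract describes the architecture only at a high level. As a rough illustration of what a two-stage graph construction for multimodal fusion can look like, here is a minimal PyTorch sketch. It is not the authors' implementation: the feature dimension, the mean-pooling fusion, the sliding context window, and both adjacency choices are assumptions made for illustration.

```python
# A minimal sketch (not the released HFGCN code) of two-stage graph fusion
# for multimodal emotion recognition. All sizes and graph-construction
# heuristics below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        a = adj + torch.eye(adj.size(0))                 # add self-loops
        d_inv_sqrt = a.sum(-1).rsqrt().unsqueeze(-1)
        a_hat = d_inv_sqrt * a * d_inv_sqrt.transpose(-1, -2)  # normalize
        return F.relu(self.lin(a_hat @ h))

class TwoStageFusionGCN(nn.Module):
    """Stage 1 fuses modalities per utterance; stage 2 propagates over the
    conversation, so modality and context dependencies enter the encoding."""
    def __init__(self, dim=128, n_classes=6, context=2):
        super().__init__()
        self.stage1 = GCNLayer(dim, dim)  # intra-utterance modality graph
        self.stage2 = GCNLayer(dim, dim)  # inter-utterance conversation graph
        self.clf = nn.Linear(dim, n_classes)
        self.context = context

    def forward(self, text, audio, vision):
        # text/audio/vision: (n_utterances, dim) pre-extracted features
        n = text.size(0)
        # Stage 1: each utterance gets a fully connected 3-node graph over
        # its text, audio, and vision nodes; mean-pool into one fused node.
        modal = torch.stack([text, audio, vision], dim=1)   # (n, 3, dim)
        adj1 = torch.ones(3, 3)
        fused = torch.stack([self.stage1(m, adj1).mean(0) for m in modal])
        # Stage 2: connect utterances within a sliding context window so
        # conversational dependencies shape the final representation.
        idx = torch.arange(n)
        adj2 = ((idx[:, None] - idx[None, :]).abs() <= self.context).float()
        return self.clf(self.stage2(fused, adj2))           # (n, n_classes)

# Example: 10 utterances with 128-d features per modality.
model = TwoStageFusionGCN()
logits = model(*[torch.randn(10, 128) for _ in range(3)])   # (10, 6)
```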

Authors (5)
  1. Shuyun Tang (4 papers)
  2. Zhaojie Luo (5 papers)
  3. Guoshun Nan (33 papers)
  4. Yuichiro Yoshikawa (12 papers)
  5. Hiroshi Ishiguro (2 papers)
Citations (10)
