
Temporal aggregation of audio-visual modalities for emotion recognition (2007.04364v1)

Published 8 Jul 2020 in cs.CV and cs.AI

Abstract: Emotion recognition plays a pivotal role in affective computing and in human-computer interaction. Current technological developments increase the possibilities of collecting data about the emotional state of a person. In general, human perception of the emotion transmitted by a subject is based on vocal and visual information collected in the first seconds of interaction with the subject. As a consequence, the integration of verbal (i.e., speech) and non-verbal (i.e., image) information is the preferred choice in most current approaches to emotion recognition. In this paper, we propose a multimodal fusion technique for emotion recognition that combines audio-visual modalities from a temporal window, with a different temporal offset for each modality. We show that our proposed method outperforms other methods from the literature as well as the human accuracy rating. The experiments are conducted on the open-access multimodal dataset CREMA-D.
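The core idea of the abstract — fusing audio and visual feature streams taken from a shared temporal window but at different per-modality offsets — can be sketched as follows. All shapes, offsets, and function names here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def fuse_with_offsets(audio_feats, video_feats,
                      audio_offset=2, video_offset=0, window=4):
    """Take a window of frames from each modality, starting at a
    modality-specific temporal offset, and concatenate the aligned
    frames along the feature axis. (Hypothetical sketch.)"""
    a = audio_feats[audio_offset:audio_offset + window]
    v = video_feats[video_offset:video_offset + window]
    return np.concatenate([a, v], axis=1)

# Toy feature sequences: 10 time steps each, different feature sizes.
audio = np.random.rand(10, 64)   # e.g. per-frame acoustic features
video = np.random.rand(10, 128)  # e.g. per-frame visual features
fused = fuse_with_offsets(audio, video)
print(fused.shape)  # (4, 192): 4 aligned frames, 64 + 128 features
```

The fused representation would then feed a downstream classifier; the offsets let the model pair, say, speech from slightly later in the clip with facial frames from its start.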

Authors (4)
  1. Andreea Birhala (2 papers)
  2. Catalin Nicolae Ristea (1 paper)
  3. Anamaria Radoi (2 papers)
  4. Liviu Cristian Dutu (2 papers)
Citations (5)
