Framewise approach in multimodal emotion recognition in OMG challenge (1805.01369v1)

Published 3 May 2018 in cs.AI, cs.CL, and cs.CV

Abstract: In this report we describe our approach, which achieves $53\%$ unweighted accuracy over $7$ emotions and mean squared errors of $0.05$ and $0.09$ for arousal and valence in the OMG emotion recognition challenge. Our results were obtained with an ensemble of single-modality models trained separately on voice and face data extracted from video. We treat each stream as a sequence of frames, estimate features for each frame, and process the resulting sequence with a recurrent neural network. By an audio frame we mean a short $0.4$-second spectrogram interval. Features for face images were estimated with our own ResNet pretrained on the AffectNet database. Each short spectrogram was likewise treated as a picture and processed by a convolutional network; as the base audio model we used a ResNet pretrained on a speaker recognition task. Predictions from both modalities were fused at the decision level, improving on the single-channel approaches by a few percent.
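The pipeline described above (per-frame CNN features, a recurrent network over the frame sequence, and decision-level fusion of the face and audio predictions) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the class names, a torchvision ResNet-18 backbone, the GRU, the hidden size, and the 3-channel 224x224 spectrogram "images" are all assumptions made for the example.

```python
# Minimal sketch of a framewise multimodal pipeline: CNN features per frame,
# a GRU over the frame sequence, and decision-level fusion of two modalities.
# Backbone, dimensions, and fusion rule are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models


class FramewiseEmotionModel(nn.Module):
    """One modality: CNN features per frame, GRU over the sequence, emotion logits."""

    def __init__(self, num_emotions: int = 7, hidden_size: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)  # stand-in for the pretrained ResNet
        backbone.fc = nn.Identity()               # keep the 512-d pooled features
        self.backbone = backbone
        self.rnn = nn.GRU(512, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_emotions)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W) -- face crops or short spectrogram "images"
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(feats)        # final hidden state summarizes the clip
        return self.head(h[-1])       # (batch, num_emotions) logits


def fuse_decisions(face_logits: torch.Tensor, audio_logits: torch.Tensor) -> torch.Tensor:
    """Decision-level fusion: average the per-modality class probabilities."""
    return (face_logits.softmax(-1) + audio_logits.softmax(-1)) / 2


# Usage with dummy tensors: 2 clips, 16 face frames and 10 audio spectrogram frames each.
face_model, audio_model = FramewiseEmotionModel(), FramewiseEmotionModel()
face_clip = torch.randn(2, 16, 3, 224, 224)
audio_clip = torch.randn(2, 10, 3, 224, 224)
fused = fuse_decisions(face_model(face_clip), audio_model(audio_clip))
print(fused.argmax(-1))  # predicted emotion index per clip
```

Averaging softmax probabilities is just one simple decision-level fusion rule; the paper only states that the two modalities were fused at the decision level.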

Authors (3)
  1. Grigoriy Sterling (1 paper)
  2. Andrey Belyaev (5 papers)
  3. Maxim Ryabov (1 paper)
Citations (1)
