
SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild (1901.02839v2)

Published 9 Jan 2019 in cs.HC, cs.AI, and cs.CV

Abstract: Natural human-computer interaction and audio-visual human behaviour sensing systems, which would achieve robust performance in-the-wild are more needed than ever as digital devices are increasingly becoming an indispensable part of our life. Accurately annotated real-world data are the crux in devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2000 minutes of audio-visual data of 398 people coming from six cultures, 50% female, and uniformly spanning the age range of 18 to 65 years old. Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, and continuously valued valence, arousal, liking, agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing and is expected to push forward the research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal and (dis)liking intensity estimation.

Citations (182)

Summary

  • The paper introduces a novel dataset that enhances emotion research using over 2000 minutes of natural audio-visual recordings from diverse participants.
  • It provides detailed annotations, including facial action units, vocal cues, and continuous emotion metrics like valence and arousal.
  • Baseline experiments using SVM, RF, LSTM-RNN, and DCNN validate SEWA DB, establishing new benchmarks for affective computing studies.

SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild

The paper delineates the creation and significance of the SEWA Database (SEWA DB), an extensive resource for researchers engaged in audio-visual emotion and sentiment research in uncontrolled, real-world environments. It addresses a key limitation of existing databases, which are typically collected in constrained, controlled settings and therefore transfer poorly to real-world applications.

Contributions of SEWA DB

The SEWA DB stands out due to several distinct features that make it ideally suited for affective computing studies:

  • Diversity in Subjects: It contains over 2000 minutes of data from 398 participants across six cultures, with a balanced gender split and ages spanning 18 to 65 years, ensuring broad demographic representation.
  • Naturalistic Contexts: The recordings occur in real-world settings while subjects interact naturally, watching advertisements and discussing them via video chat, thereby capturing spontaneous behavioral responses rather than induced or acted ones.
  • Rich Annotations: The database includes detailed annotations of facial landmarks, facial action units (FAUs), various vocalizations, continuously valued valence, arousal, and liking/disliking, and social signals such as agreement and mimicry, offering comprehensive material for multifaceted emotion analysis (a hypothetical loading sketch follows this list).
  • Cross-Cultural Considerations: It uniquely supports large-scale cultural studies by providing diverse cultural data, a feature lacking in many existing datasets.
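The continuous dimensions (valence, arousal, liking) are typically distributed as per-frame or per-timestamp rating tracks. The sketch below is a minimal, hypothetical example of loading and resampling such a track with pandas; the file name, column names (`timestamp`, `valence`, `arousal`), and CSV layout are assumptions for illustration and may differ from the actual SEWA DB release format.

```python
import pandas as pd

# Hypothetical per-frame annotation file; the real SEWA DB layout may differ.
# Assumed columns: timestamp (seconds), valence, arousal (both in [-1, 1]).
ann = pd.read_csv("subject_042_valence_arousal.csv")

# Simple per-recording statistics, useful as a sanity check before training.
print(ann[["valence", "arousal"]].describe())

# Resample the rating trace to a fixed 10 Hz grid (100 ms hop), a common
# preprocessing step before aligning annotations with audio/video features.
ann = ann.set_index(pd.to_timedelta(ann["timestamp"], unit="s"))
ann_10hz = ann[["valence", "arousal"]].resample("100ms").mean().interpolate()
print(ann_10hz.head())
```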

Methodology and Experiments

To demonstrate its utility for automatic behavior analysis, the paper reports baseline experiments on automatic FAU detection and on estimation of continuous emotion dimensions, namely valence, arousal, and liking/disliking intensity. The experiments use established machine learning techniques, including Support Vector Machines (SVM), Random Forests (RF), Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN), and Deep Convolutional Neural Networks (DCNN), to validate the database's robustness and establish benchmarks for future research.
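For the continuous valence, arousal, and liking tasks, the concordance correlation coefficient (CCC) is the evaluation measure commonly reported in this line of work, alongside RMSE and Pearson correlation. The snippet below is a minimal, generic NumPy implementation of the metric for illustration, not code from the paper.

```python
import numpy as np

def concordance_cc(gold: np.ndarray, pred: np.ndarray) -> float:
    """Concordance correlation coefficient between a gold-standard trace and a prediction.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    It penalises both decorrelation and systematic bias or scale mismatch.
    """
    gold = np.asarray(gold, dtype=float)
    pred = np.asarray(pred, dtype=float)
    gold_mean, pred_mean = gold.mean(), pred.mean()
    covariance = ((gold - gold_mean) * (pred - pred_mean)).mean()
    return 2.0 * covariance / (gold.var() + pred.var() + (gold_mean - pred_mean) ** 2)

# Example: a prediction that tracks the gold trace but carries a constant offset
# scores below 1.0, because CCC penalises the bias as well as decorrelation.
t = np.linspace(0, 10, 200)
gold = 0.5 * np.sin(t)          # synthetic valence trace in [-0.5, 0.5]
pred = 0.5 * np.sin(t) + 0.1    # same shape, shifted by +0.1
print(f"CCC = {concordance_cc(gold, pred):.3f}")
```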

Implications and Future Directions

SEWA DB is poised to contribute significantly to affective computing, both by advancing the development of emotion-aware systems suitable for deployment in everyday interfaces and by deepening cross-cultural and interdisciplinary understanding of human affective behavior. Its extensive real-world recordings shift the paradigm from laboratory-controlled studies to in-the-wild conditions, promising better generalization of the machine learning models trained on them.

Practically, SEWA DB is expected to aid in the creation of more responsive and emotionally intelligent systems, paving the way for enhancements in various digital interactions and human-computer interfaces. Theoretically, it encourages comprehensive investigations into the dynamics of human emotions across different cultures, thereby enriching socio-psychological studies.

In summary, SEWA DB provides a foundational step toward advancing audio-visual sentiment analysis in natural conditions, and its open-access nature ensures widespread utilization and continued evolution in this vibrant research domain.