Overview of "AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild"
The paper "AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild" introduces a large-scale dataset aimed at the challenging problem of affective computing in unconstrained, in-the-wild conditions. Developed by Mollahosseini, Hasani, and Mahoor, AffectNet was, at publication, the largest public database of facial images annotated for emotion, facilitating research in both categorical and dimensional models of emotion recognition.
Data Collection and Annotation
The creation of AffectNet involved querying three major search engines (Google, Bing, and Yahoo) with 1,250 emotion-related keywords in six different languages, yielding more than 1,000,000 facial images. The database is noteworthy for its sizeable manually annotated subset of approximately 450,000 images, each labeled with one of eight discrete expression categories (neutral, happy, sad, surprise, fear, disgust, anger, and contempt) as well as continuous valence and arousal values. This dual annotation is significant because it supports research under both mainstream models of affect: the categorical model and the dimensional (valence-arousal) model.
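To make the scale of the crawl concrete, the sketch below enumerates (engine, language, keyword) query triples the way such a crawler might; the keyword list, language codes, and downstream search client are illustrative stand-ins, since the paper does not publish its crawling code.

```python
from itertools import product

# Toy stand-ins: the actual crawl used ~1,250 emotion-related keywords,
# translated into six languages, against three search engines.
KEYWORDS = ["joyful face", "furious person", "terrified expression"]
LANGUAGES = ["en", "es", "de"]          # placeholder language codes
ENGINES = ["google", "bing", "yahoo"]

def enumerate_queries(keywords, languages, engines):
    """Yield every (engine, language, keyword) query triple for a crawler."""
    yield from product(engines, languages, keywords)

# Each triple would be handed to an image-search client (not shown), and
# the returned image URLs deduplicated before download.
for engine, lang, keyword in enumerate_queries(KEYWORDS, LANGUAGES, ENGINES):
    print(engine, lang, keyword)
```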
Methodology
Downloaded images were filtered and processed systematically: faces and facial landmarks were detected automatically, and trained human annotators assigned each image an expression category along with continuous valence and arousal values. Roughly half of the collected images (the ~450,000 annotated subset) were labeled manually, making AffectNet markedly larger than previous facial expression databases and giving it diverse coverage of in-the-wild expressions.
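The paper does not release its detection tooling, but as a rough illustration of automatic face and landmark detection, the following sketch uses the open-source dlib library with a pretrained 68-point shape predictor; both are stand-ins for whatever pipeline the authors used.

```python
import dlib

# Stand-in pipeline: dlib's HOG-based frontal face detector plus a
# pretrained 68-point landmark model (a local file you must supply;
# the paper's own tooling is not public).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(image):
    """Return a list of 68 (x, y) landmark tuples for each detected face."""
    faces = detector(image, 1)  # upsample the image once to find small faces
    return [
        [(p.x, p.y) for p in predictor(image, face).parts()]
        for face in faces
    ]

# Usage: image = dlib.load_rgb_image("face.jpg"); detect_landmarks(image)
```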
Baseline Results
Two baselines were established using deep neural networks (DNNs), specifically the AlexNet architecture, for both categorical emotion classification and valence-arousal prediction.
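A minimal sketch of such a baseline in PyTorch, adapting torchvision's AlexNet to the eight expression categories; the use of torchvision, ImageNet-pretrained weights, and the training hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 8  # neutral, happy, sad, surprise, fear, disgust, anger, contempt

# Start from torchvision's AlexNet (ImageNet-pretrained, an illustrative
# choice) and replace the final classifier layer with an 8-way output.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, labels):
    """One optimization step; images are (N, 3, 224, 224) float tensors."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```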
Categorical Model
For the categorical model, the authors compared four strategies for coping with the heavily imbalanced class distribution: training on the imbalanced data as-is, down-sampling, up-sampling, and a weighted loss. The weighted-loss approach performed best overall, with the clearest F1-score gains on under-represented classes such as contempt and disgust.
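One common realization of the weighted-loss idea is to weight each class's cross-entropy term by its inverse frequency; the sketch below shows this in PyTorch with toy class counts (the paper's exact weighting scheme may differ in detail).

```python
import torch
import torch.nn as nn

def inverse_frequency_weights(counts):
    """Per-class weights inversely proportional to class frequency,
    normalized so a perfectly balanced dataset yields weight 1.0 per class."""
    counts = torch.as_tensor(counts, dtype=torch.float)
    return counts.sum() / (len(counts) * counts)

# Toy per-class image counts (NOT AffectNet's actual distribution):
# majority classes dominate while contempt- and disgust-like classes are rare.
counts = [8000, 12000, 2500, 1800, 700, 400, 2600, 350]

# Rare classes now contribute proportionally more to the loss.
criterion = nn.CrossEntropyLoss(weight=inverse_frequency_weights(counts))
```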
Dimensional Model
For valence and arousal prediction, separate AlexNet models were trained with a Euclidean (L2) loss. The deep CNN baselines predicted valence and arousal more accurately than conventional Support Vector Regression (SVR), as measured by Root Mean Square Error (RMSE) and Concordance Correlation Coefficient (CCC).
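Both reported metrics follow standard definitions and are easy to compute directly; the NumPy sketch below is illustrative, not code from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient (Lin, 1989):
    2*cov(t, p) / (var(t) + var(p) + (mean(t) - mean(p))**2)."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    return 2 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)

# Example: valence annotations vs. predictions, both on the [-1, 1] scale.
y_true = np.array([0.50, -0.20, 0.80, 0.00])
y_pred = np.array([0.40, -0.10, 0.60, 0.15])
print(rmse(y_true, y_pred), ccc(y_true, y_pred))
```

Unlike a plain correlation coefficient, CCC also penalizes systematic differences in mean and scale between predictions and annotations, which is why it is reported alongside RMSE.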
Comparative Analysis
The performance of the AffectNet-trained DNNs was compared against existing systems, including the Microsoft Cognitive Services emotion API. While the commercial API performed well on neutral and happy expressions, it was markedly less accurate on categories such as fear and contempt. The stronger results of the AffectNet-trained models underline the value of training on large, diverse in-the-wild data.
Implications and Future Work
The creation and public availability of AffectNet is a seminal contribution to the field of affective computing. The scale and diversity of the dataset are expected to accelerate progress in automated facial expression recognition and affective computing by providing a substantial amount of in-the-wild data. AffectNet allows researchers to refine models that handle real-world variations in facial expressions, thereby moving closer to developing robust, affect-aware systems.
Speculatively, the integration of affective models trained on AffectNet with other modalities such as voice and physiological signals could lead to holistic emotion recognition systems. Further, advancements in deep learning architectures specifically optimized for affective computing could leverage this dataset to push the boundaries in both theoretical and applied domains.
In conclusion, AffectNet represents a significant step forward for data-driven research on facial expressions, valence, and arousal. By filling the gap left by smaller, lab-controlled datasets, this comprehensive database sets a new reference point in the landscape of affective computing research.