- The paper introduces four methods for integrating crowd-sourced label distributions into deep network training; the distribution-based techniques PLD and CEL yield roughly a 1% accuracy improvement.
- It employs a customized VGG13 network on the FER+ dataset to demonstrate that probabilistic approaches outperform majority voting and multi-label learning.
- The study underscores the potential of these methods to enhance facial expression recognition systems in practical human-computer interaction applications.
Overview of Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution
The paper presents an approach to improving facial expression recognition by leveraging crowd-sourced label distributions to train deep convolutional neural networks (DCNNs). It addresses the inherent noise in crowd-sourced data and explores multiple strategies for using such data effectively.
Methodology
Facial expression recognition has traditionally depended on labels derived from either expert coders or crowd-sourcing platforms. The latter, while cost-efficient, often results in noisy data. This paper introduces four approaches to utilize labels from 10 taggers per image in the Facial Expression Recognition Plus (FER+) dataset:
- Majority Voting (MV): Utilizes the label with the highest votes as the true label.
- Multi-Label Learning (ML): Considers the potential for images to express multiple emotions by focusing on emotions tagged frequently enough by the crowd.
- Probabilistic Label Drawing (PLD): Draws a label for each image at random from its crowd label distribution, re-drawing at every epoch so that labels vary across training in proportion to the distribution.
- Cross-Entropy Loss (CEL): Directly uses the label distribution in the loss function to guide learning.
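The four schemes can be sketched as transformations of an image's per-class vote counts into a training target. The following is an illustrative reconstruction from the description above, not the authors' code; the vote counts and the ML threshold are assumed values:

```python
import numpy as np

def make_targets(votes, scheme, rng=None, threshold=0.3):
    """Build a training target from an image's crowd vote counts.

    votes: array of tagger counts per emotion class (e.g. from 10 taggers).
    scheme: 'MV', 'ML', 'PLD', or 'CEL'.
    """
    p = votes / votes.sum()                    # empirical label distribution
    if scheme == "MV":                         # majority voting: one-hot on the top class
        t = np.zeros_like(p)
        t[p.argmax()] = 1.0
        return t
    if scheme == "ML":                         # multi-label: keep every class tagged often enough
        return (p >= threshold).astype(float)  # threshold is an assumed value
    if scheme == "PLD":                        # draw one label from the distribution;
        rng = rng or np.random.default_rng()   # re-drawn each epoch, so targets vary over training
        t = np.zeros_like(p)
        t[rng.choice(len(p), p=p)] = 1.0
        return t
    if scheme == "CEL":                        # use the full distribution as a soft target
        return p
    raise ValueError(scheme)

# Hypothetical 10-tagger vote counts over the 8 FER+ emotion classes:
votes = np.array([7, 2, 1, 0, 0, 0, 0, 0])
target = make_targets(votes, "CEL")            # [0.7, 0.2, 0.1, 0, ...]
```

MV, ML, and PLD produce hard (0/1) targets, while CEL keeps the distribution itself; the difference between them is entirely in how the 10 votes per image are converted before the loss is computed.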
Experimental Results
The experiments are conducted using a customized VGG13 network on the FER+ dataset. PLD and CEL outperform MV and ML, improving accuracy in predicting the majority emotion by about 1%. The results suggest that incorporating the label distribution directly into training (as PLD and CEL do) provides robustness against the noise typical of crowd-sourced labels.
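The CEL variant minimizes the cross-entropy between the network's softmax output and the full crowd distribution. A minimal NumPy sketch of that objective (not the authors' implementation; the logits below are invented):

```python
import numpy as np

def soft_cross_entropy(logits, target_dist, eps=1e-12):
    """-sum_k q_k * log p_k, with p = softmax(logits) and q the crowd distribution."""
    z = logits - logits.max()              # shift logits for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(-(target_dist * np.log(p + eps)).sum())

# Hypothetical crowd distribution (7/2/1 votes) against hypothetical network logits:
q = np.array([0.7, 0.2, 0.1])
logits = np.array([2.0, 0.5, -1.0])
loss = soft_cross_entropy(logits, q)
```

When the target is one-hot (as under MV or PLD), this reduces to the standard cross-entropy loss, which is why all four schemes can share one training pipeline.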
A detailed analysis via a confusion matrix highlights issues with certain emotions, such as disgust and contempt, presumably due to underrepresentation in the dataset. However, most other emotions are accurately classified.
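The kind of per-class analysis described above can be reproduced with a simple confusion matrix; the class list matches FER+, but the predictions below are invented for illustration:

```python
import numpy as np

# The 8 FER+ emotion classes.
EMOTIONS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "disgust", "fear", "contempt"]

def confusion_matrix(y_true, y_pred, n_classes=len(EMOTIONS)):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical predictions: underrepresented classes such as disgust (index 5)
# and contempt (index 7) tend to be confused with more frequent classes.
y_true = [0, 1, 5, 7, 1, 5]
y_pred = [0, 1, 4, 0, 1, 5]
cm = confusion_matrix(y_true, y_pred)
per_class_recall = cm.diagonal() / cm.sum(axis=1).clip(min=1)
```

Reading the rows of `cm` shows, for each true emotion, where its samples end up, which is how class imbalance surfaces as low recall on rare emotions.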
Implications and Future Work
The paper contributes to facial expression recognition by demonstrating effective ways to handle noisy labels: using the full crowd-sourced label distribution rather than a single label. This enhances the flexibility and reliability of DCNN models for emotion recognition tasks.
Practically, the method could improve emotion recognition systems integrated into human-computer interaction applications, providing more natural and intuitive interfaces. Theoretically, this paper sets a precedent for further inquiry into incorporating probabilistic approaches for noisy labeling situations.
Future research could optimize parameter settings for the harder emotions, evaluate additional DCNN architectures, and investigate semi-supervised learning to further mitigate labeling noise using the diverse annotations that crowd-sourcing provides.
The public release of the FER+ dataset invites further validation and extension by the wider research community, supporting continued advances in emotion recognition.