Deep Facial Expression Recognition: A Survey (1804.08348v2)

Published 23 Apr 2018 in cs.CV

Abstract: With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.

Deep Facial Expression Recognition: A Comprehensive Survey

The paper "Deep Facial Expression Recognition: A Survey" by Shan Li and Weihong Deng provides a thorough examination of the field of facial expression recognition (FER), emphasizing the transition from traditional methods to deep learning approaches. This survey not only presents an overview of datasets and algorithms but also addresses the intrinsic challenges specific to FER and speculates on future directions in the field. Here, we summarize the key points and insights from the paper.

Overview of Datasets for FER

The paper begins by discussing the various datasets available for FER research. Given that deep neural networks require substantial amounts of diverse training data, selecting appropriate datasets is crucial for training robust FER models. Several datasets have been constructed to capture basic expressions in lab-controlled and in-the-wild conditions. Notable datasets include:

CK+: Extensively used, lab-controlled, containing high-resolution image sequences.
MMI: Contains both posed and spontaneous expressions under different illumination conditions.
FER2013: A large-scale dataset collected from the web, covering unconstrained in-the-wild conditions.
AffectNet: One of the largest datasets with manually annotated labels from real-world scenarios.

Other datasets, such as RAF-DB, TFD, and EmotioNet, also contribute significant training data but span varied environments and expression classes.

Key Challenges and Techniques in FER

Two principal issues in FER are the lack of sufficient labeled training data and expression-unrelated variations, such as changes in illumination, head pose, and identity bias. The paper discusses the following approaches to address these challenges:

Data Augmentation and Normalization

Data augmentation is essential to mitigate overfitting caused by limited training data. Augmentations such as rotation, scaling, and noise addition are widely employed, while histogram equalization is used to normalize illumination. Additionally, methods such as pose normalization using 3D models and face frontalization using GANs help handle variations in head pose.
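As a rough illustration of the augmentation and illumination normalization steps described above, the following is a minimal NumPy sketch, not the survey's implementation. The function names, the noise level, and the horizontal flip (not named in this summary) are assumptions made for the example. Rotation and scaling are omitted because they need an interpolation routine.

```python
import numpy as np


def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Histogram-equalize an 8-bit grayscale image (illumination normalization)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]           # CDF value at the darkest occupied bin
    denom = cdf[-1] - cdf_min
    if denom == 0:                      # constant image: nothing to equalize
        return img.copy()
    # Build a lookup table that maps each gray level through the normalized CDF.
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[img]


def augment(img: np.ndarray, rng: np.random.Generator,
            noise_std: float = 5.0) -> np.ndarray:
    """Return a randomly flipped, noise-perturbed copy of an 8-bit face crop.

    noise_std is a hypothetical default, not a value from the survey.
    """
    out = img.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]              # horizontal flip (assumed augmentation)
    out = out + rng.normal(0.0, noise_std, size=out.shape)  # additive Gaussian noise
    return np.clip(out, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.integers(0, 256, size=(48, 48), dtype=np.uint8)  # stand-in for a face crop
    normalized = equalize_histogram(face)
    augmented = augment(normalized, rng)
    print(normalized.shape, augmented.dtype)
```

In a training loop, normalization would typically be applied once per image, while the random augmentation would be re-drawn every epoch so the network sees a different variant each time.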

Network Architectures

The paper details various deep network architectures employed in FER:

CNNs: These are predominant in FER, leveraging hierarchical feature learning for effective facial representation.
DBNs and DAEs: Used for unsupervised feature learning, these networks help initialize deep networks in data-scarce regimes.
RNNs: Including LSTMs, suitable for capturing temporal dynamics in video-based FER.
GANs: Explored for augmenting training data and handling identity variations.

Specialized Techniques

Several advanced techniques are reviewed:

Multitask Learning: Utilized to disentangle the elements of facial expressions by leveraging auxiliary tasks such as facial landmark detection and AU detection.
Network Ensemble: Combining multiple networks to enhance robustness and performance.
Cascaded Networks: Sequentially stacking different models to capture hierarchical dependencies from low-level features to high-level representations.
Expression Intensity-Invariant Networks: Designed to handle varying expression intensities by learning correlations between peak and non-peak expressions.
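The network ensemble idea listed above can be sketched as a weighted average of per-model class probabilities. This is only one of several possible fusion rules, and the function names and weighting scheme below are assumptions for illustration rather than a method taken from the paper.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(logits_per_model, weights=None):
    """Fuse several models by a weighted average of their class probabilities.

    logits_per_model: sequence of arrays, each of shape (n_samples, n_classes).
    weights: optional per-model weights (hypothetical; e.g. validation accuracy).
    Returns (predicted class indices, averaged probabilities).
    """
    probs = np.stack([softmax(np.asarray(l, dtype=np.float64))
                      for l in logits_per_model])
    if weights is None:
        weights = np.ones(len(probs))
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()            # normalize so weights sum to 1
    avg = np.tensordot(weights, probs, axes=1)   # shape: (n_samples, n_classes)
    return avg.argmax(axis=-1), avg


if __name__ == "__main__":
    # Two hypothetical models scoring one face over three expression classes.
    model_a = np.array([[2.0, 0.0, 0.0]])
    model_b = np.array([[0.0, 3.0, 0.0]])
    labels, fused = ensemble_predict([model_a, model_b])
    print(labels, fused.sum(axis=-1))
```

Averaging probabilities rather than raw logits keeps each model's contribution on a common scale, which matters when the member networks were trained with different architectures or data.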

Practical and Theoretical Implications

The survey has practical implications for applications in human-computer interaction, surveillance, and healthcare. On the research side, integrating multiple modalities and exploiting large-scale datasets are pivotal for robust FER. Moreover, addressing dataset biases and imbalanced class distributions is critical for improving the generalizability of FER systems.
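One common way to counter imbalanced class distributions is to reweight the loss by inverse class frequency. The sketch below shows that idea under stated assumptions: the weighting formula and function names are illustrative choices, and the survey itself does not prescribe this particular remedy.

```python
import numpy as np


def inverse_frequency_weights(labels, n_classes: int) -> np.ndarray:
    """Weight each class by N / (K * n_c), where N is the sample count,
    K the number of classes present, and n_c the count of class c.
    Classes that never occur get weight 0."""
    counts = np.bincount(np.asarray(labels), minlength=n_classes).astype(np.float64)
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights


def weighted_cross_entropy(probs, labels, class_weights) -> float:
    """Class-weighted mean cross-entropy for predicted probabilities
    of shape (n_samples, n_classes)."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels)
    class_weights = np.asarray(class_weights, dtype=np.float64)
    picked = probs[np.arange(len(labels)), labels]   # probability of the true class
    w = class_weights[labels]                        # per-sample weight
    return float(-(w * np.log(picked + 1e-12)).sum() / w.sum())


if __name__ == "__main__":
    # Hypothetical label set where class 1 is rare.
    labels = [0, 0, 0, 1]
    weights = inverse_frequency_weights(labels, n_classes=2)
    probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]])
    print(weights, weighted_cross_entropy(probs, labels, weights))
```

With this weighting, errors on rare expressions contribute as much to the total loss as errors on frequent ones, at the cost of noisier gradients when a class has very few samples.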

Future Directions

The paper highlights several future directions:

Construction of Comprehensive Datasets: Including large-scale, diverse datasets with detailed annotations.
Integration of Multimodal Data: Combining visual data with audio and physiological signals to enhance recognition accuracy.
Advanced Generative Models: Using GANs and VAEs for data synthesis and augmentation to overcome annotation bottlenecks.
Cross-Dataset Generalization: Developing methods to enhance cross-dataset performance and reduce dataset biases.

Conclusion

In conclusion, the survey comprehensively covers the advances that deep learning has brought to FER. By systematically reviewing the field's challenges and the methods designed to address them, it lays a foundation for future research and practical applications in deep facial expression recognition. The paper emphasizes that sufficient training data, robust network architectures, and the handling of real-world variations are all needed to build state-of-the-art FER systems.

Authors (2)

Shan Li (30 papers)
Weihong Deng (71 papers)

Citations (1,169)
