Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications (1606.03237v1)

Published 10 Jun 2016 in cs.CV

Abstract: Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging and much research is needed about the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state of the art methods accordingly. We also present the important datasets and the benchmarking of most influential methods. We conclude with a general discussion about trends, important questions and future lines of research.

Citations (469)

Summary

  • The paper presents a taxonomy for AFER, detailing face localization, registration, feature extraction, and expression recognition across various imaging modalities.
  • The paper surveys robust techniques, such as CNNs, SVMs, and fusion strategies, for handling diverse capture conditions and dynamic expressions.
  • The paper discusses the implications for affective computing and outlines future directions in multimodal integration and continuous emotion mapping.

Overview of Automatic Facial Expression Recognition: RGB, 3D, Thermal, and Multimodal Approaches

This paper presents a comprehensive survey of techniques for automatic facial expression recognition (AFER) in the RGB, 3D, thermal, and multimodal domains. It defines a new taxonomy, traces the field's historical evolution, and discusses affect-related applications and future research directions.

Taxonomy and Techniques

The paper proposes a taxonomy that categorizes AFER into several salient components: Face Localization, Face Registration, Feature Extraction, Expression Recognition, and Multimodal Fusion.

  1. Face Localization and Registration:
    • RGB: Techniques such as AdaBoost, SVMs, and CNNs for face detection; Active Shape Models (ASM) and Active Appearance Models (AAM) for registration.
    • 3D: Iterative Closest Point (ICP) and curvature-based matching for registration.
    • Thermal: Image segmentation approaches using radiance information.
  2. Feature Extraction:
    • Predesigned Features: Geometric and appearance-based, targeting both global and local aspects.
    • Learned Features: Convolutional neural networks (CNNs) and deep belief networks (DBNs), which learn feature extraction jointly with classification.
  3. Expression Recognition:
    • Divided into static and dynamic approaches, using models such as HMMs, SVMs, and neural networks to predict both categorical and continuous expressions (a minimal static RGB pipeline is sketched after this list).
  4. Multimodal Approaches:
    • Fusion Strategies: Early, late, and sequential fusion methods are evaluated to improve the robustness and coverage of recognition systems (see the late-fusion sketch below).
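
To make the static RGB pipeline concrete, here is a minimal sketch, not the paper's method: it chains a Viola-Jones-style cascade detector (OpenCV), uniform LBP appearance features (scikit-image), and an SVM classifier (scikit-learn). All filenames, labels, and parameter values are illustrative assumptions.

```python
# Minimal static RGB AFER sketch: detect -> crop -> LBP features -> SVM.
# Assumes OpenCV, scikit-image, scikit-learn, and NumPy are installed.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

# Viola-Jones-style cascade shipped with OpenCV (stands in for the
# AdaBoost-based detectors discussed above).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def lbp_features(bgr_image, size=96, P=8, R=1):
    """Detect the largest face, crop it, and return a uniform-LBP histogram."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
    face = cv2.resize(gray[y:y + h, x:x + w], (size, size))
    lbp = local_binary_pattern(face, P, R, method="uniform")
    # Uniform LBP produces integer codes in [0, P + 1]: one bin per code.
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# Hypothetical training data: lists of image paths and expression labels.
# X = [lbp_features(cv2.imread(p)) for p in train_paths]
# clf = SVC(kernel="rbf", probability=True).fit(X, train_labels)
# pred = clf.predict([lbp_features(cv2.imread("probe.jpg"))])
```

In practice, geometric features obtained from landmark registration (ASM/AAM) are often concatenated with appearance descriptors of this kind before classification.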
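
For the multimodal item, the following is a hedged sketch of early versus late fusion, assuming two per-modality feature vectors (e.g., RGB and thermal) and pre-trained probabilistic classifiers; the equal weighting is an illustrative assumption, not a result from the survey.

```python
import numpy as np

def early_fusion(feat_rgb, feat_thermal):
    """Early (feature-level) fusion: concatenate modality features
    so a single downstream classifier sees both modalities."""
    return np.concatenate([feat_rgb, feat_thermal])

def late_fusion(p_rgb, p_thermal, w=0.5):
    """Late (decision-level) fusion: combine per-modality class
    posteriors; w trades off the two modalities."""
    return w * np.asarray(p_rgb) + (1.0 - w) * np.asarray(p_thermal)

# Usage with hypothetical classifiers clf_rgb / clf_th trained per modality:
# fused = late_fusion(clf_rgb.predict_proba(x_rgb), clf_th.predict_proba(x_th))
# prediction = fused.argmax(axis=1)
```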

Datasets and Evaluations

The survey details the most important datasets used in AFER studies, emphasizing variation in expressions, capture environments, and subject demographics: CK+, MMI, and Multi-PIE for RGB; Bosphorus and BU-3DFE for 3D; and thermal imaging datasets such as NVIE.

Trends and Challenges

  1. Intensity Estimation: Crucial for capturing nuanced expressions; studies often regress intensity directly from features rather than deriving it from classification margins (see the regression sketch after this list).
  2. Microexpression Analysis: Focuses on capturing brief, low-intensity expressions with high-speed cameras and sophisticated dynamic features.
  3. Non-primary Expression Analysis: Extends beyond basic emotions to infer complex mental states and personal traits, often integrating multimodal data.
  4. Expressions in Naturalistic Environments: Addresses the challenges of variable lighting, head pose, and spontaneous behavior by leveraging dynamic models and learned feature representations.
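
As a hedged illustration of the first trend, here is a minimal sketch of intensity regression, assuming precomputed appearance features and continuous intensity annotations (e.g., AU intensity on a 0-5 scale); the random stand-in data and parameters are assumptions, not specifics from the survey.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical data: one appearance feature vector per frame (e.g., the
# LBP histograms above) and a continuous intensity label per frame.
rng = np.random.default_rng(0)
X_train = rng.random((200, 10))            # stand-in feature vectors
y_train = rng.uniform(0.0, 5.0, size=200)  # stand-in 0-5 intensity labels

# Regress intensity directly from features, rather than reusing the
# margin of an expression classifier as a proxy for intensity.
reg = SVR(kernel="rbf", C=1.0).fit(X_train, y_train)
intensity = reg.predict(X_train[:1])  # predicted intensity for one frame
```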

Implications and Future Directions

The paper indicates that AFER has significant implications for affective computing, human-computer interaction, and psychological assessment. The future of the field lies in tighter multimodal integration, improved robustness to environmental variation, and a deeper understanding of affect through continuous emotion mapping (e.g., in valence-arousal space). Advanced neural architectures and richer datasets will drive the next wave of innovation in this domain.