A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances
The paper provides a comprehensive survey of affective computing, focusing on emotion models, databases, and recent methodological advances. Affective computing, situated at the intersection of emotion recognition and sentiment analysis, has gained momentum in recent years, driven by the availability of large public databases and by advances in machine learning (ML) and deep learning (DL).
Emotion Models and Databases
The authors categorize emotion models into two principal types: discrete and dimensional. Discrete models classify emotions into distinct categories, as exemplified by Ekman's six basic emotions (e.g., anger, happiness) and Plutchik's wheel. In contrast, dimensional models, such as the Valence-Arousal-Dominance (VAD) model, place emotions in a continuous, multi-dimensional space, which can express gradations and blends of affective states that a fixed label set cannot.
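To make that contrast concrete, here is a minimal Python sketch of the dimensional view. The VAD coordinates assigned to the discrete labels are rough illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class VADPoint:
    """A point in the continuous Valence-Arousal-Dominance space.
    Each axis is commonly normalized to [-1, 1]:
      valence:   unpleasant .. pleasant
      arousal:   calm .. excited
      dominance: submissive .. in control
    """
    valence: float
    arousal: float
    dominance: float

# Illustrative (not canonical) placements of two of Ekman's discrete
# categories inside the dimensional space.
ANGER = VADPoint(valence=-0.6, arousal=0.8, dominance=0.5)
HAPPINESS = VADPoint(valence=0.9, arousal=0.6, dominance=0.4)

def blend(a: VADPoint, b: VADPoint, t: float) -> VADPoint:
    """Linearly interpolate between two emotions: an intermediate
    affective state for which a discrete taxonomy has no label."""
    return VADPoint(
        valence=a.valence + t * (b.valence - a.valence),
        arousal=a.arousal + t * (b.arousal - a.arousal),
        dominance=a.dominance + t * (b.dominance - a.dominance),
    )
```

The `blend` helper is the point of the sketch: a discrete model must pick one label, while a dimensional model can represent states partway between categories.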
The review also identifies five categories of databases critical for training and evaluating affective computing systems: textual, speech/audio, visual, physiological, and multimodal. Each category encompasses established benchmarks, such as IMDB (text), EmoDB (speech), CK+ (facial expression), and DEAP (physiological signals), and each dataset presents distinct challenges and opportunities for model development and evaluation.
Unimodal Affect Recognition
The paper examines unimodal affect recognition, dividing it into ML-based and DL-based approaches across four modalities: textual sentiment analysis, speech emotion recognition (SER), visual emotion recognition, and physiological emotion recognition (primarily EEG and ECG). For text, the paper traces the transition from traditional ML approaches, which relied heavily on hand-crafted feature engineering, to DL models that learn discriminative features directly from data. In SER and visual emotion recognition, DL models likewise outperform traditional techniques by learning complex representations directly from raw input.
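The sketch below illustrates that transition for text. It is a hedged, minimal example: the data, labels, and hyperparameters are toy placeholders, not the survey's experimental setup. The first pipeline hand-specifies TF-IDF features for a shallow classifier; the second learns its representation end to end.

```python
# Feature-engineered ML vs. representation-learning DL for text sentiment.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I loved this film", "Utterly disappointing plot"]  # toy stand-ins
labels = [1, 0]  # 1 = positive, 0 = negative

# Traditional ML: features are specified by hand (TF-IDF over n-grams),
# and a shallow classifier operates on them.
ml_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
ml_model.fit(texts, labels)

# DL: the network learns its own features from token ids, end to end.
import torch
import torch.nn as nn

class TinySentimentNet(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, n_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # learned, not hand-crafted
        self.classifier = nn.Linear(embed_dim, n_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> mean-pooled learned embeddings.
        return self.classifier(self.embed(token_ids).mean(dim=1))
```

In the first pipeline the representation is fixed before training begins; in the second, the embedding table itself is a trainable parameter, which is the essence of the shift the survey describes.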
Multimodal Affective Analysis
In addressing multimodal affective analysis, the authors categorize approaches by fusion strategy: feature-level, decision-level, model-level, and hybrid fusion. They further differentiate approaches by modality combination, such as multi-physical (e.g., visual-audio), multi-physiological, and physical-physiological fusion. This treatment highlights the superior performance of multimodal systems over unimodal counterparts, attributed to their ability to capture richer affective cues.
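The first two strategies can be illustrated in a few lines. This is a minimal sketch: the feature dimensions and linear heads below are assumed stand-ins for real per-modality encoders (e.g., CNN or RNN feature extractors), not architectures from the survey.

```python
import torch
import torch.nn as nn

# Assumed feature sizes; real systems would produce these with trained encoders.
AUDIO_DIM, VISUAL_DIM, N_CLASSES = 128, 256, 6

class FeatureLevelFusion(nn.Module):
    """Early fusion: concatenate modality features before classification,
    so the classifier can model cross-modal interactions directly."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Linear(AUDIO_DIM + VISUAL_DIM, N_CLASSES)

    def forward(self, audio_feat: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.cat([audio_feat, visual_feat], dim=-1))

class DecisionLevelFusion(nn.Module):
    """Late fusion: classify each modality independently, then combine
    the per-modality predictions (here, by averaging probabilities)."""
    def __init__(self):
        super().__init__()
        self.audio_head = nn.Linear(AUDIO_DIM, N_CLASSES)
        self.visual_head = nn.Linear(VISUAL_DIM, N_CLASSES)

    def forward(self, audio_feat: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        audio_probs = self.audio_head(audio_feat).softmax(dim=-1)
        visual_probs = self.visual_head(visual_feat).softmax(dim=-1)
        return (audio_probs + visual_probs) / 2
```

Roughly speaking, model-level fusion sits between these extremes, exchanging intermediate representations between the modality branches, while hybrid fusion combines several of these schemes.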
Discussion and Implications
The review discusses in detail the implications of combining various modalities and fusion strategies. It highlights the strength of DL models in handling complex, high-dimensional affective data, while noting that these models depend heavily on large-scale, high-quality datasets. The authors also discuss the limitations of current systems in real-world scenarios, pointing to the need for more robust databases and for advanced fusion strategies that incorporate rule-based or statistical knowledge.
Future Directions
The paper identifies critical areas for future research: building comprehensive databases that cover diverse scenarios and annotation schemes, exploring zero-shot and unsupervised learning for more robust affective analysis, and integrating affective computing systems into practical applications such as robotics. Together, these directions underscore how rapidly the field is evolving.
In summary, the paper serves as a valuable resource for understanding the landscape of affective computing, providing insights into current methodologies, highlighting practical challenges, and proposing directions for future exploration.