- The paper introduces DMUE, a novel framework that mitigates annotation ambiguity in FER by mining latent label distributions and estimating pairwise uncertainty.
- It employs auxiliary training branches to probe sample ambiguity without increasing inference complexity, ensuring efficient deployment.
- Experimental evaluations on RAF-DB and AffectNet demonstrate improved accuracy (89.42% and 63.11%, respectively), enhancing FER reliability.
Latent Distribution Mining and Pairwise Uncertainty Estimation for FER
This paper addresses a significant challenge in Facial Expression Recognition (FER): annotation ambiguity. Because annotations are subjective and expression classes are visually similar, facial expressions are often hard to label and recognize correctly. The authors propose DMUE, an approach that tackles the problem through two mechanisms: latent Distribution Mining and pairwise Uncertainty Estimation.
Annotation Ambiguity in FER
FER models typically rely on large-scale datasets for training. However, these datasets often carry subjective annotations: different annotators interpret the same expression differently, which produces annotation ambiguity. Such ambiguity prevents models from learning robust visual features and caps achievable performance. Previous attempts to resolve it either suppressed uncertain samples or reshaped training via label distribution learning, but because these methods treat each instance in isolation, they cannot fully resolve the ambiguity.
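For intuition on what label distribution learning changes, the sketch below contrasts a one-hot target with a soft target distribution over expression classes. The four-class setup, the numbers, and the variable names are illustrative only, not taken from the paper.

```python
import torch
import torch.nn.functional as F

# One-hot target: the annotation commits to a single class.
hard_target = torch.tensor([0.0, 0.0, 1.0, 0.0])

# Soft target: probability mass spread over plausible classes,
# reflecting ambiguity (values invented for illustration).
soft_target = torch.tensor([0.05, 0.10, 0.70, 0.15])

logits = torch.tensor([1.2, 0.3, 2.1, 0.8])  # model output for one sample
log_probs = F.log_softmax(logits, dim=-1)

# Cross-entropy against an arbitrary target distribution: -sum(p * log q).
loss_hard = -(hard_target * log_probs).sum()
loss_soft = -(soft_target * log_probs).sum()
print(loss_hard.item(), loss_soft.item())
```

Training against the soft target penalizes the model less for spreading probability over classes that annotators themselves confuse.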
DMUE: Addressing Annotation Ambiguity
DMUE introduces a multi-branch learning framework in which auxiliary branches probe and describe the latent distribution of each sample in the label space. Training against this mined distribution lets the model represent the ambiguity actually present in the data. In addition, DMUE performs pairwise uncertainty estimation, exploiting the relationships between samples' semantic features to estimate how ambiguous each sample is. This ambiguity score guides the model in dynamically shifting its training focus between the original annotation and the mined label distribution.
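To make the pairwise idea concrete, here is a minimal sketch of estimating a per-sample ambiguity score from feature relationships within a mini-batch. The cosine-similarity heuristic and every name below (`pairwise_ambiguity`, the sigmoid squashing) are assumptions for illustration; the paper's uncertainty estimation module is more involved.

```python
import torch
import torch.nn.functional as F

def pairwise_ambiguity(features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Rough per-sample ambiguity from batch-level feature relationships.

    Intuition: a sample whose features are about as close to other-class
    samples as to same-class samples is likely ambiguous.
    """
    f = F.normalize(features, dim=1)                    # unit-norm embeddings
    sim = f @ f.t()                                     # (B, B) cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # (B, B) same-label mask
    eye = torch.eye(len(labels), dtype=torch.bool)
    # Mean similarity to same-class neighbours (diagonal excluded) ...
    same_sim = sim.masked_fill(~same | eye, 0).sum(1) / (same & ~eye).sum(1).clamp(min=1)
    # ... versus mean similarity to other-class samples.
    diff_sim = sim.masked_fill(same, 0).sum(1) / (~same).sum(1).clamp(min=1)
    return torch.sigmoid(diff_sim - same_sim)           # in (0, 1); higher = more ambiguous

feats = torch.randn(8, 128)                     # stand-in for backbone features
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
eps = pairwise_ambiguity(feats, labels)
# A score like eps could then blend the targets per sample, e.g.
# target = (1 - eps) * one_hot + eps * mined_distribution  (broadcast per sample)
```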
The multi-branch design is what makes latent distribution mining possible, and it comes at no deployment cost: the auxiliary branches are active only during training and are discarded afterwards, so inference time and model complexity are unchanged and deployment remains efficient.
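A hypothetical PyTorch wrapper makes the train/inference asymmetry explicit. The architecture below (a flattened-pixel backbone, one auxiliary head per excluded class) is a toy stand-in for illustration, not the paper's network.

```python
import torch
import torch.nn as nn

class MultiBranchFER(nn.Module):
    """Sketch: shared backbone, one target head, auxiliary heads.

    Auxiliary heads exist only to mine label distributions during
    training; at inference only the target head runs, so the deployed
    graph matches a single-branch model.
    """
    def __init__(self, feat_dim: int = 512, num_classes: int = 7):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 112 * 112, feat_dim), nn.ReLU()
        )
        self.target_head = nn.Linear(feat_dim, num_classes)
        # One auxiliary head per excluded class (C - 1 heads), each
        # predicting a distribution over the remaining C - 1 classes.
        self.aux_heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes - 1) for _ in range(num_classes - 1)]
        )

    def forward(self, x: torch.Tensor):
        feat = self.backbone(x)
        if self.training:
            return self.target_head(feat), [h(feat) for h in self.aux_heads]
        return self.target_head(feat)   # auxiliary heads skipped entirely

model = MultiBranchFER()
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 112, 112))  # only the target branch runs
```

Before export, the auxiliary heads can simply be dropped from the checkpoint, leaving a model identical in size to its single-branch counterpart.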
Experimental Evaluation
Experiments are conducted on the real-world benchmarks RAF-DB and AffectNet, as well as on synthetic noisy datasets. DMUE consistently outperforms prior methods across these test sets, demonstrating its efficacy in mitigating annotation ambiguity: it reaches a state-of-the-art 89.42% accuracy on RAF-DB and 63.11% on AffectNet.
Implications and Future Directions
The introduction of DMUE advances the FER landscape by providing a robust mechanism for handling annotation ambiguity. This has practical implications wherever FER systems are deployed, such as service robots, driver monitoring, and human-computer interaction, where it can improve reliability and robustness.
From a theoretical standpoint, the latent distribution methodology provides a promising avenue for further developments, potentially extending into learning frameworks involving other types of ambiguous data. Additionally, the proposed uncertainty estimation methodology can inspire novel approaches to dynamic learning adaptation based on more sophisticated measures of data or model uncertainty.
Future research could extend DMUE beyond facial expressions to other problems characterized by intrinsic ambiguity or noisy labels, such as sentiment analysis of text or real-time action recognition in video. Integrating DMUE with transformer-based models such as Vision Transformers is another promising direction.