
Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition (2104.00232v1)

Published 1 Apr 2021 in cs.CV

Abstract: Due to subjective annotation and the inherent inter-class similarity of facial expressions, one of the key challenges in Facial Expression Recognition (FER) is annotation ambiguity. In this paper, we propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives: latent Distribution Mining and pairwise Uncertainty Estimation. For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space. For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the extent of ambiguity in the instance space. The proposed method is independent of the backbone architecture and adds no extra burden at inference. Experiments are conducted on popular real-world benchmarks and on synthetic noisy datasets. In both settings, the proposed DMUE stably achieves leading performance.

Citations (169)

Summary

  • The paper introduces DMUE, a novel framework that mitigates annotation ambiguity in FER by mining latent label distributions and estimating pairwise uncertainty.
  • It employs auxiliary training branches to probe sample ambiguity without increasing inference complexity, ensuring efficient deployment.
  • Experimental evaluations on RAF-DB and AffectNet demonstrate improved accuracy (89.42% and 63.11%, respectively), enhancing FER reliability.

Latent Distribution Mining and Pairwise Uncertainty Estimation for FER

This paper addresses a significant challenge in Facial Expression Recognition (FER): annotation ambiguity. Because annotations are subjective and expression classes are inherently similar to one another, the labels attached to facial images are often uncertain. The authors propose an approach, termed DMUE, which addresses this problem through two main mechanisms: latent Distribution Mining and pairwise Uncertainty Estimation.

Annotation Ambiguity in FER

FER models typically rely on large-scale datasets for training. However, annotators interpret expressions differently, so these datasets often contain subjective labels, and this subjectivity leads to annotation ambiguity. Such ambiguity can prevent models from learning robust visual features and limits their performance. Previous attempts to resolve this either suppressed uncertain samples or altered the training process using label distribution learning. However, these methods consider each instance in isolation and so usually cannot fully resolve the ambiguity.

DMUE: Addressing Annotation Ambiguity

DMUE introduces a multi-branch learning framework in which auxiliary branches probe and describe the latent distribution of each sample in the label space. This lets the model train on a target that reflects the ambiguity in the data rather than on a single hard label alone. Additionally, DMUE employs pairwise uncertainty estimation, which exploits the relationships between the semantic features of different samples. From these relationships it estimates the extent of each sample's ambiguity, and this estimate guides the model in dynamically adjusting the training focus between the original annotations and the mined label distributions.
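The idea of shifting the training focus between an annotation and a mined distribution can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `pairwise_ambiguity` is a stand-in heuristic (similarity-weighted label disagreement among the other samples in a batch), and `blended_target` uses a simple linear interpolation. DMUE's actual uncertainty module and loss may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pairwise_ambiguity(features, labels):
    """Hypothetical ambiguity score in [0, 1] for each sample in a batch.

    A sample scores high when the samples most similar to it carry a
    different label. This is an illustrative heuristic, not DMUE's module.
    """
    scores = []
    for i, (feat_i, label_i) in enumerate(zip(features, labels)):
        weight_sum = 0.0
        disagree = 0.0
        for j, (feat_j, label_j) in enumerate(zip(features, labels)):
            if i == j:
                continue
            weight = max(cosine(feat_i, feat_j), 0.0)  # ignore dissimilar pairs
            weight_sum += weight
            if label_j != label_i:
                disagree += weight
        scores.append(disagree / weight_sum if weight_sum > 0 else 0.0)
    return scores


def blended_target(label, latent, ambiguity):
    """Interpolate between the one-hot annotation and a mined distribution."""
    if not 0.0 <= ambiguity <= 1.0:
        raise ValueError("ambiguity must lie in [0, 1]")
    one_hot = [1.0 if k == label else 0.0 for k in range(len(latent))]
    return [(1.0 - ambiguity) * o + ambiguity * p for o, p in zip(one_hot, latent)]


def soft_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return -sum(t * math.log(e / total) for t, e in zip(target, exps) if t > 0)
```

With an ambiguity of 0 the target is the original one-hot label, and with an ambiguity of 1 it is the mined distribution, so a sample judged more ambiguous is trained more heavily against the mined distribution.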

The multi-branch learning is what makes latent distribution mining possible. It does not change inference time or complexity, because the auxiliary branches are active only during training and are discarded afterwards. Deployment therefore incurs no additional computational overhead.
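The training-only role of the auxiliary branches can be sketched as follows. This is a toy illustration under stated assumptions: `MultiBranchModel`, `forward_train`, `strip_auxiliary`, and `predict` are hypothetical names, and plain Python functions stand in for the network components.

```python
class MultiBranchModel:
    """Shared backbone with one target head and training-only auxiliary heads."""

    def __init__(self, backbone, target_head, aux_heads):
        self.backbone = backbone
        self.target_head = target_head
        self.aux_heads = list(aux_heads)

    def forward_train(self, x):
        # Training pass: the target head and every auxiliary head see the
        # same backbone features.
        features = self.backbone(x)
        return self.target_head(features), [head(features) for head in self.aux_heads]

    def strip_auxiliary(self):
        # After training the auxiliary heads are no longer needed.
        self.aux_heads = []

    def predict(self, x):
        # Inference pass: only the backbone and the target head are evaluated.
        return self.target_head(self.backbone(x))


# Toy components (hypothetical) that record how often an auxiliary head runs.
calls = {"aux": 0}


def backbone(x):
    return [2.0 * v for v in x]


def target_head(features):
    return sum(features)


def aux_head(features):
    calls["aux"] += 1
    return max(features)


model = MultiBranchModel(backbone, target_head, [aux_head, aux_head])
train_out, aux_out = model.forward_train([1.0, 2.0])
model.strip_auxiliary()
pred = model.predict([1.0, 2.0])
```

In this sketch the prediction path never touches the auxiliary heads, so removing them leaves the deployed model's output and cost unchanged.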

Experimental Evaluation

Experiments are conducted on real-world benchmarks such as AffectNet and RAF-DB, and on synthetic noisy datasets. DMUE consistently achieves leading performance across these settings, which supports its effectiveness in mitigating annotation ambiguity. It reaches 89.42% accuracy on RAF-DB and 63.11% on AffectNet.

Implications and Future Directions

DMUE gives FER a robust mechanism for handling annotation ambiguity. This has practical implications for applications that integrate FER systems, such as service robots, driver monitoring, and human-computer interaction, where it can improve reliability and robustness.

From a theoretical standpoint, the latent distribution methodology provides a promising avenue for further developments, potentially extending into learning frameworks involving other types of ambiguous data. Additionally, the proposed uncertainty estimation methodology can inspire novel approaches to dynamic learning adaptation based on more sophisticated measures of data or model uncertainty.

Future research could explore extending DMUE beyond facial expressions to other problems characterized by intrinsic ambiguity or noisy labels, such as sentiment analysis of text or real-time action recognition in video. Integrating DMUE with transformer-based models such as Vision Transformers is another direction worth investigating.