Beyond the Mean: Modelling Annotation Distributions in Continuous Affect Prediction

Published 8 Apr 2026 in cs.LG and cs.ET | (2604.07198v1)

Abstract: Emotion annotation is inherently subjective and cognitively demanding, producing signals that reflect diverse perceptions across annotators rather than a single ground truth. In continuous affect prediction, this variability is typically collapsed into point estimates such as the mean or median, discarding valuable information about annotator disagreement and uncertainty. In this work, we propose a distribution-aware framework that models annotation consensus using the Beta distribution. Instead of predicting a single affect value, models estimate the mean and standard deviation of the annotation distribution, which are transformed into valid Beta parameters through moment matching. This formulation enables the recovery of higher-order distributional descriptors, including skewness, kurtosis, and quantiles, in closed form. As a result, the model captures not only the central tendency of emotional perception but also variability, asymmetry, and uncertainty in annotator responses. We evaluate the proposed approach on the SEWA and RECOLA datasets using multimodal features. Experimental results show that Beta-based modelling produces predictive distributions that closely match the empirical annotator distributions while achieving competitive performance with conventional regression approaches. These findings highlight the importance of modelling annotation uncertainty in affective computing and demonstrate the potential of distribution-aware learning for subjective signal analysis.
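The moment-matching step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each affect value has already been rescaled to the unit interval (for example from [-1, 1] via (y + 1) / 2), and the helper names `beta_from_moments` and `beta_shape_stats` are hypothetical. The formulas themselves are standard Beta properties. The clipping rule is also an assumption: the abstract does not say how infeasible (mean, std) pairs are handled, but any Beta distribution with mean μ must have variance strictly below μ(1 − μ).

```python
import math


def beta_from_moments(mean: float, std: float, eps: float = 1e-6) -> tuple[float, float]:
    """Moment-match Beta(alpha, beta) to a mean and standard deviation on [0, 1].

    Hypothetical helper: assumes the affect value has been rescaled to [0, 1].
    """
    # Keep the mean strictly inside (0, 1) so both parameters stay positive.
    mu = min(max(mean, eps), 1.0 - eps)
    # Any Beta distribution with mean mu has variance strictly below mu * (1 - mu).
    max_var = mu * (1.0 - mu)
    # Floor the variance (near-unanimous frames) and cap it (infeasible spreads).
    var = min(max(std ** 2, 1e-12), max_var * (1.0 - eps))
    # Precision alpha + beta implied by the first two moments.
    nu = max_var / var - 1.0
    return mu * nu, (1.0 - mu) * nu


def beta_shape_stats(alpha: float, beta: float) -> tuple[float, float]:
    """Closed-form skewness and excess kurtosis of Beta(alpha, beta)."""
    s = alpha + beta
    skew = 2.0 * (beta - alpha) * math.sqrt(s + 1.0) / ((s + 2.0) * math.sqrt(alpha * beta))
    excess_kurtosis = 6.0 * ((alpha - beta) ** 2 * (s + 1.0) - alpha * beta * (s + 2.0)) / (
        alpha * beta * (s + 2.0) * (s + 3.0)
    )
    return skew, excess_kurtosis


# Example: a frame whose rescaled annotations have mean 0.3 and std 0.1.
alpha, beta = beta_from_moments(0.3, 0.1)
skew, excess_kurtosis = beta_shape_stats(alpha, beta)
```

For the example frame, alpha is about 6 and beta about 14. The positive skewness reflects a longer tail toward the upper end of the scale.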


Authors (2)

Summary

The paper proposes modeling the full distribution of human affect annotations instead of collapsing them into a mean or median, so that annotator disagreement and perceptual uncertainty are retained.
Models predict the mean and standard deviation of the annotation distribution, which are converted into Beta parameters through moment matching; skewness, kurtosis, and quantiles then follow in closed form.
Experiments on SEWA and RECOLA with multimodal features show predictive distributions that closely match the empirical annotator distributions while remaining competitive with conventional regression.

Modelling Annotation Distributions in Continuous Affect Prediction

Introduction

Continuous affect prediction from behavioral signals has traditionally regressed to a point estimate, usually the mean or median, of the aggregated human annotations. This discards the ambiguity and variance in how people perceive and annotate affect, information that matters both for theoretical soundness and for practical robustness in affective computing. The paper discussed here, "Beyond the Mean: Modelling Annotation Distributions in Continuous Affect Prediction" (2604.07198), proposes and analyzes methods for capturing the full distribution of annotator responses rather than a single central tendency.

Motivation and Context

Human emotion recognition is inherently subjective, so annotation distributions carry useful information about perceptual uncertainty, inter-rater variability, and affective ambiguity. Existing benchmarks mainly optimize mean-centric error metrics (e.g., MSE against averaged ratings) and ignore these aspects. Work on uncertainty-aware regression, such as deep evidential regression [Amini et al., NeurIPS 2020] and evidential learning [Wu et al., 2023], has motivated probabilistic approaches that represent uncertainty more faithfully. In affect modeling, such representations should in principle make models both more robust and more interpretable, particularly in applications where ambiguous states are common.

Methodological Contributions

The paper systematically addresses the challenge of annotation distribution modeling through the following contributions:

Distributional Targets: Instead of regressing to the mean annotation, models estimate the mean and standard deviation of the per-frame annotation distribution, which are transformed into valid Beta parameters through moment matching. Higher-order descriptors such as skewness, kurtosis, and quantiles can then be recovered in closed form, so the prediction describes the spread and asymmetry of annotator ratings rather than only their center.
Evaluation Metrics Beyond Classical Error: Evaluation is extended from standard mean-centric metrics (RMSE, CCC) to include strictly proper scoring rules [Gneiting & Raftery, 2007], calibration curves, and KL-divergence with respect to the annotation distribution.
Benchmarking and Data: Experiments use multimodal features on two affective datasets, SEWA [Kossaifi et al., 2019] and RECOLA [Ringeval et al., 2013]. Because both provide annotation-level access, the full empirical ground-truth distribution can be constructed and used for assessment.
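To make the contrast between mean-centric and distributional evaluation concrete, the sketch below builds per-frame targets from individual annotator traces. It then computes two scores: Lin's concordance correlation coefficient (CCC) on a mean trace, and the average Beta negative log-likelihood of the individual ratings, which is the logarithmic score and a strictly proper scoring rule. This is a pure-Python illustration under stated assumptions, not the paper's evaluation code. It assumes the following:

* Raw ratings lie in [-1, 1].
* Frames are already aligned across annotators.
* The population (divide-by-K) standard deviation is used.
* All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def empirical_targets(traces: list[list[float]], low: float = -1.0, high: float = 1.0):
    """Per-frame mean and population std across annotators, rescaled to [0, 1].

    traces[k][t] is annotator k's rating at frame t (hypothetical layout).
    """
    means, stds = [], []
    for t in range(len(traces[0])):
        frame = [(trace[t] - low) / (high - low) for trace in traces]
        means.append(fmean(frame))
        stds.append(pstdev(frame))
    return means, stds


def ccc(pred: list[float], gold: list[float]) -> float:
    """Lin's concordance correlation coefficient (population moments)."""
    mp, mg = fmean(pred), fmean(gold)
    vp = fmean([(p - mp) ** 2 for p in pred])
    vg = fmean([(g - mg) ** 2 for g in gold])
    cov = fmean([(p - mp) * (g - mg) for p, g in zip(pred, gold)])
    return 2.0 * cov / (vp + vg + (mp - mg) ** 2)


def beta_nll(ratings: list[float], alpha: float, beta: float, eps: float = 1e-4) -> float:
    """Mean negative log-likelihood of individual ratings under Beta(alpha, beta)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    total = 0.0
    for r in ratings:
        # The log-density diverges at 0 and 1, so clip ratings into the open interval.
        x = min(max(r, eps), 1.0 - eps)
        total -= log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x)
    return total / len(ratings)


# Toy data: two annotators, three frames.
traces = [[-1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
means, stds = empirical_targets(traces)  # means [0.25, 0.5, 1.0], stds [0.25, 0.0, 0.0]
```

CCC only sees the mean trace. The NLL instead scores every annotator's rating against the predicted distribution, so it rewards a model whose spread matches actual disagreement.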

Empirical Results

The paper presents empirical analyses comparing traditional point-estimate regressors, Bayesian and uncertainty-aware networks (e.g., deep evidential regression), and the proposed distributional models. Key findings include:

Distributional models yield better calibration and more faithful prediction of annotation uncertainty (quantified with negative log-likelihood and other strictly proper scoring rules).
While mean-centric performance (e.g., CCC) remains competitive with point-estimate regressors, the distributional approach better captures the heteroscedasticity of human annotations, especially during ambiguous and transitional affective states.
Because the predicted spread can be compared with the observed rater variance, the outputs can support downstream active learning and human-in-the-loop annotation, for example by flagging frames where model uncertainty and annotator disagreement diverge.
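The idea of checking whether machine confidence matches observed rater variance can be sketched with two small helpers. Both are hypothetical: the ratio tolerance, the floor value, and the top-k routing rule are illustrative assumptions, not settings reported in the paper. The inputs are a per-frame predicted standard deviation and the per-frame standard deviation across annotators, both on the same rescaled scale.

```python
def flag_spread_mismatch(
    pred_std: list[float], rater_std: list[float], ratio_tol: float = 2.0, floor: float = 1e-3
) -> list[int]:
    """Indices of frames where predicted spread and annotator spread disagree.

    A frame is flagged when the ratio of predicted to observed std falls outside
    [1 / ratio_tol, ratio_tol]; both values are floored to avoid division by zero.
    """
    flagged = []
    for t, (p, r) in enumerate(zip(pred_std, rater_std)):
        ratio = max(p, floor) / max(r, floor)
        if ratio > ratio_tol or ratio < 1.0 / ratio_tol:
            flagged.append(t)
    return flagged


def most_ambiguous(pred_std: list[float], k: int) -> list[int]:
    """Indices of the k frames with the largest predicted spread (e.g., to route to annotators)."""
    return sorted(range(len(pred_std)), key=lambda t: pred_std[t], reverse=True)[:k]
```

In an active-learning loop, `most_ambiguous` would pick frames for extra annotation. `flag_spread_mismatch` would surface frames where the model is over- or under-confident relative to the raters.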

Theoretical and Practical Implications

The work addresses a mismatch in conventional affect modeling objectives, which reduce annotation ambiguity to summary statistics. By building and evaluating models in the space of full annotation distributions, the approach aligns with broader work on modeling subjective perception and quantifying uncertainty. Practically, these models facilitate:

More reliable deployment in real-world settings where ambiguous affect is predominant (e.g., human-robot interaction, healthcare).
Enhanced human interpretability of predictions, with uncertainty bands reflective of annotator disagreement.
Potential adaptation to other domains of subjective ground-truth collection, such as audio quality (MOS) and user engagement.
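Uncertainty bands such as those mentioned above can be read directly off the predicted Beta distribution. In practice, Beta quantiles are obtained by inverting the regularized incomplete beta function; SciPy exposes this as `scipy.stats.beta.ppf`. To stay dependency-free, the sketch below evaluates the Beta CDF with the standard continued-fraction expansion (modified Lentz method, as in Numerical Recipes). It then inverts the CDF by bisection to get an equal-tailed central band. The function names and the default 80% coverage are illustrative assumptions, not the paper's choices.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method).

    Very large a or b may need more iterations than the default.
    """
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_quantile(q: float, a: float, b: float, iters: int = 100) -> float:
    """Invert the (monotone) Beta CDF on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def central_band(a: float, b: float, coverage: float = 0.8) -> tuple[float, float]:
    """Equal-tailed interval holding `coverage` of the predicted Beta mass."""
    tail = 0.5 * (1.0 - coverage)
    return beta_quantile(tail, a, b), beta_quantile(1.0 - tail, a, b)
```

For example, `central_band(6.0, 14.0)` gives an interval around the predicted mean of 0.3. Mapping its endpoints back to the original rating scale yields a band that can be plotted alongside the annotator traces.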

Future Directions

The distributional paradigm invites several avenues for further research. Open questions remain about the best loss functions and parameterizations for complex annotation distributions with several modes, which a single Beta distribution (at most one interior mode) cannot represent. Efficient elicitation and aggregation of ground-truth labels in large-scale datasets is a further open problem. The implications extend to active learning, domain adaptation, and the design of robust evaluation schemes for affective computing benchmarks.

Conclusion

By moving beyond mean-centric regression and explicitly modeling annotation distributions in continuous affect prediction, the paper (2604.07198) moves the field toward more principled and application-relevant representations of human emotion perception. The empirical results underline the value of capturing rater uncertainty and ambiguity, which improves both the interpretability and the reliability of affective models. This distributional perspective could also inform the broader study of subjective label modeling and uncertainty-aware artificial intelligence.
