Explicit modeling of presentation context in objective audio quality metrics

Develop an explicit modeling framework that incorporates presentation context into objective audio quality metrics, so that predicted quality scores reflect the top-down contextual influences observed in MUSHRA listening tests. One example of such context is a mixed-condition trial, in which identical signals and distortions are presented alongside different stereo processing modes (e.g., SHmix and QNmix, which combine Mid/Side and Left/Right degradations).

Background

The study reports that nearly all evaluated objective audio quality metrics performed poorly on mixed-presentation experiments (SHmix and QNmix), even though several of the same metrics predicted quality reliably when the same distortions were presented in homogeneous contexts (e.g., QNLR, QNMS, SHLR, SHMS). This indicates that listeners’ judgments depend strongly on the presentation context, a top-down factor not currently captured by most models.

The authors note that this context effect confounds objective metrics, implying that existing systems do not account for presentation context, either implicitly or explicitly. They call for future developments to address this limitation. Although they state that it is not clear how presentation context can be modeled explicitly, they suggest a possible data-driven approach: use ground truth sets with varied presentation contexts to map bottom-up distortion metrics to quality scores.
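One way to picture the data-driven idea is a mapping from a metric's raw output to a quality score that is fitted separately for each presentation context. The sketch below is only an illustration under stated assumptions, not the authors' method: the function names (`context_features`, `fit_context_mapping`, `predict_quality`), the integer context labels, and the choice of a per-context linear map fitted by least squares are all hypothetical. The only property taken from MUSHRA itself is the 0 to 100 rating scale.

```python
import numpy as np


def context_features(metrics, context_ids, n_contexts):
    """Build features giving each context its own intercept and slopes.

    metrics: (n,) or (n, M) array of bottom-up metric outputs (hypothetical).
    context_ids: (n,) integer labels, e.g. 0 = homogeneous, 1 = mixed.
    """
    metrics = np.asarray(metrics, dtype=float)
    if metrics.ndim == 1:
        metrics = metrics[:, None]
    # One-hot encode the presentation context: shape (n, C).
    one_hot = np.eye(n_contexts)[np.asarray(context_ids)]
    # Context-by-metric interactions: shape (n, C * M).
    interactions = (one_hot[:, :, None] * metrics[:, None, :]).reshape(
        len(metrics), -1
    )
    return np.hstack([one_hot, interactions])


def fit_context_mapping(metrics, context_ids, subjective_scores, n_contexts):
    """Least-squares fit of the context-dependent linear mapping."""
    X = context_features(metrics, context_ids, n_contexts)
    y = np.asarray(subjective_scores, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def predict_quality(weights, metrics, context_ids, n_contexts):
    """Predict scores and clip them to the 0-100 MUSHRA scale."""
    X = context_features(metrics, context_ids, n_contexts)
    return np.clip(X @ weights, 0.0, 100.0)


# Toy, made-up data: the same metric values map to different listener
# scores depending on the presentation context.
metric = np.array([0.0, 0.5, 1.0, 0.0, 0.5, 1.0])
context = np.array([0, 0, 0, 1, 1, 1])
scores = np.where(context == 0, 20 + 60 * metric, 40 + 30 * metric)

w = fit_context_mapping(metric, context, scores, n_contexts=2)
print(predict_quality(w, [0.25, 0.25], [0, 1], n_contexts=2))
```

In this toy setup the same metric value yields a different predicted score in each context, which is the behavior a context-blind mapping cannot reproduce. A real system would need ground truth from listening tests covering each context, and a linear map is only the simplest possible choice.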

References

Future developments in audio quality metrics need to address this limitation. Although it is not clear how presentation context can be explicitly modeled into the metrics (a top-down process), a potential solution may include a data-driven approach that incorporates ground truth sets with different presentation contexts to map the different distortion metrics (i.e., bottom-up processes) into a quality score estimate.

Exploring Perceptual Audio Quality Measurement on Stereo Processing Using the Open Dataset of Audio Quality (2512.10689 - Delgado et al., 11 Dec 2025) in Discussion, Influence of presentation context