Reference-free perceptual metrics for audio separation

Develop reference-free objective evaluation metrics for audio source separation that correlate strongly with human judgments of separation quality, with particular emphasis on vocals and complex mixtures where existing measures underperform.

Background

The SAM Audio paper criticizes widely used distortion-based metrics (e.g., SDR and SI-SDR) for their weak correspondence to perceptual judgments. These metrics also require a clean reference signal, which is unavailable for in-the-wild content. Although SAM Audio Judge is introduced as a reference-free predictor aligned with human ratings, the authors acknowledge that the broader development of reliable, reference-free metrics remains unresolved.
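To make the reference dependence concrete, the following is a minimal sketch of SI-SDR using its standard definition (project the estimate onto the reference, then compare the energy of the projection to the energy of the residual). The function name `si_sdr` and the `eps` stabilizer are illustrative choices, not taken from the paper, and the code is not the paper's evaluation pipeline.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Note the `reference` argument: without a clean ground-truth source
    this quantity cannot be computed at all.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Optimal scaling of the reference that best explains the estimate.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-target part and a residual part.
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

The score depends only on residual energy relative to the target, so two estimates with equal residual energy receive the same value even if their artifacts sound very different. That is one plausible reason such metrics can diverge from listener ratings.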

This gap is particularly acute for vocal sources and complex acoustic mixtures, where current metrics fail to capture perceptual nuances, underscoring the need for new objective measures that generalize across domains and artifact types.
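One common way to quantify how well a candidate metric tracks human judgments is a rank correlation between per-item metric scores and listener ratings. The sketch below computes Spearman's rank correlation with the standard library only. The choice of Spearman, and the names `average_ranks`, `pearson`, and `spearman`, are assumptions made for illustration; the source does not specify which statistic should be used.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of items tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(metric_scores, human_ratings):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))
```

A metric that orders separated clips the same way listeners do yields a value near 1, regardless of the metric's scale. Computing the statistic separately per source type (for example, vocals versus other stems) would show whether a metric degrades on the harder categories.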

References

Developing better reference-free objective metrics that correlate with human judgments, especially for vocals and complex mixtures, remains an open area [stoter2024bakeoff].

SAM Audio: Segment Anything in Audio (2512.18099 - Shi et al., 19 Dec 2025) in Section 3.1 (Subjective Evaluation Design), Limitations