Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
92 tokens/sec
Gemini 2.5 Pro Premium
46 tokens/sec
GPT-5 Medium
19 tokens/sec
GPT-5 High Premium
32 tokens/sec
GPT-4o
87 tokens/sec
DeepSeek R1 via Azure Premium
98 tokens/sec
GPT OSS 120B via Groq Premium
435 tokens/sec
Kimi K2 via Groq Premium
207 tokens/sec
2000 character limit reached

Aligning Text-to-Music Evaluation with Human Preferences (2503.16669v1)

Published 20 Mar 2025 in cs.SD, cs.AI, and eess.AS

Abstract: Despite significant recent advances in generative acoustic text-to-music (TTM) modeling, robust evaluation of these models lags behind, relying in particular on the popular Fr\'echet Audio Distance (FAD). In this work, we rigorously study the design space of reference-based divergence metrics for evaluating TTM models through (1) designing four synthetic meta-evaluations to measure sensitivity to particular musical desiderata, and (2) collecting and evaluating on MusicPrefs, the first open-source dataset of human preferences for TTM systems. We find that not only is the standard FAD setup inconsistent on both synthetic and human preference data, but that nearly all existing metrics fail to effectively capture desiderata, and are only weakly correlated with human perception. We propose a new metric, the MAUVE Audio Divergence (MAD), computed on representations from a self-supervised audio embedding model. We find that this metric effectively captures diverse musical desiderata (average rank correlation 0.84 for MAD vs. 0.49 for FAD and also correlates more strongly with MusicPrefs (0.62 vs. 0.14).

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube