General-purpose MOS prediction across diverse speech domains
Determine whether a single automatic mean opinion score (MOS) prediction model, trained once on one dataset and without per-domain adaptation, can achieve consistently high performance across heterogeneous speech domains and listening-test contexts. Examples include French text-to-speech synthesis (Blizzard Challenge 2023), singing voice conversion (SVCC 2023), and noisy and enhanced speech (TMHINT-QI(S)). A model that succeeds on all of these would establish truly general-purpose MOS prediction.
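"Consistently high performance" can be made concrete by scoring one model separately on each track and then looking at its worst track, not only its average. The sketch below is illustrative and assumes system-level Spearman rank correlation (SRCC) as the per-track metric, a metric commonly reported in VoiceMOS-style evaluations. The function names, the `(system_id, true_mos, predicted_mos)` input layout, and the track names in the usage example are hypothetical and are not taken from any challenge toolkit.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def system_level_srcc(utterances):
    """utterances: list of (system_id, true_mos, predicted_mos) tuples.

    Averages true and predicted MOS per system, then correlates the
    per-system means (hypothetical helper, assumed metric).
    """
    by_system = {}
    for system_id, true_mos, pred_mos in utterances:
        trues, preds = by_system.setdefault(system_id, ([], []))
        trues.append(true_mos)
        preds.append(pred_mos)
    systems = sorted(by_system)
    true_means = [mean(by_system[s][0]) for s in systems]
    pred_means = [mean(by_system[s][1]) for s in systems]
    return spearman(true_means, pred_means)


def cross_domain_summary(tracks):
    """tracks: dict mapping track name -> utterance list for ONE model.

    Returns per-track system-level SRCC and the worst-case value; a
    general-purpose predictor should keep the worst case high.
    """
    per_track = {name: system_level_srcc(utts) for name, utts in tracks.items()}
    return per_track, min(per_track.values())


if __name__ == "__main__":
    # Toy data: the model ranks systems correctly on one track
    # and exactly backwards on the other.
    tracks = {
        "tts_fr": [("s1", 1.0, 1.4), ("s1", 2.0, 1.6),
                   ("s2", 3.0, 2.9), ("s3", 4.0, 4.2)],
        "svc": [("a", 1.0, 4.0), ("b", 2.0, 3.0), ("c", 3.0, 2.0)],
    }
    per_track, worst = cross_domain_summary(tracks)
    print(per_track, worst)
```

In this toy example the model is perfect on one track and inverted on the other, so its mean SRCC across tracks is 0 while its worst-case SRCC is -1; reporting the minimum exposes exactly the kind of track-to-track inconsistency this problem is about.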
References
The most important result was that most teams' scores varied widely across tracks, and no team achieved high scores on all tracks with a single model trained on the same data. This indicates that general-purpose MOS prediction remains an open research problem.