The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains (2310.02640v3)

Published 4 Oct 2023 in eess.AS

Abstract: We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech. This year, we emphasize real-world and challenging zero-shot out-of-domain MOS prediction with three tracks for three different voice evaluation scenarios. Ten teams from industry and academia in seven different countries participated. Surprisingly, we found that the two sub-tracks of French text-to-speech synthesis had large differences in their predictability, and that singing voice-converted samples were not as difficult to predict as we had expected. Use of diverse datasets and listener information during training appeared to be successful approaches.
