
Can we reconstruct a dysarthric voice with the large speech model Parler TTS? (2506.04397v1)

Published 4 Jun 2025 in eess.AS, cs.CL, and cs.SD

Abstract: Speech disorders can make communication hard or even impossible for those who develop them. Personalised Text-to-Speech is an attractive option as a communication aid. We attempt voice reconstruction using a large speech model, with which we generate an approximation of a dysarthric speaker's voice prior to the onset of their condition. In particular, we investigate whether a state-of-the-art large speech model, Parler TTS, can generate intelligible speech while maintaining speaker identity. We curate a dataset and annotate it with relevant speaker and intelligibility information, and use this to fine-tune the model. Our results show that the model can indeed learn to generate from the distribution of this challenging data, but struggles to control intelligibility and to maintain consistent speaker identity. We propose future directions to improve the controllability of this class of model for the voice reconstruction task.
