RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer (2508.17031v1)

Published 23 Aug 2025 in cs.SD and cs.CL

Abstract: We propose a method for the task of text-conditioned speech insertion, i.e. inserting a speech sample in an input speech sample, conditioned on the corresponding complete text transcript. An example use case of the task would be to update the speech audio when corrections are done on the corresponding text transcript. The proposed method follows a transformer-based non-autoregressive approach that allows speech insertions of variable lengths, which are dynamically determined during inference, based on the text transcript and tempo of the available partial input. It is capable of maintaining the speaker's voice characteristics, prosody and other spectral properties of the available speech input. Results from our experiments and user study on LibriTTS show that our method outperforms baselines based on an existing adaptive text to speech method. We also provide numerous qualitative results to appreciate the quality of the output from the proposed method.

Collections

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Paper Prompts

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Generate Now

Authors (3)

Tweets

https://twitter.com/ArxivSound/status/1960258372370080205

alphaXiv

RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer (13 likes, 0 questions)