Continuous Speech Tokenizer in Text To Speech (2410.17081v2)

Published 22 Oct 2024 in cs.SD, cs.CL, and eess.AS

Abstract: The fusion of speech and language in the era of LLMs has garnered significant attention. Discrete speech tokens are often used in text-to-speech tasks for speech compression and portability, since they are convenient for joint training with text and offer good compression efficiency. However, we found that the discrete speech tokenizer still suffers from information loss. Therefore, we propose a simple yet effective continuous speech tokenizer named Cont-SPT, and a text-to-speech model based on continuous speech tokens. Our results show that the speech LLM based on the continuous speech tokenizer has better continuity and higher estimated Mean Opinion Scores (MOS). This enhancement is attributed to the better information preservation rate of the continuous speech tokenizer across both low and high frequencies in the frequency domain. The code and resources for Cont-SPT can be found at https://github.com/Yixing-Li/Continuous-Speech-Tokenizer

Citations (1)

Authors (5)