
PolyVoice: Language Models for Speech to Speech Translation

Published 5 Jun 2023 in cs.CL and eess.AS | (2306.02982v2)

Abstract: We propose PolyVoice, an LLM-based framework for speech-to-speech translation (S2ST). Our framework consists of two LLMs: a translation LLM and a speech synthesis LLM. We use discretized speech units, which are generated in a fully unsupervised way, so the framework can be applied to unwritten languages. For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio LLM. This gives our framework the ability to preserve the voice characteristics and speaking style of the original speech. We evaluate our system on Chinese $\rightarrow$ English and English $\rightarrow$ Spanish pairs. Experimental results show that our system can generate speech with high translation quality and audio quality. Speech samples are available at https://speechtranslation.github.io/polyvoice.
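
The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the class name `S2STPipeline`, its three callables, and the toy stand-in functions are all hypothetical, and the real components (an unsupervised unit extractor, a translation LLM, and a VALL-E X style unit-based audio LLM) are replaced here by trivial placeholders so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List

Units = List[int]       # discrete speech units (e.g. cluster IDs); representation is an assumption
Waveform = List[float]  # raw audio samples


@dataclass
class S2STPipeline:
    """Hypothetical two-stage, unit-based speech-to-speech translation pipeline."""

    # Unsupervised discretization: speech -> discrete units (no text needed).
    discretize: Callable[[Waveform], Units]
    # Translation LM: source-language units -> target-language units.
    translate_units: Callable[[Units], Units]
    # Synthesis LM: target units plus a source-speech prompt -> target audio.
    synthesize: Callable[[Units, Waveform], Waveform]

    def __call__(self, source_speech: Waveform) -> Waveform:
        source_units = self.discretize(source_speech)
        target_units = self.translate_units(source_units)
        # The source speech is passed along as a prompt so the synthesizer
        # can carry over the speaker's voice and speaking style.
        return self.synthesize(target_units, source_speech)


# Toy stand-ins (assumptions for illustration only; they model nothing real).
def toy_discretize(wave: Waveform) -> Units:
    return [int(abs(x) * 10) % 5 for x in wave]


def toy_translate(units: Units) -> Units:
    return list(reversed(units))


def toy_synthesize(units: Units, prompt: Waveform) -> Waveform:
    # Scale output by the prompt's peak amplitude to mimic "style transfer".
    gain = max((abs(x) for x in prompt), default=1.0)
    return [gain * u / 10 for u in units]


pipeline = S2STPipeline(toy_discretize, toy_translate, toy_synthesize)
output = pipeline([0.1, 0.25, 0.4])
print(len(output))
```

The point of the sketch is the interface: no text appears anywhere between input and output, which is what allows the approach to extend to unwritten languages, and the synthesis stage receives the original speech in addition to the translated units.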

Citations (17)
