
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer (2309.07566v2)

Published 14 Sep 2023 in cs.SD, cs.AI, and eess.AS

Abstract: Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy, but it cannot preserve the speaker timbre of the source speech. Meanwhile, the scarcity of high-quality speaker-parallel data poses a challenge for learning style transfer during translation. We design an S2ST pipeline with style-transfer capability on the basis of discrete self-supervised speech representations and codec units. The acoustic language model we introduce for style transfer leverages self-supervised in-context learning, acquiring style-transfer ability without relying on any speaker-parallel data, thereby overcoming data scarcity. By using extensive training data, our model achieves zero-shot cross-lingual style transfer on previously unseen source languages. Experiments show that our model generates translated speech with high fidelity and speaker similarity. Audio samples are available at http://stylelm.github.io/.

Authors (6)
  1. Yongqi Wang (24 papers)
  2. Jionghao Bai (1 paper)
  3. Rongjie Huang (62 papers)
  4. Ruiqi Li (44 papers)
  5. Zhiqing Hong (13 papers)
  6. Zhou Zhao (218 papers)
Citations (2)