Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

On the Effectiveness of Acoustic BPE in Decoder-Only TTS (2407.03892v1)

Published 4 Jul 2024 in cs.SD, cs.AI, and eess.AS

Abstract: Discretizing speech into tokens and generating them by a decoder-only model have been a promising direction for text-to-speech (TTS) and spoken LLMing (SLM). To shorten the sequence length of speech tokens, acoustic byte-pair encoding (BPE) has emerged in SLM that treats speech tokens from self-supervised semantic representations as characters to further compress the token sequence. But the gain in TTS has not been fully investigated, and the proper choice of acoustic BPE remains unclear. In this work, we conduct a comprehensive study on various settings of acoustic BPE to explore its effectiveness in decoder-only TTS models with semantic speech tokens. Experiments on LibriTTS verify that acoustic BPE uniformly increases the intelligibility and diversity of synthesized speech, while showing different features across BPE settings. Hence, acoustic BPE is a favorable tool for decoder-only TTS.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Bohan Li (88 papers)
  2. Feiyu Shen (6 papers)
  3. Yiwei Guo (30 papers)
  4. Shuai Wang (466 papers)
  5. Xie Chen (166 papers)
  6. Kai Yu (202 papers)
Citations (2)

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com