
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer (2307.16171v1)

Published 30 Jul 2023 in cs.SD, cs.AI, cs.MM, and eess.AS

Abstract: Despite rapid progress in the voice style transfer (VST) field, recent zero-shot VST systems still lack the ability to transfer the voice style of a novel speaker. In this paper, we present HierVST, a hierarchical adaptive end-to-end zero-shot VST model. Without any text transcripts, we only use the speech dataset to train the model by utilizing hierarchical variational inference and self-supervised representation. In addition, we adopt a hierarchical adaptive generator that generates the pitch representation and waveform audio sequentially. Moreover, we utilize unconditional generation to improve the speaker-relative acoustic capacity in the acoustic representation. With a hierarchical adaptive structure, the model can adapt to a novel voice style and convert speech progressively. The experimental results demonstrate that our method outperforms other VST models in zero-shot VST scenarios. Audio samples are available at https://hiervst.github.io/.

Authors (4)
  1. Sang-Hoon Lee (24 papers)
  2. Ha-Yeong Choi (7 papers)
  3. Hyung-Seok Oh (9 papers)
  4. Seong-Whan Lee (132 papers)
Citations (5)
