
SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines (2111.03811v3)

Published 6 Nov 2021 in cs.SD, cs.AI, and eess.AS

Abstract: As more and more systems achieve good performance in traditional voice conversion (VC) tasks, attention is gradually turning to VC tasks under extreme conditions. In this paper, we propose a novel method for zero-shot voice conversion. We aim to obtain intermediate representations for speaker-content disentanglement of speech, so as to better remove speaker information and obtain pure content information. Accordingly, our proposed framework contains a module that removes the speaker information from the acoustic features of the source speaker. Moreover, speaker information control is added to our system to maintain the voice cloning performance. The proposed system is evaluated with subjective and objective metrics. Results show that our proposed system significantly reduces the trade-off problem in zero-shot voice conversion, while also achieving high spoofing power against the speaker verification system.

Authors (4)
  1. Haozhe Zhang (17 papers)
  2. Zexin Cai (19 papers)
  3. Xiaoyi Qin (27 papers)
  4. Ming Li (787 papers)
Citations (14)