The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge (2404.16619v1)

Published 25 Apr 2024 in cs.SD and eess.AS

Abstract: This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by the THU-HCSI team for the LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. To further improve speaker similarity and speech quality, we introduce a speaker-aware text encoder and a flow-based decoder with Transformer blocks. In addition, we denoise the few-shot data, mix it with the pre-training data, and adopt a speaker-balanced sampling strategy to guarantee effective fine-tuning for the target speakers. The official evaluations in track 1 show that our system achieves the best speaker similarity MOS of 4.25 and a considerable naturalness MOS of 3.97.

Authors (5)

Yixuan Zhou
Shuoyi Zhou
Shun Lei
Zhiyong Wu
Menglin Wu
