A multi-speaker multi-lingual voice cloning system based on VITS2 for LIMMITS 2024 challenge (2406.17801v1)

Published 22 Jun 2024 in cs.SD, cs.CL, and eess.AS

Abstract: This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-shot voice cloning on target speakers. Evaluation included both mono-lingual and cross-lingual synthesis across all seven languages, with subjective tests assessing naturalness and speaker similarity. Our system uses the VITS2 architecture, augmented with a multi-lingual ID and a BERT model to enhance contextual language comprehension. In Track 1, where no additional data usage was permitted, our model achieved a Speaker Similarity score of 4.02. In Track 2, which allowed the use of extra data, it attained a Speaker Similarity score of 4.17.

Authors (7)
  1. Xiaopeng Wang (53 papers)
  2. Yi Lu (145 papers)
  3. Xin Qi (36 papers)
  4. Zhiyong Wang (120 papers)
  5. Yuankun Xie (19 papers)
  6. Shuchen Shi (14 papers)
  7. Ruibo Fu (54 papers)
