GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion (2207.01454v1)

Published 4 Jul 2022 in eess.AS, cs.CL, and cs.LG

Abstract: In this paper, we propose GlowVC: a multilingual multi-speaker flow-based model for language-independent text-free voice conversion (VC). We build on Glow-TTS, which provides an architecture that enables the use of linguistic features during training without requiring them at VC inference time. We consider two versions of our model: GlowVC-conditional and GlowVC-explicit. GlowVC-conditional models the distribution of mel-spectrograms with a speaker-conditioned flow and disentangles the mel-spectrogram space into content- and pitch-relevant dimensions, while GlowVC-explicit models the explicit distribution with an unconditioned flow and disentangles this space into content-, pitch- and speaker-relevant dimensions. We evaluate our models in terms of intelligibility, speaker similarity and naturalness for intra- and cross-lingual conversion in seen and unseen languages. GlowVC models greatly outperform the AutoVC baseline in terms of intelligibility, while achieving equally high speaker similarity in intra-lingual VC and slightly lower speaker similarity in the cross-lingual setting. Moreover, we demonstrate that GlowVC-explicit surpasses both GlowVC-conditional and AutoVC in terms of naturalness.
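The two conversion strategies described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a hypothetical elementwise affine map (`AffineFlow`) standing in for a trained Glow flow, and a hypothetical 8-dimensional latent split (`CONTENT`, `PITCH`, `SPEAKER`) whose sizes are not taken from the paper. `convert_explicit` mimics the GlowVC-explicit idea by swapping speaker-relevant latent dimensions, and `convert_conditional` mimics the GlowVC-conditional idea by encoding under the source speaker's flow parameters and decoding under the target speaker's.

```python
from dataclasses import dataclass
from typing import Dict, List

Frame = List[float]  # one mel-spectrogram frame (toy representation)


@dataclass
class AffineFlow:
    """Toy invertible map z = (x - shift) / scale, applied elementwise.

    Hypothetical stand-in for a Glow-style flow; a real Glow stack also mixes
    dimensions (invertible 1x1 convolutions, coupling layers) and is learned.
    """
    scale: List[float]
    shift: List[float]

    def forward(self, x: Frame) -> Frame:
        # mel frame -> latent
        return [(xi - b) / s for xi, s, b in zip(x, self.scale, self.shift)]

    def inverse(self, z: Frame) -> Frame:
        # latent -> mel frame (exact inverse of forward)
        return [zi * s + b for zi, s, b in zip(z, self.scale, self.shift)]


# Hypothetical latent partition for the explicit variant (sizes are assumptions).
CONTENT, PITCH, SPEAKER = slice(0, 4), slice(4, 6), slice(6, 8)


def convert_explicit(flow: AffineFlow, source: Frame, target_ref: Frame) -> Frame:
    """Unconditioned flow: keep the source's content/pitch dims, take speaker dims from a target frame."""
    z_src = flow.forward(source)
    z_tgt = flow.forward(target_ref)
    z_out = list(z_src)
    z_out[SPEAKER] = z_tgt[SPEAKER]  # replace only the speaker-relevant dims
    return flow.inverse(z_out)


def convert_conditional(flows: Dict[str, AffineFlow], source: Frame,
                        src_spk: str, tgt_spk: str) -> Frame:
    """Speaker-conditioned flow: encode under the source speaker, decode under the target."""
    z = flows[src_spk].forward(source)  # latent carries only content/pitch information
    return flows[tgt_spk].inverse(z)
```

For example, `convert_explicit(AffineFlow([2.0] * 8, [1.0] * 8), [float(i) for i in range(8)], [9.0] * 8)` keeps the first six values of the source frame and takes the last two from the target frame. Because the toy map is elementwise, the swap trivially leaves the other dimensions untouched; in a real Glow stack the dimensions are mixed, so the separation into content-, pitch- and speaker-relevant dimensions described in the abstract has to be learned rather than holding by construction.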

Authors (6)
  1. Magdalena Proszewska (8 papers)
  2. Grzegorz Beringer (6 papers)
  3. Daniel Sáez-Trigueros (6 papers)
  4. Thomas Merritt (16 papers)
  5. Abdelhamid Ezzerg (8 papers)
  6. Roberto Barra-Chicote (24 papers)
Citations (6)