Handling overlapping character sets in multilingual TTS
Identify strategies that prevent performance degradation in CosyVoice 2’s multilingual text-to-speech synthesis when the target languages share overlapping character sets (for example, Chinese and Japanese), so that pronunciation stays accurate and the synthesized speech stays natural.
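To make the overlap concrete, here is a minimal Python sketch; it is an illustration only and is not taken from the CosyVoice 2 paper. It relies on the fact that Chinese hanzi and Japanese kanji share the same Unicode block (CJK Unified Ideographs), so a string made up only of Han characters carries no script-level cue about which language's pronunciation applies. The function name `script_profile` and its return labels are hypothetical, and the check covers only the basic Hiragana, Katakana, and CJK Unified Ideographs blocks.

```python
def script_profile(text: str) -> str:
    """Label text by the scripts it contains, using code point ranges only.

    Hypothetical helper for illustration; not part of CosyVoice 2.
    """
    has_kana = False
    has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:
            # Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF)
            has_kana = True
        elif 0x4E00 <= cp <= 0x9FFF:
            # CJK Unified Ideographs: shared by Chinese and Japanese
            has_han = True
    if has_kana:
        return "japanese"       # kana is a strong cue for Japanese
    if has_han:
        return "ambiguous_han"  # Han only: could be Chinese or Japanese
    return "other"


if __name__ == "__main__":
    for sample in ["東京", "東京へ行く", "hello"]:
        print(sample, script_profile(sample))
```

A string such as "東京" is labeled `ambiguous_han` because every character in it is valid in both languages, while "東京へ行く" contains hiragana and is labeled `japanese`. A character-range check like this cannot resolve the ambiguous case; resolving it would need information beyond the characters themselves, such as an explicit language tag or surrounding context.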
References
For languages with overlapping character sets, synthesis performance may degrade, presenting an open challenge for future research.
— CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
(2412.10117 - Du et al., 13 Dec 2024) in Section Limitations