Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling (2103.14574v7)
Published 26 Mar 2021 in cs.SD and eess.AS
Abstract: This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model that does not require supervised duration signals. The duration model is based on a novel attention mechanism and an iterative reconstruction loss based on Soft Dynamic Time Warping; with these, the model can learn token-frame alignments as well as token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness across several diverse multi-speaker evaluations. Its duration control capability is also demonstrated.
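To make the Soft Dynamic Time Warping loss mentioned in the abstract concrete, below is a minimal NumPy sketch of the standard Soft-DTW recursion (Cuturi & Blondel, 2017). It is only illustrative: Parallel Tacotron 2 applies this idea as a differentiable, iterated reconstruction loss inside the training graph, whereas the function names, `gamma` value, and toy data here are assumptions, not the paper's implementation.

```python
import numpy as np

def soft_dtw(cost, gamma=0.1):
    """Soft-DTW value for a pairwise cost matrix (illustrative sketch).

    cost : (n, m) array of frame-wise distances, e.g. squared L2 between
           predicted and target spectrogram frames.
    gamma: smoothing temperature; as gamma -> 0 this approaches classic DTW.
    """
    n, m = cost.shape
    # R[i, j] holds the soft-DTW cost of aligning the first i predicted
    # frames with the first j target frames (1-indexed, inf border).
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0

    def softmin(a, b, c):
        # Numerically stable soft minimum of three scalars.
        z = -np.array([a, b, c]) / gamma
        zmax = z.max()
        return -gamma * (zmax + np.log(np.exp(z - zmax).sum()))

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + softmin(R[i - 1, j],
                                                   R[i, j - 1],
                                                   R[i - 1, j - 1])
    return R[n, m]


# Toy usage with hypothetical predicted/target feature sequences.
pred = np.random.randn(8, 4)     # 8 frames, 4-dim features
target = np.random.randn(10, 4)  # 10 frames, 4-dim features
cost = ((pred[:, None, :] - target[None, :, :]) ** 2).sum(-1)
print(soft_dtw(cost, gamma=0.05))
```

Because the soft minimum is smooth, this alignment cost is differentiable with respect to the predicted frames, which is what lets the model learn token durations without supervised duration labels.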
- Isaac Elias (5 papers)
- Heiga Zen (36 papers)
- Jonathan Shen (13 papers)
- Yu Zhang (1400 papers)
- Ye Jia (33 papers)
- RJ Skerry-Ryan (21 papers)
- Yonghui Wu (115 papers)