Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis

Published 4 Apr 2019 in cs.CL | (1904.02373v1)

Abstract: Speech style control and transfer techniques aim to enrich the diversity and expressiveness of synthesized speech. Existing approaches model all speech styles into one representation, lacking the ability to control a specific speech feature independently. To address this issue, we introduce a novel multi-reference structure to Tacotron and propose intercross training approach, which together ensure that each sub-encoder of the multi-reference encoder independently disentangles and controls a specific style. Experimental results show that our model is able to control and transfer desired speech styles individually.