
Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech (2005.09271v1)

Published 19 May 2020 in cs.CL, cs.SD, and eess.AS

Abstract: Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre. In this paper, we propose approaches to improve both the applicability and the quality of accent conversion. First, we assume no reference speech is available at the conversion stage, and hence we employ an end-to-end text-to-speech system trained on native speech to generate native reference speech. To improve the quality and accent of the converted speech, we introduce reference encoders that allow us to use multi-source information. This is motivated by the observation that acoustic features extracted from the native reference and linguistic information are complementary to conventional phonetic posteriorgrams (PPGs), so they can be concatenated as features to improve a baseline system based only on PPGs. Moreover, we optimize the model architecture by using GMM-based attention instead of windowed attention to improve synthesis performance. Experimental results indicate that, when the proposed techniques are applied, the integrated system significantly raises the scores for acoustic quality (30% relative increase in mean opinion score) and native accent (68% relative preference) while retaining the voice identity of the non-native speaker.
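
The feature concatenation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `concat_features`, the toy dimensions, and the choice to treat the acoustic and linguistic reference embeddings as fixed utterance-level vectors appended to every PPG frame are all assumptions made for illustration. The abstract does not say whether the embeddings are utterance-level or frame-aligned.

```python
from typing import List

Frame = List[float]


def concat_features(
    ppg_frames: List[Frame],
    acoustic_emb: Frame,
    linguistic_emb: Frame,
) -> List[Frame]:
    """Append the same two embeddings to every PPG frame.

    Assumption (not from the paper): both embeddings are fixed-size,
    utterance-level vectors, so they are repeated along the time axis.
    """
    # List concatenation builds a new list, so the inputs are not modified.
    return [frame + acoustic_emb + linguistic_emb for frame in ppg_frames]


# Toy input: 3 frames of 4-dimensional PPGs plus two 2-dimensional embeddings.
ppg = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
acoustic = [0.5, -0.3]    # hypothetical acoustic reference embedding
linguistic = [0.1, 0.9]   # hypothetical linguistic embedding

features = concat_features(ppg, acoustic, linguistic)
print(len(features), len(features[0]))  # 3 8
```

Each output frame has 4 + 2 + 2 = 8 dimensions, which is the input a downstream synthesis model would consume in place of the PPG-only baseline features.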

Authors (9)
  1. Wenjie Li (183 papers)
  2. Benlai Tang (10 papers)
  3. Xiang Yin (99 papers)
  4. Yushi Zhao (3 papers)
  5. Wei Li (1122 papers)
  6. Kang Wang (72 papers)
  7. Hao Huang (155 papers)
  8. Yuxuan Wang (239 papers)
  9. Zejun Ma (78 papers)
Citations (12)
