Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model

Published 23 Dec 2020 in cs.SD and eess.AS | arXiv:2012.12612v2

Abstract: This letter presents an incremental text-to-speech (TTS) method that performs synthesis in small linguistic units while maintaining the naturalness of the output speech. Incremental TTS is generally subject to a trade-off between latency and synthetic speech quality. It is challenging to produce high-quality speech with a low-latency setup that makes little use of the unobserved future part of the sentence (hereafter, "lookahead"). To resolve this issue, we propose an incremental TTS method that uses a pseudo lookahead generated with a large pretrained language model to take future contextual information into account without increasing latency. Our method can be regarded as imitating a human's incremental reading and uses pretrained GPT2, which captures large-scale linguistic knowledge, for the lookahead generation. Evaluation results show that our method 1) achieves higher speech quality than a method that takes only observed information into account and 2) achieves speech quality equivalent to that obtained by waiting for the future context to be observed.
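
The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes word-sized synthesis units, and every name in it (`incremental_tts`, `toy_predict_future`, `toy_synthesize`, `TOY_NEXT`) is hypothetical. A small lookup table stands in for the pretrained GPT2, and the "synthesizer" only returns a descriptive string instead of audio.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for a pretrained language model: maps a word to a likely next word.
TOY_NEXT = {"the": "cat", "cat": "sat", "sat": "down"}


def toy_predict_future(context: List[str], max_words: int = 2) -> List[str]:
    """Guess up to max_words future words from the last observed word (toy predictor)."""
    predicted: List[str] = []
    last = context[-1] if context else ""
    for _ in range(max_words):
        nxt = TOY_NEXT.get(last)
        if nxt is None:
            break
        predicted.append(nxt)
        last = nxt
    return predicted


def toy_synthesize(past: List[str], current: str, lookahead: List[str]) -> str:
    """Placeholder synthesizer: reports which context it was given instead of producing audio."""
    return f"{current}|past={len(past)}|ahead={' '.join(lookahead)}"


def incremental_tts(
    words: Iterable[str],
    predict_future: Callable[[List[str]], List[str]],
    synthesize: Callable[[List[str], str, List[str]], str],
) -> Iterator[str]:
    """Emit one segment per incoming word without waiting for the real future words."""
    observed: List[str] = []
    for word in words:
        # Pseudo lookahead: predicted, not observed, so no extra input latency is added.
        lookahead = predict_future(observed + [word])
        yield synthesize(observed, word, lookahead)
        observed.append(word)


segments = list(incremental_tts(["the", "cat", "ran"], toy_predict_future, toy_synthesize))
```

Each segment is conditioned on the observed past, the current unit, and a predicted continuation, so output can start as soon as the first unit arrives. Replacing `toy_predict_future` with a call to a real language model would leave the loop unchanged.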

