Pre-trained Language Model Representations for Language Generation (1903.09722v2)
Abstract: Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence to sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN/DailyMail.
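To make the "added to the encoder" strategy concrete, below is a minimal sketch (not the paper's implementation) of a seq2seq encoder whose token embeddings are concatenated with frozen pre-trained representations before encoding. The `AugmentedEncoder` class and the use of a plain `nn.Embedding` as a stand-in for the pre-trained model are assumptions for illustration; in practice the frozen component would be a contextual language model.

```python
import torch
import torch.nn as nn

class AugmentedEncoder(nn.Module):
    """Hypothetical seq2seq encoder that concatenates learned token
    embeddings with frozen pre-trained representations (sketch only)."""

    def __init__(self, vocab_size, embed_dim, lm_dim, hidden_dim):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        # Stand-in for a pre-trained language model; a real setup would
        # produce contextual vectors from a frozen pre-trained network.
        self.pretrained_lm = nn.Embedding(vocab_size, lm_dim)
        self.pretrained_lm.weight.requires_grad = False  # keep it frozen
        self.rnn = nn.LSTM(embed_dim + lm_dim, hidden_dim, batch_first=True)

    def forward(self, src_tokens):
        # Only the encoder consumes the extra representations, so the
        # decoding loop itself is unchanged.
        tok = self.token_embed(src_tokens)
        lm = self.pretrained_lm(src_tokens)
        return self.rnn(torch.cat([tok, lm], dim=-1))


# Usage: a batch of 2 source sequences of length 7 over a 1000-token vocab.
enc = AugmentedEncoder(vocab_size=1000, embed_dim=256, lm_dim=512, hidden_dim=512)
outputs, (h, c) = enc(torch.randint(0, 1000, (2, 7)))
print(outputs.shape)  # torch.Size([2, 7, 512])
```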
- Sergey Edunov
- Alexei Baevski
- Michael Auli