Pre-trained Language Model Representations for Language Generation (1903.09722v2)

Published 22 Mar 2019 in cs.CL

Abstract: Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence to sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence-pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full text version of CNN/DailyMail.
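The core idea described in the abstract, combining frozen pre-trained language model states with a sequence-to-sequence encoder's learned token embeddings, can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: the linear projection, the element-wise summation, and the Transformer layer sizes are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class PretrainedAugmentedEncoder(nn.Module):
    """Seq2seq encoder whose token embeddings are combined with frozen
    representations from a pre-trained language model.

    Hypothetical sketch: the projection layer and dimensions below are
    assumptions, not the paper's exact configuration.
    """

    def __init__(self, vocab_size: int, d_model: int, lm_dim: int,
                 num_layers: int = 6, nhead: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Project the pre-trained LM states into the encoder's model dimension.
        self.lm_proj = nn.Linear(lm_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, tokens: torch.Tensor, lm_states: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) token ids
        # lm_states: (batch, seq_len, lm_dim) frozen pre-trained representations
        x = self.embed(tokens) + self.lm_proj(lm_states)  # element-wise sum
        return self.encoder(x)
```

Per the abstract, augmenting the encoder side was the most effective placement; since the decoder is left unchanged, the extra cost is confined to encoding, which is consistent with the reported 14% inference slowdown.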

Authors (3)
  1. Sergey Edunov
  2. Alexei Baevski
  3. Michael Auli
Citations (129)
