Pretrained Language Models for Text Generation: A Survey (2105.10311v2)

Published 21 May 2021 in cs.CL and cs.AI

Abstract: Text generation has become one of the most important yet challenging tasks in NLP. The resurgence of deep learning has greatly advanced this field by neural generation models, especially the paradigm of pretrained language models (PLMs). In this paper, we present an overview of the major advances achieved in the topic of PLMs for text generation. As the preliminaries, we present the general task definition and briefly describe the mainstream architectures of PLMs for text generation. As the core content, we discuss how to adapt existing PLMs to model different input data and satisfy special properties in the generated text. We further summarize several important fine-tuning strategies for text generation. Finally, we present several future directions and conclude this paper. Our survey aims to provide text generation researchers a synthesis and pointer to related research.

Overview of Pretrained Language Models for Text Generation

The paper "Pretrained Language Models for Text Generation: A Survey" provides a comprehensive examination of the use of pretrained language models (PLMs) in text generation tasks within NLP. It systematically reviews the key advances in adapting PLMs to these tasks and offers a synthesis for researchers in this domain.

PLMs have transformed text generation by enabling models to be pretrained on large unlabelled textual datasets before being fine-tuned for specific tasks, thus overcoming the limitations posed by smaller task-specific datasets. This transfer learning approach allows models like BERT, GPT, and T5 to encode substantial linguistic knowledge and provide state-of-the-art results when adapted to a variety of text generation tasks.
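
To make this pretrain-then-fine-tune workflow concrete, the sketch below fine-tunes a publicly available T5 checkpoint on a single toy summarization pair using the Hugging Face transformers library. The checkpoint name, example data, and hyperparameters are illustrative assumptions rather than details taken from the survey.

```python
# Minimal sketch of the pretrain-then-fine-tune paradigm (assumes the
# Hugging Face `transformers` library; checkpoint and data are illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # pretrained PLM

# Toy task-specific example: summarization (source text -> target summary).
source = "summarize: The survey reviews pretrained language models for text generation."
target = "A survey of PLMs for text generation."
inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # a few gradient steps stand in for full fine-tuning
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After fine-tuning, the same model generates task-specific output text.
model.eval()
summary_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```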

Task Definitions and Applications

Text generation tasks vary widely and can involve differing input forms such as random noise, discrete attributes, structured data, multimedia, and text sequences. Typical applications include machine translation, summarization, and dialogue systems. The paper categorizes these based on their input data types and provides detailed examples such as unconditional generation, attribute-based generation, and data-to-text generation.

Architectures in Text Generation PLMs

The architectures of PLMs for text generation are primarily divided into encoder-decoder Transformers and decoder-only Transformers. Encoder-decoder models, such as BART and T5, use the standard encoder-decoder architecture to perform sequence-to-sequence tasks. Meanwhile, decoder-only models like GPT are suited to tasks that generate text by continuing a given prompt.

These architectures enable PLMs to efficiently handle the core requirement of text generation: modeling semantic mappings from inputs to outputs, regardless of the nature of the input data—be it unstructured text, structured data, or multimedia inputs.
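
To make the architectural distinction concrete, the hedged sketch below contrasts how the two families are typically invoked with the Hugging Face transformers library: an encoder-decoder model (BART) maps an input sequence to an output sequence, while a decoder-only model (GPT-2) autoregressively continues a prompt. The checkpoints and inputs are illustrative assumptions.

```python
# Sketch contrasting the two mainstream PLM architectures for generation
# (assumes the Hugging Face `transformers` library; checkpoints illustrative).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM

# Encoder-decoder: the encoder reads the whole input, the decoder generates.
enc_dec_tok = AutoTokenizer.from_pretrained("facebook/bart-base")
enc_dec = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
src = enc_dec_tok("Pretrained language models advance text generation.",
                  return_tensors="pt")
out = enc_dec.generate(**src, max_new_tokens=20)
print(enc_dec_tok.decode(out[0], skip_special_tokens=True))

# Decoder-only: a single Transformer stack continues the prompt token by token.
dec_tok = AutoTokenizer.from_pretrained("gpt2")
dec_only = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = dec_tok("Pretrained language models", return_tensors="pt")
out = dec_only.generate(**prompt, max_new_tokens=20,
                        pad_token_id=dec_tok.eos_token_id)
print(dec_tok.decode(out[0], skip_special_tokens=True))
```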

Input Data Types and Challenges

Unstructured Input: This commonly involves textual data, where PLMs such as BERT are used for encoding. Hierarchical Transformers are employed for long documents, and cross-lingual language models are developed to handle multiple languages.
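
As a rough illustration of the hierarchical idea, the sketch below encodes each segment of a long document with BERT and then lets a small segment-level Transformer attend over the resulting segment vectors. The segmentation scheme, dimensions, and layer counts are assumptions made for illustration, not the specific designs surveyed in the paper.

```python
# Hedged sketch of hierarchical encoding for long documents: a token-level
# PLM (BERT) encodes each segment, then a segment-level Transformer
# aggregates the segment vectors. Sizes and layer counts are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

segments = [
    "First paragraph of a long document ...",
    "Second paragraph of the same document ...",
    "Third paragraph of the same document ...",
]

# Token-level encoding: one [CLS] vector per segment.
with torch.no_grad():
    cls_vectors = []
    for seg in segments:
        enc = tokenizer(seg, return_tensors="pt", truncation=True, max_length=128)
        cls_vectors.append(bert(**enc).last_hidden_state[:, 0])  # shape (1, 768)
segment_matrix = torch.stack(cls_vectors, dim=1)  # shape (1, num_segments, 768)

# Segment-level Transformer attends across segments to build a document view.
layer = torch.nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
segment_encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
document_repr = segment_encoder(segment_matrix)  # contextualized segment states
print(document_repr.shape)  # torch.Size([1, 3, 768])
```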

Structured Input: Techniques such as linearizing input structures (e.g., representing graphs as sequences) and auxiliary tasks that preserve structural information are used in KG-to-text and table-to-text generation, addressing the challenges of data sparsity and representation alignment.
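
The linearization step can be as simple as flattening knowledge-graph triples into a delimited string before handing it to a sequence-to-sequence PLM. The sketch below uses ad-hoc separator markers and an illustrative T5 checkpoint to show the idea; in practice the model would be fine-tuned on (linearized graph, text) pairs and the markers added as special tokens.

```python
# Hedged sketch of structured-input linearization for KG-to-text generation.
# Separator markers and the checkpoint are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def linearize_triples(triples):
    """Flatten (head, relation, tail) triples into one input string."""
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)

triples = [
    ("Alan Turing", "field", "computer science"),
    ("Alan Turing", "birth place", "London"),
]
source = linearize_triples(triples)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The linearized graph is now ordinary text input for the PLM; without
# fine-tuning on KG-to-text data the generated output is only a placeholder.
inputs = tokenizer(source, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```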

Multimedia Input: The paper discusses cross-modal trained models such as VideoBERT, aimed at tasks like video captioning and speech recognition, showcasing the broad applicability and versatility of PLMs beyond text-only data.

Output Text Properties

When generating text, satisfying specific properties is crucial:

  • Relevance: Ensures the generated content is contextually aligned with the input, especially important for dialogue systems.
  • Faithfulness: Refers to the accuracy of the content in reflecting the input's facts, critical in summarization tasks.
  • Order-preservation: Maintains the semantic order across languages, essential for accurate machine translation.

These properties highlight the necessity for PLMs to generate coherent, accurate, and contextually appropriate text.

Fine-tuning Strategies

The survey identifies multiple fine-tuning strategies such as domain adaptation and few-shot learning, emphasizing the importance of tailoring PLMs to specific text generation tasks. These strategies encompass various views including data-centric, task-centric, and model-oriented approaches to address domain-specific challenges and support better generalization.
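
As one concrete (assumed) instance of the data-centric view, the sketch below continues the language-modeling pretraining of GPT-2 on a handful of in-domain sentences before any task-specific fine-tuning, a simple form of domain adaptation. The corpus, checkpoint, and hyperparameters are placeholders rather than recommendations from the survey.

```python
# Hedged sketch of domain-adaptive (continued) pretraining before task
# fine-tuning; the in-domain corpus, checkpoint, and hyperparameters are
# placeholders, not prescriptions from the survey.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

in_domain_corpus = [
    "The patient presented with acute shortness of breath.",
    "Radiology findings were consistent with early-stage pneumonia.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(2):  # a few passes stand in for real continued pretraining
    for text in in_domain_corpus:
        batch = tokenizer(text, return_tensors="pt")
        # Causal LM objective: the inputs double as the labels.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# The domain-adapted checkpoint would then be fine-tuned on the target
# generation task (e.g., clinical report summarization) as usual.
model.save_pretrained("gpt2-domain-adapted")
```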

Implications and Future Directions

The survey provides several insights into the theoretical and practical implications of employing PLMs in text generation. It suggests future research directions such as enhancing model architectures, achieving controllable text generation, and compressing PLMs to make them more efficient.

The survey advocates for continued exploration of language-agnostic PLMs and the ethical use of these models, particularly in addressing bias and ensuring the fidelity of generated content. These developments are crucial for advancing the application and efficacy of PLMs in diverse text generation tasks across languages and domains.

In summary, the paper serves as a valuable reference point for researchers, summarizing current advancements and charting a course for future investigations into the application of PLMs in the field of text generation.

Authors (4)
  1. Junyi Li (92 papers)
  2. Tianyi Tang (30 papers)
  3. Wayne Xin Zhao (196 papers)
  4. Ji-Rong Wen (299 papers)
Citations (167)