
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise (2212.11685v2)

Published 22 Dec 2022 in cs.CL and cs.LG

Abstract: In this paper, we introduce a novel diffusion language model pre-training framework for text generation, which we call GENIE. GENIE is a large-scale pretrained diffusion language model that consists of an encoder and a diffusion-based decoder, which can generate text by gradually transforming a random noise sequence into a coherent text sequence. To pre-train GENIE on a large-scale language corpus, we design a new continuous paragraph denoise objective, which encourages the diffusion-decoder to reconstruct a clean text paragraph from a corrupted version, while preserving the semantic and syntactic coherence. We evaluate GENIE on four downstream text generation benchmarks, namely XSum, CNN/DailyMail, Gigaword, and CommonGen. Our experimental results show that GENIE achieves comparable performance with the state-of-the-art autoregressive models on these benchmarks, and generates more diverse text samples. The code and models of GENIE are available at https://github.com/microsoft/ProphetNet/tree/master/GENIE.

Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

The paper introduces GENIE, a diffusion language model framework designed to enhance text generation through a novel pre-training method called Continuous Paragraph Denoise (CPD). In contrast to traditional autoregressive models that predict tokens sequentially, GENIE adopts a diffusion-based approach in which text is generated by iteratively refining random noise into a coherent language sequence. The key elements, results, and implications of this research are discussed below.
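For orientation, the generic Gaussian diffusion formulation that this family of models builds on can be written as follows; this is standard notation, and the specific parameterization used by GENIE may differ:

```latex
% Forward (noising) and learned reverse (denoising) processes, generic notation
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)
\qquad
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\big)
```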

Methodology

GENIE utilizes a sequence-to-sequence framework composed of a bidirectional encoder and a diffusion-based decoder. The encoder maps the input text into hidden representations, which guide the decoder as it gradually denoises Gaussian noise into readable text. Unlike conventional autoregressive models, which emit tokens one at a time, the decoder refines all target positions in parallel, and the stochastic nature of the denoising process yields inherently diverse outputs.
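As an illustration, an encoder-conditioned, DDIM-style denoising loop could look roughly like the following PyTorch sketch. All names here (`encoder`, `denoiser`, `lm_head`, `alphas_cumprod`) are assumptions for exposition, not the actual GENIE implementation:

```python
# Minimal sketch of non-autoregressive, encoder-conditioned diffusion decoding.
# Hypothetical components: a trained `encoder`, a `denoiser` that predicts noise
# at step t given the encoder states, an `lm_head` mapping latents to the
# vocabulary, and a precomputed 1-D `alphas_cumprod` noise schedule.
import torch

@torch.no_grad()
def generate(encoder, denoiser, lm_head, src_ids, tgt_len, alphas_cumprod):
    T = alphas_cumprod.shape[0]                       # number of diffusion steps
    cond = encoder(src_ids)                           # source hidden states guide the decoder
    x = torch.randn(src_ids.size(0), tgt_len, denoiser.hidden_size)  # start from pure noise

    for t in reversed(range(T)):                      # iteratively refine noise into text latents
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = denoiser(x, t, cond)                    # predict the injected noise
        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # estimate the clean latents
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps   # deterministic DDIM-style step to t-1

    logits = lm_head(x)                               # map latents to the vocabulary
    return logits.argmax(dim=-1)                      # all tokens are decoded in parallel
```

Starting the loop from different random seeds yields different samples for the same source, which is where the diversity of the outputs comes from.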

The CPD pre-training objective is central to GENIE's contribution. Rather than predicting individual masked tokens, the model learns to predict the noise added to an entire paragraph in continuous space, which encourages it to retain paragraph-level coherence. Because this pre-training task uses only large-scale unlabeled text corpora, the model can learn robust language patterns and structures without relying on manually annotated data.
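A minimal sketch of what such a paragraph-level denoising training step could look like in PyTorch is given below; the helper names (`encoder`, `denoiser`, `embed`, `alphas_cumprod`) are hypothetical placeholders rather than the actual GENIE code:

```python
# Hedged sketch of a continuous-paragraph-denoise training step.
# A paragraph is embedded into continuous latents, corrupted with Gaussian noise
# at a random diffusion step, and the decoder learns to predict that noise while
# conditioned on the encoder's representation of the surrounding document text.
import torch
import torch.nn.functional as F

def cpd_step(encoder, denoiser, embed, context_ids, para_ids, alphas_cumprod):
    cond = encoder(context_ids)                            # context, e.g. the rest of the document
    x0 = embed(para_ids)                                    # clean continuous paragraph latents
    t = torch.randint(0, alphas_cumprod.shape[0], (x0.size(0),), device=x0.device)
    a_t = alphas_cumprod[t].view(-1, 1, 1)                  # per-example noise level
    noise = torch.randn_like(x0)
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * noise        # forward diffusion: corrupt the paragraph
    pred = denoiser(x_t, t, cond)                           # decoder predicts the injected noise
    return F.mse_loss(pred, noise)                          # paragraph-level denoising loss
```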

Experimental Findings

GENIE's performance was evaluated on four widely used text generation benchmarks: XSum, CNN/DailyMail, Gigaword, and CommonGen. The experiments show that GENIE achieves performance comparable to state-of-the-art autoregressive models on these tasks while producing notably more diverse outputs. For example, GENIE reaches an overall ROUGE score of 33.2 on XSum, surpassing competing NAR and semi-NAR models. It also maintains high generation quality even as output diversity increases.

The paper also discusses the efficacy of pre-training, noting significant advancements as training progresses. The CPD approach substantially enhances performance, indicating its suitability for capturing intricate language features that aid in coherent, diverse output generation.

Implications and Future Directions

The research establishes diffusion models as a viable alternative to autoregressive text generation paradigms, challenging the assumption that such methods are unsuitable for complex language tasks due to slower convergence rates. GENIE's success suggests that diffusion-based frameworks can efficiently leverage large-scale data to produce rich and varied text, expanding the potential for text generation applications that benefit from diverse outputs, such as creative writing and scenario simulation.

Future research could explore optimizing the balance between generation diversity and coherence, potentially via hybrid models that combine the strengths of autoregressive and diffusion methodologies. Additionally, the observed sensitivity to the number of diffusion time steps underscores the need for strategies that maintain generation quality without sacrificing computational efficiency.

In conclusion, this paper advances the landscape of text generation by positioning diffusion LLMs as a competitive tool, particularly for applications demanding diverse text outputs. The introduction of CPD as a pre-training method signifies a critical step towards realizing the full potential of diffusion frameworks in natural language processing.

Authors (8)
  1. Zhenghao Lin (14 papers)
  2. Yeyun Gong (78 papers)
  3. Yelong Shen (83 papers)
  4. Tong Wu (228 papers)
  5. Zhihao Fan (28 papers)
  6. Chen Lin (75 papers)
  7. Nan Duan (172 papers)
  8. Weizhu Chen (128 papers)
Citations (47)