Overview of the SSD-LM Model: Advancing Text Generation Through Diffusion and Semi-autoregression
The paper introduces SSD-LM, a diffusion-based language model that addresses the challenges of applying diffusion processes to text generation. Recognizing the limitations of current diffusion models in discrete text domains, the work introduces two key innovations: semi-autoregressive generation and simplex-based diffusion.
SSD-LM departs from traditional autoregressive language models (AR-LMs) by iteratively generating blocks of text, allowing for greater flexibility in output length while retaining the benefits of autoregressive setups, such as context awareness. This approach strikes a balance between token-by-token autoregressive generation and non-autoregressive models, which produce entire sequences simultaneously and require predefined sequence lengths. Moreover, the model can refine tokens within a block, circumventing a principal drawback of conventional token-level autoregressive generation: earlier tokens cannot be modified once generated.
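To make the decoding loop concrete, here is a minimal sketch of semi-autoregressive block decoding; `denoise_block`, its signature, and the default block size are hypothetical placeholders for illustration, not the paper's exact API.

```python
# Minimal sketch of semi-autoregressive block decoding (illustrative only;
# `denoise_block` stands in for the paper's diffusion-based denoiser).

def generate(denoise_block, context_ids, num_blocks, block_size=25, num_steps=1000):
    """Decode `num_blocks` blocks of `block_size` tokens, left to right."""
    output = list(context_ids)
    for _ in range(num_blocks):
        # Each block starts from noise and is iteratively refined; every
        # diffusion step may revise any token within the block (bidirectional
        # refinement), unlike token-by-token autoregressive decoding.
        block = denoise_block(context=output, length=block_size, steps=num_steps)
        output.extend(block)  # the finished block becomes context for the next
    return output
```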
The model performs diffusion directly on the vocabulary simplex rather than in a learned latent space, enabling seamless integration of classifier guidance for controlled text generation. This design choice allows SSD-LM to leverage existing classifiers without adaptation, supporting modularity in controlled generation tasks.
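As a rough illustration of why operating on the vocabulary simplex makes off-the-shelf guidance easy, the sketch below nudges noisy logits along the gradient of an attribute classifier's log-probability. The function names, the softmax projection, and the guidance scale are assumptions made for this example; the paper's exact guidance formulation may differ.

```python
import torch

def guided_step(logits_t, classifier_logp, guidance_scale=2.0):
    """One hypothetical guidance update on noisy vocabulary logits.

    logits_t        : (seq_len, vocab_size) continuous logits at step t
    classifier_logp : callable mapping a (seq_len, vocab_size) distribution
                      to the log-probability of the target attribute
    """
    logits_t = logits_t.detach().requires_grad_(True)
    # Project the noisy logits onto the probability simplex so a standard
    # classifier can consume them as a soft mixture over the vocabulary.
    probs = torch.softmax(logits_t, dim=-1)
    classifier_logp(probs).backward()
    # Move the logits toward higher attribute probability before the next
    # denoising step; since everything lives in vocabulary space, the
    # classifier needs no retraining or architectural adaptation.
    return (logits_t + guidance_scale * logits_t.grad).detach()
```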
Key Findings and Implications
- Benchmarking Against AR-LMs: SSD-LM demonstrates performance superior or comparable to GPT-2 models on unconstrained text generation, achieving high scores on quality metrics such as MAUVE as well as diversity metrics, including Dist-n and the Zipf coefficient (a minimal Dist-n implementation follows this list).
- Flexibility and Modularity: The model excels at controlled text generation by integrating off-the-shelf sentiment classifiers without retraining, a feature that distinguishes it from earlier diffusion-based approaches, which must train custom classifiers because they operate on different input representations.
- Controlled Generation: On sentiment-guided generation tasks, SSD-LM achieves high target accuracy and balanced relevance and fluency compared to previous approaches that require customized architectures or control functions.
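For reference, Dist-n (the diversity metric cited above) is simply the fraction of n-grams in the generated text that are unique; a minimal implementation:

```python
def dist_n(tokens, n):
    """Distinct-n: ratio of unique n-grams to total n-grams."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

# A repetitive sequence scores low: 4 unique bigrams out of 7 total.
print(dist_n("the cat sat on the cat sat on".split(), 2))  # ~0.571
```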
Theoretical Contributions
The paper extends the theoretical framework of diffusion models by adapting simplex-based transformations to discrete text, innovating upon existing methods that typically rely on character- or byte-level encodings. The semi-autoregressive arrangement not only facilitates variable-length sequence generation but also retains the benefits of both the autoregressive and diffusion paradigms, producing coherent token blocks with bidirectional context within each block.
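To ground the simplex-based transformation in something concrete: the idea is to represent each discrete token as a near-one-hot logit vector over the vocabulary and run standard Gaussian forward diffusion on those continuous logits. The sketch below follows that idea, but the constant `k` and the noising interface are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def token_to_logits(token_ids, vocab_size, k=5.0):
    """Map discrete tokens to near-one-hot logits on the vocabulary simplex:
    +k at each token's index, -k everywhere else (k is illustrative)."""
    logits = torch.full((len(token_ids), vocab_size), -k)
    logits[torch.arange(len(token_ids)), token_ids] = k
    return logits

def forward_noise(logits_0, alpha_bar_t):
    """Gaussian forward diffusion applied directly to the logits:
    w_t = sqrt(a_t) * w_0 + sqrt(1 - a_t) * eps, with a_t in (0, 1)."""
    eps = torch.randn_like(logits_0)
    return (alpha_bar_t ** 0.5) * logits_0 + ((1 - alpha_bar_t) ** 0.5) * eps
```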
Future Directions
The paper suggests avenues for further research, such as improving sample efficiency and reducing the computational cost of decoding. Additionally, exploring more dynamic block-length configurations during both training and generation could yield even more adaptable text generation models. As diffusion models continue to advance in other domains such as images and audio, the innovations presented here open up possibilities for more organically controlled, flexible language models.
Overall, SSD-LM represents a significant evolution in applying diffusion processes to text generation, addressing previous limitations and offering practical benefits in modular and controlled generation scenarios. Future work could leverage these advancements to tackle complex language modeling tasks across diverse applications and languages.