SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control (arXiv:2210.17432v2)
Abstract: Despite the growing success of diffusion models in continuous-valued domains (e.g., images), similar efforts for discrete domains such as text have yet to match the performance of autoregressive language models. In this work, we present SSD-LM -- a diffusion-based language model with two key design choices. First, SSD-LM is semi-autoregressive, iteratively generating blocks of text, allowing for flexible output length at decoding time while enabling local bidirectional context updates. Second, it is simplex-based, performing diffusion on the natural vocabulary space rather than a learned latent space, allowing us to incorporate classifier guidance and modular control using off-the-shelf classifiers without any adaptation. We evaluate SSD-LM on unconstrained text generation benchmarks, and show that it matches or outperforms strong autoregressive GPT-2 models across standard quality and diversity metrics, while vastly outperforming diffusion-based baselines. On controlled text generation, SSD-LM also outperforms competitive baselines, with an extra advantage in modularity.
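To make the two design choices concrete, below is a minimal sketch (not the released SSD-LM implementation) of the decoding scheme the abstract describes: text is produced block by block, and within each block a diffusion loop runs in logit space over the vocabulary, optionally nudged by the gradient of an off-the-shelf classifier. The `denoiser` and `classifier` modules, the linear noise schedule, and all dimensions are hypothetical placeholders chosen only so the loop runs end to end.

```python
# Minimal sketch of semi-autoregressive, simplex/logit-space diffusion decoding
# with optional classifier guidance. Not the authors' code; all components are
# hypothetical stand-ins.
import torch

VOCAB, BLOCK, STEPS, N_BLOCKS = 100, 8, 20, 3

denoiser = torch.nn.Linear(VOCAB, VOCAB)   # stand-in for the diffusion LM
classifier = torch.nn.Linear(VOCAB, 1)     # stand-in for an attribute classifier


def generate_block(prefix_ids, guidance_weight=0.0):
    # A real denoiser would condition on prefix_ids (the already-decoded blocks);
    # this stand-in ignores them.
    x = torch.randn(BLOCK, VOCAB)          # start the block from noise in logit space
    for t in range(STEPS, 0, -1):
        x = x.detach().requires_grad_(True)
        logits = denoiser(x)               # predict (almost) clean vocabulary logits
        if guidance_weight > 0.0:
            # Classifier guidance on the simplex: push the predicted logits in the
            # direction that raises the classifier's score for the desired attribute.
            probs = torch.softmax(logits, dim=-1)
            score = classifier(probs).sum()          # stand-in for log p(attribute | block)
            grad, = torch.autograd.grad(score, x)
            logits = logits + guidance_weight * grad
        noise_scale = (t - 1) / STEPS      # simple linear schedule (an assumption)
        x = logits.detach() + noise_scale * torch.randn_like(logits)
    return x.argmax(dim=-1)                # project back to discrete tokens


def generate(n_blocks=N_BLOCKS, guidance_weight=0.0):
    ids = []
    for _ in range(n_blocks):              # semi-autoregressive: one block at a time,
        block = generate_block(ids, guidance_weight)  # each refined bidirectionally inside
        ids.extend(block.tolist())
    return ids


print(generate(guidance_weight=1.0))
```

Because the guidance term only requires gradients of a classifier applied to vocabulary-simplex inputs, any off-the-shelf classifier over the same vocabulary can be plugged in without retraining, which is the source of the modularity claimed in the abstract.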
Authors: Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov