TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings (2402.19097v3)
Abstract: This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate contextual information. In our approach, we also employ a transformer-based decoder, specifically designed to incorporate context in the token prediction process. We conduct a comprehensive examination of the influence of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation. Furthermore, we compare TEncDM with previous approaches on three conditional text generation tasks: paraphrase generation (QQP), summarization (XSum), and text simplification (Wiki-Auto). The results show that TEncDM exhibits superior performance compared to existing non-autoregressive diffusion models. Our code is available at https://github.com/M0RJIQUE/tencdm.
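The pipeline the abstract describes (frozen encoder producing contextual latents, a forward noising process governed by a scheduler, and an x0-predicting denoiser with self-conditioning) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the `encode` stub, the cosine schedule, and the trivial `denoiser` mixer are all placeholder assumptions standing in for a real pre-trained LM encoder, TEncDM's actual schedule, and a transformer denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, DIM = 8, 16

def encode(token_ids):
    """Stand-in for a frozen pre-trained LM encoder: a fixed lookup table.
    A real encoder would return context-dependent vectors per token."""
    table = np.random.default_rng(42).standard_normal((100, DIM))
    return table[token_ids]  # (SEQ_LEN, DIM)

def alpha_bar(t):
    """Cosine-style noise schedule, t in [0, 1] (an assumed choice)."""
    return np.cos(0.5 * np.pi * t) ** 2

def q_sample(x0, t, noise):
    """Forward diffusion: z_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    ab = alpha_bar(t)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise

def denoiser(z_t, t, x0_prev):
    """Toy x0-prediction 'network' taking a self-conditioning input x0_prev.
    A real model would be a transformer conditioned on t."""
    return 0.5 * z_t + 0.5 * x0_prev

# One training step (loss only; no gradients in this toy).
x0 = encode(rng.integers(0, 100, size=SEQ_LEN))   # clean encodings
t = rng.uniform()                                  # random timestep
eps = rng.standard_normal(x0.shape)                # Gaussian noise
z_t = q_sample(x0, t, eps)                         # noised latents

# Self-conditioning: first pass receives zeros, second pass reuses the estimate.
x0_hat = denoiser(z_t, t, np.zeros_like(x0))
x0_hat = denoiser(z_t, t, x0_hat)

loss = np.mean((x0_hat - x0) ** 2)  # x0-prediction objective
```

At inference, the estimate from the denoiser would be fed to the trained decoder, which maps latents back to tokens while attending to neighboring positions.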
- Structured denoising diffusion models in discrete state-spaces. In Advances in Neural Information Processing Systems, volume 34, pages 17981–17993. Curran Associates, Inc.
- Improving image generation with better captions.
- Stable video diffusion: Scaling latent video diffusion models to large datasets.
- Analog bits: Generating discrete data using diffusion models with self-conditioning.
- Quora question pairs.
- BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Fast timing-conditioned latent audio diffusion.
- Difformer: Empowering diffusion models on the embedding space for text generation.
- Mask-Predict: Parallel decoding of conditional masked language models. arXiv preprint arXiv:1904.09324.
- DiffuSeq: Sequence to sequence text generation with diffusion models. In The Eleventh International Conference on Learning Representations.
- Non-autoregressive neural machine translation. arXiv preprint arXiv:1711.02281.
- Levenshtein transformer. Advances in Neural Information Processing Systems, 32.
- SSD-LM: Semi-autoregressive simplex-based diffusion language model for text generation and modular control. arXiv preprint arXiv:2210.17432.
- Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851.
- Simple diffusion: end-to-end diffusion for high resolution images. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. OpenReview.net.
- Argmax flows and multinomial diffusion: Learning categorical distributions. In Advances in Neural Information Processing Systems, volume 34, pages 12454–12465. Curran Associates, Inc.
- Shankar Kumar and Bill Byrne. 2004. Minimum Bayes-risk decoding for statistical machine translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, pages 169–176.
- Deterministic non-autoregressive neural sequence modeling by iterative refinement. arXiv preprint arXiv:1802.06901.
- Diffusion-LM improves controllable text generation. arXiv preprint arXiv:2205.14217.
- Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81.
- Text generation with diffusion language models: a pre-training approach with continuous paragraph denoise. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org.
- Latent diffusion for language generation. arXiv preprint arXiv:2212.09462.
- On distillation of guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14297–14306.
- A corpus and cloze evaluation for deeper understanding of commonsense stories. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 839–849, San Diego, California. Association for Computational Linguistics.
- Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807, Brussels, Belgium. Association for Computational Linguistics.
- OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- MAUVE: Measuring the gap between neural text and human text using divergence frontiers. In Advances in Neural Information Processing Systems, volume 34, pages 4816–4828. Curran Associates, Inc.
- BANG: Bridging autoregressive and non-autoregressive generation with large scale pretraining. In International Conference on Machine Learning, pages 8630–8639. PMLR.
- Language models are unsupervised multitask learners.
- Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695.
- Score-based generative modeling through stochastic differential equations. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net.
- Self-conditioned embedding diffusion for text generation.
- A contrastive framework for neural text generation. In Advances in Neural Information Processing Systems.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
- AR-Diffusion: Auto-regressive diffusion model for text generation. In Thirty-seventh Conference on Neural Information Processing Systems.
- DINOISER: Diffused conditional sequence learning by manipulating noises. arXiv preprint arXiv:2302.10025.
- SeqDiffuSeq: Text diffusion with encoder-decoder transformers. arXiv preprint arXiv:2212.10325.
- BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.
- PLANNER: Generating diversified paragraph via latent language diffusion model. In Thirty-seventh Conference on Neural Information Processing Systems.