TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings (2402.19097v3)

Published 29 Feb 2024 in cs.CL

Abstract: This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate contextual information. In our approach, we also employ a transformer-based decoder, specifically designed to incorporate context in the token prediction process. We conduct a comprehensive examination of the influence of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation. Furthermore, we compare TEncDM with previous approaches on three conditional text generation tasks: QQP, XSum, and Wiki-Auto. The results show that TEncDM exhibits superior performance compared to existing non-autoregressive diffusion models. Our code is available at https://github.com/M0RJIQUE/tencdm.
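
To make the pipeline described in the abstract concrete, below is a minimal, illustrative PyTorch sketch of diffusion in language-model encoding space with self-conditioning. It is based only on the abstract: the x0-prediction objective, the linear beta schedule, the 50% self-conditioning rate, the toy encoder stand-in, and all module names and sizes (DenoisingTransformer, training_step, dim=768, depth=6) are assumptions for illustration, not the authors' implementation; the actual code is in the linked GitHub repository.

```python
# Illustrative sketch only: diffusion over contextual encodings with self-conditioning.
# Hyperparameters, schedule, and architecture are assumptions, not TEncDM's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenoisingTransformer(nn.Module):
    """Predicts the clean encoding x0 from a noisy latent z_t, a timestep,
    and a self-conditioning estimate of x0 (all sizes are illustrative)."""

    def __init__(self, dim=768, depth=6, heads=12):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        self.in_proj = nn.Linear(2 * dim + 1, dim)   # concat of [z_t, x0_prev, t]
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, z_t, t, x0_prev):
        # Broadcast the normalised timestep over the sequence dimension.
        t_feat = t.view(-1, 1, 1).expand(-1, z_t.size(1), 1)
        h = self.in_proj(torch.cat([z_t, x0_prev, t_feat], dim=-1))
        return self.out_proj(self.backbone(h))


def training_step(encoder, denoiser, tokens, alphas_cumprod):
    """One x0-prediction diffusion step in encoding space with self-conditioning."""
    with torch.no_grad():
        x0 = encoder(tokens)                           # contextual encodings, not raw embeddings
    T = alphas_cumprod.size(0)
    t = torch.randint(0, T, (x0.size(0),))
    a = alphas_cumprod[t].view(-1, 1, 1)
    noise = torch.randn_like(x0)
    z_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise     # forward process q(z_t | x0)

    # Self-conditioning: half the time, condition a second forward pass
    # on the model's own detached first estimate of x0.
    x0_prev = torch.zeros_like(x0)
    if torch.rand(()).item() < 0.5:
        with torch.no_grad():
            x0_prev = denoiser(z_t, t.float() / T, x0_prev)
    x0_hat = denoiser(z_t, t.float() / T, x0_prev)
    return F.mse_loss(x0_hat, x0)


if __name__ == "__main__":
    # Toy stand-in for a frozen pre-trained encoder: an embedding table plus one
    # transformer layer (a real run would use e.g. a frozen BERT encoder instead).
    dim, vocab, seq_len, batch = 768, 1000, 32, 4
    embed = nn.Embedding(vocab, dim)
    ctx = nn.TransformerEncoderLayer(d_model=dim, nhead=12, batch_first=True)
    encoder = lambda ids: ctx(embed(ids))

    denoiser = DenoisingTransformer(dim=dim)
    betas = torch.linspace(1e-4, 0.02, 1000)           # assumed linear beta schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    tokens = torch.randint(0, vocab, (batch, seq_len))
    loss = training_step(encoder, denoiser, tokens, alphas_cumprod)
    print(f"diffusion loss: {loss.item():.4f}")
```

In the full model described by the abstract, the encoder would be a frozen pre-trained language model producing contextual encodings, and generation would run the reverse diffusion process from Gaussian noise before a context-aware transformer decoder maps the denoised encodings back to tokens.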
