
Text Diffusion with Reinforced Conditioning (2402.14843v1)

Published 19 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio. Because they refine outputs iteratively, they hold strong potential for better non-autoregressive sequence generation. However, existing text diffusion models still underperform, largely because of the difficulty of handling the discreteness of language. This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training, and misalignment between training and sampling. Motivated by these findings, we propose a novel text diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment with Time-Aware Variance Scaling. Extensive experiments demonstrate the competitiveness of TREC against autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows its ability to fully exploit the diffusion process when refining samples.
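To make the two training-side ideas in the abstract concrete, below is a minimal, runnable sketch of (a) self-conditioning in an embedding-space text diffusion model and (b) a REINFORCE-style sequence-level reward applied to the denoiser's prediction. This illustrates the general recipe only, not the TREC implementation: the `Denoiser` architecture, the token-match reward, the noise schedule, and all hyperparameters are hypothetical.

```python
# Illustrative sketch only -- NOT the TREC method from the paper.
# Assumes a continuous embedding-space text diffusion model with
# self-conditioning; the reward and all sizes below are toy choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

T_STEPS, VOCAB, DIM, SEQ = 1000, 100, 64, 16

class Denoiser(nn.Module):
    """Predicts clean embeddings x0 from (x_t, t, previous x0 estimate)."""
    def __init__(self):
        super().__init__()
        self.time_emb = nn.Embedding(T_STEPS, 2 * DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * DIM, 4 * DIM), nn.GELU(), nn.Linear(4 * DIM, DIM)
        )

    def forward(self, x_t, t, x0_est):
        # Concatenate the noisy input with the self-conditioning estimate.
        h = torch.cat([x_t, x0_est], dim=-1) + self.time_emb(t)[:, None, :]
        return self.net(h)

def reinforce_loss(pred_emb, emb_table, target_ids, baseline=0.0):
    # REINFORCE with a constant baseline: treat similarities to the
    # embedding table as token logits, sample tokens, and reward
    # agreement with the reference sequence (a toy stand-in reward).
    dist = torch.distributions.Categorical(logits=pred_emb @ emb_table.T)
    sample = dist.sample()                                 # (B, L)
    reward = (sample == target_ids).float().mean(dim=-1)  # (B,)
    logp = dist.log_prob(sample).mean(dim=-1)              # (B,)
    return -((reward - baseline) * logp).mean()

def training_step(model, emb_table, target_ids, alphas_bar, rl_weight=0.1):
    x0 = emb_table[target_ids]                             # (B, L, DIM)
    t = torch.randint(0, T_STEPS, (x0.size(0),))
    a = alphas_bar[t][:, None, None]
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)

    # Self-conditioning: half the time, run the model once with a zero
    # estimate and feed its detached output back as the estimate.
    x0_est = torch.zeros_like(x0)
    if torch.rand(()).item() < 0.5:
        with torch.no_grad():
            x0_est = model(x_t, t, x0_est)

    pred = model(x_t, t, x0_est)
    recon = F.mse_loss(pred, x0)
    return recon + rl_weight * reinforce_loss(pred, emb_table, target_ids)

# Toy usage: one optimisation step on random data.
model = Denoiser()
emb_table = nn.Parameter(torch.randn(VOCAB, DIM))
alphas_bar = torch.linspace(0.999, 0.01, T_STEPS)  # toy noise schedule
target_ids = torch.randint(0, VOCAB, (4, SEQ))
opt = torch.optim.Adam(list(model.parameters()) + [emb_table], lr=1e-4)
loss = training_step(model, emb_table, target_ids, alphas_bar)
loss.backward()
opt.step()
print(f"toy loss: {loss.item():.4f}")
```

In the paper itself, the reinforcement signal presumably targets the quality of the self-conditioning estimate across diffusion steps; the constant-baseline token-match reward above is just the simplest stand-in that makes the REINFORCE mechanics explicit.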
