DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space (2404.06760v1)
Abstract: In real-life conversations, content is diverse, and the one-to-many problem arises: a single context admits many valid responses, so diverse generation is required. Previous studies introduced discrete or Gaussian-based continuous latent variables to address this problem, but the diversity they achieve is limited. Recently, diffusion models have made breakthroughs in computer vision, and some attempts have been made to apply them in natural language processing. In this paper, we propose DiffusionDialog, a novel approach that enhances the diversity of dialog generation with the help of a diffusion model. In our approach, we introduce continuous latent variables into the diffusion model. The challenge of using latent variables in the dialog task is how to build both an effective prior over the latent space and an inference process that obtains the proper latent variable given the context. By combining an encoder with a latent-based diffusion model, we encode the response's latent representation in a continuous space as the prior, instead of a fixed Gaussian distribution or simple discrete variables, and then infer the latent by denoising step by step with the diffusion model. Experimental results show that our model greatly enhances the diversity of dialog responses while maintaining coherence. Further analysis shows that our diffusion model achieves high inference efficiency, which is the main challenge of applying diffusion models to natural language processing.
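The core mechanism the abstract describes, sampling a response latent by iterative denoising conditioned on the dialog context, can be sketched in a few lines of PyTorch. The sketch below is a minimal illustration using a standard DDPM reverse step; the `Denoiser` module, `sample_latent` routine, and all dimensions and hyperparameters are hypothetical stand-ins for illustration, not the paper's actual architecture.

```python
# Minimal sketch of latent-space diffusion for dialog response generation.
# All names, dimensions, and the epsilon-prediction denoiser are assumptions,
# not DiffusionDialog's exact design.
import torch
import torch.nn as nn

LATENT_DIM, HIDDEN_DIM, T_STEPS = 64, 256, 50

class Denoiser(nn.Module):
    """Predicts the noise in a latent z_t, conditioned on the dialog context."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + HIDDEN_DIM + 1, HIDDEN_DIM),
            nn.SiLU(),
            nn.Linear(HIDDEN_DIM, LATENT_DIM),
        )

    def forward(self, z_t, ctx, t):
        # The timestep is normalized to [0, 1] and appended as a feature.
        t_feat = t.float().view(-1, 1) / T_STEPS
        return self.net(torch.cat([z_t, ctx, t_feat], dim=-1))

@torch.no_grad()
def sample_latent(denoiser, ctx, betas):
    """Reverse diffusion: start from Gaussian noise, denoise step by step."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(ctx.size(0), LATENT_DIM)          # z_T ~ N(0, I)
    for t in reversed(range(T_STEPS)):
        t_batch = torch.full((ctx.size(0),), t)
        eps = denoiser(z, ctx, t_batch)               # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (z - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = mean + torch.sqrt(betas[t]) * noise       # z_{t-1}
    return z                                          # response latent z_0

# Usage: a context encoding (e.g., from a pretrained encoder such as BART's)
# drives the sampling; the resulting latent would then condition the decoder
# that generates the response tokens.
ctx = torch.randn(4, HIDDEN_DIM)                      # stand-in for encoder output
betas = torch.linspace(1e-4, 0.02, T_STEPS)
z0 = sample_latent(Denoiser(), ctx, betas)
print(z0.shape)                                       # torch.Size([4, 64])
```

Because the diffusion runs over a low-dimensional latent rather than over token sequences, each denoising step is cheap, which is consistent with the abstract's claim of high inference efficiency.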
Authors: Jianxiang Xiang, Zhenhua Liu, Haodong Liu, Yin Bai, Jia Cheng, Wenliang Chen