DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models (2310.05793v2)
Abstract: Diffusion models have gained prominence in generating high-quality text sequences. Nevertheless, current approaches predominantly represent discrete text within a continuous diffusion space, which incurs substantial computational overhead during training and results in slower sampling. In this paper, we introduce a soft absorbing state that helps the diffusion model learn to reconstruct discrete mutations on top of the underlying Gaussian space, thereby enhancing its capacity to recover conditional signals. During sampling, we employ state-of-the-art ODE solvers in the continuous space to expedite generation. Comprehensive experimental evaluations show that our proposed method accelerates training convergence by 4x and generates samples of similar quality 800x faster, bringing it significantly closer to practical application. \footnote{The code is released at \url{https://github.com/Shark-NLP/DiffuSeq}.}
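The soft-absorbing-state idea described above can be illustrated with a minimal sketch: before applying the usual Gaussian forward process to the token embeddings, a random subset of positions is replaced by a shared (learnable) absorbing-state embedding, so the denoiser must jointly undo both the continuous noise and the discrete mutations. All names here (`forward_diffuse_with_soft_absorb`, `p_absorb`, the tensor shapes) are illustrative assumptions, not the authors' actual implementation.

```python
import torch

def forward_diffuse_with_soft_absorb(x0_emb, mask_emb, t, alphas_cumprod, p_absorb=0.1):
    """Hypothetical forward step combining discrete and continuous corruption.

    x0_emb:         (batch, seq_len, dim) token embeddings of the clean sequence
    mask_emb:       (dim,) learnable embedding of the soft absorbing state
    t:              (batch,) integer diffusion timesteps
    alphas_cumprod: (T,) cumulative noise schedule, as in standard DDPMs
    p_absorb:       probability of absorbing each token (an assumed hyperparameter)
    """
    # Discrete mutation: replace a random subset of token embeddings
    # with the shared absorbing-state embedding.
    absorbed = torch.rand(x0_emb.shape[:2], device=x0_emb.device) < p_absorb  # (B, L)
    x0_mut = torch.where(absorbed.unsqueeze(-1), mask_emb.expand_as(x0_emb), x0_emb)

    # Continuous corruption: the usual Gaussian forward process
    # q(x_t | x_0) = N(sqrt(a_bar) * x_0, (1 - a_bar) I) on the mutated embeddings.
    a_bar = alphas_cumprod[t].view(-1, 1, 1)  # (B, 1, 1) for broadcasting
    noise = torch.randn_like(x0_mut)
    xt = a_bar.sqrt() * x0_mut + (1.0 - a_bar).sqrt() * noise
    return xt, absorbed
```

In this sketch the denoising network would be trained to predict the clean embeddings from `xt`, implicitly learning to fill in the absorbed positions; because the state space remains continuous, fast ODE solvers such as DPM-Solver can still be applied at sampling time.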