Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows (2403.16995v1)
Abstract: Recent works have demonstrated success in controlling sentence attributes (e.g., sentiment) and structure (e.g., syntactic structure) with diffusion language models. A key component driving the impressive performance in generating high-quality samples from noise is iterative denoising over thousands of steps. While beneficial, the complexity of starting from pure noise and the large number of denoising steps have limited its adoption in many real-world NLP applications. This paper proposes Language Rectified Flow. Our method is based on a reformulation of standard probabilistic flow models. Language Rectified Flow learns (neural) ordinary differential equation models to transport between the source distribution and the target distribution, hence providing a unified and effective solution to generative modeling and domain transfer. Starting from the source distribution, our language rectified flow yields fast simulation and effectively decreases inference time. Experiments on three challenging fine-grained control tasks and multiple high-quality text editing tasks show that our method consistently outperforms its baselines. Extensive experiments and ablation studies demonstrate that our method is general, effective, and beneficial for many NLP tasks.
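To make the mechanism concrete, below is a minimal PyTorch sketch of rectified-flow training and few-step ODE sampling over continuous latent vectors. It is illustrative only and not the paper's implementation: the velocity network, the interpolation-based loss, the Euler sampler, and all names and hyperparameters (VelocityNet, rectified_flow_loss, sample, hidden sizes, step counts, the toy data) are assumptions introduced here for exposition.

```python
# Minimal sketch of rectified flow: regress a velocity field onto straight-line
# displacements between source and target samples, then integrate the learned
# ODE with a handful of Euler steps. All names/hyperparameters are illustrative.
import torch
import torch.nn as nn


class VelocityNet(nn.Module):
    """Toy velocity field v(x_t, t) over continuous (e.g., sentence-latent) vectors."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on time by concatenating the scalar t to each feature vector.
        return self.net(torch.cat([x_t, t], dim=-1))


def rectified_flow_loss(model: nn.Module, x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    """Match the velocity to the constant displacement x1 - x0 along the linear path."""
    t = torch.rand(x0.shape[0], 1)          # t ~ U(0, 1)
    x_t = (1.0 - t) * x0 + t * x1           # linear interpolation between the couplings
    target = x1 - x0                        # "straight" velocity along the path
    return ((model(x_t, t) - target) ** 2).mean()


@torch.no_grad()
def sample(model: nn.Module, x0: torch.Tensor, steps: int = 10) -> torch.Tensor:
    """Euler integration of dx/dt = v(x, t); few steps suffice when paths are near-straight."""
    x, dt = x0.clone(), 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0], 1), i * dt)
        x = x + dt * model(x, t)
    return x


# Usage sketch: transport source samples toward a (stand-in) target distribution.
dim = 32
model = VelocityNet(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    x0 = torch.randn(64, dim)               # source samples (e.g., noise or source-domain latents)
    x1 = torch.randn(64, dim) + 2.0         # stand-in for target-distribution latents
    loss = rectified_flow_loss(model, x0, x1)
    opt.zero_grad()
    loss.backward()
    opt.step()
x_gen = sample(model, torch.randn(8, dim))  # few-step generation from the source distribution
```

Because the regression target is the straight-line displacement x1 - x0, the learned ODE trajectories tend to be nearly straight, which is what permits sampling with very few Euler steps and accounts for the reduced inference time claimed in the abstract.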
- Shujian Zhang
- Lemeng Wu
- Chengyue Gong
- Xingchao Liu