D-Flow: Differentiating through Flows for Controlled Generation (2402.14017v2)
Abstract: Taming the generation outcome of state-of-the-art Diffusion and Flow-Matching (FM) models without re-training a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general. In this work we introduce D-Flow, a simple framework for controlling the generation process by differentiating through the flow and optimizing for the source (noise) point. We motivate this framework with our key observation: for Diffusion/FM models trained with Gaussian probability paths, differentiating through the generation process projects the gradient onto the data manifold, implicitly injecting the prior into the optimization. We validate our framework on linear and non-linear controlled generation problems, including image and audio inverse problems and conditional molecule generation, reaching state-of-the-art performance on all of them.
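The core mechanism described above can be sketched in a few lines: integrate the flow ODE from a source point with plain autograd enabled, score the generated sample with a task cost, and backpropagate through the whole trajectory to update the source point. The sketch below is a minimal, self-contained illustration under stated assumptions: `ToyVelocityField` is a hypothetical stand-in for a pretrained velocity network, the Euler integrator and the squared-error `cost` are illustrative choices, and none of the names correspond to the paper's actual implementation.

```python
import torch

class ToyVelocityField(torch.nn.Module):
    """Hypothetical stand-in for a pretrained FM velocity field v(x_t, t)."""

    def __init__(self, dim=2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 32),
            torch.nn.Tanh(),
            torch.nn.Linear(32, dim),
        )

    def forward(self, x, t):
        # Broadcast the scalar time over the batch and concatenate it.
        t_col = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_col], dim=-1))

def generate(v, x0, steps=20):
    """Euler integration of dx/dt = v(x, t) from t=0 to t=1.

    Autograd records the whole trajectory, so gradients of any loss on
    the output flow back to the source point x0."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)
        x = x + dt * v(x, t)
    return x

def d_flow(v, cost, dim=2, iters=50, lr=0.1, seed=0):
    """Optimize the source (noise) point so that the generated sample
    minimizes a user-supplied cost, differentiating through the flow."""
    torch.manual_seed(seed)
    x0 = torch.randn(1, dim, requires_grad=True)
    opt = torch.optim.Adam([x0], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = cost(generate(v, x0))
        loss.backward()
        opt.step()
    return x0.detach(), generate(v, x0.detach())

if __name__ == "__main__":
    v = ToyVelocityField()
    target = torch.tensor([[1.0, -1.0]])
    # e.g. the data-fidelity term of a linear inverse problem
    cost = lambda x: ((x - target) ** 2).sum()
    x0, x1 = d_flow(v, cost)
```

Because the gradient is pushed back through the generation map, the update to the noise point is shaped by the trained flow, which is the sense in which the optimization is implicitly regularized by the model's prior; for large models one would typically use a memory-efficient adjoint ODE solver rather than naive backpropagation through every step.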
Authors: Heli Ben-Hamu, Omri Puny, Itai Gat, Brian Karrer, Uriel Singer, Yaron Lipman