
D-Flow: Differentiating through Flows for Controlled Generation (2402.14017v2)

Published 21 Feb 2024 in cs.LG

Abstract: Taming the generation outcome of state-of-the-art Diffusion and Flow-Matching (FM) models without having to re-train a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general. In this work we introduce D-Flow, a simple framework for controlling the generation process by differentiating through the flow, optimizing for the source (noise) point. We motivate this framework by our key observation that for Diffusion/FM models trained with Gaussian probability paths, differentiating through the generation process projects the gradient onto the data manifold, implicitly injecting the prior into the optimization process. We validate our framework on linear and non-linear controlled generation problems including image and audio inverse problems and conditional molecule generation, reaching state-of-the-art performance across all tasks.

Authors (6)
  1. Heli Ben-Hamu
  2. Omri Puny
  3. Itai Gat
  4. Brian Karrer
  5. Uriel Singer
  6. Yaron Lipman

Summary

Differentiating through Flows: Advancing Controlled Generation with Pre-trained Models

Overview

Controlled generation is pivotal for many applications of generative models, from the design of new molecules to image and audio editing, where the model output must align with specific requirements or conditions. This paper introduces D-Flow, a framework that controls the outputs of pre-trained generative models without re-training them or imposing constraints on their architecture. The core innovation is to manipulate the generative process of Diffusion and Flow-Matching models by differentiating with respect to the initial noise vector, channeling optimization through the generative flow.

Theoretical Foundation

At the heart of D-Flow is the observation that for Diffusion and Flow-Matching models trained with Gaussian probability paths, differentiating a loss function through the generation process projects the gradient onto the data manifold, introducing an implicit bias. This insight motivates a general algorithm that optimizes an arbitrary cost function directly over the source noise vector, steering the generated output toward desired characteristics.
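In flow-matching notation, the setup can be written as source-point optimization through the flow map. (The symbols below are a paraphrase consistent with standard FM notation, not verbatim from the paper.)

```latex
% Controlled generation as optimization over the source (noise) point x_0:
% \psi_t is the flow of the pretrained velocity field v_t, and \mathcal{L}
% is an arbitrary task cost (e.g., a data-fidelity term for an inverse problem).
\min_{x_0} \; \mathcal{L}\bigl(\psi_1(x_0)\bigr),
\qquad
\psi_1(x_0) = x_0 + \int_0^1 v_t\bigl(\psi_t(x_0)\bigr)\,dt .
```

The gradient of this objective is computed by backpropagating through the numerical ODE solve, which is what injects the model's prior into the optimization.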

D-Flow's ability to operate on general flow models, and the implicit bias it exhibits across varied controlled generation tasks, stand out. Moreover, theoretical analysis elucidates the implicit regularization induced by differentiating through the flow, showing how it keeps outputs close to the target data manifold.

Implementation Insights

Practical application of D-Flow involves choices of initialization, ODE solver, and optimization procedure. The authors highlight the torchdiffeq package for solving the ODE, gradient checkpointing to bound memory, and the LBFGS optimizer with line search. While the method incurs longer generation runtimes than the baselines, its simplicity and adaptability across domains justify its use.
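The optimization loop described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's implementation: the velocity field, the target `y`, and the fixed-step Euler integrator are toy stand-ins (the paper uses pre-trained models and torchdiffeq solvers), but the pattern of backpropagating through the full ODE solve into the noise point, with `torch.optim.LBFGS` and strong-Wolfe line search, matches the described setup.

```python
import torch

# Toy velocity field standing in for a pre-trained flow model:
# v(t, x) = -x drives samples toward the origin.
def velocity(t, x):
    return -x

def generate(x0, steps=20):
    # Euler integration of dx/dt = v(t, x) from t=0 to t=1.
    # (The paper uses torchdiffeq solvers; Euler keeps the sketch short.)
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(i * dt, x)
    return x

# Hypothetical control target: steer the generated sample toward y.
y = torch.tensor([1.0, -2.0])
x0 = torch.zeros(2, requires_grad=True)  # source (noise) point being optimized

opt = torch.optim.LBFGS([x0], line_search_fn="strong_wolfe")

def closure():
    opt.zero_grad()
    loss = ((generate(x0) - y) ** 2).sum()
    loss.backward()  # gradient flows through the entire ODE solve into x0
    return loss

for _ in range(5):
    opt.step(closure)
# After optimization, generate(x0) is close to y.
```

For real models, gradient checkpointing (or torchdiffeq's adjoint solver) keeps the memory of this backward pass manageable, since naive backpropagation stores every solver step.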

Empirical Validation

D-Flow's effectiveness is demonstrated across several domains, covering linear and non-linear controlled generation problems. The tasks include inverse problems on images and audio and conditional molecule generation, on which D-Flow achieves state-of-the-art performance. This breadth underscores the framework's versatility and the effectiveness of source-point optimization for controlled generation.

Comparative Analysis

D-Flow's advantages over existing techniques are clearest in non-linear setups and in conditional molecule generation. Quantitative comparisons against other state-of-the-art methods show consistent gains, most notably on controlled molecule generation metrics.

Future Directions and Limitations

While D-Flow presents a robust framework for controlled generation, runtime remains a limitation that warrants further exploration. Future work may seek computational efficiencies or alternative strategies that exploit the implicit bias at lower cost. Expanding the framework's applicability and probing its theoretical boundaries are further avenues for research.

Concluding Remarks

D-Flow marks a significant step forward in controlled generation, providing a flexible framework that leverages the strengths of pre-trained generative models. Its theoretical grounding, combined with empirical validation across varied domains, positions it to influence future developments in generative modeling.