Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control (2402.15194v2)

Published 23 Feb 2024 in cs.LG, cs.AI, and stat.ML

Abstract: Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins. While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other properties, such as the aesthetic quality of the generated images or the functional properties of generated proteins. Diffusion models can be finetuned in a goal-directed way by maximizing the value of some reward function (e.g., the aesthetic quality of an image). However, these approaches may lead to reduced sample diversity, significant deviations from the training data distribution, and even poor sample quality due to the exploitation of an imperfect reward function. The last issue often occurs when the reward function is a learned model meant to approximate a ground-truth "genuine" reward, as is the case in many practical applications. These challenges, collectively termed "reward collapse," pose a substantial obstacle. To address this reward collapse, we frame the finetuning problem as entropy-regularized control against the pretrained diffusion model, i.e., directly optimizing entropy-enhanced rewards with neural SDEs. We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards, mitigating the overoptimization of imperfect reward models.

Authors (9)
  1. Masatoshi Uehara (49 papers)
  2. Yulai Zhao (13 papers)
  3. Kevin Black (29 papers)
  4. Ehsan Hajiramezanali (27 papers)
  5. Gabriele Scalia (22 papers)
  6. Nathaniel Lee Diamant (2 papers)
  7. Alex M Tseng (9 papers)
  8. Tommaso Biancalani (39 papers)
  9. Sergey Levine (531 papers)
Citations (26)

Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control

The paper presents a strategy for fine-tuning continuous-time diffusion models so that they generate diverse samples with high reward values while mitigating the overoptimization commonly referred to as "reward collapse." The approach frames fine-tuning as an entropy-regularized control problem in which both the drift term and the initial distribution of the diffusion model are optimized.

Summary

Diffusion models are widely recognized for their ability to model intricate data distributions, such as those of natural images and biological sequences. While these models capture the training distribution well, the practical objective is often to adapt them to specific tasks, such as optimizing aesthetic quality in images or bioactivity in biological sequences. Existing fine-tuning techniques typically rely on reinforcement learning (RL) or on direct backpropagation through learned reward functions that approximate the genuine reward. However, these techniques are prone to reward collapse: by overfitting to an imperfect reward model, the fine-tuned model produces a narrow set of samples that score highly under the learned reward but not necessarily under the genuine one.

To tackle these challenges, the authors present a framework that models the fine-tuning of diffusion models as entropy-regularized stochastic optimal control. This involves leveraging neural stochastic differential equations (SDEs) to optimize for both genuine reward and diversity by maintaining a close alignment with the pre-trained data distribution. The objective function is designed to maximize expected rewards while incorporating KL divergence as a penalty to prevent excessive deviation from the pre-trained model's distribution.
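As a rough illustration of this objective, the following PyTorch sketch (our own hypothetical example, not the authors' implementation) simulates a fine-tuned neural SDE with Euler-Maruyama, accumulates a pathwise KL penalty against the frozen pre-trained drift using the standard Girsanov identity for SDEs sharing a diffusion coefficient, and backpropagates the reward-minus-penalty objective through the trajectory. The network architecture, toy reward, and hyperparameters are placeholders.

```python
# Hypothetical sketch of entropy-regularized fine-tuning of a continuous-time
# diffusion model; illustrative only, not the paper's implementation.
import torch
import torch.nn as nn

DIM, T_STEPS, DT, ALPHA = 16, 50, 0.02, 0.1   # toy sizes; ALPHA weights the KL penalty

class Drift(nn.Module):
    """Small time-conditioned drift network b(x, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))

def sigma(t):
    return 1.0  # constant diffusion coefficient, shared by both SDEs

def reward(x):
    return -(x ** 2).sum(dim=-1)  # placeholder for a learned reward model

drift_pre = Drift(DIM)                          # frozen pre-trained drift
drift_ft = Drift(DIM)                           # fine-tuned drift, initialized from the pre-trained one
drift_ft.load_state_dict(drift_pre.state_dict())
for p in drift_pre.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(drift_ft.parameters(), lr=1e-4)

for step in range(100):
    x = torch.randn(64, DIM)                    # initial distribution (could also be learned)
    kl = torch.zeros(64)
    for k in range(T_STEPS):                    # Euler-Maruyama rollout of the fine-tuned SDE
        t = torch.tensor([k * DT])
        b_ft, b_pre = drift_ft(x, t), drift_pre(x, t)
        # Girsanov: KL between path measures accumulates ||b_ft - b_pre||^2 / (2 sigma^2) dt
        kl = kl + ((b_ft - b_pre) ** 2).sum(dim=-1) / (2.0 * sigma(t) ** 2) * DT
        x = x + b_ft * DT + sigma(t) * (DT ** 0.5) * torch.randn_like(x)
    # Entropy-regularized objective: maximize E[reward] - ALPHA * KL(fine-tuned || pre-trained)
    loss = -(reward(x) - ALPHA * kl).mean()
    opt.zero_grad()
    loss.backward()                             # direct backpropagation through the neural SDE
    opt.step()
```

A practical implementation would likely rely on memory-efficient gradient techniques for neural SDEs rather than naive backpropagation through every simulation step; the sketch is only meant to convey the shape of the objective.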

Theoretical Contributions

The paper's theoretical analysis centers on the dual goals of maximizing reward and preserving diversity through entropy regularization. This is operationalized in two stages of optimization: (1) deriving and learning the optimal value function, and (2) solving a stochastic control problem to determine the optimal drift and initial condition. Notably, the authors demonstrate that the proposed method inherently preserves the pre-trained model's bridges (posterior distributions conditioned on terminal points), thereby confining sample generation to plausible regions of the data space.
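To spell out the bridge-preservation argument in our own notation (the symbols α, P_pre, and r are introduced here; this is a sketch of the standard derivation, not a restatement of the paper's proofs), the entropy-regularized problem

\[
\max_{P}\; \mathbb{E}_{P}\big[r(x_T)\big] \;-\; \alpha\,\mathrm{KL}\big(P \,\|\, P_{\mathrm{pre}}\big)
\]

is solved by an exponential reweighting of the pre-trained path measure that depends only on the terminal point,

\[
\frac{dP^{\star}}{dP_{\mathrm{pre}}} \;\propto\; \exp\!\big(r(x_T)/\alpha\big),
\qquad
p^{\star}(x_T) \;\propto\; p_{\mathrm{pre}}(x_T)\,\exp\!\big(r(x_T)/\alpha\big).
\]

Because the reweighting is a function of \(x_T\) alone, the conditional laws agree, \(P^{\star}(\cdot \mid x_T) = P_{\mathrm{pre}}(\cdot \mid x_T)\): fine-tuning reshapes only the terminal marginal, which is why samples stay within plausible regions of the pre-trained model's support.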

Empirical Validation

Empirical analysis across several domains, including image generation tasks scored by aesthetic quality and biological sequence generation on the GFP and TFBind datasets, showcases the effectiveness of the entropy-regularized approach. Notably, ELEGANT, the method proposed in this paper, surpasses existing baselines in achieving higher rewards while maintaining diversity, with a marked reduction in the symptoms of reward collapse. Not only does ELEGANT achieve rewards comparable to state-of-the-art guidance- or PPO-based approaches, it substantially mitigates reward collapse by sustaining a lower KL divergence from the pre-trained model and greater sample diversity.

Implications and Future Directions

The implications of this research are multi-faceted. Practically, the method could enhance the utility of diffusion models across creative and scientific domains by allowing them to adapt efficiently to different reward functions without compromising diversity or realism. Theoretically, it broadens the understanding of entropy-regularized control in model fine-tuning, potentially complementing RL paradigms. Future work might extend this framework to other generative models and further elucidate the role of entropy regularization in aligning models with complex, domain-specific reward signals.

In summary, this paper introduces an insightful framework for the entropy-regularized fine-tuning of diffusion models, combining rigorous theoretical development with compelling empirical results to address prevalent issues in generative model adaptation.
