Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond

Published 10 Mar 2024 in math.OC and cs.LG | arXiv:2403.06279v2

Abstract: This paper develops a rigorous treatment of entropy-regularized fine-tuning in the context of continuous-time diffusion models, a problem recently proposed by Uehara et al. (arXiv:2402.15194, 2024). The idea is to use stochastic control for sample generation, with the entropy regularizer introduced to mitigate reward collapse. We also show how the analysis can be extended to fine-tuning with a general $f$-divergence regularizer.
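For orientation, here is a minimal sketch of the entropy-regularized control formulation in the spirit of Uehara et al. (arXiv:2402.15194); the notation ($b$, $\sigma$, $u$, $r$, $\lambda$, $\nu$, $\mathbb{P}^{\mathrm{pre}}$) is illustrative and not taken verbatim from the paper. The pretrained generative SDE is perturbed by a feedback control $u$, and one trades off the terminal reward against the KL divergence of the controlled path measure from the pretrained one:

$$ dX^u_t = \bigl(b(X^u_t, t) + \sigma(t)\,u(X^u_t, t)\bigr)\,dt + \sigma(t)\,dW_t, \qquad X^u_0 \sim \nu, $$
$$ \max_{u}\ \mathbb{E}\bigl[r(X^u_T)\bigr] \;-\; \lambda\,\mathrm{KL}\bigl(\mathbb{P}^u \,\|\, \mathbb{P}^{\mathrm{pre}}\bigr), \qquad \mathrm{KL}\bigl(\mathbb{P}^u \,\|\, \mathbb{P}^{\mathrm{pre}}\bigr) = \tfrac{1}{2}\,\mathbb{E}\int_0^T \bigl|u(X^u_t, t)\bigr|^2\,dt, $$

where the last identity is the standard Girsanov computation. Sending $\lambda \to 0$ recovers pure reward maximization, which is prone to reward collapse; the $f$-divergence extension replaces $\mathrm{KL}(\mathbb{P}^u \,\|\, \mathbb{P}^{\mathrm{pre}})$ with $D_f(\mathbb{P}^u \,\|\, \mathbb{P}^{\mathrm{pre}})$ for a convex $f$ with $f(1) = 0$.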

References (40)
  1. B. D. O. Anderson. Reverse-time diffusion equation models. Stochastic Process. Appl., 12(3):313–326, 1982.
  2. Training diffusion models with reinforcement learning. In ICLR, 2024.
  3. S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, Cambridge, 2004.
  4. N. Chen and P. Glasserman. Malliavin Greeks without Malliavin calculus. Stochastic Process. Appl., 117(11):1689–1723, 2007.
  5. Neural ordinary differential equations. In NeurIPS, volume 31, pages 6572–6583, 2018.
  6. Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. In ICLR, 2023.
  7. Diffusion Schrödinger bridge with applications to score-based generative modeling. In NeurIPS, volume 34, pages 17695–17709, 2021.
  8. P. Dhariwal and A. Nichol. Diffusion models beat GANs on image synthesis. In NeurIPS, volume 34, pages 8780–8794, 2021.
  9. Asymptotic evaluation of certain Markov process expectations for large time. IV. Comm. Pure Appl. Math., 36(2):183–212, 1983.
  10. Y. Fan and K. Lee. Optimizing DDPM sampling with shortcut fine-tuning. 2023. arXiv:2301.13362.
  11. DPOK: Reinforcement learning for fine-tuning text-to-image diffusion models. In NeurIPS, 2023.
  12. Applications of Malliavin calculus to Monte Carlo methods in finance. Finance Stoch., 3(4):391–412, 1999.
  13. E. Gobet and R. Munos. Sensitivity analysis using Itô-Malliavin calculus and martingales, and application to stochastic optimal control. SIAM J. Control Optim., 43(5):1676–1713, 2005.
  14. U. G. Haussmann and E. Pardoux. Time reversal of diffusions. Ann. Probab., 14(4):1188–1205, 1986.
  15. Denoising diffusion probabilistic models. In NeurIPS, volume 33, pages 6840–6851, 2020.
  16. A. Hyvärinen. Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res., 6:695–709, 2005.
  17. I. Karatzas and S. E. Shreve. Brownian motion and stochastic calculus, volume 113 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1991.
  18. Efficient and accurate gradients for neural SDEs. In NeurIPS, volume 34, pages 18747–18761, 2021.
  19. DiffWave: A versatile diffusion model for audio synthesis. In ICLR, 2021.
  20. N. V. Krylov. Nonlinear elliptic and parabolic equations of the second order, volume 7 of Mathematics and its Applications (Soviet Series). D. Reidel Publishing Co., Dordrecht, 1987.
  21. Stable bias: Analyzing societal representations in diffusion models. In NeurIPS, volume 36, 2023.
  22. OpenAI. Sora: Creating video from text. 2024. Available at https://openai.com/sora.
  23. Y. Polyanskiy and Y. Wu. Information Theory: From Coding to Learning. 2023. Available at http://www.stat.yale.edu/~yw562/teaching/itbook-export.pdf.
  24. Direct preference optimization: Your language model is secretly a reward model. In NeurIPS, volume 36, 2023.
  25. Hierarchical text-conditional image generation with CLIP latents. 2022. arXiv:2204.06125.
  26. High-resolution image synthesis with latent diffusion models. In CVPR, pages 10684–10695, 2022.
  27. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, volume 32, pages 2256–2265, 2015.
  28. Sliced score matching: A scalable approach to density and score estimation. In UAI, volume 35, pages 574–584, 2020.
  29. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.
  30. Reward collapse in aligning large language models. 2023. arXiv:2305.17608.
  31. Multidimensional diffusion processes, volume 233 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1979.
  32. Exploratory HJB equations and their convergence. SIAM J. Control Optim., 60(6):3191–3216, 2022.
  33. W. Tang and H. Zhao. Contractive diffusion probabilistic models. 2024. arXiv:2401.13115.
  34. W. Tang and H. Zhao. Score-based diffusion models via stochastic differential equations–a technical tutorial. 2024. arXiv:2402.07487.
  35. M. Uehara et al. Fine-tuning of continuous-time diffusion models as entropy-regularized control. 2024. arXiv:2402.15194.
  36. P. Vincent. A connection between score matching and denoising autoencoders. Neural Comput., 23(7):1661–1674, 2011.
  37. Diffusion model alignment using direct preference optimization. 2023. arXiv:2311.12908.
  38. Beyond reverse KL: Generalizing direct preference optimization with diverse divergence constraints. In ICLR, 2024.
  39. J. Yong and X. Y. Zhou. Stochastic controls, volume 43 of Applications of Mathematics (New York). Springer-Verlag, New York, 1999. Hamiltonian systems and HJB equations.
  40. Score as action: tuning diffusion models by continuous reinforcement learning. 2024+. In preparation.