Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control (2402.15194v2)
Abstract: Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins. While diffusion models are trained to represent the distribution of the training dataset, we are often more concerned with other properties, such as the aesthetic quality of generated images or the functional properties of generated proteins. Diffusion models can be fine-tuned in a goal-directed way by maximizing the value of some reward function (e.g., the aesthetic quality of an image). However, these approaches may lead to reduced sample diversity, significant deviation from the training data distribution, and even poor sample quality due to exploitation of an imperfect reward function. The last issue often arises when the reward function is a learned model meant to approximate a ground-truth "genuine" reward, as is the case in many practical applications. These challenges, collectively termed "reward collapse," pose a substantial obstacle. To address reward collapse, we frame the fine-tuning problem as entropy-regularized control against the pretrained diffusion model, i.e., directly optimizing entropy-enhanced rewards with neural SDEs. We present theoretical and empirical evidence that our framework efficiently generates diverse samples with high genuine rewards, mitigating the overoptimization of imperfect reward models.
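The objective described in the abstract can be written out explicitly. In standard control-theoretic notation (ours here, not necessarily the paper's exact symbols): if the pretrained model is the SDE $dx_t = b(x_t, t)\,dt + \sigma(t)\,dw_t$ on $[0, T]$ and fine-tuning adds a learned control drift $u$, the entropy-regularized control problem is

$$\max_{u}\;\mathbb{E}\big[r(x_T)\big]-\alpha\,\mathrm{KL}\big(\mathbb{P}^{u}\,\big\|\,\mathbb{P}^{\mathrm{pre}}\big)\;=\;\max_{u}\;\mathbb{E}\left[r(x_T)-\frac{\alpha}{2}\int_0^T\frac{\|u(x_t,t)\|^2}{\sigma(t)^2}\,dt\right],$$

where the equality between the path-measure KL term and the quadratic running cost follows from Girsanov's theorem.

Below is a minimal, self-contained PyTorch sketch of this idea, not the authors' implementation: it simulates an Euler-Maruyama discretization of the controlled SDE, accumulates the Girsanov running cost as the KL penalty, and backpropagates the terminal reward through the trajectory, neural-SDE style. The pretrained drift `b_pre`, the reward `reward`, and the control network `u_theta` are all toy stand-ins introduced for illustration.

```python
# Hedged sketch: entropy-regularized fine-tuning of a diffusion SDE.
# b_pre, reward, and u_theta are toy placeholders, not the paper's models.
import torch
import torch.nn as nn

torch.manual_seed(0)
dim, n_steps, T, sigma, alpha = 2, 50, 1.0, 1.0, 0.1
dt = T / n_steps

def b_pre(x, t):
    # Stand-in for the pretrained score-based drift (an OU-like pull toward 0).
    return -x

def reward(x):
    # Stand-in for a differentiable (possibly learned) reward model.
    return -((x - torch.tensor([2.0, 0.0])) ** 2).sum(dim=-1)

# Small MLP control u(x, t); input is state concatenated with time.
u_theta = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))
opt = torch.optim.Adam(u_theta.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(128, dim)          # samples from the initial noise distribution
    kl_cost = torch.zeros(128)         # Girsanov running cost = KL to pretrained paths
    for k in range(n_steps):
        t = torch.full((128, 1), k * dt)
        u = u_theta(torch.cat([x, t], dim=-1))
        kl_cost = kl_cost + (u ** 2).sum(dim=-1) / (2 * sigma ** 2) * dt
        # Euler-Maruyama step of the controlled SDE dx = (b + u) dt + sigma dW.
        x = x + (b_pre(x, k * dt) + u) * dt + sigma * (dt ** 0.5) * torch.randn_like(x)
    # Maximize reward minus alpha * KL, i.e., minimize the negated objective.
    loss = -(reward(x) - alpha * kl_cost).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key design point the objective encodes: the control is penalized for steering away from the pretrained dynamics, so samples stay close to the pretrained distribution (preserving diversity and sample quality) while the terminal reward is pushed up, which is what counteracts overoptimization of an imperfect reward model.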
Authors: Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, Sergey Levine