Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

Published 12 Sep 2024 in cs.LG and cs.AI | (2409.08400v1)

Abstract: Reinforcement Learning from human feedback (RLHF) has been shown a promising direction for aligning generative models with human intent and has also been explored in recent works for alignment of diffusion generative models. In this work, we provide a rigorous treatment by formulating the task of fine-tuning diffusion models, with reward functions learned from human feedback, as an exploratory continuous-time stochastic control problem. Our key idea lies in treating the score-matching functions as controls/actions, and upon this, we develop a unified framework from a continuous-time perspective, to employ reinforcement learning (RL) algorithms in terms of improving the generation quality of diffusion models. We also develop the corresponding continuous-time RL theory for policy optimization and regularization under assumptions of stochastic different equations driven environment. Experiments on the text-to-image (T2I) generation will be reported in the accompanied paper.

Abstract PDF Upgrade to Chat

Citations (1)

View on Semantic Scholar

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (5)

Collections

Tweets

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research

Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (5)

Collections

Tweets

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research