Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods (2502.01384v2)

Published 3 Feb 2025 in stat.ML, cs.AI, cs.CL, and cs.LG

Abstract: Discrete diffusion models have recently gained significant attention due to their ability to process complex discrete structures for language modeling. However, fine-tuning these models with policy gradient methods, as is commonly done in Reinforcement Learning from Human Feedback (RLHF), remains a challenging task. We propose an efficient, broadly applicable, and theoretically justified policy gradient algorithm, called Score Entropy Policy Optimization (SEPO), for fine-tuning discrete diffusion models over non-differentiable rewards. Our numerical experiments across several discrete generative tasks demonstrate the scalability and efficiency of our method. Our code is available at https://github.com/ozekri/SEPO.
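To make the problem setting concrete, here is a minimal sketch of policy gradient fine-tuning against a non-differentiable reward, the setting the abstract describes. This is a generic REINFORCE-style update with a mean baseline, not the SEPO algorithm itself; the toy autoregressive model, the reward function, and all hyperparameters below are invented for illustration and stand in for a discrete generative model and a black-box (e.g., human-preference) reward.

```python
# Hypothetical sketch: REINFORCE-style fine-tuning with a non-differentiable
# reward. Not the paper's SEPO method; all names here are illustrative.
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, BATCH = 16, 12, 32

class ToyAutoregressiveModel(nn.Module):
    """Tiny next-token model standing in for a discrete generative model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, 32)  # +1 for a BOS token
        self.rnn = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h[:, -1])  # logits for the next token

def reward_fn(tokens):
    # Black-box, non-differentiable reward: fraction of even tokens.
    return (tokens % 2 == 0).float().mean(dim=1)

model = ToyAutoregressiveModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    # Sample sequences token by token, accumulating log-probabilities.
    tokens = torch.full((BATCH, 1), VOCAB, dtype=torch.long)  # BOS
    log_prob = torch.zeros(BATCH)
    for _ in range(SEQ_LEN):
        dist = torch.distributions.Categorical(logits=model(tokens))
        nxt = dist.sample()
        log_prob = log_prob + dist.log_prob(nxt)
        tokens = torch.cat([tokens, nxt.unsqueeze(1)], dim=1)

    # No gradient flows through the reward: it is treated as a black box.
    with torch.no_grad():
        adv = reward_fn(tokens[:, 1:])  # drop BOS before scoring
        adv = adv - adv.mean()          # mean baseline reduces variance

    # REINFORCE: ascend E[advantage * log p(sequence)].
    loss = -(adv * log_prob).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the RLHF setting the abstract refers to, the black-box reward would come from a preference model and the sampler would be a discrete diffusion model rather than this autoregressive stand-in; the challenge SEPO addresses is making such policy gradient updates efficient and theoretically sound in that diffusion setting.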

Authors (2)
  1. Oussama Zekri (4 papers)
  2. Nicolas Boullé (32 papers)
