Curriculum Direct Preference Optimization for Diffusion and Consistency Models (2405.13637v2)

Published 22 May 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Direct Preference Optimization (DPO) has been proposed as an effective and efficient alternative to reinforcement learning from human feedback (RLHF). In this paper, we propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our method is divided into two training stages. First, a ranking of the examples generated for each prompt is obtained by employing a reward model. Then, increasingly difficult pairs of examples are sampled and provided to a text-to-image generative (diffusion or consistency) model. Generated samples that are far apart in the ranking are considered to form easy pairs, while those that are close in the ranking form hard pairs. In other words, we use the rank difference between samples as a measure of difficulty. The sampled pairs are split into batches according to their difficulty levels, which are gradually used to train the generative model. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on three benchmarks, outperforming the competing methods in terms of text alignment, aesthetics and human preference. Our code is available at https://anonymous.4open.science/r/Curriculum-DPO-EE14.
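
A minimal sketch of the rank-difference curriculum described in the abstract is given below. This is an illustrative Python outline, not the authors' released code: the function name build_curriculum_batches, the quantile-based difficulty bucketing, and all parameter values are assumptions made for exposition. It only shows how reward-model scores for one prompt's generated samples might be turned into preference pairs ordered from easy (large rank gap) to hard (small rank gap).

import numpy as np

def build_curriculum_batches(rewards, num_levels=3, pairs_per_level=64, seed=0):
    # rewards: reward-model scores for the samples generated from one prompt.
    # Returns a list of batches of (preferred_index, rejected_index) pairs,
    # ordered from the easiest difficulty level to the hardest.
    rng = np.random.default_rng(seed)
    order = np.argsort(-np.asarray(rewards))  # indices sorted best -> worst
    n = len(order)
    if n < 2:
        return []

    # Enumerate ordered pairs (higher-ranked sample wins) with their rank gap.
    pairs = [(order[i], order[j], j - i)
             for i in range(n) for j in range(i + 1, n)]

    # A larger rank gap means an easier pair; bucket gaps into difficulty levels.
    gaps = np.array([g for _, _, g in pairs])
    edges = np.quantile(gaps, np.linspace(0.0, 1.0, num_levels + 1))

    batches = []
    # Walk from the largest-gap (easiest) bucket down to the smallest-gap (hardest).
    for level in reversed(range(num_levels)):
        lo, hi = edges[level], edges[level + 1]
        bucket = [(w, l) for w, l, g in pairs if lo <= g <= hi]
        if bucket:
            take = min(pairs_per_level, len(bucket))
            idx = rng.choice(len(bucket), size=take, replace=False)
            batches.append([bucket[i] for i in idx])
    return batches

In the paper's setup, each batch would then be passed to the DPO objective for the diffusion or consistency model in order, so training sees easy pairs before hard ones; the bucketing above is only one plausible way to realize such a schedule.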

Authors (5)
  1. Florinel-Alin Croitoru (10 papers)
  2. Vlad Hondru (8 papers)
  3. Radu Tudor Ionescu (103 papers)
  4. Nicu Sebe (270 papers)
  5. Mubarak Shah (208 papers)
Citations (1)
