
Class-Conditional self-reward mechanism for improved Text-to-Image models (2405.13473v2)

Published 22 May 2024 in cs.CV and cs.AI

Abstract: Self-rewarding models have recently emerged as a powerful tool in NLP, allowing LLMs to generate high-quality, relevant responses by providing their own rewards during training. This technique addresses the limitations of methods that rely on human preferences. In this paper, we build on the concept of self-rewarding models and introduce its vision equivalent for text-to-image generative AI models. The approach works by fine-tuning a diffusion model on a self-generated, self-judged dataset, making fine-tuning more automated and improving data quality. The proposed mechanism makes use of other pre-trained models, such as open-vocabulary object detection and image captioning, and is conditioned on a set of object classes for which the user wants to improve the quality of generated data. The approach has been implemented, fine-tuned, and evaluated on Stable Diffusion, yielding performance evaluated to be at least 60% better than existing commercial and research text-to-image models. Additionally, the self-rewarding mechanism enables fully automated image generation while increasing the visual quality of the generated images and improving adherence to prompt instructions. The code used in this work is freely available at https://github.com/safouaneelg/SRT2I.
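
The abstract describes a loop in which the diffusion model generates its own candidates, pre-trained judges (an open-vocabulary detector and an image captioner) score them against a user-chosen set of object classes, and the best-scoring prompt-image pairs are kept for fine-tuning. The following is a minimal Python sketch of that loop, not the paper's actual implementation (see the linked repository for that): it assumes Hugging Face diffusers for generation, and detect_objects, caption_image, and self_reward are hypothetical placeholders.

    # Sketch of a class-conditional self-rewarding loop. The judge helpers below are
    # placeholders: substitute a real open-vocabulary detector and captioning model.
    import torch
    from diffusers import StableDiffusionPipeline

    def detect_objects(image, vocabulary):
        # Placeholder: a real implementation would run an open-vocabulary detector
        # restricted to `vocabulary` and return (label, confidence) pairs.
        return []

    def caption_image(image):
        # Placeholder: a real implementation would run an image-captioning model.
        return ""

    def self_reward(image, prompt, target_classes):
        # Reward = presence of the target classes + rough prompt/caption agreement.
        labels = {label for label, _ in detect_objects(image, vocabulary=target_classes)}
        class_score = len(labels & set(target_classes)) / len(target_classes)
        caption = caption_image(image)
        prompt_score = float(prompt.lower() in caption.lower())  # crude faithfulness proxy
        return 0.5 * class_score + 0.5 * prompt_score

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    target_classes = ["dog", "bicycle"]              # classes the user wants to improve
    prompts = ["a dog riding a bicycle in a park"]   # self-generated or user-supplied prompts

    # 1) Self-generate candidates, 2) self-judge them, 3) keep the best image per prompt.
    dataset = []
    for prompt in prompts:
        candidates = [pipe(prompt).images[0] for _ in range(4)]
        best = max(candidates, key=lambda img: self_reward(img, prompt, target_classes))
        dataset.append({"prompt": prompt, "image": best})

    # `dataset` would then feed a standard diffusion fine-tuning step (e.g. LoRA),
    # which is omitted here.

Ranking several candidates per prompt and keeping only the top-scoring one is what makes the dataset "self-judged"; the usefulness of the resulting fine-tuning data depends entirely on the detector and captioner used as judges.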

