
Aligning Text-to-Image Models using Human Feedback (2302.12192v1)

Published 23 Feb 2023 in cs.LG, cs.AI, and cs.CV

Abstract: Deep generative models have shown impressive results in text-to-image synthesis. However, current text-to-image models often generate images that are inadequately aligned with text prompts. We propose a fine-tuning method for aligning such models using human feedback, comprising three stages. First, we collect human feedback assessing model output alignment from a set of diverse text prompts. We then use the human-labeled image-text dataset to train a reward function that predicts human feedback. Lastly, the text-to-image model is fine-tuned by maximizing reward-weighted likelihood to improve image-text alignment. Our method generates objects with specified colors, counts and backgrounds more accurately than the pre-trained model. We also analyze several design choices and find that careful investigations on such design choices are important in balancing the alignment-fidelity tradeoffs. Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.

Aligning Text-to-Image Models using Human Feedback

This paper introduces a methodology for improving text-to-image models by incorporating human feedback, focusing on the alignment of generated images with their text prompts, a known weakness of current models. While deep generative models have achieved notable success in text-to-image synthesis, they still often produce misaligned outputs, particularly for specific attributes such as color, quantity, and context. The proposed approach, inspired by reinforcement learning from human feedback (RLHF), is a structured three-stage process that culminates in improved image-text alignment.

Methodology Overview

The paper articulates a fine-tuning process for text-to-image models, leveraging human feedback to guide improvements:

  1. Data Collection and Feedback: The initial stage involves generating a diverse set of images from varied text prompts using a pre-trained text-to-image model, specifically Stable Diffusion v1.5. These are then subjected to human evaluation, providing binary feedback on alignment quality.
  2. Reward Function Construction: Using the feedback-labeled dataset, a reward function is trained to predict the likelihood of human approval for given image-text pairs. This step includes an auxiliary task of identifying the correct prompt amidst perturbations, enhancing the model's capability to generalize across unseen prompts and images.
  3. Fine-Tuning the Model: The text-to-image model is refined by maximizing a reward-weighted likelihood, updating the model according to the learned reward function while balancing alignment accuracy against image fidelity (a sketch of this objective follows the list).
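As a rough illustration of stage 3, the sketch below assumes a PyTorch-style setup in which `model.per_example_nll` returns a per-example surrogate for the negative log-likelihood (the denoising loss for a diffusion model) and `reward_model` is the frozen reward function from stage 2. These names are placeholders for illustration, not the authors' code.

```python
import torch

def reward_weighted_loss(model, reward_model, model_batch, pretrain_batch, beta=1.0):
    """Reward-weighted likelihood objective with pre-training regularization (sketch)."""
    images, prompts = model_batch             # model-generated images with their prompts
    pre_images, pre_prompts = pretrain_batch  # samples from the original pre-training data

    with torch.no_grad():
        # Frozen reward model: predicted probability of human approval (stage 2 output).
        rewards = reward_model(images, prompts)            # shape (B,)

    # Per-example surrogate for -log p(image | prompt); for a diffusion model
    # this is the usual noise-prediction loss.
    nll = model.per_example_nll(images, prompts)           # shape (B,)
    nll_pre = model.per_example_nll(pre_images, pre_prompts)

    # The reward-weighted term pushes toward better-aligned outputs; the
    # beta-weighted term on pre-training data guards against losing image fidelity.
    return (rewards * nll).mean() + beta * nll_pre.mean()
```

The regularization term over pre-training data is the lever the paper's design-choice analysis uses to trade alignment gains against preserved image fidelity.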

Empirical Results

The evaluation shows a marked improvement in image-text alignment after fine-tuning, particularly for prompts that test specific attributes such as color and object count. Human evaluation quantifies this as a 47% gain in alignment, at the cost of a minor reduction in image fidelity. Moreover, the learned reward function predicts human preferences more accurately than CLIP scores, confirming its agreement with human judgment.

The paper also explores the application of rejection sampling based on the reward function as an inference-time improvement technique, achieving notable gains in alignment without additional model training.
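A minimal sketch of this best-of-n style rejection sampling, assuming a `generate(prompt)` callable for the text-to-image model and the learned `reward_model` from stage 2 (both placeholder names):

```python
import torch

def best_of_n(generate, reward_model, prompt, n=16):
    """Sample n candidate images and return the one the reward model scores highest."""
    images = [generate(prompt) for _ in range(n)]
    with torch.no_grad():
        scores = torch.stack([reward_model(img, prompt) for img in images])
    return images[int(scores.argmax())]
```

Because only the final selection depends on the reward model, this improves alignment purely at inference time, at the cost of generating n candidates per prompt.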

Implications and Future Directions

This work offers a practical framework for improving multi-modal models with direct human feedback. It points toward more adaptable models that better integrate nuanced human assessments, with possible applications in more subjective domains such as artistic or abstract generation.

Future research could explore feedback mechanisms richer than binary labels, such as fine-grained annotations or rankings, to capture a broader spectrum of human preferences. Expanding the diversity and scale of the human-labeled datasets used for fine-tuning could further improve alignment and mitigate the fidelity trade-off.

In conclusion, the paper underscores the potential of human feedback to improve text-to-image alignment through iterative, human-guided refinement.

Authors (9)
  1. Kimin Lee (69 papers)
  2. Hao Liu (497 papers)
  3. Olivia Watkins (13 papers)
  4. Yuqing Du (28 papers)
  5. Craig Boutilier (78 papers)
  6. Pieter Abbeel (372 papers)
  7. Mohammad Ghavamzadeh (97 papers)
  8. Shixiang Shane Gu (34 papers)
  9. MoonKyung Ryu (9 papers)
Citations (206)