
Develop multi-dimensional reward models for RL in native image generation

Develop reward models for reinforcement learning in native image generation that effectively capture and balance multiple dimensions, including image quality, instruction following, and alignment with human preferences, to enable reliable optimization of generation policies.


Background

Within the paper’s reinforcement learning section, the authors compare applying RL to the autoregressive versus the diffusion component and argue that algorithmic choices (e.g., GRPO, Flow-GRPO) are less of a bottleneck than the design of reward signals. They divide rewards into verifiable metrics (e.g., GenEval for object composition and OCR-based checks for text rendering) and model-based preference metrics (e.g., PickScore, HPSv2.1, ImageReward), noting that each captures only part of what matters.

They conclude that the core unresolved issue is constructing reward models that jointly and faithfully assess image quality, adherence to textual instructions, and human preference. Solving this would better guide RL optimization and improve the reliability and alignment of native image generation models.
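As a concrete illustration of why this is hard, the sketch below shows the kind of hand-built composite reward that RL pipelines fall back on today: per-dimension scores (instruction following, preference, etc.) are normalized within a rollout group, as in GRPO-style advantage estimation, and combined with fixed weights. This is not the paper’s method; the `MultiDimReward` class, the `Scorer` signature, and the weights are illustrative assumptions, and the real scorers (GenEval, OCR checks, PickScore, HPSv2.1, ImageReward) are left as placeholder callables rather than actual API calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical scorer signature: maps (prompt, image) -> scalar score.
# In practice each scorer would wrap a verifiable metric (GenEval, OCR)
# or a preference model (PickScore, HPSv2.1, ImageReward).
Scorer = Callable[[str, Any], float]


@dataclass
class MultiDimReward:
    """Sketch of a multi-dimensional reward (assumption, not the paper's method):
    each dimension is scored per image, normalized across the rollout group,
    and combined with fixed weights into one scalar reward per image."""

    scorers: Dict[str, Scorer]   # e.g. {"instruction": ..., "preference": ...}
    weights: Dict[str, float]    # relative importance of each dimension

    def __call__(self, prompt: str, images: List[Any]) -> List[float]:
        # Score every image in the group on every dimension.
        per_dim = {name: [score(prompt, img) for img in images]
                   for name, score in self.scorers.items()}
        rewards = [0.0] * len(images)
        for name, scores in per_dim.items():
            # Normalize within the group so no dimension dominates
            # purely because of its numeric range.
            mean = sum(scores) / len(scores)
            var = sum((x - mean) ** 2 for x in scores) / max(len(scores) - 1, 1)
            std = var ** 0.5 or 1.0  # fall back to 1.0 when all scores are equal
            for i, x in enumerate(scores):
                rewards[i] += self.weights[name] * (x - mean) / std
        return rewards


# Toy usage with placeholder scorers (deterministic stand-ins, not real models).
reward_fn = MultiDimReward(
    scorers={
        "instruction": lambda p, img: float(sum(map(ord, str(img))) % 10),
        "preference":  lambda p, img: float(len(str(img)) % 7),
    },
    weights={"instruction": 0.6, "preference": 0.4},
)
print(reward_fn("a red cube on a blue sphere", ["sample_a", "sample_b", "sample_c"]))
```

The fixed linear weighting here is exactly the heuristic the open problem aims to replace: a reward model that jointly assesses quality, instruction following, and human preference would supply the combined signal directly rather than relying on hand-tuned weights over partial metrics.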

References

The key open problem is to develop reward models that can effectively capture and balance multiple dimensions, including image quality, instruction following, and human preference alignment.

BLIP3o-NEXT: Next Frontier of Native Image Generation (arXiv:2510.15857, Chen et al., 17 Oct 2025), Finding box in the Discussion, Section 3.5 (Image Generation with Reinforcement Learning)