Develop multi-dimensional reward models for RL in native image generation
Develop reward models for reinforcement learning (RL) in native image generation that capture and balance multiple dimensions, including image quality, instruction following, and alignment with human preferences, so that generation policies can be optimized reliably.
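To make "balance multiple dimensions" concrete, here is a minimal sketch of one simple baseline: a weighted sum of per-dimension scores, each standardized within a group of samples generated for the same prompt. It is not the method of the cited paper. The names `RewardScores`, `zscore`, and `combine_rewards`, the three score fields, and the weights are all hypothetical, and the per-dimension scores are assumed to come from separate scorers that are not shown.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical dimension names; each score is assumed to come from its own scorer.
DIMENSIONS = ("quality", "instruction_following", "human_preference")


@dataclass(frozen=True)
class RewardScores:
    """Raw per-dimension scores for one generated image (illustrative only)."""
    quality: float
    instruction_following: float
    human_preference: float


def zscore(values):
    """Standardize scores within a group so dimensions share a common scale."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A dimension that does not vary carries no ranking signal in this group.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def combine_rewards(group, weights):
    """Return one scalar reward per sample as a weighted mean of z-scored dimensions.

    `group` holds the scores of several images sampled for the same prompt.
    `weights` maps each dimension name to a non-negative weight (an assumption).
    """
    normalized = {d: zscore([getattr(s, d) for s in group]) for d in DIMENSIONS}
    total = sum(weights[d] for d in DIMENSIONS)
    return [
        sum(weights[d] * normalized[d][i] for d in DIMENSIONS) / total
        for i in range(len(group))
    ]


group = [
    RewardScores(quality=0.9, instruction_following=0.9, human_preference=0.9),
    RewardScores(quality=0.5, instruction_following=0.5, human_preference=0.5),
    RewardScores(quality=0.1, instruction_following=0.1, human_preference=0.1),
]
weights = {"quality": 1.0, "instruction_following": 2.0, "human_preference": 1.0}
rewards = combine_rewards(group, weights)
```

A fixed weighted sum like this is easy to optimize against but hides the trade-offs the open problem is about: a policy can raise the scalar by exploiting one scorer while the other dimensions degrade, and the choice of weights is itself unprincipled. Learned or adaptive ways of balancing the dimensions are what remains open.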
References
The key open problem is to develop reward models that can effectively capture and balance multiple dimensions, including image quality, instruction following, and human preference alignment.
— BLIP3o-NEXT: Next Frontier of Native Image Generation
(arXiv:2510.15857, Chen et al., 17 Oct 2025), Finding box in Discussion, Section 3.5 (Image Generation with Reinforcement Learning)