Does RL post-training in MLLMs truly leverage visual information?
Determine whether reinforcement learning–based post-training for Multimodal Large Language Models (such as Qwen2.5-VL) genuinely enables the models to learn from and utilize the visual information in their training inputs, rather than primarily reinforcing internal text-based reasoning patterns.
References
Although many studies have reported improved performance, it remains unclear whether RL training truly enables models to learn from visual information.
— Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
(2604.03179 - Zhang et al., 3 Apr 2026) in Abstract (page 1)