Upper bound of OpenMMReasoner performance under further scaling
Determine the upper bound of multimodal reasoning performance achievable by further scaling the OpenMMReasoner training recipe, which unifies supervised fine-tuning (SFT) and reinforcement learning (RL). Specifically, ascertain how far models initialized from Qwen2.5‑VL‑Instruct and trained with the OpenMMReasoner SFT (874k samples) and RL (74k samples) pipelines can be pushed as data volume, answer-trace diversity, and RL training scale increase.
References
Additionally, although we explore scaling strategies in both SFT and RL stages, we have not yet identified the upper bound of model performance under further scaling, leaving open the question of how far the current recipe can be pushed.
— OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
(arXiv:2511.16334, Zhang et al., 20 Nov 2025), in "Limitation and Future Work"