Robustness of Video-R4 across diverse domains and larger model scales
Determine how robust Video-R4 remains when applied to domains beyond M4-ViteVQA and related text-centric datasets, and when its backbone is scaled beyond 7B parameters.
References
Third, our training data are primarily derived from M4-ViteVQA and a few related text-centric datasets, and experiments are conducted on a 7B backbone, leaving open questions about robustness under more diverse domains and larger model scales.
— Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
(2511.17490 - Tang et al., 21 Nov 2025) in Limitations (Supplementary), page 1