Applicability of VPPO to subjective or creative multimodal tasks
Determine whether the Visually-Perceptive Policy Optimization (VPPO) algorithm is applicable to subjective or creative multimodal tasks, including detailed image captioning and visual storytelling, where the notion of a single visually-grounded reasoning chain is less clearly defined.
References
Its applicability to more subjective or creative tasks, such as detailed image captioning or visual storytelling, where the notion of a single "visually-grounded" reasoning chain is less clear, remains an open question.
Source: Spotlight on Token Perception for Multimodal Reinforcement Learning (2510.09285 - Huang et al., 10 Oct 2025), Appendix, Section "Limitations" (Scope of Generalization paragraph)