Efficacy of PGPO on Large-Scale LVLMs
Determine whether Perception-Grounded Policy Optimization (PGPO) remains effective for Large Vision-Language Models (LVLMs) substantially larger than the 7B-parameter scale at which it has been validated so far.
References
Furthermore, limited by our computational resources, our experiments validate PGPO on models up to the 7B parameter scale. Although these results indicate a positive scaling trend, its efficacy on large-scale models remains to be verified.
— Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models
(2604.01840 - Ye et al., 2 Apr 2026) in Limitations