Scaling QeRL to >70B-parameter LLMs
Determine whether the QeRL framework (NVFP4 weight quantization combined with Low-Rank Adaptation and Adaptive Quantization Noise) preserves, for large language models exceeding 70 billion parameters, the reinforcement learning performance observed in experiments on 3B–32B models.
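To make the question's moving parts concrete, the following is a minimal, hypothetical PyTorch sketch of the ingredients it refers to: a frozen fake-quantized weight matrix standing in for NVFP4, trainable LoRA factors as the only RL-updated parameters, and a decaying noise term standing in for Adaptive Quantization Noise. `QuantLoRALinear`, `fake_quantize`, and `anneal_noise` are illustrative names, not the paper's API, and the noise-injection and annealing details here are simplifications of the paper's actual mechanism.

```python
# Minimal sketch (not the authors' code) of the QeRL recipe: frozen
# low-precision base weights, trainable LoRA adapters, and an adaptive
# noise term on the quantized weights that decays over RL training.
# NVFP4 is emulated with a crude per-channel fake-quantizer; all names
# are illustrative assumptions.
import torch
import torch.nn as nn


def fake_quantize(w: torch.Tensor, levels: int = 15) -> torch.Tensor:
    """Crude stand-in for NVFP4: per-output-channel symmetric quantization."""
    scale = w.abs().amax(dim=1, keepdim=True) / (levels / 2)
    return torch.round(w / scale) * scale


class QuantLoRALinear(nn.Module):
    def __init__(self, in_f: int, out_f: int, rank: int = 16):
        super().__init__()
        base = torch.randn(out_f, in_f) * 0.02
        # Frozen, quantized base weights (a buffer, so no gradients flow here).
        self.register_buffer("w_q", fake_quantize(base))
        # Trainable low-rank adapters: the only parameters RL updates.
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))
        self.noise_scale = 1e-2  # quantization-noise magnitude (assumed value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.w_q
        if self.training and self.noise_scale > 0:
            # Quantization noise as exploration: perturb the frozen weights.
            w = w + self.noise_scale * torch.randn_like(w)
        return x @ (w + self.lora_b @ self.lora_a).T

    def anneal_noise(self, decay: float = 0.99) -> None:
        # "Adaptive" is reduced here to a simple exponential schedule;
        # the paper's actual schedule may differ.
        self.noise_scale *= decay


layer = QuantLoRALinear(64, 64)
out = layer(torch.randn(2, 64))  # forward pass with noisy quantized weights
layer.anneal_noise()             # shrink exploration noise between RL steps
```

The scaling question is whether this combination, which keeps memory and compute low enough to make RL practical at 3B–32B, still yields competitive policies when the frozen quantized backbone grows past 70B parameters.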
References
However, since RL for LLMs inherently demands significantly greater computational resources than SFT, our experiments, conducted on model sizes ranging from 3B to 32B, do not yet establish whether QeRL can maintain the same level of performance for models exceeding 70B parameters, leaving that investigation for future work.
— QeRL: Beyond Efficiency – Quantization-enhanced Reinforcement Learning for LLMs (Huang et al., arXiv:2510.11696, 13 Oct 2025), Appendix: Limitation Analysis