Scalability of QeRL beyond 70B-parameter models
Determine whether QeRL, an NVFP4-quantized, LoRA-based reinforcement learning framework with adaptive quantization noise, can maintain on large language models exceeding 70 billion parameters the level of performance it demonstrates on 3B–32B models for reasoning-focused reinforcement learning tasks.
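To make the setup behind the question concrete, the sketch below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the NVFP4 base weights are simulated by a frozen fp16 buffer (`w_q`), only the LoRA factors (`lora_a`, `lora_b`) receive gradients, and `adaptive_noise_scale` stands in for an adaptive quantization-noise schedule whose magnitude decays over training; the RL objective is a placeholder. All names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QuantLinearWithLoRA(nn.Module):
    """Frozen quantized linear layer with a trainable LoRA adapter.

    The 4-bit NVFP4 weights of the real system are approximated here by
    a frozen fp16/fp32 buffer; only the low-rank A/B factors are trained.
    """
    def __init__(self, in_features, out_features, rank=16, alpha=32.0):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.02
        self.register_buffer("w_q", w)  # stand-in for dequantized NVFP4 weights
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        base = x @ self.w_q.t()  # frozen quantized path, no gradients
        lora = (x @ self.lora_a.t()) @ self.lora_b.t()
        return base + self.scaling * lora

def adaptive_noise_scale(step, total_steps, sigma_start=1e-2, sigma_end=1e-4):
    """Exponentially decay the injected noise magnitude over training,
    so exploration is strongest early and fades as the policy converges."""
    ratio = step / max(total_steps - 1, 1)
    return sigma_start * (sigma_end / sigma_start) ** ratio

# Toy training loop: one policy-gradient-style update per iteration.
layer = QuantLinearWithLoRA(64, 64)
opt = torch.optim.AdamW([layer.lora_a, layer.lora_b], lr=1e-4)
total_steps = 100
for step in range(total_steps):
    sigma = adaptive_noise_scale(step, total_steps)
    x = torch.randn(8, 64)
    x_noisy = x + sigma * torch.randn_like(x)  # exploration noise on activations
    logits = layer(x_noisy)
    # Placeholder objective standing in for an RL loss (e.g. a PPO/GRPO surrogate).
    loss = logits.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under this pattern the trainable state stays small regardless of backbone size, which is why the open question concerns whether the noise-driven exploration benefit, not the memory footprint, survives beyond 70B parameters.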
References
However, since RL for LLMs inherently demands significantly greater computational resources than SFT, our experiments, conducted on model sizes ranging from 3B to 32B, do not yet establish whether QeRL can maintain the same level of performance for models exceeding 70B parameters, leaving that investigation for future work.
— QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
(arXiv:2510.11696, Huang et al., 13 Oct 2025), Appendix, Section “Limitation Analysis”