
Transferability of GlobalRAG to Very Large Language Models

Determine whether GlobalRAG, a reinforcement learning framework for multi-hop question answering that integrates planning-aware rewards with progressive weight annealing, transfers effectively to very large-scale language models such as DeepSeek-R1 when trained with reinforcement learning.


Background

GlobalRAG is proposed as a reinforcement learning framework that enhances global reasoning in multi-hop question answering through planning-aware rewards (structural and semantic consistency) and a subgoal completion reward, combined with a progressive weight annealing strategy. The method demonstrates strong performance across multiple datasets and backbones (Qwen2.5-3B and 7B, Base/Instruct) but was evaluated only at relatively modest model scales.
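To make the training signal concrete, the sketch below illustrates one way a composite reward with progressive weight annealing could be assembled during RL training. All function names, the linear schedule, and the specific weight values are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a planning-aware composite reward with progressive
# weight annealing. All names and the linear schedule are assumptions;
# the paper's actual reward terms and annealing strategy may differ.

def anneal_weight(step: int, total_steps: int,
                  w_start: float = 1.0, w_end: float = 0.1) -> float:
    """Linearly decay the auxiliary-reward weight over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    return w_start + frac * (w_end - w_start)

def composite_reward(structural: float, semantic: float,
                     subgoal: float, answer: float,
                     step: int, total_steps: int) -> float:
    """Combine planning-aware rewards (structural and semantic
    consistency), a subgoal completion reward, and the final answer
    reward. Early in training the planning-aware terms dominate;
    their weight is progressively annealed so that the answer
    reward takes over later in training.
    """
    w = anneal_weight(step, total_steps)
    planning = structural + semantic + subgoal
    return w * planning + (1.0 - w) * answer
```

Under this hypothetical schedule, the planning terms carry full weight (w = 1.0) at step 0, while by the final step the answer reward dominates (weight 0.9 versus 0.1 for the planning terms).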

Due to computational and cost constraints, the authors did not conduct reinforcement learning training on very large-scale LLMs such as DeepSeek-R1. The paper therefore explicitly leaves open whether the GlobalRAG approach, in particular its reward design and training strategy, can be applied effectively to, and perform well on, models at that scale.

References

First, due to computational and cost constraints, we are unable to conduct RL training on very large-scale models (e.g., DeepSeek-R1). Whether our approach can effectively transfer to such models remains an open question.

GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning (2510.20548 - Luo et al., 23 Oct 2025) in Section: Limitations