Persistence of sparsity and RAM efficacy at 70B+ scale

Determine whether the sparsity of reinforcement-learning-induced task vectors and the resulting effectiveness of Reinforced Agent Merging (RAM) persist for large language models with at least 70 billion parameters. Specifically, ascertain if the sparsity hypothesis observed on 3B–7B parameter models continues to hold and whether RAM maintains its performance advantages when merging multiple RL-trained agents at massive scale.

Background

The paper proposes Reinforced Agent Merging (RAM), a distribution-aware method for merging agents trained via reinforcement learning: it disentangles shared and unique parameter-update regions and selectively rescales the unique regions to counteract signal dilution. The method is motivated by the empirical observation that on-policy RL induces sparse, heterogeneous task vectors, a property RAM exploits during merging.
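The description above can be illustrated with a minimal sketch. This is not the authors' implementation: the thresholding rule, the averaging over shared regions, and the `lam` rescaling hyperparameter are all assumptions chosen only to make the shared/unique decomposition and selective rescaling concrete.

```python
import numpy as np

def ram_merge_sketch(base, agents, lam=2.0, eps=1e-8):
    """Hypothetical sketch of distribution-aware merging of RL agents.

    base:   1-D array of base-model parameters
    agents: list of 1-D arrays, each an RL-fine-tuned copy of the base
    lam:    rescaling factor for 'unique' regions (assumed hyperparameter)
    eps:    threshold below which an update is treated as zero (assumption)
    """
    # Task vectors: per-agent parameter updates relative to the base model.
    deltas = [a - base for a in agents]

    # RL-induced updates are sparse: mark where each agent actually moved.
    masks = [np.abs(d) > eps for d in deltas]
    overlap = np.sum(masks, axis=0)      # how many agents touched each weight

    shared = overlap > 1                 # updated by two or more agents
    unique = overlap == 1                # updated by exactly one agent

    stacked = np.stack(deltas)
    merged_delta = np.zeros_like(base)

    # Shared regions: average over the agents that updated them (assumed rule).
    mean_update = stacked.sum(axis=0) / np.maximum(overlap, 1)
    merged_delta[shared] = mean_update[shared]

    # Unique regions: rescale the single agent's update to counteract the
    # signal dilution that uniform averaging would cause.
    merged_delta[unique] = lam * stacked.sum(axis=0)[unique]

    return base + merged_delta
```

For example, a weight updated by two agents is averaged, while a weight updated by only one agent keeps that agent's update amplified by `lam` rather than being diluted toward zero.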

All experiments in the paper are conducted on 3B and 7B parameter models across the Qwen and Llama architectures. The authors explicitly note that it remains unknown whether the sparsity hypothesis and the efficacy of RAM persist when scaling to massive models (70B+), which may exhibit different parameter-update distributions and merging dynamics.

References

Finally, our evaluation is primarily conducted on 3B and 7B parameter models; verifying whether the sparsity hypothesis and RAM's efficacy persist in massive-scale models (70B+) remains an open question for future research.

Behavior Knowledge Merge in Reinforced Agentic Models (2601.13572 - Yuan et al., 20 Jan 2026), in Limitations (unnumbered section), end of main paper