ExpSeek as rollout augmentation for Agentic Reinforcement Learning

Investigate whether using ExpSeek to augment rollout generation in Agentic Reinforcement Learning improves training convergence speed and sampling quality, given its reported ability to significantly increase pass@k performance in web-agent evaluations.

Background

The authors report that ExpSeek increases pass@k performance, which suggests improved sampling diversity and potential benefits for training regimes that rely on multiple rollouts. In the discussion and appendices, they note strong sampling effects and hypothesize utility as a rollout augmentation strategy.
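
For readers unfamiliar with the metric, pass@k is the probability that at least one of k sampled rollouts solves a task. The sketch below uses the widely used unbiased combinatorial estimator, 1 - C(n-c, k) / C(n, k), computed from n sampled rollouts of which c succeed. This is an assumption for illustration: the source does not say which estimator the authors use, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled rollouts, c of which succeeded.

    Uses the standard unbiased estimator 1 - C(n - c, k) / C(n, k).
    Assumption: this is not necessarily the estimator used in the ExpSeek paper.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k rollouts failed, which gives pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With a fixed number of successes c, the estimate rises as k grows, which is why a method that diversifies samples can raise pass@k even when its single-sample success rate changes little.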

Despite these observations, the authors explicitly state that it is unknown whether embedding ExpSeek into agentic RL training pipelines would concretely enhance convergence speed or sampling quality. This motivates a targeted investigation into integrating ExpSeek into RL rollout generation and measuring its impact on RL training efficacy.
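
One way such an investigation could quantify convergence speed is the number of training steps needed for a smoothed reward curve to reach a target value, compared between a baseline run and a run whose rollouts are augmented. The sketch below is a hypothetical measurement helper, not a procedure from the paper: the function name `steps_to_threshold`, the trailing moving-average smoothing, and the `window` parameter are all assumptions.

```python
from typing import Optional, Sequence


def steps_to_threshold(
    rewards: Sequence[float], threshold: float, window: int = 3
) -> Optional[int]:
    """Return the first 0-based step at which the trailing mean reward
    over `window` steps reaches `threshold`, or None if it never does.

    Hypothetical metric for comparing convergence speed between runs.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    for i in range(window - 1, len(rewards)):
        # Mean of the last `window` rewards ending at step i.
        trailing_mean = sum(rewards[i - window + 1 : i + 1]) / window
        if trailing_mean >= threshold:
            return i
    return None
```

Under these assumptions, a lower value for the augmented run than for the baseline at the same threshold would be evidence of faster convergence; `None` means the run never reached the target within the logged steps.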

References

Since ExpSeek can also significantly improve pass@k performance, it has not yet been studied whether it can serve as an enhancement technique for Agentic Reinforcement Learning rollout to improve training convergence speed and sampling quality.

ExpSeek: Self-Triggered Experience Seeking for Web Agents (2601.08605 - Zhang et al., 13 Jan 2026) in Limitations Section