Retaining online RL benefits for web agents without live web interaction

Determine how to retain the benefits of online reinforcement learning for web agents while dramatically reducing reliance on direct interaction with live web environments, thereby avoiding the inefficiency, cost, and risk of real-world web interaction.

Background

Online reinforcement learning has shown strong promise for improving web agents’ robustness and long-horizon decision-making, but large-scale training requires extensive interaction with the live internet, which is inefficient, expensive, and risky (e.g., unintended purchases or irreversible actions).

A key challenge is to preserve the advantages of online reinforcement learning while curbing dependence on direct web interaction. The paper positions learned web world models as a potential surrogate for live environments, motivating the explicit question of how to achieve this balance.
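To make the surrogate idea concrete, the following is a minimal Dyna-style sketch: an agent takes a small number of "live" steps (here a toy stand-in function, since real web interaction is costly and risky), a learned world model records those transitions, and most value updates then run against model-simulated rollouts instead of the live environment. All names (`WorldModel`, `Agent`, `live_step`) and the tabular model are illustrative assumptions for exposition, not the paper's actual DynaWeb architecture.

```python
import random

class WorldModel:
    """Toy tabular world model: memorizes observed (state, action) -> (next_state, reward)."""
    def __init__(self):
        self.transitions = {}

    def update(self, state, action, next_state, reward):
        self.transitions[(state, action)] = (next_state, reward)

    def simulate(self, state, action):
        # Unseen pairs fall back to a zero-reward self-loop.
        return self.transitions.get((state, action), (state, 0.0))

class Agent:
    """Toy epsilon-greedy Q-learning agent over discrete page states."""
    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.q, self.actions, self.alpha, self.gamma = {}, actions, alpha, gamma

    def act(self, state, eps=0.2):
        if random.random() < eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def learn(self, s, a, r, s2):
        best_next = max(self.q.get((s2, a2), 0.0) for a2 in self.actions)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (r + self.gamma * best_next - old)

def live_step(state, action):
    """Stand-in for a real (slow, costly, risky) live web interaction."""
    if state == "search" and action == "click_result":
        return "product", 0.0
    if state == "product" and action == "add_to_cart":
        return "done", 1.0
    return state, 0.0  # anything else: stay on the same page

random.seed(0)
agent = Agent(actions=["click_result", "add_to_cart"])
model = WorldModel()
state = "search"
for _ in range(200):                     # limited budget of live interactions
    a = agent.act(state)
    s2, r = live_step(state, a)
    model.update(state, a, s2, r)        # world model learns from live experience
    agent.learn(state, a, r, s2)
    for _ in range(5):                   # many cheap model-based ("imagined") updates
        s = random.choice(["search", "product"])
        a_sim = agent.act(s)
        s2_sim, r_sim = model.simulate(s, a_sim)
        agent.learn(s, a_sim, r_sim, s2_sim)
    state = "search" if s2 == "done" else s2
```

The key ratio is the inner loop: each live transition funds several simulated updates, which is how a world model curbs live-web interaction while keeping the agent's learning signal online.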

References

As a result, a central open question emerges: how can we retain the benefits of online reinforcement learning for web agents while dramatically reducing reliance on direct interaction with the live web?

DynaWeb: Model-Based Reinforcement Learning of Web Agents  (2601.22149 - Ding et al., 29 Jan 2026) in Section 1 (Introduction)