Scalable and stable multi-turn reinforcement learning for GUI agents
Establish reinforcement learning (RL) techniques that remain stable and effective for GUI-centered agents in long-horizon interactive environments. The core challenges are sparse or delayed rewards, optimization instability, and credit assignment across extended action sequences; addressing them would enable training to scale consistently beyond short-horizon demonstrations.
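The credit-assignment challenge can be illustrated with the simplest baseline: discounted Monte Carlo returns, which spread a sparse end-of-episode reward backward over every turn so earlier GUI actions receive (discounted) credit. This is a minimal sketch, not the method from the report; the function name and the terminal-reward setup are illustrative assumptions.

```python
from typing import List

def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Propagate a sparse (often terminal-only) reward backward through a
    multi-turn trajectory, assigning discounted credit to earlier actions."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical ten-turn episode: only the final action yields reward
# (e.g., the GUI task is verified as complete at the end).
episode = [0.0] * 9 + [1.0]
print(discounted_returns(episode))
```

Even this baseline shows why long horizons are hard: as the episode grows, early actions receive exponentially smaller credit, and the variance of the return estimate rises, which motivates the more elaborate multi-turn RL machinery the report investigates.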
References
While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and memory through end-to-end learning, open problems remain in data scalability, multi-turn reinforcement learning (RL), the limitations of GUI-only operation, and environment stability.
— UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
(arXiv:2509.02544, Wang et al., 2 Sep 2025), Abstract (Page 1)