Data scalability for long-horizon GUI agent learning

Develop scalable data generation and curation methodologies that enable long-horizon training of GUI-centered agents by producing large-scale interactive trajectories with reasoning, actions, environment states, and feedback at feasible cost.

Background

The report frames data scarcity as a core bottleneck for GUI agents, noting that, unlike text or code corpora, high-quality long-horizon trajectories capturing detailed reasoning and interaction are costly to collect at scale. The authors propose a Data Flywheel to co-evolve the model and its data, but identify scalable data strategies for long-horizon GUI learning more broadly as an outstanding challenge.

This problem is explicitly grouped among the open problems highlighted in the abstract, and it motivates the methodological pillars introduced later: continual pre-training, supervised fine-tuning, rejection sampling, and multi-turn reinforcement learning, which together iteratively improve data quality and diversity.
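The rejection-sampling stage of such a flywheel can be sketched in a few lines. The code below is a hypothetical illustration, not the report's implementation: a seeded random number generator stands in for both the agent policy and the GUI environment, `rollout` produces a stub trajectory of (state, reasoning, action) steps with a success flag, and `flywheel_round` keeps only verified successes for the next fine-tuning round. All function names, the step count, and the ~40% success rate are invented for the sketch.

```python
import random

def rollout(task_id: int, attempt: int, steps: int = 4):
    """Stub roll-out of one task attempt.

    A real system would drive an actual GUI environment with a learned
    policy; here a deterministic seeded RNG stands in for both, so the
    example is reproducible. Returns (trajectory, success_flag).
    """
    rng = random.Random(task_id * 100 + attempt)  # deterministic per attempt
    trajectory = [
        {"state": f"s{t}", "reasoning": f"plan step {t}", "action": f"a{t}"}
        for t in range(steps)
    ]
    success = rng.random() < 0.4  # hypothetical ~40% per-attempt success rate
    return trajectory, success

def flywheel_round(task_ids, attempts_per_task: int = 8):
    """One data-flywheel pass with rejection sampling.

    Sample several roll-outs per task and keep only the verified
    successes; failed trajectories are discarded rather than trained on.
    """
    curated = []
    for tid in task_ids:
        for attempt in range(attempts_per_task):
            trajectory, ok = rollout(tid, attempt)
            if ok:  # verifier/reward filter accepts this trajectory
                curated.append({"task": tid, "trajectory": trajectory})
                break  # keep the first verified success per task
    return curated

dataset = flywheel_round(range(20))
print(f"curated {len(dataset)} of 20 tasks")
```

In a full pipeline the curated set would feed the next supervised fine-tuning round, and the improved model would in turn generate better candidate trajectories, closing the loop the report describes.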

References

While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and memory through end-to-end learning, open problems remain in data scalability, multi-turn reinforcement learning (RL), the limitations of GUI-only operation, and environment stability.

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning (2509.02544 - Wang et al., 2 Sep 2025) in Abstract (Page 1)