Overcoming limitations of GUI-only operation
Design hybrid, GUI-centered interaction environments in which GUI agents interoperate seamlessly with file systems, terminals, and external tools. Pure GUI manipulation alone is insufficient for realistic workflows such as data processing, software development, and system administration.
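To make the idea concrete, the following is a minimal sketch of one way such a hybrid environment could expose GUI, file, and terminal actions through a single step interface. It is an illustration only, not the design used in the cited report: the `HybridEnv` class, the action dictionary layout, and the `demo` helper are all hypothetical, and the GUI action is merely recorded rather than executed against a real screen.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class HybridEnv:
    """Toy environment (hypothetical) routing GUI, file, and terminal actions."""
    workdir: Path
    gui_log: list = field(default_factory=list)  # stand-in for a real screen

    def step(self, action: dict) -> dict:
        kind = action["type"]
        if kind == "click":
            # Assumed GUI action: only logged here, no real input is sent.
            self.gui_log.append((action["x"], action["y"]))
            return {"ok": True, "observation": f"clicked at ({action['x']}, {action['y']})"}
        if kind == "write_file":
            # File-system action: write text relative to the working directory.
            # Note: this sketch does not guard against paths escaping workdir.
            path = self.workdir / action["path"]
            path.write_text(action["content"], encoding="utf-8")
            return {"ok": True, "observation": f"wrote {path.name}"}
        if kind == "run":
            # Terminal action: argv list, no shell, bounded by a timeout.
            proc = subprocess.run(
                action["argv"], cwd=self.workdir,
                capture_output=True, text=True, timeout=10,
            )
            return {"ok": proc.returncode == 0, "observation": proc.stdout.strip()}
        return {"ok": False, "observation": f"unknown action type: {kind}"}


def demo() -> dict:
    """Mix a GUI step with file and terminal steps in one short episode."""
    with tempfile.TemporaryDirectory() as tmp:
        env = HybridEnv(Path(tmp))
        env.step({"type": "click", "x": 120, "y": 48})
        env.step({"type": "write_file", "path": "data.csv",
                  "content": "a,b\n1,2\n3,4\n"})
        # Data-processing step done in the terminal instead of through the GUI.
        script = ("import csv; "
                  "print(sum(int(r['b']) for r in csv.DictReader(open('data.csv'))))")
        return env.step({"type": "run", "argv": [sys.executable, "-c", script]})


if __name__ == "__main__":
    print(demo())
```

The point of the single `step` entry point is that the agent chooses per step whether a GUI action, a file write, or a command is the cheapest route to the goal. A realistic environment would also need sandboxing, screenshots as observations, and state that stays consistent across the three channels, none of which this sketch attempts.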
References
While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and memory through end-to-end learning, open problems remain in data scalability, multi-turn reinforcement learning (RL), the limitations of GUI-only operation, and environment stability.
— UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
(2509.02544 - Wang et al., 2 Sep 2025) in Abstract (Page 1)