Predictiveness of performance on free alternatives for commercial counterparts
Determine how well computer-use agent performance on the sandboxable, free alternatives included in CUA-World predicts performance on the corresponding proprietary, licensed commercial software that professionals use in the same software categories.
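One way to make "predicts" concrete is a rank correlation across agents: if agents that score higher on a free alternative also score higher on its commercial counterpart, the free tool is a useful proxy. The sketch below is a minimal illustration under assumptions that the source does not state: per-agent success rates exist for both tools, Spearman correlation is the chosen measure, and all names (`ranks`, `pearson`, `spearman`, `free_scores`, `commercial_scores`) and numbers are hypothetical placeholders, not measured results.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder success rates, one entry per hypothetical agent.
# These are illustrative values only, NOT results from any benchmark.
free_scores = [0.10, 0.25, 0.40, 0.55]
commercial_scores = [0.05, 0.20, 0.30, 0.50]

rho = spearman(free_scores, commercial_scores)
print(f"Spearman rho between free and commercial scores: {rho:.3f}")
```

A value of rho near 1 would suggest that agent rankings transfer from the free alternative to the commercial tool; it would not show that absolute success rates match, which would need a separate comparison.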
References
While we specifically select the closest sandboxable alternative for software that cannot be freely sandboxed (e.g., due to licensing), a large fraction of professionally used software remains excluded, and the degree to which performance on free alternatives predicts performance on their commercial counterparts is an open question.
— Gym-Anything: Turn any Software into an Agent Environment
(2604.06126 - Aggarwal et al., 7 Apr 2026) in Limitations