
AgentNet: Multi-OS Desktop Use Dataset

Updated 18 August 2025
  • AgentNet is a multi-OS desktop use dataset comprising 22,625 trajectories across 100+ applications and 200+ websites, capturing diverse real-world tasks.
  • It features a robust annotation infrastructure and a scalable pipeline that transforms raw interactions into concise state-action pairs with multi-level chain-of-thought reasoning.
  • The dataset underpins benchmark evaluations, enabling state-of-the-art computer-use agents to generalize effectively across various operating systems and digital environments.

AgentNet is the first large-scale, multi-operating-system desktop computer-use dataset, designed to advance the development and evaluation of general-purpose computer-use agents (CUAs). AgentNet provides high-fidelity, human-annotated demonstrations of diverse real-world computer tasks, along with a robust annotation infrastructure and a scalable transformation pipeline that integrates multi-level Chain-of-Thought reasoning. As an open-source resource, AgentNet underpins the training, benchmarking, and analysis of state-of-the-art vision-language agents across a broad spectrum of digital environments, surpassing all previously open datasets in scale, modality coverage, and operating system diversity (Wang et al., 12 Aug 2025).

1. Composition and Multi-Platform Scope

AgentNet comprises 22,625 trajectories of humans demonstrating computer-use tasks across more than 100 desktop applications and over 200 distinct websites. Crucially, the dataset spans three major operating systems (Windows, macOS, and Ubuntu), capturing a rich variety of application modalities, system UI conventions, and interaction paradigms. This breadth exposes agents trained on AgentNet to heterogeneous GUI layouts, input mechanisms, and application-specific workflows, which supports cross-domain generalization and robustness.

OS Platforms | Applications | Websites | Task Trajectories
Windows, macOS, Ubuntu | 100+ | 200+ | 22,625

The data’s coverage includes productivity tools, browsers, file managers, code editors, communication platforms, system utilities, and a broad array of websites—not limited to simple or synthetic environments.

2. Annotation Infrastructure and Data Collection

Data was collected via the AgentNet Tool, a user-facing application operating natively on each of the supported platforms. It records:

  • Screen-capture videos for visual context.
  • Low-level machine interaction traces such as precise mouse and keyboard events, utilizing frameworks like DuckTrack.
  • Accessibility trees (Axtree), capturing structured metadata for on-screen UI elements.

The AgentNet Tool is designed to operate with minimal user disruption, enabling both real-world task capture and post-hoc annotation. Annotators can review and optionally edit their captures, and the system allows human or LLM-driven adjustment of how strictly correctness requirements are applied. The annotation workflow implements multi-level privacy protection, combining anonymization, human oversight, and GPT-based checks to minimize leakage of personally identifiable information.

3. Data Transformation and Reflective Chain-of-Thought Reasoning

AgentNet transforms raw demonstration streams into a structured task representation suitable for model training:

  • Each trajectory is decomposed into a sequence of compact $(s_i, a_i)$ pairs, where $s_i$ is a keyframe (the screenshot before the action) and $a_i$ is a compressed, semantically meaningful abstraction of the human action.
  • Low-level event traces are algorithmically compressed: sequences of fine-grained mouse/key events are merged into semantically coherent action primitives (e.g., mouse movements and clicks → "Click Submit"; consecutive keystrokes → "Type 'hello'").
  • State-action alignment is precise: for each $a_i$, $s_i$ is selected by backtracking to a frame that strictly precedes the action, thereby eliminating future information leakage.
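The compression and alignment steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released AgentNet pipeline: the `RawEvent` and `Action` types, the three event kinds, and the choice to backtrack from the first merged event of each action are all hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawEvent:
    t: float          # timestamp in seconds
    kind: str         # "move", "click", or "key" (hypothetical event kinds)
    value: str = ""   # clicked element label or typed character


@dataclass
class Action:
    t_start: float    # timestamp of the first raw event merged into the action
    label: str        # e.g. "Click Submit" or "Type 'hello'"


def compress(events: List[RawEvent]) -> List[Action]:
    """Merge fine-grained events into semantically coherent action primitives."""
    actions: List[Action] = []
    typed: List[str] = []
    typed_start = 0.0
    move_start: Optional[float] = None  # start of a move run leading to a click

    def flush_typing() -> None:
        # Emit one "Type" action for the keystrokes collected so far.
        if typed:
            actions.append(Action(typed_start, f"Type '{''.join(typed)}'"))
            typed.clear()

    for ev in events:
        if ev.kind == "key":
            if not typed:
                typed_start = ev.t
            typed.append(ev.value)
            move_start = None
        elif ev.kind == "move":
            flush_typing()
            if move_start is None:
                move_start = ev.t
        elif ev.kind == "click":
            flush_typing()
            start = move_start if move_start is not None else ev.t
            actions.append(Action(start, f"Click {ev.value}"))
            move_start = None
    flush_typing()
    return actions


def select_keyframe(frame_times: List[float], action_start: float) -> Optional[int]:
    """Index of the latest frame strictly before the action starts, else None.

    frame_times must be sorted in ascending order.
    """
    idx = bisect.bisect_left(frame_times, action_start) - 1
    return idx if idx >= 0 else None
```

Using `bisect_left` rather than `bisect_right` is what enforces the strict inequality: a frame captured at exactly the action's start time is skipped, so the chosen screenshot cannot already show the effect of the action.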

A key innovation is the integration of reflective, long Chain-of-Thought (CoT) reasoning in the form of a three-level reasoning trace:

  • Level 3 (L3): Contextual observations derived from the screenshot.
  • Level 2 (L2): Reflective reasoning over state transitions, preceding actions, and possible errors.
  • Level 1 (L1): Succinct final action decision.

The automated CoT synthesis pipeline composes these layers using "generator", "reflector", and "summarizer" modules, demonstrably improving learning as the dataset scales.
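As a rough illustration of how three modules could be chained to produce a three-level trace, consider the sketch below. It is an assumption-laden toy, not the published design: the `LLM` callable, the prompt wording, the field names, and in particular the mapping of each module to one reasoning level are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call: prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class ReasoningTrace:
    l3_observation: str  # contextual observations from the screenshot
    l2_reflection: str   # reasoning over transitions, prior actions, errors
    l1_action: str       # succinct final action decision


def synthesize_trace(llm: LLM, screen: str, prev_action: str, action: str) -> ReasoningTrace:
    """Run generator, reflector, and summarizer prompts in sequence."""
    observation = llm(f"[generator] Describe what is visible: {screen}")
    reflection = llm(
        f"[reflector] Previous action: {prev_action}. "
        f"Observation: {observation}. Did it succeed, and what comes next?"
    )
    decision = llm(
        f"[summarizer] Reflection: {reflection}. "
        f"State the next action concisely: {action}"
    )
    return ReasoningTrace(observation, reflection, decision)


def format_trace(trace: ReasoningTrace) -> str:
    """Serialize the trace in L3 -> L2 -> L1 order as a training target."""
    return "\n".join([
        f"Observation: {trace.l3_observation}",
        f"Thought: {trace.l2_reflection}",
        f"Action: {trace.l1_action}",
    ])
```

Passing the model as a plain callable keeps the sketch independent of any particular API, so a deterministic stub can be substituted when testing the data flow.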

4. Benchmarks and Performance Metrics

The OpenCUA-32B agent model, trained on AgentNet, was evaluated on the OSWorld-Verified benchmark (online) and the AgentNetBench (offline proxy):

  • Success Rate: 34.8% (100-step budget) on OSWorld-Verified, setting a new SOTA among open-source CUAs and surpassing the proprietary OpenAI CUA (GPT-4o).
  • Effect of Test-time Computation: Higher Pass@$n$ budgets (multiple parallel candidate rollouts) further boost success rates, demonstrating effective utilization of data scale and reasoning at inference.
  • Generalization: Trained agents generalize across daily use, professional, and system tasks, as well as across operating system boundaries.
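To make the Pass@$n$ metric concrete, the following sketch computes it empirically as the fraction of tasks solved by at least one of $n$ rollouts. The function name and the input layout (a mapping from task id to per-rollout success flags) are assumptions for illustration; the benchmark's own evaluation code may compute it differently.

```python
from typing import Dict, List


def pass_at_n(rollouts: Dict[str, List[bool]], n: int) -> float:
    """Fraction of tasks where at least one of the first n rollouts succeeded."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not rollouts:
        raise ValueError("rollouts must contain at least one task")
    solved = sum(any(results[:n]) for results in rollouts.values())
    return solved / len(rollouts)


# Hypothetical outcomes for four tasks, three rollouts each.
example = {
    "task_a": [False, True, False],
    "task_b": [False, False, False],
    "task_c": [True, False, False],
    "task_d": [False, False, True],
}
print(pass_at_n(example, 1))  # 0.25
print(pass_at_n(example, 3))  # 0.75
```

Because a task counts as solved once any rollout succeeds, the value can only stay the same or rise as $n$ grows, which is why larger rollout budgets improve the reported success rate.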

Empirical ablation confirms that reflective CoT reasoning and data scale are both critical factors for high performance and robustness to long-horizon, error-prone workflows.

5. Data Accessibility and Open Source Ecosystem

AgentNet, along with the complete OpenCUA infrastructure, is open-sourced. Released components include:

  • The full AgentNet dataset.
  • The AgentNet Tool for multi-platform scalable data collection.
  • The processing pipeline and AgentNetBench evaluation suite.
  • Pretrained models and all supporting code.

This facilitates transparent, reproducible research and allows the community to extend, benchmark, and analyze general-purpose CUAs—the first time such a resource has matched commercial counterparts in breadth and depth.

6. Technical Detailing and Notation

The core dataset formation process is notated as:

$$\langle s_i, a_i \rangle, \quad \text{where} \quad s_i \leftarrow \text{keyframe (pre-action)}, \quad a_i \leftarrow \text{compress(raw\_actions)}$$

The pipeline further augments each pair with $L3 \rightarrow L2 \rightarrow L1$ Chain-of-Thought context.

Element | Format | Purpose
$s_i$ | High-resolution frame | Pre-action context; input to VLM
$a_i$ | Reduced action label | Semantically precise intent
Reasoning trace | L3, L2, L1 text | Multi-level reflection to support VLM generation

Collection utilities integrate tools such as DuckTrack for event recording, OBS Studio for frame capture, OpenAdapt for streamlined data handling, and Axtree introspection for UI elements.

7. Significance and Future Directions

AgentNet provides a scalable, high-fidelity foundation for training and evaluating generalist computer agents, opening the study of agentic reasoning, robustness, and capability in real-world use cases at a level of detail and coverage that previously only proprietary efforts possessed. The combination of diverse, realistic interactions, multi-modal annotation, and reflective reasoning positions it as an essential resource for benchmarking, safety research, and the extension of agentic models to new domains and applications. As the dataset continues to grow and additional annotations (e.g., error states, user corrections, and complex workflows) are integrated, AgentNet is expected to underpin ongoing advances in open, safe, and general-use agentic intelligence (Wang et al., 12 Aug 2025).
