OpenHands Agent Framework

Updated 1 July 2025
  • OpenHands is an open-source framework for building, evaluating, and deploying generalist and specialist AI agents that interact with digital environments.
  • The framework provides secure, sandboxed execution environments, modular components, and interfaces for agents to perform tasks like coding, command-line use, and web browsing.
  • OpenHands supports multi-agent collaboration, human interaction, and rigorous benchmarking across various tasks, accelerating research and deployment of advanced AI agents.

OpenHands Agent Framework is an open, community-driven platform for building, evaluating, and deploying generalist and specialist AI agents that interact with digital environments much as a human developer would. Emphasizing robust abstractions, secure execution, and modularity, OpenHands (previously known as OpenDevin) enables agents to write code, operate command lines, navigate web environments, collaborate in multi-agent settings, and be evaluated under standardized, reproducible benchmarks. Released under the MIT license, it has attracted more than 188 contributors from academia and industry, reflecting accelerating research on AI agents for realistic software engineering and web-based tasks.

1. System Abstraction and Core Architecture

OpenHands defines its agent–environment interface via an event-stream abstraction capturing actions and observations, forming a perception–action loop analogous to that employed by human software developers. Each agent operates by reading a history of environment events and producing the next atomic action, which is then executed in the current session.
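
To make the abstraction concrete, the sketch below models events with hypothetical Action and Observation classes; the names and fields are illustrative assumptions, not the framework's exact types.

from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Action:
    """Something the agent asks the environment to do."""
    tool: str                                  # e.g. "shell", "ipython", "browser"
    arguments: dict = field(default_factory=dict)

@dataclass
class Observation:
    """The environment's response (stdout, DOM snapshot, errors)."""
    content: str

# A session is an append-only stream of alternating events; the agent
# reads this history before emitting its next atomic action.
EventStream = List[Union[Action, Observation]]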

Key platform components include:

  • Sandboxed Linux Operating System: Each session is instantiated within a Docker container, furnishing full OS capabilities insulated from the host for secure code and command execution.
  • Jupyter Kernel Environment: Integrated Python execution is supported within each container, providing stateful code interaction and debugging workflows.
  • Browser Agent API: Using the BrowserGym interface, agents can conduct browser automation via a declarative, domain-specific set of primitives, supporting tasks such as DOM manipulation and navigation (see the sketch after this list).
  • Multi-Agent Delegation Interface: OpenHands supports hierarchical agent structures. Agents can delegate subtasks to other agents using built-in delegation primitives and a standardized vocabulary for agent roles and capabilities.
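
As an illustration of the browser primitives, a BrowserGym-style action program composes calls such as goto, fill, and click; the URL and element ids below are hypothetical placeholders.

# Illustrative BrowserGym-style action program (a sketch, not an
# exact transcript of the framework's output).
browser_action = """
goto('https://example.com/login')
fill('a42', 'alice')   # 'a42' is an element id (bid); hypothetical here
click('b17')           # hypothetical submit-button bid
"""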

The agent control flow is structured around a step function:

class Agent:
    def step(self, state):
        """Return the next atomic action given the session state.

        `state` carries the history of actions, observations, costs,
        and metadata; concrete agents override this with their policy.
        """
        raise NotImplementedError

Actions can include:

  • Shell commands
  • Python/Jupyter code execution
  • Browser navigation and interaction
  • Delegation calls to other micro-agents, or requests for human intervention
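
Putting the pieces together, a controller drives the perception–action loop roughly as follows; run_session, runtime.execute, and the "finish" sentinel are assumptions for illustration, not the framework's exact API.

def run_session(agent, runtime, max_steps=50):
    """Drive the perception-action loop for one session (sketch)."""
    state = []  # event history: alternating actions and observations
    for _ in range(max_steps):
        action = agent.step(state)             # agent picks the next atomic action
        state.append(action)
        if getattr(action, "tool", None) == "finish":  # assumed stop sentinel
            break
        observation = runtime.execute(action)  # sandboxed execution (assumed API)
        state.append(observation)
    return state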

2. Secure Sandboxing and Execution Control

Isolation of potentially unsafe or destructive agent actions is central to OpenHands:

  • Containerization: Agents are restricted to their own Docker-based environment, which is torn down post-session, ensuring filesystem integrity and prohibiting cross-agent interference.
  • SSH Mediated Interface: Agents access the container via SSH, preserving the semantics of human-initiated remote development and maximizing compatibility with conventional toolchains.
  • Resource Access Policies: Only project and task-specific files are exposed to the agent via workspace mounting. All code, test, and command execution remains fully auditable and controlled within the sandbox context.
  • Deterministic Testing Framework: Integration test harnesses mock LLM completions for reproducible, affordable agent runs during development, eliminating stochasticity introduced by external LLM responses.
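
For intuition, launching a per-session sandbox looks roughly like the sketch below; the image tag and mount layout are assumptions rather than the project's actual configuration.

import subprocess
import uuid

def start_sandbox(workspace):
    """Launch a disposable, isolated container for one agent session."""
    name = "openhands-session-" + uuid.uuid4().hex[:8]
    subprocess.run(
        [
            "docker", "run", "-d", "--rm", "--name", name,
            "-v", workspace + ":/workspace:rw",  # expose only task-specific files
            "openhands-sandbox:latest",          # hypothetical image tag
            "sleep", "infinity",
        ],
        check=True,
    )
    return name  # the runtime then attaches to this container, e.g. over SSH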

This strict isolation ensures that agent actions are safely contained, a prerequisite for both academic reproducibility and industrial deployment in continuous integration systems.

3. Agent Implementation, Specialization, and Extensibility

OpenHands accommodates both generalist and specialist agent designs. The platform’s registry, AgentHub, maintains a suite of agent templates including:

  • CodeActAgent: Generalist code-writing and debugging agent.
  • BrowserAgent: Specialist in web navigation and web-based task execution.
  • Micro-agents: Lightweight agents instantiated from natural language or minimal interface demonstrations, defined by a system message and an input/output specification alone.

The platform’s extensibility arises from:

  • AgentSkills Library: A modular set of Python utilities (file I/O, search, parsing, etc.) accessible to all agents.
  • Prompt Templates and Demonstrations: Structured prompts and demonstrations allow micro-agents to be created for new domains or benchmarks without additional code (see the example after this list).
  • High-Level Delegation and Composition: Agents can be configured as collaborative teams (e.g., delegation of a web search subtask from a coding agent to a browser agent) using built-in coordination primitives.
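
For instance, the prompt-plus-I/O pattern allows a micro-agent to be specified declaratively; the schema below is a hypothetical illustration, not the platform's actual format.

# Hypothetical micro-agent definition: a system message plus an I/O
# contract is enough to register a new specialist, with no new code.
commit_message_agent = {
    "name": "CommitMessageAgent",
    "system_prompt": (
        "You are given a unified diff. "
        "Write a concise, imperative-mood commit message."
    ),
    "inputs": {"diff": "str"},
    "outputs": {"message": "str"},
}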

These patterns support scalable agent development and encourage community-contributed behaviors and benchmarks.

4. Evaluation Suite and Benchmarks

OpenHands includes an evaluation harness with support for over 15 benchmarks, spanning:

  • Software engineering: SWE-bench, HumanEvalFix, ML-Bench, BIRD, Gorilla APIBench, ToolQA, BioCoder.
  • Web agent tasks: WebArena, MiniWoB++.
  • Complex reasoning and multi-turn assistance: GAIA, GPQA, AgentBench, ProofWriter, Entity Deduction Arena, MINT.

Success metrics are defined by each task:

  • Code tasks: percentage of issues resolved to specification, or pass rates against test-based objectives.
  • Web and navigation tasks: success rates on web automation or interaction goals.
  • QA/reasoning: accuracy, completeness, or multi-step solution correctness.

The platform logs LLM inference costs and execution times, enabling comparative efficiency studies.
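
A per-run record of the kind the harness logs, together with the aggregate score, might look like this sketch; the field names are assumptions.

from dataclasses import dataclass

@dataclass
class RunRecord:
    # Illustrative per-run record; field names are assumptions.
    task_id: str
    success: bool          # task-defined success criterion
    llm_cost_usd: float    # logged LLM inference cost
    wall_time_s: float     # end-to-end execution time

def success_rate(runs):
    """Aggregate benchmark score: fraction of successful runs."""
    return sum(r.success for r in runs) / len(runs)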

Recent evaluation results:

  • SWE-bench Lite: 26% success rate
  • HumanEvalFix: 79%
  • WebArena: 15%
  • GPQA (graduate-level QA): 53%

These results are competitive with or exceed specialist agent baselines, reflecting the system’s versatility and robust architecture.

5. Multi-Agent Coordination and Human-in-the-Loop Interaction

OpenHands supports both multi-agent orchestration and interactive human-guided workflows:

  • Agent Delegation: Using AgentDelegateAction, agents can automatically hand off subtasks to the most qualified collaborator, supporting division of labor and skill specialization in composite solutions (see the sketch after this list).
  • Dynamic Multi-Agent Compositions: Coordination protocol vocabulary simplifies creation and management of agent teams, enabling experimentation with cooperation and emergent behaviors.
  • Human-in-the-Loop UI: The platform exposes graphical interfaces through which users can visualize agent behavior, intervene manually, or collaborate on tasks in real time. This enables agile research into mixed-initiative development and pair-programming scenarios.
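
As referenced in the delegation item above, a hand-off can be sketched as follows; the AgentDelegateAction fields shown are assumptions about the interface rather than its exact signature.

from dataclasses import dataclass, field

@dataclass
class AgentDelegateAction:
    # Assumed shape of the delegation primitive: name a registered
    # delegate and pass it the subtask plus any required inputs.
    agent: str
    inputs: dict = field(default_factory=dict)

# A coding agent hands a web-research subtask to a browsing specialist:
action = AgentDelegateAction(
    agent="BrowserAgent",
    inputs={"task": "Find the upstream changelog for the failing dependency"},
)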

These features facilitate both research on agent collaboration and practical deployments incorporating human oversight.

6. Community and Open Science Contributions

OpenHands has established a broad contributor base, reflecting its open-source orientation:

  • Contribution Scope: Over 2.1K contributions from 188+ contributors, including agent modules, evaluation benchmarks, codebase maintenance, and documentation.
  • Collaboration: Integrates agents, tools, and benchmarks authored by both academic and industrial consortia.
  • Permissive Licensing: MIT license enables unrestricted academic and commercial use.
  • Community Infrastructure: Includes a public GitHub repository, an evaluation leaderboard, and communication channels for collaborative research and support.

This openness has resulted in rapid dissemination, reproducibility, and accelerated research progress.

7. Representative Applications and Future Prospects

OpenHands has been deployed across diverse applications:

  • Autonomous issue fixing on real-world codebases (SWE-bench).
  • Cross-domain QA tasks requiring programmatic tool use (GPQA).
  • Real-time web automation and navigation (WebArena).
  • Human–AI collaborative workflows, including interactive debugging and agent-guided code review.

Ongoing research directions, suggested by current limitations and active development, include:

  • Enhancement of high-level planning, meta-reasoning, and agentic search strategies for open-ended or ambiguous tasks.
  • Advanced delegation and team orchestration for scaling to complex, multi-stage tasks.
  • Broader support for interactive environments, e.g., GUI-based control and heterogeneous tool integration.

A plausible implication is that the OpenHands framework is positioned as a reference implementation for generalist agent research, and its rapidly expanding ecosystem is likely to seed further studies in human–AI collaboration, secure agent deployment, and reproducible benchmarking in software and web domains.