OpenHands Framework

Updated 27 May 2026
  • OpenHands Framework is a modular, open platform for developing and evaluating AI-driven software agents using secure sandboxing and composable tool APIs.
  • The framework employs a layered architecture with agent abstraction, event stream logging, and sandboxed execution to support coordinated digital task automation.
  • Benchmark results and empirical studies demonstrate its energy efficiency, reproducibility, and robust security for both research and production deployments.

The name OpenHands covers two unrelated open, extensible frameworks: one for AI-driven software agents and one for pose-based sign language recognition. In research usage since 2023, it primarily refers to the former: a modular end-to-end platform for developing, orchestrating, and evaluating agents that perform software engineering and general digital tasks by interacting programmatically with code, command-line environments, and the web. The platform is developed and maintained as a large-scale, community-driven project and is used across academia and industry for its generalist agent architecture, reproducible sandboxing, composability, and benchmark-driven evaluation (Wang et al., 2024, Soni et al., 3 Jun 2025, Wang et al., 5 Nov 2025, Tripathy et al., 10 Dec 2025).

1. System Architecture and Core Abstractions

At the highest level, OpenHands formalizes the agentic workflow using three primary architectural layers:

  • Agent Abstraction & Hub: An agent is a Python class implementing two methods: reset(self) for initialization and step(self, state: State) → Action, which reads the agent's event history and emits one primitive action. The AgentHub registry enables plug-and-play use of agents such as CodeActAgent, BrowsingAgent, and GPTSwarm. Micro-agents specialized for targeted operations (e.g., CommitWriterAgent) can be built quickly by tuning prompts and schemas on top of generalist agents.
  • Event Stream (State): A chronological log of (action, observation) pairs and metadata (e.g., LLM cost, delegation tags) forms the State object. This enables persistent multi-turn reasoning, compositional tool use, and delegation via consistent input to each agent's step function.
  • Agent Runtime: Each agent-issued Action is executed in a strict sandbox and yields an Observation: (1) bash commands run in isolated Linux Docker containers via SSH; (2) Python code is executed within an IPython kernel; (3) web actions are mediated by a Chromium instance orchestrated through Playwright and a BrowserGym DSL. All per-session state is ephemeral, guaranteeing environment rollback, CPU/memory/network isolation, and reproducible execution logs (Wang et al., 2024).
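The reset/step contract and the event stream described above can be illustrated with a minimal, self-contained sketch. All class and function names below (Action, Observation, State, EchoOnceAgent, fake_runtime, run) are hypothetical stand-ins written for this illustration; they are not the OpenHands classes, and the runtime here only pretends to execute commands.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Action:
    """A primitive action emitted by an agent (hypothetical shape)."""
    kind: str           # e.g. "run_bash" or "finish"
    payload: str = ""   # command text, code, or message


@dataclass
class Observation:
    """The result of executing an action in the runtime."""
    content: str


@dataclass
class State:
    """Chronological event stream of (action, observation) pairs."""
    history: List[Tuple[Action, Observation]] = field(default_factory=list)


class EchoOnceAgent:
    """Toy agent: issues one shell command, then finishes."""

    def reset(self) -> None:
        # This sketch keeps no internal state beyond the event stream.
        pass

    def step(self, state: State) -> Action:
        # Decide the next action purely from the event history.
        if not state.history:
            return Action("run_bash", "echo hello")
        return Action("finish")


def fake_runtime(action: Action) -> Observation:
    """Stand-in for the sandbox: records the command instead of running it."""
    return Observation(f"ran: {action.payload}")


def run(agent, max_steps: int = 10) -> State:
    """Drive the agent loop until it finishes or the step budget runs out."""
    agent.reset()
    state = State()
    for _ in range(max_steps):
        action = agent.step(state)
        if action.kind == "finish":
            break
        state.history.append((action, fake_runtime(action)))
    return state
```

Calling run(EchoOnceAgent()) produces a State whose history holds a single (action, observation) pair. Because the agent only ever sees the State, the same loop can drive any agent that follows the two-method contract.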

2. Agent Interface, Tool API, and Extensibility

OpenHands enforces a unified agent interface, greatly simplifying agent definition and deployment. Agents process their event-history (state) and select among core actions: CmdRunAction (bash), IPythonRunCellAction (Python), or BrowseInteractiveAction (browser). Agents may also generate generic messages or finalize execution with AgentFinishAction.

The framework's Tool API exposes a standard set of operations:

  • Code editing: File manipulation by line-oriented insert/delete/replace primitives.
  • Code execution: Python REPL and shell commands in sandboxed contexts.
  • Web browsing: DOM navigation, element interaction, screenshot retrieval via a programmable browser interface.
  • Web search: Querying a search API to obtain up-to-date factual data. The toolset was recently extended with multimodal file access and markdown conversion in the OpenHands-Versa variant (Soni et al., 3 Jun 2025).
  • Multimodal file viewing: Transforming local files (e.g., PDFs, Office docs) to markdown for direct agent consumption.

This tool-centric architecture, combined with event-stream history and modular planning modules (e.g., Plan-Injector for summarization/decomposition), enables both single-agent and multi-agent deployments. OpenHands supports multi-agent coordination either graphically (GPTSwarm) or via AgentDelegateAction, where sub-tasks are delegated to specialized or auxiliary agents and results are returned to the parent for synthesis (Wang et al., 2024, Soni et al., 3 Jun 2025).
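The delegation pattern described above, in which a parent hands sub-tasks to specialized agents and synthesizes their results, might look roughly like the following. This is a sketch under stated assumptions: DelegateRequest, Registry, delegate, and parent_solve are hypothetical names, sub-agents are modeled as plain callables, and the real AgentDelegateAction carries more structure than shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DelegateRequest:
    """Hypothetical stand-in for a delegation action."""
    agent_name: str   # which sub-agent should handle the task
    task: str         # natural-language description of the sub-task


# Sub-agents are modeled as functions from a task string to a result string.
Registry = Dict[str, Callable[[str], str]]


def delegate(registry: Registry, request: DelegateRequest) -> str:
    """Route one sub-task to the named sub-agent and return its result."""
    if request.agent_name not in registry:
        raise KeyError(f"unknown agent: {request.agent_name}")
    return registry[request.agent_name](request.task)


def parent_solve(registry: Registry, subtasks: List[DelegateRequest]) -> str:
    """Delegate each sub-task in order, then combine the results."""
    results = [delegate(registry, req) for req in subtasks]
    return " | ".join(results)
```

In this toy version the parent's synthesis step is a simple string join; in practice the parent would feed the returned results back into its own event history and reason over them.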

3. Secure and Reproducible Sandboxing

Security, safety, and scientific reproducibility are foundational. Agents execute in per-session Docker containers with the following constraints:

  • Non-root Linux user with strict filesystem mounts (workspace-in, no external access).
  • Resource capping (cgroups: CPU, RAM limits).
  • Network access blocked by default, with egress permitted only if explicitly allowed.
  • Command execution is stratified by risk: All actions above a configurable threshold require explicit confirmation, and a security analyzer can approve/block or prompt for risk acknowledgment.
  • All outputs—stdout, stderr, DOM, screenshots, and web trees—are logged and appended to the EventStream for auditing.

Ephemeral session instantiation and automatic container rollback ensure no persistent state carries over between runs, closing avenues for cross-session leakage and maximizing guardrail effectiveness (Wang et al., 2024, Wang et al., 5 Nov 2025).
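As an illustration of how the constraints listed above map onto standard Docker options, the sketch below assembles a docker run argument list. It is not the command OpenHands itself issues: the function name sandbox_docker_args, its parameters, the default limits, and the /workspace mount point are assumptions chosen for this example. Only widely documented docker run flags are used.

```python
from typing import List


def sandbox_docker_args(
    image: str,
    workspace: str,
    cpus: float = 1.0,
    memory: str = "2g",
    allow_network: bool = False,
    uid: int = 1000,
) -> List[str]:
    """Build a `docker run` argument list for an ephemeral, restricted container."""
    args = [
        "docker", "run",
        "--rm",                                  # discard the container on exit
        "--user", str(uid),                      # run as a non-root user
        "--cpus", str(cpus),                     # CPU cap (enforced via cgroups)
        "--memory", memory,                      # RAM cap (enforced via cgroups)
        "--volume", f"{workspace}:/workspace",   # mount only the workspace
        "--workdir", "/workspace",
    ]
    if not allow_network:
        # Block network access unless the caller explicitly opts in.
        args += ["--network", "none"]
    args.append(image)
    return args
```

The returned list can be passed to subprocess.run on a host with Docker installed. Building the list separately from executing it keeps the policy easy to inspect and unit test.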

4. Evaluation Benchmarks and Empirical Performance

OpenHands ships with a broad suite of standardized benchmarks—spanning software engineering, web interaction, and general assistant tasks—with unambiguous quantitative metrics:

Domain         | Benchmark                 | Metric
Software Eng.  | SWE-Bench                 | Resolve rate (success %)
Software Eng.  | HumanEvalFix              | Pass@1 (bug-fix success)
Software Eng.  | BIRD, BioCoder, ML-Bench  | Execution/unit test/task accuracy
Web            | WebArena, MiniWoB++       | Task success rate
Assistance     | GAIA, GPQA, MINT, ToolQA  | Score, accuracy, solve rate

Empirical results indicate that code-centric agents (e.g., CodeActAgent v1.8) evaluated on strong contemporary LLMs (e.g., GPT-4o, Claude 3.5) achieve competitive results without any benchmark-specific tuning: 22–26% on SWE-Bench Lite, 76–77% on MINT-math and ML-Bench, and 53% accuracy on GPQA ("diamond" level) (Wang et al., 2024). OpenHands-Versa improves on this, raising performance by 9.1 points on SWE-Bench Multimodal and achieving comparable absolute gains on GAIA and TheAgentCompany by using a minimal multimodal toolset and enhanced planning/state management (Soni et al., 3 Jun 2025).
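The figures above are percentages of tasks solved, and gains such as "9.1 points" are absolute differences between two such percentages rather than relative improvements. A small sketch of that arithmetic follows; the function names and the counts in the usage note are illustrative assumptions, not reported data.

```python
def resolve_rate(resolved: int, total: int) -> float:
    """Percentage of benchmark instances the agent resolved."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total


def point_gain(baseline_pct: float, new_pct: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal place."""
    return round(new_pct - baseline_pct, 1)
```

For example, an agent that resolved a hypothetical 78 of 300 instances would have a resolve rate of 26.0%, and moving from 20.0% to 29.1% is a gain of 9.1 points.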

5. Efficiency, Energy Use, and Limitations

An energy profiling study reports that OpenHands is the least energy-intensive among top agentic frameworks when paired with small language models (SLMs), with 9.4× lower mean energy usage than AutoCodeRover when both use Gemma-3 4B (Tripathy et al., 10 Dec 2025). This efficiency is attributed to concise orchestration loops, lower token overhead (100k per run vs. 400k for competitors), and streamlined event handling. However, with small backbone models, success rates fall to zero, so the efficiency figures can be misleading: low energy use has little value if the framework does not adapt its agentic control to the limitations of the underlying model. Suggested optimizations include SLM-aware loop-breakers, context filtering, adaptive strategies, and token caps, but these remain active research challenges.
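One of the suggested optimizations, a loop-breaker, can be sketched as a check that stops the agent once it emits the same action several times in a row. This is a hypothetical illustration of the idea, not a mechanism described in the cited study: the LoopBreaker class, its max_repeats parameter, and the exact-match comparison are all assumptions.

```python
from collections import deque
from typing import Deque


class LoopBreaker:
    """Flags a run in which the last `max_repeats` actions are identical."""

    def __init__(self, max_repeats: int = 3) -> None:
        if max_repeats < 1:
            raise ValueError("max_repeats must be at least 1")
        self.max_repeats = max_repeats
        # Sliding window over the most recent action strings.
        self._recent: Deque[str] = deque(maxlen=max_repeats)

    def should_stop(self, action_text: str) -> bool:
        """Record the action; return True if the window is full and uniform."""
        self._recent.append(action_text)
        window_full = len(self._recent) == self.max_repeats
        return window_full and len(set(self._recent)) == 1
```

A small model that keeps reissuing the same command would trip this check after max_repeats identical steps, which bounds wasted tokens and energy. Exact string matching is the simplest choice; near-duplicate detection would catch more loops at the cost of extra complexity.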

6. Software Agent SDK and Productionization

The OpenHands Software Agent SDK constitutes a comprehensive architectural redesign facilitating production agent deployment (Wang et al., 5 Nov 2025). Key properties include:

  • Composability via abstracted Agent, Tool, EventStore, and Context interfaces.
  • Native sandboxed execution and seamless local/remote deployment via workspace abstraction.
  • Integrated REST/WebSocket services for lifecycle control and external integration.
  • Model-agnostic multi-LLM routing for heterogeneous agent populations or task-relevant model switching.
  • Built-in security analyzers and confirmation policies assigning risk grades (LOW/MEDIUM/HIGH) and controlling action approval.
  • Human-in-the-loop interfaces: VS Code server, VNC, browser, CLI, and API access.
  • Extensive empirical validation shows success rates of 72.8% (Claude Sonnet 4.5) on SWE-Bench Verified and 67.9% on GAIA.
  • SDK features surpass leading alternatives in native sandboxing, LLM routing, API completeness, and risk assessment.

This design supports both prototyping and robust, scalable agent deployments, backed by a growing library of preset tools and skill modules. The platform maintains a stable point-estimate performance profile due to event-sourced replay and fixed random seed initializations (Wang et al., 5 Nov 2025).
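The risk grading and confirmation policy listed among the SDK properties can be illustrated with a toy example. The three grade names come from the text above; everything else (the Risk enum, the keyword heuristics in classify, and the "at or above threshold" rule in needs_confirmation) is a hypothetical sketch and not the SDK's actual analyzer.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Risk grades, ordered so they can be compared against a threshold."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify(command: str) -> Risk:
    """Toy keyword heuristic; a real analyzer would be far more careful."""
    lowered = command.lower()
    if any(token in lowered for token in ("rm -rf", "sudo", "mkfs")):
        return Risk.HIGH
    if any(token in lowered for token in ("curl", "wget", "pip install")):
        return Risk.MEDIUM
    return Risk.LOW


def needs_confirmation(command: str, threshold: Risk = Risk.MEDIUM) -> bool:
    """Require human approval when the graded risk meets the threshold."""
    return classify(command) >= threshold
```

Using an ordered enum keeps the policy to a single comparison, so raising the threshold to Risk.HIGH lets medium-risk commands run unattended while still gating destructive ones.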

7. Governance, Licensing, and Community Ecosystem

OpenHands is released under the permissive MIT license, which allows both academic and commercial use. The project has amassed 28,000+ GitHub stars and over 1,300 pull requests from 160+ contributors, underlining its community-driven growth (Wang et al., 2024). Contributions follow standard open-source practices: issue-based development, PR review, integration tests (mocked LLM and sandbox runs), and transparent governance. As of 2026, the framework offers over ten core agents and extensible skill libraries, supporting continued expansion into emerging research directions in generalist AI developer agents.

