OpenHands Software Agent SDK
- OpenHands Software Agent SDK is a modular, model-agnostic framework for developing, deploying, and scaling AI-powered software agents.
- It employs event-sourced state management, multi-LLM routing, and pluggable tools to ensure reliable, type-safe operations.
- The SDK supports diverse deployment environments, robust security measures, and empirical benchmarking, making it ideal for research and production.
The OpenHands Software Agent SDK is a modular, model-agnostic framework for the development, deployment, and scaling of software engineering agents in both experimental and production settings. Designed to address the flexibility, reliability, and extensibility requirements of modern AI-powered software development tools, the SDK is a complete architectural redesign of the agent stack originating from the OpenHands project—an open-source agent platform with over 64k GitHub stars as of 2025. The SDK combines event-sourced state management, type-checked extensible tooling, multi-LLM routing, robust security policies, pluggable memory and context management, empirical benchmarking, and integration-ready APIs for a range of user interfaces and deployment environments. It establishes a compositional foundation for research and real-world agent systems, outperforming or matching peer SDKs across empirical benchmarks and feature coverage (Wang et al., 5 Nov 2025).
1. Architectural Decomposition and Core Design Principles
The OpenHands Software Agent SDK employs a modular architecture, decomposed into four independently deployable Python packages:
- openhands.sdk: Core agent abstractions, including Agent, Conversation, LLM, Tool, MCP protocol, and the event system.
- openhands.tools: Concretely implemented tools using the SDK tool interface.
- openhands.workspace: Execution environments, supporting both local and remote (Docker, cloud) agent operations.
- openhands.agent_server: FastAPI-based web server exposing REST and WebSocket APIs for remote/multi-user invocation.
This structure ensures strict separation of agent logic, tool implementation, environment management, and remote execution, enabling plug-and-play integration, lighter dependency surfaces, and rapid iteration. Each package corresponds to a separate functional concern, promoting type-safe extension, testability, and incremental adoption.
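To make the decomposition concrete, a minimal composition sketch follows. The import paths mirror the package names above, but the specific class names, constructor arguments, and tool/workspace identifiers are assumptions for illustration rather than a verbatim reproduction of the SDK's API:

```python
# Illustrative composition sketch; class names and signatures are assumed.
from openhands.sdk import LLM, Agent, Conversation      # core abstractions
from openhands.tools import BashTool, FileEditorTool    # hypothetical concrete tools
from openhands.workspace import DockerWorkspace         # hypothetical remote workspace

# Agent logic (openhands.sdk) is configured independently of the concrete tools
# (openhands.tools) and of where those tools execute (openhands.workspace).
llm = LLM(model="anthropic/claude-sonnet-4-5")
agent = Agent(llm=llm, tools=[BashTool, FileEditorTool])

# The workspace decides whether actions run locally or in a Docker sandbox;
# swapping it out does not require changes to agent or tool code.
with DockerWorkspace(image="python:3.12") as workspace:
    conversation = Conversation(agent=agent, workspace=workspace)
    conversation.send_message("Run the test suite and fix any failures.")
    conversation.run()
```

Because each concern lives in its own package, the same agent definition can be pointed at a local workspace for prototyping and at a remote sandbox in production.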
All interactions—agent messages, tool invocations, user inputs—are modeled as immutable events appended to an event log; the only mutable component is a ConversationState that tracks session metadata, while the log itself preserves the full interaction history. This event-sourcing pattern provides deterministic replay, persistent auditability, and precise debugging, which is foundational for reliability and for compliance-critical applications.
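The event-sourcing pattern itself is independent of SDK specifics. The following self-contained sketch, with invented Event and ConversationState classes rather than the SDK's own, shows why an append-only log of immutable events plus a pure replay function yields deterministic recovery:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    """Immutable record of one interaction (message, tool call, observation)."""
    kind: str      # e.g. "message", "action", "observation"
    payload: str


@dataclass(frozen=True)
class ConversationState:
    """Derived state; never mutated in place, only rebuilt from events."""
    turn_count: int = 0
    transcript: tuple = ()


def apply(state: ConversationState, event: Event) -> ConversationState:
    # Pure transition function: the same events always produce the same state.
    return replace(
        state,
        turn_count=state.turn_count + 1,
        transcript=state.transcript + (f"{event.kind}: {event.payload}",),
    )


def replay(events: list[Event]) -> ConversationState:
    state = ConversationState()
    for event in events:          # deterministic fold over the append-only log
        state = apply(state, event)
    return state


log = [Event("message", "fix the failing test"), Event("action", "run pytest")]
assert replay(log) == replay(list(log))   # replay is reproducible and auditable
```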
2. LLM Abstraction, Multi-LLM Routing, and Execution Control
The SDK abstracts model invocation through a unified LLM class supporting over a hundred models (via LiteLLM), encapsulating both chat-based and completion-based APIs as well as newer reasoning interfaces (e.g., "ThinkingBlock" for Anthropic, "ReasoningItemModel" for OpenAI). Native prompt-based tool calling and output parsing provide compatibility with models that lack function-calling support, expanding the set of usable models beyond most peer SDKs.
A model-agnostic RouterLLM enables per-invocation LLM selection, supporting hybrid workflows: for instance, text steps may be routed to cost-efficient models, image/multimodal tasks to high-capacity models, and fallback/ensemble logic can be codified for reliability or resource optimization. This makes OpenHands uniquely suitable for heterogeneous deployment environments and composite agents requiring cross-modal capabilities.
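A routing layer of this kind can be approximated in a few lines. The sketch below is a hand-rolled stand-in rather than the SDK's RouterLLM; the model roles and the completion interface are assumptions:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    text: str
    has_images: bool = False


# Hypothetical stand-in type for a configured LLM client.
CompletionFn = Callable[[Message], str]


def make_router(
    cheap_text_model: CompletionFn,
    multimodal_model: CompletionFn,
    fallback_model: CompletionFn,
) -> CompletionFn:
    """Route each invocation to the cheapest model that can handle it."""

    def route(message: Message) -> str:
        primary = multimodal_model if message.has_images else cheap_text_model
        try:
            return primary(message)
        except Exception:
            # Fallback/ensemble logic: retry on a different provider.
            return fallback_model(message)

    return route


# Usage: plug in real clients (e.g. LiteLLM-backed callables) for each role.
router = make_router(
    cheap_text_model=lambda m: f"[cheap] {m.text}",
    multimodal_model=lambda m: f"[multimodal] {m.text}",
    fallback_model=lambda m: f"[fallback] {m.text}",
)
print(router(Message("summarize this diff")))
```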
The agent abstraction is stateless and immutable, with configuration (LLM, tools, skills, context augmentations) loaded at conversation instantiation. Extensibility for skills and context components is enabled by pluggable markdown/code files and user-definable hooks.
3. Tools, Action Safety, and Event-Sourced Execution
Tooling in OpenHands follows an Action–Execution–Observation triad: actions are strongly validated via Pydantic schemas (guaranteeing type-safety for LLM tool arguments), executed by a dispatcher which enforces environmental and policy constraints, and observed outputs are structured and serializable for downstream consumption and log inspection.
The tool system supports:
- Action risk analysis: Each tool action is assigned a risk level (LOW/MEDIUM/HIGH/UNKNOWN) by an LLM-based security analyzer, secondary risk metrics, or explicit programmatic policies.
- Confirmation policies: Actions above user-defined risk thresholds require explicit confirmation, pausing agent execution until the user validates them. Separating analysis from enforcement allows fine-grained guardrails and dynamic policy updates without modifying core agent logic (a minimal sketch of this split follows the list).
- MCP (Model Context Protocol) support: All MCP-compliant tools are convertible to SDK tools, supporting standards-based composition.
- Secret registry and credential isolation: Secrets are managed per-workspace, supporting dynamic binding, late resolution, and context-appropriate auto-redaction for logs and user outputs.
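The Action–Execution–Observation triad and the analysis/enforcement split can be sketched as follows; the schema, risk enum, and policy functions are illustrative stand-ins rather than the SDK's actual classes:

```python
from enum import Enum

from pydantic import BaseModel, Field


class Risk(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    UNKNOWN = "unknown"


class BashAction(BaseModel):
    """Schema-validated action: malformed LLM tool arguments fail here."""
    command: str = Field(min_length=1)
    timeout_s: int = Field(default=60, ge=1, le=600)


class Observation(BaseModel):
    """Structured, serializable result for the event log."""
    exit_code: int
    output: str


def assess_risk(action: BashAction) -> Risk:
    # Placeholder analyzer; the SDK can delegate this to an LLM-based policy.
    return Risk.HIGH if "rm -rf" in action.command else Risk.LOW


def needs_confirmation(risk: Risk, threshold: Risk = Risk.MEDIUM) -> bool:
    # UNKNOWN is treated as the most severe level, so it is always held.
    order = [Risk.LOW, Risk.MEDIUM, Risk.HIGH, Risk.UNKNOWN]
    return order.index(risk) >= order.index(threshold)


action = BashAction(command="rm -rf build/")
if needs_confirmation(assess_risk(action)):
    print("Pausing: action requires explicit user confirmation.")
else:
    print("Executing:", action.command)
```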
4. Memory Management, Context Condensation, and Lifecycle Handling
Large-scale or long-lived agent deployments are supported by a Condenser system, which condenses earlier events into summary forms (e.g., CondensationEvents) to manage the LLM context window and minimize prompt/inference costs. Empirical analysis indicates that context condensation reduces operational cost by up to a factor of two without measurable impact on accuracy. The state at any time $t$ is

$$S_t = \mathrm{apply}(S_0, [e_1, e_2, \ldots, e_t]),$$

where $\mathrm{apply}$ applies the events $e_1, \ldots, e_t$ sequentially to the initial state $S_0$, ensuring deterministic replay and recovery.
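Context condensation can then be read as replacing a prefix of the event log with a single summary event while keeping the log append-only. The sketch below uses a placeholder summarizer; the actual Condenser delegates summarization to an LLM and emits CondensationEvents:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str        # "message", "action", "observation", or "condensation"
    payload: str


def summarize(events: list[Event]) -> str:
    # Placeholder; the SDK would ask an LLM for a faithful summary instead.
    return f"Summary of {len(events)} earlier events."


def condense(log: list[Event], keep_last: int = 4, max_len: int = 8) -> list[Event]:
    """Replace older events with one condensation-style summary entry."""
    if len(log) <= max_len:
        return log
    head, tail = log[:-keep_last], log[-keep_last:]
    return [Event("condensation", summarize(head)), *tail]


log = [Event("action", f"step {i}") for i in range(12)]
condensed = condense(log)
print(len(condensed), condensed[0].payload)   # 5 Summary of 8 earlier events.
```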
Workspace abstraction separates local and remote/sandbox execution. Processes may pause, resume, spawn sub-agents, and detect or abort stuck states automatically, addressing core production-readiness problems (e.g., divergent agent loops, orphaned processes).
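One of those production-readiness mechanisms, stuck-state detection, can be approximated by watching for repeated identical actions. The heuristic below is not the SDK's detector, only an illustration of the idea:

```python
from collections import deque


class StuckDetector:
    """Flag an agent loop that keeps emitting the same action."""

    def __init__(self, window: int = 4):
        self.recent = deque(maxlen=window)

    def observe(self, action_repr: str) -> bool:
        """Record an action; return True if the agent looks stuck."""
        self.recent.append(action_repr)
        return (
            len(self.recent) == self.recent.maxlen
            and len(set(self.recent)) == 1
        )


detector = StuckDetector(window=3)
for step in ["run tests", "edit file", "run tests", "run tests", "run tests"]:
    if detector.observe(step):
        print("Stuck loop detected; pausing the conversation for review.")
```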
5. User Integration, Interaction Modalities, and Remote Access
The SDK provides multi-modal user interaction channels:
- VS Code Web integration: Embedded agent environments for collaborative development and agent/human code inspection.
- VNC and Chromium Browser interaction: GUI-based monitoring, debugging, and oversight of agent actions in real time (e.g., visible browser actions, headless/interactive browser control).
- CLI and REST/WebSocket APIs: Local prototyping and scalable remote invocation.
- Production agent server: Serves as a multi-user, multi-tenant hub for agent deployment, supporting isolated sandboxes, persistent sessions, and rich event reporting (a hypothetical client sketch follows below).
These UIs enable human-in-the-loop workflows, essential for practical deployment in mixed agent–developer teams.
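As an illustration of the remote-invocation path, the sketch below shows a client creating a conversation over REST and then streaming events over a WebSocket. The endpoint paths, payload fields, and port are hypothetical placeholders, not the agent server's documented API:

```python
# Hypothetical client: endpoint paths and payload shapes are assumptions.
import asyncio
import json

import requests
import websockets

BASE = "http://localhost:8000"   # assumed agent_server address


def start_conversation(task: str) -> str:
    # REST call to create a conversation; path and fields are illustrative.
    resp = requests.post(f"{BASE}/conversations", json={"message": task}, timeout=30)
    resp.raise_for_status()
    return resp.json()["id"]


async def stream_events(conversation_id: str) -> None:
    # WebSocket subscription to the event stream; URL shape is illustrative.
    url = f"ws://localhost:8000/conversations/{conversation_id}/events"
    async with websockets.connect(url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            print(event.get("kind"), event.get("payload"))


if __name__ == "__main__":
    cid = start_conversation("Upgrade the project to Python 3.12.")
    asyncio.run(stream_events(cid))
```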
6. Security, Reliability, and Quality Assurance
Security features are enforced at multiple levels:
- Sandboxing: Docker-based isolation (filesystem, network, resources) per agent/session; local execution by default, with optional sandboxing for additional assurance.
- Risk-informed execution: Actions with uncertain or HIGH risk are held for human review, with default and customizable policy thresholds.
- Credential management: All sensitive data is workspace-scoped, never injected into LLM context or logs, and dynamically managed (a simplified redaction sketch follows this list).
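The auto-redaction behavior mentioned above amounts to scrubbing registered secret values from anything that crosses the workspace boundary. A simplified, self-contained version of that idea (not the SDK's secret registry) could look like this:

```python
class SecretRegistry:
    """Workspace-scoped secrets with automatic redaction of outgoing text."""

    def __init__(self):
        self._secrets: dict[str, str] = {}

    def register(self, name: str, value: str) -> None:
        self._secrets[name] = value

    def get(self, name: str) -> str:
        # Late resolution: tools fetch values only at execution time.
        return self._secrets[name]

    def redact(self, text: str) -> str:
        # Applied to logs and LLM-visible output so raw values never leak.
        for name, value in self._secrets.items():
            text = text.replace(value, f"<secret:{name}>")
        return text


registry = SecretRegistry()
registry.register("GITHUB_TOKEN", "ghp_example123")
log_line = "Cloning with token ghp_example123 ..."
print(registry.redact(log_line))   # Cloning with token <secret:GITHUB_TOKEN> ...
```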
Quality assurance incorporates a comprehensive testing and benchmarking stack:
- Unit and integration testing: All extension points are type-checked and undergo rigorous CI, including nightly runs on real LLMs and multi-tool orchestration.
- Benchmarking: Empirical evaluation is built-in for key published datasets (e.g., SWE-Bench Verified, GAIA), enabling reproducible measurement of agent correctness and reliability.
7. Empirical Results and Comparative Position
The OpenHands SDK achieves strong and consistent empirical results across benchmarks and LLMs:

| Benchmark | Model | Score |
|--------------------|------------------------|-------|
| SWE-Bench Verified | Claude Sonnet 4.5 | 72.8% |
| SWE-Bench Verified | Claude Sonnet 4 | 68.0% |
| SWE-Bench Verified | GPT-5 (reasoning=high) | 68.8% |
| SWE-Bench Verified | Qwen3 Coder 480B A35B | 65.2% |
| GAIA | Claude Sonnet 4.5 | 67.9% |
| GAIA | Claude Sonnet 4 | 57.6% |
| GAIA | GPT-5 (reasoning=high) | 62.4% |
| GAIA | Qwen3 Coder 480B A35B | 41.2% |
This places the SDK as state-of-the-art or near-SOTA among both academic and commercial agent toolkits, substantiating its claims of reliability and flexible, model-agnostic performance. Systematic comparison shows that OpenHands uniquely combines advanced features:

| Feature | OpenAI | Claude | Google ADK | OpenHands |
|----------------------------------|--------|--------|------------|-----------|
| Model-agnostic (100+ LLMs) | ✔ | ✖ | ✔ | ✔ |
| Multi-LLM Routing | ✖ | ✖ | ✖ | ✔ |
| Security Analyzer (LLM-based) | ✖ | ✖ | Δ | ✔ |
| REST/WebSocket Server | ✖ | ✖ | ✖ | ✔ |
| Sandboxed Remote Execution | ✖ | ✖ | Δ | ✔ |
| Interactive VS Code/VNC/Browser | ✖ | ✖ | ✖ | ✔ |
| Integration Tests with Real LLMs | ✖ | ✔ | ✔ | ✔ |
| Academic Benchmark Suite | ✖ | ✖ | ✖ | ✔ |

(Δ = partial/with caveats)
8. Significance and Impact
By fusing event-sourced design, modular extensibility, model-agnostic execution, production-grade isolation, and empirical test/benchmark coverage, the OpenHands Software Agent SDK establishes a robust foundation for building, customizing, and deploying AI-driven software agents at scale. Deployments using the SDK benefit from deterministic recoverability, type-safe tool integration, fine-grained security controls, multi-modal UX, and strong empirical credibility across real-world software engineering benchmarks. The design approach enables rapid prototyping and domain customization while also supporting stringent requirements for auditability, safety, and continuous reliability validation. This positions the SDK as a central infrastructure component for research and engineering teams building autonomous or semi-autonomous agentic systems in complex development environments (Wang et al., 5 Nov 2025).