Octopus Protocol: One-Shot Hardware Discovery and Control for AI Agents via Infrastructure-as-Prompts

Published 9 May 2026 in cs.RO, cs.AI, and cs.MA | (2605.09055v1)

Abstract: Recent agentic-robotics systems, from Code-as-Policies to modern vision-language-action (VLA) foundation models, presuppose that drivers, SDKs, or ROS-style primitives for the target hardware already exist. Writing those primitives is the dominant engineering cost of bringing up new hardware for agent control. We present Octopus Protocol, a system that collapses that cost to a single shell command. Given only raw OS access and a language-model API key, a coding agent executes a five-stage pipeline (PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY) to discover connected devices, infer their capabilities, generate a Model Context Protocol (MCP) server with typed tools, and deploy it as a live HTTP endpoint. A persistent daemon then monitors the system, heals broken code, and perceives physical state through the camera tools it generated for itself. Two architectural principles make this work: protocols are prompts, not code, and the coding agent is the runtime. We validate the system on three heterogeneous platforms (PC/WSL, Apple Silicon macOS, Raspberry Pi 4) and on a commercial 6-DOF robotic arm with USB camera feedback. One command onboards the hardware in ~10-15 minutes and exposes up to 30 MCP tools; an MCP-compliant client then performs closed-loop visual-motor control through tools no human wrote.


Authors (3)

Summary

The paper introduces a one-shot LLM-driven framework for synthesizing hardware interfaces via markdown prompts, eliminating manual driver coding.
It employs a five-stage pipeline, validated across diverse platforms, that auto-generates and self-heals device control endpoints in 10-15 minutes.
The approach democratizes hardware integration for robotics and embodied AI, enabling autonomous sensorimotor loops and emergent closed-loop operation.

Overview of the Octopus Protocol: Model Context Protocol Generation via Infrastructure-as-Prompts

The paper "Octopus Protocol: One-Shot Hardware Discovery and Control for AI Agents via Infrastructure-as-Prompts" (2605.09055) introduces a framework in which LLM agents discover hardware and automatically generate MCP-based endpoints for it, with a focus on agentic robotics and self-healing AI infrastructure. The Octopus Protocol aims to replace human-authored drivers and SDKs with prompt-based specifications that an LLM-driven coding agent interprets at runtime, reducing the engineering cost of hardware onboarding and control to a single command.

Motivation and Positioning

The primary bottleneck between AI agents and physical hardware, particularly for robotics and embodied AI, is glue code: custom driver and integration logic tailored to each device. Prior agentic-robotics systems, from Code-as-Policies to foundation models such as GR00T N1, assume that hardware drivers or ROS-node primitives already exist. Octopus instead has the agent generate these primitives on demand, removing the need for manually engineered, domain-specific middleware. Protocol specifications are written as prompts and code generation runs at deploy time, so the prompt-level specification itself serves as the hardware interface abstraction.

Architecture and Methodology

Octopus implements a five-stage pipeline for dynamic hardware interface synthesis:

PROBE uses platform-native enumeration tools (e.g., lsusb, system_profiler, gpiodetect) to construct a structured hardware inventory.
IDENTIFY maps detected vendor and product IDs to candidate functions (e.g., set_servo_angle, capture_image) by static lookup and web search, assigning confidence scores.
INTERFACE produces per-capability MCP tool schemas, assigning correct input types.
SERVE auto-generates a complete FastMCP server, including all error handling, I/O operations, and runtime guards, with no templates.
DEPLOY installs dependencies and launches a live HTTP/SSE endpoint exposing the generated tool suite.
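The first three stages can be pictured with a minimal Python sketch. It is not the paper's implementation: in Octopus the agent synthesizes this logic at deploy time. The function names `probe`, `identify`, and `interface`, the `KNOWN_DEVICES` table, and its confidence value are all hypothetical, and the sample `lsusb` text is illustrative. Only the `lsusb` line layout and the `name` / `description` / `inputSchema` fields of an MCP tool descriptor are taken as given.

```python
import re
from dataclasses import dataclass

# One line of `lsusb` output has the form:
#   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
LSUSB_LINE = re.compile(
    r"Bus (?P<bus>\d{3}) Device (?P<device>\d{3}): "
    r"ID (?P<vendor>[0-9a-fA-F]{4}):(?P<product>[0-9a-fA-F]{4})\s*(?P<name>.*)"
)

# Hypothetical static lookup: (vendor, product) -> (capability, confidence).
KNOWN_DEVICES = {
    ("046d", "0825"): ("capture_image", 0.9),
}


@dataclass
class Device:
    bus: str
    device: str
    vendor: str
    product: str
    name: str


def probe(lsusb_output: str) -> list[Device]:
    """PROBE: turn raw lsusb text into a structured inventory."""
    devices = []
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match:
            devices.append(Device(**match.groupdict()))
    return devices


def identify(devices: list[Device]) -> list[dict]:
    """IDENTIFY: map vendor/product IDs to candidate capabilities."""
    candidates = []
    for dev in devices:
        hit = KNOWN_DEVICES.get((dev.vendor.lower(), dev.product.lower()))
        if hit is not None:
            capability, confidence = hit
            candidates.append(
                {"device": dev.name, "capability": capability, "confidence": confidence}
            )
    return candidates


def interface(capability: str, params: dict[str, str]) -> dict:
    """INTERFACE: build an MCP-style tool descriptor with a JSON Schema input."""
    return {
        "name": capability,
        "description": f"Auto-generated tool for {capability}",
        "inputSchema": {
            "type": "object",
            "properties": {key: {"type": typ} for key, typ in params.items()},
            "required": sorted(params),
        },
    }
```

Unknown devices simply drop out of `identify` here; the paper additionally falls back to web search for IDs that a static table does not cover.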

The backbone orchestrator is minimal (~640 lines Python + ~560 lines markdown), with all target-platform code synthesized by the agent at deploy time.
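To illustrate how a small orchestrator can drive markdown specifications, here is a rough sketch under stated assumptions. The per-stage file names (`probe.md`, `identify.md`, and so on), the prompt layout, and the `agent` callable are hypothetical stand-ins; the paper's orchestrator and its coding-agent interface may differ.

```python
from pathlib import Path
from typing import Callable

STAGES = ["PROBE", "IDENTIFY", "INTERFACE", "SERVE", "DEPLOY"]


def run_pipeline(spec_dir: Path, agent: Callable[[str], str]) -> dict[str, str]:
    """Run each stage by sending its markdown spec plus the prior output to the agent."""
    results: dict[str, str] = {}
    previous = ""
    for stage in STAGES:
        # Assumed layout: one markdown specification per stage, e.g. probe.md.
        spec = (spec_dir / f"{stage.lower()}.md").read_text(encoding="utf-8")
        prompt = f"{spec}\n\n## Output of previous stage\n{previous or '(none)'}\n"
        previous = agent(prompt)  # hypothetical coding-agent call
        results[stage] = previous
    return results
```

In this arrangement, supporting a new class of hardware means editing the markdown files, not the Python loop, which is the sense in which the specification acts as the protocol.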

A persistent daemon supplements the build pipeline with continuous monitoring:

WATCH inspects logs using a lightweight model;
HEAL invokes the coding agent with error context to rewrite broken components or reinstall dependencies;
PERCEIVE leverages generated camera tools for Markov-bounded visual summaries to maintain agent awareness.
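The WATCH and HEAL steps can be sketched as a simple detect-and-repair loop. This is a simplified stand-in: the paper's WATCH step uses a lightweight model to inspect logs, whereas the sketch below swaps in a plain regular expression, and the `agent` callable (returning `True` when it believes the repair succeeded) and the retry limit are assumptions made for illustration.

```python
import re
from typing import Callable, Iterable

# Regex stand-in for the lightweight log-inspection model described in the paper.
ERROR_PATTERN = re.compile(r"Traceback|ERROR|ModuleNotFoundError")


def watch(log_lines: Iterable[str], context: int = 3) -> list[str]:
    """WATCH: return one snippet (error line plus preceding context) per error found."""
    lines = list(log_lines)
    snippets = []
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            snippets.append("\n".join(lines[start : i + 1]))
    return snippets


def heal(
    snippets: list[str], agent: Callable[[str], bool], max_attempts: int = 2
) -> list[str]:
    """HEAL: hand each error snippet to the agent; return the snippets left unresolved."""
    unresolved = []
    for snippet in snippets:
        prompt = f"The server log shows this failure. Repair it.\n\n{snippet}"
        # any() stops at the first attempt the agent reports as successful.
        if not any(agent(prompt) for _ in range(max_attempts)):
            unresolved.append(snippet)
    return unresolved
```

Passing the surrounding log lines along with the error gives the agent the context it needs to decide between rewriting a component and reinstalling a dependency.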

Two critical principles differentiate Octopus from previous frameworks: (i) protocols become prompts, not static source artifacts, so extending hardware is a matter of editing markdown specifications, and (ii) the coding agent acts as the persistent backend, not just an initial code generator.

Empirical Validation

Octopus was evaluated on three heterogeneous platforms (PC/WSL, Apple Silicon macOS, and Raspberry Pi 4), each provisioned by the same top-level command and specification with no device-specific engineering. The pipeline generated a functioning MCP server in approximately 10-15 minutes per platform, exposing up to 30 distinct hardware tools depending on the connected peripherals and the agent's confidence in mapping them to capabilities.

Strong empirical claims include:

One-shot onboarding: All platforms required only a single command and were fully addressed without custom drivers.
Closed-loop visual-motor control: On a 6-DOF robotic arm and camera connected to the Raspberry Pi 4, the generated endpoints sufficed for an MCP client to perform sense-act-verify cycles without custom integration, demonstrating closed-loop operation using only Octopus-generated APIs.
Self-healing operation: The HEAL stage recovered from induced failures, e.g., missing Python dependencies, USB device hot-unplug, and direct source corruption, without human intervention. All 14 orchestrator integration tests passed.
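A sense-act-verify cycle of the kind described above might look like the following sketch. The tool names `capture_image` and `set_servo_angle` come from the examples in this summary, but their argument shapes, the `call_tool` client callable, and the `estimate_angle` vision helper are hypothetical; a real MCP client and perception step would replace them.

```python
from typing import Any, Callable

ToolCall = Callable[[str, dict[str, Any]], Any]


def move_and_verify(
    call_tool: ToolCall,
    estimate_angle: Callable[[Any], float],
    joint: int,
    target_deg: float,
    tolerance_deg: float = 2.0,
    max_steps: int = 5,
) -> bool:
    """Command a joint and confirm the result visually, retrying up to max_steps times."""
    for _ in range(max_steps):
        frame = call_tool("capture_image", {})          # sense
        observed = estimate_angle(frame)                # hypothetical vision helper
        if abs(target_deg - observed) <= tolerance_deg:  # verify
            return True
        call_tool("set_servo_angle", {"joint": joint, "angle": target_deg})  # act
    return False
```

The loop only reports success after a fresh observation agrees with the target, so a servo that silently fails to move is caught, not assumed to have complied.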

Implications and Theoretical Consequences

Octopus fundamentally reframes hardware abstraction layers by delegating device driver synthesis and maintenance to LLM-driven agents operating over markdown specifications. This implies:

Democratization: Any AI agent with MCP compliance can interface newly connected hardware, neutralizing the glue-code tax and broadening hardware-access reach for research and deployment.
Embodied Cognition: The protocol naturally supports sensorimotor loops, as the toolset allows self-perception and real-time environment feedback without bespoke pipelines.
Self-sufficiency: The runtime agent maintains and regenerates its own endpoints, effectively becoming an autonomous backend for edge and robotics workloads.

Planned extensions include cross-device orchestration among Octopus nodes, distributed hardware discovery (Wi-Fi/Bluetooth), formal safety-constrained tool activation, and physical realization of complex perception rigs.

Future Directions

Practical and research-level ramifications include:

Integration of safety proofs in agent-generated MCP tools, which is critical for industrial and safety-critical robotics.
Improvements in plug-and-play multi-device ecosystems, enabling collective agent action and perception over untrusted or heterogeneous hardware clusters.
Benchmarking and standardization efforts for agent-driven infrastructure to compare with or complement static code engineering pipelines.

Conclusion

Octopus Protocol operationalizes the idea that a protocol expressed as a prompt, and executed by a coding agent acting as the runtime, can replace platform-specific driver development. The reported results show one-shot onboarding, sense-act-verify control, and self-healing behavior across diverse host architectures and devices. This approach suggests new directions for embodied AI, infrastructure-as-code paradigms, and LLM-powered autotelic systems.
