Octo-planner Framework
- Octo-planner is a family of advanced agentic architectures that decompose complex goals into hierarchical, context-sensitive sub-steps for varied modalities.
- It integrates multiple systems including LLM-based reasoning, on-device planners, adaptive octree path mapping, and decentralized multi-agent control.
- Empirical results show improved accuracy, efficiency, and robustness across applications such as tool-augmented reasoning, edge automation, and robot control.
The Octo-planner Framework encompasses a family of agentic architectures and algorithms that operationalize complex planning and decision-making across diverse modalities and embodiments, including agent tool orchestration, on-device planning/action, adaptive spatial pathfinding, and distributed robot control. Implementations include the planning center of the OctoTools agentic framework for tool-based reasoning (Lu et al., 16 Feb 2025), an efficient on-device LLM-driven planner/action decomposition system (Chen et al., 2024), an adaptive octree-based online spatial path planner (“A-OctoMap”) (Mao et al., 2024), and a biologically inspired MARL-based control system for multi-arm space robots (“SpaceOctopus”) (Zhao et al., 2024). The methodologies grouped under the Octo-planner designation share the principle of explicit decomposition: a hierarchical or multi-stage separation of high-level intent from actionable sub-steps, with context-sensitive refinement and robust feedback.
1. Agentic Planning: OctoTools and Hierarchical Controllers
The Octo-planner in the OctoTools framework (Lu et al., 16 Feb 2025) is architected as a two-stage LLM-powered controller for tool-augmented reasoning. Inputs include a user query $q$, a set of wrapped tools $\mathcal{D}$ with associated metadata $\mathcal{M}$, and a base LLM (e.g., GPT-4o). The Planner’s high-level plan, generated by the Query Analyzer module, summarizes objectives, required skills, relevant tools, and best-practice caveats. At each step $t$, the low-level Action Predictor issues an action $a_t$ that selects a tool and the context for its execution. The Executor component translates $a_t$ into executable code $o_t$, runs it, and returns a result $r_t$, which is folded back into the running context $s_t$. Execution proceeds until a context verifier signals a stop condition.
This process is formally cast as a discrete-time Markov process over states $s_t$ (aggregating context and history) and actions $a_t$ (planned tool invocations). The state evolves as $s_{t+1} = \mathrm{update}(s_t, a_t, r_t)$, where $r_t$ is the result of executing $a_t$, without explicit cost or reward optimization; reasoning completeness and ambiguity minimization are instead induced via LLM prompt engineering.
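The plan, act, execute, verify loop described above can be sketched as follows. This is a minimal illustration, not the OctoTools implementation: `Context`, `run_planner`, the toy tools, and the rule-based `toy_predictor` and `toy_verifier` are hypothetical stand-ins for the LLM-backed Action Predictor and context verifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Context:
    """Running context: the query plus every (action, result) pair so far."""
    query: str
    history: List[Tuple[str, str]] = field(default_factory=list)


Tool = Callable[[str], str]
ActionPredictor = Callable[[Context], Tuple[str, str]]  # -> (tool name, tool input)
Verifier = Callable[[Context], bool]                    # True means "stop"


def run_planner(query: str, tools: Dict[str, Tool], predict_action: ActionPredictor,
                verify: Verifier, max_steps: int = 10) -> Context:
    """Iterate predict -> execute -> update until the verifier or step budget stops it."""
    ctx = Context(query)
    for _ in range(max_steps):
        if verify(ctx):
            break
        tool_name, tool_input = predict_action(ctx)   # low-level action selection
        result = tools[tool_name](tool_input)         # execution of the chosen tool
        ctx.history.append((f"{tool_name}({tool_input!r})", result))  # context update
    return ctx


# Toy stand-ins (assumptions for illustration only; a real system would call an LLM).
toy_tools: Dict[str, Tool] = {"upper": str.upper, "reverse": lambda s: s[::-1]}


def toy_predictor(ctx: Context) -> Tuple[str, str]:
    if not ctx.history:
        return "upper", ctx.query
    return "reverse", ctx.history[-1][1]


def toy_verifier(ctx: Context) -> bool:
    return len(ctx.history) >= 2


final = run_planner("abc", toy_tools, toy_predictor, toy_verifier)
print(final.history)
```

The state here is simply the accumulated history, so each transition depends only on the previous context, the chosen action, and its result.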
A step-budget ablation shows that multi-step decomposition yields monotonic accuracy gains as the step budget grows, with 16-task mean accuracy reaching 58.5%. The Planner’s selective sequencing, which prefers specialized tool invocation (68% vs. 10–25% for baselines), drives accuracy improvements over vanilla GPT-4o and competing frameworks (AutoGen, GPT-Functions, LangChain) (Lu et al., 16 Feb 2025).
2. On-Device Planner-Action Agents
The Octo-planner architecture introduced by NexaAI (Chen et al., 2024) operationalizes a strict separation between planning and execution, tailored for edge devices. The Planner Agent is instantiated as a fine-tuned Microsoft Phi-3 Mini (3.8B), which decomposes an incoming query $q$, given a canonical function set $F$, into an ordered sequence $(p_1, \dots, p_n)$ of abstract substeps. The Action Agent (Octopus V2) ingests each substep $p_i$, converts it into a structured JSON API call, and dispatches it to the relevant function.
Communication between planner and action components uses an internal message protocol: the planner emits the whole plan as a single message, which the runtime then splits and parses step by step. The planner executes a synchronous, non-reactive subgoal schedule without mid-execution re-planning.
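A minimal sketch of this split-then-dispatch runtime is given below. The delimiter `<step>`, the `name: argument` step syntax, the JSON schema, and the two device functions are all hypothetical; the rule-based `to_api_call` merely stands in for the action model.

```python
import json
from typing import Callable, Dict, List

STEP_DELIMITER = "<step>"  # hypothetical separator; the real protocol token is not assumed


def split_plan(plan_text: str) -> List[str]:
    """Split the planner's single output message into ordered sub-steps."""
    return [s.strip() for s in plan_text.split(STEP_DELIMITER) if s.strip()]


def to_api_call(step: str) -> str:
    """Stand-in for the action agent: turn one 'name: argument' step into a JSON call."""
    name, _, arg = step.partition(":")
    return json.dumps({"function": name.strip(), "arguments": {"value": arg.strip()}})


def execute_plan(plan_text: str, functions: Dict[str, Callable[..., str]]) -> List[str]:
    """Run every step in order, with no re-planning between steps."""
    results = []
    for step in split_plan(plan_text):
        call = json.loads(to_api_call(step))
        results.append(functions[call["function"]](**call["arguments"]))
    return results


# Hypothetical on-device functions used only for this example.
device_functions = {
    "open_app": lambda value: f"opened {value}",
    "set_volume": lambda value: f"volume={value}",
}

outputs = execute_plan("open_app: Music <step> set_volume: 40", device_functions)
print(outputs)
```

Because the schedule is fixed before execution starts, a failing step in this sketch would raise rather than trigger a new plan, which mirrors the non-reactive behavior described above.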
Task decomposition is formalized as a mapping $\pi_{\text{plan}}(q; F) = (p_1, \dots, p_n)$ from a query $q$ and function set $F$ to an ordered list of substeps, trained with a standard cross-entropy loss over token-level plan generation. Training data are built from GPT-4-generated queries with step-level supervision for a fixed function set, and examples are subsequently validated for correctness.
Multi-domain generality is achieved via multi-LoRA (Low-Rank Adaptation): separate domain-specific adapters $\Delta W_1, \dots, \Delta W_K$ are combined at inference, yielding merged weights of the form $W = W_0 + \sum_{k} \alpha_k \Delta W_k$. This enables high-quality composite planning (97% success on benchmark tasks) while sustaining low memory (250 MB) and latency (<1 s per plan) constraints (Chen et al., 2024).
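To make the adapter-merging idea concrete, the sketch below combines several low-rank updates into one weight matrix as a weighted sum, $W = W_0 + \sum_k \alpha_k B_k A_k$. The weighted-sum rule, the coefficients, and the tiny matrices are assumptions for illustration; the exact merge rule used by the on-device planner is not taken from the source.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (small matrices only)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(base: Matrix, adapters: Sequence[Tuple[Matrix, Matrix]],
               weights: Sequence[float]) -> Matrix:
    """Return base + sum_k weights[k] * (B_k @ A_k) without mutating the base matrix."""
    merged = [row[:] for row in base]
    for (b_mat, a_mat), w in zip(adapters, weights):
        delta = matmul(b_mat, a_mat)  # low-rank update for one domain
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged


# Two rank-1 adapters on a 2x2 base weight (toy values).
base_weight: Matrix = [[1.0, 0.0], [0.0, 1.0]]
adapter_a = ([[1.0], [0.0]], [[0.0, 2.0]])  # B (2x1), A (1x2)
adapter_b = ([[0.0], [1.0]], [[4.0, 0.0]])
merged_weight = merge_lora(base_weight, [adapter_a, adapter_b], [0.5, 0.5])
print(merged_weight)
```

Each adapter stores only its two thin factors, so holding several domains costs far less memory than holding several full copies of the base weights.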
3. Adaptive Octree Path Planning: The A-OctoMap Framework
The A-OctoMap “Octo-planner” (Mao et al., 2024) fuses an adaptive octree-based spatial mapping subsystem with a multi-resolution path planner, incorporating a modified Jump-Point-Search. The architecture is partitioned into four modules: adaptive octree mapping, convex-hull-based per-leaf downsampling, hierarchical tree/grid representation, and adaptive JPS trajectory planning.
Sensor data populates an octree structure whose leaves may be split or merged online to satisfy minimum controllable region (MCR) granularity. Each leaf computes a convex hull to remove redundant interior points, storing obstacle boundary-representative meshes that accelerate subsequent collision checks. When a plan is requested, the octree projects to an adaptive grid (cell size = MCR edge), maintaining obstacle fidelity while minimizing cell count.
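The adaptive subdivision step can be illustrated with a small octree that splits an over-full leaf only while the resulting children would still be at least as large as a minimum edge length, playing the role of the MCR here. The class, its `capacity` parameter, and the sample points are hypothetical; leaf merging and convex-hull downsampling are omitted.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class OctreeNode:
    center: Point
    half: float  # half of the cube's edge length
    points: List[Point] = field(default_factory=list)
    children: List["OctreeNode"] = field(default_factory=list)

    def _child_for(self, p: Point) -> "OctreeNode":
        # Bit 0/1/2 selects the upper half along x/y/z.
        idx = (p[0] >= self.center[0]) | ((p[1] >= self.center[1]) << 1) \
            | ((p[2] >= self.center[2]) << 2)
        return self.children[idx]

    def _split(self) -> None:
        q = self.half / 2
        cx, cy, cz = self.center
        self.children = [
            OctreeNode((cx + (q if i & 1 else -q),
                        cy + (q if i & 2 else -q),
                        cz + (q if i & 4 else -q)), q)
            for i in range(8)
        ]

    def insert(self, p: Point, min_edge: float, capacity: int = 1) -> None:
        if self.children:
            self._child_for(p).insert(p, min_edge, capacity)
            return
        self.points.append(p)
        # A child's edge equals this node's half-length, so stop splitting below min_edge.
        if len(self.points) > capacity and self.half >= min_edge:
            self._split()
            pending, self.points = self.points, []
            for q_point in pending:
                self._child_for(q_point).insert(q_point, min_edge, capacity)

    def leaves(self) -> Iterator["OctreeNode"]:
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


root = OctreeNode(center=(0.0, 0.0, 0.0), half=4.0)
for sample in [(1.0, 1.0, 1.0), (3.0, 3.0, 3.0), (-2.0, -2.0, -2.0)]:
    root.insert(sample, min_edge=1.0)

all_leaves = list(root.leaves())
occupied = [leaf for leaf in all_leaves if leaf.points]
print(len(all_leaves), len(occupied), min(2 * leaf.half for leaf in all_leaves))
```

Resolution grows only where points crowd together, so empty regions stay as a few large cells while cluttered regions are refined down to the minimum edge.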
The planning module adapts Jump-Point-Search to operate over the non-uniform, octree-derived grid. Experimental results report higher path-finding success rates, shorter paths, and millisecond-scale plan computation times for million-point environments. The approach targets both computational efficiency and geometric precision, especially in cluttered or high-resolution domains (Mao et al., 2024).
4. Distributed Multi-Agent Planning: Space-Octopus Paradigm
The SpaceOctopus framework (Zhao et al., 2024) extends the Octo-planner philosophy to decentralized multi-agent robotic systems, inspired by the distributed neural architecture of the biological octopus. Each robotic arm (in a four-arm, free-floating space manipulator) is partitioned into “limb” agents: position (joints 1–3) and orientation (joints 4–6), each equipped with local observations and policy nets. At the middle level, pairs coordinate per-arm sub-tasks, while at the top level, a centralized critic (for training only) ensures cooperative mission fulfillment (target capture or base reorientation).
Formally, the system is modeled as a DEC-POMDP. Policies are learned via MAPPO in a Centralized Training, Decentralized Execution (CTDE) setting, with per-agent observations and global state critics. Reward shaping ensures sub-task fidelity while penalizing energy, sudden velocity deviations, and collisions.
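The reward-shaping idea can be sketched as a task term minus penalties for energy use, abrupt velocity changes, and collisions. The functional form, the weight values, and all names below are assumptions for illustration and are not the SpaceOctopus reward.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical penalty weights; real values would be tuned per task."""
    task: float = 1.0
    energy: float = 0.01
    smooth: float = 0.1
    collision: float = 10.0


def shaped_reward(dist_to_target: float, torques: Sequence[float],
                  joint_vel: Sequence[float], prev_joint_vel: Sequence[float],
                  collided: bool, w: RewardWeights = RewardWeights()) -> float:
    """Per-agent reward: closer is better; energy, jerky motion, and contact are penalized."""
    task_term = -w.task * dist_to_target
    energy_term = -w.energy * sum(t * t for t in torques)
    smooth_term = -w.smooth * sum((v - pv) ** 2 for v, pv in zip(joint_vel, prev_joint_vel))
    collision_term = -w.collision if collided else 0.0
    return task_term + energy_term + smooth_term + collision_term


free_motion = shaped_reward(0.5, [1.0, 2.0], [0.1, 0.2], [0.1, 0.0], collided=False)
with_contact = shaped_reward(0.5, [1.0, 2.0], [0.1, 0.2], [0.1, 0.0], collided=True)
print(free_motion, with_contact)
```

In a CTDE setup, a reward of this kind would be computed per agent from local quantities, while the centralized critic sees the global state only during training.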
Empirical evaluation demonstrates that MAPPO outperforms centralized PPO and off-policy MADDPG, attaining sub-0.025 m end-effector error (vs. errors at the 0.05 m level for PPO) and robust performance under disturbances, mass variation, or single-arm failures. Furthermore, because policies are learned modularly, mixed-task reassembly (assigning arms to distinct tasks at runtime) is possible without retraining, mirroring octopus-like behavioral flexibility (Zhao et al., 2024).
5. Key Algorithmic Constructs and Planning Formalisms
Core to all Octo-planner architectures is explicit task decomposition: mapping complex global queries or objectives into hierarchical action plans compatible with available primitives (functions, tools, low-level motor commands). This can be abstracted as:
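- Planner function: $\pi_{\text{plan}} : (q, \mathcal{K}) \mapsto (p_1, \dots, p_n)$, where $q$ is the query or objective, $\mathcal{K}$ is a knowledge/toolset, and $p_1, \dots, p_n$ are the planned sub-steps.
- Executor function: $\pi_{\text{exec}} : (p_i, e_i) \mapsto a_i$, where $e_i$ denotes environment state or execution feedback and $a_i$ is the emitted primitive action.
- Markov/decision process formalism: states $s_t$, actions $a_t$, transitions $s_{t+1} \sim P(\cdot \mid s_t, a_t)$.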
- Stopping criteria: trajectory completeness, step-budget, or satisfaction of verification predicates.
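The three kinds of stopping criteria listed above compose naturally as independent checks over an execution trace. The sketch below is one possible arrangement under that assumption; the check names and the string-based trace are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Trace = List[str]
StopCheck = Callable[[Trace], Optional[str]]  # returns a reason, or None to continue


def step_budget(max_steps: int) -> StopCheck:
    return lambda trace: "step budget exhausted" if len(trace) >= max_steps else None


def plan_complete(plan: Sequence[str]) -> StopCheck:
    return lambda trace: "plan complete" if len(trace) >= len(plan) else None


def verified(predicate: Callable[[Trace], bool]) -> StopCheck:
    return lambda trace: "verification predicate satisfied" if predicate(trace) else None


def first_stop_reason(trace: Trace, checks: Sequence[StopCheck]) -> Optional[str]:
    """Evaluate checks in priority order and report the first one that fires."""
    for check in checks:
        reason = check(trace)
        if reason is not None:
            return reason
    return None


checks = [
    verified(lambda trace: "done" in trace),
    plan_complete(["a", "b", "c"]),
    step_budget(2),
]
print(first_stop_reason(["a"], checks))
print(first_stop_reason(["a", "b"], checks))
print(first_stop_reason(["a", "done"], checks))
```

Ordering the checks lets a successful verification take precedence over a budget cut-off when both fire on the same step.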
Implementations differ in the optimizer or underlying prediction engine. LLM-based planners use prompt engineering (OctoTools), fine-tuned LLMs (on-device Octo-planner), or RL policy nets (SpaceOctopus) to maximize reasoning and planning efficiency under resource or feedback constraints.
6. Application Contexts and Performance Metrics
Octo-planner frameworks span several application domains:
| Framework | Domain | Planner Modality | Notable Metric(s) |
|---|---|---|---|
| OctoTools (Lu et al., 16 Feb 2025) | Multimodal reasoning/tools | LLM-based hierarchical planning | Higher 16-task mean accuracy than vanilla GPT-4o and agentic baselines |
| On-device Octo-planner (Chen et al., 2024) | Edge automation/APIs | Fine-tuned LLM task breakdown | 97% plan success, sub-1 s latency, 250 MB footprint |
| A-OctoMap (Mao et al., 2024) | Robotics/path planning | Adaptive octree + JPS | Higher pathfinding success, millisecond-scale planning, shorter paths |
| SpaceOctopus (Zhao et al., 2024) | Multi-arm space robotics | Modular MARL with CTDE | <0.025 m position error, robust to failures/disturbances |
Performance is evaluated via task/plan success, accuracy, energy/memory/latency, robustness, and adaptability—each in accordance with the requirements of its target domain and embodiment.
7. Limitations, Extensions, and Prospective Directions
Key limitations in current instantiations include: lack of explicit utility optimization in LLM-based planners (OctoTools, on-device Octo-planner); worst-case quadratic cost in convex hull calculation (A-OctoMap); and potential for tree overgrowth without aggressive pruning. In MARL planners (SpaceOctopus), decentralized execution can under-exploit cross-agent information in deployment, though physical coupling mitigates coordination loss.
A plausible implication is that future Octo-planner frameworks may benefit from hybridizing explicit search/planning with learned utility approximations, more generalized re-planning (especially on-device), and adaptive computation/resource management. The biological inspiration (distributed, semi-autonomous control) points to further research in modular, reassemblable learning and resilient, parallel agentic architectures, which are essential for scalable intelligent systems operating under varied and dynamic constraints.