Octo-planner Framework

Updated 15 April 2026

Octo-planner is a family of agentic architectures that decompose complex goals into hierarchical, context-sensitive sub-steps across varied modalities.
The family spans LLM-based tool reasoning, on-device planning, adaptive octree path mapping, and decentralized multi-agent control.
Reported results show gains in accuracy, efficiency, and robustness in applications such as tool-augmented reasoning, edge automation, and robot control.

The Octo-planner Framework encompasses a family of agentic architectures and algorithms for complex planning and decision-making across diverse modalities and embodiments, including agent tool orchestration, on-device planning and action, adaptive spatial pathfinding, and distributed robot control. Implementations include the planner of the OctoTools agentic framework for tool-based reasoning (Lu et al., 16 Feb 2025), an efficient on-device LLM-driven planner/action decomposition system (Chen et al., 2024), an adaptive octree-based online spatial path planner (“A-OctoMap”) (Mao et al., 2024), and a biologically inspired MARL-based control system for multi-arm space robots (“SpaceOctopus”) (Zhao et al., 2024). The methods grouped under the Octo-planner designation generally share the principle of explicit decomposition: a hierarchical or multi-stage separation of high-level intent from actionable sub-steps, with context-sensitive refinement and robust feedback.

1. Agentic Planning: OctoTools and Hierarchical Controllers

The Octo-planner in the OctoTools framework (Lu et al., 16 Feb 2025) is architected as a two-stage LLM-powered controller for tool-augmented reasoning. Inputs include a user query $q \in \mathcal{Q}$, a set of wrapped tools $D = \{d_i\}_{i=1}^{n}$ with associated metadata $m_i$, and a base LLM (e.g., GPT-4o). The Planner’s high-level plan $p_0$, generated by the Query Analyzer module, summarizes objectives, required skills, relevant tools, and best-practice caveats. At each step $t$, the low-level Action Predictor issues an action $a_t = (d_t, \mathrm{subgoal}_t, \mathrm{context}_t)$ that selects a tool and the context for its execution. The Executor translates $a_t$ into executable code $o_t$, runs it, and returns the result $r_t$, which is appended to the running context used at the next step. Execution continues until a context verifier signals a stop condition.

This process is formally cast as a discrete-time Markov process over states $s_t$ (aggregating context and history) and actions $a_t$ (planned tool invocations). The state evolves as $s_{t+1} = s_t \cup \{(a_t, o_t, r_t)\}$, without explicit cost or reward optimization; reasoning completeness and ambiguity minimization are instead induced via LLM prompt engineering.
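The control loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not OctoTools' implementation: the Query Analyzer, Action Predictor, context verifier, and tool registry are hypothetical callables supplied by the caller, and the code $o_t$ is only recorded as a string rather than generated by an LLM and executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Action:
    """a_t = (d_t, subgoal_t, context_t): tool name, subgoal, and tool input."""
    tool: str
    subgoal: str
    context: str


# One history entry per step: (a_t, o_t, r_t).
Step = Tuple[Action, str, str]


def run_planner(
    query: str,
    analyze: Callable[[str], str],                      # hypothetical Query Analyzer -> p_0
    predict: Callable[[str, str, List[Step]], Action],  # hypothetical Action Predictor -> a_t
    tools: Dict[str, Callable[[str], str]],             # hypothetical tool registry
    verify: Callable[[str, List[Step]], bool],          # hypothetical context verifier
    max_steps: int = 10,                                # step budget
) -> Tuple[str, List[Step]]:
    plan = analyze(query)                               # high-level plan p_0
    state: List[Step] = []                              # s_0: empty history
    for _ in range(max_steps):
        action = predict(query, plan, state)            # a_t
        code = f"{action.tool}({action.context!r})"     # stand-in for generated code o_t
        result = tools[action.tool](action.context)     # r_t
        state = state + [(action, code, result)]        # s_{t+1} = s_t ∪ {(a_t, o_t, r_t)}
        if verify(query, state):                        # stop condition
            break
    return plan, state
```

Passing the components in as functions keeps the loop independent of any particular LLM backend; `max_steps` plays the role of the step budget varied in the ablation.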

An ablation over the step budget shows that multi-step decomposition yields monotonic accuracy gains as more plan steps are allowed, with 16-task mean accuracy peaking at approximately 58.5%. The Planner’s selective sequencing, which favors specialized tool invocation (68% vs. 10–25% for baselines), drives significant accuracy improvements over vanilla GPT-4o and competitor frameworks (AutoGen, GPT-Functions, LangChain) (Lu et al., 16 Feb 2025).

2. On-Device Planner-Action Agents

The Octo-planner architecture introduced by NexaAI (Chen et al., 2024) enforces a strict separation between planning and execution, tailored for edge devices. The Planner Agent is a fine-tuned Microsoft Phi-3 Mini (3.8B parameters), which decomposes an incoming query $q$, given a canonical function set $F$, into an ordered sequence $P = (p_1, \dots, p_n)$ of abstract substeps. The Action Agent (Octopus V2) takes each substep $p_i$, converts it into a structured JSON API call, and dispatches it to the relevant function.

Planner and action components communicate through an internal message protocol: the planner emits the full sequence of substeps as a single output, which the runtime then splits and parses step by step. The plan is executed as a synchronous, non-reactive schedule of subgoals, with no re-planning mid-execution.
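A toy version of this split-then-dispatch protocol is sketched below. Everything in it is hypothetical: the separator string `<split>`, the two device functions, and the keyword-matching `action_agent`, which stands in for Octopus V2's learned mapping from a substep to a JSON function call.

```python
import json
from typing import Callable, Dict, List

SEP = "<split>"  # hypothetical step delimiter in the planner's output


def take_photo(camera: str) -> str:
    return f"photo taken with {camera} camera"


def send_email(body: str) -> str:
    return f"email sent: {body}"


# Hypothetical canonical function set F.
REGISTRY: Dict[str, Callable[..., str]] = {
    "take_photo": take_photo,
    "send_email": send_email,
}


def action_agent(step: str) -> str:
    """Toy stand-in for the Action Agent: substep -> JSON call string."""
    text = step.lower()
    if "email" in text:  # checked first: an email step may also mention a photo
        return json.dumps({"name": "send_email", "arguments": {"body": step}})
    if "photo" in text:
        camera = "front" if "front" in text else "back"
        return json.dumps({"name": "take_photo", "arguments": {"camera": camera}})
    raise ValueError(f"no function matches substep: {step!r}")


def run_plan(planner_output: str) -> List[str]:
    # Split the single planner output into substeps p_1..p_n.
    steps = [s.strip() for s in planner_output.split(SEP) if s.strip()]
    results = []
    for step in steps:  # synchronous: no re-planning between steps
        call = json.loads(action_agent(step))
        results.append(REGISTRY[call["name"]](**call["arguments"]))
    return results
```

Routing every call through JSON, even in-process, mirrors the structured interface between the two agents; a failed match raises instead of triggering re-planning, consistent with the non-reactive schedule.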

Task decomposition is formalized as learning a conditional distribution $\pi_\theta(P \mid q, F)$ over plans, trained with a standard token-level cross-entropy loss on the plan text. Training data consists of GPT-4-generated queries with step-level supervision over a fixed function set; examples are subsequently validated for correctness.
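For concreteness, the token-level cross-entropy on plan text can be written out directly. This sketch assumes the model produces one vector of logits per target position (teacher forcing) and averages the negative log-likelihood of the reference plan tokens; it is the generic language-modeling loss, not code from the Octo-planner paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def plan_cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the reference plan tokens.

    logits[t] holds the model's scores over the vocabulary at position t;
    targets[t] is the index of the reference token at that position.
    """
    assert len(logits) == len(targets)
    nll = [-log_softmax(row)[tok] for row, tok in zip(logits, targets)]
    return sum(nll) / len(nll)
```

With uniform logits over a vocabulary of size $V$, every position contributes $\log V$, which is the loss of an untrained model that has no preference among tokens.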

Multi-domain generality is achieved with multi-LoRA (Low-Rank Adaptation): separate domain-specific adapters $\Delta W_k = B_k A_k$ ($k = 1, \dots, K$) are combined at inference, yielding merged weights $W = W_0 + \sum_{k=1}^{K} \lambda_k B_k A_k$. This supports composite planning (97% success on benchmark tasks) while keeping memory (250 MB) and latency ($<1$ s per plan) low (Chen et al., 2024).
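The merge step can be illustrated with plain nested lists. This is a sketch under an assumption: it implements a weighted linear merge $W = W_0 + \sum_k \lambda_k B_k A_k$ with hypothetical per-adapter weights $\lambda_k$; how Octo-planner actually weights or selects adapters is not specified here.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def merge_lora(w0: Matrix, adapters: Sequence[Tuple[Matrix, Matrix]],
               weights: Sequence[float]) -> Matrix:
    """Return W0 + sum_k lambda_k * (B_k @ A_k).

    Each adapter is (B_k, A_k) with B_k of shape d x r and A_k of shape r x m,
    so B_k @ A_k is a rank-r update with the same d x m shape as W0.
    The lambda_k weights are a hypothetical choice for illustration.
    """
    merged = [row[:] for row in w0]  # copy so the base weights stay untouched
    for (b, a), lam in zip(adapters, weights):
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += lam * v
    return merged
```

Because each update is folded into a single dense matrix, inference cost after merging matches that of the base model, which is one reason merged adapters suit memory- and latency-constrained devices.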

3. Adaptive Octree Path Planning: The A-OctoMap Framework

The A-OctoMap “Octo-planner” (Mao et al., 2024) couples an adaptive octree-based spatial mapping subsystem with a multi-resolution path planner built on a modified Jump Point Search (JPS). The architecture has four modules: adaptive octree mapping, convex-hull-based per-leaf downsampling, a hierarchical tree/grid representation, and adaptive JPS trajectory planning.

Sensor data populates an octree whose leaves may be split or merged online to match the minimum controllable region (MCR) granularity. Each leaf computes a convex hull to remove redundant interior points, storing boundary meshes that represent obstacles and accelerate subsequent collision checks. When a plan is requested, the octree is projected onto an adaptive grid (cell size = MCR edge length), preserving obstacle fidelity while minimizing cell count.
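The split/merge behavior and the projection onto MCR-sized cells can be sketched with a small pure-Python octree. The per-leaf point `capacity` is a hypothetical split criterion chosen for illustration, and the per-leaf convex-hull downsampling is omitted; A-OctoMap's actual split and merge rules are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set, Tuple

Point = Tuple[float, float, float]


@dataclass
class Node:
    origin: Point                      # minimum corner of the cube
    edge: float                        # cube edge length
    points: List[Point] = field(default_factory=list)
    children: Optional[List["Node"]] = None


def child_index(node: Node, p: Point) -> int:
    # Bit `axis` is set when p lies in the upper half along that axis.
    half = node.edge / 2
    return sum(1 << axis for axis in range(3) if p[axis] >= node.origin[axis] + half)


def insert(node: Node, p: Point, mcr: float, capacity: int) -> None:
    while node.children is not None:   # descend to the leaf containing p
        node = node.children[child_index(node, p)]
    node.points.append(p)
    # Split an overfull leaf (hypothetical criterion), never below the MCR edge.
    if len(node.points) > capacity and node.edge / 2 >= mcr:
        half = node.edge / 2
        node.children = [
            Node(tuple(node.origin[a] + half * ((i >> a) & 1) for a in range(3)), half)
            for i in range(8)
        ]
        pts, node.points = node.points, []
        for q in pts:
            insert(node, q, mcr, capacity)


def leaves(node: Node) -> Iterator[Node]:
    if node.children is None:
        yield node
    else:
        for c in node.children:
            yield from leaves(c)


def remove(root: Node, p: Point) -> None:
    node = root
    while node.children is not None:
        node = node.children[child_index(node, p)]
    node.points.remove(p)


def merge(node: Node, capacity: int) -> None:
    """Bottom-up: collapse all-leaf children into their parent when they
    jointly hold no more than `capacity` points."""
    if node.children is None:
        return
    for c in node.children:
        merge(c, capacity)
    if all(c.children is None for c in node.children):
        pts = [p for c in node.children for p in c.points]
        if len(pts) <= capacity:
            node.points, node.children = pts, None


def occupied_cells(root: Node, mcr: float) -> Set[Tuple[int, ...]]:
    """Project occupied leaves onto a grid whose cell size is the MCR edge."""
    return {tuple(int(c // mcr) for c in p) for leaf in leaves(root) for p in leaf.points}
```

Dense regions are refined down to the MCR edge while empty space stays coarse, and `merge` undoes refinement once points are removed, which bounds tree growth.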

The planning module adapts Jump Point Search to operate over the non-uniform, octree-derived grid. Experiments report higher average path-finding success rates, shorter paths, and per-plan computation times reported in milliseconds for million-point environments. According to the authors, the approach combines computational efficiency with geometric precision, especially in cluttered or high-resolution domains (Mao et al., 2024).
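As a rough illustration of search over the projected grid, the sketch below runs plain A* with 26-connected moves on a uniform grid of occupied MCR cells. It deliberately swaps in A* for A-OctoMap's adaptive Jump Point Search: JPS additionally prunes symmetric paths and runs on the non-uniform grid, neither of which is reproduced here. For brevity it also allows diagonal moves that cut past obstacle corners.

```python
import heapq
import itertools
import math
from typing import Dict, Iterator, List, Optional, Set, Tuple

Cell = Tuple[int, int, int]


def neighbors(cell: Cell, dims: Cell, occupied: Set[Cell]) -> Iterator[Tuple[Cell, float]]:
    """26-connected free neighbours inside the grid bounds, with Euclidean step cost."""
    for d in itertools.product((-1, 0, 1), repeat=3):
        if d == (0, 0, 0):
            continue
        n = (cell[0] + d[0], cell[1] + d[1], cell[2] + d[2])
        if all(0 <= n[i] < dims[i] for i in range(3)) and n not in occupied:
            yield n, math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)


def astar(start: Cell, goal: Cell, dims: Cell,
          occupied: Set[Cell]) -> Tuple[Optional[List[Cell]], float]:
    h = lambda c: math.dist(c, goal)  # Euclidean: admissible and consistent here
    tie = itertools.count()           # tie-breaker for equal f-values
    frontier = [(h(start), next(tie), start)]
    g: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    closed: Set[Cell] = set()
    while frontier:
        _, _, cur = heapq.heappop(frontier)
        if cur in closed:
            continue
        if cur == goal:  # reconstruct the path by following parents
            path = [cur]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1], g[goal]
        closed.add(cur)
        for n, step in neighbors(cur, dims, occupied):
            cost = g[cur] + step
            if cost < g.get(n, math.inf):
                g[n], parent[n] = cost, cur
                heapq.heappush(frontier, (cost + h(n), next(tie), n))
    return None, math.inf
```

On a coarse grid derived from MCR cells, even this unpruned search expands far fewer nodes than it would on the raw point-resolution map, which is the motivation for projecting the octree before planning.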

4. Distributed Multi-Agent Planning: SpaceOctopus Paradigm

The SpaceOctopus framework (Zhao et al., 2024) extends the Octo-planner philosophy to decentralized multi-agent robotic systems, drawing on the distributed neural architecture of the biological octopus. Each arm of a four-arm, free-floating space manipulator is split into two “limb” agents, one for position (joints 1–3) and one for orientation (joints 4–6), each with its own local observations and policy network. At the middle level, the two agents of each arm coordinate per-arm sub-tasks; at the top level, a centralized critic (used only during training) promotes cooperative mission fulfillment (target capture or base reorientation).
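The agent partition can be made concrete with a short sketch of one decentralized execution step. The data layout (agent IDs as `(arm, role)` pairs, policies as plain callables, 0-based joint indices) is a hypothetical choice for illustration; only the four arms and the joint 1–3 / 4–6 split come from the description.

```python
from typing import Callable, Dict, List, Sequence, Tuple

N_ARMS = 4
# Joint indices (0-based) controlled by each limb agent: joints 1-3 and 4-6.
JOINT_GROUPS: Dict[str, Tuple[int, ...]] = {"position": (0, 1, 2), "orientation": (3, 4, 5)}

AgentId = Tuple[int, str]                       # (arm index, agent role), hypothetical
Policy = Callable[[Sequence[float]], List[float]]


def agent_ids() -> List[AgentId]:
    return [(arm, role) for arm in range(N_ARMS) for role in JOINT_GROUPS]


def decentralized_step(policies: Dict[AgentId, Policy],
                       local_obs: Dict[AgentId, Sequence[float]]) -> List[List[float]]:
    """Each agent acts on its own observation only; the outputs are then
    assembled into one 6-joint command per arm."""
    commands = [[0.0] * 6 for _ in range(N_ARMS)]
    for aid in agent_ids():
        arm, role = aid
        action = policies[aid](local_obs[aid])  # no access to other agents' observations
        for joint, value in zip(JOINT_GROUPS[role], action):
            commands[arm][joint] = value
    return commands
```

Because each agent is keyed independently, reassigning an arm to a different task amounts to swapping the policies stored under that arm's two IDs, which is the kind of modularity the mixed-task reassembly result relies on.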

Formally, the system is modeled as a Dec-POMDP. Policies are learned with MAPPO under Centralized Training with Decentralized Execution (CTDE): each actor conditions on its own observations, while the critic conditions on the global state. Reward shaping encourages sub-task fidelity while penalizing energy use, abrupt velocity changes, and collisions.
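The two training ingredients named above can be sketched in a few lines. The shaped reward is hypothetical: its terms mirror the description (sub-task error, energy, abrupt velocity changes, collisions), but the functional forms and the weights `w_energy`, `w_smooth`, and `collision_penalty` are illustrative choices, not the paper's. The clipped surrogate is the standard PPO objective that MAPPO optimizes for each agent, with advantages estimated from the centralized critic.

```python
from typing import Sequence


def shaped_reward(task_error: float,
                  joint_vel: Sequence[float],
                  prev_joint_vel: Sequence[float],
                  collided: bool,
                  w_energy: float = 0.01,           # hypothetical weight
                  w_smooth: float = 0.1,            # hypothetical weight
                  collision_penalty: float = 10.0   # hypothetical penalty
                  ) -> float:
    energy = sum(v * v for v in joint_vel)          # proxy for actuation energy
    jerk = sum((v - pv) ** 2 for v, pv in zip(joint_vel, prev_joint_vel))  # abrupt velocity change
    penalty = collision_penalty if collided else 0.0
    return -task_error - w_energy * energy - w_smooth * jerk - penalty


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      eps: float = 0.2) -> float:
    """PPO objective (to maximize): mean of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A).

    In MAPPO, rho is each agent's probability ratio pi_new / pi_old for its own
    action given its own observation, and A comes from the centralized critic.
    """
    terms = [min(r * a, max(1 - eps, min(r, 1 + eps)) * a)
             for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)
```

The clipping keeps each decentralized actor's update close to its previous policy, while sharing the critic's advantage estimates is what lets the agents learn to cooperate without exchanging observations at execution time.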

Empirically, MAPPO outperforms centralized PPO and off-policy MADDPG, reaching sub-0.025 m end-effector error (vs. on the order of 0.05 m for PPO) and remaining robust under disturbances, mass variation, and single-arm failures. Because policies are learned per module, arms can also be reassigned to distinct tasks at runtime (mixed-task reassembly) without retraining, mirroring the behavioral flexibility of the octopus (Zhao et al., 2024).

5. Key Algorithmic Constructs and Planning Formalisms

Core to all Octo-planner architectures is explicit task decomposition: mapping complex global queries or objectives into hierarchical action plans compatible with available primitives (functions, tools, low-level motor commands). This can be abstracted as:

Planner function: $\mathcal{P}: (q, \mathcal{K}) \mapsto (p_1, \dots, p_n)$, where $\mathcal{K}$ is a knowledge/toolset.
Executor function: $\mathcal{E}: (p_i, e_t) \mapsto a_t$, where $e_t$ denotes environment state or execution feedback.
Markov/decision process formalism: states $s_t \in \mathcal{S}$, actions $a_t \in \mathcal{A}$, transitions $s_{t+1} \sim T(\cdot \mid s_t, a_t)$.
Stopping criteria: trajectory completeness, step budget, or satisfaction of verification predicates.

Implementations differ in the underlying prediction or search engine: prompt-engineered LLMs (OctoTools), fine-tuned LLMs (on-device Octo-planner), RL policy networks (SpaceOctopus), or grid search with adaptive JPS (A-OctoMap), each chosen to balance reasoning and planning efficiency against resource or feedback constraints.

6. Application Contexts and Performance Metrics

Octo-planner frameworks span several application domains:

Framework	Domain	Planner Modality	Notable Metric(s)
OctoTools (Lu et al., 16 Feb 2025)	Multimodal reasoning/tools	LLM-based hierarchical planning	≈58.5% accuracy (16-task mean); outperforms agentic baselines
On-device Octo-planner (Chen et al., 2024)	Edge automation/APIs	Fine-tuned LLM task breakdown	97% plan success, <1 s latency, 250 MB footprint
A-OctoMap (Mao et al., 2024)	Robotics/path planning	Adaptive octree + JPS	Higher pathfinding success, shorter paths, per-plan times in ms
SpaceOctopus (Zhao et al., 2024)	Multi-arm space robotics	Modular MARL with CTDE	<0.025 m position error, robust to failures and disturbances

Performance is evaluated via task/plan success, accuracy, energy/memory/latency, robustness, and adaptability—each in accordance with the requirements of its target domain and embodiment.

7. Limitations, Extensions, and Prospective Directions

Key limitations in current instantiations include: lack of explicit utility optimization in LLM-based planners (OctoTools, on-device Octo-planner); worst-case quadratic cost in convex hull calculation (A-OctoMap); and potential for tree overgrowth without aggressive pruning. In MARL planners (SpaceOctopus), decentralized execution can under-exploit cross-agent information in deployment, though physical coupling mitigates coordination loss.

A plausible implication is that future Octo-planner frameworks may benefit from hybridizing explicit search/planning with learned utility approximations, more general re-planning (especially on-device), and adaptive computation/resource management. The biological inspiration (distributed, semi-autonomous control) points to further research on modular, reassemblable learning and resilient, parallel agentic architectures, which could support scalable intelligent systems operating under varied and dynamic constraints.

References

Lu et al. (2025). OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning.

Chen et al. (2024). Octo-planner: On-device Language Model for Planner-Action Agents.

Mao et al. (2024). A-OctoMap: An Adaptive OctoMap for Online Path Planning.

Zhao et al. (2024). SpaceOctopus: An Octopus-inspired Motion Planning Framework for Multi-arm Space Robot.
