
Octo-planner Framework

Updated 15 April 2026
  • Octo-planner is a family of advanced agentic architectures that decompose complex goals into hierarchical, context-sensitive sub-steps for varied modalities.
  • The designation covers several distinct systems: LLM-based tool reasoning, on-device planners, adaptive octree path mapping, and decentralized multi-agent control.
  • Empirical results show improved accuracy, efficiency, and robustness across applications such as tool-augmented reasoning, edge automation, and robot control.

The Octo-planner Framework encompasses a family of agentic architectures and algorithms that operationalize complex planning and decision-making across diverse modalities and embodiments, including agent tool orchestration, on-device planning and action, adaptive spatial pathfinding, and distributed robot control. Implementations include the planner at the core of the OctoTools agentic framework for tool-based reasoning (Lu et al., 16 Feb 2025), an efficient on-device LLM-driven planner/action decomposition system (Chen et al., 2024), an adaptive octree-based online spatial path planner (“A-OctoMap”) (Mao et al., 2024), and a biologically inspired MARL-based control system for multi-arm space robots (“SpaceOctopus”) (Zhao et al., 2024). The methods grouped under the Octo-planner designation generally share the principle of explicit decomposition: a hierarchical or multi-stage separation of high-level intent from actionable sub-steps, with context-sensitive refinement and robust feedback.

1. Agentic Planning: OctoTools and Hierarchical Controllers

The Octo-planner in the OctoTools framework (Lu et al., 16 Feb 2025) is architected as a two-stage LLM-powered controller for tool-augmented reasoning. Inputs include a user query $q \in \mathcal{Q}$, a set of wrapped tools $D = \{d_i\}_{i=1}^{n}$ with associated metadata $m_i$, and a base LLM, $\mathrm{LLM}_0$ (e.g., GPT-4o). The planner's high-level plan $p_0$, generated by the Query Analyzer module, summarizes the objectives, required skills, relevant tools, and best-practice caveats. At each step $t$, the low-level Action Predictor issues an action $a_t = (d_t, \mathrm{subgoal}_t, \mathrm{context}_t)$ that selects a tool and the context for its execution. The Executor translates $a_t$ into executable code $o_t$, runs it, and returns the result $r_t$; the context is then updated recursively with the outcome of the step. Execution continues until a context verifier signals a stop condition.

This process is formally cast as a discrete-time Markov process over states $s_t$ (aggregating context and history) and actions $a_t$ (planned tool invocations). The state evolves as $s_{t+1} = s_t \cup \{(a_t, o_t, r_t)\}$, with no explicit cost or reward being optimized; reasoning completeness and ambiguity minimization are instead induced through LLM prompt engineering.
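The loop described above (query analysis, action prediction, execution, verification) can be sketched as follows. This is a minimal illustration, not the OctoTools API: the `Tool`, `Step`, `State`, and `run_episode` names, the single-string tool interface, and the caller-supplied `analyze`/`predict_action`/`verify` callables (standing in for LLM prompts) are all hypothetical. Each appended `Step` corresponds to adding $(a_t, o_t, r_t)$ to the state.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    name: str
    metadata: str                # m_i: description shown to the planner
    run: Callable[[str], str]    # hypothetical single-string tool interface


@dataclass
class Step:
    tool: str       # d_t
    subgoal: str    # subgoal_t
    context: str    # context_t
    command: str    # o_t (stand-in for generated code)
    result: str     # r_t


@dataclass
class State:
    query: str
    plan: str                                     # p_0 from the query analyzer
    history: list = field(default_factory=list)   # accumulated (a_t, o_t, r_t)


def run_episode(query, tools, analyze, predict_action, verify, max_steps=5):
    """Plan once, then act step by step until the verifier stops or the budget runs out."""
    state = State(query=query, plan=analyze(query, tools))
    for _ in range(max_steps):
        tool_name, subgoal, context = predict_action(state, tools)   # a_t
        command = f"{tool_name}.run({subgoal!r})"                      # o_t
        result = tools[tool_name].run(subgoal)                         # r_t
        state.history.append(Step(tool_name, subgoal, context, command, result))  # s_{t+1}
        if verify(state):                                              # context verifier
            break
    return state


tools = {"adder": Tool("adder", "sums integers written as 'a+b'",
                       lambda s: str(sum(int(x) for x in s.split("+"))))}
state = run_episode(
    "What is 2+3?", tools,
    analyze=lambda q, t: "Parse the expression, then call adder.",
    predict_action=lambda s, t: ("adder", "2+3", s.query),
    verify=lambda s: len(s.history) >= 1,
)
print(state.history[-1].result)  # 5
```

Keeping the whole history in the state mirrors the "aggregating context and history" description; a real system would also have to truncate or summarize it to fit the LLM context window.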

A key ablation (a step-budget study) shows that multi-step decomposition yields monotonic accuracy gains: accuracy rises as the plan-step budget grows, reaching about 58.5% mean accuracy across 16 tasks. The planner's selective sequencing, which favors specialized tool invocation (68% vs. 10–25% for baselines), drives substantial accuracy improvements over vanilla GPT-4o and competing frameworks (AutoGen, GPT-Functions, LangChain) (Lu et al., 16 Feb 2025).

2. On-Device Planner-Action Agents

The Octo-planner architecture introduced by NexaAI (Chen et al., 2024) enforces a strict separation between planning and execution, tailored to edge devices. The Planner Agent is a fine-tuned Microsoft Phi-3 Mini (3.8B parameters), which decomposes an incoming query $q$, given a canonical function set $F$, into an ordered sequence $(g_1, \dots, g_k)$ of abstract substeps. The Action Agent (Octopus V2) ingests each substep $g_i$, converts it into a structured JSON API call, and dispatches it to the relevant function.

Planner and action components communicate through an internal message protocol: the planner emits a single output containing the full plan, which the runtime then splits and parses step by step. The planner follows a synchronous, non-reactive subgoal schedule with no re-planning during execution.
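The split-then-dispatch protocol can be sketched in a few lines. Everything concrete here is an assumption for illustration: the `<step_sep>` separator (the actual separator is not given above), the two-entry function set, the keyword rule that stands in for the fine-tuned Action Agent, and the placeholder email address.

```python
import json

STEP_SEPARATOR = "<step_sep>"  # hypothetical marker; the real separator is not specified here

# Hypothetical canonical function set F: function name -> required argument names.
FUNCTIONS = {
    "take_photo": {"camera"},
    "send_email": {"recipient", "body"},
}


def split_plan(planner_output):
    """Runtime side: split the planner's single output into ordered substeps g_1..g_k."""
    return [s.strip() for s in planner_output.split(STEP_SEPARATOR) if s.strip()]


def action_agent(step):
    """Stand-in for the Action Agent: a keyword rule replaces the fine-tuned model."""
    text = step.lower()
    if "email" in text:
        call = {"name": "send_email",
                "arguments": {"recipient": "alice@example.com", "body": step}}
    elif "photo" in text:
        call = {"name": "take_photo", "arguments": {"camera": "back"}}
    else:
        raise ValueError(f"no function in F matches step: {step!r}")
    return json.dumps(call)


def run_plan(planner_output):
    """Synchronous, non-reactive execution: parse, validate, and dispatch each step in order."""
    calls = []
    for step in split_plan(planner_output):
        call = json.loads(action_agent(step))
        if set(call["arguments"]) != FUNCTIONS[call["name"]]:
            raise ValueError(f"bad arguments for {call['name']}")
        calls.append(call)
    return calls


plan = "Take a photo with the back camera<step_sep>Email the photo to Alice"
print([c["name"] for c in run_plan(plan)])  # ['take_photo', 'send_email']
```

Because the plan is fixed before the first call, a failing step surfaces as an exception rather than triggering re-planning, matching the non-reactive schedule described above.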

Task decomposition is formalized as a mapping $(q, F) \mapsto (g_1, \dots, g_k)$, learned with a standard token-level cross-entropy loss over the generated plan. Training data consist of GPT-4-generated queries with step-level supervision for a fixed function set; the examples are subsequently validated for correctness.
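The token-level cross-entropy named above is the mean negative log-likelihood of the target plan tokens under teacher forcing. The toy vocabulary and probabilities below are invented purely to make the arithmetic visible.

```python
import math


def plan_cross_entropy(token_probs, target):
    """Mean negative log-likelihood of the target plan tokens.

    token_probs[i] is the model's predicted distribution at position i,
    computed with the gold prefix target[:i] as input (teacher forcing)."""
    assert len(token_probs) == len(target)
    nll = [-math.log(dist[tok]) for dist, tok in zip(token_probs, target)]
    return sum(nll) / len(nll)


# Two-token plan "open camera" with made-up model probabilities.
probs = [
    {"open": 0.5, "close": 0.5},
    {"camera": 0.25, "email": 0.75},
]
loss = plan_cross_entropy(probs, ["open", "camera"])
print(round(loss, 4))  # 1.0397, i.e. (ln 2 + ln 4) / 2 = 1.5 ln 2
```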

Multi-domain generality is achieved via multi-LoRA (Low-Rank Adaptation): separate domain-specific adapters $\Delta W_j$ are combined at inference into a single set of merged weights. This enables high-quality composite planning (97% success on benchmark tasks) within tight memory (250 MB) and latency (< 1 s per plan) budgets (Chen et al., 2024).
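One common way to combine LoRA adapters is a weighted linear merge, $W = W_0 + \sum_j \alpha_j B_j A_j$. The sketch below assumes that scheme; whether Octo-planner uses exactly this rule, and how the per-domain weights $\alpha_j$ are chosen, is not stated above, so treat `merge_lora` and `alphas` as hypothetical. Plain nested lists keep it dependency-free.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists (a: m x r, b: r x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w0, adapters, alphas):
    """Linear multi-LoRA merge (assumed scheme): W = W0 + sum_j alpha_j * (B_j @ A_j).

    w0: frozen base weight (d_out x d_in); adapters: list of (B_j, A_j) with
    B_j (d_out x r) and A_j (r x d_in); alphas: hypothetical per-domain weights,
    with any LoRA scaling factor assumed to be folded into them."""
    merged = [row[:] for row in w0]
    for (b, a), alpha in zip(adapters, alphas):
        delta = matmul(b, a)  # low-rank update Delta W_j
        merged = [[w + alpha * d for w, d in zip(wr, dr)] for wr, dr in zip(merged, delta)]
    return merged


w0 = [[0.0, 0.0], [0.0, 0.0]]
adapter_a = ([[1.0], [0.0]], [[1.0, 2.0]])  # rank-1 adapter for domain A
adapter_b = ([[0.0], [1.0]], [[3.0, 4.0]])  # rank-1 adapter for domain B
print(merge_lora(w0, [adapter_a, adapter_b], [0.5, 0.5]))  # [[0.5, 1.0], [1.5, 2.0]]
```

Merging once ahead of time means inference runs a single dense weight matrix per layer, which is one reason this style of combination suits memory- and latency-constrained devices.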

3. Adaptive Octree Path Planning: The A-OctoMap Framework

The A-OctoMap “Octo-planner” (Mao et al., 2024) couples an adaptive octree-based spatial mapping subsystem with a multi-resolution path planner built on a modified Jump Point Search (JPS). The architecture comprises four modules: adaptive octree mapping, convex-hull-based per-leaf downsampling, a hierarchical tree/grid representation, and adaptive JPS trajectory planning.

Sensor data populate an octree whose leaves may be split or merged online to respect the granularity of the minimum controllable region (MCR). Each leaf computes a convex hull to discard redundant interior points, storing meshes representative of the obstacle boundary that accelerate subsequent collision checks. When a plan is requested, the octree is projected onto an adaptive grid (cell size = MCR edge) that preserves obstacle fidelity while minimizing the number of cells.
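The leaf-splitting and per-leaf hull steps can be illustrated with a 2D analogue: a quadtree whose leaves split when they exceed a point capacity but never below the MCR edge, followed by convex-hull downsampling with Andrew's monotone chain. This sketch rests on stated assumptions: 2D instead of 3D, a fixed capacity threshold as the split trigger, no leaf merging, and hypothetical names (`Node`, `insert`, `downsample`). A-OctoMap's actual split/merge criteria and 3D hull computation are not reproduced here.

```python
from dataclasses import dataclass, field


def convex_hull(points):
    """Andrew's monotone chain: hull vertices counter-clockwise; interior points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


@dataclass
class Node:
    x: float      # lower-left corner
    y: float
    size: float   # edge length
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def _child_for(self, p):
        half = self.size / 2
        dx = 1 if p[0] >= self.x + half else 0
        dy = 1 if p[1] >= self.y + half else 0
        return self.children[2 * dy + dx]

    def insert(self, p, mcr, capacity):
        if self.children:
            self._child_for(p).insert(p, mcr, capacity)
            return
        self.points.append(p)
        half = self.size / 2
        # Split an overfull leaf, but never below the minimum controllable region (MCR) edge.
        if len(self.points) > capacity and half >= mcr:
            self.children = [Node(self.x + dx * half, self.y + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            pending, self.points = self.points, []
            for q in pending:
                self._child_for(q).insert(q, mcr, capacity)

    def leaves(self):
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


def downsample(root):
    """Per-leaf convex-hull downsampling: keep only boundary points in each leaf."""
    for leaf in root.leaves():
        leaf.points = convex_hull(leaf.points)


root = Node(0.0, 0.0, 8.0)
for p in [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]:
    root.insert(p, mcr=1.0, capacity=100)
downsample(root)
print(sorted(root.points))  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```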

The planning module adapts Jump Point Search to the non-uniform, octree-derived grid. Experimental results report higher average path-finding success rates, shorter paths, and plan computation times measured in milliseconds for million-point environments, indicating that the approach combines computational efficiency with geometric precision, particularly in cluttered or high-resolution domains (Mao et al., 2024).
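A-OctoMap's planner adapts JPS to a non-uniform grid; the sketch below shows only the standard uniform-grid building block. The assumptions are: 8-connected moves; diagonal steps allowed even between two blocked orthogonal cells (the most permissive JPS variant); and expansion of all eight directions from each jump point instead of JPS's neighbor-pruning rules, which should not affect optimality but does extra work. The function name `jps` and the `grid[y][x] == 1` obstacle encoding are illustrative.

```python
import heapq
import math


def jps(grid, start, goal):
    """Jump Point Search on a uniform 8-connected grid (grid[y][x] == 1 means blocked).

    Returns the list of jump points from start to goal, or None if unreachable."""
    h, w = len(grid), len(grid[0])

    def free(x, y):
        return 0 <= x < w and 0 <= y < h and grid[y][x] == 0

    def jump(x, y, dx, dy):
        nx, ny = x + dx, y + dy
        if not free(nx, ny):
            return None
        if (nx, ny) == goal:
            return (nx, ny)
        if dx and dy:  # diagonal: forced neighbors, then probe both straight components
            if ((free(nx - dx, ny + dy) and not free(nx - dx, ny)) or
                    (free(nx + dx, ny - dy) and not free(nx, ny - dy))):
                return (nx, ny)
            if jump(nx, ny, dx, 0) or jump(nx, ny, 0, dy):
                return (nx, ny)
        elif dx:  # horizontal move
            if ((free(nx + dx, ny + 1) and not free(nx, ny + 1)) or
                    (free(nx + dx, ny - 1) and not free(nx, ny - 1))):
                return (nx, ny)
        else:  # vertical move
            if ((free(nx + 1, ny + dy) and not free(nx + 1, ny)) or
                    (free(nx - 1, ny + dy) and not free(nx - 1, ny))):
                return (nx, ny)
        return jump(nx, ny, dx, dy)

    def octile(a, b):
        dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
        return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

    dirs = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if dx or dy]
    heap = [(octile(start, goal), 0.0, start)]
    g = {start: 0.0}
    parent = {start: None}
    while heap:
        _, cost, node = heapq.heappop(heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if cost > g[node]:
            continue  # stale heap entry
        for dx, dy in dirs:
            jp = jump(node[0], node[1], dx, dy)
            if jp is None:
                continue
            new_cost = cost + octile(node, jp)  # jump segments are straight or diagonal
            if new_cost < g.get(jp, math.inf):
                g[jp] = new_cost
                parent[jp] = node
                heapq.heappush(heap, (new_cost + octile(jp, goal), new_cost, jp))
    return None


print(jps([[0] * 5 for _ in range(5)], (0, 0), (4, 4)))  # [(0, 0), (4, 4)]
```

The recursive `jump` is fine for small grids; for large maps an iterative version avoids Python's recursion limit.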

4. Distributed Multi-Agent Planning: Space-Octopus Paradigm

The SpaceOctopus framework (Zhao et al., 2024) extends the Octo-planner philosophy to decentralized multi-agent robotic systems, inspired by the distributed neural architecture of the biological octopus. Each robotic arm of a four-arm, free-floating space manipulator is split into two “limb” agents, a position agent (joints 1–3) and an orientation agent (joints 4–6), each with its own local observations and policy network. At the middle level, the two agents of each arm coordinate on per-arm sub-tasks; at the top level, a centralized critic (used during training only) promotes cooperative mission fulfillment (target capture or base reorientation).
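The agent decomposition above (four arms, each split into a position agent and an orientation agent) can be written down as plain data. The joint-vector layout, the `LimbAgent` type, and the helper names are hypothetical; the paper's exact observation and action spaces are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimbAgent:
    arm: int
    role: str              # "position" or "orientation"
    joints: tuple          # 1-based joint indices this agent controls


def build_agents(num_arms=4):
    """Two limb agents per arm: joints 1-3 (position) and joints 4-6 (orientation)."""
    agents = []
    for arm in range(num_arms):
        agents.append(LimbAgent(arm, "position", (1, 2, 3)))
        agents.append(LimbAgent(arm, "orientation", (4, 5, 6)))
    return agents


def local_observation(agent, joint_angles):
    """Slice this agent's own joints out of per-arm joint vectors (hypothetical layout)."""
    arm_q = joint_angles[agent.arm]
    return [arm_q[j - 1] for j in agent.joints]


def assemble_action(agents, actions):
    """Concatenate the two limb agents' 3-D actions back into one 6-D command per arm."""
    per_arm = {}
    for agent, act in zip(agents, actions):
        cmd = per_arm.setdefault(agent.arm, [0.0] * 6)
        for j, a in zip(agent.joints, act):
            cmd[j - 1] = a
    return per_arm


agents = build_agents()
print(len(agents))  # 8
```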

Formally, the system is modeled as a decentralized partially observable Markov decision process (Dec-POMDP). Policies are learned with MAPPO under Centralized Training, Decentralized Execution (CTDE), with actors conditioned on per-agent observations and a critic conditioned on the global state. Reward shaping enforces sub-task fidelity while penalizing energy use, sudden velocity changes, and collisions.
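A shaped reward with the four ingredients named above (task fidelity, energy, velocity smoothness, collisions) might look like the following. The functional forms and all weights are illustrative assumptions, not values from SpaceOctopus.

```python
def shaped_reward(task_error, joint_torques, joint_vel, prev_joint_vel, in_collision,
                  w_task=1.0, w_energy=0.01, w_smooth=0.1, collision_penalty=10.0):
    """Hypothetical per-agent shaped reward.

    task term:      penalize remaining sub-task error (e.g. end-effector distance)
    energy term:    penalize squared joint torques
    smoothness:     penalize squared change in joint velocity between steps
    collision term: fixed penalty when a collision is detected"""
    r_task = -w_task * task_error
    r_energy = -w_energy * sum(t * t for t in joint_torques)
    r_smooth = -w_smooth * sum((v - pv) ** 2 for v, pv in zip(joint_vel, prev_joint_vel))
    r_collision = -collision_penalty if in_collision else 0.0
    return r_task + r_energy + r_smooth + r_collision


print(round(shaped_reward(0.5, [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], False), 4))  # -0.52
```

In practice the relative weights decide whether agents favor accuracy or gentle motion, so they are usually tuned per task.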

Empirical evaluation shows that MAPPO outperforms centralized PPO and off-policy MADDPG, attaining end-effector errors below 0.025 m (compared with errors on the order of 0.05 m for PPO) and remaining robust under disturbances, mass variation, or single-arm failures. Furthermore, because policies are learned modularly, mixed-task reassembly (assigning arms to distinct tasks at runtime) is possible without retraining, mirroring octopus-like behavioral flexibility (Zhao et al., 2024).

5. Key Algorithmic Constructs and Planning Formalisms

Core to all Octo-planner architectures is explicit task decomposition: mapping complex global queries or objectives into hierarchical action plans compatible with available primitives (functions, tools, low-level motor commands). This can be abstracted as:

  • Planner function: $P: (q, K) \mapsto (g_1, \dots, g_k)$, where $K$ is a knowledge/toolset.
  • Executor function: $E: (g_i, e) \mapsto r_i$, where $e$ denotes environment state or execution feedback.
  • Markov/decision-process formalism: states $s_t$, actions $a_t$, transitions $s_{t+1} = \mathcal{T}(s_t, a_t)$.
  • Stopping criteria: trajectory completeness, step-budget, or satisfaction of verification predicates.
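The abstraction in the list above can be sketched as one generic loop: a planner maps a query and toolset to subgoals, an executor runs each subgoal against the environment, and any of the three stopping criteria ends the run. All names are hypothetical, and the executor here also returns the updated environment state, a small extension of $E$ made so the loop can thread state through.

```python
from typing import Callable, Optional


def plan_and_execute(query, knowledge, planner, executor, env_state,
                     step_budget=10, verified: Optional[Callable[[list], bool]] = None):
    """Generic plan-then-execute loop.

    planner(query, knowledge) -> [g_1, ..., g_k]
    executor(g_i, env_state)  -> (r_i, new_env_state)
    Stops when the plan is exhausted, the step budget is spent,
    or the optional verification predicate holds on the results so far."""
    subgoals = planner(query, knowledge)
    results = []
    for i, g in enumerate(subgoals):
        if i >= step_budget:
            break
        result, env_state = executor(g, env_state)
        results.append(result)
        if verified is not None and verified(results):
            break
    return results, env_state


planner = lambda q, k: ["a", "b", "c"]
executor = lambda g, e: (g.upper(), e + [g])
print(plan_and_execute("q", None, planner, executor, [], step_budget=2))  # (['A', 'B'], ['a', 'b'])
```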

Implementations differ in the underlying optimizer or prediction engine: OctoTools relies on prompt engineering of a general-purpose LLM, the on-device Octo-planner on a fine-tuned LLM, A-OctoMap on grid search (JPS), and SpaceOctopus on RL policy networks, each chosen to balance planning quality against resource or feedback constraints.

6. Application Contexts and Performance Metrics

Octo-planner frameworks span several application domains:

| Framework | Domain | Planner modality | Notable metric(s) |
|---|---|---|---|
| OctoTools (Lu et al., 16 Feb 2025) | Multimodal reasoning with tools | LLM-based hierarchical planning | ≈58.5% accuracy (16-task mean); gains over GPT-4o and agentic baselines |
| On-device Octo-planner (Chen et al., 2024) | Edge automation / APIs | Fine-tuned LLM task breakdown | 97% plan success, < 1 s latency, 250 MB footprint |
| A-OctoMap (Mao et al., 2024) | Robotics / path planning | Adaptive octree + JPS | Higher pathfinding success, shorter paths, plan times reported in ms |
| SpaceOctopus (Zhao et al., 2024) | Multi-arm space robotics | Modular MARL with CTDE | < 0.025 m position error; robust to failures and disturbances |

Performance is evaluated via task/plan success, accuracy, energy/memory/latency, robustness, and adaptability—each in accordance with the requirements of its target domain and embodiment.

7. Limitations, Extensions, and Prospective Directions

Key limitations in current instantiations include: lack of explicit utility optimization in LLM-based planners (OctoTools, on-device Octo-planner); worst-case quadratic cost in convex hull calculation (A-OctoMap); and potential for tree overgrowth without aggressive pruning. In MARL planners (SpaceOctopus), decentralized execution can under-exploit cross-agent information in deployment, though physical coupling mitigates coordination loss.

A plausible implication is that future Octo-planner frameworks may benefit from hybridizing explicit search/planning with learned utility approximations, more general re-planning (especially on-device), and adaptive computation and resource management. The biological inspiration (distributed, semi-autonomous control) points to further research on modular, reassemblable learning and resilient, parallel agentic architectures, which may prove important for scalable intelligent systems operating under varied and dynamic constraints.
