
LLM-Based Hierarchical TODO Decomposition

Updated 25 November 2025
  • LLM-based Hierarchical TODO Decomposition is a paradigm that deconstructs complex tasks into structured subtasks, enabling robust orchestration with multi-agent systems.
  • It employs formal models, dependency graphs, and scoring functions to assign domain-specific subtasks and manage parallel execution.
  • Empirical benchmarks demonstrate significant gains in accuracy, efficiency, and cost savings across applications like robotics, survey generation, and 6G management.

LLM-based Hierarchical TODO Decomposition is a paradigm for orchestrating LLMs and agent systems to robustly solve complex, ambiguous, or multi-stage problems by systematically splitting them into hierarchically structured sub-tasks (“TODOs”), routing these to specialized agents or tools, and aggregating the results. This methodology overcomes context window limitations, enables parallel and modular execution, and delivers improved solution quality. Modern designs are grounded in formalisms from automated planning, multi-agent systems, computational graph theory, and empirical workflow management.

1. Formal Models and Notation

Formally, the decomposition process begins with a high-level task $T \in \mathcal{T}$, which is transformed into a set or hierarchy of sub-tasks via a decomposition function:

$$D(T) = \{ t_1, t_2, \ldots, t_n \}$$

A directed acyclic dependency graph $\mathrm{Dep} \subseteq \{ (t_i \to t_j) \}$ encodes prerequisite relations between subtasks. Each sub-task $t_i$ is annotated with:

  • $d(t_i)$: domain/expertise label (e.g., “math calculation”, “flight search”)
  • $c(t_i)$: complexity estimate (e.g., token budget, number of steps)
  • $\mathrm{agent}(t_i)$: assigned agent (LLM specialist or external tool)
  • $\mathrm{status}(t_i) \in \{\text{Pending}, \text{In-Progress}, \text{Done}, \text{Failed}\}$
  • $\mathrm{result}(t_i)$: the output of solving $t_i$

The global solution is reconstructed as:

$$S_\text{final} = \text{Aggregate}\left( \{ \mathrm{result}(t_i) \mid t_i \in D(T) \} \right)$$

In multi-agent or multi-LLM workflows, assignment and prioritization are governed by scoring functions:

$$\mathrm{agent}(t_i) = \arg\max_{A_j} \text{Score}(A_j, d(t_i))$$

$$\text{Score}(A_j, d) = w_\text{domain} \cdot \text{Match}(A_j.\text{domain}, d) + w_\text{perf} \cdot \text{historical\_accuracy}(A_j, d)$$

This abstraction generalizes to recursive and cross-domain scenarios, such as tree-based mission planning for robots (Gupta et al., 27 Jan 2025), debate-based subtask planning for 6G management (Lin et al., 6 Jun 2025), and compositional workflows in code generation (Nakkab et al., 23 Jul 2024, Tang et al., 6 Dec 2024).
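The scoring-based assignment above can be sketched in Python. This is a minimal illustration, not the cited systems' implementation: the `Agent` fields, the default weights, and the exact-match `match` heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    domain: str
    historical_accuracy: dict  # domain -> accuracy in [0, 1]

def match(agent_domain: str, task_domain: str) -> float:
    """Crude Match(): 1.0 for an exact domain match, else 0.0."""
    return 1.0 if agent_domain == task_domain else 0.0

def score(agent: Agent, task_domain: str,
          w_domain: float = 0.6, w_perf: float = 0.4) -> float:
    """Score(A_j, d) = w_domain * Match + w_perf * historical_accuracy."""
    return (w_domain * match(agent.domain, task_domain)
            + w_perf * agent.historical_accuracy.get(task_domain, 0.0))

def assign_agent(agents: list, task_domain: str) -> Agent:
    """agent(t_i) = argmax over agents A_j of Score(A_j, d(t_i))."""
    return max(agents, key=lambda a: score(a, task_domain))
```

In practice the `match` term would be a softer similarity (e.g., embedding-based), but the argmax routing structure is unchanged.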

2. Decomposition and Orchestration Algorithms

A canonical orchestration pipeline proceeds in five distinct phases (Rasal et al., 26 Feb 2024):

  1. Requirement Elicitation: The orchestrator LLM interacts with the user, posing clarifying questions until the specification is sufficient. This leverages chain-of-thought prompting to uncover ambiguous or missing requirements.
  2. Task Decomposition: The orchestrator applies an LLM-driven split to produce a structured TODO list, modeled as a tree or DAG with explicit dependencies and stepwise domain annotations.
  3. Agent Assignment: Each subtask is routed to the agent or tool best suited by capability and prior observed accuracy (domain-specific routing).
  4. Parallel Subproblem Solving: Using a dependency graph and work queue (priority determined by topological order and/or complexity), subtasks are dispatched to agents as soon as all dependencies are satisfied. Execution is asynchronous and exploits available parallelism.
  5. Aggregation: Final solutions are synthesized by prompting the orchestrator LLM with the collection of subtask results to produce a coherent, user-facing response.

The following pseudocode from (Rasal et al., 26 Feb 2024) exemplifies this loop, with additional application-specific modules—such as utility-based robot task allocation (Gupta et al., 27 Jan 2025) and DSE-driven prompt generation for IC design (Tang et al., 6 Dec 2024)—refining assignment and aggregation strategies.

# Phase 1: requirement elicitation via clarifying questions
context = user_input
while requirements_incomplete(context):
    q = Orch.generate_follow_up_question(context)
    user_answer = query_user(q)
    context |= user_answer          # merge the answer into the working context

# Phase 2: LLM-driven decomposition into a TODO DAG
root_task = context
subtasks = Orch.decompose(root_task)

# Phase 3: domain-specific agent routing
for t in subtasks:
    t.assignee = select_agent(t.domain)
    queue.add(t)

# Phase 4: dependency-aware parallel solving
while queue:
    t = queue.pop_ready()           # a task whose dependencies are all Done
    t.status = "In-Progress"
    t.result = t.assignee.solve(t.description)
    t.status = "Done"
    queue.enqueue_ready_dependents(t)

# Phase 5: aggregation into a user-facing answer
S_final = Orch.aggregate({t.result for t in subtasks})
return S_final

3. Data Structures and Hierarchy Representation

Task hierarchies are predominantly managed as trees or DAGs. Each node represents a subproblem:

class TaskNode:
    id: str
    description: str
    domain: str
    complexity: float
    deps: List[str]          # IDs of prerequisite tasks
    assignee: AgentHandle
    status: Literal["Pending", "InProgress", "Done", "Failed"]
    result: Optional[Any]

Orchestrators maintain a mapping of task IDs to TaskNodes, a dependency graph (adjacency list), and a priority work queue.
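A minimal sketch of such a dependency graph plus priority work queue follows; the `Task` fields mirror the `TaskNode` schema above, and the priority policy (ready tasks ordered by ascending complexity) is one illustrative choice, not a prescribed one.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    complexity: float
    deps: list = field(default_factory=list)   # IDs of prerequisite tasks

class WorkQueue:
    """A task becomes ready once all its dependencies are Done;
    among ready tasks, lower complexity is popped first."""

    def __init__(self, tasks):                 # tasks: dict id -> Task
        self.tasks = tasks
        self.pending = {tid: set(t.deps) for tid, t in tasks.items()}
        self.dependents = defaultdict(list)    # reverse adjacency list
        for tid, t in tasks.items():
            for d in t.deps:
                self.dependents[d].append(tid)
        self.ready = [(t.complexity, tid)
                      for tid, t in tasks.items() if not t.deps]
        heapq.heapify(self.ready)

    def pop_ready(self):
        _, tid = heapq.heappop(self.ready)
        return self.tasks[tid]

    def mark_done(self, tid):                  # unlock downstream tasks
        for dep in self.dependents[tid]:
            self.pending[dep].discard(tid)
            if not self.pending[dep]:
                heapq.heappush(self.ready,
                               (self.tasks[dep].complexity, dep))
```

Topological order emerges implicitly: a task can only enter the heap after every prerequisite has been marked done.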

In multi-agent or modular agent systems, the entire workflow is a tree of services (Chao et al., 13 Oct 2025):

  • Root: high-level task (e.g., “Survey Generation”)
  • Intermediate: phase/functional modules (e.g., AnalysisPhase, SkeletonPhase)
  • Leaves: atomic LLM or tool servers (e.g., SearchServer, DigestServer)

Each module exposes one or more functions as standard protocols (e.g., MCP tool calls), facilitating distributed orchestration and plug-and-play module insertion.
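Such a tree of services can be represented as nested mappings. The module and server names below mirror the survey-generation example from the text, but the exact structure (and the `OutlineServer` leaf) is an illustrative sketch rather than the cited system's schema.

```python
# Root task -> phase modules -> atomic LLM/tool servers (leaves).
service_tree = {
    "SurveyGeneration": {
        "AnalysisPhase": ["SearchServer", "DigestServer"],
        "SkeletonPhase": ["OutlineServer"],
    }
}

def leaf_servers(tree):
    """Collect the atomic tool servers reachable from the root."""
    return [srv
            for phases in tree.values()
            for servers in phases.values()
            for srv in servers]
```

In a real deployment each leaf would be addressed through a standard protocol call (e.g., an MCP tool invocation) rather than a plain list entry.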

4. Mathematical Criteria for Split, Assignment, and Aggregation

Although not always formalized as explicit closed-form equations, the decomposition process is driven by:

  • Chain-of-Thought (CoT) Decomposition: $T \Rightarrow_{\mathrm{link}} D(T)$, where $\Rightarrow_{\mathrm{link}}$ models LLM-generated, stepwise breaking-down via CoT prompting.
  • Complexity Measures: Subtasks are defined so as to keep $c(t_i)$ (tokens or steps per subtask) below agent-specific thresholds, ensuring each is LLM-manageable (Chen et al., 20 Jul 2024). Theoretical analysis relates depth $D$, branching factor $b$, and per-node error $\varepsilon_D$ to overall workflow accuracy:

$$E_0 \leq b^D \cdot \varepsilon_D$$

$$C_\text{total} \leq \sum_{l=0}^{D-1} b^l \left[ C_\text{pre}(L_\text{sys}+m_l) + C_\text{dec}(L_\text{sys}+m_l,\, L_\text{dec}(m_l)) \right]$$

Optimization aims to set $m_l$, $b$, and $D$ so that $E_0$ stays under a target while $C_\text{total}$ is minimized.
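Under those two bounds, picking a decomposition configuration reduces to a small search over candidate $(b, D)$ pairs. The sketch below assumes simplified forms: a caller-supplied $\varepsilon(D)$ and flat per-level cost callables stand in for the full $C_\text{pre}$/$C_\text{dec}$ terms.

```python
def error_bound(b: int, D: int, eps) -> float:
    """Worst-case compounded error: E_0 <= b**D * eps(D)."""
    return b ** D * eps(D)

def total_cost(b: int, D: int, c_pre, c_dec) -> float:
    """Cost bound: sum over levels l of b**l * (C_pre(l) + C_dec(l)).
    c_pre / c_dec are assumed per-node cost callables of the level."""
    return sum(b ** l * (c_pre(l) + c_dec(l)) for l in range(D))

def pick_config(candidates, eps, e_target, c_pre, c_dec):
    """Among (b, D) pairs meeting the error target, minimize the cost bound."""
    feasible = [bd for bd in candidates
                if error_bound(*bd, eps) <= e_target]
    return min(feasible,
               key=lambda bd: total_cost(*bd, c_pre, c_dec),
               default=None)
```

The tension the search resolves: deeper trees shrink per-node error $\varepsilon_D$ (smaller subtasks) but multiply the $b^D$ error-propagation factor and the per-level overhead.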

  • Assignment Score: As above, $\text{Score}(A_j, d) = w_\text{domain}\cdot \text{Match} + w_\text{perf}\cdot \text{accuracy}$ controls agent routing (Rasal et al., 26 Feb 2024).
  • Task-robot matching: In multi-robot planning, assignment maximizes

$$\max \sum_{r\in R} \sum_{a \in T_r} u_a(r)$$

subject to deadline and sequentiality constraints, with utility $u_a(r) = \alpha\, q_a(r) - \beta\, d_a(r) - \gamma\, c_a(r)$ (Gupta et al., 27 Jan 2025).
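A greedy sketch of this utility-maximizing allocation follows; it ignores the deadline and sequentiality constraints for brevity, and the weight values and `qdc` lookup (quality, delay, cost per robot–task pair) are illustrative assumptions.

```python
def utility(q: float, d: float, c: float,
            alpha: float = 1.0, beta: float = 0.5, gamma: float = 0.5) -> float:
    """u_a(r) = alpha*q_a(r) - beta*d_a(r) - gamma*c_a(r)."""
    return alpha * q - beta * d - gamma * c

def greedy_assign(tasks, robots, qdc):
    """Assign each task to the robot with the highest utility for it.
    qdc(robot, task) -> (quality, delay, cost)."""
    assignment = {}
    for a in tasks:
        assignment[a] = max(robots, key=lambda r: utility(*qdc(r, a)))
    return assignment
```

A full solver would instead optimize the summed utility jointly under the constraints (e.g., via integer programming or auction methods); the greedy pass is only the per-task special case.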

5. Runtime Protocols

At runtime, the orchestrator operates as a long-lived service, executing the following protocol (Rasal et al., 26 Feb 2024, Chao et al., 13 Oct 2025):

  1. Instantiation: Ingest user input; dynamically clarify via question–answer loop.
  2. Decomposition: Generate the task DAG/tree, possibly interacting with the user for further disambiguation.
  3. Agent Dispatch: Assign ready subtasks to available agent instances (via frameworks such as LangChain or MCP).
  4. Concurrency and Monitoring: Track task status; upon completion of dependencies, schedule downstream tasks.
  5. Checkpointing and Fault Tolerance: Periodically record partial results to persist progress and allow recovery.
  6. Aggregation and Finalization: Aggregate subresults via LLM prompt or symbolic function; deliver final output.
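The dispatch, monitoring, and checkpointing steps can be sketched with `asyncio`. The node interface (`.deps`, `.status`, `.result`) and the JSON checkpoint format are assumptions for illustration; production orchestrators add timeouts, retries, and failure propagation.

```python
import asyncio
import json

async def _run(node, solve, done):
    node.status = "InProgress"
    node.result = await solve(node)             # agent/tool call (assumed async)
    node.status = "Done"
    await done.put(node.id)

async def orchestrate(tasks, solve, checkpoint_path="progress.json"):
    """Dispatch tasks whose dependencies are finished; checkpoint after each
    completion so a crashed run can resume from partial results."""
    done, finished, launched, running = asyncio.Queue(), set(), set(), []

    def launch_ready():
        for tid, t in tasks.items():
            if tid not in launched and set(t.deps) <= finished:
                launched.add(tid)
                running.append(asyncio.create_task(_run(t, solve, done)))

    launch_ready()
    while len(finished) < len(tasks):
        finished.add(await done.get())
        with open(checkpoint_path, "w") as f:   # fault-tolerance checkpoint
            json.dump({i: tasks[i].result for i in finished}, f)
        launch_ready()
    return {tid: t.result for tid, t in tasks.items()}
```

Independent subtasks run concurrently because each ready node gets its own `asyncio` task; dependents are launched only after their prerequisites land in `finished`.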

In advanced systems, “orchestra” agents holistically plan next tool invocations based on execution history and user feedback (Chao et al., 13 Oct 2025). Human-in-the-loop intervention may occur at key decision points (topic scope, outline restructuring, etc.).

6. Empirical Results and Comparative Benchmarks

Empirical evaluation demonstrates substantive gains in accuracy, reliability, and efficiency:

  • On GSM8K math (2–8 steps per task), a GPT-4 orchestrator with GPT-3.5-turbo specialists achieved a 73% solve-rate, outperforming single-agent and flat multi-agent approaches by 8–23 percentage points (Rasal et al., 26 Feb 2024).
  • In hierarchical debate for 6G network management, MCR (macro coverage rate) improved as follows for GPT-4o + GPT-4o-mini: 39.62% (baseline) → 49.75% (regular debate) → 81.19% (hierarchical debate), with similar lifts for other model combinations (Lin et al., 6 Jun 2025).
  • In chip design, hierarchical prompting delivered >30% token and >45% runtime savings compared to flat prompting, with pass@5 rates rising from 0–10% to >90% for certain architectures (Nakkab et al., 23 Jul 2024, Tang et al., 6 Dec 2024).
  • In multi-robot mission planning, LLM-constructed hierarchical trees yielded tractable, near-optimal plan alternatives with cost sublinear in the number of abstract tree nodes, with demonstrable flexibility across diverse mission types (Gupta et al., 27 Jan 2025).

These results generalize to domains including programming education (DBox: +0.198 correctness, +2.33 self-efficacy) (Ma et al., 26 Feb 2025) and cross-task zero-shot generalization in reinforcement learning (ReflexGrad: 67% trial-0 success, zero action loops) (Kadu et al., 18 Nov 2025).

7. Applications and Illustrative Examples

LLM-based hierarchical TODO decomposition frameworks are deployed in scenarios such as:

  • Travel planning: Decomposing user requests (“Book me a return flight...”) into flight search, amenity check, booking, with agent routing and dependency management (Rasal et al., 26 Feb 2024).
  • Robotics: Multi-level decomposition of missions (“Reunite mom with her lost child”) into compound and primitive subroutines, capability-aware agent assignment, and utility-maximizing task allocation (Gupta et al., 27 Jan 2025).
  • 6G Management: Hierarchical debate among LLMs for sub-task extraction (“Optimize RIS placement...”) and per-step solution refinement (Lin et al., 6 Jun 2025).
  • HDL/IC Generation: Recursive submodule generation (“64-to-1 MUX” → “8 MUX8-1” → “MUX2-1”) with simulation feedback in each TODO iteration (Nakkab et al., 23 Jul 2024, Tang et al., 6 Dec 2024).
  • Survey Generation and Planning: Modular orchestration of MCP servers for search, clustering, outline generation, and content refinement (Chao et al., 13 Oct 2025).
  • Programming Education: Co-decomposition of algorithmic tasks; learner-LLM step-tree alignment with dynamic hints and scaffolded code mapping (Ma et al., 26 Feb 2025).

Each of these exemplifies the translation of high-level, often ambiguous, natural language instructions into a structured, agent-executable workflow that supports parallelism, modular failure recovery, and extendability to new domains.

