Decomposed Prompting: Modular Neural Reasoning
- Decomposed Prompting (DecomP) is a family of techniques that transform complex neural tasks into a sequence of simpler, specialized sub-tasks.
- It leverages hierarchical and recursive decomposition to modularize multi-step reasoning, question answering, and algorithmic problem-solving.
- By integrating symbolic modules with LLM sub-handlers, DecomP achieves significant performance improvements over monolithic prompting methods.
Decomposed Prompting (DecomP) encompasses a family of techniques that transform complex neural inference or decision problems into a composition of simpler prompting sub-tasks, each tailored to a concrete subproblem or decision module. These approaches leverage prompt engineering, sub-task modularization, and often hybrid symbolic-neural execution—enabling LLMs and other foundation models to robustly solve tasks that are intractable, unreliable, or inefficient for monolithic or end-to-end prompts.
1. Formal Framework and Modular Architecture
The canonical DecomP methodology represents a complex task—for example, multi-step reasoning, question answering, or structured prediction—as a cascade or program of sub-tasks. Each sub-task is instantiated by a handler from a library (e.g., a prompt template or symbolic routine). A decomposition module orchestrates which sub-task to invoke at each step, optionally based on prior outputs. This supports recursion (tasks defined in terms of smaller instances of themselves), hierarchical decomposition (splitting operations into atomic steps), and modular handler libraries.
Formally, executing a task with input Q produces a trace of triples (f_1, Q_1, A_1), …, (f_n, Q_n, A_n), where
- f_i is a handler (LLM prompt, symbolic API, or merged module),
- Q_i is the sub-query or instruction,
- A_i is the sub-answer,
- A_n is the final output.
The decomposition and handler execution proceed either sequentially or recursively, terminating when the decomposer emits a special end-of-query ("EOQ") signal.
This modular paradigm enables:
- Isolated debug/optimization: Each handler is prompt-engineered/tested independently.
- Composable workflows: Sub-task prompts are reusable and extensible.
- Plug-in hybridization: Symbolic components (retrievers, calculators) are directly integrated wherever LLMs struggle.
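A handler library of this kind can be sketched as a plain mapping from sub-task names to callables, where symbolic routines and LLM-backed prompts are interchangeable from the decomposer's point of view. The handler names and the stubbed LLM call below are illustrative assumptions, not part of any published DecomP API:

```python
# Illustrative handler library: symbolic and neural handlers share one interface.
# Names ("split", "str_pos", "qa") and the stub LLM call are assumptions of this sketch.

def split_words(text):
    """Symbolic handler: split a string into words."""
    return text.split()

def str_pos(args):
    """Symbolic handler: return the k-th (1-indexed) letter of a word."""
    word, k = args
    return word[k - 1]

def llm_qa(question):
    """Placeholder for an LLM-backed sub-task handler (stubbed here)."""
    return f"<LLM answer to: {question}>"

# The decomposer selects handlers by name; adding a new sub-task is just
# registering another callable, which is what makes the workflow composable.
HANDLERS = {
    "split": split_words,
    "str_pos": str_pos,
    "qa": llm_qa,
}
```

Because each entry is an ordinary callable, a handler can be debugged and benchmarked in isolation before being wired into the full decomposition.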
2. Decomposition Strategies and Subproblem Factorizations
The efficacy of DecomP hinges on the granularity and structure of decomposition:
- Hierarchical decomposition: Tasks are broken into a tree of subtasks, delegating complex operations (e.g., “return the k-th letter of each word in a sequence”) to sequences or trees of simpler functions (e.g., “split string” followed by “arr_pos”).
- Recursive decomposition: Used for variable-length or compositional tasks (e.g., list reversal, merge sort), where the task is defined over progressively smaller subproblems until reaching a base-case solved by a dedicated handler.
- Hybrid symbolic-neural flows: For open-domain QA, one sub-task may invoke symbolic retrieval (e.g., Elasticsearch indexing), passing retrieved evidence to downstream LLM QA handlers.
Empirically, such decompositions enable significant performance gains over monolithic few-shot or even chain-of-thought (CoT) prompting—especially when reasoning complexity or input size exceeds the capacity of a single prompt window, or when the required reasoning steps are not represented in standard few-shot exemplars. For instance, k-th-letter concatenation and list reversal tasks are solved at 100% accuracy with recursive DecomP, whereas standard CoT and least-to-most strategies fail as list length grows.
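The recursive strategy can be made concrete with list reversal. In the real system each step would be issued as an LLM sub-prompt; in this sketch the handlers are plain functions so the recursion structure is visible:

```python
# Minimal sketch of recursive decomposition for list reversal.
# Each step here stands in for a sub-task prompt in the actual DecomP pipeline.

def reverse_decomp(xs):
    # Base case: lists of length <= 1 are "simple enough" to answer directly.
    if len(xs) <= 1:
        return list(xs)
    # Recursive case: delegate the tail as a smaller instance of the same
    # task, then merge its answer with the head element.
    head, tail = xs[0], xs[1:]
    return reverse_decomp(tail) + [head]
```

Because correctness does not depend on the input length fitting into one prompt, this pattern keeps working as lists grow, which is precisely where monolithic CoT prompting degrades.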
3. Implementation and Inference Mechanics
The system architecture comprises:
- Decomposer prompt: A few-shot prompt template that, given the input and the prior history, outputs the next sub-task instruction (i.e., which handler to call and what sub-question to ask).
- Handler library: Each handler is an atomic prompt, another decomposer, or a symbolic function.
Pseudocode for the DecomP execution loop:
```python
def DecomP_infer(Q, D_decomposer, F_handlers):
    history = []
    while True:
        # Ask the decomposer which handler to invoke next, and with what sub-query.
        out = LLM_call(D_decomposer, input=(Q, history))
        (f_next, Q_next) = parse(out)
        if f_next == "EOQ":        # end-of-query signal: stop
            return history[-1][2]  # the last sub-answer is the final output
        A_next = run_handler(f_next, Q_next, F_handlers[f_next])
        history.append((f_next, Q_next, A_next))
```
For recursive decomposition, the decomposer can call itself on smaller inputs. For symbolic retrieval, the handler may invoke an API returning retrieved context used by downstream prompts. Subtasks may be further decomposed until they are “simple enough”—as determined by prompt performance, user-defined thresholds, or cost constraints.
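The loop above can be exercised end-to-end with stub components. The scripted decomposer below stands in for the few-shot LLM prompt, and the handler names ("split", "first_letters", "merge") are illustrative assumptions of this sketch:

```python
# Toy end-to-end run of the DecomP execution loop with stub components.
# The scripted decomposer replaces the few-shot LLM prompt of the real system.

def run_decomp(question, decomposer, handlers):
    history = []
    while True:
        f_next, q_next = decomposer(question, history)
        if f_next == "EOQ":
            return history[-1][2]          # final sub-answer
        a_next = handlers[f_next](q_next)
        history.append((f_next, q_next, a_next))

def first_letter_decomposer(question, history):
    """Scripted policy: split the words, take first letters, merge them."""
    if not history:
        return "split", question
    if len(history) == 1:
        return "first_letters", history[0][2]
    if len(history) == 2:
        return "merge", history[1][2]
    return "EOQ", None

handlers = {
    "split": lambda text: text.split(),
    "first_letters": lambda words: [w[0] for w in words],
    "merge": lambda letters: "".join(letters),
}

result = run_decomp("decomposed prompting works", first_letter_decomposer, handlers)
# result == "dpw"
```

Each (handler, sub-query, sub-answer) triple lands in `history`, giving the explicit decomposition trace that the later sections credit for easier error analysis.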
4. Task Domains, Representative Use Cases, and Results
DecomP methods have been demonstrated across a diverse range of domains:
| Domain | Task Example | Baseline | DecomP Result |
|---|---|---|---|
| Symbolic | k-th-letter concatenation (3–5 words) | CoT: 60%; LtM: 51% | 100% |
| Algorithmic | List reversal, lengths 4–10 | CoT: 0% | 100% |
| Synthetic QA | CommaQA-E multi-hop | CoT: 64.2 | 70.4 |
| Open-domain QA | 2Wiki, MuSiQue, HotpotQA | F1 up to 73.5 | +6–10 F1 over baseline |
| Math Word Problems | GSM8K | CoT: 36.0 | 50.6 |
These improvements are replicated across models: GPT-3, Codex, Flan-T5 (0.7B, 3B, 11B), and others.
The approach enables fully modular, flexible systems that can outperform chain-of-thought and least-to-most prompting—especially for tasks:
- with compositional complexity,
- requiring external retrieval,
- necessitating recursive breakdowns,
- or when individual reasoning primitives are hard to prompt directly.
5. Symbolic Component Integration
A defining feature of DecomP is the seamless integration of symbolic modules within decomposed LLM flows. For example, the open-domain QA scenario:
- The decomposer issues a `[retrieve]` instruction,
- The handler calls a symbolic retriever (e.g., Elasticsearch) to obtain the top-k evidence paragraphs,
- Subsequent LLM sub-task handlers consume the retrieved context using small, well-targeted prompts (e.g., single-hop QA on specific evidence).
This hybrid pipeline improves both performance and reliability, allowing non-differentiable symbolic computation alongside flexible LLM reasoning.
6. Analysis, Limitations, and Best Practices
DecomP's principal strengths are modularity, composability, and explainability:
- Subtask prompts can be tuned, diagnosed, and upgraded in isolation.
- Handler sharing enables rapid adaptation to new tasks using existing subcomponents.
- Error analysis is facilitated by explicit subtask outputs and the overall decomposition trace.
However, known limitations include:
- Manual engineering cost in designing high-quality task decompositions and handler prompts,
- Cascading errors: mistakes in early subtasks can propagate unless mitigated by post-hoc correction routines,
- Elevated inference cost and latency: increased LLM calls per complex instance; thus, decomposition granularity should be tuned to balance prompt complexity and cost.
Recommended practices:
- For recursive tasks, define base-case thresholds to avoid over-decomposition.
- Integrate symbolic functions by default where accuracy or efficiency exceeds few-shot LLMs.
- Debug and benchmark handlers individually before full system integration.
- Maintain a shared, versioned prompt library for reproducibility and rapid iteration.
7. References and Future Directions
Decomposed Prompting as formalized here follows the methodology and empirical framework of "Decomposed Prompting: A Modular Approach for Solving Complex Tasks" (Khot et al., 2022), with connections to least-to-most prompting, hybrid neuro-symbolic QA, and recursive prompt programming.
Current research explores:
- Automated or learned decomposition policies,
- Dynamic control of decomposition granularity,
- Integration with retrieval-augmented models and agentic decision planners,
- Effective caching, batching, and parallelization strategies for large-scale deployments.
Benchmark code, datasets, and prompt repositories are available to enable reproduction and extension of the core DecomP paradigm (Khot et al., 2022).