
Least-to-Most Prompting (LtM)

Updated 16 December 2025
  • Least-to-Most Prompting (LtM) is a paradigm that decomposes complex problems into ordered, simpler subproblems solved sequentially.
  • It clearly separates the decomposition phase from the solution phase, reducing error propagation and enhancing compositional reasoning by chaining previous subanswers.
  • LtM has shown robust improvements in tasks such as symbolic manipulation, mathematical reasoning, and text-to-SQL, outperforming standard and chain-of-thought approaches.

Least-to-most prompting (LtM) is a decomposition-based prompting paradigm for LLMs that systematically decomposes complex problems into ordered sequences of simpler subproblems, which are then solved incrementally. Each step conditions on prior intermediate answers, enabling robust compositional generalization, effective handling of “harder than seen” test cases, and the construction of explicit, auditable reasoning chains. LtM contrasts with standard chain-of-thought reasoning by explicitly separating decomposition from the solution phase and enforcing a monotonic “least” to “most” complexity ordering of subproblems (Zhou et al., 2022, Schulhoff et al., 6 Jun 2024).

1. Formal Definition and Core Algorithm

LtM operates in two principal stages: (1) decomposition and (2) sequential solution. Let $P$ denote the original complex problem. A decomposition function $D$ breaks $P$ into an ordered list of $k$ subproblems:

$$[P_1, P_2, \ldots, P_k] = D(P)$$

The LLM then solves each subproblem $P_i$ in sequence, conditioning on all previously obtained answers $A_1, \dots, A_{i-1}$:

$$A_i = \mathrm{LM}\Big(\text{Prompt}_{\mathrm{solve}};\; \bigcup_{j<i}(P_j, A_j) \;\Vert\; \text{``Q: } P_i \text{''}\Big)$$

This process returns $A_k$, the final answer. Solve prompts typically include few-shot Q/A exemplars and the chain of previously solved subproblems.

Pseudocode (following (Zhou et al., 2022)):

Input: Complex problem P
Stage 1: [P1, ..., Pk] = LM_θ(Prompt_decomp ∪ { "Q: P" })
Stage 2: C = [solving exemplars]
for i = 1..k:
    Prompt_i = C ∪ { "Q: Pi" }   # C already contains (Pj, Aj) for all j < i
    Ai = LM_θ(Prompt_i)
    C = C ∪ { (Pi, Ai) }
return Ak

LtM applies identically across domains such as symbolic manipulation, mathematical word problems, and semantic parsing (Zhou et al., 2022, Schulhoff et al., 6 Jun 2024, Tai et al., 2023, Arora et al., 2023, He et al., 28 Mar 2024).
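
The two-stage procedure maps onto a short driver loop. The sketch below is a minimal illustration, assuming a generic `llm(prompt) -> str` completion function and hand-written `decomp_exemplars` / `solve_exemplars` few-shot strings; the helper names and prompt wording are assumptions, not taken from the cited papers.

```python
def least_to_most(problem: str, llm, decomp_exemplars: str, solve_exemplars: str) -> str:
    """Minimal LtM driver: decompose once, then solve subproblems in order."""
    # Stage 1: ask the model for an ordered list of subproblems, simplest first.
    decomp_prompt = f"{decomp_exemplars}\n\nQ: {problem}\nA: To solve this, we need to answer:"
    subproblems = [line.strip("- ").strip()
                   for line in llm(decomp_prompt).splitlines() if line.strip()]

    # Stage 2: solve each subproblem, conditioning on all prior (P_j, A_j) pairs.
    chain: list[tuple[str, str]] = []
    answer = ""
    for sub in subproblems:
        context = "\n".join(f"Q: {p}\nA: {a}" for p, a in chain)
        solve_prompt = f"{solve_exemplars}\n\n{context}\nQ: {sub}\nA:"
        answer = llm(solve_prompt).strip()
        chain.append((sub, answer))

    # The answer to the last ("most complex") subproblem is the final answer A_k.
    return answer
```

A single decomposition call followed by k solve calls keeps each prompt focused on one subproblem while still exposing the full chain of prior answers.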

2. Motivations, Design Principles, and Theoretical Characteristics

The explicit decomposition in LtM addresses the limitations of chain-of-thought (CoT) prompting in generalizing from easy to harder problems, especially those outside the distribution of reasoning patterns present in the exemplars (Zhou et al., 2022, Schulhoff et al., 6 Jun 2024). By making subproblem structure explicit, LtM imposes a dependency graph over intermediate facts, reduces cognitive burden (for the model), and enables each solution step to be tightly grounded in prior context. The ordering from “simplest” (least) to “most complex” (most) systematically exposes latent compositionality and recurses along natural problem gradations (Schulhoff et al., 6 Jun 2024).

Key design considerations:

  • Granularity: Subproblems should sit at or just beyond the LLM’s comfort threshold (e.g., constant-length symbolic manipulations, single arithmetic steps).
  • Number of subproblems (k): k should scale with overall problem complexity to ensure all compositional structure is explicitly traversed (Zhou et al., 2022).
  • Prompt structure: The decomposition prompt and the solve prompts are kept separate; each solve prompt concatenates the solving exemplars, the previously solved subproblem-answer pairs, and the current subquestion (Zhou et al., 2022).

Implementation best practices:

  • Use compact representations to satisfy LLM token limits (e.g., Python code, concise statements)
  • Apply deterministic (temperature = 0) decoding for arithmetic/symbolic steps; increase the temperature when solution diversity is required (Schulhoff et al., 6 Jun 2024); see the wrapper sketch after this list
  • Explicitly chain intermediate subanswers into each prompt
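
As a concrete example of the decoding recommendation, the wrapper below uses the OpenAI Python SDK (v1.x) with temperature 0; the model name is a placeholder and the wrapper itself is an illustrative assumption, not part of the LtM method.

```python
# Thin completion wrapper with deterministic decoding for arithmetic/symbolic steps.
# Assumes the openai Python SDK (v1.x) and a configured API key.
from openai import OpenAI

_client = OpenAI()

def llm(prompt: str, temperature: float = 0.0) -> str:
    resp = _client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # 0 for deterministic steps; raise for diverse solutions
    )
    return resp.choices[0].message.content
```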

3. Illustrative Examples and Domain Adaptations

Symbolic manipulation: For a last-letter-concatenation task, initial decomposition produces growing sublists (e.g., “think, machine”, “think, machine, learning”), each step reusing prior intermediate outputs (Zhou et al., 2022).
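
Since the ground truth for this task is trivial to compute, LtM outputs can be checked step by step; a small illustrative helper (the function names are ours, not from the paper):

```python
def last_letter_concat(words: list[str]) -> str:
    """Ground-truth answer, e.g. ["think", "machine"] -> "ke"."""
    return "".join(w[-1] for w in words)

def ltm_sublists(words: list[str]) -> list[list[str]]:
    """LtM decomposition: growing prefixes of the word list, simplest first."""
    return [words[:i] for i in range(1, len(words) + 1)]

# Example: ["think", "machine", "learning"]
# sublists: [["think"], ["think", "machine"], ["think", "machine", "learning"]]
# answers:  "k", "ke", "keg"  (each step extends the previous intermediate answer)
```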

Mathematical reasoning: For multi-step word problems (e.g., GSM8K), decomposition identifies atomic arithmetic steps (e.g., “How many apples does Anna have?”), and each is solved in order, facilitating compositional depth exceeding prompt exemplars (Zhou et al., 2022, Schulhoff et al., 6 Jun 2024).
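
A worked toy example (the word problem and numbers are invented here for illustration) makes the chaining explicit: each sub-answer is an input to the next step.

```python
# Toy problem: "Anna has 2 apples and buys 3 more. Ben has twice as many apples
# as Anna. How many apples do they have together?"
anna = 2 + 3            # P1: "How many apples does Anna have?"        -> A1 = 5
ben = 2 * anna          # P2: "How many apples does Ben have?"         -> A2 = 10 (uses A1)
together = anna + ben   # P3: "How many apples do they have together?" -> A3 = 15 (uses A1, A2)
assert (anna, ben, together) == (5, 10, 15)
```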

Text-to-SQL: LtM divides natural language queries into sub-questions, each targeting a partial SQL, with succeeding prompts building upon previously generated SQL fragments. For instance, an original question on female students might decompose into (1) “Show [fields] for all students”, then (2) “Show [fields] for all female students” (Tai et al., 2023, Arora et al., 2023).
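
A hypothetical chain for the female-students example (the table and column names below are assumptions for illustration, not the actual benchmark schema):

```python
chain = [
    ("Show the name and age for all students.",
     "SELECT name, age FROM students"),
    ("Show the name and age for all female students.",
     "SELECT name, age FROM students WHERE sex = 'F'"),
]

# The step-2 prompt includes the step-1 sub-question and its partial SQL, so the
# model only needs to extend the query with a WHERE clause rather than rewrite it.
step2_prompt = (f"Q: {chain[0][0]}\nSQL: {chain[0][1]}\n"
                f"Q: {chain[1][0]}\nSQL:")
```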

Medical dialogue generation (BP4ER): Each dialogue turn is recast as a triple of sub-questions: (1) patient state, (2) diagnostic decision, (3) physician response, with each subsequent sub-question’s prompt including the prior answers. Bootstrapping is applied to correct and diversify the intermediate reasoning chains (He et al., 28 Mar 2024).

Table: Exemplars of LtM decomposition (selected domains)

| Domain | Decomposition Example | Reference |
|---|---|---|
| Symbolic (letters) | ["think, machine", "think, machine, learning", ...] | (Zhou et al., 2022) |
| GSM8K math reasoning | ["How many apples does Anna have?", "How many together?"] | (Zhou et al., 2022) |
| Text-to-SQL | ["Show fields for all students.", "Show fields for all female students."] | (Tai et al., 2023) |
| Medical Dialogue | ["Patient state?", "Diagnostic decision?", "Physician reply?"] | (He et al., 28 Mar 2024) |

4. Empirical Results and Comparative Performance

LtM yields substantial improvements over Standard and CoT prompting across diverse benchmarks, especially those featuring compositional depth.

Table: Representative accuracy results for LtM vs. alternatives ((Zhou et al., 2022), Tables 2 and 3; (Tai et al., 2023), Table 3; (Arora et al., 2023), Table 4):

| Task (Model) | Standard | CoT | LtM | Domain |
|---|---|---|---|---|
| SCAN (code-davinci-002) | 16.7% | 16.2% | 99.7% | Symbolic |
| GSM8K (code-davinci-002) | 17.1% | 60.9% | 62.4% | Math |
| DROP non-football | 58.8% | 74.8% | 82.5% | Reading comprehension |
| Spider Dev (text-to-SQL) | 63.2% | 56.8% | 66.0% | SQL parsing |

Key findings:

  • On the compositional SCAN benchmark, LtM achieves near-perfect accuracy (99.7%) with only 14 exemplars, outperforming CoT by over 80 points (Zhou et al., 2022).
  • On multi-step math (GSM8K), LtM surpasses both Standard and CoT prompting.
  • In text-to-SQL, LtM lifts test-suite accuracy over CoT and Standard prompts, though further improvements are observed with single-pass decomposition variants (Tai et al., 2023, Arora et al., 2023).
  • In medical dialogue, explicit LtM in BP4ER raises BLEU-1 and ROUGE-1 by 5–8 points over non-decomposed prompting, with ablation confirming that removing LtM sub-questions yields sharp metric drops (He et al., 28 Mar 2024).

Performance gains are especially pronounced when test problems are harder than those seen in the in-context exemplars, indicating that explicit decomposition enables models to synthesize longer or deeper compositions than implicitly scaffolded reasoning allows (Zhou et al., 2022, Schulhoff et al., 6 Jun 2024).

5. Error Modes, Limitations, and Variants

Error propagation: Early mistakes in intermediate subproblems (e.g., an erroneous SQL fragment or a miscomputed intermediate result) cannot be “unwound” by later steps. This phenomenon is especially pronounced in rigid, multi-stage pipelines such as text-to-SQL, where errors in partial SQL fragments carry through to the final query (Tai et al., 2023).

Scaling and token limits: As k (the number of subproblems) grows, context windows may be exhausted, requiring subproblem granularity to be traded off against the prompt budget (Schulhoff et al., 6 Jun 2024).
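
Because every solve call re-includes all prior (P_j, A_j) pairs, total prompt length across the k calls grows roughly quadratically in k. A back-of-envelope estimate (the per-pair and exemplar token counts are assumptions chosen only to show the scaling):

```python
def total_prompt_tokens(k: int, tokens_per_pair: int = 40, exemplar_tokens: int = 600) -> int:
    """Rough total tokens over all k solve calls: exemplars + i prior pairs + current question."""
    return sum(exemplar_tokens + i * tokens_per_pair + tokens_per_pair for i in range(k))

# total_prompt_tokens(5)  ->  3,600
# total_prompt_tokens(20) -> 20,400  (the chained pairs contribute the O(k^2) term)
```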

Comparison with related methods:

  • Chain-of-Thought (CoT): Interleaves reasoning and answering in a monolithic prompt; LtM’s explicit decomposition reduces hallucination and sharpens compositional generalization.
  • Plan-and-Solve: Collapses decomposition and solution steps; LtM’s explicit subquestion ordering provides greater control (Schulhoff et al., 6 Jun 2024).
  • Tree-of-Thought: Explores a branching search tree over solutions, trading off linear simplicity for broader candidate exploration (Schulhoff et al., 6 Jun 2024).
  • DECOMP: Integrates external function-calling in the solution chain; LtM stays entirely within prefix prompting (Schulhoff et al., 6 Jun 2024).
  • BP4ER bootstrapping: BP4ER appends answer-providing and prompt-revision bootstraps to LtM to correct errors in intermediate answers and thereby enhance overall reasoning faithfulness (He et al., 28 Mar 2024).

6. Practical Guidelines for Effective LtM Prompting

Key principles for designing robust and efficient LtM prompts include (Zhou et al., 2022, Schulhoff et al., 6 Jun 2024, Tai et al., 2023, He et al., 28 Mar 2024):

  • Subproblem granularity: Steps should be neither excessively fine-grained (which lengthens the chain and multiplies opportunities for error propagation) nor so coarse that a single model call is overloaded.
  • Number and selection of exemplars: An 8-shot context is empirically optimal in text-to-SQL; too few exemplars degrade accuracy.
  • Prompt structure: Demarcate the decomposition and solution phases, and always concatenate prior sub-answers into the current prompt.
  • Domain adaptation: In cross-domain settings, adapt exemplars to the new schema via offline generic prompt construction and domain-adapted few-shots to maximize operator coverage and schema diversity (Arora et al., 2023).
  • Bootstrapping: When error rates are critical (e.g., medical dialogue), append post-processing bootstraps that correct chains of intermediate rationales or prompt variations to ensure answer consistency and robustness (He et al., 28 Mar 2024).

7. Applications, Extensions, and Outlook

LtM underpins cutting-edge performance in compositional generalization (SCAN, CFQ), symbolic manipulation, mathematical reasoning, multi-step dialogue, and semantic parsing tasks (Zhou et al., 2022, Tai et al., 2023, He et al., 28 Mar 2024, Arora et al., 2023). Recent systematic surveys confirm its ranking among top decomposition techniques across symbolic, arithmetic, and compositional tasks (Schulhoff et al., 6 Jun 2024).

Variants and extensions such as Plan-and-Solve, Tree-of-Thought, and Skeleton-of-Thought adapt the decomposition and solving phases to parallel, hierarchical, or multi-branch settings, balancing depth of reasoning, parallelism, and ease of prompt engineering (Schulhoff et al., 6 Jun 2024).

A plausible implication is that as LLM context window sizes grow, and as automated decomposition and bootstrapping improve, LtM and its lineage will continue to yield robust advances in multi-step reasoning and out-of-distribution generalization, particularly in settings demanding explicit, interpretable chains of reasoning.
