
Question Decomposition Strategy

Updated 6 February 2026
  • Question Decomposition Strategy is a method that splits complex, multi-step questions into simpler parts, improving processing efficiency and accuracy in QA systems.
  • It employs various techniques such as explicit, neural, and graph-based decomposition to isolate critical reasoning steps and enhance interpretability.
  • Its modular design enables effective answer composition across modalities, yielding measurable performance gains in diverse QA benchmarks.

A question decomposition strategy is a set of algorithmic and architectural techniques that divides a complex, multi-step, or compositional question into simpler sub-units—typically simpler sub-questions or structured fields—whose answers or representations can be more efficiently and reliably processed, and then recombined to obtain the final answer. Such strategies play a pivotal role in question answering (QA) systems that must resolve reasoning paths across unstructured text, structured knowledge bases, or multimodal data. Question decomposition enables these systems to address compositionality, improve interpretability, boost faithfulness of explanation, and in certain data regimes, substantially improve accuracy.

1. Principles and Formal Definitions

A question decomposition strategy always involves (at least) two mappings:

  • Decomposition: A function (or sequence transducer) that takes a complex input question $Q$ and outputs a sequence or graph-structured set of simpler units (e.g., $D(Q) = \{q_1, \ldots, q_n\}$ for sub-questions, or a tree/graph for hierarchical decompositions).
  • Composition: A second function $C$ that aggregates the answers (or intermediate representations) from sub-units back into a final answer (typically $A = C(a_1, \ldots, a_n)$).

Depending on the setting and modality, decomposition may be explicit (human annotation, model output as text, or structural representation) or implicit (latent fields, operator tokens, attention distributions). Some representative formalizations include:

  • Explicit decomposition: $D: Q \mapsto \langle q_1, \ldots, q_k \rangle$, with $a_i = \operatorname{QA}(q_i, \cdot)$ and $\hat{a} = C(a_1, \ldots, a_k)$ (Wei et al., 2022).
  • Field-based structural decomposition: Each question and document is represented as $n$ aligned fields $q = \{q_1, \ldots, q_n\}$, $d^t = \{d_1^t, \ldots, d_n^t\}$, scored via per-field retrieval and weighted-sum aggregation (Jurczyk et al., 2016).
  • Hierarchical/graph/tree-based decompositions: E.g., question decomposition trees (QDTs) (Huang et al., 2023), hierarchical question decomposition trees (HQDTs) (Zhang et al., 2023).
  • Mixed-modality extensions: Visual question decomposition with sub-question generation and selective composition (Zhang et al., 2024, Khan et al., 2023).
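In code, the two mappings reduce to a short pipeline. The sketch below is a minimal illustration; `toy_decompose`, `toy_qa`, and `toy_compose` are hypothetical hard-coded stand-ins for a two-hop question, not components of any cited system:

```python
from typing import Callable, List

def answer_by_decomposition(
    question: str,
    decompose: Callable[[str], List[str]],   # D: Q -> <q_1, ..., q_k>
    qa: Callable[[str], str],                # single-hop QA model
    compose: Callable[[List[str]], str],     # C: (a_1, ..., a_k) -> final answer
) -> str:
    """Generic decompose -> answer -> compose pipeline."""
    sub_questions = decompose(question)
    sub_answers = [qa(q) for q in sub_questions]
    return compose(sub_answers)

# Hypothetical two-hop example with hard-coded components.
facts = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}
toy_decompose = lambda q: list(facts)      # a real decomposer is learned or prompted
toy_qa = facts.get
toy_compose = lambda answers: answers[-1]  # the last hop resolves the question

print(answer_by_decomposition(
    "Where was the director of Inception born?",
    toy_decompose, toy_qa, toy_compose,
))  # -> London
```

Real systems replace each stand-in with a learned module, but the interface between them is exactly this $D$/$\operatorname{QA}$/$C$ contract.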

2. Algorithmic and Model Strategies

Contemporary decomposition methodologies can be categorized broadly as follows:

A. Structural/Linguistic Decomposition

Deploying NLP techniques (POS tagging, dependency parsing, semantic role labeling, coreference resolution) to map questions into fields or mini-documents that are then indexed, scored, and recomposed using statistical or search-based frameworks (Jurczyk et al., 2016). Each field corresponds to a bag of terms (lexical, syntactic, semantic), and retrieval incorporates per-field TF-IDF scoring, weighted by learnable coefficients.
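A minimal sketch of this per-field weighted scoring, assuming bag-of-terms fields and fixed weights (in the cited work the weights $\lambda_i$ are learned); the toy documents and field layout are illustrative assumptions:

```python
import math

def tfidf(term, doc, field_corpus):
    """TF-IDF of a term against one bag-of-terms field, with IDF taken
    over the same field across all documents."""
    df = sum(1 for d in field_corpus if term in d)
    if df == 0:
        return 0.0
    return doc.count(term) * math.log(len(field_corpus) / df)

def field_score(query_field, doc_field, field_corpus):
    """Sum of TF-IDF contributions of the query field's terms."""
    return sum(tfidf(t, doc_field, field_corpus) for t in query_field)

def score(query_fields, doc_fields, corpora, weights):
    """score(q, d) = sum_i lambda_i * tfidf(q_i, d_i): weighted-sum
    aggregation over aligned fields."""
    return sum(
        w * field_score(qf, df, corpus)
        for qf, df, corpus, w in zip(query_fields, doc_fields, corpora, weights)
    )

# Two toy documents, each with a lexical field and an SRL-style field.
doc_a = [["mary", "picked", "ball"], ["agent:mary", "pred:pick"]]
doc_b = [["john", "ate", "apple"], ["agent:john", "pred:eat"]]
corpora = [[doc_a[0], doc_b[0]], [doc_a[1], doc_b[1]]]  # per-field corpora
query = [["mary", "ball"], ["agent:mary"]]
weights = [0.5, 0.5]  # fixed here; learned coefficients in the cited work

print(score(query, doc_a, corpora, weights) > score(query, doc_b, corpora, weights))  # -> True
```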

B. Learned Neural Decomposition

Trainable neural modules (pointer networks, seq2seq LMs, BART/T5/Flan-T5-based decomposers) generate either explicit sub-questions (as text) or intermediate operator chains. Learning may be fully supervised (Guo et al., 2022, Min et al., 2019), weakly supervised via hard-EM or unsupervised round-trip objectives (Perez et al., 2020), or involve distant/pretrain signal mining from comparable texts (Zhou et al., 2022).

C. Graph/AMR-Based Decomposition

Mapping questions to semantic (AMR) or logical (SPARQL, SQL, QPL) graphs, which are then partitioned via graph segmentation or operator mapping to produce sub-questions tied to entities or compositional structures. Subsequent sub-question generation uses graph-to-text models (Deng et al., 2022, Eyal et al., 2023).

D. Hierarchical and Tree-Based Methods

Modeling decompositions as trees (e.g., QDT, HQDT) that organize sub-questions via composition, conjunction, and constraints. Two-stage neural architectures (“Clue-Decipher,” BART-based decomposers) first hypothesize separator placements and then align to the question, ensuring coverage of all original tokens and compositional types (Huang et al., 2023, Zhang et al., 2023).
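Bottom-up evaluation of such a tree can be sketched as follows; `QDTNode`, the toy comparison question, and the `max`-based composer are illustrative assumptions, not the QDT/HQDT implementations themselves:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QDTNode:
    """One node of a (simplified, hypothetical) question decomposition tree."""
    question: str
    children: List["QDTNode"] = field(default_factory=list)

def answer_tree(node, leaf_qa, compose):
    """Evaluate the tree bottom-up: leaves go to a single-hop QA model;
    internal nodes compose their children's answers."""
    if not node.children:
        return leaf_qa(node.question)
    child_answers = [answer_tree(c, leaf_qa, compose) for c in node.children]
    return compose(node.question, child_answers)

# Toy comparison question decomposed into two atomic leaves.
tree = QDTNode(
    "Which river is longer, the Nile or the Amazon?",
    [QDTNode("How long is the Nile?"), QDTNode("How long is the Amazon?")],
)
lengths = {"How long is the Nile?": ("Nile", 6650),
           "How long is the Amazon?": ("Amazon", 6400)}
leaf_qa = lengths.get
compose = lambda q, answers: max(answers, key=lambda a: a[1])[0]

print(answer_tree(tree, leaf_qa, compose))  # -> Nile
```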

E. Prompt-based, In-Context and Retrieval-Augmented Decomposition

Zero- or few-shot in-context learning to induce decomposition ability from existing datasets or tasks (e.g., ICAT via Fréchet Term-Embedding Distance for exemplar selection (V et al., 2023)). Retrieval-augmented pipelines intertwine LLM-driven decomposition with document or passage retrieval, followed by answer composition after cross-encoder reranking (Ammann et al., 1 Jul 2025).
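A schematic of the decompose, retrieve, rerank, compose loop. A crude token-overlap scorer stands in for both the retriever and the cross-encoder reranker, and the corpus, decomposer, and answerer are all hypothetical:

```python
def overlap(a, b):
    """Crude lexical relevance: count of shared lowercase tokens."""
    clean = lambda s: set(s.lower().replace("?", " ").replace(".", " ").split())
    return len(clean(a) & clean(b))

def retrieve_then_answer(question, decompose, corpus, answer, compose, k=2):
    """Decompose, retrieve top-k passages per sub-question, rerank,
    answer each sub-question against its best passage, then compose."""
    sub_answers = []
    for sub_q in decompose(question):
        candidates = sorted(corpus, key=lambda p: overlap(sub_q, p), reverse=True)[:k]
        best = max(candidates, key=lambda p: overlap(sub_q, p))  # stand-in reranker
        sub_answers.append(answer(sub_q, best))
    return compose(sub_answers)

# Hypothetical toy corpus and components.
corpus = [
    "The Danube flows through Vienna.",
    "Vienna is the capital of Austria.",
    "Mount Everest is in Nepal.",
]
decompose = lambda q: [
    "Which river flows through Vienna?",
    "Vienna is the capital of which country?",
]
answer = lambda q, passage: passage   # return the supporting passage
compose = lambda answers: answers     # keep the evidence chain

result = retrieve_then_answer(
    "Which river flows through the capital of Austria?",
    decompose, corpus, answer, compose,
)
print(result)
```

Each sub-question retrieves its own evidence, so the second hop can succeed even when the original question shares few tokens with the supporting passages.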

F. Reward-Shaped and Selective Decomposition

Fine-tuning LLMs to pose only those sub-questions that provably improve downstream QA performance, with reward functions based on answer flipping and preservation (Wang et al., 1 Oct 2025). Selective decomposition mechanisms decide, per question, whether decomposition is needed, achieving substantial improvements in multimodal and medical VQA (Zhang et al., 2024, Khan et al., 2023).
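At inference time, selective decomposition reduces to a per-question routing decision. The heuristic `toy_gate` below is a crude hypothetical stand-in for the learned, reward-shaped selector:

```python
def selective_qa(question, needs_decomposition, direct_qa, decomposed_qa):
    """Per-question gate: route to the decomposition pipeline only when it
    is predicted to help; otherwise answer directly."""
    route = decomposed_qa if needs_decomposition(question) else direct_qa
    return route(question)

# Crude heuristic gate: decompose only questions with multi-clause cues.
# (A learned policy would replace this string matching.)
def toy_gate(question):
    return any(cue in question.lower() for cue in (" and ", " of the ", " that "))

direct = lambda q: f"direct({q})"          # placeholder single-shot QA
decomposed = lambda q: f"decomposed({q})"  # placeholder decomposition pipeline

print(selective_qa("What is the capital of France?", toy_gate, direct, decomposed))
print(selective_qa("What is the population of the capital of France?",
                   toy_gate, direct, decomposed))
```

The first question is routed directly; the second, containing a compositional "of the" hop, goes through the decomposition path.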

3. Applications Across QA Modalities

The decomposition paradigm is highly modular, enabling its application across textual, tabular, visual, and knowledge-base QA.

  • Open-Domain and Multi-Hop Textual QA: Decomposition recasts multi-hop queries into factored chains, generally producing large gains in supporting-fact retrieval, multi-hop reasoning, and end-to-end F1 under low- and medium-data regimes (Min et al., 2019, Wei et al., 2022, Guo et al., 2022, Perez et al., 2020).
  • Knowledge Base QA (KBQA): Tree-based and QPL decomposition strategies map complex compositions into sequences of SPARQL or SQL program steps, often yielding performance boosts and improved interpretability (Huang et al., 2023, Eyal et al., 2023).
  • Quantitative Reasoning / Financial QA: Reward-pruned, domain-specific decomposition can focus LLM attention on a single crucial supporting sub-question (average 1.2 per QA) and surpass long CoT chains, both in accuracy and inference efficiency (Wang et al., 1 Oct 2025).
  • Multimodal / Visual QA: Visual QA models benefit from decomposing high-level, semantic questions into a blend of low-level perceptual sub-questions and conceptual reasoning steps; selective decomposition policies further increase SOTA accuracy, especially in domain-specialized and low-data regimes (Zhang et al., 2024, Khan et al., 2023).

4. Effectiveness, Empirical Results, and Limitations

Question decomposition can dramatically improve several key metrics, with the size of the gain depending strongly on the data regime:

Setting / Strategy | Typical Gain | Notes
Structural fields (bAbI, all λ) | > +40 pp MAP | Multi-field, learned weights (Jurczyk et al., 2016)
Hard compositional RC (DROP, DecompRC) | +6–7 F1 | Supervised, pointer-based (Min et al., 2019)
Weakly/unsupervised multi-hop (ONUS, QDAMR) | +2–8 F1 | No gold decompositions (Perez et al., 2020, Deng et al., 2022)
RAG multi-hop retrieval (QD+RR) | +11.6% F1 / +36% MRR | Zero-shot, passage-level (Ammann et al., 1 Jul 2025)
Financial QA (EQD) | +0.6–10.5% EM | Only 1–1.2 sub-questions generated (Wang et al., 1 Oct 2025)
Zero-shot VQA, selective decomposition | +6–26 pp acc. | Largest gains in medical/art tasks (Khan et al., 2023)

Major considerations and limitations identified:

  • Decomposition improves performance most in low-data or high-compositionality regimes. As labeled data increases, end-to-end seq2seq models can implicitly learn decompositions through attention and hidden representations (Wei et al., 2022).
  • Over-decomposition (producing many sub-questions when a single supporting fact suffices) may harm performance due to error propagation or retrieval overhead (Wang et al., 1 Oct 2025, Ammann et al., 1 Jul 2025).
  • Faithfulness improves substantially: factored decomposition can force LLMs to actually condition the answer on sub-answers, reducing “post-hoc rationalization” and hidden sycophancy compared to Chain-of-Thought (Radhakrishnan et al., 2023).
  • Quality of sub-question generation is critical; annotation or data refinement on repetition, groundedness, and relevance is central in multimodal settings (Zhang et al., 2024).
  • Error propagation and ill-posed sub-questions remain a concern in explicit or unsupervised chains (Wei et al., 2022, Patel et al., 2022).
  • For knowledge-integration tasks, hierarchical structures (trees) provide more robust fusion of heterogeneous sources than flat lists (Zhang et al., 2023).

5. Interpretability, Faithfulness, and Explainability

Decomposition strategies provide intrinsic interpretability—each sub-question constitutes an explicit, human-auditable reasoning step. This property underpins both user trust and fault localization in complex QA systems.

  • Factored and field-based decompositions expose intermediate structures (fields, trees, graphs) that clarify decision provenance (Jurczyk et al., 2016, Huang et al., 2023, Zhang et al., 2023).
  • Explicit answer-composition steps (e.g., in RoHT/HQDT) allow probabilistic merging of KB-supported, text-supported, and recursively derived answers, with global likelihoods available for each branch (Zhang et al., 2023).
  • Faithfulness metrics—including truncation sensitivity, corruption sensitivity, and bias-induced accuracy drop—objectively quantify how tightly final predictions depend on generated reasoning, highlighting the superiority of decomposition over CoT on these axes (Radhakrishnan et al., 2023).
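Truncation sensitivity, for instance, can be operationalized as the fraction of reasoning-prefix truncations that flip the final answer. The `faithful`/`unfaithful` toy models below are illustrative assumptions, not the cited evaluation harness:

```python
def truncation_sensitivity(model, question, reasoning_steps):
    """Fraction of reasoning-prefix truncations that flip the final answer.
    Higher values mean the prediction depends more tightly on the reasoning."""
    full_answer = model(question, reasoning_steps)
    flips = sum(
        model(question, reasoning_steps[:k]) != full_answer
        for k in range(len(reasoning_steps))
    )
    return flips / len(reasoning_steps)

steps = ["find the director", "find the birthplace", "therefore: London"]
faithful = lambda q, s: s[-1] if s else "no answer"  # conditions on reasoning
unfaithful = lambda q, s: "London"                   # ignores reasoning entirely

print(truncation_sensitivity(faithful, "q", steps))    # -> 1.0
print(truncation_sensitivity(unfaithful, "q", steps))  # -> 0.0
```

A model that merely rationalizes a precomputed answer scores near zero on this metric, which is what makes it useful for detecting post-hoc rationalization.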

6. Practical Guidance and Future Directions

  • When Decomposition Helps: Most beneficial in few-shot, zero-shot, or cross-domain adaptation settings; for difficult, compositional, or multi-hop questions; and for retrieval-augmented or knowledge-integration tasks (Wei et al., 2022, Ammann et al., 1 Jul 2025).
  • Modularity and Transfer: Decomposers can be trained or selected via in-context transfer (e.g., ICAT), compositional data augmentation, or reward shaping, with plug-and-play architectures in most modern LLM or RAG pipelines (V et al., 2023, Wang et al., 1 Oct 2025, Ammann et al., 1 Jul 2025).
  • Extension to New Modalities: Finetuned multimodal models (VQD) and multimodal knowledge integration (HQDT, selective decomposition) represent state-of-the-art modular designs for explainable VQA and XQA (Zhang et al., 2024, Zhang et al., 2023).
  • Limitations and Cautions: Careful calibration is required to avoid over-decomposition and error cascades; joint training and dynamic decomposition length selection are open areas for research. Sub-question generation quality, particularly in highly compositional or domain-specific settings, is pivotal.

A plausible implication is that as LLM reasoning abilities improve and implicit decomposition becomes more salient in deep models, the value of explicit decomposition may further shift toward interpretability, reliability, and modularity, rather than pure accuracy, especially in mission-critical or high-risk domains.

