QDMR: Structured Question Decomposition
- QDMR is a structured formalism that decomposes complex natural language questions into sequential atomic reasoning steps like select, filter, and project.
- It enables the mapping of decompositions to dependency graphs or typed programs, facilitating applications in text-to-SQL, synthetic context generation, and multi-step QA.
- Modeling approaches leverage both seq2seq and graph-based parsers to enhance interpretability, robustness, and inference speed in multi-step reasoning systems.
Question Decomposition Meaning Representation (QDMR) is a structured formalism for compositional natural-language question understanding and reasoning, enabling the decomposition of complex questions into interpretable sequences of atomic reasoning steps. Each QDMR instance provides a sequence of explicit operations—such as select, filter, project, group, and aggregate—whose compositional execution yields the final answer. QDMR serves as an intermediary semantic layer between open-domain question answering and both formal (e.g., SQL) and semi-formal (e.g., chain-of-thought) reasoning systems, supporting robust training, evaluation, and interpretability of multi-step reasoning models (Hasson et al., 2021, Zhu et al., 2023, Trivedi et al., 2022, Wolfson et al., 2021).
1. Formal Structure and Semantics of QDMR
A QDMR is defined as an ordered sequence of reasoning steps for a question $q$, where each step is either a basic operation over the question text or a transformation grounded in the answer to a prior step. Formally, given the tokenized input question $q = \langle q_1, \dots, q_n \rangle$, a QDMR decomposition is
$$S = \langle s_1, \dots, s_k \rangle,$$
where each $s_i$ is a short, human-readable "atomic question" with an operator $o_i$ drawn from a finite set $\mathcal{O}$, operator-specific properties $p_i$, and arguments $A_i$ (Hasson et al., 2021).
Semantically, QDMR decompositions encode directed acyclic computation graphs: each reasoning step's output can be referenced via a variable (e.g., "#2"), forming explicit data flow and dependency structure (Trivedi et al., 2022, Wolfson et al., 2021). The compositionality of QDMR is foundational for constraining QA models to exhibit interpretable, stepwise reasoning.
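As a minimal illustration of this data-flow structure, the dependency graph of a decomposition can be recovered by scanning each step for `#k` back-references. The sketch below is a toy Python example with hypothetical helper names, not the BREAK tooling:

```python
import re
from typing import Dict, List

def qdmr_dependency_graph(steps: List[str]) -> Dict[int, List[int]]:
    """Map each 1-indexed QDMR step to the earlier steps it references via '#k'."""
    graph: Dict[int, List[int]] = {}
    for i, step in enumerate(steps, start=1):
        refs = [int(m) for m in re.findall(r"#(\d+)", step)]
        # QDMR is acyclic by construction: a step may only reference earlier steps.
        graph[i] = [r for r in refs if r < i]
    return graph

steps = [
    "field goals",
    "#1 in the fourth quarter",
    "yard line of #2",
    "the highest of #3",
]
print(qdmr_dependency_graph(steps))  # {1: [], 2: [1], 3: [2], 4: [3]}
```

Because references may only point backward, the resulting graph is guaranteed to be acyclic, matching the computation-graph semantics above.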
2. QDMR Representations and Conversion to Programs
QDMR steps are typically linearized as tuples $(o_i, p_i, A_i)$ of operator, properties, and arguments, but can equivalently be represented as dependency graphs over tokens, where each edge encodes logical or argument relations defined by QDMR operators and step structure (Hasson et al., 2021). QDMRs can also be programmatically mapped to "typed programs" of primitive functions, each with explicit input/output types such as \texttt{List[Entity]}, \texttt{Int}, or \texttt{Bool} (Trivedi et al., 2022). For example:
- QDMR step: "project yard_line of #1 #2"
- Program: \texttt{#2 = project(input=#1, field="yard_line")}
Typed program conversions enable direct grounding of QDMRs into executable logical forms for downstream applications, such as synthesis of SQL queries or numerical computation pipelines (Wolfson et al., 2021, Trivedi et al., 2022).
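The sketch below illustrates this kind of typed, step-by-step execution for a decomposition like the example above. The `Step` container, the minimal operator inventory, and the in-memory context are invented for illustration and do not reproduce the BREAK or TeaBReaC implementations:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Step:
    operator: str                  # e.g. "select", "project", "aggregate"
    properties: Dict[str, Any]     # operator-specific properties
    arguments: List[int] = field(default_factory=list)  # 1-indexed refs to earlier steps

# Toy context (hypothetical): field goals and their yard lines.
CONTEXT = {"field goals": [{"yard_line": 23}, {"yard_line": 41}, {"yard_line": 35}]}

def execute(steps: List[Step]) -> Any:
    """Execute a QDMR-style typed program step by step, threading outputs by index."""
    outputs: List[Any] = []
    for step in steps:
        args = [outputs[i - 1] for i in step.arguments]
        if step.operator == "select":        # -> List[Entity]
            out = CONTEXT[step.properties["entity"]]
        elif step.operator == "project":     # -> List[Int]
            out = [e[step.properties["field"]] for e in args[0]]
        elif step.operator == "aggregate":   # -> Int
            out = max(args[0]) if step.properties["op"] == "max" else min(args[0])
        else:
            raise ValueError(f"unsupported operator: {step.operator}")
        outputs.append(out)
    return outputs[-1]

program = [
    Step("select", {"entity": "field goals"}),
    Step("project", {"field": "yard_line"}, arguments=[1]),
    Step("aggregate", {"op": "max"}, arguments=[2]),
]
print(execute(program))  # 41
```

Making each primitive's input/output types explicit is what allows decompositions to be grounded into executable forms for the downstream applications described below.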
3. Modeling Approaches: Parsing and Supervision
Two dominant modeling paradigms exist for QDMR parsing:
- Sequence-to-Sequence (Seq2seq) Models: Standard transformer encoder–decoder architectures (e.g., T5, BART, CopyNet–BERT) linearize QDMRs as token sequences and generate decompositions autoregressively. Gold QDMR sequences are used for direct supervision; copy mechanisms are common for referencing question spans (Hasson et al., 2021, Zhu et al., 2023). A minimal linearization sketch appears below.
- Graph-Based Non-Autoregressive Parsers: Dependency-graph parsers, following a Biaffine style, treat tokens as graph nodes, predicting labeled edges that encode argument/operator relations. All edge predictions are made simultaneously, enabling a single forward pass without stepwise decoding. This approach yields inference speed-up compared to seq2seq, moderately improved sample complexity, and better robustness to domain shift, at a small loss of maximum accuracy (Hasson et al., 2021).
Auxiliary supervision—via jointly training seq2seq models with graph-based losses using Relation-Aware Transformers (Latent-RAT)—can further improve generalization and robustness, especially for long or compositional questions (Hasson et al., 2021).
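To make the seq2seq formulation concrete, the sketch below computes a teacher-forced loss for a single (question, decomposition) pair with an off-the-shelf T5 checkpoint; the `linearize` helper and the separator convention approximate, but do not exactly reproduce, the BREAK linearization format:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def linearize(steps):
    # One common convention: steps joined with ';', each prefixed by 'return'.
    return " ;".join("return " + s for s in steps)

question = "What is the yard line of the longest field goal in the fourth quarter?"
target = linearize([
    "field goals",
    "#1 in the fourth quarter",
    "yard line of #2",
    "the highest of #3",
])

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Teacher-forced seq2seq objective: question in, linearized decomposition out.
batch = tokenizer(question, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss
print(float(loss))
```

A graph-based parser instead scores all token-to-token edges in a single forward pass, which is the source of the inference speed-up noted above.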
4. Applications in Multistep Question Answering and Pretraining
QDMR enables explicitly compositional modeling in multistep question answering (QA), supporting:
- Chain-of-Questions (CoQ) Training: QDMR guides LLMs to generate and answer sub-questions sequentially, treating sub-answers as latent variables. Training leverages a hybrid of Hard-Expectation Maximization (Hard-EM) for strong early signal and Memory Augmented Policy Optimization (MAPO) for late-stage convergence. CoQ demonstrates marked robustness gains over conventional neuro-symbolic and LLM methods on adversarial and contrast datasets (e.g., +16.8 to +24.3 F1 over GPT-3.5 on DROP/HotpotQA adversarial sets) (Zhu et al., 2023).
- Synthetic Context Generation: Mapping QDMR decompositions to typed programs allows the systematic creation of "hard" synthetic contexts (TeaBReaC dataset), designed to enforce non-trivial, stepwise reasoning by the model while preventing shortcut exploitation. Pretraining LLMs on TeaBReaC yields substantial F1 improvements (+4–13, up to +20 on complex questions) and higher robustness on multi-step QA tasks (Trivedi et al., 2022).
- Text-to-SQL Pipeline: QDMR enables a weakly supervised approach to text-to-SQL parsing, where QDMRs (manual or predicted) serve as intermediate representations. SQL queries are synthesized algorithmically from (question, QDMR, answer) triples using execution-guided search and repair heuristics, achieving 91–97% of fully supervised SQL accuracy and 86–93% with only predicted QDMRs, thereby obviating SQL annotation (Wolfson et al., 2021).
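As a toy illustration of the execution-guided idea in the text-to-SQL pipeline, the sketch below checks one hand-grounded candidate query against a known answer; the schema, data, and hard-coded grounding are hypothetical, and the actual method searches over many candidate column/value groundings derived from the QDMR (Wolfson et al., 2021):

```python
import sqlite3

# Hypothetical SPIDER-style table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plays (id INTEGER, kind TEXT, quarter INTEGER, yard_line INTEGER);
    INSERT INTO plays VALUES (1,'field goal',2,23),(2,'field goal',4,41),(3,'punt',4,50);
""")

# One candidate SQL query obtained by grounding a QDMR
# (select -> filter -> project -> aggregate) into a table, columns, and values.
candidate_sql = ("SELECT MAX(yard_line) FROM plays "
                 "WHERE kind = 'field goal' AND quarter = 4")

def execution_guided_check(sql: str, gold_answer) -> bool:
    """Keep a candidate only if it executes and reproduces the known gold answer."""
    try:
        pred = conn.execute(sql).fetchone()[0]
    except sqlite3.Error:
        return False
    return pred == gold_answer

print(execution_guided_check(candidate_sql, gold_answer=41))  # True
```

Candidates that fail to execute or disagree with the annotated answer are discarded or repaired, which is how SQL annotation is avoided.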
5. Empirical Evaluation and Comparative Results
Empirical analyses highlight QDMR's strengths as both a modeling target and as auxiliary guidance for multi-step reasoning:
- Parsing Accuracy: On the BREAK dataset, state-of-the-art seq2seq (BART) achieves LF-EM of 0.496, CopyNet+BERT 0.470, and Biaffine graph parser 0.440, with a 16× inference speed-up for Biaffine over CopyNet+BERT (Hasson et al., 2021).
- Domain Generalization and Sample Complexity: Non-autoregressive graph approaches and Latent-RAT (auxiliary graph supervision) exhibit smaller performance drops under domain shift (e.g., –43.1% vs. –50% for standard seq2seq), and require less data to reach a fixed performance threshold (notably +3–5 points LF-EM at 1–10% data) (Hasson et al., 2021).
- Compositional QA and Robustness: Chain-of-Questions with QDMR supervision yields +9.0 F1 over strong neuro-symbolic methods and +24.3 F1 over GPT-3.5 on complex adversarial QA sets. These methods maintain performance under adversarial distributional shifts, demonstrating QDMR's capacity to encode and enforce genuine multi-step reasoning (Zhu et al., 2023, Trivedi et al., 2022).
6. Interpretability, Debugging, and Downstream Impact
QDMR's explicit compositional structure enables fine-grained interpretability and debugging:
- The dependency-graph and stepwise forms provide token- or span-level explanation for each stage of reasoning, elucidating the execution role of every question part (Hasson et al., 2021).
- Intermediate representations can be visualized, traced, or mapped to formal languages (e.g., SQL, typed programs), facilitating error analysis and annotation refinement (Wolfson et al., 2021, Trivedi et al., 2022).
- QDMR-guided pipelines serve as bridges for bootstrapping weak supervision in domains lacking formal annotations, and as scaffolding for curriculum learning and data synthesis (Wolfson et al., 2021, Trivedi et al., 2022).
7. Datasets and Empirical Resources
QDMR-based benchmarks and resources include:
| Name | Description | Reference |
|---|---|---|
| BREAK | Crowd-annotated QDMR decompositions | (Hasson et al., 2021) |
| TeaBReaC | Synthetic QDMR-guided multi-step QA | (Trivedi et al., 2022) |
| D_QDMR(+/bronze) | Human and model-predicted QDMRs | (Zhu et al., 2023) |
QDMR annotations cover diverse QA domains: DROP (numerical), ComplexWebQuestions (web), HotpotQA (multi-hop), SPIDER (text-to-SQL), ComQA (factoid), and ATIS (spoken). TeaBReaC contains 525K synthetic multistep instances with 900+ reasoning patterns, supporting both pretraining and robust evaluation.
In sum, QDMR provides a rigorous, adaptable meaning representation for compositional question answering. It supports interpretable, data-efficient, and robust multi-step reasoning across neural and symbolic paradigms, and underpins principled approaches to program synthesis, data augmentation, and domain adaptation in complex QA tasks (Hasson et al., 2021, Zhu et al., 2023, Trivedi et al., 2022, Wolfson et al., 2021).