QTree Construction for Hierarchical Querying
- A QTree is a structured 3-level, 3-ary tree that hierarchically decomposes a base query into subqueries.
- Coverage queries impose strict inclusion or exclusion constraints, under which connected, four-node outlines are extracted for comprehensive retrieval.
- The construction pipeline integrates LLM-based generation, heuristic repair, and preference optimization to benchmark retrieval performance.
A QTree (Query Tree) denotes a specific hierarchical structure used for organizing subqueries in coverage-conditioned retrieval-augmented generation, with a prominent recent instantiation in the QTree dataset for outline construction and evaluation under coverage-conditioned (C²) querying. In this context, a QTree is a strictly regular 3-ary tree (branching factor b = 3, depth d = 3), designed to systematically unfold the subtopic space of a base user question q, enabling the creation of principled, efficiently searchable outlines that satisfy complex inclusion/exclusion criteria. QTree construction is utilized for benchmarking and optimizing LLM-guided information search, particularly within large-scale retrieval systems and preference-tuned outline planning (Kim et al., 2024).
1. Formal Definition and Notation
A QTree T is defined on the basis of a "base" question q and is composed as follows:
- Depth: d = 3
- Branching factor: b = 3
- Node set: V, comprising the root q and its subquery nodes, with |V \ {q}| = Σ_{ℓ=1}^{d} b^ℓ (excluding the root)
- Each node v ∈ V represents a natural-language subquestion recursively refining q
- Edge set: E ⊆ V × V, forming an arborescence rooted at q
The tree is rooted, ordered, and strictly uniform: every parent at level ℓ < d has exactly b children; nodes at maximum depth are leaves. Subquery identifiers are canonicalized as path indices 1, 1.1, 1.1.1, etc., to denote hierarchical position.
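The regular structure above lends itself to a simple recursive representation. The following is a minimal sketch (the `QNode` and `build_qtree` names and the dict-based interface are illustrative, not the paper's implementation), assuming the subquestion text for every path identifier has already been generated:

```python
from dataclasses import dataclass, field

@dataclass
class QNode:
    """A node in a QTree: a natural-language subquestion at a hierarchical position."""
    path: str              # canonical path identifier, e.g. "1.2.3"
    question: str
    children: list = field(default_factory=list)

def build_qtree(base_question, subquestions, b=3, d=3):
    """Build a strictly b-ary QTree of depth d rooted at the base question.

    `subquestions` maps path identifiers ("1", "1.1", ...) to question text;
    in the pipeline this mapping comes from an LLM generation step.
    """
    root = QNode(path="", question=base_question)
    def grow(node, depth):
        if depth == d:
            return
        for i in range(1, b + 1):
            path = f"{node.path}.{i}".lstrip(".")
            child = QNode(path=path, question=subquestions[path])
            node.children.append(child)
            grow(child, depth + 1)
    grow(root, 0)
    return root
```

With b = d = 3 this yields 3 + 9 + 27 = 39 subquery nodes below the root, matching the node-count formula above.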
Coverage-conditioned queries, or C² queries, augment the base question q with a "coverage query" c specifying inclusion or exclusion of particular nodes/subtrees within T.
2. Construction Pipeline
The QTree construction for the QTree dataset follows a multi-stage process (Kim et al., 2024):
A. Base Query Collection:
- Base questions are sourced from datasets targeting information-seeking and expert queries (ASQA, Longform, ExpertQA)
- Data cleaning removes extraneous or malformed entries, yielding unique train/test base queries
B. Hierarchical Decomposition (QTree Generation):
- For each base query q, an LLM (e.g., GPT-4) is prompted to generate a 3-level, strictly 3-ary hierarchy of subquestions
- Requirements: depth and branching fixed at 3, all nodes in question format, no duplicates or semantic overlap
- Outputs are filtered for shape validity; malformed generations are heuristically repaired or re-prompted
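The shape-validity filter in step B can be sketched as a purely structural check. This is a hedged illustration using nodes with `path`, `question`, and `children` attributes; only exact-duplicate detection is shown, since catching *semantic* overlap would itself require an LLM:

```python
def validate_qtree_shape(root, b=3, d=3):
    """Check the structural requirements from the construction pipeline:
    fixed depth and branching, every node phrased as a question, no exact
    duplicates. Returns a list of violation messages (empty list = valid)."""
    problems, seen = [], set()
    def walk(node, depth):
        text = node.question.strip().lower()
        if not text.endswith("?"):
            problems.append(f"{node.path or 'root'}: not in question format")
        if text in seen:
            problems.append(f"{node.path or 'root'}: duplicate question")
        seen.add(text)
        if depth < d and len(node.children) != b:
            problems.append(
                f"{node.path or 'root'}: expected {b} children, got {len(node.children)}")
        if depth == d and node.children:
            problems.append(f"{node.path}: nodes at max depth must be leaves")
        for c in node.children:
            walk(c, depth + 1)
    walk(root, 0)
    return problems
```

Trees with a non-empty violation list would be heuristically repaired or sent back for re-prompting.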
C. Coverage Query Synthesis:
- For each tree T, a node v is sampled as a "background query"
- An intent operation (inclusion/exclusion) is randomly assigned
- The coverage query c is synthesized via LLM prompt templates such as "Please include/avoid topic X"
- For each base query, a single coverage-conditioned query is retained after parsing candidate coverages
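The sampling logic of step C can be sketched as follows; the function name and template wording are illustrative stand-ins for the paper's LLM prompt templates, and nodes are assumed to carry a `question` attribute:

```python
import random

def synthesize_coverage_query(nodes, rng=random):
    """Sample a background node and an intent, then fill a toy prompt template.
    A stand-in for the LLM-templated synthesis step in the pipeline."""
    target = rng.choice(nodes)                      # sampled "background query" node
    intent = rng.choice(["inclusion", "exclusion"])  # randomly assigned intent
    verb = "include" if intent == "inclusion" else "avoid"
    coverage_query = f"Please {verb} the topic: {target.question}"
    return target, intent, coverage_query
```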
D. Outline Candidate Generation:
- For each coverage-conditioned query, the construction algorithm generates 3 candidate outlines by sequentially prompting the LLM to extract a connected set of exactly 4 nodes from T that honors the coverage constraint
- Outlines are validated to be connected (as path/tree subgraphs), deduplicated, and coverage-compliant ("inclusion" implies intersection with the target subtree; "exclusion" implies disjointness from it)
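The connectivity requirement can be checked directly from path identifiers, because a node set in a rooted tree induces a connected subgraph exactly when every member except the shallowest one has its parent in the set. A minimal sketch, assuming dotted path identifiers like "1.2.3":

```python
def is_connected_outline(outline_paths):
    """Validate an outline as a connected, exactly-4-node subtree of the QTree.
    `outline_paths` is a set of path identifiers, e.g. {"1", "1.1", "1.2", "1.1.1"}."""
    if len(outline_paths) != 4:
        return False
    members = set(outline_paths)
    # The shallowest member acts as the outline's local root; every other
    # member must have its parent inside the set for the subgraph to connect.
    local_root = min(members, key=lambda p: p.count("."))
    for p in members - {local_root}:
        parent = p.rsplit(".", 1)[0] if "." in p else ""
        if parent not in members:
            return False
    return True
```

Candidates failing this check (or the coverage constraint) are discarded before scoring.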
3. Coverage Constraints and Objective Metrics
Coverage-conditioned querying is formalized as follows:
- For "inclusion" intents, an outline O must intersect the subtree rooted at the target node v
- For "exclusion" intents, O must be disjoint from the subtree rooted at v
- Outlines are scored by an LLM judge (GPT-4) on a 1–5 scale, representing the degree of compliance with the constraints
These scores are used not only for dataset annotation but also as reward signals for subsequent preference-alignment in outline generation models.
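The binary compliance predicate underlying these intent definitions can be sketched as follows (the graded 1–5 judge score comes from an LLM and is not reproducible in plain code; path-string identifiers are an assumption of this sketch):

```python
def satisfies_coverage(outline_paths, target_path, intent):
    """Inclusion: the outline must intersect the subtree rooted at the target node.
    Exclusion: the outline must be disjoint from that subtree.
    Nodes are identified by path strings; "1.2" lies in the subtree of "1"."""
    def in_subtree(p):
        return p == target_path or p.startswith(target_path + ".")
    hits = any(in_subtree(p) for p in outline_paths)
    return hits if intent == "inclusion" else not hits
```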
4. QPlanner Model and Training Paradigm
The resulting dataset is leveraged to train "QPlanner," a 7B-parameter Llama-2-based model:
- Model input: the coverage-conditioned query, consisting of the base question and the coverage intent
- Model output: a full QTree plus a 4-item compliant outline
- Training is conducted via:
- Supervised fine-tuning (SFT) on (query, target) pairs, where each target is a QTree with a compliant outline
- Direct Preference Optimization (DPO), which uses pairwise outline preferences derived from LLM-annotated scores, with a KL-regularization penalty imposed against a reference policy. Only the highest- and lowest-scored outlines per query are used as preference signals.
DPO objective:

L_DPO(π_θ; π_ref) = −E_{(x, y_w, y_l) ∼ D} [ log σ( β log (π_θ(y_w | x) / π_ref(y_w | x)) − β log (π_θ(y_l | x) / π_ref(y_l | x)) ) ]

where σ is the logistic sigmoid, π_θ is QPlanner's policy, π_ref is the frozen reference (SFT) policy, x is the input query, y_w and y_l are the preferred and dispreferred outlines, and β controls the KL term.
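The per-pair DPO loss can be computed from sequence log-probabilities alone. A minimal numeric sketch in plain Python rather than a training framework (the function name is illustrative):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)]).
    logp_* are sequence log-probabilities of the preferred (w) and dispreferred (l)
    outlines under the trained policy; ref_logp_* are the same quantities under the
    frozen reference policy. beta scales the implicit KL penalty toward the reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # softplus(-margin) == -log(sigmoid(margin)), computed stably via log1p
    return math.log1p(math.exp(-margin))
```

When the policy and reference agree (margin = 0) the loss is log 2; it falls below log 2 as the policy separates preferred from dispreferred outlines more than the reference does.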
5. Illustrative Example: End-to-End Construction
For the base question q = "Describe the film The Woman Hunt", the generated QTree features major facets such as plot, production, and reception. Suppose the coverage query c specifies exclusion of the "reception" subtree. The outline construction process yields an output such as:
- “What is the plot of The Woman Hunt?”
- “What are the main events in The Woman Hunt?”
- “What initiates the conflict in The Woman Hunt?”
- “What is the climax of The Woman Hunt?”
This outline forms a valid, connected subset, excludes nodes under the reception branch, and passes all structural and coverage constraints. An LLM judge rates the output, and the observed score contributes to preference-based training.
6. Experimental Results and Empirical Analysis
Empirical evaluation demonstrates:
- Candidate outlines generated by preference-tuned QPlanner satisfy coverage criteria better than those from SFT-only and baseline models, per both LLM and human judges
- QTree structures support fine-grained, interpretable filtering and composition of subqueries under controlled inclusion/exclusion
- On large-scale data, the QTree+QPlanner pipeline produces aligned outlines for coverage-conditioned queries efficiently, supporting rapid evaluation in RAG contexts (Kim et al., 2024)
7. Scope, Limitations, and Practical Relevance
QTree construction, as instantiated in (Kim et al., 2024), is tailored to structured decomposition of information-seeking questions. It is agnostic to domain, provided the base queries are suitable for hierarchical topical expansion. The approach fundamentally depends on LLMs’ capacity for stable, high-coverage question generation and outline selection; malformed base queries or degenerate decompositions require manual or heuristic post-filtering. Outline extraction is strictly limited to connected 4-node subgraphs, which may not capture all semantically optimal combinations. A plausible implication is that QTree-based querying is most effective where topic structures are inherently hierarchical and easily segmented by breadth-first subtopic enumeration, and may need adaptation for deep or irregular hierarchies.
The QTree construction pipeline enables systematic benchmarking of retrieval-augmented generation with explicit outline constraints, and informs the development of preference-aligned planners that respect user-specified coverage in complex information spaces (Kim et al., 2024).