QTree Construction for Hierarchical Querying
- A QTree is a structured 3-level, 3-ary tree that hierarchically decomposes a base query into subqueries.
- Coverage queries impose strict inclusion or exclusion constraints, under which connected, four-node outlines are extracted for comprehensive retrieval.
- The construction pipeline integrates LLM-based generation, heuristic repair, and preference optimization to benchmark retrieval performance.
A QTree (Query Tree) denotes a specific hierarchical structure used for organizing subqueries in coverage-conditioned retrieval-augmented generation, with a prominent recent instantiation in the QTree dataset for outline construction and evaluation under coverage-conditioned (C²) querying. In this context, a QTree is a strictly regular 3-ary tree (branching factor b = 3, depth d = 3), designed to systematically unfold the subtopic space of a base user question q, enabling the creation of principled, efficiently searchable outlines that satisfy complex inclusion/exclusion criteria. QTree construction is utilized for benchmarking and optimizing LLM-guided information search, particularly within large-scale retrieval systems and preference-tuned outline planning (Kim et al., 2024).
1. Formal Definition and Notation
A QTree T is defined on the basis of a "base" question q and is composed as follows:
- Depth: d = 3
- Branching factor: b = 3
- Node set: V, comprising the root q and its subquery nodes, with |V \ {q}| = Σ_{ℓ=1}^{d} b^ℓ (excluding the root)
- Each node v ∈ V represents a natural-language subquestion recursively refining q
- Edge set: E ⊆ V × V, forming an arborescence rooted at q
The tree is rooted, ordered, and strictly uniform: every parent at level ℓ < d has exactly b children; nodes at maximum depth are leaves. Subquery identifiers are canonicalized as path indices 1, 1.1, 1.1.1, etc., to denote hierarchical position.
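The regular structure above lends itself to a simple recursive representation. The following is a minimal sketch (the `QNode` and `build_qtree` names and the dict-based interface are illustrative, not the paper's implementation), assuming the subquestion text for every path identifier has already been generated:

```python
from dataclasses import dataclass, field

@dataclass
class QNode:
    """A node in a QTree: a natural-language subquestion at a hierarchical position."""
    path: str              # canonical path identifier, e.g. "1.2.3"
    question: str
    children: list = field(default_factory=list)

def build_qtree(base_question, subquestions, b=3, d=3):
    """Build a strictly b-ary QTree of depth d rooted at the base question.

    `subquestions` maps path identifiers ("1", "1.1", ...) to question text;
    in the pipeline this mapping comes from an LLM generation step.
    """
    root = QNode(path="", question=base_question)
    def grow(node, depth):
        if depth == d:
            return
        for i in range(1, b + 1):
            path = f"{node.path}.{i}".lstrip(".")
            child = QNode(path=path, question=subquestions[path])
            node.children.append(child)
            grow(child, depth + 1)
    grow(root, 0)
    return root
```

With b = d = 3 this yields 3 + 9 + 27 = 39 subquery nodes below the root, matching the node-count formula above.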
Coverage-conditioned queries, or C² queries, augment the base question q with a "coverage query" c specifying inclusion or exclusion of particular nodes/subtrees within T.
2. Construction Pipeline
The QTree construction for the QTree dataset follows a multi-stage process (Kim et al., 2024):
A. Base Query Collection:
- Base questions are sourced from datasets targeting information-seeking and expert queries (ASQA, Longform, ExpertQA)
- Data cleaning removes extraneous or malformed entries, yielding unique train/test base queries
B. Hierarchical Decomposition (QTree Generation):
- For each base query q, an LLM (e.g., GPT-4) is prompted to generate a 3-level, strictly 3-ary hierarchy of subquestions
- Requirements: depth and branching fixed at 3, all nodes in question format, no duplicates or semantic overlap
- Outputs are filtered for shape validity; malformed generations are heuristically repaired or re-prompted
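The shape-validity filter in step B can be sketched as a purely structural check. This is a hedged illustration using nodes with `path`, `question`, and `children` attributes; only exact-duplicate detection is shown, since catching *semantic* overlap would itself require an LLM:

```python
def validate_qtree_shape(root, b=3, d=3):
    """Check the structural requirements from the construction pipeline:
    fixed depth and branching, every node phrased as a question, no exact
    duplicates. Returns a list of violation messages (empty list = valid)."""
    problems, seen = [], set()
    def walk(node, depth):
        text = node.question.strip().lower()
        if not text.endswith("?"):
            problems.append(f"{node.path or 'root'}: not in question format")
        if text in seen:
            problems.append(f"{node.path or 'root'}: duplicate question")
        seen.add(text)
        if depth < d and len(node.children) != b:
            problems.append(
                f"{node.path or 'root'}: expected {b} children, got {len(node.children)}")
        if depth == d and node.children:
            problems.append(f"{node.path}: nodes at max depth must be leaves")
        for c in node.children:
            walk(c, depth + 1)
    walk(root, 0)
    return problems
```

Trees with a non-empty violation list would be heuristically repaired or sent back for re-prompting.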
C. Coverage Query Synthesis:
- For each tree T, a node v is sampled as a "background query"
- An intent operation (inclusion/exclusion) is randomly assigned
- The coverage query c is synthesized via LLM prompt templates such as "Please include/avoid topic X"
- For each base query, a single coverage-conditioned query is retained after parsing candidate coverages
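The sampling logic of step C can be sketched as follows; the function name and template wording are illustrative stand-ins for the paper's LLM prompt templates, and nodes are assumed to carry a `question` attribute:

```python
import random

def synthesize_coverage_query(nodes, rng=random):
    """Sample a background node and an intent, then fill a toy prompt template.
    A stand-in for the LLM-templated synthesis step in the pipeline."""
    target = rng.choice(nodes)                      # sampled "background query" node
    intent = rng.choice(["inclusion", "exclusion"])  # randomly assigned intent
    verb = "include" if intent == "inclusion" else "avoid"
    coverage_query = f"Please {verb} the topic: {target.question}"
    return target, intent, coverage_query
```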
D. Outline Candidate Generation:
- For each coverage-conditioned query, the construction algorithm generates 3 candidate outlines by sequentially prompting the LLM to extract a connected set of exactly 4 nodes from T that honors the coverage constraint
- Outlines are validated to be connected (as path/tree subgraphs), deduplicated, and coverage-compliant ("inclusion" implies intersection with the target subtree; "exclusion" implies disjointness from it)
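The connectivity requirement can be checked directly from path identifiers, because a node set in a rooted tree induces a connected subgraph exactly when every member except the shallowest one has its parent in the set. A minimal sketch, assuming dotted path identifiers like "1.2.3":

```python
def is_connected_outline(outline_paths):
    """Validate an outline as a connected, exactly-4-node subtree of the QTree.
    `outline_paths` is a set of path identifiers, e.g. {"1", "1.1", "1.2", "1.1.1"}."""
    if len(outline_paths) != 4:
        return False
    members = set(outline_paths)
    # The shallowest member acts as the outline's local root; every other
    # member must have its parent inside the set for the subgraph to connect.
    local_root = min(members, key=lambda p: p.count("."))
    for p in members - {local_root}:
        parent = p.rsplit(".", 1)[0] if "." in p else ""
        if parent not in members:
            return False
    return True
```

Candidates failing this check (or the coverage constraint) are discarded before scoring.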
3. Coverage Constraints and Objective Metrics
Coverage-conditioned querying is formalized as follows:
- For "inclusion" intents, an outline O must intersect the subtree rooted at the target node v
- For "exclusion" intents, O must be disjoint from the subtree rooted at v
- Outlines are scored by an LLM judge (GPT-4) on a 1–5 scale, representing the degree of compliance with the constraints
These scores are used not only for dataset annotation but also as reward signals for subsequent preference-alignment in outline generation models.
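The binary compliance predicate underlying these intent definitions can be sketched as follows (the graded 1–5 judge score comes from an LLM and is not reproducible in plain code; path-string identifiers are an assumption of this sketch):

```python
def satisfies_coverage(outline_paths, target_path, intent):
    """Inclusion: the outline must intersect the subtree rooted at the target node.
    Exclusion: the outline must be disjoint from that subtree.
    Nodes are identified by path strings; "1.2" lies in the subtree of "1"."""
    def in_subtree(p):
        return p == target_path or p.startswith(target_path + ".")
    hits = any(in_subtree(p) for p in outline_paths)
    return hits if intent == "inclusion" else not hits
```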
4. QPlanner Model and Training Paradigm
The resulting dataset is leveraged to train "QPlanner," a 7B-parameter Llama-2-based model:
- Model input: the coverage-conditioned query, consisting of the base question and the coverage intent
- Model output: a full QTree plus a 4-item compliant outline
- Training is conducted via:
- Supervised fine-tuning (SFT) on (query, target) pairs, where each target is a QTree with a compliant outline
- Direct Preference Optimization (DPO), which uses pairwise outline preferences derived from LLM-annotated scores, with a KL-regularization penalty imposed against a reference policy. Only the highest- and lowest-scored outlines per query are used as preference signals.
DPO objective:

L_DPO(π_θ; π_ref) = −E_{(x, y_w, y_l) ∼ D} [ log σ( β log (π_θ(y_w | x) / π_ref(y_w | x)) − β log (π_θ(y_l | x) / π_ref(y_l | x)) ) ]

where σ is the logistic sigmoid, π_θ is QPlanner's policy, π_ref is the frozen reference (SFT) policy, x is the input query, y_w and y_l are the preferred and dispreferred outlines, and β controls the KL term.
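The per-pair DPO loss can be computed from sequence log-probabilities alone. A minimal numeric sketch in plain Python rather than a training framework (the function name is illustrative):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)]).
    logp_* are sequence log-probabilities of the preferred (w) and dispreferred (l)
    outlines under the trained policy; ref_logp_* are the same quantities under the
    frozen reference policy. beta scales the implicit KL penalty toward the reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # softplus(-margin) == -log(sigmoid(margin)), computed stably via log1p
    return math.log1p(math.exp(-margin))
```

When the policy and reference agree (margin = 0) the loss is log 2; it falls below log 2 as the policy separates preferred from dispreferred outlines more than the reference does.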
5. Illustrative Example: End-to-End Construction
For the base question q = "Describe the film The Woman Hunt", the generated QTree features major facets such as plot, production, and reception. Suppose the coverage query c specifies exclusion of the "reception" subtree. The outline construction process yields an output such as:
- “What is the plot of The Woman Hunt?”
- “What are the main events in The Woman Hunt?”
- “What initiates the conflict in The Woman Hunt?”
- “What is the climax of The Woman Hunt?”
This outline forms a valid, connected subset, excludes nodes under the reception branch, and passes all structural and coverage constraints. An LLM judge rates the output, and the observed score contributes to preference-based training.
6. Experimental Results and Empirical Analysis
Empirical evaluation demonstrates:
- Candidate outlines generated by preference-tuned QPlanner satisfy coverage criteria better than those from SFT-only and baseline models, per both LLM and human judges
- QTree structures support fine-grained, interpretable filtering and composition of subqueries under controlled inclusion/exclusion
- On large-scale data, the QTree+QPlanner pipeline produces aligned outlines for coverage-conditioned queries efficiently, supporting rapid evaluation in RAG contexts (Kim et al., 2024)
7. Scope, Limitations, and Practical Relevance
QTree construction, as instantiated in (Kim et al., 2024), is tailored to structured decomposition of information-seeking questions. It is agnostic to domain, provided the base queries are suitable for hierarchical topical expansion. The approach fundamentally depends on LLMs’ capacity for stable, high-coverage question generation and outline selection; malformed base queries or degenerate decompositions require manual or heuristic post-filtering. Outline extraction is strictly limited to connected 4-node subgraphs, which may not capture all semantically optimal combinations. A plausible implication is that QTree-based querying is most effective where topic structures are inherently hierarchical and easily segmented by breadth-first subtopic enumeration, and may need adaptation for deep or irregular hierarchies.
The QTree construction pipeline enables systematic benchmarking of retrieval-augmented generation with explicit outline constraints, and informs the development of preference-aligned planners that respect user-specified coverage in complex information spaces (Kim et al., 2024).