Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU

Published 27 Apr 2026 in cs.AI | (2604.24219v1)

Abstract: Multi-intent natural language understanding requires retrieval systems that simultaneously achieve high accuracy and computational efficiency, yet existing approaches apply either uniform single-step retrieval that compromises recall or fixed-depth hierarchical decomposition that introduces excessive latency regardless of query complexity. This paper proposes Adaptive Tree-of-Retrieval (Adaptive ToR), a complexity-aware retrieval architecture that dynamically configures retrieval topology based on query characteristics. The system integrates four components: (1) a Query Tree Classifier computing a Query Complexity Index from weighted linguistic signals to route queries to either a rapid single-step path or an adaptive-depth hierarchical path; (2) a Tree-Based Retrieval module that recursively decomposes complex queries into focused sub-queries calibrated to predicted complexity; (3) an Adaptive Pruning Module employing two-stage filtering combining quantitative similarity gating with semantic relevance evaluation to suppress exponential node growth; and (4) a Retrieval Reranking Layer featuring a deduplicator-first pipeline and global LLM rescoring for production efficiency. Evaluation on the NLU++ benchmark (2,693 multi-intent queries across Banking and Hotel domains) yields 29.07% Subset Accuracy and 71.79% Micro-F1, a 9.7% relative improvement over fixed-depth baselines, while reducing latency by 37.6%, LLM invocations by 43.0%, and token consumption by 9.8%. Depth-wise analysis reveals that 26.92% of queries resolve within three seconds (2.45s mean latency) via single-step routing (d=0: 37.9% Subset Accuracy, 74.8% Micro-F1), while token consumption scales by 4.9x across depths, validating complexity-aware resource allocation and establishing Pareto-optimal balance across accuracy, latency, and computational efficiency.


Authors (3)

Summary

The paper introduces a dynamic tree-based retrieval mechanism that assigns retrieval depth based on query complexity, targeting Pareto-optimal trade-offs among accuracy, latency, and cost.
It employs a Query Tree Classifier and an adaptive pruning module to efficiently decompose multi-intent queries, reducing unnecessary computational overhead.
Evaluation on the NLU++ benchmark shows higher Subset Accuracy and Micro-F1 than fixed-depth baselines, together with lower latency, fewer LLM invocations, and lower token consumption.


Motivation and Context

The paper addresses structural inefficiencies in traditional retrieval-augmented generation (RAG) and tree-based retrieval architectures applied to multi-intent natural language understanding (NLU). Uniform single-step retrieval limits recall, while fixed-depth hierarchical approaches incur excessive latency regardless of query complexity. As conversational AI increasingly has to interpret multiple intents at once, often packed into a single utterance, existing systems have no way to allocate resources according to the query at hand. Adaptive ToR introduces a complexity-aware retrieval architecture, operationalized through dynamic routing and depth control, that targets Pareto-optimal trade-offs among accuracy, latency, and computational cost.

Architectural Overview

Adaptive ToR is defined by four key modules:

Query Tree Classifier (QTC): Computes a Query Complexity Index (QCI) using weighted linguistic signals—including interrogatives, conjunctions, comparison markers, sequence terms, and token length—to classify queries as either routine (single-step) or requiring adaptive-depth hierarchical decomposition.
Tree-Based Retrieval: For complex queries, QTC invokes a tree expansion module, recursively decomposing the query into calibrated sub-queries. The maximum depth $d \in \{1,2,3\}$ is dynamically determined, capping exponential node growth and preventing over-processing.
Adaptive Pruning Module (APM): Employs a two-stage filter. The quantitative gate uses cosine similarity to retain or discard candidates based on predetermined thresholds. Borderline candidates undergo semantic evaluation via an LLM acting as a judge, supporting precise node pruning prior to further expansion.
Retrieval Reranking Layer (RRL): Integrates a deduplicator-first pipeline, followed by global LLM rescoring. This design enables efficient candidate consolidation, minimizing token consumption and biasing reranking towards globally relevant evidence for intent classification.
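
The routing and pruning logic described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cue lists, the weights in `WEIGHTS`, the cut-offs in `DEPTH_CUTOFFS`, the `low`/`high` similarity thresholds, and the `judge` callable standing in for the LLM judge are all assumptions made for the example.

```python
import math
import re
from typing import Callable, Sequence

# Hypothetical signal weights and cue lists (not taken from the paper).
WEIGHTS = {"interrogative": 1.0, "conjunction": 1.5, "comparison": 1.0,
           "sequence": 1.0, "length": 0.1}
CUES = {
    "interrogative": {"what", "when", "where", "which", "who", "why", "how"},
    "conjunction": {"and", "or", "but", "also"},
    "comparison": {"than", "versus", "compare", "cheaper", "better"},
    "sequence": {"then", "after", "before", "first", "next"},
}
# Hypothetical QCI cut-offs separating depths 0, 1, 2 and 3.
DEPTH_CUTOFFS = (2.0, 4.0, 6.0)


def query_complexity_index(query: str) -> float:
    """Weighted sum of linguistic cue counts plus a token-length term."""
    tokens = re.findall(r"[a-z']+", query.lower())
    score = WEIGHTS["length"] * len(tokens)
    for signal, cues in CUES.items():
        score += WEIGHTS[signal] * sum(tok in cues for tok in tokens)
    return score


def route_depth(query: str) -> int:
    """Return 0 for the single-step path, otherwise a tree depth in {1, 2, 3}."""
    qci = query_complexity_index(query)
    return sum(qci >= cutoff for cutoff in DEPTH_CUTOFFS)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keep_candidate(query_vec: Sequence[float], cand_vec: Sequence[float],
                   judge: Callable[[], bool],
                   low: float = 0.3, high: float = 0.7) -> bool:
    """Two-stage filter: similarity gate first, judge only for borderline cases."""
    sim = cosine(query_vec, cand_vec)
    if sim >= high:
        return True       # clearly relevant: keep without calling the judge
    if sim < low:
        return False      # clearly irrelevant: discard without calling the judge
    return judge()        # borderline: defer to the (stand-in) LLM judge
```

The point of the second function is that the judge is only invoked inside the borderline band, which is how a two-stage gate of this kind keeps the number of LLM calls down.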

Experimental Validation

Evaluation utilizes the NLU++ benchmark, consisting of 2,693 multi-intent queries spanning Banking and Hotel domains, annotated across 62 intent classes. Metrics include Subset Accuracy, Micro-F1, Macro-F1, latency (query-processing time), LLM invocation count, and token consumption.
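
To make the two headline metrics concrete, the sketch below computes Subset Accuracy (exact match of the full intent set) and Micro-F1 (pooled over individual intent labels) for multi-label predictions. The intent names in the example are hypothetical, and Macro-F1 is omitted for brevity.

```python
from typing import Sequence, Set


def subset_accuracy(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Fraction of queries whose predicted intent set equals the gold set exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """F1 computed from true/false positives and negatives pooled over all queries."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: the first query has two intents, only one is predicted.
gold = [{"balance", "cancel_card"}, {"book_room"}]
pred = [{"balance"}, {"book_room"}]
print(subset_accuracy(gold, pred))  # 0.5: the first query is not an exact match
print(micro_f1(gold, pred))         # 0.8: three gold labels, two recovered, no false positives
```

The example shows why the two metrics can diverge on multi-intent queries: a partially correct prediction earns no credit under Subset Accuracy but still contributes to Micro-F1.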

Comparative Results

Adaptive ToR achieves:

29.07% Subset Accuracy (9.7% relative improvement over fixed-depth ToR-RAG)
71.79% Micro-F1 (up by 8.63 percentage points compared to Standard RAG)
Average latency: 9.73 s (down 37.6% vs. ToR-RAG)
LLM calls: 6.01 per query (down 43.0% vs. ToR-RAG)
Token efficiency: 9.8% less consumption than uniform tree-based baselines
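
As a quick sanity check on the figures above, the baseline values they imply can be back-calculated. This is only arithmetic on the reported numbers, assuming each percentage is a relative change against the baseline; the results are not figures reported in the source and are approximate because the inputs are rounded.

```python
def implied_baseline(value: float, relative_change: float) -> float:
    """Invert value = baseline * (1 + relative_change) to recover the baseline."""
    return value / (1.0 + relative_change)


latency_baseline = implied_baseline(9.73, -0.376)   # seconds per query
calls_baseline = implied_baseline(6.01, -0.430)     # LLM calls per query
subset_baseline = implied_baseline(29.07, 0.097)    # Subset Accuracy, percent

print(round(latency_baseline, 2), round(calls_baseline, 2), round(subset_baseline, 2))
```

Under that assumption, the implied ToR-RAG figures come out at roughly 15.6 s and 10.5 LLM calls per query, and the implied fixed-depth Subset Accuracy at roughly 26.5%.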

Depth-wise analysis shows complexity-cost alignment: token usage and latency scale monotonically with assigned depth, empirically validating QTC's stratification mechanism. Notably, 26.92% of queries are resolved at $d = 0$ (single-step path), attaining the highest Subset Accuracy (37.9%, with 74.8% Micro-F1) and the lowest latency (2.45 s mean).

Mechanism Insights

Complexity-aware routing: QTC prevents systematic over-processing. Simple queries bypass expansion and reranking, which gives them the lowest latency and the highest Subset Accuracy of any depth.
Deep decomposition: At maximum depth ($d=3$), Subset Accuracy drops but Micro-F1 remains high, indicating that individual intents of genuinely hard queries are still recovered even when the full intent set is not matched exactly.
Resource scaling: Token consumption varies by a factor of $4.9\times$ across depths, underscoring proportional allocation.

Adaptive ToR consistently occupies the Pareto-optimal frontier (accuracy, latency, token cost), outperforming both baselines in joint-objective space.
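
For readers unfamiliar with the term, a system is on the Pareto frontier if no other system is at least as good on every objective and strictly better on one. The sketch below checks this for three objectives; the system names and numbers are toy values chosen for illustration, not results from the paper.

```python
from typing import Dict, List, Tuple

# (accuracy: higher is better, latency in s: lower is better, tokens: lower is better)
Point = Tuple[float, float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on all objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(systems: Dict[str, Point]) -> List[str]:
    """Names of systems that are not dominated by any other system."""
    return [
        name for name, point in systems.items()
        if not any(dominates(other, point)
                   for other_name, other in systems.items() if other_name != name)
    ]


# Toy values only: A beats B on every axis; C is less accurate but far cheaper.
toy = {"A": (0.70, 10.0, 5000.0), "B": (0.65, 15.0, 5500.0), "C": (0.60, 3.0, 1500.0)}
print(pareto_front(toy))  # ['A', 'C']
```

Note that several systems can sit on the frontier at once; dominance only rules out options that lose on every axis.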

Theoretical and Practical Implications

Adaptive ToR’s most salient contribution is operationalizing complexity-aware retrieval. Proactive depth selection, as opposed to post-hoc reranking, fundamentally reconfigures the retrieval topology, yielding measurable operational savings and improved accuracy for multi-intent NLU. The tri-objective evaluation (accuracy, latency, token cost) establishes a rigorous framework for trade-off analysis, informing future scalable conversational AI architectures.

By allocating computational resources proportionally to query complexity, Adaptive ToR is suitable for production environments where heterogeneous query structures coexist. Its depth-stratified validation directly supports design for scalable, user-responsive NLU. The integration of deduplicator-first reranking reduces token consumption and avoids context fragmentation, further enhancing real-world usability.
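
The token saving from deduplicating before rescoring can be illustrated with a small sketch. It assumes exact matching on whitespace- and case-normalized text, and uses a hypothetical `score` callable in place of the global LLM rescorer; the paper's deduplicator may well be semantic rather than string-based.

```python
from typing import Callable, List, Sequence


def dedupe_then_rerank(candidates: Sequence[str],
                       score: Callable[[str], float],
                       top_k: int = 5) -> List[str]:
    """Drop duplicate passages first, then score each survivor once and rank."""
    seen = set()
    unique: List[str] = []
    for text in candidates:
        key = " ".join(text.lower().split())  # normalize case and whitespace
        if key not in seen:
            seen.add(key)
            unique.append(text)
    # `score` stands in for the (expensive) global rescoring step.
    ranked = sorted(unique, key=score, reverse=True)
    return ranked[:top_k]


# Usage with a toy scorer that records how often it is called.
calls: List[str] = []


def toy_score(text: str) -> float:
    calls.append(text)
    return float(len(text))


result = dedupe_then_rerank(["Card lost", "card  lost", "Book a room"], toy_score)
print(result, len(calls))  # two unique passages, so the scorer runs twice, not three times
```

Running the deduplicator first means the rescorer never sees the same evidence twice, which is where the saving in calls and tokens comes from in a pipeline ordered this way.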

Future Directions

Architectural lightweighting: QTC for edge/on-device deployment.
Neural complexity estimation: Replacing rule-based QCI with end-to-end models for improved cross-domain generalization.
Multimodal/cross-lingual extension: Adaptive retrieval in diverse AI deployments.
Ontological integration: Extension with GraphRAG-based knowledge structures for semantically informed decomposition, especially in domain-specialized reasoning.

Together with ToR-RAG (fixed-depth) and ToR-Lite (LLM-free), Adaptive ToR rounds out a lineage of Pareto-optimal retrieval frameworks, enabling selection based on deployment constraints (accuracy, latency sensitivity, token budget).

Conclusion

Adaptive ToR advances tree-based retrieval architectures for multi-intent NLU by dynamically adapting retrieval depth and resource allocation to intrinsic query complexity. Empirical validation demonstrates robust improvements in accuracy, speed, and efficiency, exceeding established baselines and achieving multi-objective Pareto dominance. Its architecture and stratification mechanisms set a precedent for future adaptive retrieval in conversational AI, with opportunities for further specialization and deployment optimization across domains and modalities.
