LLM-guided Hierarchical Retrieval (2510.13217v1)

Published 15 Oct 2025 in cs.IR and cs.LG

Abstract: Modern IR systems are increasingly tasked with answering complex, multi-faceted queries that require deep reasoning rather than simple keyword or semantic matching. While LLM-based IR has shown great promise, the prevailing retrieve-then-rerank paradigm inherits the limitations of embedding-based retrieval; parametric generative approaches are difficult to update with new information; and long-context methods that place the entire corpus in context are computationally infeasible for large document collections. To address these challenges, we introduce LATTICE, a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity by imposing a semantic tree structure on the corpus. Our approach consists of two stages: (1) an offline phase that organizes the corpus into a semantic hierarchy via either a bottom-up agglomerative strategy or a top-down divisive strategy using multi-level summaries and (2) an online traversal phase where a search LLM navigates this tree. A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy, making cross-branch and cross-level comparisons difficult. To overcome this, we propose a traversal algorithm that estimates calibrated latent relevance scores from local LLM outputs and aggregates them into a global path relevance metric. Our training-free framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark, demonstrating up to 9% improvement in Recall@100 and 5% in nDCG@10 over the next best zero-shot baseline. Furthermore, compared to the fine-tuned SOTA method DIVER-v2, LATTICE attains comparable results on BRIGHT subsets that use a static corpus for evaluation.

Summary

LLM-guided Hierarchical Retrieval

This paper presents LATTICE, a hierarchical retrieval framework that leverages the reasoning capabilities of LLMs for information retrieval (IR). LATTICE addresses the limitations of the prevailing retrieve-then-rerank paradigm by imposing a semantic hierarchy on the corpus and letting an LLM navigate it to answer search queries efficiently.

Methodology

Framework Structure

LATTICE organizes documents into a semantic tree, reducing search complexity to logarithmic in corpus size. An offline phase builds this tree using either a bottom-up agglomerative or a top-down divisive strategy: the bottom-up approach incrementally clusters documents into higher-level concepts, while the top-down approach recursively divides the corpus into semantically coherent groups, attaching multi-level summaries to internal nodes (Figure 1).

Figure 1: A high-level overview of the proposed LATTICE framework, showcasing the offline and online stages.
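The paper does not tie its offline stage to a particular clustering implementation. As an illustration only, a top-down divisive build might look like the sketch below, where `embed`-derived document embeddings are assumed precomputed and `summarize` is a hypothetical stand-in for a summarization LLM (the paper builds multi-level summaries; here a single call per node stands in for that process):

```python
# Minimal sketch of top-down divisive tree construction (not the authors'
# exact procedure). `summarize` is a hypothetical stand-in for an LLM that
# writes the node summary shown to the search LLM; branching/leaf sizes
# are illustrative, not the paper's settings.
from dataclasses import dataclass, field

import numpy as np
from sklearn.cluster import KMeans

@dataclass
class Node:
    summary: str                              # LLM-written node summary
    doc_ids: list[int]                        # documents covered by this subtree
    children: list["Node"] = field(default_factory=list)

def build_tree(doc_ids, embeddings, summarize, branching=8, leaf_size=16):
    """Recursively split documents into `branching` clusters until leaves
    are small enough for the search LLM to score documents directly."""
    node = Node(summary=summarize(doc_ids), doc_ids=doc_ids)
    if len(doc_ids) <= leaf_size:
        return node                           # leaf node: stop splitting
    labels = KMeans(n_clusters=branching, n_init="auto").fit_predict(
        embeddings[doc_ids]
    )
    for c in range(branching):
        child_ids = [d for d, l in zip(doc_ids, labels) if l == c]
        if child_ids:
            node.children.append(
                build_tree(child_ids, embeddings, summarize, branching, leaf_size)
            )
    return node
```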

Online Search Mechanism

In the online stage, a search LLM traverses the semantic tree using a best-first strategy. The algorithm scores frontier nodes with a path relevance metric that aggregates the LLM's locally estimated relevance scores along the path from the root, keeping the search globally coherent.

  • Score Calibration: LLM relevance judgments are noisy and context-dependent, so the traversal algorithm calibrates local scores into latent relevance estimates that are comparable across branches and levels.
  • Path Relevance: Each candidate node is prioritized by a path relevance score that aggregates the calibrated local scores along its root-to-node path; a minimal sketch of this traversal follows the list.
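The paper's exact calibration and aggregation formulas are not reproduced here; the sketch below only illustrates the general shape of best-first traversal with path-aggregated scores. `llm_score` is a hypothetical function returning the search LLM's (noisy) relevance estimates for a node's children, and a simple average along the path stands in for the paper's calibrated path relevance metric:

```python
import heapq

def best_first_search(root, query, llm_score, budget=100, top_k=10):
    """Best-first traversal: repeatedly expand the frontier node with the
    highest path relevance until the LLM-call budget is exhausted."""
    # Frontier entries: (negated path score, tie-breaker, node, local scores on path)
    frontier = [(-1.0, 0, root, [])]
    results, counter = [], 1
    while frontier and budget > 0:
        neg_score, _, node, path = heapq.heappop(frontier)
        if not node.children:                 # leaf: collect its documents
            results.extend((d, -neg_score) for d in node.doc_ids)
            continue
        budget -= 1                           # one LLM call scores all children
        local = llm_score(query, [c.summary for c in node.children])
        for child, s in zip(node.children, local):
            child_path = path + [s]
            # Stand-in for the paper's calibrated path relevance:
            # average the local scores along the root-to-node path.
            path_rel = sum(child_path) / len(child_path)
            heapq.heappush(frontier, (-path_rel, counter, child, child_path))
            counter += 1
    results.sort(key=lambda x: -x[1])
    return results[:top_k]
```

Aggregating along the whole path, rather than trusting each local judgment in isolation, is what lets the search compare candidates across different branches and depths of the tree.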

Experimental Results

LATTICE was evaluated zero-shot on the BRIGHT benchmark, a reasoning-intensive IR challenge, where it achieved up to 9% better Recall@100 and 5% better nDCG@10 than the next best zero-shot baseline (Figure 2). It also attains results comparable to the fine-tuned SOTA method DIVER-v2 on BRIGHT subsets that use a static corpus for evaluation.

Figure 2: Recall@100 performance showcasing LATTICE's superior retrieval over BM25 and ReasonIR-8B models.

Implementation Insights

Practical Considerations

  • Computational Cost: Building the semantic tree is computationally intensive, but it is a one-time offline cost that is amortized over all subsequent queries, which are fast at search time.
  • Scalability: Because traversal visits only a logarithmic number of nodes relative to corpus size, the approach scales to large document collections; a back-of-the-envelope estimate follows this list.
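To make the logarithmic claim concrete, here is a rough cost estimate under assumed parameters (the branching factor and beam width below are illustrative, not the paper's configuration):

```python
import math

# Rough per-query LLM-call estimate for tree traversal (illustrative numbers,
# not the paper's settings). With branching factor b, a corpus of N documents
# yields a tree of depth ~log_b(N); a beam expanding k nodes per level makes
# roughly k * depth LLM calls.
N, b, k = 1_000_000, 10, 5          # corpus size, branching factor, beam width
depth = math.ceil(math.log(N, b))   # ~6 levels for one million documents
llm_calls = k * depth               # ~30 calls, vs. ~N calls to score every doc
print(depth, llm_calls)
```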

Limitations and Future Work

A key limitation is LATTICE's reliance on a static index, which is suboptimal for dynamic corpora where documents are added or removed over time. Future work could explore incremental tree updates or adaptive path recalibration to address this.

Conclusion

LATTICE combines the reasoning power of LLMs with an efficient hierarchical search structure. By organizing documents into a semantic tree, it improves both retrieval accuracy and computational efficiency on complex, reasoning-intensive tasks, pointing toward LLM-native retrieval as a promising alternative to the traditional retrieve-then-rerank paradigm.
