Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (2101.00436v3)

Published 2 Jan 2021 in cs.CL and cs.IR

Abstract: Multi-hop reasoning (i.e., reasoning across two or more documents) is a key ingredient for NLP models that leverage large corpora to exhibit broad knowledge. To retrieve evidence passages, multi-hop models must contend with a fast-growing search space across the hops, represent complex queries that combine multiple information needs, and resolve ambiguity about the best order in which to hop between training passages. We tackle these problems via Baleen, a system that improves the accuracy of multi-hop retrieval while learning robustly from weak training signals in the many-hop setting. To tame the search space, we propose condensed retrieval, a pipeline that summarizes the retrieved passages after each hop into a single compact context. To model complex queries, we introduce a focused late interaction retriever that allows different parts of the same query representation to match disparate relevant passages. Lastly, to infer the hopping dependencies among unordered training passages, we devise latent hop ordering, a weak-supervision strategy in which the trained retriever itself selects the sequence of hops. We evaluate Baleen on retrieval for two-hop question answering and many-hop claim verification, establishing state-of-the-art performance.

Authors (3)
  1. Omar Khattab (34 papers)
  2. Christopher Potts (113 papers)
  3. Matei Zaharia (101 papers)
Citations (51)

Summary

Overview of "Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval"

The paper "Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval" by Khattab et al. introduces Baleen, a system designed to enhance multi-hop reasoning in natural language processing tasks. These tasks require retrieving and reasoning over information spread across multiple documents from large corpora. Baleen addresses key challenges such as managing the expanding search space, modeling complex queries, and determining optimal hop sequences.

Key Contributions

  1. Condensed Retrieval: The paper presents a novel approach to condense retrieved information at each hop into a compact context, mitigating the search space expansion typical in multi-hop retrieval tasks. By summarizing retrieved passages, this strategy allows for efficient scaling across many hops.
  2. Focused Late Interaction: Baleen utilizes a focused late interaction retriever that allows for disparate sections of a query to match with different relevant passages, addressing the complexity inherent in multi-hop queries.
  3. Latent Hop Ordering: The authors propose a weak-supervision method for hop ordering where the retriever determines the best sequence of hops, harnessing weak training signals.
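The condensed retrieval loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the word-overlap retriever, the `keep_new_facts` condenser, and the three-sentence corpus are all hypothetical stand-ins for the trained retriever and condenser the paper uses. What it shows is the control flow: each hop's query is the question plus a single compact context, and that context is updated after every hop.

```python
import re
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def make_overlap_retriever(corpus: List[str]) -> Callable[[str, int], List[str]]:
    """Toy retriever (assumption): ranks passages by word overlap with the query."""
    def retrieve(query: str, k: int) -> List[str]:
        q = tokens(query)
        scored = [(len(q & tokens(p)), p) for p in corpus]
        scored = [sp for sp in scored if sp[0] > 0]  # drop passages with no overlap
        scored.sort(key=lambda sp: sp[0], reverse=True)
        return [p for _, p in scored[:k]]
    return retrieve


def keep_new_facts(question: str, context: List[str], passages: List[str]) -> List[str]:
    """Toy condenser (assumption): appends passages not already in the context.
    A real condenser would select only the relevant sentences with a trained model."""
    return context + [p for p in passages if p not in context]


def condensed_retrieval(question, retrieve, condense, num_hops=2, k=2) -> List[str]:
    """Multi-hop loop that carries one compact context from hop to hop."""
    context: List[str] = []
    for _ in range(num_hops):
        # The query for this hop is the question plus the condensed context.
        query = " ".join([question] + context)
        passages = retrieve(query, k)
        # Fold the newly retrieved passages into the single running context.
        context = condense(question, context, passages)
    return context


corpus = [
    "The Red Bridge is located in Elmtown.",
    "Elmtown dates back to 1850.",
    "Blue Lake attracts many anglers.",
]
question = "How old is the town with the Red Bridge?"
evidence = condensed_retrieval(question, make_overlap_retriever(corpus), keep_new_facts)
```

In this toy run the second passage shares no words with the question, so it only becomes reachable on the second hop, once "Elmtown" has entered the condensed context.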

Experimental Results

Baleen achieves state-of-the-art retrieval performance on the HotPotQA and HoVer datasets. On HotPotQA, it reaches 96.3% answer recall within the top 20 retrieved passages. On the many-hop HoVer task, it outperforms strong baselines, achieving over 90% top-100 retrieval accuracy and improving evidence-extraction F1 scores.

Implications and Future Directions

The condensed retrieval approach balances retrieval accuracy against computational cost, offering a scalable solution for tasks that require multiple retrieval hops. By condensing the retrieved passages, Baleen shortens the input passed to downstream models, which makes it easier for them to reason over the retrieved evidence. This could prove beneficial when extending to tasks that require an even greater number of hops or that span larger document collections.

The focused late interaction mechanism enhances the retrieval model's ability to handle complex queries by allowing selective matching, so that a passage can score highly by matching only part of the query. This may benefit systems that must handle diverse and intricate queries. Future work could explore further enhancements to the focused late interaction paradigm or integrate more sophisticated query representations.
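To make selective matching concrete, the sketch below contrasts a standard late-interaction score (sum over query vectors of the best match among passage vectors) with a focused variant that keeps only the strongest partial matches. It is a simplified illustration under stated assumptions: the `top_n` parameter, the helper names, and the two-dimensional toy vectors are hypothetical, and the paper's exact scoring may differ in detail.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def partial_scores(query_vecs: List[Vector], passage_vecs: List[Vector]) -> List[float]:
    """For each query vector, the best similarity against any passage vector."""
    return [max(dot(q, d) for d in passage_vecs) for q in query_vecs]


def late_interaction_score(query_vecs, passage_vecs) -> float:
    """Standard late interaction: sum every query vector's best match."""
    return sum(partial_scores(query_vecs, passage_vecs))


def focused_late_interaction_score(query_vecs, passage_vecs, top_n: int) -> float:
    """Focused variant (assumed form): sum only the top_n strongest partial scores,
    so a passage is not penalized for ignoring the rest of the query."""
    strongest = sorted(partial_scores(query_vecs, passage_vecs), reverse=True)
    return sum(strongest[:top_n])


# A query with two distinct information needs, one per axis.
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0]]    # fully answers the first need, ignores the second
passage_b = [[0.75, 0.75]]  # partially matches both needs

full_a = late_interaction_score(query, passage_a)
full_b = late_interaction_score(query, passage_b)
focused_a = focused_late_interaction_score(query, passage_a, top_n=1)
focused_b = focused_late_interaction_score(query, passage_b, top_n=1)
```

With the full sum, the passage that weakly matches both needs outranks the one that fully answers a single need; with the focused score, the ranking flips. That is the behavior a multi-hop retriever wants when each relevant passage covers only one part of the query.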

Furthermore, latent hop ordering via weak supervision opens avenues for applying similar strategies in other domains where labeled data is scarce or costly. Additional research on refining weak-supervision methods could yield more robust retrieval sequences across a broader range of applications.
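The idea of letting the retriever choose the hop sequence can be illustrated with a greedy loop over an unordered set of gold passages. This is a rough sketch only: it assumes one passage per hop and uses a hypothetical word-overlap `overlap_score` in place of the trained retriever, whereas the paper's actual training procedure is more involved.

```python
import re
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: str) -> float:
    """Stand-in scorer (assumption): number of shared words."""
    return float(len(tokens(query) & tokens(passage)))


def infer_hop_order(
    question: str,
    gold_passages: List[str],
    score: Callable[[str, str], float],
) -> List[str]:
    """Greedily order unordered gold passages: at each hop, pick the remaining
    passage the scorer ranks highest given the question and earlier hops."""
    remaining = list(gold_passages)
    ordered: List[str] = []
    while remaining:
        query = " ".join([question] + ordered)
        best = max(remaining, key=lambda p: score(query, p))
        ordered.append(best)
        remaining.remove(best)
    return ordered


unordered_gold = [
    "Elmtown dates back to 1850.",
    "The Red Bridge is located in Elmtown.",
]
hop_order = infer_hop_order(
    "How old is the town with the Red Bridge?", unordered_gold, overlap_score
)
```

The inferred order can then serve as a weak training signal: the passage chosen at each hop is treated as that hop's positive example, without anyone having annotated the order by hand.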

Conclusion

Baleen makes substantial advances in tackling the challenges of multi-hop reasoning by innovating on both retrieval architecture and supervision strategy. Its ability to handle large search spaces, represent complex queries, and infer hop sequences underscores its potential as a versatile tool for NLP tasks that require sophisticated multi-document reasoning.
