Resources and Evaluations for Multi-Distribution Dense Information Retrieval (2306.12601v1)
Abstract: We introduce and define the novel problem of multi-distribution information retrieval (IR), where, given a query, systems need to retrieve passages from within multiple collections, each drawn from a different distribution. Some of these collections and distributions might not be available at training time. To evaluate methods for multi-distribution retrieval, we design three benchmarks for this task from existing single-distribution datasets: one based on question answering and two based on entity matching. We propose simple methods for this task that allocate the fixed retrieval budget (top-k passages) strategically across domains to prevent the known domains from consuming most of the budget. We show that our methods improve Recall@100 by an average of 3.8+ points and up to 8.0 points across the datasets, and that the improvements are consistent when fine-tuning different base retrieval models. Our benchmarks are made publicly available.
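A minimal sketch of the budget-allocation idea described in the abstract follows. The paper only characterizes its methods as simple allocation strategies, so the equal per-collection quota, the score-based fill of the leftover budget, and the dot-product scoring below are illustrative assumptions standing in for whatever dense retriever and allocation rule are actually used.

```python
# Illustrative sketch (not the authors' exact method): spread a fixed top-k
# retrieval budget across multiple collections instead of ranking all passages
# in one global pool, so a single well-known collection cannot consume it all.
import numpy as np


def retrieve_multi_distribution(query_emb, collections, k=100):
    """query_emb: (d,) array; collections: dict name -> (n_i, d) passage embedding matrix.
    Returns up to k tuples of (collection_name, passage_index, score)."""
    # Score every passage in every collection with a dot product (DPR-style scoring).
    scored = {name: passages @ query_emb for name, passages in collections.items()}

    # Reserve an equal share of the budget for each collection (assumed strategy).
    quota = k // len(collections)
    results, leftovers = [], []
    for name, scores in scored.items():
        order = np.argsort(-scores)
        for rank, idx in enumerate(order):
            item = (name, int(idx), float(scores[idx]))
            (results if rank < quota else leftovers).append(item)

    # Fill any remaining budget by global score across collections.
    leftovers.sort(key=lambda item: -item[2])
    results.extend(leftovers[: k - len(results)])
    return sorted(results, key=lambda item: -item[2])


# Usage with random vectors standing in for encoded passages and queries.
rng = np.random.default_rng(0)
collections = {
    "wikipedia": rng.normal(size=(500, 768)),
    "private_docs": rng.normal(size=(50, 768)),
}
query = rng.normal(size=768)
top100 = retrieve_multi_distribution(query, collections, k=100)
```

The key design point the sketch illustrates is that a pure global top-k over both collections would let the larger, better-matched collection dominate the budget, whereas a per-collection quota guarantees the smaller or unseen-at-training-time collection some representation in the retrieved set.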