Unsupervised Domain Adaptation for Neural Information Retrieval (2310.09350v1)
Abstract: Neural information retrieval requires costly annotated data for each target domain to be competitive. Synthetic annotation by query generation using LLMs or rule-based string manipulation has been proposed as an alternative, but the relative merits of the two approaches have not been analysed. In this paper, we compare both methods head-to-head using the same neural IR architecture. We focus on the BEIR benchmark, which includes test datasets from several domains with no training data, and explore two scenarios: zero-shot, where the supervised system is trained on a large out-of-domain dataset (MS MARCO); and unsupervised domain adaptation, where, in addition to MS MARCO, the system is fine-tuned on synthetic data from the target domain. Our results indicate that LLMs outperform rule-based methods in all scenarios by a large margin and, more importantly, that unsupervised domain adaptation is effective compared to applying a supervised IR system in a zero-shot fashion. In addition, we explore several sizes of open LLMs for generating synthetic data and find that a medium-sized model suffices. Code and models are publicly available for reproducibility.
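To make the comparison concrete, the sketch below illustrates both synthetic-annotation strategies on a target-domain passage: few-shot query generation with an open LLM (in the spirit of InPars and Promptagator) and a rule-based random-cropping baseline. The model name, prompt wording, and decoding settings are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the two synthetic-annotation strategies: LLM-based
# query generation (few-shot prompting) and a rule-based cropping baseline.
# Model, prompt, and decoding settings are assumptions for illustration.
import random

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/gpt-neo-2.7B"  # stands in for a "medium-sized" open LLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Few-shot prompt: passage -> query demonstrations, then the target passage.
PROMPT = """Example 1:
Document: The Eiffel Tower was completed in 1889 for the World's Fair.
Query: when was the eiffel tower built

Example 2:
Document: Aspirin inhibits the enzyme cyclooxygenase, which reduces inflammation.
Query: how does aspirin reduce inflammation

Example 3:
Document: {passage}
Query:"""


def llm_query(passage: str) -> str:
    """LLM-based: prompt the model to write a plausible query for the passage."""
    inputs = tokenizer(PROMPT.format(passage=passage), return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=True,  # sampling yields more diverse queries
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(generated, skip_special_tokens=True)
    return text.split("\n")[0].strip()  # keep only the first generated line


def cropped_query(passage: str, rng: random.Random,
                  min_words: int = 4, max_words: int = 12) -> str:
    """Rule-based: a random contiguous word span serves as a pseudo-query."""
    words = passage.split()
    n = rng.randint(min_words, max(min_words, min(max_words, len(words))))
    start = rng.randint(0, max(0, len(words) - n))
    return " ".join(words[start:start + n])
```

In either case, each resulting (query, passage) pair is treated as a positive example for fine-tuning the retriever on the target-domain corpus, on top of its MS MARCO training.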
- GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-TensorFlow.
- GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models, pages 95–136.
- InPars: Data augmentation for information retrieval using large language models.
- Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.
- Evaluation of BERT and ALBERT sentence embedding performance on downstream NLP tasks. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 5482–5487. IEEE.
- Promptagator: Few-shot dense retrieval from 8 examples. In The Eleventh International Conference on Learning Representations.
- BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- SPLADE: Sparse lexical and expansion model for first stage ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2288–2292.
- The vocabulary problem in human-system communication. Communications of the ACM, 30(11):964–971.
- PCRS: Personalized course recommender system based on hybrid approach. Procedia Computer Science, 125:518–524.
- CQADupStack: A benchmark data set for community question-answering research. In Proceedings of the 20th Australasian Document Computing Symposium, ADCS '15, New York, NY, USA. Association for Computing Machinery.
- Unsupervised dense information retrieval with contrastive learning. Transactions on Machine Learning Research.
- Gautier Izacard and Édouard Grave. 2021. Leveraging passage retrieval with generative models for open domain question answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 874–880.
- Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781.
- Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 39–48.
- Mei Kobayashi and Koichi Takeda. 2000. Information retrieval on the web. ACM Computing Surveys (CSUR), 32(2):144–173.
- Latent retrieval for weakly supervised open domain question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6086–6096.
- MS MARCO: A human generated machine reading comprehension dataset. CoRR, abs/1611.09268.
- Document ranking with a pretrained sequence-to-sequence model. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 708–718.
- Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
- Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992.
- Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3:333–389.
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs/1910.01108.
- BLOOM: A 176B-parameter open-access multilingual language model.
- BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).
- CONQRR: Conversational query rewriting for retrieval with reinforcement learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10000–10014, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- LaPraDoR: Unsupervised pretrained dense retriever for zero-shot text retrieval. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3557–3569.
- DocChat: An information retrieval approach for chatbot engines using unstructured documents. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 516–525.
- OPT: Open pre-trained transformer language models.