- The paper’s main contribution is a hybrid retrieval model that fuses deep semantic and lexical signals to mitigate performance drops on out-of-domain tasks.
- It employs Reciprocal Rank Fusion (RRF) to integrate the strengths of both model types, yielding a 20.4% relative recall boost over deep retrieval alone.
- The study demonstrates that combining pretrained BERT-based models with BM25-style lexical approaches enhances robustness across diverse datasets.
Out-of-Domain Semantics to the Rescue: Zero-Shot Hybrid Retrieval Models
This paper addresses the challenge of generalizing pretrained deep retrieval models, such as those built on BERT, to out-of-domain tasks in a zero-shot setting. The focal point of the investigation is the contrast between deep retrieval models, which excel on in-domain tasks but falter under significant domain shift, and lexical models, which remain consistently robust across domains.
The authors select a diverse set of five datasets, which include both in-domain and out-of-domain examples, to explore the efficacy of deep retrieval models in a zero-shot context. The core finding is that while deep retrieval models perform notably well when applied to tasks similar to their training domain, their performance deteriorates significantly with increasing domain divergence. In stark contrast, traditional lexical retrieval models, exemplified by BM25, maintain a more consistent performance across these diverse datasets.
To overcome these limitations, the paper proposes a hybrid retrieval approach integrating both lexical and deep retrieval models. This hybrid model leverages the complementary strengths of both model types to enhance overall retrieval performance. The authors employ Reciprocal Rank Fusion (RRF), a non-parametric fusion strategy, to effectively combine the retrieval results of lexical and deep models. This allows for a simple yet flexible integration that can be applied across different datasets and domains without requiring extensive fine-tuning.
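The fusion step described above can be illustrated with a short sketch. This is a minimal, generic implementation of Reciprocal Rank Fusion, not the paper's exact code; the document IDs and the constant `k = 60` (a value commonly used with RRF) are illustrative assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs via RRF.

    Each document's fused score is the sum over lists of
    1 / (k + rank), where rank is 1-based. The constant k
    damps the dominance of any single retriever's top ranks,
    and no score calibration or fine-tuning is needed.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from a lexical (BM25) and a deep retriever.
bm25_ranking = ["d3", "d1", "d7", "d2"]
dense_ranking = ["d1", "d5", "d3", "d9"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Because RRF operates only on ranks, not raw scores, the lexical and deep retrievers' incomparable scoring scales never need to be reconciled, which is what makes the method non-parametric and portable across domains.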
The experimental results provide compelling evidence for the hybrid model's superior performance. Specifically, on out-of-domain datasets the hybrid model achieves a relative recall improvement of 20.4% over deep retrieval alone and 9.54% over lexical retrieval. Such gains underscore the hybrid model's ability to bridge vocabulary gaps more effectively by capturing both semantic relevance and exact matches.
The paper's contributions are threefold. First, it systematically evaluates the zero-shot generalization capabilities of deep retrieval models across multiple domains. Second, it proposes, implements, and evaluates a hybrid retrieval approach that capitalizes on the complementary strengths of lexical and deep retrieval techniques. Finally, these experiments open pathways for further research into adaptive, robust retrieval systems that can operate across heterogeneous data landscapes without sacrificing performance.
Looking forward, the implications of this research extend toward the continued development of AI systems that require minimal domain-specific tuning. The hybrid framework proposed could serve as a foundation for future retrieval systems designed to seamlessly transition between multiple domains and tasks, supporting a wide array of applications from web search to domain-specific information retrieval tasks. Additionally, the question of how best to leverage domain adaptation techniques in conjunction with such hybrid models remains a promising area for continued exploration. The findings here suggest that incorporating a diverse range of retrieval strategies could significantly enhance the adaptability and efficacy of AI systems in varied real-world settings.