
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze! (2312.02724v1)

Published 5 Dec 2023 in cs.IR

Abstract: In information retrieval, proprietary LLMs such as GPT-4 and open-source counterparts such as LLaMA and Vicuna have played a vital role in reranking. However, the gap between open-source and closed models persists, with reliance on proprietary, non-transparent models constraining reproducibility. Addressing this gap, we introduce RankZephyr, a state-of-the-art, open-source LLM for listwise zero-shot reranking. RankZephyr not only bridges the effectiveness gap with GPT-4 but in some cases surpasses the proprietary model. Our comprehensive evaluations across several datasets (TREC Deep Learning Tracks; NEWS and COVID from BEIR) showcase this ability. RankZephyr benefits from strategic training choices and is resilient against variations in initial document ordering and the number of documents reranked. Additionally, our model outperforms GPT-4 on the NovelEval test set, comprising queries and passages past its training period, which addresses concerns about data contamination. To foster further research in this rapidly evolving field, we provide all code necessary to reproduce our results at https://github.com/castorini/rank_LLM.

Introduction

The paper introduces RankZephyr, an open-source LLM specialized in zero-shot listwise reranking for information retrieval systems. Reranking is a critical step in refining the result set returned by a search system, and the use of LLMs in reranking tasks has gained prominence. The model aims to bridge the existing gap between open-source and proprietary LLMs, offering an efficient and transparent alternative for the academic community.

RankZephyr Model

RankZephyr's implementation builds upon recent advances in listwise reranking with LLMs. It differs from proprietary models in its focus on transparency and reproducibility, which are crucial for scientific progress. Despite having significantly fewer parameters than some proprietary counterparts, RankZephyr's evaluations indicate it can sometimes outperform these closed-source models in reranking tasks.
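In listwise reranking, the LLM sees the query and all candidate passages in a single prompt and emits an ordering over passage identifiers. The sketch below illustrates this idea with a hypothetical prompt format and a tolerant parser for the model's output; it is not the exact RankZephyr template.

```python
import re

def build_listwise_prompt(query, passages):
    """Build a listwise reranking prompt asking the model to order
    candidate passages by relevance to the query.
    (Illustrative format; the actual RankZephyr prompt differs.)"""
    lines = [f"Query: {query}",
             "Rank the following passages by relevance to the query."]
    for i, passage in enumerate(passages, 1):
        lines.append(f"[{i}] {passage}")
    lines.append("Answer with the ranking only, e.g. [2] > [1] > [3].")
    return "\n".join(lines)

def parse_ranking(response, num_passages):
    """Parse identifiers like '[2] > [1] > [3]' into a permutation of
    1..num_passages, dropping duplicates and appending any identifiers
    the model omitted in their original order."""
    seen = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match)
        if 1 <= idx <= num_passages and idx not in seen:
            seen.append(idx)
    seen += [i for i in range(1, num_passages + 1) if i not in seen]
    return seen
```

Making the parser robust to malformed output (duplicates, missing identifiers) matters in practice, since smaller open-source models do not always emit a clean permutation.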

Evaluation and Insights

Extensive experiments were conducted using datasets from various sources, including TREC Deep Learning Tracks and the BEIR benchmark. Key insights from the evaluations revealed:

  • The robustness of RankZephyr against varying initial document orderings, showing consistent performance even with shuffled candidate lists.
  • The model's efficacy when using different first-stage retrieval models and the positive impact of using higher-quality candidate lists.
  • The importance of strategic training choices, like the use of shuffled input orderings and variable window sizes during distillation from RankGPT, which enhanced RankZephyr's reranking capabilities.
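Because the context window limits how many candidates fit in one prompt, listwise rerankers in this line of work process long candidate lists with a back-to-front sliding window. The following is a minimal sketch of that strategy; the window size and stride values are illustrative, and `rerank_window` stands in for the LLM call that orders one window of candidates.

```python
def sliding_window_rerank(candidates, rerank_window, window_size=20, stride=10):
    """Rerank a long candidate list with a back-to-front sliding window.

    `rerank_window` is any callable that takes a list of candidates and
    returns them in ranked order (here it stands in for an LLM call).
    Starting from the tail lets strong tail candidates bubble toward the
    front as the window slides backward.
    """
    ranked = list(candidates)
    start = max(len(ranked) - window_size, 0)
    while True:
        end = start + window_size
        ranked[start:end] = rerank_window(ranked[start:end])
        if start == 0:
            break
        start = max(start - stride, 0)  # overlap windows by (window_size - stride)
    return ranked
```

Training with shuffled input orderings and variable window sizes, as the paper describes, teaches the model to handle the arbitrary windows this procedure produces.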

Future Directions

The provided codebase invites further research into refining reranking models and developing more effective retrieval systems. RankZephyr's successful results demonstrate a significant step toward open-source, high-quality information retrieval using LLMs, challenging the dominance of proprietary models in the field.

The paper concludes that RankZephyr is not only competitive with state-of-the-art proprietary rerankers but, importantly, supports reproducible research. The model's resilience to varying input orderings suggests high generalizability and effectiveness in the real-world scenarios that an ever-changing web demands.

Authors (3)
  1. Ronak Pradeep (26 papers)
  2. Sahel Sharifymoghaddam (6 papers)
  3. Jimmy Lin (208 papers)
Citations (45)