Understanding why some IR prompts outperform others

Determine the factors and mechanisms that cause certain natural-language prompts to yield better retrieval performance than others when used to condition instruction-trained dense bi-encoder retrievers such as Promptriever, and characterize how prompt phrasing, length, and style influence effectiveness and robustness across datasets.

Background

The paper introduces Promptriever, a dense bi-encoder retriever trained with instance-level instructions and instruction negatives to enable promptable, per-query control of relevance. On BEIR, certain prompts reliably improve performance while others do not, and the model shows lower variance across prompt phrasings than baseline retrievers.

Despite these empirical findings, the authors note that it remains unclear why some prompts work better than others, mirroring challenges long observed in LLM prompting. This motivates a focused inquiry into the determinants of prompt effectiveness for retrieval models.
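One way to make the question concrete is to measure how a bi-encoder's rankings shift as the instruction text varies. The sketch below is not the paper's evaluation code: the model is a generic SentenceTransformer stand-in rather than the actual Promptriever checkpoint, and the documents and prompt variants are hypothetical. It simply appends different instructions to the same query and compares the resulting document rankings, the kind of sensitivity probe the open question calls for at scale.

```python
# Minimal sketch, assuming a SentenceTransformer-compatible bi-encoder.
# The model below is a stand-in, NOT Promptriever; substitute the actual
# instruction-trained checkpoint and its loading procedure.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "Aspirin is commonly used to reduce fever and relieve mild pain.",
    "A randomized trial evaluated low-dose aspirin for cardiovascular prevention.",
    "Aspirin was first synthesized at Bayer in 1897.",
]
doc_emb = model.encode(docs, convert_to_tensor=True, normalize_embeddings=True)

query = "aspirin health effects"
# Hypothetical per-query instructions appended to the query text.
prompts = [
    "",
    " Relevant documents must report results from clinical trials.",
    " Relevant documents must focus on the history of the drug.",
]

for p in prompts:
    q_emb = model.encode(query + p, convert_to_tensor=True, normalize_embeddings=True)
    scores = util.cos_sim(q_emb, doc_emb)[0]          # cosine similarity to each document
    ranking = scores.argsort(descending=True).tolist()  # document indices, best first
    print(f"prompt={p!r}  ranking={ranking}")
```

Repeating such comparisons over many paraphrases, lengths, and styles of the same instruction, and correlating the ranking changes with prompt properties, is one route toward characterizing why some prompts help and others do not.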

References

We also note that, similar to LLMs, it is often unclear why some IR prompts perform better than others.

Weller et al., "Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models" (arXiv:2409.11136, 17 Sep 2024), Section: Limitations.