
Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey (2208.03197v1)

Published 5 Aug 2022 in cs.CL

Abstract: Dense retrieval (DR) approaches based on powerful pre-trained language models (PLMs) have achieved significant advances and have become a key component of modern open-domain question-answering systems. However, they require large amounts of manual annotation to perform competitively, which is infeasible to scale. To address this, a growing body of research has recently focused on improving DR performance under low-resource scenarios. These works differ in what resources they require for training and employ a diverse set of techniques. Understanding such differences is crucial for choosing the right technique under a specific low-resource scenario. To facilitate this understanding, we provide a thorough structured overview of mainstream techniques for low-resource DR. Based on their required resources, we divide the techniques into three main categories: (1) only documents are needed; (2) documents and questions are needed; and (3) documents and question-answer pairs are needed. For every technique, we introduce its general-form algorithm and highlight its open issues, pros, and cons. Promising directions are outlined for future research.
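The dense retrieval setup the abstract refers to can be illustrated with a minimal dual-encoder sketch: questions and documents are embedded into the same vector space and documents are ranked by inner-product similarity. The toy hashed bag-of-words `embed` function below is purely illustrative (a stand-in for a PLM encoder, so the example stays self-contained); the function names and corpus are hypothetical, not from the paper.

```python
import hashlib

import numpy as np


def embed(text, dim=64):
    # Toy deterministic "encoder": a hashed, L2-normalized bag of words.
    # In a real DR system this would be a PLM-based dual encoder.
    vec = np.zeros(dim)
    for token in text.lower().split():
        token = token.strip(".,?!")
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def retrieve(question, documents, top_k=2):
    # Score each document by the inner product of its embedding with the
    # question embedding, then return the top-k (score, document) pairs.
    q = embed(question)
    scored = [(float(q @ embed(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


docs = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "France borders Spain and Germany.",
]
print(retrieve("What is the capital of France?", docs))
```

Given its heavy token overlap with the question, the Paris sentence should rank first here. The annotation bottleneck the survey targets arises because training such encoders competitively normally requires many labeled question-document pairs.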

Authors (6)
  1. Xiaoyu Shen (73 papers)
  2. Svitlana Vakulenko (31 papers)
  3. Marco del Tredici (13 papers)
  4. Gianni Barlacchi (10 papers)
  5. Bill Byrne (57 papers)
  6. Adrià de Gispert (16 papers)
Citations (18)
