
PROD: Progressive Distillation for Dense Retrieval (2209.13335v3)

Published 27 Sep 2022 in cs.IR and cs.CL

Abstract: Knowledge distillation is an effective way to transfer knowledge from a strong teacher to an efficient student model. Ideally, the better the teacher, the better the student. In practice, however, this expectation does not always hold: a stronger teacher often yields a worse student after distillation because of the non-negligible capacity gap between teacher and student. To bridge this gap, we propose PROD, a PROgressive Distillation method for dense retrieval. PROD combines teacher progressive distillation and data progressive distillation to gradually improve the student. We conduct extensive experiments on five widely used benchmarks, MS MARCO Passage, TREC Passage 19, TREC Document 19, MS MARCO Document, and Natural Questions, where PROD achieves state-of-the-art results among distillation methods for dense retrieval. The code and models will be released.
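The abstract describes the method only at a high level. As a rough illustration of the underlying idea, the sketch below shows a generic knowledge-distillation loss over candidate-passage scores, applied across a sequence of progressively stronger teachers so the capacity gap at each stage stays small. The loss form (temperature-scaled KL divergence), the teacher ordering, and the random stand-in scores are assumptions for illustration, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def distill_step(student_scores, teacher_scores, temperature=1.0):
    """One distillation step: match the student's distribution over candidate
    passages to the teacher's via KL divergence (illustrative loss; the
    paper's exact objective may differ)."""
    t = temperature
    student_logp = F.log_softmax(student_scores / t, dim=-1)
    teacher_p = F.softmax(teacher_scores / t, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)

# Progressively stronger teachers; this ordering is hypothetical.
teachers = ["12-layer dual encoder", "12-layer cross encoder", "24-layer cross encoder"]

batch, num_candidates = 8, 16
# Stand-in for the student's relevance scores over candidate passages.
student_scores = torch.randn(batch, num_candidates, requires_grad=True)

for name in teachers:
    # Stand-in for the current teacher's relevance scores.
    teacher_scores = torch.randn(batch, num_candidates)
    loss = distill_step(student_scores, teacher_scores)
    loss.backward()
    print(f"{name}: KD loss {loss.item():.4f}")
    student_scores.grad = None  # reset before the next stage
```

In a real pipeline the scores would come from encoding queries and passages with the student and teacher models, and the data progressive distillation described in the abstract would additionally adjust which candidate passages each stage trains on.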

Authors (11)
  1. Zhenghao Lin (14 papers)
  2. Yeyun Gong (78 papers)
  3. Xiao Liu (402 papers)
  4. Hang Zhang (164 papers)
  5. Chen Lin (75 papers)
  6. Anlei Dong (6 papers)
  7. Jian Jiao (44 papers)
  8. Jingwen Lu (5 papers)
  9. Daxin Jiang (138 papers)
  10. Rangan Majumder (12 papers)
  11. Nan Duan (172 papers)
Citations (23)
