An Empirical Study of Uniform-Architecture Knowledge Distillation in Document Ranking (2302.04112v1)

Published 8 Feb 2023 in cs.IR

Abstract: Although BERT-based ranking models are widely used in commercial search engines, they are usually too time-consuming for online ranking tasks. Knowledge distillation, which aims to train a smaller model with performance comparable to that of a larger model, is a common strategy for reducing online inference latency. In this paper, we investigate the effect of different loss functions on uniform-architecture distillation of BERT-based ranking models. Here "uniform-architecture" denotes that both the teacher and student models use a cross-encoder architecture, while the student models include small-scale pre-trained language models. Our experimental results reveal that the optimal distillation configuration for ranking tasks differs substantially from that for general natural language processing tasks. Specifically, when the student models use a cross-encoder architecture, a pairwise loss on hard labels is critical for training student models, whereas distillation objectives on intermediate Transformer layers may hurt performance. These findings emphasize the need to carefully design a distillation strategy (for cross-encoder student models) tailored to document ranking with pairwise training samples.
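
To make the abstract's terms concrete, here is a minimal sketch of how a pairwise hard-label loss and a soft-label distillation term might be combined for one (relevant, non-relevant) document pair. It is an illustration under stated assumptions, not the paper's formulation: the logistic form of both losses, the function names, and the `alpha` and `temperature` parameters are all hypothetical choices, and the intermediate-layer objectives the abstract mentions are deliberately left out.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pairwise_hard_loss(s_pos: float, s_neg: float) -> float:
    """Hard-label pairwise loss (assumed logistic form).

    s_pos / s_neg are student scores for the relevant and non-relevant
    document. The loss shrinks as the student ranks the relevant one higher.
    """
    return softplus(-(s_pos - s_neg))


def pairwise_soft_loss(s_pos: float, s_neg: float,
                       t_pos: float, t_neg: float,
                       temperature: float = 1.0) -> float:
    """Soft-label term: cross-entropy between the teacher's and the
    student's pairwise preference probabilities (assumed form)."""
    z_student = (s_pos - s_neg) / temperature
    p_teacher = sigmoid((t_pos - t_neg) / temperature)
    # -log(sigmoid(z)) == softplus(-z); -log(1 - sigmoid(z)) == softplus(z)
    return p_teacher * softplus(-z_student) + (1.0 - p_teacher) * softplus(z_student)


def distillation_loss(s_pos: float, s_neg: float,
                      t_pos: float, t_neg: float,
                      alpha: float = 0.5, temperature: float = 1.0) -> float:
    """Weighted sum of the hard-label and soft-label pairwise terms.

    alpha is a hypothetical mixing weight: alpha=1.0 uses hard labels only.
    """
    hard = pairwise_hard_loss(s_pos, s_neg)
    soft = pairwise_soft_loss(s_pos, s_neg, t_pos, t_neg, temperature)
    return alpha * hard + (1.0 - alpha) * soft


if __name__ == "__main__":
    # Student scores (1.2, 0.4) and teacher scores (3.0, -1.0) are made-up values.
    print(distillation_loss(1.2, 0.4, 3.0, -1.0, alpha=0.5))
```

In this sketch, setting `alpha` to 1.0 recovers training on hard labels alone, which gives a simple way to compare configurations with and without the teacher's soft signal.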

Authors (5)
  1. Xubo Qin (2 papers)
  2. Xiyuan Liu (18 papers)
  3. Xiongfeng Zheng (1 paper)
  4. Jie Liu (492 papers)
  5. Yutao Zhu (63 papers)
