
A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models (2103.03335v4)

Published 4 Mar 2021 in cs.IR and cs.CL

Abstract: Due to high annotation costs, making the best use of existing human-created training data is an important research direction. We, therefore, carry out a systematic evaluation of transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a small number of queries. In contrast, each of our collections has a substantial number of queries, which enables a full-shot evaluation mode and improves reliability of our results. Furthermore, since source dataset licences often prohibit commercial use, we compare transfer learning to training on pseudo-labels generated by a BM25 scorer. We find that training on pseudo-labels -- possibly with subsequent fine-tuning using a modest number of annotated queries -- can produce a competitive or better model compared to transfer learning. Yet, it is necessary to improve the stability and/or effectiveness of the few-shot training, which, sometimes, can degrade performance of a pretrained model.

Citations (25)

Summary

  • The paper shows that BERT-based rankers require thousands of annotated queries to outperform BM25, highlighting the high cost of manual labeling.
  • The paper demonstrates that pseudo-labeling, which generates labels via BM25, consistently boosts performance over BM25 by 5–15% when annotated data is scarce.
  • The paper finds that while transfer learning can underperform with domain shifts, fine-tuning pseudo-labeled models may achieve superior IR results.

Evaluation of Transfer Learning and Pseudo-labeling in BERT-based Ranking Models

The paper, presented at SIGIR 2021, evaluates the effectiveness of transfer learning and pseudo-labeling techniques in BERT-based neural ranking models across five distinct English datasets in the context of information retrieval (IR). This research diverges from previous studies by focusing on collections with substantial numbers of queries, which allows for a comprehensive full-shot evaluation. Additionally, it addresses the constraint that source-dataset licenses often prohibit commercial use, which limits the applicability of transfer learning, and promotes pseudo-labeling as an alternative strategy.
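
To make the transfer setting concrete, the following is a minimal sketch of zero-shot transfer: a cross-encoder trained on a large source collection is applied unchanged to re-rank candidates from a target collection. The checkpoint name, the sentence-transformers CrossEncoder API, and the toy candidates are illustrative assumptions, not the paper's exact models or data.

```python
# Sketch: apply a ranker trained on a source collection, unchanged, to
# re-rank candidate documents from a target collection (zero-shot transfer).
from sentence_transformers import CrossEncoder

# Cross-encoder trained on a large source collection (assumption: MS MARCO).
ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates, top_k=10):
    """Score (query, document) pairs with the transferred model and sort."""
    scores = ranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

# Toy target-collection query with BM25-retrieved candidates.
print(rerank("how do neural rankers work",
             ["Neural rankers re-score candidates retrieved by BM25.",
              "The weather today is sunny with light winds."]))
```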

Summary of Research Questions and Findings

  1. Data Needs for Training from Scratch:
    • A BERT-based ranker requires a significant number of annotated queries to outperform BM25. On datasets like Yahoo! Answers, thousands of queries are necessary, emphasizing the high cost of manual annotation and the importance of strategies that minimize this need.
  2. Effectiveness of Pseudo-labeling:
    • Models trained on pseudo-labels, generated via BM25, consistently surpassed BM25's performance by 5–15%. This suggests pseudo-labeling is a viable approach when annotated data is scarce, though improvements were smaller than in earlier studies (a minimal sketch of the labeling procedure follows this list).
  3. Transfer Learning Performance:
    • Transfer learning doesn't universally outperform BM25. In fact, transferred models can sometimes underperform compared to both BM25 and models trained on pseudo-labels, especially when dataset characteristics differ significantly from the original training data.
  4. Comparison with Pseudo-labeling:
    • Transferred models generally show better performance than those trained solely on pseudo-labels. However, fine-tuning pseudo-labeled models with a moderate amount of annotated data can match, and occasionally exceed, the performance of transferred models.
  5. Impact of Few-shot Training:
    • Few-shot training, while capable of enhancing pseudo-labeled models, can paradoxically degrade the performance of transfer-learned models. This effect is particularly stark for datasets like Yahoo! Answers, indicating potential overfitting and a loss of learned knowledge from the transfer step.
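
Finding 2 rests on generating training labels without human judgments. Below is a minimal sketch of that idea, assuming the rank_bm25 library: the top BM25-ranked documents are treated as positive examples and lower-ranked ones are sampled as negatives. The thresholds and sampling scheme are illustrative choices, not the paper's exact procedure.

```python
# Sketch: build (query, document, label) training triples from BM25 rankings
# instead of human annotations (pseudo-labeling).
import random
from rank_bm25 import BM25Okapi

corpus = ["Neural ranking models re-score candidates retrieved by BM25.",
          "BM25 is a classic lexical retrieval function.",
          "Cats are popular pets around the world."]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

def pseudo_label(query, n_pos=1, n_neg=1):
    """Use BM25 rank as a noisy relevance label for ranker training."""
    scores = bm25.get_scores(query.lower().split())
    order = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    positives = [(query, corpus[i], 1.0) for i in order[:n_pos]]
    negatives = [(query, corpus[i], 0.0)
                 for i in random.sample(order[n_pos:], k=n_neg)]
    return positives + negatives

print(pseudo_label("what is bm25"))
```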

Implications and Future Research Directions

The paper provides insights that question the reliability of transfer learning without substantial in-domain data, highlighting the potential of pseudo-labeling supported by limited annotated data as a competitive alternative. This approach may be more robust to domain shift because it does not rely on external-source data that can introduce distribution mismatches.
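
As a rough illustration of this recipe, the sketch below continues training a cross-encoder, assumed to have first been trained on BM25 pseudo-labels, on a handful of human-annotated query-document pairs. The starting checkpoint, hyperparameters, and sentence-transformers training API are assumptions for illustration, not the paper's implementation.

```python
# Sketch: few-shot fine-tuning of a pseudo-label-trained cross-encoder on a
# modest number of annotated queries.
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Assumption: this checkpoint stands in for a model trained on BM25 pseudo-labels.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)

# A small set of human-annotated (query, document) pairs with relevance labels.
annotated = [
    InputExample(texts=["what is bm25", "BM25 is a lexical scoring function."], label=1.0),
    InputExample(texts=["what is bm25", "Cats are popular pets."], label=0.0),
]
train_loader = DataLoader(annotated, shuffle=True, batch_size=2)

# Short, low-learning-rate fine-tuning to limit the overfitting / forgetting
# the paper observes in few-shot settings.
model.fit(train_dataloader=train_loader, epochs=1,
          optimizer_params={"lr": 2e-5}, warmup_steps=10)
```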

Future research should focus on enhancing the stability and effectiveness of few-shot learning to mitigate instances of severe performance degradation observed in this paper. Additionally, the exploration of hybrid models that dynamically integrate transfer learning and pseudo-labeling, adapting the strategy based on available data and task-specific needs, represents an intriguing avenue for development in neural ranking techniques.

In conclusion, this paper underscores the complexity of applying transfer learning in IR tasks and advocates for a nuanced approach that combines the strengths of transfer learning and pseudo-labeling to achieve optimal results across diverse datasets.
