
Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature (2106.13375v2)

Published 25 Jun 2021 in cs.IR, cs.CL, and cs.DL

Abstract: Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and in many other vertical domains, is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably to or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.

Authors (15)
  1. Yu Wang (939 papers)
  2. Jinchao Li (22 papers)
  3. Tristan Naumann (41 papers)
  4. Chenyan Xiong (95 papers)
  5. Hao Cheng (190 papers)
  6. Robert Tinn (6 papers)
  7. Cliff Wong (14 papers)
  8. Naoto Usuyama (22 papers)
  9. Richard Rogahn (2 papers)
  10. Zhihong Shen (14 papers)
  11. Yang Qin (22 papers)
  12. Eric Horvitz (77 papers)
  13. Paul N. Bennett (10 papers)
  14. Jianfeng Gao (344 papers)
  15. Hoifung Poon (61 papers)
Citations (12)
