Team IELAB at TREC Clinical Trial Track 2023: Enhancing Clinical Trial Retrieval with Neural Rankers and Large Language Models (2401.01566v1)
Abstract: We describe team ielab from CSIRO and The University of Queensland's approach to the 2023 TREC Clinical Trials Track. Our approach was to use neural rankers, and to utilise LLMs to overcome the lack of training data for such rankers. Specifically, we employ ChatGPT to generate relevant patient descriptions for randomly selected clinical trials from the corpus. This synthetic dataset, combined with human-annotated training data from previous years, is used to train both dense and sparse retrievers based on PubmedBERT. Additionally, a cross-encoder re-ranker is integrated into the system. To further enhance the effectiveness of our approach, we prompt GPT-4 as a TREC annotator to provide judgments on our run files, which are then used to re-rank the results. This architecture tightly integrates strong PubmedBERT-based rankers with state-of-the-art LLMs, demonstrating a new approach to clinical trial retrieval.
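The synthetic-data step described above — prompting ChatGPT to write a relevant patient description for each sampled trial, then pairing the description with that trial as a training example — can be illustrated with the minimal Python sketch below. It assumes an OpenAI-style chat API; the prompt wording, model name, sample size, and field names are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the synthetic training-data generation step: for randomly
# sampled clinical trials, ask a chat model to invent a patient description
# that would be relevant to the trial, yielding (query, positive doc) pairs.
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt; the paper's actual prompt wording is not reproduced here.
PROMPT_TEMPLATE = (
    "You are helping build training data for clinical trial retrieval.\n"
    "Write a short patient case description (age, history, symptoms) for a "
    "patient who would be relevant to the following clinical trial:\n\n"
    "{trial_text}\n\nPatient description:"
)

def generate_patient_description(trial_text: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the chat model for a synthetic patient description for one trial."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(trial_text=trial_text)}],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

def build_synthetic_pairs(corpus: dict[str, str], n_samples: int = 1000) -> list[dict]:
    """Sample trials from the corpus and pair each with a generated description.

    The resulting pairs can be mixed with human-annotated data from previous
    TREC CT years to train dense and sparse PubmedBERT-based retrievers.
    """
    trial_ids = random.sample(list(corpus), k=min(n_samples, len(corpus)))
    pairs = []
    for trial_id in trial_ids:
        description = generate_patient_description(corpus[trial_id])
        pairs.append({"query": description, "positive_doc_id": trial_id})
    return pairs
```

The same pattern extends naturally to the later GPT-4 re-ranking stage: instead of generating descriptions, the model is prompted with a (patient topic, retrieved trial) pair and asked for a TREC-style relevance judgment, which is then used to reorder the run file.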