Effective Matching of Patients to Clinical Trials using Entity Extraction and Neural Re-ranking (2307.00381v1)

Published 1 Jul 2023 in cs.IR and cs.CL

Abstract: Clinical trials (CTs) often fail due to inadequate patient recruitment. This paper tackles the challenges of CT retrieval by presenting an approach that addresses the patient-to-trials paradigm. Our approach involves two key components in a pipeline-based model: (i) a data enrichment technique for enhancing both queries and documents during the first retrieval stage, and (ii) a novel re-ranking schema that uses a Transformer network in a setup adapted to this task by leveraging the structure of the CT documents. We use named entity recognition and negation detection in both the patient description and the eligibility section of CTs. We further classify patient descriptions and CT eligibility criteria into current, past, and family medical conditions. This extracted information is used to boost the importance of disease and drug mentions in both query and index for lexical retrieval. Furthermore, we propose a two-step training schema for the Transformer network used to re-rank the results from the lexical retrieval. The first step focuses on matching patient information with the descriptive sections of trials, while the second step aims to determine eligibility by matching patient information with the criteria section. Our findings indicate that the inclusion criteria section of the CT has a great influence on the relevance score in lexical models, and that the enrichment techniques for queries and documents improve the retrieval of relevant trials. The re-ranking strategy, based on our training schema, consistently enhances CT retrieval and improves precision at retrieving eligible trials by 15%. The results of our experiments suggest the benefit of making use of extracted entities. Moreover, our proposed re-ranking schema shows promising effectiveness compared to larger neural models, even with limited training data.
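
The enrichment idea in the abstract (boosting disease and drug mentions for lexical retrieval, informed by negation detection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `enrich` and `bm25_scores`, the boost factor, the choice to drop negated entities, and the toy data are all assumptions made for illustration. The entity and negation sets are supplied by hand here, whereas the paper obtains them from named entity recognition and negation detection.

```python
import math
from collections import Counter

def enrich(tokens, entities, negated, boost=2):
    """Repeat each non-negated entity token `boost` times and drop negated ones.

    `entities` and `negated` are hypothetical, hand-supplied sets standing in
    for the output of an NER and negation-detection step.
    """
    out = []
    for tok in tokens:
        if tok in negated:
            continue  # assumption: a negated condition should not drive retrieval
        out.extend([tok] * (boost if tok in entities else 1))
    return out

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score tokenized docs against a tokenized query with a plain BM25 formula.

    Repeated query tokens contribute repeatedly, which is how the boost
    from `enrich` takes effect.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

# Toy patient description and two toy trials (illustrative data only).
patient = ["patient", "with", "diabetes", "no", "asthma"]
trials = [
    ["diabetes", "adults", "study"],
    ["asthma", "adults", "study"],
]
enriched = enrich(patient, entities={"diabetes", "asthma"}, negated={"asthma"})
plain_scores = bm25_scores(patient, trials)
enriched_scores = bm25_scores(enriched, trials)
```

In this toy setting the plain query scores both trials equally, while the enriched query raises the diabetes trial and no longer matches the asthma trial. The abstract also describes enriching the index side and classifying conditions as current, past, or family history, which this sketch leaves out.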

Authors (5)
  1. Wojciech Kusa
  2. Petr Knoth
  3. Gabriella Pasi
  4. Allan Hanbury
  5. Óscar E. Mendoza