
Developing Healthcare Language Model Embedding Spaces (2403.19802v1)

Published 28 Mar 2024 in cs.CL and cs.AI

Abstract: Pre-trained LLMs often struggle on out-of-domain datasets such as healthcare-focused text. We explore specialized pre-training to adapt smaller LLMs to different healthcare datasets. Three methods are assessed: traditional masked language modelling, Deep Contrastive Learning for Unsupervised Textual Representations (DeCLUTR), and a novel pre-training objective utilizing metadata categories from the healthcare setting. These schemes are evaluated on downstream document classification tasks for each dataset, with additional analysis of the resultant embedding spaces. Contrastively trained models outperform the other approaches on the classification tasks, delivering strong performance from limited labeled data and requiring fewer model parameter updates. While metadata-based pre-training does not further improve classification across the datasets, it yields interesting embedding cluster separability. All domain-adapted LLMs outperform their publicly available general base LLM, validating the importance of domain specialization. This research illustrates efficient approaches to instill healthcare competency in compact LLMs even under tight computational budgets, an essential capability for responsible and sustainable deployment in local healthcare settings. We provide pre-training guidelines for specialized healthcare LLMs, motivate continued inquiry into contrastive objectives, and demonstrate adaptation techniques to align small LLMs with privacy-sensitive medical tasks.
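The abstract highlights DeCLUTR-style contrastive pre-training as the strongest of the three adaptation objectives. Since no implementation details are given here, the following is a minimal sketch of one contrastive update of that general form: anchor and positive spans are sampled from the same document, encoded and mean-pooled, and trained with an InfoNCE loss over in-batch negatives. The base checkpoint (`roberta-base`), span-sampling scheme, pooling, hyperparameters, and toy clinical snippets are illustrative assumptions, not the authors' configuration.

```python
# Sketch of a DeCLUTR-style contrastive pre-training step for domain adaptation.
# Assumptions (not from the paper): checkpoint name, span lengths, temperature,
# mean pooling, and the toy corpus are placeholders for illustration only.
import random

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "roberta-base"  # stand-in for a compact base model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)


def sample_spans(text: str, span_words: int = 64) -> tuple[str, str]:
    """Draw an anchor span and an overlapping positive span from one document."""
    words = text.split()
    start = random.randint(0, max(len(words) - span_words, 0))
    anchor = " ".join(words[start : start + span_words])
    shift = random.randint(0, span_words // 2)
    positive = " ".join(words[start + shift : start + shift + span_words])
    return anchor, positive


def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool final hidden states to get one embedding per text."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state       # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)       # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)


def contrastive_step(documents: list[str], temperature: float = 0.05) -> float:
    """One InfoNCE update: each anchor must match its own positive
    against every other in-batch positive (the negatives)."""
    anchors, positives = zip(*(sample_spans(doc) for doc in documents))
    a = F.normalize(embed(list(anchors)), dim=-1)
    p = F.normalize(embed(list(positives)), dim=-1)
    logits = a @ p.T / temperature                     # (B, B) similarity matrix
    labels = torch.arange(len(documents))              # diagonal entries are true pairs
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    toy_corpus = [
        "patient admitted with shortness of breath and chest pain " * 20,
        "discharge summary notes improvement after antibiotic course " * 20,
    ]
    print(contrastive_step(toy_corpus))
```

In-batch negatives keep the objective cheap: each anchor is pulled toward its own document's positive and pushed away from every other document's, which is the property the abstract credits for strong downstream classification from limited labeled data and fewer parameter updates.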

