Dynamic Q&A of Clinical Documents with Large Language Models

Published 19 Jan 2024 in cs.IR and cs.AI (arXiv:2401.10733v2)

Abstract: Electronic health records (EHRs) house crucial patient data in clinical notes. As these notes grow in volume and complexity, manual extraction becomes increasingly difficult. This work introduces a natural language interface that uses LLMs for dynamic question answering over clinical notes. Our chatbot, built on Langchain and transformer-based LLMs, lets users pose queries in natural language and receive relevant answers drawn from the notes. Experiments with various embedding models and advanced LLMs show that Wizard Vicuna achieves the highest accuracy, albeit with high compute demands. Model optimization, including weight quantization, reduces latency by approximately 48x. These promising results indicate real potential, yet challenges remain, including model hallucinations and evaluation on only a limited range of medical cases. Addressing these gaps is crucial for unlocking the value in clinical notes and advancing AI-driven clinical decision-making.
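The pipeline the abstract describes follows the standard retrieve-then-answer pattern: embed clinical note chunks, retrieve the chunk most similar to the query, and pass it to an LLM as context. A minimal sketch of that pattern is below; the bag-of-words cosine similarity is a toy stand-in for a transformer encoder (e.g. Sentence-BERT), and `qa_chain` is a hypothetical stand-in for the paper's Langchain QA chain, not its actual API.

```python
# Sketch of retrieval over clinical note chunks (toy embedding, not the
# paper's actual encoder or Langchain setup).
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (placeholder for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the note chunk most similar to the query."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

notes = [
    "Patient admitted with chest pain; troponin elevated.",
    "Discharge medications include metformin for type 2 diabetes.",
]
context = retrieve("what diabetes medication was prescribed", notes)
# In the real system this context would be handed to the LLM, e.g.:
# answer = qa_chain.run(question=query, context=context)
print(context)
```

In the paper's system the toy `embed` is replaced by a learned embedding model and the final answer is generated by the LLM conditioned on the retrieved context.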
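The latency improvement comes from weight quantization. The abstract does not specify the toolchain, so the following is only a sketch of the general idea behind symmetric per-tensor int8 post-training quantization: map float weights to 8-bit integers via a single scale, then dequantize at inference time.

```python
# Sketch of symmetric int8 weight quantization (illustrative arithmetic
# only; not the paper's actual quantization toolchain).

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map float weights to int8 using a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

w = [0.42, -1.27, 0.05, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(max_err, 4))
```

Storing int8 weights shrinks memory traffic roughly 4x versus fp32, which is where most of the latency gain in LLM inference typically comes from; the rounding error per weight is bounded by half the scale.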
