
MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models (2403.08607v1)

Published 13 Mar 2024 in cs.CL and cs.AI

Abstract: LLMs have shown impressive capabilities in generating human-like responses. However, their lack of domain-specific knowledge limits their applicability in healthcare settings, where contextual and comprehensive responses are vital. To address this challenge and enable the generation of patient-centric responses that are contextually relevant and comprehensive, we propose MedInsight: a novel retrieval-augmented framework that augments LLM inputs (prompts) with relevant background information from multiple sources. MedInsight extracts pertinent details from the patient's medical record or consultation transcript. It then integrates information from authoritative medical textbooks and curated web resources based on the patient's health history and condition. By constructing an augmented context combining the patient's record with relevant medical knowledge, MedInsight generates enriched, patient-specific responses tailored for healthcare applications such as diagnosis, treatment recommendations, or patient education. Experiments on the MTSamples dataset validate MedInsight's effectiveness in generating contextually appropriate medical responses. Quantitative evaluation using the Ragas metric and TruLens for answer similarity and answer correctness demonstrates the model's efficacy. Furthermore, human evaluation studies involving Subject Matter Experts (SMEs) confirm MedInsight's utility, with moderate inter-rater agreement on the relevance and correctness of the generated responses.
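The pipeline the abstract describes (extract details from the patient record, retrieve passages from medical textbooks and curated web sources, and assemble an augmented prompt for the LLM) can be sketched roughly as follows. This is an illustrative sketch only: the keyword-overlap scorer, the toy corpora, and the prompt template are placeholder assumptions, not the paper's actual retriever or components.

```python
import string

def _tokens(text):
    """Lowercase, punctuation-stripped word set for a toy overlap score."""
    return {w.strip(string.punctuation).lower() for w in text.split()}

def retrieve(question, corpus, k=1):
    """Return the k passages sharing the most terms with the question.

    A stand-in for MedInsight's retrieval step; a real system would use
    dense embeddings rather than keyword overlap.
    """
    q = _tokens(question)
    return sorted(corpus, key=lambda p: len(q & _tokens(p)), reverse=True)[:k]

def build_augmented_prompt(patient_record, textbook_corpus, web_corpus, question):
    """Combine the patient record with passages retrieved from each source."""
    context = retrieve(question, textbook_corpus) + retrieve(question, web_corpus)
    return ("Patient record:\n" + patient_record + "\n\n"
            "Retrieved medical context:\n" + "\n".join(context) + "\n\n"
            "Question: " + question + "\n"
            "Answer using only the information above.")

if __name__ == "__main__":
    record = "58-year-old male with type 2 diabetes and hypertension."
    textbooks = ["Metformin is a first-line therapy for type 2 diabetes.",
                 "Asthma is managed with inhaled corticosteroids."]
    web = ["Lifestyle changes such as diet and exercise help control type 2 diabetes.",
           "Vitamin C supplements are popular but unrelated here."]
    print(build_augmented_prompt(record, textbooks, web,
                                 "How should type 2 diabetes be managed?"))
```

In the paper's setup, the augmented prompt would then be passed to an LLM for response generation; only the multi-source context-assembly pattern is shown here.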

References (42)
  1. Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv preprint arXiv:2302.12813, 2023.
  2. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232, 2023.
  3. Fine-tuning or retrieval? comparing knowledge injection in llms. arXiv preprint arXiv:2312.05934, 2023.
  4. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526, 2017.
  5. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013.
  6. Recall and learn: Fine-tuning deep pretrained language models with less forgetting. arXiv preprint arXiv:2004.12651, 2020.
  7. An empirical study of catastrophic forgetting in large language models during continual fine-tuning. arXiv preprint arXiv:2308.08747, 2023.
  8. Meta-learning via language model in-context tuning. arXiv preprint arXiv:2110.07814, 2021.
  9. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
  10. Can language models learn from explanations in context? arXiv preprint arXiv:2204.02329, 2022.
  11. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
  12. Text and code embeddings by contrastive pre-training. arXiv preprint arXiv:2201.10005, 2022.
  13. A survey on evaluation of large language models. arXiv preprint arXiv:2307.03109, 2023.
  14. Class-based n-gram models of natural language. Computational Linguistics, 18(4), 1992.
  15. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  16. Localintel: Generating organizational threat intelligence from global and local cyber knowledge. arXiv preprint arXiv:2401.10036, 2024.
  17. Recommendation as instruction following: A large language model empowered recommendation approach. arXiv preprint arXiv:2305.07001, 2023.
  18. Large language models are zero-shot rankers for recommender systems. arXiv preprint arXiv:2305.08845, 2023.
  19. PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
  20. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
  21. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  22. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  23. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  24. Large language models for information retrieval: A survey. arXiv preprint arXiv:2308.07107, 2023.
  25. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906, 2020.
  26. Fine-tuning language models to find agreement among humans with diverse preferences. Advances in Neural Information Processing Systems, 35:38176–38189, 2022.
  27. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712, 2023.
  28. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digital Health, 2(2):e0000198, 2023.
  29. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617, 2023.
  30. Almanac: Retrieval-augmented language models for clinical medicine. Research Square, 2023.
  31. Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks. arXiv preprint arXiv:2305.18395, 2023.
  32. Clinfo.ai: An open-source retrieval-augmented large language model system for answering medical questions using scientific literature. In Pacific Symposium on Biocomputing 2024, pages 8–23. World Scientific, 2023.
  33. Improving accuracy of GPT-3/4 results on biomedical data using a retrieval-augmented language model. arXiv preprint arXiv:2305.17116, 2023.
  34. Augmenting black-box LLMs with medical textbooks for clinical question answering. arXiv preprint arXiv:2309.02233, 2023.
  35. [Online] MTSAMPLES. Transcribed medical transcription sample reports and examples. https://mtsamples.com/.
  36. [Online] Hugging Face. TheBloke/Mistral-7B-Instruct-v0.2-GGUF. https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF.
  37. Ragas: Automated evaluation of retrieval augmented generation. arXiv preprint arXiv:2309.15217, 2023.
  38. [Online] TruLens: Don't just vibe check your LLM app!
  39. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, 2002.
  40. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization of ACL, Spain, volume 5, 2004.
  41. Likert scales and data analyses. Quality progress, 40(7):64–65, 2007.
  42. Mary L McHugh. Interrater reliability: the kappa statistic. Biochemia medica, 22(3):276–282, 2012.
Authors (6)
  1. Subash Neupane (17 papers)
  2. Shaswata Mitra (14 papers)
  3. Sudip Mittal (66 papers)
  4. Noorbakhsh Amiri Golilarz (8 papers)
  5. Shahram Rahimi (36 papers)
  6. Amin Amirlatifi (4 papers)
Citations (1)
