
MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries

Published 3 Jan 2024 in cs.AI and cs.CL | (2401.01596v1)

Abstract: In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also, prior works in the area of medical question summarization have been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available.


Summary

  • The paper introduces MedSumm, a novel framework that integrates textual and visual data to summarize code-mixed clinical queries.
  • The paper employs pre-trained language models, vision transformers, and QLoRA fine-tuning to generate comprehensive and accurate medical summaries.
  • The paper demonstrates superior performance over text-only models through both automatic and human evaluations, supported by the new MMCQS dataset.

Introduction to Multimodal Medical Summarization

Summarizing patient queries quickly and accurately is essential for efficient doctor-patient communication. Summarization work has traditionally concentrated on text, largely overlooking multimodal information such as visual aids. The paper presents MedSumm, an approach that uses both large language models (LLMs) and vision-language models (VLMs) to summarize medical queries that combine code-mixed Hindi-English text with visual content.

The MedSumm Framework

MedSumm builds a comprehensive representation of a patient's medical condition by integrating textual and visual data. The framework takes a code-mixed patient query together with its corresponding visual cue and processes them in four stages. First, a pre-trained LLM generates embeddings for the text. Second, a vision encoder such as the Vision Transformer (ViT) encodes the visual information. Third, the model is fine-tuned efficiently with QLoRA. Finally, inference produces a summary that contains both the symptom name and a synopsis of the condition. The paper illustrates this flow with a pipeline diagram.
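To make the fine-tuning stage more concrete, here is a minimal sketch of the low-rank adapter idea that QLoRA builds on. It is not the paper's implementation: the class name `LoRALinear`, the helper `matvec`, and all sizes are hypothetical, and the 4-bit quantization of the frozen base weights that distinguishes QLoRA from plain LoRA is left out. The point it shows is that only two small matrices are trained while the pre-trained weight stays fixed.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch).

    Output: W x + (alpha / rank) * B (A x), where W is frozen and only
    A (rank x d_in) and B (d_out x rank) would be trained.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pre-trained weight (QLoRA would store it in 4 bits)
        self.scale = alpha / rank
        # A starts with small random values, B starts at zero, so the
        # adapter contributes nothing before training begins.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        rank, d_in = len(self.A), len(self.A[0])
        d_out = len(self.B)
        return rank * d_in + d_out * rank


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))   # equals the frozen layer's output before training
print(layer.trainable_params())    # rank * (d_in + d_out) adapter parameters
```

For a weight of shape d_out x d_in, the adapter adds rank * (d_in + d_out) trainable parameters, which is far fewer than d_out * d_in when the rank is small. That gap is what makes this family of methods attractive in a low-resource setting.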

MMCQS Dataset: A New Resource for Research

The summarization framework is supported by the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, introduced in the same paper. It is described as the first of its kind and contains 3,015 samples, each pairing a Hindi-English code-mixed medical query and its visual cue with a corresponding English summary. Medical doctors and students took part in data collection and annotation to ensure clinical relevance and accuracy.
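One way to picture a single MMCQS sample is as a record with three parts: the code-mixed query, the visual cue, and the English summary. The sketch below is only an illustration. The class name `MMCQSSample`, its field names, the `is_complete` helper, and the placeholder values are all assumptions, since the released file format is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MMCQSSample:
    """Hypothetical in-memory layout for one dataset sample."""

    codemixed_query: str   # Hindi-English code-mixed patient question
    image_path: str        # path to the visual cue attached to the query
    english_summary: str   # reference summary written in English


def is_complete(sample: MMCQSSample) -> bool:
    """Return True when all three parts of the sample are non-empty."""
    parts = (sample.codemixed_query, sample.image_path, sample.english_summary)
    return all(part.strip() for part in parts)


# Placeholder values only; these are not taken from the dataset.
example = MMCQSSample(
    codemixed_query="<code-mixed patient query>",
    image_path="<path/to/image>",
    english_summary="<English summary>",
)
print(is_complete(example))
```

A check like `is_complete` is useful before training, because a sample that is missing its image or its summary cannot be used for multimodal summarization.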

Evaluation and Implications

The paper reports both automatic and human evaluations. In the human evaluation, medical experts and postgraduate medical students rated the model-generated summaries and judged MedSumm to perform significantly better than unimodal, text-only models. This result indicates that visual data contributes to a more complete understanding of medical queries. More broadly, multimodal summarization tools of this kind can improve communication between patients and doctors and support more responsive, personalized medical care.

The authors state that the dataset, code, and pre-trained models will be made publicly available. With these resources, researchers and healthcare professionals could build tools that reshape medical triage and consultation by lowering language barriers and improving the quality of care.
