
End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding (2311.15876v3)

Published 27 Nov 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Recent advances in AI foundation models have significant potential to lighten the clinical workload by mimicking the comprehensive, multi-faceted approach of medical professionals. In radiation oncology, where integrating multiple modalities is central to practice, the opportunity for such foundation models is especially large. Motivated by this, we present RO-LMM, a multi-purpose, comprehensive large multimodal model (LMM) tailored to radiation oncology. The model handles a series of tasks within the clinical workflow, including clinical context summarization, radiation treatment plan suggestion, and plan-guided target volume segmentation, by leveraging the capabilities of LMMs. In particular, to perform consecutive clinical tasks without error accumulation, we present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts the LMM's robustness to noisy inputs while preserving its handling of clean inputs. We further extend this concept to an LMM-driven segmentation framework, leading to a novel Consistency Embedding Segmentation (CESEG) technique. Experimental results, including multi-centre validation, confirm that RO-LMM with CEFTune and CESEG achieves promising performance across multiple clinical tasks and generalizes well.

Introduction

AI has had a substantial impact on medicine by providing tools that assist clinical decision-making and reduce workload. However, most AI models are designed for single tasks on uni-modal data, which does not match the multifaceted responsibilities of medical professionals. This paper introduces RO-LMM, a large multimodal model that operates as a generalist across the clinical workflow in radiation oncology.
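The workflow described in the abstract is a chain of three tasks in which each stage consumes the previous stage's output, so errors made early on can propagate downstream. The Python sketch below only illustrates that structure under assumed interfaces; none of the function or parameter names come from the paper.

```python
from typing import Any, Callable, Dict

def run_radiotherapy_workflow(
    patient_history: str,
    ct_volume: Any,
    summarize: Callable[[str], str],        # stage 1: clinical context summarization
    suggest_plan: Callable[[str], str],     # stage 2: radiation treatment plan suggestion
    segment: Callable[[Any, str], Any],     # stage 3: plan-guided target volume segmentation
) -> Dict[str, Any]:
    """Chain the three consecutive clinical tasks end to end.

    Because each stage is conditioned on generated (hence possibly noisy) text
    from the previous stage, robustness to noisy inputs is what the paper's
    consistency-embedding training is meant to provide.
    """
    note = summarize(patient_history)
    plan = suggest_plan(note)
    mask = segment(ct_volume, plan)
    return {"clinical_note": note, "treatment_plan": plan, "target_mask": mask}
```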

Methodology

RO-LMM exhibits capabilities in three crucial areas: (1) summarizing comprehensive patient histories into concise clinical notes, (2) proposing treatment plans from a clinical-expert perspective, and (3) delineating radiation target volumes directly from clinical reports. To enhance robustness against the errors that inevitably arise in such sequential tasks, training combines Noisy Embedding Fine-Tuning (NEFTune), an existing technique that injects noise into input embeddings during training, with the newly proposed Consistency Embedding Fine-Tuning (CEFTune), which enforces consistency between predictions on noisy and clean inputs. Extending these ideas to 3D segmentation yields Noisy Embedding Segmentation (NESEG) and Consistency Embedding Segmentation (CESEG), which further boost the model's generalization ability; a rough sketch of the training idea follows below.
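The page gives no equations for this training scheme, so the PyTorch sketch below is only a minimal illustration of the idea: NEFTune-style uniform noise added to the input embeddings, plus a consistency term that ties the noisy-input prediction to the clean-input prediction. The noise scale `alpha`, the KL-divergence consistency measure, the stop-gradient on the clean branch, the weight `lam`, and the HuggingFace-style `inputs_embeds`/`logits` model interface are all assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def neftune_noise(embeds: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """NEFTune-style perturbation: uniform noise scaled by alpha / sqrt(L * d),
    where L is sequence length and d is embedding dimension (Jain et al., 2023)."""
    seq_len, dim = embeds.shape[-2], embeds.shape[-1]
    scale = alpha / (seq_len * dim) ** 0.5
    noise = torch.empty_like(embeds).uniform_(-1.0, 1.0) * scale
    return embeds + noise

def ceftune_loss(model, clean_embeds, labels, alpha: float = 5.0, lam: float = 1.0):
    """Illustrative CEFTune-style objective (assumed form, not the paper's exact loss):
    the usual next-token loss on noisy embeddings plus a consistency term between
    the noisy-input and clean-input output distributions."""
    noisy_embeds = neftune_noise(clean_embeds, alpha)

    noisy_logits = model(inputs_embeds=noisy_embeds).logits
    with torch.no_grad():  # clean branch serves as the consistency target (assumption)
        clean_logits = model(inputs_embeds=clean_embeds).logits

    task_loss = F.cross_entropy(
        noisy_logits.view(-1, noisy_logits.size(-1)), labels.view(-1)
    )
    consistency = F.kl_div(  # KL between noisy and clean predictive distributions
        F.log_softmax(noisy_logits, dim=-1),
        F.softmax(clean_logits, dim=-1),
        reduction="batchmean",
    )
    return task_loss + lam * consistency
```

No noise is added at inference time; the point of the consistency term is that outputs for clean and noise-perturbed inputs stay aligned, which is what makes the consecutive tasks above less prone to error accumulation.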

Experiments and Results

A comprehensive set of experiments on multi-centre cohorts supports RO-LMM's promise. For text tasks such as clinical report summarization and treatment plan suggestion, the model augmented with NEFTune and CEFTune outperformed baseline methods on both internal and external datasets. On the 3D target volume segmentation task, RO-LMM combined with NESEG and CESEG surpassed conventional methods, demonstrating its capacity for multi-modal reasoning.
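The page does not state which overlap metric the segmentation results use; the Dice similarity coefficient is the customary choice for comparing predicted and reference 3D target volumes, and the small sketch below shows it only as a generic illustration (the `dice_score` helper is hypothetical, not taken from the paper).

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice similarity coefficient between two binary 3D masks."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().item()
    return (2.0 * intersection + eps) / (pred.sum().item() + target.sum().item() + eps)

# Toy example: two 4x4x4 masks that overlap on one of their two occupied slices.
a = torch.zeros(4, 4, 4, dtype=torch.bool); a[:2] = True
b = torch.zeros(4, 4, 4, dtype=torch.bool); b[1:3] = True
print(round(dice_score(a, b), 3))  # 0.5
```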

Discussion and Conclusion

RO-LMM is positioned as a versatile, multifunctional tool that could change how AI is integrated into routine medical workflows, going beyond current AI solutions that are typically constrained to uni-modal, single-task applications. Its combination of noise augmentation and consistency regularization may help pave the way toward fully generalist medical AI models capable of grasping clinical workflows holistically in departments such as radiation oncology.

Authors (8)
  1. Kwanyoung Kim (12 papers)
  2. Yujin Oh (23 papers)
  3. Sangjoon Park (22 papers)
  4. Hwa Kyung Byun (5 papers)
  5. Jin Sung Kim (18 papers)
  6. Yong Bae Kim (3 papers)
  7. Jong Chul Ye (210 papers)
  8. Joongyo Lee (1 paper)
