OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine (2402.18028v2)

Published 28 Feb 2024 in cs.CV

Abstract: The emerging trend of advancing generalist artificial intelligence, such as GPT-4 and Gemini, has reshaped the research landscape in machine learning and many other fields, in both academia and industry. However, domain-specific applications of such foundation models (e.g., in medicine) remain largely unexplored or at a very early stage. Adapting them requires a dedicated set of transfer learning and model adaptation techniques that expand these models and inject them with domain knowledge and data. The development of such technologies could be greatly accelerated if data, algorithms, and pre-trained foundation models were gathered and open-sourced in an organized manner. In this work, we present OpenMEDLab, an open-source platform for multi-modality foundation models. It encapsulates not only pioneering solutions for prompting and fine-tuning large language and vision models for frontline clinical and bioinformatic applications, but also recipes for building domain-specific foundation models from large-scale multi-modal medical data. Importantly, it opens access to a group of pre-trained foundation models for various medical image modalities, clinical text, protein engineering, and more. Competitive results are demonstrated for each collected approach and model across a variety of benchmarks for downstream tasks. We welcome researchers in medical artificial intelligence to continuously contribute cutting-edge methods and models to OpenMEDLab, which can be accessed via https://github.com/openmedlab.
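The adaptation workflow the abstract describes (taking a pre-trained vision foundation model and fitting it to a downstream clinical task with limited labels) can be illustrated with a short, generic sketch. The example below is an assumption-laden illustration, not OpenMEDLab's actual API: it assumes a torchvision ViT-B/16 backbone, a hypothetical checkpoint file `backbone_ckpt.pth`, and plain linear probing (frozen backbone, small trainable classification head), one common lightweight adaptation strategy.

```python
# Illustrative sketch of lightweight foundation-model adaptation (linear probing).
# Assumptions (not from the paper): a torchvision ViT-B/16 backbone and a
# hypothetical checkpoint "backbone_ckpt.pth"; OpenMEDLab's real APIs may differ.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

NUM_CLASSES = 5  # e.g., a small downstream medical classification task

# Load a vision backbone and (optionally) domain-specific pre-trained weights.
backbone = vit_b_16(weights=None)
state = torch.load("backbone_ckpt.pth", map_location="cpu")  # hypothetical path
backbone.load_state_dict(state, strict=False)

# Freeze the backbone: with scarce labeled medical data, only a small head is trained.
for p in backbone.parameters():
    p.requires_grad = False

# Replace the classification head with a task-specific linear layer (trainable).
backbone.heads = nn.Linear(backbone.hidden_dim, NUM_CLASSES)

optimizer = torch.optim.AdamW(backbone.heads.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step over a mini-batch of (images, labels)."""
    optimizer.zero_grad()
    logits = backbone(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, the models hosted under the OpenMEDLab organization ship with their own loading and fine-tuning scripts; the sketch only illustrates the frozen-backbone-plus-small-head adaptation pattern referenced in the abstract.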

Authors (20)
  1. Xiaosong Wang (42 papers)
  2. Xiaofan Zhang (79 papers)
  3. Guotai Wang (67 papers)
  4. Junjun He (77 papers)
  5. Zhongyu Li (72 papers)
  6. Wentao Zhu (73 papers)
  7. Yi Guo (115 papers)
  8. Qi Dou (163 papers)
  9. Xiaoxiao Li (144 papers)
  10. Dequan Wang (37 papers)
  11. Liang Hong (67 papers)
  12. Qicheng Lao (27 papers)
  13. Tong Ruan (22 papers)
  14. Yukun Zhou (29 papers)
  15. Yixue Li (3 papers)
  16. Jie Zhao (214 papers)
  17. Kang Li (207 papers)
  18. Xin Sun (151 papers)
  19. Lifeng Zhu (9 papers)
  20. Shaoting Zhang (133 papers)
Citations (6)