A Foundational Multimodal Vision Language AI Assistant for Human Pathology (2312.07814v1)

Published 13 Dec 2023 in cs.CV and cs.AI

Abstract: The field of computational pathology has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders. However, despite the explosive growth of generative AI, there has been limited study on building general-purpose, multimodal AI assistants tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology using an in-house-developed foundational vision encoder pretrained on 100 million histology images from over 100,000 patient cases and 1.18 million pathology image-caption pairs. The vision encoder is then combined with a pretrained LLM, and the whole system is fine-tuned on over 250,000 diverse, disease-agnostic visual-language instructions. We compare PathChat against several multimodal vision-language AI assistants as well as GPT-4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4. When relevant clinical context is provided with the histology image, PathChat achieved a diagnostic accuracy of 87% on multiple-choice questions based on publicly available cases of diverse tissue origins and disease models. Additionally, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse pathology-related queries. As an interactive and general vision-language AI assistant that can flexibly handle both visual and natural-language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision-making.
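The system design described above (a pathology-pretrained vision encoder coupled to a pretrained LLM and then instruction-tuned on visual-language data) follows a now-common multimodal-assistant pattern in which visual tokens are projected into the LLM's embedding space and decoded jointly with the text prompt. The sketch below is illustrative only and is not the authors' implementation: the class names, dimensions (1024-d visual tokens, 4096-d LLM embeddings, 196 patches), and the frozen vision encoder are assumptions made for the example.

```python
# Minimal sketch of a projector-style vision-language assistant (assumed design,
# not PathChat's actual code): visual patch tokens from a pretrained encoder are
# mapped into the LLM embedding space and prepended to the instruction tokens.
import torch
import torch.nn as nn


class VisionLanguageAssistant(nn.Module):
    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder     # e.g. a ViT pretrained on histology images
        self.projector = nn.Sequential(          # maps visual tokens into the LLM embedding space
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.llm = llm                           # decoder-only LLM that accepts input embeddings

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W); text_embeds: (B, T, llm_dim) embedded instruction tokens
        with torch.no_grad():                    # vision encoder kept frozen in this sketch
            visual_tokens = self.vision_encoder(images)   # (B, N, vision_dim)
        visual_embeds = self.projector(visual_tokens)     # (B, N, llm_dim)
        # Prepend projected visual tokens to the text sequence and decode jointly.
        return self.llm(torch.cat([visual_embeds, text_embeds], dim=1))


class DummyViT(nn.Module):
    """Stand-in for a pretrained histology vision encoder: returns patch tokens."""
    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return torch.randn(images.shape[0], 196, 1024)    # (B, num_patches, vision_dim)


class DummyLLM(nn.Module):
    """Stand-in for a decoder-only LLM; a real one would return next-token logits."""
    def forward(self, embeds: torch.Tensor) -> torch.Tensor:
        return embeds


if __name__ == "__main__":
    model = VisionLanguageAssistant(DummyViT(), DummyLLM())
    out = model(torch.randn(2, 3, 224, 224), torch.randn(2, 32, 4096))
    print(out.shape)   # torch.Size([2, 228, 4096]) -> 196 visual + 32 text tokens
```

In this pattern the projector aligns the two modalities, while instruction tuning teaches the combined system to follow pathology-specific prompts; the abstract reports fine-tuning the whole system on over 250,000 such visual-language instructions.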
