
PathAlign: A vision-language model for whole slide images in histopathology (2406.19578v1)

Published 27 Jun 2024 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen LLM for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.
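
The retrieval use case mentioned in the abstract (finding cases of interest in a shared image-text embedding space) can be illustrated with a minimal sketch. This is not the paper's implementation: the use of cosine similarity as the ranking score, the function names `cosine_similarity` and `retrieve_top_k`, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the embeddings would come from the WSI encoder and a text encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, candidate_embeddings, k=1):
    """Return indices of the k candidates most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), idx)
        for idx, emb in enumerate(candidate_embeddings)
    ]
    # Sort by descending similarity; break ties by original index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


# Hypothetical toy data: one WSI embedding and three candidate text embeddings.
wsi_embedding = [0.9, 0.1, 0.0]
text_embeddings = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]

top_matches = retrieve_top_k(wsi_embedding, text_embeddings, k=2)
print(top_matches)
```

The same function covers both directions of retrieval: pass a text embedding as the query and WSI embeddings as candidates to find slides that match a description.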

Authors (17)
  1. Faruk Ahmed (17 papers)
  2. Andrew Sellergren (8 papers)
  3. Lin Yang (212 papers)
  4. Shawn Xu (6 papers)
  5. Boris Babenko (9 papers)
  6. Abbi Ward (3 papers)
  7. Niels Olson (2 papers)
  8. Arash Mohtashamian (2 papers)
  9. Yossi Matias (61 papers)
  10. Quang Duong (6 papers)
  11. Shravya Shetty (21 papers)
  12. Daniel Golden (9 papers)
  13. Yun Liu (213 papers)
  14. David F. Steiner (7 papers)
  15. Ellery Wulczyn (14 papers)
  16. Greg S. Corrado (37 papers)
  17. Dale R. Webster (20 papers)
Citations (5)
