Transcriptomics-guided Slide Representation Learning in Computational Pathology (2405.11618v1)
Abstract: Self-supervised learning (SSL) has been successful in building patch embeddings of small histology images (e.g., 224x224 pixels), but scaling these models to learn slide embeddings from the entirety of giga-pixel whole-slide images (WSIs) remains challenging. Here, we leverage complementary information from gene expression profiles to guide slide representation learning using multimodal pre-training. Expression profiles constitute highly detailed molecular descriptions of a tissue that we hypothesize offer a strong task-agnostic training signal for learning slide embeddings. Our slide and expression (S+E) pre-training strategy, called Tangle, employs modality-specific encoders, the outputs of which are aligned via contrastive learning. Tangle was pre-trained on samples from three different organs: liver (n=6,597 S+E pairs), breast (n=1,020), and lung (n=1,012) from two different species (Homo sapiens and Rattus norvegicus). Across three independent test datasets consisting of 1,265 breast WSIs, 1,946 lung WSIs, and 4,584 liver WSIs, Tangle shows significantly better few-shot performance compared to supervised and SSL baselines. When assessed using prototype-based classification and slide retrieval, Tangle also shows a substantial performance improvement over all baselines. Code available at https://github.com/mahmoodlab/TANGLE.
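The abstract states only that the outputs of the slide and expression encoders are "aligned via contrastive learning"; it does not specify the loss. As an illustration, the following is a minimal sketch of one common choice, a symmetric InfoNCE (CLIP-style) objective over a batch of paired embeddings. The function names, the temperature value, and the toy embeddings are assumptions made for this example and are not taken from the Tangle code base.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(slide_emb, expr_emb, temperature=0.1):
    """Symmetric InfoNCE over paired slide/expression embeddings.

    slide_emb[i] and expr_emb[i] are assumed to come from the same sample.
    The temperature of 0.1 is an illustrative assumption.
    """
    s = [l2_normalize(v) for v in slide_emb]
    e = [l2_normalize(v) for v in expr_emb]
    # Cosine similarity between every slide and every expression profile.
    logits = [
        [sum(a * b for a, b in zip(si, ej)) / temperature for ej in e]
        for si in s
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the slide-to-expression and expression-to-slide directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch of two samples with 2-dimensional embeddings (hypothetical values).
slides = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(slides, [[2.0, 0.0], [0.0, 3.0]])
shuffled = symmetric_contrastive_loss(slides, [[0.0, 3.0], [2.0, 0.0]])
print(aligned < shuffled)
```

In this toy batch, the loss is near zero when each slide embedding points in the same direction as its paired expression embedding, and it is large when the pairs are swapped. That is the behavior a contrastive alignment objective is meant to reward.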