Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology (2405.11643v1)
Abstract: Representation learning of pathology whole-slide images (WSIs) has primarily relied on weak supervision with Multiple Instance Learning (MIL). However, the slide representations resulting from this approach are highly tailored to specific clinical tasks, which limits their expressivity and generalization, particularly in scenarios with limited data. Instead, we hypothesize that morphological redundancy in tissue can be leveraged to build a task-agnostic slide representation in an unsupervised fashion. To this end, we introduce PANTHER, a prototype-based approach rooted in the Gaussian mixture model that summarizes the set of WSI patches into a much smaller set of morphological prototypes. Specifically, each patch is assumed to have been generated from a mixture distribution, where each mixture component represents a morphological exemplar. Utilizing the estimated mixture parameters, we then construct a compact slide representation that can be readily used for a wide range of downstream tasks. By performing an extensive evaluation of PANTHER on subtyping and survival tasks using 13 datasets, we show that 1) PANTHER outperforms or is on par with supervised MIL baselines and 2) the analysis of morphological prototypes brings new qualitative and quantitative insights into model interpretability.
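The idea of summarizing a bag of patch features with mixture parameters can be illustrated with a minimal sketch. This is not the PANTHER implementation: it assumes diagonal covariances, random initialization from the patches themselves, a fixed number of EM iterations, and a slide vector formed by concatenating the mixing weights, means, and variances. The function names `fit_gmm_diag` and `slide_representation` are hypothetical, and the input is synthetic data standing in for patch embeddings.

```python
import numpy as np


def fit_gmm_diag(X, C, n_iter=20, seed=0, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture to X (N x D) with plain EM.

    Assumption: random-patch initialization and a fixed iteration count,
    chosen for brevity rather than taken from the paper.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, size=C, replace=False)].copy()  # initial means
    var = np.tile(X.var(axis=0) + eps, (C, 1))           # initial variances
    pi = np.full(C, 1.0 / C)                             # initial mixing weights

    for _ in range(n_iter):
        # E-step: responsibility of each component for each patch.
        diff = X[:, None, :] - mu[None, :, :]            # (N, C, D)
        log_norm = np.sum(np.log(2.0 * np.pi * var), axis=1)
        log_p = -0.5 * (np.sum(diff ** 2 / var[None], axis=2) + log_norm[None, :])
        log_w = np.log(pi)[None, :] + log_p
        log_w -= log_w.max(axis=1, keepdims=True)        # numerical stability
        resp = np.exp(log_w)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances.
        Nc = resp.sum(axis=0) + eps
        pi = Nc / Nc.sum()
        mu = (resp.T @ X) / Nc[:, None]
        diff = X[:, None, :] - mu[None, :, :]
        var = np.einsum("nc,ncd->cd", resp, diff ** 2) / Nc[:, None] + eps

    return pi, mu, var


def slide_representation(X, C=4):
    """Concatenate the estimated mixture parameters into one fixed-size vector."""
    pi, mu, var = fit_gmm_diag(X, C)
    return np.concatenate([pi, mu.ravel(), var.ravel()])


# Synthetic stand-in for one slide: 1000 patch embeddings of dimension 16.
patches = np.random.default_rng(0).normal(size=(1000, 16))
z = slide_representation(patches, C=4)
# z has length C + 2 * C * D = 4 + 2 * 4 * 16 = 132, independent of the patch count.
```

The resulting vector has a fixed length regardless of how many patches a slide contains, which is what allows it to be passed to a simple downstream model such as a linear classifier.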