
Self supervised contrastive learning for digital histopathology (2011.13971v2)

Published 27 Nov 2020 in eess.IV and cs.CV

Abstract: Unsupervised learning has been a long-standing goal of machine learning and is especially important for medical image analysis, where the learning can compensate for the scarcity of labeled datasets. A promising subclass of unsupervised learning is self-supervised learning, which aims to learn salient features using the raw input as the learning signal. In this paper, we use a contrastive self-supervised learning method called SimCLR that achieved state-of-the-art results on natural-scene images and apply this method to digital histopathology by collecting and pretraining on 57 histopathology datasets without any labels. We find that combining multiple multi-organ datasets with different types of staining and resolution properties improves the quality of the learned features. Furthermore, we find using more images for pretraining leads to a better performance in multiple downstream tasks. Linear classifiers trained on top of the learned features show that networks pretrained on digital histopathology datasets perform better than ImageNet pretrained networks, boosting task performances by more than 28% in F1 scores on average. These findings may also be useful when applying newer contrastive techniques to histopathology data. Pretrained PyTorch models are made publicly available at https://github.com/ozanciga/self-supervised-histopathology.

Self-Supervised Contrastive Learning for Digital Histopathology

The paper "Self supervised contrastive learning for digital histopathology" presents a compelling exploration of self-supervised learning methods in the field of medical image analysis, particularly focusing on histopathology. This work leverages the SimCLR contrastive learning framework, previously noted for its success on natural-scene images, and applies it to a vast collection of digital histopathology datasets. The aim is to address label scarcity in medical imaging, a crucial bottleneck in deploying effective machine learning solutions in clinical settings.

Methodological Framework

The investigation is anchored in self-supervised contrastive learning via SimCLR, which learns representations by maximizing agreement between differently augmented views of the same image while pushing apart views of different images. The approach requires no memory bank or specialized architectural modifications, relying instead on large batch sizes to supply sufficient negative examples. Its effectiveness is corroborated by extensive experiments across 57 histopathology datasets that vary in staining, resolution, and tissue type.
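The contrastive objective at the core of SimCLR, the NT-Xent (normalized-temperature cross-entropy) loss, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the batch layout (rows i and i+N holding the two views of image i) and the temperature value are assumptions made for the example.

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent loss over a batch of 2N embeddings.

    Rows i and i+N of `z` are assumed to be the two augmented
    views of the same image (a common SimCLR batch layout).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n2 = z.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    pos_idx = (np.arange(n2) + n2 // 2) % n2           # index of each row's positive
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n2), pos_idx].mean()
```

When the two views of each image embed close together and away from other images, the loss is low; mismatched pairs drive it up, which is what pushes the encoder toward augmentation-invariant features.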

Empirical Evaluation and Results

A significant accomplishment of this work is the demonstration that pretraining on a heterogeneous dataset spanning multiple tissues and staining methodologies enhances the generalizability of the learned representations. Notably, the self-supervised pretrained networks outperformed ImageNet-initialized networks, improving downstream task performance by more than 28% in F1 score on average across classification tasks. The results hold across classification, regression, and segmentation tasks, illustrating the versatility of the pretrained models.
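The linear evaluation used for these comparisons freezes the pretrained encoder and fits only a linear classifier on its features. The sketch below illustrates the idea with a closed-form ridge-regression probe on stand-in features; the paper's actual probe and training setup may differ, and the feature data here is synthetic.

```python
import numpy as np

def linear_probe(features, labels, n_classes, l2=1e-3):
    """Fit a linear classifier on frozen features via ridge regression.

    Returns a (d, n_classes) weight matrix from the closed-form
    solution W = (X^T X + l2*I)^{-1} X^T Y with one-hot targets Y.
    """
    y = np.eye(n_classes)[labels]                       # one-hot targets
    d = features.shape[1]
    return np.linalg.solve(features.T @ features + l2 * np.eye(d),
                           features.T @ y)

def predict(features, w):
    """Predict the class with the highest linear score."""
    return (features @ w).argmax(axis=1)
```

Because only the probe is trained, its accuracy directly measures how linearly separable the frozen pretrained features are, which is why the protocol is a standard yardstick for representation quality.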

Furthermore, experiments indicate a positive correlation between the quantity of pretraining data and downstream task performance, a crucial factor in the method's efficacy. Ablations on image resolution and tissue-specific pretraining further underscore the value of visual diversity in the pretraining data, showing the benefit of comprehensive multi-resolution datasets.

Discussion and Implications

The findings contribute critical insights into the utility of self-supervised learning within the field of digital pathology. By eschewing reliance on annotated data, this research opens pathways for broader applications in histopathological analysis without requiring extensive expert labeling. The paper also reveals key implications for future AI development in medical imaging, highlighting the importance of dataset diversity and augmentation strategies akin to natural scene image methodologies.

Further exploration into sophisticated augmentations and the refinement of contrastive objectives could propel the performance of self-supervised models in this domain. The paper heralds a promising avenue for future research, suggesting that such methodological innovations could significantly reduce the manual labor associated with medical image annotation, thus streamlining the pathway from experimental research to real-world clinical applications.

In summary, this paper eloquently bridges self-supervised learning from traditional computer vision to the highly specialized field of digital pathology, with empirical validations and methodological rigor that equip researchers with a scalable, annotation-light framework for advancing histopathological imaging technology.

Authors (3)
  1. Ozan Ciga
  2. Tony Xu
  3. Anne L. Martel
Citations (275)