Self-Supervised Driven Consistency Training for Annotation Efficient Histopathology Image Analysis
The paper "Self-supervised driven consistency training for annotation efficient histopathology image analysis" addresses the challenge of limited labeled data in histopathology image analysis by introducing a novel framework that utilizes both self-supervised and semi-supervised learning paradigms. The primary motivation is to alleviate the burdensome task of acquiring extensive manual annotations, which is labor-intensive and necessitates domain expertise, particularly in histopathology. The research leverages readily available unlabeled data to enhance model performance in scenarios where labeled data is scarce.
Methodology
The proposed methodology comprises two key strategies:
- Self-Supervised Learning (SSL): The work introduces a pretext task called Resolution Sequence Prediction (RSP), designed to exploit the multi-resolution pyramid of histology whole-slide images. The model is trained to predict the ordering of a sequence of image patches sampled at different resolutions, which encourages it to learn robust feature representations without any labels. The learned features thus capture both the contextual and the fine-grained information inherent to the hierarchical structure of histopathology images (a minimal sketch of such an order-prediction task is given after this list).
- Semi-Supervised Consistency Training: A teacher-student framework transfers the self-supervised representations to downstream tasks. A teacher network, initialized from the SSL-pretrained model, generates pseudo labels for unlabeled data, and a student network is trained so that its predictions on strongly augmented views remain consistent with the teacher's predictions on the same images. This consistency regularization exploits both labeled and unlabeled data to improve task-specific learning, particularly when the labeled set is small (a second sketch after this list illustrates one such training step).
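To make the RSP idea concrete, below is a minimal PyTorch sketch of an order-prediction pretext task in this spirit: sequences of patches taken at several resolutions around the same tissue location are shuffled, and the network classifies which ordering it was shown. The ResNet-18 encoder, the sequence length of three resolutions, and the permutation-classification formulation are illustrative assumptions rather than details confirmed from the paper, and whole-slide-image patch extraction is omitted.

```python
import itertools
import torch
import torch.nn as nn
import torchvision.models as models

# Assumed setup: sequences of L patches taken at L different resolutions
# around the same tissue location. The pretext task is to predict which
# permutation (ordering) of resolutions the sequence was presented in.
NUM_RESOLUTIONS = 3
ORDERINGS = list(itertools.permutations(range(NUM_RESOLUTIONS)))  # 6 ordering classes

class ResolutionSequencePredictor(nn.Module):
    """Shared CNN encoder over each patch; concatenated features -> ordering class."""
    def __init__(self, feat_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                      # 512-d feature per patch
        self.encoder = backbone
        self.classifier = nn.Linear(feat_dim * NUM_RESOLUTIONS, len(ORDERINGS))

    def forward(self, patch_seq):                        # (B, L, 3, H, W)
        b, l = patch_seq.shape[:2]
        feats = self.encoder(patch_seq.flatten(0, 1))    # (B*L, 512)
        feats = feats.view(b, l * feats.shape[-1])       # (B, L*512)
        return self.classifier(feats)                    # (B, num_orderings)

def make_rsp_batch(multires_patches):
    """multires_patches: (B, L, 3, H, W), ordered low -> high resolution.
    Shuffle each sequence and return the permutation index as the label."""
    b = multires_patches.shape[0]
    labels = torch.randint(len(ORDERINGS), (b,))
    shuffled = torch.stack(
        [multires_patches[i, list(ORDERINGS[labels[i].item()])] for i in range(b)]
    )
    return shuffled, labels

# One pretext training step (sketch)
model = ResolutionSequencePredictor()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

patches = torch.randn(4, NUM_RESOLUTIONS, 3, 224, 224)   # stand-in for real WSI patches
inputs, labels = make_rsp_batch(patches)
loss = criterion(model(inputs), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```

After pretraining, the encoder weights (here `model.encoder`) are what would be carried over to initialize the downstream teacher-student stage.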
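The second sketch shows one consistency-training step under common assumptions: the teacher starts from the SSL-pretrained weights and is updated as an exponential moving average of the student (a Mean Teacher-style choice assumed here, not confirmed from the paper), the consistency term is a mean-squared error between softened predictions, and a classification head is used (for a regression task such as BreastPathQ the head and losses would change). The checkpoint path and hyperparameters are hypothetical.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

def build_model(num_classes, ssl_checkpoint=None):
    """Task head on top of an encoder; optionally load SSL-pretrained weights.
    `ssl_checkpoint` is a hypothetical path to the RSP-pretrained encoder."""
    net = models.resnet18(weights=None)
    if ssl_checkpoint is not None:
        net.load_state_dict(torch.load(ssl_checkpoint), strict=False)
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net

student = build_model(num_classes=9)           # e.g. 9 tissue types (Kather)
teacher = copy.deepcopy(student)               # teacher starts from the same SSL weights
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

def train_step(x_lab, y_lab, x_unlab_weak, x_unlab_strong,
               lambda_cons=1.0, ema_decay=0.99):
    # Supervised loss on the small labeled set.
    sup_loss = F.cross_entropy(student(x_lab), y_lab)

    # Teacher produces soft pseudo labels on weakly augmented unlabeled images;
    # the student must match them on strongly augmented views of the same images.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x_unlab_weak), dim=1)
    student_probs = F.softmax(student(x_unlab_strong), dim=1)
    cons_loss = F.mse_loss(student_probs, teacher_probs)

    loss = sup_loss + lambda_cons * cons_loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()

    # EMA update of the teacher (a Mean Teacher-style choice, assumed here).
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(ema_decay).add_(s, alpha=1.0 - ema_decay)
    return loss.item()
```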
Experimental Evaluation
The efficacy of the proposed method is validated on three histopathology benchmark datasets: BreastPathQ, Camelyon16, and Kather multiclass. The experiments span regression and classification tasks, including tumor cellularity quantification, tumor metastasis detection, and tissue type classification. The results show that the proposed methodology yields significant improvements over state-of-the-art self-supervised and supervised baselines, particularly when annotations are limited.
- BreastPathQ Dataset: The approach achieves an intraclass correlation coefficient (ICC) that exceeds previously reported baselines, demonstrating its efficacy in quantifying tumor cellularity with minimal annotations.
- Camelyon16 Dataset: The method's performance in detecting metastases at the slide level is comparable to leading fully-supervised models trained on large annotated datasets, achieving a competitive area under the curve (AUC) with considerably fewer labeled samples.
- Kather Multiclass Dataset: The framework shows strong generalization, achieving state-of-the-art accuracy in predicting tissue types across domains with different histological structures, thereby validating the transferability of the pretrained features.
Implications and Future Directions
This research presents a significant step towards reducing the dependency on large annotated datasets in computational histopathology, thereby enhancing the feasibility of deploying deep learning models in clinical settings. The novel integration of self-supervised and semi-supervised learning strategies not only enriches the feature representation but also improves the adaptability of models to new tasks with limited labeled data.
Future work could explore tighter integration of contrastive learning with task-specific pretext tasks to improve feature invariance across diverse datasets. Understanding how well such models generalize across different types of histopathology datasets remains an open question, and progress there could contribute to universal feature encoders applicable to a wide range of medical image analysis problems.