- The paper introduces a novel dual-stream MIL network that improves whole slide image classification and tumor localization using a trainable distance metric.
- It leverages self-supervised contrastive learning to produce robust patch representations, mitigating challenges from unbalanced instance bags.
- Experimental results on Camelyon16 and TCGA datasets show near fully-supervised accuracy and enhanced localization compared to traditional MIL methods.
Dual-stream Multiple Instance Learning Network for Whole Slide Image Classification
The paper presents a methodology for classifying Whole Slide Images (WSIs) with a Multiple Instance Learning (MIL) approach enhanced by self-supervised contrastive learning. WSIs are extremely high-resolution images that typically carry only slide-level labels rather than localized annotations, which complicates their use in automated disease detection. The research introduces a new MIL framework, the Dual-stream Multiple Instance Learning Network (DSMIL), which demonstrates robust classification and tumor localization without the need for detailed annotations.
Methodological Innovations
The research proposes a threefold approach. Firstly, DSMIL employs a MIL aggregator that operates within a dual-stream architecture and incorporates a trainable distance measurement to evaluate relations between instances, advancing beyond traditional attention-based MIL models. The first stream scores instances individually and max-pools those scores to identify the critical instance; the second stream weights every instance by its learned distance to that critical instance before classifying the bag, addressing the inadequacies of simple aggregation methods such as max-pooling (see the sketch below).
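A minimal PyTorch sketch of such a dual-stream aggregator, in the spirit of DSMIL; the layer sizes, the scaled inner-product similarity, and the binary-classification setting are illustrative assumptions rather than the authors' exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamAggregator(nn.Module):
    """Sketch of a dual-stream MIL aggregator: stream 1 scores each
    instance and max-pools to find the critical instance; stream 2
    weights all instances by a trainable similarity to that instance
    before bag-level classification."""

    def __init__(self, feat_dim=512, query_dim=128):
        super().__init__()
        self.instance_classifier = nn.Linear(feat_dim, 1)   # stream 1
        self.query = nn.Linear(feat_dim, query_dim)         # trainable distance
        self.value = nn.Linear(feat_dim, feat_dim)
        self.bag_classifier = nn.Linear(feat_dim, 1)        # stream 2

    def forward(self, feats):                 # feats: (num_patches, feat_dim)
        inst_scores = self.instance_classifier(feats)        # (N, 1)
        crit = feats[inst_scores.argmax()]                   # critical instance
        q = self.query(feats)                                # (N, Q)
        q_crit = self.query(crit)                            # (Q,)
        # similarity of each instance to the critical one -> attention weights
        attn = F.softmax(q @ q_crit / q.shape[1] ** 0.5, dim=0)   # (N,)
        bag_feat = attn @ self.value(feats)                  # (feat_dim,)
        return self.bag_classifier(bag_feat), inst_scores
```

The instance-level scores from the first stream double as a heat map over patches, which is what enables tumor localization without patch-level labels.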
Secondly, the authors pretrain the patch feature extractor with self-supervised contrastive learning. This yields robust patch representations without any labels, mitigating the issues posed by the large, often unbalanced instance bags of WSIs, and it allows features to be computed once and fixed, alleviating the memory constraints typically associated with end-to-end MIL training (a sketch of the contrastive objective follows).
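The pretraining stage can be summarized by the standard NT-Xent objective used by SimCLR-style contrastive methods; the sketch below assumes projected embeddings `z1` and `z2` for two augmented views of the same batch of patches (the batch handling is an illustrative assumption, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (SimCLR-style) contrastive loss.
    z1, z2: (batch, dim) projections of two augmented views of the same patches."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2B, dim), unit-norm rows
    sim = z @ z.T / temperature                   # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))             # exclude self-similarity
    b = z1.shape[0]
    # the positive for view i of patch k is the other view of patch k
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(b)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

Once pretrained, the encoder can be frozen and patch features precomputed, so the MIL stage only ever handles compact feature vectors rather than raw gigapixel images.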
Thirdly, DSMIL utilizes a pyramidal fusion of multiscale features: patch features computed at different magnifications are combined so that tissue characteristics from the cellular to the macroscopic scale all inform the bag representation, thereby boosting classification accuracy and the precision of tumor localization (see the fusion sketch below).
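One simple way to realize such a fusion, sketched under the assumption that each high-magnification patch records the index of the low-magnification patch containing it (`parent_idx` is hypothetical bookkeeping for this illustration, not an identifier from the paper):

```python
import torch

def pyramidal_fusion(feats_20x, feats_5x, parent_idx):
    """Concatenate each high-magnification patch feature with the feature
    of the lower-magnification patch that spatially contains it.
    feats_20x:  (N, D) high-magnification patch features
    feats_5x:   (M, D) low-magnification patch features
    parent_idx: (N,) index into feats_5x of each 20x patch's parent"""
    return torch.cat([feats_20x, feats_5x[parent_idx]], dim=1)  # (N, 2D)
```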
Experimental Results
DSMIL was evaluated on two prominent WSI datasets: Camelyon16 and TCGA lung cancer. The results show that DSMIL outperformed traditional MIL methods, with classification accuracies closely approaching those of fully-supervised techniques: the accuracy gap was less than 2%. On Camelyon16, DSMIL also improved localization over previous MIL models, as evidenced by a higher free-response receiver operating characteristic (FROC) score. On the TCGA dataset, DSMIL maintained superior accuracy in distinguishing between lung cancer subtypes compared with traditional methodologies.
Implications and Future Developments
The DSMIL framework presents a potent tool for weakly supervised WSI classification, indicating promising applications in computational pathology. By reducing the dependency on localized annotations, DSMIL aligns with the practical needs of clinical environments where exhaustive annotations are infeasible. Its integration of self-supervised learning marks a significant stride towards more efficient training of MIL models, offering a scalable solution adaptable to other imaging domains.
Future research directions could explore the refinement of self-supervised strategies tailored to the nuances of histopathological images. Additionally, incorporating spatial relationship modeling could further enhance macroscale feature capture, potentially yielding even higher classification and localization precision. DSMIL exemplifies a significant advancement in leveraging weak supervision and multi-scale analysis within the scope of automated pathology, paving the way for its broader adoption in medical image analytics.