Cluster-to-Conquer: A Framework for End-to-End Multi-Instance Learning for Whole Slide Image Classification

Published 19 Mar 2021 in eess.IV, cs.CV, and cs.LG | (2103.10626v2)

Abstract: In recent years, the availability of digitized Whole Slide Images (WSIs) has enabled the use of deep learning-based computer vision techniques for automated disease diagnosis. However, WSIs present unique computational and algorithmic challenges. WSIs are gigapixel-sized ($\sim$100K pixels), making them infeasible to be used directly for training deep neural networks. Also, often only slide-level labels are available for training as detailed annotations are tedious and can be time-consuming for experts. Approaches using multiple-instance learning (MIL) frameworks have been shown to overcome these challenges. Current state-of-the-art approaches divide the learning framework into two decoupled parts: a convolutional neural network (CNN) for encoding the patches followed by an independent aggregation approach for slide-level prediction. In this approach, the aggregation step has no bearing on the representations learned by the CNN encoder. We have proposed an end-to-end framework that clusters the patches from a WSI into ${k}$-groups, samples ${k}'$ patches from each group for training, and uses an adaptive attention mechanism for slide level prediction; Cluster-to-Conquer (C2C). We have demonstrated that dividing a WSI into clusters can improve the model training by exposing it to diverse discriminative features extracted from the patches. We regularized the clustering mechanism by introducing a KL-divergence loss between the attention weights of patches in a cluster and the uniform distribution. The framework is optimized end-to-end on slide-level cross-entropy, patch-level cross-entropy, and KL-divergence loss (Implementation: https://github.com/YashSharma/C2C).

Summary

The paper introduces the Cluster-to-Conquer framework that unifies patch clustering and adaptive attention for enhanced whole slide image classification.
It employs clustering-based sampling and attention-based pooling with KL-divergence to capture diverse, discriminative features under weak supervision.
Empirical results show that C2C achieves competitive performance on gastrointestinal and breast cancer datasets, highlighting its clinical potential.

Cluster-to-Conquer: A Novel Framework for Whole Slide Image Classification

The paper introduces "Cluster-to-Conquer (C2C)", a framework for Whole Slide Image (WSI) classification aimed at automated, histopathology-based disease diagnosis. A major obstacle to applying deep learning to WSIs is their gigapixel scale: a slide is too large to feed to a network directly and demands substantial computation and storage. In addition, WSIs usually come with slide-level labels only, not pixel-level or patch-level annotations, which complicates model training.

The authors design C2C to optimize WSI classification end-to-end within a multiple-instance learning (MIL) setting. Traditional two-stage MIL pipelines train a patch encoder and then fit an independent aggregation step, so the aggregation has no influence on the representations the encoder learns. C2C trains both stages jointly, so the slide-level objective also shapes the patch representations.

The core contributions of the C2C framework are as follows:

Clustering-Based Sampling for Patch Diversity: By clustering patches within individual WSIs, the model ensures that a diverse set of image patches is sampled for model training. This clustering strategy promotes exposure to varied discriminative features that might be overlooked when using fixed or random sampling strategies.
Adaptive Attention for Aggregation: Employing attention-based pooling methods, C2C aggregates patch representations, focusing on significant regions of the WSI that are crucial for accurate predictions. This mechanism assigns varied importance to different patches, driven by the adaptive attention module, refining the decision-making process at the slide level.
Regularization via KL-Divergence: A KL-divergence term in the loss function acts as a regularizer that pushes the attention weights of patches within the same cluster toward a uniform distribution. This discourages the model from concentrating attention on a single patch and keeps the discriminative regions within a cluster evenly represented.
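The KL regularizer described above can be illustrated with a small, framework-free sketch. It is not taken from the authors' implementation: the function name `cluster_attention_kl`, the choice to renormalize attention inside each cluster, the direction KL(p || uniform), and the averaging over clusters are all assumptions made for illustration.

```python
import math


def cluster_attention_kl(attention_weights, cluster_ids):
    """Average over clusters of KL(p_c || uniform).

    p_c is the attention of the patches in cluster c, renormalized to sum to 1.
    Assumption: non-negative attention weights; KL direction chosen for illustration.
    """
    # Group the attention weights by the cluster each patch belongs to.
    by_cluster = {}
    for weight, cid in zip(attention_weights, cluster_ids):
        by_cluster.setdefault(cid, []).append(weight)

    total = 0.0
    for weights in by_cluster.values():
        n = len(weights)
        z = sum(weights)
        if z <= 0:
            continue  # a cluster with no attention mass contributes nothing
        # With u_i = 1/n, KL(p || u) = sum_i p_i * log(p_i * n).
        for w in weights:
            p = w / z
            if p > 0:
                total += p * math.log(p * n)
    return total / len(by_cluster)


# Evenly spread attention inside each cluster gives a penalty of zero,
# while attention concentrated on one patch is penalized.
even = cluster_attention_kl([0.25, 0.25, 0.1, 0.1], [0, 0, 1, 1])
peaked = cluster_attention_kl([0.9, 0.1], [0, 0])
```

The penalty is zero exactly when every patch in a cluster receives the same share of that cluster's attention, which is the behavior the regularizer is meant to encourage.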

Methodological Insights

The framework uses a ResNet-based convolutional neural network encoder to generate a feature representation for each patch. K-means clustering groups the patches of a slide into $k$ clusters, and $k'$ patches are sampled from each cluster for training. The attention mechanism then scores each sampled patch representation, and the attention-weighted slide representation is passed to a slide-level classifier trained with a cross-entropy loss.
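As a rough illustration of the sampling and pooling steps, the sketch below takes cluster assignments as given (for example from k-means on patch embeddings), samples up to `k_prime` patches per cluster, and pools embeddings with softmax attention. The function names and the use of plain Python lists are hypothetical. In the paper the attention scores come from a learned module; here they are simply passed in as numbers.

```python
import math
import random


def sample_per_cluster(cluster_ids, k_prime, rng):
    """Return patch indices, taking at most k_prime from every cluster."""
    members = {}
    for idx, cid in enumerate(cluster_ids):
        members.setdefault(cid, []).append(idx)
    chosen = []
    for cid in sorted(members):
        pool = members[cid]
        # Small clusters contribute all of their patches.
        chosen.extend(rng.sample(pool, min(k_prime, len(pool))))
    return chosen


def attention_pool(embeddings, scores):
    """Softmax the scores and return (weights, attention-weighted embedding)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(embeddings[0])
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return weights, pooled


# Six patches in three clusters; ask for two patches per cluster.
picked = sample_per_cluster([0, 0, 0, 1, 1, 2], k_prime=2, rng=random.Random(0))

# Equal scores reduce attention pooling to a plain average.
weights, slide_vector = attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
```

Sampling from every cluster, instead of uniformly over the whole slide, is what guarantees that each training batch contains patches from all $k$ groups.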

The study also adds weak supervision through a patch-level classifier. This compensates for the lack of patch annotations by relying on the standard MIL assumption: a negatively labeled slide contains only non-disease patches, whereas a positively labeled slide contains at least one diseased patch.
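The abstract states that training combines slide-level cross-entropy, patch-level cross-entropy, and the KL-divergence term. A minimal sketch of such a combined objective for the binary case follows. Several parts are assumptions and not details confirmed by the source: the name `c2c_style_loss`, the weights `alpha`, `beta`, and `gamma` (placeholder values), and the choice to give every sampled patch the label of its slide as the weak patch-level target.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def c2c_style_loss(slide_prob, patch_probs, slide_label, kl_term,
                   alpha=1.0, beta=0.01, gamma=0.01):
    """Weighted sum of slide-level CE, patch-level CE, and a KL penalty.

    alpha, beta, gamma are placeholder weights, not values from the paper.
    """
    slide_loss = binary_cross_entropy(slide_prob, slide_label)
    # Assumption: each sampled patch inherits the slide label as a weak target.
    patch_loss = sum(
        binary_cross_entropy(p, slide_label) for p in patch_probs
    ) / len(patch_probs)
    return alpha * slide_loss + beta * patch_loss + gamma * kl_term


# A positive slide: confident, correct predictions yield the lower loss.
good = c2c_style_loss(0.9, [0.8, 0.7], slide_label=1, kl_term=0.0)
bad = c2c_style_loss(0.1, [0.2, 0.3], slide_label=1, kl_term=0.0)
```

Inherited patch labels are exact for negative slides under the MIL assumption but noisy for positive slides, which is one reason to give the patch term a smaller weight in a sketch like this.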

Empirical Evaluation

C2C was evaluated on multiple datasets, including gastrointestinal biopsy slides for celiac disease diagnosis and the CAMELYON16 dataset for breast cancer metastasis detection. On the gastrointestinal dataset, C2C matched or exceeded the performance of state-of-the-art weakly supervised methods such as Campanella-MIL, which, like C2C, are trained from slide-level labels and do not use detailed annotations. On CAMELYON16, C2C achieved a competitive ROC-AUC, remaining close to fully supervised approaches despite being trained under weak supervision.

The qualitative review by medical experts indicated that high-attention clusters identified by the C2C model aligned well with pathologically significant regions, corroborating the model's capability to discern clinically relevant features autonomously.

Implications and Future Prospects

C2C addresses two practical obstacles in computational pathology: the enormous scale of WSIs and the scarcity of detailed labels. By combining clustering-based sampling and attention-based aggregation in a single end-to-end model, it shows that slide-level labels alone can be enough to train a classifier whose attention maps remain interpretable to pathologists.

Anticipated future developments may involve extending the framework to multiclass and subtype classification problems, optimizing computational efficiencies, and exploring the framework's applicability across varied disease pathologies. Each of these prospective endeavours underscores the framework's potential to enhance diagnostic precision and reliability in clinical settings.

In summary, "Cluster-to-Conquer" provides a compelling solution to WSI classification challenges, advancing automated histopathology and presenting broader possibilities for integration within the clinical diagnostic process.