DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification (2203.12081v1)

Published 22 Mar 2022 in cs.CV, cs.AI, and cs.LG

Abstract: Multiple instance learning (MIL) has been increasingly used in the classification of histopathology whole slide images (WSIs). However, MIL approaches for this specific classification problem still face unique challenges, particularly those related to small sample cohorts. In these cohorts, there is a limited number of WSI slides (bags), while the resolution of a single WSI is huge, which leads to a large number of patches (instances) cropped from each slide. To address this issue, we propose to virtually enlarge the number of bags by introducing the concept of pseudo-bags, on which a double-tier MIL framework is built to make effective use of the intrinsic features. In addition, we derive the instance probability under the framework of attention-based MIL, and use the derivation to help construct and analyze the proposed framework. The proposed method outperforms other recent methods on CAMELYON-16 by substantial margins, and also performs better on the TCGA lung cancer dataset. The proposed framework is ready to be extended to wider MIL applications. The code is available at: https://github.com/hrzhang1123/DTFD-MIL


Overview of "DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification"

The paper presents Double-Tier Feature Distillation Multiple Instance Learning (DTFD-MIL), a framework for classifying histopathology whole slide images (WSIs) with multiple instance learning (MIL). A single WSI is so large that standard machine learning techniques developed for ordinary-sized images cannot be applied to it directly; instead, each slide is cropped into many patches that serve as the instances of a slide-level bag. DTFD-MIL targets the imbalance typical of histopathology cohorts: few slides (bags) are available for training, while each slide yields a very large number of instances.

Key Contributions and Methodology

Introduction of Pseudo-Bags: To compensate for the small number of training slides, the paper introduces 'pseudo-bags'. The instances of each slide are divided into several pseudo-bags, and every pseudo-bag inherits the label of its parent slide. This virtually enlarges the number of training bags without requiring any additional slides.
Double-Tier MIL Framework: The model stacks two MIL tiers. In the first tier, an attention-based MIL (AB-MIL) model processes each pseudo-bag independently, and a feature is distilled from every pseudo-bag. In the second tier, another AB-MIL model aggregates the distilled features of all pseudo-bags from a slide to produce the final slide-level prediction.
Instance Probability Derivation: The paper derives instance probabilities within the AB-MIL framework, countering the earlier assumption that instance-level probabilities cannot be extracted directly from AB-MIL. The derivation builds on the Grad-CAM mechanism, and the authors use the resulting probabilities to help construct and analyze the double-tier framework, for example by identifying the instances in a slide that are most likely to be positive.
Comprehensive Evaluation: The framework is evaluated on two public benchmarks, CAMELYON-16 and the TCGA Lung Cancer dataset. DTFD-MIL outperforms existing state-of-the-art methods on both, supporting its effectiveness in the small-cohort setting that challenges conventional MIL approaches.
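The pipeline described above can be sketched in NumPy. This is a minimal, untrained illustration under several assumptions that do not come from the summary: instance features are precomputed patch embeddings, pseudo-bags are formed by a random, roughly equal split, both AB-MIL heads use a simple non-gated attention module, and instance probabilities are estimated by applying a head's linear classifier weights to individual instance features. The two distillation options, `afs` (keep each pseudo-bag's attention-pooled feature) and `maxmin` (keep the instances with the highest and lowest positive-class probability), are modeled on strategies described in the paper, and all class and function names are hypothetical. It is a sketch, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def split_into_pseudo_bags(instances, num_pseudo_bags, rng):
    """Hypothetical helper: randomly partition one slide's instance features
    into pseudo-bags. Every pseudo-bag is trained with the parent slide's label."""
    perm = rng.permutation(len(instances))
    return [instances[idx] for idx in np.array_split(perm, num_pseudo_bags)]


class AttentionMIL:
    """Hypothetical minimal non-gated attention-MIL head with random, untrained weights."""

    def __init__(self, dim, hidden, num_classes, rng):
        self.V = rng.normal(scale=0.1, size=(dim, hidden))
        self.w = rng.normal(scale=0.1, size=hidden)
        self.W_cls = rng.normal(scale=0.1, size=(dim, num_classes))
        self.b_cls = np.zeros(num_classes)

    def forward(self, H):
        # Attention weights: softmax over instances of w^T tanh(V^T h_k).
        a = softmax(np.tanh(H @ self.V) @ self.w, axis=0)
        bag_feat = a @ H  # attention-pooled bag feature
        logits = bag_feat @ self.W_cls + self.b_cls
        return logits, a, bag_feat

    def instance_probs(self, H):
        # Assumed instance estimate: the classifier weights applied to each
        # instance feature, followed by a softmax over classes.
        return softmax(H @ self.W_cls, axis=1)


def dtfd_forward(instances, tier1, tier2, num_pseudo_bags, distill, rng):
    pseudo_logits, distilled = [], []
    for H in split_into_pseudo_bags(instances, num_pseudo_bags, rng):
        logits, _, bag_feat = tier1.forward(H)
        pseudo_logits.append(logits)  # tier-1 output, supervised with the slide label
        if distill == "afs":
            # Keep the pseudo-bag's attention-pooled feature.
            distilled.append(bag_feat[None, :])
        elif distill == "maxmin":
            # Keep the instances with the highest and lowest positive-class
            # probability (class index 1 is assumed to be "positive").
            p_pos = tier1.instance_probs(H)[:, 1]
            distilled.append(H[[p_pos.argmax(), p_pos.argmin()]])
        else:
            raise ValueError(f"unknown distillation mode: {distill}")
    distilled = np.concatenate(distilled, axis=0)
    slide_logits, _, _ = tier2.forward(distilled)  # tier 2 sees only distilled features
    return softmax(slide_logits), np.stack(pseudo_logits), distilled


rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 512))  # 1,000 synthetic patch embeddings
tier1 = AttentionMIL(512, 128, 2, rng)
tier2 = AttentionMIL(512, 128, 2, rng)
slide_probs, pseudo_logits, distilled = dtfd_forward(features, tier1, tier2, 4, "maxmin", rng)
print(slide_probs.shape, pseudo_logits.shape, distilled.shape)  # (2,) (4, 2) (8, 512)
```

Training is not shown: a loss would be applied both to each pseudo-bag's logits and to the slide logits, all against the slide label. Keeping the two tiers as separate heads means the second tier works on a compact, distilled summary of each pseudo-bag rather than on every patch of the slide.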

Results and Implications

Performance Results: On CAMELYON-16, the method outperformed several existing state-of-the-art models by substantial margins in AUC, accuracy, and F1 score. On the TCGA Lung Cancer dataset, the improvements were evident but smaller, since tumor regions in that dataset are relatively large, which makes the task less challenging.
Technical Implications: Deriving instance probabilities within AB-MIL makes the contribution of individual instances explicit. This gives the framework a principled basis for deciding which features to distill and improves interpretability, because the patches most likely to be positive can be localized within a slide. It may also help the model cope with noise and overfitting arising from the imbalance between the few informative instances and the many uninformative ones in each slide.
Potential for Wider Applications: The authors state that the framework is ready to be extended to wider MIL applications beyond histopathology, which makes it a promising direction for other problems with few labeled bags and many instances per bag.
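Why an AB-MIL model admits a per-instance probability can be illustrated with a short derivation. The sketch below assumes that the AB-MIL classifier is a single linear layer (weights w_c and bias b_c for class c) applied to the attention-pooled feature, and that the attention weights a_k are non-negative and sum to one; these are simplifying assumptions, and the paper's own Grad-CAM-based derivation may differ in detail.

```latex
% Instance features h_1, ..., h_K; attention weights a_k \ge 0 with \sum_k a_k = 1
F = \sum_{k=1}^{K} a_k \, h_k, \qquad s_c = w_c^{\top} F + b_c

% Because the classifier is linear and the weights sum to one:
s_c = \sum_{k=1}^{K} a_k \left( w_c^{\top} h_k + b_c \right)

% Grad-CAM-style view: the gradient of s_c with respect to F is w_c,
% so weighting each instance feature by it gives a per-instance logit
\frac{\partial s_c}{\partial F} = w_c
\quad\Longrightarrow\quad
\ell_k^{c} = w_c^{\top} h_k,
\qquad
\hat{p}_k^{c} = \frac{\exp(\ell_k^{c})}{\sum_{t} \exp(\ell_k^{t})}
```

Under these assumptions the bag logit is an attention-weighted average of per-instance logits (plus the shared bias), so a class probability for each instance can be read off an AB-MIL model without changing its architecture.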

Conclusion

DTFD-MIL represents a substantial advance in MIL techniques for WSI classification. Pseudo-bags, combined with a double-tier MIL architecture, offer a practical answer to the small-cohort, high-resolution datasets common in histopathology. The derivation of instance probabilities in AB-MIL further enables a more granular analysis of how individual instances contribute to bag-level predictions. Together, these contributions move toward more accurate, reliable, and interpretable models for medical diagnostics.


Authors (7)

Hongrun Zhang (9 papers)
Yanda Meng (18 papers)
Yitian Zhao (34 papers)
Yihong Qiao (3 papers)
Xiaoyun Yang (21 papers)
Yalin Zheng (22 papers)
Sarah E. Coupland (1 paper)

Citations (229)

