Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology (2402.17228v4)

Published 27 Feb 2024 in cs.CV

Abstract: Multiple instance learning (MIL) is the most widely used framework in computational pathology, encompassing sub-typing, diagnosis, prognosis, and more. However, the existing MIL paradigm typically requires an offline instance feature extractor, such as a pre-trained ResNet or a foundation model. This approach lacks the capability for feature fine-tuning within the specific downstream tasks, limiting its adaptability and performance. To address this issue, we propose a Re-embedded Regional Transformer (R$^2$T) for re-embedding the instance features online, which captures fine-grained local features and establishes connections across different regions. Unlike existing works that focus on pre-training powerful feature extractors or designing sophisticated instance aggregators, R$^2$T is tailored to re-embed instance features online. It serves as a portable module that can seamlessly integrate into mainstream MIL models. Extensive experimental results on common computational pathology tasks validate that: 1) feature re-embedding improves the performance of MIL models based on ResNet-50 features to the level of foundation model features, and further enhances the performance of foundation model features; 2) the R$^2$T can introduce more significant performance improvements to various MIL models; 3) R$^2$T-MIL, as an R$^2$T-enhanced AB-MIL, outperforms other latest methods by a large margin. The code is available at: https://github.com/DearCaat/RRT-MIL.

Summary

The paper introduces the R²T module, which re-embeds instance features online within MIL and lifts models built on ResNet-50 features to the performance level of foundation model features.
It integrates regional and cross-region self-attention mechanisms to enhance the extraction and fusion of fine-grained features from pathology images.
Experimental results on datasets like CAMELYON-16, TCGA-BRCA, LUAD, and LUSC show significant improvements in accuracy, AUC, F1-score, and C-index.

Introduction

The paper introduces a novel approach for enhancing the performance of Multiple Instance Learning (MIL) models in computational pathology. Computational pathology merges pathology, image analysis, and computer science to analyze pathological images effectively. Traditional MIL paradigms employ offline feature extractors like ResNet, which lack the adaptability required for specific pathology tasks as they do not allow feature fine-tuning for downstream applications.

Re-Embedded Regional Transformer (R$^2$T)

Concept and Design

The proposed Re-embedded Regional Transformer (R$^2$T) re-embeds instance features to make the features used within MIL frameworks more discriminative. The module captures fine-grained local features within individual regions of a whole slide image (WSI) and then connects them across regions.

Figure 1: Overview of proposed R$^2$T-MIL. This method processes tissue patches through region partition, feature re-embedding, and cross-region fusion to finalize MIL model predictions.

Architecture Components

Regional Multi-head Self-attention (R-MSA): This component conducts localized feature extraction by dividing slides into multiple regions and computing self-attention within each region separately. It reduces computation while highlighting salient local features.
Cross-region Multi-head Self-attention (CR-MSA): Facilitates the fusion of features across regions to improve the comprehensiveness of the instance features.
Embedded Position Encoding Generator (EPEG): Enhances feature encoding by embedding positional information directly into the MSA, providing a more efficient encoding solution compared to traditional positional encodings.
Figure 2: Illustration of Embedded Position Encoding Generator.
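
The regional attention idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes a single attention head, identity query/key/value projections, contiguous 1-D regions of a fixed size, and a residual connection as is typical in Transformer blocks. The function names `self_attention` and `regional_self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """Single-head scaled dot-product attention with identity Q/K/V (assumption)."""
    dim = len(feats[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in feats:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in feats]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, feats))
                    for d in range(dim)])
    return out


def regional_self_attention(feats, region_size):
    """Apply attention inside each contiguous region, then add a residual."""
    out = []
    for start in range(0, len(feats), region_size):
        region = feats[start:start + region_size]
        attended = self_attention(region)
        # Residual connection: re-embedded feature = original + attended.
        out.extend([[x + y for x, y in zip(f, a)]
                    for f, a in zip(region, attended)])
    return out


# Four 2-D instance features split into two regions of two instances each.
feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 5.0]]
re_embedded = regional_self_attention(feats, region_size=2)
```

With N instances and regions of M instances each, this scheme computes roughly N × M attention scores instead of the N × N needed for global attention, which is where the reduction in computation comes from.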

Methodology

The integration of R$^2$T into an MIL framework follows these steps:

Instance Feature Extraction: Features from each patch are initially extracted utilizing a pre-trained model.
Feature Re-embedding: Through R$^2$T, features are re-embedded, addressing the limitations of offline feature learning.
Aggregation and Bag Classification: Enhanced features are aggregated, and a bag-level prediction is made.

Because the re-embedding step is trained jointly with the aggregator and classifier, the instance features are adapted to the downstream task even though the offline extractor itself stays fixed, which improves the adaptability and performance of MIL on pathology datasets.
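
The three steps above can be sketched as follows. This is a hedged illustration in plain Python rather than the paper's code: the aggregator follows the standard attention-based pooling of AB-MIL (attention weight proportional to exp(w · tanh(V h))), the re-embedding step is passed in as a callable, and all weights (`V`, `w`, `clf_w`, `clf_b`) and function names are hypothetical placeholders that would normally be learned.

```python
import math


def attention_pool(feats, V, w):
    """AB-MIL-style pooling: a_k = softmax_k(w . tanh(V h_k)); bag = sum_k a_k h_k."""
    scores = []
    for h in feats:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(feats[0])
    bag = [sum(a * h[d] for a, h in zip(weights, feats)) for d in range(dim)]
    return bag, weights


def predict_bag(offline_feats, re_embed, V, w, clf_w, clf_b):
    """Step 1 output -> step 2 (re-embed) -> step 3 (aggregate and classify)."""
    feats = re_embed(offline_feats)              # online feature re-embedding
    bag, weights = attention_pool(feats, V, w)   # instance aggregation
    logit = sum(c * x for c, x in zip(clf_w, bag)) + clf_b
    prob = 1.0 / (1.0 + math.exp(-logit))        # binary bag-level probability
    return prob, weights


# Hypothetical toy inputs: two patches with 2-D offline features.
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, -1.0]
offline = [[2.0, 0.0], [0.0, 2.0]]
# Identity stands in for the re-embedding module in this toy call.
prob, attn = predict_bag(offline, lambda f: f, V, w, clf_w=[1.0, 1.0], clf_b=0.0)
```

Passing the re-embedding step as a callable mirrors the plug-in role described for the module: the aggregator and classifier are unchanged, and only the features they receive differ.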

Results

Benchmarks and Improvements

Experimental results demonstrate significant performance improvements over baseline methods across various datasets:

CAMELYON-16 and TCGA-BRCA: The R$^2$T-enhanced models show superior accuracy, AUC, and F1-score.
LUAD and LUSC prognosis tasks: The re-embedded features yield a marked increase in C-index, indicating better survival prediction.
Figure 3: Performance improvement by adding R$^2$T.
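
For readers unfamiliar with the C-index used in the prognosis tasks, the sketch below computes Harrell's concordance index in plain Python. It shows the standard definition only and is not necessarily the exact evaluation code used in the paper; the function name `concordance_index` and the toy data are illustrative assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] is truthy) and a strictly shorter time than patient j.
    The pair is concordant when the model assigns i the higher risk;
    tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: survival times, event indicators (1 = event observed), risk scores.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
risks = [3.0, 2.0, 1.0]
c_index = concordance_index(times, events, risks)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every pair is ranked in the wrong order.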

Ablation Studies

Detailed ablation studies highlight the efficacy of R$^2$T's components, particularly the necessity of feature re-embedding and the benefits of localized attention mechanisms in computational pathology scenarios.

Figure 4: The performances under different region partition strategies on two datasets.
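
To illustrate what a region partition strategy can look like, the sketch below contrasts two simple options: contiguous 1-D chunks of the instance sequence, and square blocks of a 2-D grid of patches. These are illustrative assumptions and not necessarily the strategies compared in Figure 4; the function names `partition_1d` and `partition_2d` are hypothetical.

```python
def partition_1d(n, region_size):
    """Group instance indices 0..n-1 into contiguous chunks."""
    return [list(range(s, min(s + region_size, n)))
            for s in range(0, n, region_size)]


def partition_2d(side, block):
    """Group a side x side row-major grid of patches into block x block regions."""
    regions = []
    for r0 in range(0, side, block):
        for c0 in range(0, side, block):
            regions.append([r * side + c
                            for r in range(r0, min(r0 + block, side))
                            for c in range(c0, min(c0 + block, side))])
    return regions


# 16 patches: four chunks of 4 versus four 2 x 2 blocks on a 4 x 4 grid.
chunks = partition_1d(16, 4)
blocks = partition_2d(4, 2)
```

Both strategies produce regions of the same size here, but the 2-D blocks group patches that are adjacent both horizontally and vertically, whereas the 1-D chunks only follow the order of the sequence.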

Conclusion

The R$^2$T module significantly enhances feature adaptability and discriminability in MIL frameworks for computational pathology, pushing the performance to levels comparable to models pre-trained on extensive datasets. This research suggests a new paradigm for integrating feature re-embedding within MIL models, showcasing the potential of the R$^2$T module in varied computational pathology tasks, extending beyond diagnosis to survival predictions and subtype classification.

The results support the hypothesis that localized, attention-based feature re-embedding can mitigate the shortcomings of traditional MIL feature extraction pipelines, marking a step towards more efficient and interpretable computational pathology applications.