- The paper introduces SimCVD, a simple contrastive distillation framework for semi-supervised medical image segmentation that enhances accuracy with limited labeled data.
- SimCVD employs boundary-aware contrastive learning using signed distance maps and structural distillation to effectively capture geometric and semantic information.
- Experimental results show SimCVD significantly outperforms state-of-the-art methods on the LA and NIH pancreas datasets, achieving higher Dice scores with sparse labeling.
SimCVD: Simple Contrastive Voxel-Wise Representation Distillation for Semi-Supervised Medical Image Segmentation
This paper introduces SimCVD, a novel framework designed to improve the accuracy of semi-supervised medical image segmentation when labeled data is scarce. The primary motivation is to address the limitations of existing semi-supervised methods, which often lack the robustness of fully supervised models and fail to exploit geometric and semantic information effectively, leading to suboptimal segmentation accuracy.
Key Contributions and Methodology
SimCVD proposes a simple contrastive distillation framework that advances state-of-the-art voxel-wise representation learning. The framework operates on a mean-teacher architecture, which consists of two networks: a student and a teacher. The student network learns from both labeled and unlabeled data, while the teacher network parameters are updated as an exponential moving average of the student network's parameters. This strategy has been proven effective in previous work for improving training stability and final performance.
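The EMA update for the teacher can be sketched in a few lines. This is a minimal illustration, not SimCVD's actual implementation: the decay rate `alpha` (0.99 here, a common choice) and the list-of-floats parameter representation are assumptions for clarity.

```python
def ema_update(teacher_params, student_params, alpha=0.99):
    """Update each teacher parameter as an exponential moving average
    of the corresponding student parameter.

    alpha close to 1.0 means the teacher changes slowly, smoothing out
    noisy per-step student updates. The 0.99 default is illustrative.
    """
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]
```

Repeated calls move the teacher gradually toward the student, which is what gives the teacher its stability relative to any single training step.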
Key Aspects of SimCVD:
- Boundary-aware Contrastive Learning: SimCVD predicts signed distance maps (SDMs) of object boundaries from two views generated by independent dropout masks, which helps avoid representation collapse. This dropout mechanism acts as a minimal form of data augmentation, enabling robust representation learning with far less labeled data.
- Structural Distillation: To mitigate the loss of geometric information, SimCVD distills pair-wise voxel similarities from the teacher network to the student. Combined with jointly predicting segmentation maps and distance maps for labeled data, this enforces a global shape constraint and enables the model to better capture boundary-aware features.
- Unified Loss Function: The overall training objective combines supervised segmentation and SDM losses on labeled data with contrastive, pair-wise distillation, and consistency losses on unlabeled data. This multi-term objective ensures that both global and local structure in the data are effectively exploited.
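The contrastive term over two dropout views can be sketched as a voxel-wise InfoNCE loss: each feature vector in one view is pulled toward the vector at the same position in the other view, with all other positions serving as negatives. The following pure-Python sketch uses cosine similarity and a temperature of 0.1; these are standard choices for illustration, not necessarily SimCVD's exact formulation.

```python
import math

def info_nce(z1, z2, temperature=0.1):
    """Voxel-wise InfoNCE between two views z1 and z2 (lists of feature
    vectors). The positive pair for z1[i] is z2[i]; every other vector
    in z2 acts as a negative."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cos(a, b):
        # Guard against zero-norm vectors with a fallback norm of 1.0.
        na = math.sqrt(dot(a, a)) or 1.0
        nb = math.sqrt(dot(b, b)) or 1.0
        return dot(a, b) / (na * nb)

    losses = []
    for i, anchor in enumerate(z1):
        logits = [cos(anchor, other) / temperature for other in z2]
        # -log softmax probability of the positive (index-matched) pair.
        log_denom = math.log(sum(math.exp(l) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

When the two views agree at matched positions the loss is near zero; when matched positions are dissimilar, the loss grows, pushing the encoder toward dropout-invariant voxel representations.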
Results and Evaluation
The paper presents comprehensive experimental results on two benchmark datasets: the Left Atrial Segmentation Challenge (LA) dataset and the NIH pancreas CT dataset. The SimCVD framework demonstrates significant improvements over state-of-the-art methods in semi-supervised segmentation tasks:
- When tested on the LA dataset with 20% and 10% labeled data, SimCVD achieved Dice scores of 90.85% and 89.03%, representing improvements of 0.91% and 2.22% over the previous best methods.
- The generalizability of SimCVD was further validated on the pancreas dataset, where it surpassed existing techniques with up to a 6.72% increase in Dice performance.
Implications and Future Work
Practically, the SimCVD framework offers a robust solution for medical image segmentation in scenarios where labeled data is limited. By leveraging its contrastive distillation mechanism, SimCVD reduces reliance on large annotated datasets, mitigating one of the major obstacles to applying deep learning in medicine.
Theoretically, this framework highlights the potential of employing dropout as a data augmentation tool in contrastive learning and paves the way for further exploration into incorporating additional geometric constraints in semi-supervised learning models.
Looking ahead, future developments could explore extending SimCVD to handle multi-class segmentation tasks, refining the architecture for various medical imaging modalities, or integrating the framework with more complex data augmentation strategies to further improve model robustness and accuracy. The framework's foundational ideas also open avenues for its application in other domains where labeled data is scarce, yet unlabeled data is abundant.