Semi-Supervised Medical Image Segmentation via Cross Teaching between CNN and Transformer (2112.04894v2)

Published 9 Dec 2021 in eess.IV and cs.CV

Abstract: Recently, deep learning with Convolutional Neural Networks (CNNs) and Transformers has shown encouraging results in fully supervised medical image segmentation. However, it is still challenging for them to achieve good performance with limited annotations for training. In this work, we present a very simple yet efficient framework for semi-supervised medical image segmentation by introducing the cross teaching between CNN and Transformer. Specifically, we simplify the classical deep co-training from consistency regularization to cross teaching, where the prediction of a network is used as the pseudo label to supervise the other network directly end-to-end. Considering the difference in learning paradigm between CNN and Transformer, we introduce the Cross Teaching between CNN and Transformer rather than just using CNNs. Experiments on a public benchmark show that our method outperforms eight existing semi-supervised learning methods just with a simpler framework. Notably, this work may be the first attempt to combine CNN and transformer for semi-supervised medical image segmentation and achieve promising results on a public benchmark. The code will be released at: https://github.com/HiLab-git/SSL4MIS.

Citations (168)

Summary

  • The paper's main contribution is a cross-teaching framework in which a CNN and a transformer supervise each other, using each network's predictions as pseudo labels for the other.
  • It employs a semi-supervised approach with limited annotated data, achieving a 3.8% DSC improvement and a 3.6 mm reduction in HD95 on the ACDC dataset.
  • The study demonstrates that combining CNNs and transformers can reduce annotation costs while enhancing segmentation accuracy in clinical applications.

Semi-Supervised Medical Image Segmentation via Cross Teaching between CNN and Transformer

This paper presents an innovative framework for semi-supervised medical image segmentation, leveraging the complementary strengths of Convolutional Neural Networks (CNNs) and transformers. The central proposition involves a mechanism termed "Cross Teaching," which facilitates cross-network supervision between two distinct types of architectures—CNNs and transformers—by using each network's predictions as pseudo labels for the other.

Technical Contributions

Medical image segmentation is a cornerstone of many clinical applications, yet a formidable challenge remains: the reliance on extensive pixel-level annotations. This motivates semi-supervised approaches that combine limited annotated data with abundant unlabeled data.

  1. Cross Teaching Framework: The framework distinguishes itself by proposing cross teaching instead of the typical consistency regularization. In essence, each network's predictive output is used to train the other, harnessing the strengths of both models: CNNs with their proficiency in local feature extraction, and transformers with their capability for modeling global dependencies (a minimal sketch of this objective follows the list).
  2. Comparison of Learning Paradigms: Previous efforts predominantly focused on CNNs for semi-supervised learning, largely ignoring transformers due to their data-intensive nature. The paper posits and demonstrates that a combined approach is not only feasible but also yields superior results.
  3. Implementation: The framework was evaluated on the ACDC dataset—a benchmark in medical image segmentation—with CNN-based UNet and transformer-based Swin-UNet architectures. It achieved notable performance improvements over eight contemporary semi-supervised approaches.
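To make the cross-teaching term concrete, here is a minimal PyTorch-style sketch. The module names (`cnn`, `transformer`), the fixed weight `lam`, and the plain cross-entropy losses are illustrative assumptions, not the authors' released SSL4MIS code; the paper additionally combines and schedules its loss terms in ways omitted here.

```python
import torch
import torch.nn.functional as F

def cross_teaching_step(cnn, transformer, labeled_imgs, labels, unlabeled_imgs, lam=0.3):
    """One training step of the cross-teaching idea (illustrative sketch).

    cnn / transformer: any two segmentation networks returning per-pixel logits
    labeled_imgs, labels: the small annotated batch
    unlabeled_imgs: the unlabeled batch
    lam: weight of the unsupervised (cross-teaching) term (assumed constant here)
    """
    # Supervised loss on the labeled batch for both networks.
    sup_loss = (
        F.cross_entropy(cnn(labeled_imgs), labels)
        + F.cross_entropy(transformer(labeled_imgs), labels)
    )

    # Forward passes on the unlabeled batch.
    logits_cnn = cnn(unlabeled_imgs)
    logits_trans = transformer(unlabeled_imgs)

    # Each network's hard prediction becomes the pseudo label for the other.
    # Pseudo labels are detached so no gradient flows into the "teacher".
    pseudo_cnn = logits_cnn.argmax(dim=1).detach()
    pseudo_trans = logits_trans.argmax(dim=1).detach()

    cross_loss = (
        F.cross_entropy(logits_cnn, pseudo_trans)    # transformer teaches the CNN
        + F.cross_entropy(logits_trans, pseudo_cnn)  # CNN teaches the transformer
    )

    return sup_loss + lam * cross_loss
```

The key detail is that each pseudo label carries no gradient, so every unsupervised term updates only the "student" network for that direction; this is what turns bidirectional consistency regularization into direct, end-to-end cross teaching.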

Experimental Evaluation

The experiments yielded compelling numerical results, where the Cross Teaching framework significantly outperformed traditional semi-supervised methods:

  • When training with only 7 labeled cases, the proposed method achieved a mean Dice Similarity Coefficient (DSC) of 0.864, an improvement of 3.8% over the next best method, along with a 3.6 mm reduction in the 95% Hausdorff Distance (HD95); a sketch of both metrics appears after this list.
  • The framework demonstrated its robustness even when trained with a mere 3 labeled cases, evidencing its potential to mitigate the annotation bottleneck.
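For readers unfamiliar with the two reported metrics, the following NumPy/SciPy sketch shows one common way to compute DSC and HD95 from binary masks. The function names and the surface-distance convention are illustrative choices, not an evaluation script from the paper, and edge cases such as empty masks are ignored.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice_coefficient(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def hd95(pred, gt, spacing=None):
    """95th-percentile symmetric Hausdorff distance between mask surfaces.

    One common convention: pool the surface-to-surface distances in both
    directions and take the 95th percentile of the pooled values.
    """
    def surface(mask):
        # Border voxels: in the mask but not in its erosion.
        return mask & ~binary_erosion(mask)

    pred_s, gt_s = surface(pred.astype(bool)), surface(gt.astype(bool))
    # Distance from every voxel to the nearest surface voxel of each mask.
    dist_to_gt = distance_transform_edt(~gt_s, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred_s, sampling=spacing)
    dists = np.concatenate([dist_to_gt[pred_s], dist_to_pred[gt_s]])
    return np.percentile(dists, 95)
```

Lower HD95 values indicate that the predicted boundary lies closer to the ground-truth boundary, which is why the 3.6 mm reduction complements the DSC gain.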

Practical and Theoretical Implications

The approach addresses two critical needs in medical imaging: reducing the reliance on extensive labeled datasets and enhancing segmentation accuracy with limited annotations. From a practical perspective, integrating Cross Teaching into clinical workflows can streamline the segmentation process, offering a pragmatic solution with lower annotation costs.

Theoretically, this work opens avenues for further exploration in combining heterogeneous network architectures within semi-supervised settings. Future research may explore optimizing such network collaborations or extending them across more complex medical image analysis tasks.

Speculations for Future AI Developments

This novel integration of CNNs and transformers paves the way for future implementations where learning paradigms are not isolated but synergistically harnessed. As AI in healthcare continues to evolve, models effectively leveraging semi-supervised learning can lead to scalable, efficient, and less resource-intensive solutions.

Thus, this paper not only advances the state-of-the-art in medical image segmentation but also serves as a testament to the promising prospects of hybrid learning frameworks in artificial intelligence research.