- The paper introduces SGTC, a novel framework that leverages semantic guidance and a triple-view disparity training strategy to improve segmentation of sparsely annotated medical images.
- The SGAL mechanism employs pretrained CLIP text features to generate high-quality pseudo-labels, enhancing the delineation of weak anatomical boundaries.
- Experiments on LA2018, KiTS19, and LiTS datasets demonstrate that SGTC surpasses state-of-the-art semi-supervised methods with superior Dice coefficients.
Semantic-Guided Triplet Co-training for Sparsely Annotated Semi-Supervised Medical Image Segmentation
The paper presents Semantic-Guided Triplet Co-training (SGTC), a novel framework for semi-supervised medical image segmentation that sidesteps the costly, labor-intensive task of annotating volumetric medical images slice by slice. Instead, only three orthogonal slices per volume are annotated, sharply reducing the annotation burden on radiologists while maintaining high segmentation performance. Given the scarcity of fully annotated medical image datasets, SGTC offers a promising alternative, leveraging semantic guidance and a distinctive disparity-based training strategy to make the most of sparse annotations.
Framework and Innovation
The SGTC framework introduces two pivotal components:
- Semantic-Guided Auxiliary Learning (SGAL) Mechanism: This component employs the pretrained CLIP model to extract semantic features from text prompts and injects them into the segmentation pipeline. The text-derived guidance lets the model generate higher-quality pseudo-labels, which are crucial for supervising the unlabeled data, and directly addresses the common challenge of discerning weak anatomical boundaries in medical images.
- Triple-View Disparity Training (TVDT) Strategy: This strategy exploits the disparity among three distinct sub-networks, each supervised by a differently oriented annotated slice (sagittal, coronal, axial). The sub-networks collaboratively improve segmentation accuracy by exchanging complementary information gleaned from their respective views. This configuration better captures the 3D data distribution and matches real-world clinical scenarios, where exhaustive annotation is impractical.
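The paper does not include code, but the SGAL pseudo-labeling idea can be illustrated with a minimal NumPy sketch. It assumes hypothetical inputs: per-voxel features from a segmentation backbone and CLIP-style text embeddings (one per class prompt); each voxel is assigned the class whose text embedding it matches best by cosine similarity. The function name and shapes are illustrative, not from the paper.

```python
import numpy as np

def semantic_pseudo_labels(voxel_feats, text_embeds):
    """Hypothetical sketch: semantic-guided hard pseudo-labels.

    voxel_feats: (D, H, W, C) per-voxel features from the backbone (assumed).
    text_embeds: (K, C) CLIP-style text embeddings, one per class (assumed).
    Both are L2-normalized so the dot product equals cosine similarity.
    """
    vf = voxel_feats / np.linalg.norm(voxel_feats, axis=-1, keepdims=True)
    te = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    sims = vf @ te.T              # (D, H, W, K): similarity to each class prompt
    return sims.argmax(axis=-1)   # (D, H, W): per-voxel class index

# Toy usage: two voxels, two classes with orthogonal text embeddings.
feats = np.array([[[[1.0, 0.0], [0.0, 1.0]]]])  # (1, 1, 2, 2)
labels = semantic_pseudo_labels(feats, np.eye(2))
```

In the actual framework the similarity maps would refine the network's own predictions rather than replace them; this sketch only shows how text embeddings can score voxels per class.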
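The sparse-annotation setup behind TVDT can likewise be sketched: with one annotated slice per orthogonal view, each sub-network sees labels only on the voxels of "its" slice. The helper below builds those supervision masks; the axis-to-view mapping and function name are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sparse_slice_masks(volume_shape, slice_idx):
    """Hypothetical sketch: boolean supervision masks for three views.

    volume_shape: (D, H, W) voxel grid of one volume.
    slice_idx: (d, h, w) indices of the single annotated slice per view.
    Returns (sagittal, coronal, axial) masks marking labeled voxels;
    which axis corresponds to which anatomical view is an assumption here.
    """
    d, h, w = slice_idx
    axial = np.zeros(volume_shape, dtype=bool)
    axial[d, :, :] = True        # one annotated axial slice
    coronal = np.zeros(volume_shape, dtype=bool)
    coronal[:, h, :] = True      # one annotated coronal slice
    sagittal = np.zeros(volume_shape, dtype=bool)
    sagittal[:, :, w] = True     # one annotated sagittal slice
    return sagittal, coronal, axial

# Toy usage: a 4x5x6 volume with slices 1, 2, 3 annotated.
sag, cor, axi = sparse_slice_masks((4, 5, 6), (1, 2, 3))
```

During training, each sub-network would compute its supervised loss only where its mask is true, while the three networks cross-supervise each other on the remaining voxels via pseudo-labels.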
Experimental Validation
The effectiveness of SGTC is substantiated through extensive experiments on three public medical imaging datasets: LA2018, KiTS19, and LiTS. Across these benchmarks, SGTC consistently surpasses recent state-of-the-art (SOTA) semi-supervised methods, particularly under sparse annotation. On LA2018, for instance, SGTC improves the Dice coefficient over competing SOTA methods, demonstrating precise anatomical segmentation with minimal labeled data. SGTC likewise achieves superior segmentation metrics on KiTS19 and LiTS, underscoring the benefit of combining semantic guidance with triplet co-training.
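For reference, the Dice coefficient used throughout these comparisons measures overlap between a predicted and a ground-truth binary mask, Dice = 2|A∩B| / (|A|+|B|). A minimal NumPy implementation (with a small epsilon to avoid division by zero on empty masks) looks like:

```python
import numpy as np

def dice(pred, gt, eps=1e-6):
    """Dice coefficient between two binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Toy usage: half-overlapping masks give Dice = 2*1/(2+1) = 2/3.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
```

Higher is better; identical masks score 1.0 and disjoint masks score (near) 0.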
Significance and Future Directions
By significantly reducing the need for extensive labeled datasets, the SGTC framework aligns with the practical demands of clinical environments, where time and resource constraints often limit data annotation. This approach not only enhances segmentation performance under limited supervision but also suggests a shift towards leveraging multimodal resources like language representations in medical imaging tasks.
Looking ahead, further exploration into the optimization of text prompts and their domain alignment with medical imaging is necessary to maximize the efficacy of semantic-guided segmentation. Additionally, the versatility of SGTC under varying medical image modalities and its scalability across other medical imaging tasks remain promising avenues for research, potentially extending the applicability and efficiency of semi-supervised learning frameworks in healthcare diagnostics.