
STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training (2304.06716v1)

Published 13 Apr 2023 in cs.CV

Abstract: Large-scale models pre-trained on large-scale datasets have profoundly advanced the development of deep learning. However, the state-of-the-art models for medical image segmentation are still small-scale, with their parameters only in the tens of millions. Further scaling them up to higher orders of magnitude is rarely explored. An overarching goal of exploring large-scale models is to train them on large-scale medical segmentation datasets for better transfer capacities. In this work, we design a series of Scalable and Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image segmentation model to date. Our STU-Net is based on nnU-Net framework due to its popularity and impressive performance. We first refine the default convolutional blocks in nnU-Net to make them scalable. Then, we empirically evaluate different scaling combinations of network depth and width, discovering that it is optimal to scale model depth and width together. We train our scalable STU-Net models on a large-scale TotalSegmentator dataset and find that increasing model size brings a stronger performance gain. This observation reveals that a large model is promising in medical image segmentation. Furthermore, we evaluate the transferability of our model on 14 downstream datasets for direct inference and 3 datasets for further fine-tuning, covering various modalities and segmentation targets. We observe good performance of our pre-trained model in both direct inference and fine-tuning. The code and pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net.

Citations (48)

Summary

  • The paper introduces STU-Net, a scalable U-Net model using large-scale supervised pre-training that significantly improves Dice scores in CT image segmentation.
  • It refines the nnU-Net framework by scaling model depth and width and integrating residual connections to optimize segmentation performance.
  • The study demonstrates robust transferability across 14 downstream datasets, confirming the model's adaptability to diverse medical imaging tasks.

An Overview of STU-Net: Scalable and Transferable Medical Image Segmentation Models

The paper "STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training" presents a comprehensive study of scalable and transferable models for medical image segmentation. In medical image analysis, segmentation is a pivotal process: it delineates anatomical structures and lesions, which is crucial for downstream clinical tasks such as registration, quantification, and image-guided interventions.

Objectives and Methodology

The primary aim of the research is to bridge the gap between the scalability achieved in non-medical domains and the comparatively static state of medical image segmentation. Large models pre-trained on extensive datasets have proven tremendously beneficial in computer vision and natural language processing, notably with advances such as GPT-3 and Vision Transformers. Recognizing this potential, the authors propose the Scalable and Transferable U-Net (STU-Net) family of models, ranging from 14 million to 1.4 billion parameters, the latter being the largest model in its field to date.

STU-Net is built upon the nnU-Net framework, a widely recognized standard due to its self-configuring capabilities and adaptability across diverse medical segmentation tasks. The authors enhance the nnU-Net architecture so that network depth and width can be scaled efficiently, refining the convolutional blocks with residual connections to ease gradient flow in deeper networks. Upsampling is also altered: interpolation followed by a convolution replaces transpose convolution, reducing the computational burden while maintaining model adaptability across tasks.
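The paper does not reproduce the upsampling block in code, but the idea of replacing transpose convolution with interpolation plus a convolution can be sketched as follows. This is a minimal NumPy illustration (the function name, shapes, and the choice of nearest-neighbor interpolation with a 1x1x1 convolution are our assumptions, not the authors' exact implementation):

```python
import numpy as np

def upsample_interp_conv(x, weight):
    """Upsample a 3D feature map by nearest-neighbor interpolation
    (factor 2 along each spatial axis), then mix channels with a
    1x1x1 convolution.

    x:      (C_in, D, H, W) feature map
    weight: (C_out, C_in) kernel of the 1x1x1 convolution
    """
    # Nearest-neighbor interpolation: repeat each voxel twice along D, H, W.
    up = x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)
    # A 1x1x1 convolution is a per-voxel linear map over channels.
    return np.einsum("oc,cdhw->odhw", weight, up)

x = np.random.rand(4, 2, 2, 2)   # 4 input channels, 2x2x2 volume
w = np.random.rand(8, 4)         # map 4 channels -> 8 channels
y = upsample_interp_conv(x, w)
print(y.shape)                   # (8, 4, 4, 4)
```

Because the interpolation itself has no learnable weights, the parameter count of this block is independent of the upsampling factor, which is one way such a design can stay lightweight compared with transpose convolution.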

Evaluation and Results

The paper conducts a thorough empirical evaluation of these models on the TotalSegmentator dataset, a substantial medical image repository comprising 1204 CT images with 104 annotated structures. STU-Net's performance scales consistently with model size, as demonstrated by substantial improvements in Dice Similarity Coefficient (DSC) scores across the evaluated categories. The 1.4B model shows notable superiority in segmentation accuracy, highlighting the efficacy of scaling in the medical domain.
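The Dice Similarity Coefficient reported throughout measures the overlap between a predicted mask A and a ground-truth mask B as DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch (the function name and the small epsilon guard against empty masks are ours):

```python
import numpy as np

def dice_score(pred, target, eps=1e-6):
    """Dice Similarity Coefficient between two binary masks:
    DSC = 2 * |A intersect B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])  # toy predicted mask
b = np.array([[1, 0, 0], [0, 1, 1]])  # toy ground-truth mask
print(round(dice_score(a, b), 3))     # 2*2 / (3+3) -> 0.667
```

A DSC of 1.0 indicates perfect overlap and 0.0 indicates none, so the per-category improvements reported for the larger models translate directly into better voxel-level agreement with the annotations.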

Additionally, the research examines the transferability of STU-Net, assessing it on 14 downstream datasets for direct inference and 3 datasets for further fine-tuning, covering various imaging modalities and segmentation targets. The results indicate strong generalization: the pre-trained model performs well without additional tuning on datasets whose target categories overlap those seen during pre-training.

Implications and Future Work

The introduction of large-scale models specifically designed for medical imaging is promising for the progression towards Medical Artificial General Intelligence (MedAGI). Beyond superior performance on segmentation tasks, large models like STU-Net showcase potential in multi-task capacities, indicating future applicability in areas such as detection, classification, and beyond.

Researchers might explore integrating these foundation models into more generalized medical image processing tasks, with prospects for handling diverse datasets and objectives in clinical environments. The implications of such models extend into practical applications, enhancing diagnostic accuracy, treatment planning, and patient outcomes.

Overall, STU-Net demonstrates the power of leveraging scalability in model architecture, potentially transforming the landscape of medical image analysis through enriched pre-training and cross-task adaptability. Further investigations could expand dataset variety, incorporate additional modalities, or improve model efficiency, thereby building stronger foundation models aligned with the broad potential of MedAGI.
