- The paper introduces offset adjustment strategies in Mask2Former to refine deformable attention and improve segmentation accuracy for small organs.
- It leverages multi-resolution feature fusion and an FCN-based auxiliary decoder to enhance global feature extraction and mitigate background distractions.
- Experimental results on HaN-Seg and SegRap2023 datasets demonstrate significant improvements in mDice and mIoU over baseline methods.
Introduction
This paper addresses medical image segmentation of small organs, which are hard to segment because they occupy compact regions within extensive backgrounds. Transformer-based models are powerful but typically resource-intensive, and they struggle with small organs because organ size and position vary widely. The proposed method builds on the Mask2Former architecture with deformable attention and adds offset adjustment strategies and auxiliary mechanisms to improve segmentation accuracy for clinical applications.
Methodology
Offset Adjustment Strategies
In medical images the organs of interest occupy compact regions, so the sampling points of deformable attention should stay close to those regions. The paper introduces three offset adjustment strategies to refine the deformable attention process:
- Manual Threshold Constraint: Bounds offsets with a manually set threshold so they cannot grow excessively, keeping the focus on adjacent points.
- Softmax Scaling: Applies a softmax function to the offsets to reduce their variance and keep sampling within close proximity.
- Softmax with Scaling Factor: Applies a scaling factor after the softmax to emphasize areas more likely to belong to small organs.
These strategies keep attention concentrated on the compact regions where small organs appear.
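The three offset adjustment strategies can be illustrated with a minimal NumPy sketch. This is one possible reading of the descriptions, not the paper's implementation: the offset tensor shape `(num_queries, num_points, 2)`, the choice to apply softmax across the sampling-point axis, and the function names and parameters (`max_offset`, `scale`) are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def threshold_offsets(offsets, max_offset):
    # Strategy 1 (assumed form): bound every offset component
    # to the interval [-max_offset, max_offset].
    return np.clip(offsets, -max_offset, max_offset)

def softmax_offsets(offsets):
    # Strategy 2 (assumed form): normalize offsets with a softmax over
    # the sampling-point axis, so each value lies in (0, 1).
    return softmax(offsets, axis=1)

def scaled_softmax_offsets(offsets, scale):
    # Strategy 3 (assumed form): multiply the softmax output by a
    # scaling factor chosen by the user.
    return scale * softmax_offsets(offsets)

# Hypothetical raw offsets: 1 query, 3 sampling points, (x, y) components.
raw = np.array([[[4.0, -3.0], [0.5, 0.2], [-6.0, 1.0]]])
clipped = threshold_offsets(raw, max_offset=1.0)
normalized = softmax_offsets(raw)
rescaled = scaled_softmax_offsets(raw, scale=2.0)
```

In all three variants the adjusted offsets have a bounded range, which is the property the strategies rely on to keep sampling near the reference point.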
Figure 1: Results on Different Models for the HaN-Seg Dataset and Visualization of Selected Results.
Feature Fusion and Auxiliary Mechanism
Fusing feature maps that the standard Mask2Former leaves unused combines multi-resolution information and improves global feature extraction; the paper reports no additional computational load. An FCN-based, background-location sensitive auxiliary decoder mitigates the distraction caused by the extensive backgrounds typical of medical images. This auxiliary branch uses a contrastive approach to further refine organ localization.
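As a rough illustration of multi-resolution fusion, the sketch below upsamples coarser feature maps to the finest resolution and sums them. The fusion operator is not specified in this summary, so nearest-neighbour upsampling plus element-wise summation is an assumption, as are the `(C, H, W)` layout and the function names.

```python
import numpy as np

def upsample_nearest(feat, factor):
    # Nearest-neighbour upsampling of a (C, H, W) map by an integer factor.
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_features(feature_maps):
    # feature_maps: list of (C, H_i, W_i) arrays, finest resolution first.
    # Each coarser map is assumed to be an integer factor smaller.
    target_h = feature_maps[0].shape[1]
    fused = np.zeros(feature_maps[0].shape, dtype=np.float64)
    for feat in feature_maps:
        factor = target_h // feat.shape[1]
        fused += upsample_nearest(feat, factor)
    return fused

# Hypothetical pyramid: three levels with 2 channels each.
pyramid = [np.ones((2, 8, 8)), np.ones((2, 4, 4)), np.ones((2, 2, 2))]
fused = fuse_features(pyramid)
```

Summation keeps the channel count unchanged, which is one way a fusion step can avoid enlarging the layers that follow it.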
Figure 2: Overview of Our Offset-Adjusted Mask2Former with Feature Fusion, Background-Location Sensitive Auxiliary Branch, and Three Offset Adjustment Strategies.
Experimental Evaluation
Datasets and Preprocessing
The proposed model was evaluated on the HaN-Seg and SegRap2023 datasets, where it outperformed state-of-the-art methods. As a preprocessing step, enhanced three-channel CT stacks improved results without adding complexity. Training ran on an RTX 4090 GPU cluster and optimized a combined Dice and cross-entropy loss.
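A combined Dice and cross-entropy loss can be sketched as follows. This is a generic formulation rather than the paper's exact loss: the equal weighting (`dice_weight`, `ce_weight`), the soft Dice averaged over classes, and the `(C, H, W)` logits layout are assumptions.

```python
import numpy as np

def softmax(logits, axis=0):
    # Numerically stable softmax over the class axis.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def dice_ce_loss(logits, target, dice_weight=1.0, ce_weight=1.0, eps=1e-6):
    # logits: (C, H, W) raw scores; target: (H, W) integer class labels.
    num_classes = logits.shape[0]
    probs = softmax(logits, axis=0)
    # One-hot encode the labels and move the class axis to the front.
    one_hot = np.moveaxis(np.eye(num_classes)[target], -1, 0)

    # Cross-entropy averaged over pixels (probabilities clipped for log).
    log_probs = np.log(np.clip(probs, eps, 1.0))
    ce = -np.mean(np.sum(one_hot * log_probs, axis=0))

    # Soft Dice per class, then averaged over classes.
    intersection = np.sum(probs * one_hot, axis=(1, 2))
    denominator = np.sum(probs, axis=(1, 2)) + np.sum(one_hot, axis=(1, 2))
    dice = (2.0 * intersection + eps) / (denominator + eps)
    dice_loss = 1.0 - np.mean(dice)

    return dice_weight * dice_loss + ce_weight * ce

# Hypothetical 2-class, 2x2 example with confident logits.
labels = np.array([[0, 1], [1, 0]])
confident = 20.0 * (2.0 * np.moveaxis(np.eye(2)[labels], -1, 0) - 1.0)
loss_correct = dice_ce_loss(confident, labels)
loss_wrong = dice_ce_loss(-confident, labels)
```

The Dice term is less sensitive to class imbalance than cross-entropy alone, which is a common motivation for combining the two when foreground regions are small.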
Comparative Analysis
The model improved mDice and mIoU over the compared methods and, in particular, outperformed nn-UNet baselines under similar conditions. The evaluation against other models indicates that it segments organs of diverse sizes accurately.
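For reference, mDice and mIoU are per-class Dice and IoU scores averaged over classes. The sketch below computes both from integer label maps; excluding the background class and skipping classes absent from both prediction and ground truth are assumptions, since this summary does not state the paper's evaluation protocol.

```python
import numpy as np

def mean_dice_iou(pred, target, num_classes, ignore_background=True):
    # pred, target: integer label arrays of identical shape.
    dices, ious = [], []
    start = 1 if ignore_background else 0
    for cls in range(start, num_classes):
        p = pred == cls
        t = target == cls
        union = np.logical_or(p, t).sum()
        if union == 0:
            # Class absent in both maps: skip it (assumed convention).
            continue
        intersection = np.logical_and(p, t).sum()
        dices.append(2.0 * intersection / (p.sum() + t.sum()))
        ious.append(intersection / union)
    if not dices:
        return float("nan"), float("nan")
    return float(np.mean(dices)), float(np.mean(ious))

# Hypothetical 2x3 label maps with background (0) and two organ classes.
pred = np.array([[0, 1, 1], [2, 2, 0]])
target = np.array([[0, 1, 0], [2, 2, 2]])
m_dice, m_iou = mean_dice_iou(pred, target, num_classes=3)
```

Here class 1 has Dice 2/3 and IoU 1/2, and class 2 has Dice 4/5 and IoU 2/3, so the averages are 11/15 and 7/12 respectively.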
Figure 3: Comparison in some mid-sized and small organ classes with nn-UNet baseline and previous SOTA.
Figure 4: Qualitative results of some re-implemented models on SegRap2023.
Conclusion
The paper extends Mask2Former for small organ segmentation and reports state-of-the-art results on the HaN-Seg and SegRap2023 datasets. It combines offset adjustment strategies, feature fusion, and an auxiliary decoder to improve accuracy while remaining resource-efficient, which matters for clinical adoption.
The work offers a promising enhancement for transformer-based medical segmentation that could inform future research and practical implementations. Future studies could further optimize the offset strategies and auxiliary branches and extend these benefits to other clinical imaging modalities.