Query Nearby: Offset-Adjusted Mask2Former enhances small-organ segmentation (2506.05897v1)

Published 6 Jun 2025 in cs.CV

Abstract: Medical segmentation plays an important role in clinical applications like radiation therapy and surgical guidance, but acquiring clinically acceptable results is difficult. In recent years, progress has been witnessed with the success of utilizing transformer-like models, such as combining the attention mechanism with CNN. In particular, transformer-based segmentation models can extract global information more effectively, compensating for the drawbacks of CNN modules that focus on local features. However, utilizing transformer architecture is not easy, because training transformer-based models can be resource-demanding. Moreover, due to the distinct characteristics in the medical field, especially when encountering mid-sized and small organs with compact regions, their results often seem unsatisfactory. For example, using ViT to segment medical images directly only gives a DSC of less than 50%, which is far lower than the clinically acceptable score of 80%. In this paper, we used Mask2Former with deformable attention to reduce computation and proposed offset adjustment strategies to encourage sampling points within the same organs during attention weights computation, thereby integrating compact foreground information better. Additionally, we utilized the 4th feature map in Mask2Former to provide a coarse location of organs, and employed an FCN-based auxiliary head to help train Mask2Former more quickly using Dice loss. We show that our model achieves SOTA (State-of-the-Art) performance on the HaN-Seg and SegRap2023 datasets, especially on mid-sized and small organs. Our code is available at https://github.com/earis/Offsetadjustment_Background-location_Decoder_Mask2former.

Summary

The paper introduces offset adjustment strategies in Mask2Former to refine deformable attention and improve segmentation accuracy for small organs.
It leverages multi-resolution feature fusion and an FCN-based auxiliary decoder to enhance global feature extraction and mitigate background distractions.
Experimental results on HaN-Seg and SegRap2023 datasets demonstrate significant improvements in mDice and mIoU over baseline methods.


Introduction

This paper addresses medical image segmentation of mid-sized and small organs, which are hard to segment because they occupy compact regions within a large background. Transformer-based models capture global context well, but they are resource-intensive to train and often perform poorly on small organs, whose size and position vary widely. The proposed method builds on the Mask2Former architecture with deformable attention and adds offset adjustment strategies and an auxiliary branch, aiming to raise segmentation accuracy toward clinically acceptable levels.

Methodology

Offset Adjustment Strategies

Organs occupy compact regions of a medical image, so deformable attention should draw its sampling points from near each query rather than from distant background. The paper introduces three offset adjustment strategies to refine the deformable attention process:

Manual Threshold Constraint: Controlling offsets by setting a manual threshold to prevent excessive enlargement, thereby narrowing down the focus on adjacent points.
Softmax Scaling: Utilizing a softmax function to naturally reduce offset variance and focus sampling within closer proximities.
Softmax with Scaling Factor: Amplifying attention towards potential organ areas by applying a scaling factor post-softmax to emphasize areas more likely belonging to small organs.

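All three strategies encourage sampling points to fall within the same organ when attention weights are computed, which helps the model integrate compact foreground information for small organs.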
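The three strategies can be illustrated with a minimal NumPy sketch. The summary does not give the paper's exact formulas, so the functional forms below are assumptions: the function names, the tensor layout (queries, sampling points, x/y), the softmax axis, and the values of `threshold` and `scale` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def clamp_offsets(offsets, threshold):
    """Strategy 1 (assumed form): clip every offset component to [-threshold, threshold]."""
    return np.clip(offsets, -threshold, threshold)

def softmax_offsets(offsets, axis=-2):
    """Strategy 2 (assumed form): softmax across the sampling points of each query.

    Outputs lie in [0, 1] and sum to 1 along `axis`, so no offset can grow large.
    """
    shifted = offsets - offsets.max(axis=axis, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def scaled_softmax_offsets(offsets, scale, axis=-2):
    """Strategy 3 (assumed form): softmax followed by a constant scaling factor."""
    return scale * softmax_offsets(offsets, axis=axis)

# Hypothetical raw offsets: 2 queries, 4 sampling points each, (dx, dy) per point.
rng = np.random.default_rng(0)
raw = rng.normal(scale=10.0, size=(2, 4, 2))

clamped = clamp_offsets(raw, threshold=3.0)
soft = softmax_offsets(raw)
scaled = scaled_softmax_offsets(raw, scale=4.0)

print("largest raw offset magnitude:    ", np.abs(raw).max())
print("largest clamped offset magnitude:", np.abs(clamped).max())
print("largest softmax offset:          ", soft.max())
print("largest scaled-softmax offset:   ", scaled.max())
```

In this sketch the adjusted offsets would be added to each query's reference point to obtain sampling locations. Whichever variant is used, the bound on offset magnitude is what keeps those locations near the query.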

Figure 1: Results on Different Models for the HaN-Seg Dataset and Visualization of Selected Results.

Feature Fusion and Auxiliary Mechanism

The model incorporates a feature map that Mask2Former otherwise leaves unused (the 4th feature map, according to the abstract) to fuse multi-resolution information and provide a coarse location of the organs, reportedly without additional computational load. In addition, an FCN-based, background-location sensitive auxiliary decoder is trained alongside the main model to reduce the distraction caused by the extensive backgrounds typical of medical images and to speed up training. This auxiliary branch uses a contrastive approach to further refine organ localization.

Figure 2: Overview of Our Offset-Adjusted Mask2Former with Feature Fusion, Background-Location Sensitive Auxiliary Branch, and Three Offset Adjustment Strategies.

Experimental Evaluation

Datasets and Preprocessing

The proposed model was evaluated on the HaN-Seg and SegRap2023 datasets, where it is reported to outperform prior state-of-the-art methods. As a pre-processing step, the CT data are arranged into enhanced three-channel stacks, which improved results without adding model complexity. Training was run on an RTX 4090 GPU cluster and optimized with a combination of Dice loss and cross-entropy loss.
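A combined Dice and cross-entropy objective can be sketched as follows. This is a generic NumPy formulation, not the paper's training code: the equal weighting, the smoothing constants, and the function names are assumptions, and the summary does not state how the two terms are balanced.

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-6):
    """probs, onehot: arrays of shape (num_classes, H, W). Returns 1 - mean per-class soft Dice."""
    inter = (probs * onehot).sum(axis=(1, 2))
    denom = probs.sum(axis=(1, 2)) + onehot.sum(axis=(1, 2))
    dice = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()

def cross_entropy_loss(probs, onehot, eps=1e-12):
    """Mean pixel-wise cross-entropy between predicted probabilities and one-hot labels."""
    return -(onehot * np.log(probs + eps)).sum(axis=0).mean()

def combined_loss(probs, onehot, dice_weight=1.0, ce_weight=1.0):
    """Weighted sum of the two terms; the weights here are assumed, not taken from the paper."""
    return (dice_weight * soft_dice_loss(probs, onehot)
            + ce_weight * cross_entropy_loss(probs, onehot))

# Toy 2x3 label map with three classes (0 = background).
labels = np.array([[0, 0, 1],
                   [0, 2, 2]])
onehot = np.eye(3)[labels].transpose(2, 0, 1)   # shape (3, 2, 3)

perfect = onehot.copy()                          # prediction equal to the ground truth
uniform = np.full_like(onehot, 1.0 / 3.0)        # uninformative prediction

print("loss for perfect prediction:", combined_loss(perfect, onehot))
print("loss for uniform prediction:", combined_loss(uniform, onehot))
```

The Dice term is computed per class before averaging, so a small organ contributes as much as a large one, while cross-entropy supplies a per-pixel gradient signal. That complementarity is a common reason for pairing the two in organ segmentation.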

Comparative Analysis

The model improved on the nnU-Net baseline in both mDice and mIoU under similar conditions, with the largest gains on mid-sized and small organs. Comparisons against other re-implemented models indicate that it handles organs of varying sizes well.
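For reference, mDice and mIoU are class-averaged overlap scores. The sketch below computes them with NumPy on a toy label map; it assumes one common convention (average over classes present in the prediction or the ground truth, background included), which may differ from the averaging used in the paper. The function name and the example arrays are hypothetical.

```python
import numpy as np

def per_class_dice_iou(pred, target, num_classes):
    """Return per-class Dice and IoU for integer label maps of equal shape."""
    dice, iou = [], []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        inter = np.logical_and(p, t).sum()
        total = p.sum() + t.sum()
        if total == 0:
            continue  # class absent from both maps: skip it (assumed convention)
        union = total - inter
        dice.append(2.0 * inter / total)
        iou.append(inter / union)
    return np.array(dice), np.array(iou)

# Toy example: one background pixel is predicted where class 2 should be.
target = np.array([[0, 0, 1],
                   [0, 2, 2]])
pred = np.array([[0, 0, 1],
                 [0, 0, 2]])

dice, iou = per_class_dice_iou(pred, target, num_classes=3)
m_dice, m_iou = dice.mean(), iou.mean()
print("per-class Dice:", dice, "mDice:", m_dice)
print("per-class IoU: ", iou, "mIoU: ", m_iou)
```

A single mislabelled pixel lowers the score of the two-pixel class far more than that of the background class, which is why class-averaged metrics are sensitive to small-organ errors.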

Figure 3: Comparison in some mid-sized and small organ classes with nnU-Net baseline and previous SOTA.

Figure 4: Qualitative results of some re-implemented models on SegRap2023.

Conclusion

The paper extends Mask2Former for mid-sized and small organ segmentation and reports state-of-the-art results on the HaN-Seg and SegRap2023 datasets. Offset adjustment keeps deformable attention focused on points near each query, feature fusion supplies a coarse organ location, and the FCN-based auxiliary decoder speeds up training and suppresses background distraction. Because deformable attention also reduces computation, the design targets the accuracy and resource constraints that matter for clinical adoption.

The work offers a promising enhancement for transformer-based medical segmentation. Future studies could further optimize the offset strategies and auxiliary branches and test whether the benefits carry over to other clinical imaging modalities.