
Dual Cross-Attention for Medical Image Segmentation (2303.17696v1)

Published 30 Mar 2023 in cs.CV, cs.LG, and eess.IV

Abstract: We propose Dual Cross-Attention (DCA), a simple yet effective attention module that is able to enhance skip-connections in U-Net-based architectures for medical image segmentation. DCA addresses the semantic gap between encoder and decoder features by sequentially capturing channel and spatial dependencies across multi-scale encoder features. First, the Channel Cross-Attention (CCA) extracts global channel-wise dependencies by utilizing cross-attention across channel tokens of multi-scale encoder features. Then, the Spatial Cross-Attention (SCA) module performs cross-attention to capture spatial dependencies across spatial tokens. Finally, these fine-grained encoder features are up-sampled and connected to their corresponding decoder parts to form the skip-connection scheme. Our proposed DCA module can be integrated into any encoder-decoder architecture with skip-connections such as U-Net and its variants. We test our DCA module by integrating it into six U-Net-based architectures such as U-Net, V-Net, R2Unet, ResUnet++, DoubleUnet and MultiResUnet. Our DCA module shows Dice Score improvements up to 2.05% on GlaS, 2.74% on MoNuSeg, 1.37% on CVC-ClinicDB, 1.12% on Kvasir-Seg and 1.44% on Synapse datasets. Our codes are available at: https://github.com/gorkemcanates/Dual-Cross-Attention

Authors (3)
  1. Gorkem Can Ates (3 papers)
  2. Prasoon Mohan (1 paper)
  3. Emrah Celik (1 paper)
Citations (39)

Summary

Overview of Dual Cross-Attention for Medical Image Segmentation

Introduction

The paper introduces a novel attention module termed Dual Cross-Attention (DCA) for enhancing skip-connections in U-Net-based architectures, which are widely used for medical image segmentation tasks. The proposed DCA module is designed to mitigate the semantic gap between encoder and decoder features by sequentially capturing channel and spatial dependencies across multi-scale encoder stages.

Methodology

The DCA module integrates two main components: Channel Cross-Attention (CCA) and Spatial Cross-Attention (SCA). These components work sequentially to extract global channel-wise and spatial dependencies within the multi-scale encoder features. Specifically, CCA captures global interactions across channels by cross-attention between channel tokens, whereas SCA focuses on spatial dependencies across spatial tokens.

  1. Channel Cross-Attention (CCA): This component first normalizes the channel tokens and subsequently applies cross-attention across them using depth-wise convolutions for generating queries, keys, and values.
  2. Spatial Cross-Attention (SCA): Following CCA, SCA takes the enriched channel-wise features and further refines them by capturing spatial dependencies. It also employs depth-wise convolutions for generating the necessary projections and applies cross-attention in the spatial domain.
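The sequential CCA-then-SCA pipeline can be sketched in PyTorch. This is a simplified illustration, not the authors' implementation: class names, the shared token grid size, and the use of grouped 1x1 convolutions as stand-ins for the paper's depth-wise projections are assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelCrossAttention(nn.Module):
    """Channels act as tokens: the attention matrix is (C x C) across all scales."""

    def __init__(self, channels):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        # Grouped 1x1 convolutions as a stand-in for the paper's depth-wise projections.
        self.qkv = nn.ModuleList(
            nn.Conv1d(channels, channels, 1, groups=channels, bias=False)
            for _ in range(3))

    def forward(self, x):                       # x: (B, N, C)
        y = self.norm(x).transpose(1, 2)        # (B, C, N)
        q, k, v = (proj(y) for proj in self.qkv)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return x + (attn @ v).transpose(1, 2)   # residual, back to (B, N, C)


class SpatialCrossAttention(nn.Module):
    """Spatial positions act as tokens: the attention matrix is (N x N)."""

    def __init__(self, channels):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.qkv = nn.ModuleList(
            nn.Conv1d(channels, channels, 1, groups=channels, bias=False)
            for _ in range(3))

    def forward(self, x):                       # x: (B, N, C)
        y = self.norm(x).transpose(1, 2)        # (B, C, N)
        q, k, v = (proj(y).transpose(1, 2) for proj in self.qkv)  # each (B, N, C)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return x + attn @ v                     # residual, stays (B, N, C)


class DCA(nn.Module):
    """Pools multi-scale encoder features to one token grid, applies CCA then
    SCA jointly across scales, then restores each scale's resolution."""

    def __init__(self, channel_list, token_hw=8):
        super().__init__()
        self.channel_list, self.token_hw = channel_list, token_hw
        total = sum(channel_list)
        self.cca = ChannelCrossAttention(total)
        self.sca = SpatialCrossAttention(total)

    def forward(self, feats):                   # list of (B, C_i, H_i, W_i)
        b = feats[0].shape[0]
        # Pool every scale to the same grid and concatenate along channels.
        tokens = torch.cat([
            F.adaptive_avg_pool2d(f, self.token_hw).flatten(2)   # (B, C_i, P)
            for f in feats], dim=1).transpose(1, 2)              # (B, P, sum C)
        tokens = self.sca(self.cca(tokens)).transpose(1, 2)      # (B, sum C, P)
        outs, i = [], 0
        for f, c in zip(feats, self.channel_list):
            t = tokens[:, i:i + c].view(b, c, self.token_hw, self.token_hw)
            outs.append(F.interpolate(t, size=f.shape[2:], mode="bilinear",
                                      align_corners=False))
            i += c
        return outs


# Two hypothetical encoder scales; refined features keep their original shapes.
feats = [torch.randn(2, 16, 32, 32), torch.randn(2, 32, 16, 16)]
refined = DCA([16, 32])(feats)
```

Each refined feature map keeps the shape of its input scale, so the outputs can be concatenated with (or added to) the corresponding decoder stages exactly as ordinary skip-connections would be.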

The DCA module is designed to be lightweight, incorporating 2D average pooling and depth-wise convolutions to minimize computational overhead. This module can be seamlessly integrated into various encoder-decoder architectures with skip-connections, including U-Net, ResUnet++, V-Net, and their variants.
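The lightweight claim rests largely on the use of depth-wise projections: a depth-wise 1x1 convolution over C channels carries C weights, whereas a dense linear projection carries C^2. A quick sanity check (with an arbitrary C = 512) makes the gap concrete:

```python
import torch.nn as nn

C = 512
dense = nn.Linear(C, C, bias=False)               # full projection: C * C weights
depthwise = nn.Conv1d(C, C, kernel_size=1,
                      groups=C, bias=False)       # one weight per channel

dense_params = sum(p.numel() for p in dense.parameters())      # C**2 = 262144
dw_params = sum(p.numel() for p in depthwise.parameters())     # C = 512
```

At C = 512 the depth-wise projection is 512x smaller, which is why adding the module costs only a fraction of a percent of a typical U-Net's parameter budget.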

Experiments and Results

The authors conducted extensive experiments using six U-Net-based architectures on five benchmark medical image segmentation datasets: GlaS, MoNuSeg, CVC-ClinicDB, Kvasir-Seg, and Synapse.

Performance Improvements:

  • The DCA module delivered Dice Score improvements of up to 2.05% on the GlaS dataset, 2.74% on MoNuSeg, 1.37% on CVC-ClinicDB, 1.12% on Kvasir-Seg, and 1.44% on Synapse.
  • These improvements are accompanied by a minimal increase in the number of parameters. For instance, the parameter increase for U-Net was only 0.3%, while DoubleUnet exhibited a 3.4% increase due to its three skip-connection schemes.
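For readers unfamiliar with the metric behind these numbers, the Dice score measures overlap between a predicted mask A and a ground-truth mask B as 2|A ∩ B| / (|A| + |B|). A minimal NumPy version for binary masks (the epsilon term is a common convention to avoid division by zero, not something specified in the paper):

```python
import numpy as np


def dice_score(pred, target, eps=1e-7):
    """Dice coefficient for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)


# Toy 2x3 masks sharing 2 of their 3 positive pixels: Dice = 2*2 / (3+3) = 2/3.
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(a, b)
```

A "2.05% improvement" in this metric therefore means the predicted masks overlap the expert annotations by roughly two additional percentage points of this ratio.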

Qualitative Enhancements:

  • Visual comparisons indicate that models integrated with the DCA mechanism produce more consistent boundaries and preserve accurate shape information. They also differentiate adjacent structures more reliably by suppressing false-positive predictions.

Implications and Future Developments

Practical Implications:

  • Enhanced performance in medical image segmentation directly aids clinical decision-making by providing more accurate segmentation of anatomical structures and pathological regions. This could improve diagnostics and treatment planning, especially in fields such as oncology and gastroenterology.

Theoretical Implications:

  • The success of the dual attention mechanism underscores the importance of capturing long-range dependencies in both channel and spatial dimensions. It suggests that enhancing feature representation through cross-attention mechanisms can significantly narrow the semantic gap in encoder-decoder architectures.

Future Developments

Further work could explore the extension of the DCA module to other encoder-decoder architectures beyond U-Net and its variants. Additionally, combining DCA with other attention mechanisms or architectural innovations might offer further performance boosts. Researchers might also consider applying DCA to 3D medical image segmentation tasks where capturing long-range dependencies is even more critical.

Conclusion

The introduction of the Dual Cross-Attention (DCA) module provides a significant enhancement to skip-connections in U-Net-based architectures for medical image segmentation. By effectively capturing long-range dependencies, the DCA module offers both improved performance and minimal computational overhead, marking a substantial contribution to the field of medical image analysis.