EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation (2405.06880v1)
Abstract: An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, these decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets that belong to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reduction in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish EMCAD as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at https://github.com/SLDGroup/EMCAD.
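The efficiency argument in the abstract rests on replacing dense convolutions with depth-wise (and grouped) ones. The following is a minimal sketch of that weight-count arithmetic, not the EMCAD implementation: the channel width (64), the kernel sizes (1, 3, 5), and the helper names `conv_params` and `multiscale_params` are illustrative assumptions that do not come from the abstract, and the sketch does not attempt to reproduce the reported 1.91M parameter figure.

```python
# Illustrative weight-count comparison: dense vs. depth-wise convolutions
# applied in parallel at several kernel sizes.
# ASSUMPTIONS (not from the abstract): 64 channels, kernel sizes (1, 3, 5),
# bias-free convolutions, and the hypothetical helper names below.

def conv_params(c_in, c_out, k, groups=1):
    """Number of weights in a bias-free 2D convolution with a k x k kernel.

    With `groups` groups, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def multiscale_params(channels, kernels, depthwise):
    """Total weights of one convolution per kernel size, channels -> channels."""
    groups = channels if depthwise else 1  # depth-wise: one group per channel
    return sum(conv_params(channels, channels, k, groups) for k in kernels)


channels = 64          # assumed feature width
kernels = (1, 3, 5)    # assumed multi-scale kernel sizes

standard = multiscale_params(channels, kernels, depthwise=False)
depthwise = multiscale_params(channels, kernels, depthwise=True)
reduction = 1 - depthwise / standard

print(f"dense weights:      {standard}")
print(f"depth-wise weights: {depthwise}")
print(f"reduction:          {reduction:.2%}")
```

Under these assumptions the depth-wise variant needs a factor of `channels` fewer weights than the dense one, which illustrates why such a block can stay small as more kernel sizes are added. A complete decoder would also include attention and channel-mixing layers that this count ignores.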