CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection (2404.15451v1)
Abstract: Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks such as medical image segmentation and object detection. However, existing models generally concentrate on the encoder-side Transformer for feature extraction, although a well-designed decoder architecture can bring further improvement. We propose CFPFormer, a novel decoder block that integrates feature pyramids and transformers. Specifically, by leveraging patch embedding, cross-layer feature concatenation, and Gaussian attention mechanisms, CFPFormer enhances feature extraction while promoting generalization across diverse tasks. Benefiting from the Transformer structure and U-shaped connections, the model can capture long-range dependencies and effectively up-sample feature maps. It also outperforms existing methods at detecting small objects. We evaluate CFPFormer on medical image segmentation datasets and object detection benchmarks (VOC 2007, VOC 2012, MS-COCO), demonstrating its effectiveness and versatility. On the ACDC Post-2017-MICCAI-Challenge online test set, our model reaches high accuracy, and it performs well compared with the original decoder setting on the Synapse multi-organ segmentation dataset.
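The abstract names Gaussian attention without defining it. As a purely illustrative sketch, the block below assumes it means self-attention over a grid of patch tokens in which each logit is penalized by the squared spatial distance between the two patches, which is equivalent to multiplying the softmax numerators by a Gaussian kernel. The function name `gaussian_attention`, the `sigma` parameter, and the omission of learned query/key/value projections are assumptions made for brevity, and the paper's actual formulation may differ.

```python
import numpy as np

def gaussian_attention(x, grid_hw, sigma=2.0):
    """Single-head self-attention with a Gaussian distance bias (illustrative only).

    x:       (N, D) array of patch tokens laid out row-major on an H x W grid.
    grid_hw: (H, W) with H * W == N.
    sigma:   hypothetical bandwidth; smaller values make attention more local.
    """
    h, w = grid_hw
    n, d = x.shape
    assert n == h * w, "token count must match the grid size"

    # (row, col) coordinate of every patch, shape (N, 2).
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)

    # Pairwise squared Euclidean distance between patch positions, shape (N, N).
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)

    # Scaled dot-product logits plus the log of a Gaussian kernel.
    # Assumption: tokens serve directly as queries, keys, and values.
    logits = x @ x.T / np.sqrt(d) - sq_dist / (2.0 * sigma ** 2)

    # Numerically stable row-wise softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

# Usage: 16 random tokens of width 8 on a 4 x 4 patch grid.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
out, weights = gaussian_attention(tokens, (4, 4), sigma=1.0)
```

With identical tokens the content term cancels, so the weights depend only on distance and fall off from each patch to its neighbours, which is the locality bias the sketch is meant to show.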