CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection (2404.15451v1)

Published 23 Apr 2024 in cs.CV

Abstract: Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks such as medical image segmentation and object detection. However, existing models generally concentrate Transformer capacity on the encoder side for feature extraction, leaving further potential to be unlocked by a well-designed decoder. We propose CFPFormer, a novel decoder block that integrates feature pyramids and transformers. Specifically, by leveraging patch embedding, cross-layer feature concatenation, and a Gaussian attention mechanism, CFPFormer enhances feature extraction while promoting generalization across diverse tasks. Benefiting from the Transformer structure and U-shaped connections, the model can capture long-range dependencies and effectively up-sample feature maps. It achieves superior performance in detecting small objects compared to existing methods. We evaluate CFPFormer on medical image segmentation datasets and object detection benchmarks (VOC 2007, VOC 2012, MS-COCO), demonstrating its effectiveness and versatility. On the ACDC Post-2017-MICCAI-Challenge online test set, our model attains high accuracy, and it performs well against the original decoder setting on the Synapse multi-organ segmentation dataset.
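The abstract names three decoder ingredients: patch embedding, cross-layer feature concatenation, and a Gaussian attention mechanism. The paper's exact formulation is not reproduced here, but one common reading of "Gaussian attention" is scaled dot-product attention with an additive Gaussian positional bias that concentrates each query's weight on nearby positions. The plain-Python sketch below illustrates that assumed form only; `sigma` and the shape of the bias term are assumptions, not details taken from the paper.

```python
import math

def gaussian_attention(q, k, v, sigma=2.0):
    """Scaled dot-product attention with an additive Gaussian positional bias.

    q, k, v: lists of n vectors (lists of floats), all of dimension d.
    sigma controls how sharply attention is focused near each query's
    own position (an assumed hyperparameter, not from the paper).
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        # Dot-product score plus a Gaussian penalty on positional distance.
        scores = []
        for j in range(n):
            dot = sum(qc * kc for qc, kc in zip(q[i], k[j])) / math.sqrt(d)
            bias = -((i - j) ** 2) / (2.0 * sigma ** 2)
            scores.append(dot + bias)
        # Numerically stable softmax over the biased scores.
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        # Output is a convex combination of the value vectors.
        out.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(d)])
    return out
```

Because the bias decays with squared distance, small `sigma` makes the mechanism behave like a local (near-diagonal) attention, which is one plausible reason such a bias could help with small-object detail during decoding.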

