UGCANet: A Unified Global Context-Aware Transformer-based Network with Feature Alignment for Endoscopic Image Analysis (2307.06260v1)
Abstract: Gastrointestinal endoscopy is a medical procedure that utilizes a flexible tube equipped with a camera and other instruments to examine the digestive tract. This minimally invasive technique allows for diagnosing and managing various gastrointestinal conditions, including inflammatory bowel disease, gastrointestinal bleeding, and colon cancer. The early detection and identification of lesions in the upper gastrointestinal tract and the identification of malignant polyps that may pose a risk of cancer development are critical components of gastrointestinal endoscopy's diagnostic and therapeutic applications. Therefore, enhancing the detection rates of gastrointestinal disorders can significantly improve a patient's prognosis by increasing the likelihood of timely medical intervention, which may prolong the patient's lifespan and improve overall health outcomes. This paper presents a novel Transformer-based deep neural network designed to perform multiple tasks simultaneously, thereby enabling accurate identification of both upper gastrointestinal tract lesions and colon polyps. Our approach introduces a global context-aware module and combines it with the MiT backbone and a feature alignment block to enhance the network's representation capability. This design leads to a significant improvement in performance across various endoscopic diagnosis tasks. Extensive experiments demonstrate the superior performance of our method compared to other state-of-the-art approaches.