Redefining Cystoscopy with AI: Bladder Cancer Diagnosis Using an Efficient Hybrid CNN-Transformer Model (2403.03879v1)
Abstract: Bladder cancer ranks among the 10 most frequently diagnosed cancers worldwide and is among the most expensive to treat, because its high recurrence rate requires lifelong follow-up. The primary diagnostic tool is cystoscopy, which relies heavily on doctors' expertise and interpretation. As a result, many cases each year go undiagnosed or are misdiagnosed and treated as urinary infections. To address this, we propose a deep learning approach for bladder cancer detection and segmentation that combines CNNs with a lightweight, positional-encoding-free transformer and dual attention gates that fuse self-attention and spatial attention for feature enhancement. The proposed architecture is efficient, making it suitable for medical scenarios that require real-time inference. Experiments show that the model balances computational efficiency and diagnostic accuracy in cystoscopic imaging: despite its small size, it rivals large models in performance.
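The abstract names two ingredients, a transformer without positional encodings and a gate that fuses self-attention with spatial attention, but does not give their exact design. The following is a minimal, dependency-free sketch of that general idea, not the paper's implementation: the single-head attention, the sigmoid-of-channel-mean spatial gate, the multiplicative fusion, and all function names are assumptions chosen for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with NO positional encoding.

    tokens: list of feature vectors, e.g. flattened CNN feature-map pixels.
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def spatial_gate(tokens):
    """Hypothetical spatial attention: one sigmoid weight per location,
    computed from the mean over that location's channels."""
    return [1.0 / (1.0 + math.exp(-sum(t) / len(t))) for t in tokens]


def dual_attention_gate(tokens, w_q, w_k, w_v):
    """Assumed fusion rule: scale each self-attended token by its spatial weight."""
    attended = self_attention(tokens, w_q, w_k, w_v)
    gates = spatial_gate(tokens)
    return [[g * a for a in row] for g, row in zip(gates, attended)]


# Toy input: 4 spatial locations with 3 channels each, random projections.
rng = random.Random(0)
dim = 3
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(4)]
w_q, w_k, w_v = (
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)] for _ in range(3)
)

gated = dual_attention_gate(tokens, w_q, w_k, w_v)
print(len(gated), len(gated[0]))
```

Because no positional encoding is added, reordering the input tokens in this sketch simply reorders the output rows in the same way. In a hybrid design, location information would therefore have to come from elsewhere, for example from the convolutional layers; that is a plausible reading of the abstract rather than something it states.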