MugenNet: A Novel Combined Convolution Neural Network and Transformer Network with its Application for Colonic Polyp Image Segmentation (2404.00726v1)
Abstract: Biomedical image segmentation is an important part of disease diagnosis. The term "colonic polyps" refers to polypoid lesions that occur on the surface of the colonic mucosa within the intestinal lumen. In clinical practice, polyps are detected early through colonoscopy examinations and biomedical image processing, so accurate polyp image segmentation is of great significance in colonoscopy. The Convolutional Neural Network (CNN) is a common automatic segmentation method, but its main disadvantage is its long training time. The Transformer uses a self-attention mechanism, which essentially assigns a different importance weight to each piece of information, thus achieving high computational efficiency during segmentation; however, a potential drawback is the risk of information loss. In this study, based on the well-known hybridization principle, we propose a method that combines a CNN and a Transformer to retain the strengths of both, and we apply this method to build a system called MugenNet for colonic polyp image segmentation. We conducted a comprehensive experiment comparing MugenNet with other CNN models on five publicly available datasets, together with an ablation experiment on MugenNet. The experimental results show that MugenNet achieves significantly higher processing speed and accuracy than a CNN alone. The broader implication of our work is a method to optimally combine two complementary machine learning methods.