CASCRNet: An Atrous Spatial Pyramid Pooling and Shared Channel Residual based Network for Capsule Endoscopy (2410.17863v2)
Abstract: This manuscript summarizes work on the Capsule Vision Challenge 2024 by MISAHUB. The multi-class disease classification task is challenging because the Capsule Vision challenge dataset is complex and imbalanced. To address it, this paper proposes CASCRNet (Capsule endoscopy-Aspp-SCR-Network), a novel, parameter-efficient model built from Shared Channel Residual (SCR) blocks and Atrous Spatial Pyramid Pooling (ASPP) blocks. The performance of the proposed model is compared with other well-known approaches, and the experimental results show that it provides better disease classification results. The proposed model achieves an F1 score of 78.5% and a mean AUC of 98.3%, which is promising given its compact architecture.
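The F1 score quoted above is easier to interpret on an imbalanced dataset with a concrete computation in hand. The following is a minimal sketch, assuming macro averaging (the unweighted mean of per-class F1); the abstract does not state which averaging scheme was used, and the function name `macro_f1` and the class labels are illustrative, not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list.

    Assumption: macro averaging. The abstract does not specify the scheme.
    """
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: a majority-class predictor on an imbalanced set
y_true = ["normal"] * 8 + ["polyp"] * 2
y_pred = ["normal"] * 10
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy case accuracy is 80%, yet the macro F1 is much lower because the minority class is never predicted, which is why F1 is a more demanding metric than accuracy on imbalanced data.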