Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning (2410.18879v2)
Abstract: This report outlines Team Seq2Cure's deep learning approach for the Capsule Vision 2024 Challenge, leveraging an ensemble of convolutional neural networks (CNNs) and transformer-based architectures for multi-class abnormality classification in video capsule endoscopy frames. The dataset comprised over 50,000 frames from three public sources and one private dataset, labeled across 10 abnormality classes. To overcome the limitations of traditional CNNs in capturing global context, we integrated CNN and transformer models within a multi-model ensemble. Our approach achieved a balanced accuracy of 86.34% and a mean AUC-ROC score of 0.9908 on the validation set, earning our submission 5th place in the challenge. Code is available at http://github.com/arnavs04/capsule-vision-2024.
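The multi-model ensemble described above can be sketched as follows. This is a minimal illustration, not the authors' exact fusion method: it assumes each member model (CNN or transformer) emits per-frame logits over the 10 abnormality classes, and combines them by averaging softmax probabilities, one common ensembling choice.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(model_logits):
    """Fuse predictions from several models by probability averaging.

    model_logits: list of (n_frames, n_classes) arrays, one per model
                  (e.g. one CNN and one transformer backbone).
    Returns the averaged class probabilities and the argmax label per frame.
    """
    probs = np.mean([softmax(l) for l in model_logits], axis=0)
    return probs, probs.argmax(axis=-1)

# Toy example: 3 models, 4 frames, 10 abnormality classes.
rng = np.random.default_rng(0)
logits = [rng.normal(size=(4, 10)) for _ in range(3)]
probs, preds = ensemble_predict(logits)
```

Averaging probabilities (rather than logits or hard votes) keeps each model's confidence calibrated contribution and degrades gracefully when one member is uncertain; weighted averages are a straightforward extension.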
- Capsule Vision 2024 Challenge: Multi-class abnormality classification for video capsule endoscopy. arXiv preprint arXiv:2408.04940, 2024a.
- Real-time small bowel disease detection and classification using capsule endoscopy with deep learning: A multi-center study. Gastrointestinal Endoscopy, 91(6):AB301, 2020. doi: 10.1016/j.gie.2020.03.1918.
- Deep learning for small bowel capsule endoscopy: A systematic review and meta-analysis. Gastrointestinal Endoscopy, 90(4):668–679, 2019. doi: 10.1016/j.gie.2019.06.018.
- Artificial intelligence system for detection and classification of intestinal ulcers in capsule endoscopy. Digestive Diseases and Sciences, 66(8):2714–2721, 2021. doi: 10.1007/s10620-020-06634-3.
- Training and validation dataset of capsule vision 2024 challenge. Figshare, 7 2024b. doi: 10.6084/m9.figshare.26403469.v1. URL https://figshare.com/articles/dataset/Training_and_Validation_Dataset_of_Capsule_Vision_2024_Challenge/26403469.
- EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 6105–6114. PMLR, 2019.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
- Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1314–1324, 2019.
- Designing network design spaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10415–10424, 2020.
- Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.
- Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
- Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1492–1500, 2017.
- Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), 2016.
- MnasNet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2820–2828, 2019.
- Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7132–7141, 2018.
- A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545, 2022.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022, 2021.
- Training data-efficient image transformers & distillation through attention. arXiv preprint arXiv:2012.12877, 2021.
- BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254, 2021.
- Twins: Revisiting the design of spatial attention in vision transformers. arXiv preprint arXiv:2104.13840, 2021.
- EfficientFormer: Vision transformers at MobileNet speed. arXiv preprint arXiv:2206.01191, 2022.
- Lutz Prechelt. Automatic early stopping using cross-validation: quantifying the criteria. In Neural Networks: Tricks of the trade, pages 55–69. Springer, 1998.
- Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2019.
- What is the effect of importance weighting in deep learning? In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 872–881. PMLR, 2019.
- Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, 2017.